Measuring Equipment Reliability and 7 Keys to Improving It
Ever since the first industrial revolution, improving equipment reliability has, in one form or another, been a core subject of interest for reliability engineers and maintenance professionals alike.
The purpose of this article is to present a comprehensive overview of the concepts of equipment reliability, define effective ways to improve levels of reliability, and identify steps that any industry can adopt to lay a foundation for an effective reliability management program.
What is equipment reliability?
The term reliability is defined as the probability that equipment will operate without failure for a given time and under set conditions. Put simply, it is the likelihood that equipment will continue to deliver its intended function for a specific period of time (without failure).
A decade ago, equipment was deemed “reliable” as long as it continued to operate and produce output. With the growing maturity of reliability and maintenance engineering strategies in the industry, the goalposts have moved. Today, reliability is evaluated in the context of how well the equipment is being utilized to achieve success at both strategic and operational levels.
How to measure equipment reliability?
Reliability is generally measured by the failure-free duration of the operation.
For equipment that has built-in redundancy, all possible scenarios or modes of those redundancies have to be accounted for when calculating the probability of failure.
For example, if a piece of equipment is designed to operate for 5,000 hours continuously and it indeed continues to operate without failure until this time, the equipment could be characterized as 100% reliable. If that equipment partially or completely fails within 5,000 hours of operation, its reliability would obviously be less than 100%.
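In practice, reliability over a mission time is usually expressed as a probability between 0 and 100% rather than as a pass/fail result. The sketch below is one common way to estimate it, assuming a constant failure rate (the exponential model), which the article itself does not prescribe. The 20,000-hour MTBF figure and the function name are hypothetical; MTBF is described later in this article.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of running `mission_hours` without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    if mission_hours < 0:
        raise ValueError("mission_hours must be non-negative")
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return math.exp(-mission_hours / mtbf_hours)


# Hypothetical numbers: a 5,000-hour mission for equipment with a 20,000-hour MTBF.
r = reliability(5_000, 20_000)
print(f"Reliability over 5,000 hours: {r:.1%}")  # prints 77.9%
```

The constant-failure-rate assumption fits equipment in the middle of its useful life reasonably well; it is less suitable for early-life or wear-out failures.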
It is important to note that most industrial equipment consists of several subsystems and components, each having its own designation of reliability. The overall reliability of the equipment, in this case, would be the combination of the reliability of its subsystems and components.
A good practice for calculating the reliability of the equipment is to understand the functional relationship of its subsystems and understand the impact of their failures on the overall equipment’s reliability. Since reliability is best characterized by equipment uptime and the duration of operations, the industry developed certain metrics that incorporate both of these parameters.
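The way subsystem reliabilities combine depends on how the subsystems relate functionally. As a minimal sketch, assuming independent failures: subsystems that must all work (a series arrangement) multiply their reliabilities, while redundant subsystems (a parallel arrangement) fail only if every one of them fails. The function names and the example values are illustrative assumptions.

```python
from math import prod
from typing import Iterable


def series_reliability(reliabilities: Iterable[float]) -> float:
    """All subsystems must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """Any one subsystem is enough: the system fails only if all of them fail."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical equipment: a motor, a gearbox, and a controller in series.
print(series_reliability([0.99, 0.95, 0.98]))
# Hypothetical redundancy: two identical units, either of which can carry the load.
print(parallel_reliability([0.90, 0.90]))
```

Note that a series system is always less reliable than its weakest component, which is why component-level data matters.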
Mean Time Between Failures (MTBF)
MTBF is a metric used to express the reliability of repairable equipment.
MTBF represents the average operating time between consecutive failures in a given period. To have enough data points to calculate MTBF, the equipment has to undergo at least two failures. As with all statistical calculations, the more data points you have, the more accurate your “averages” are going to be.
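A minimal sketch of that calculation follows. It assumes you log the cumulative operating hours (for example, from an hour meter) at the moment of each failure, so that repair time is already excluded; the function name and the sample readings are hypothetical.

```python
from statistics import mean
from typing import Sequence


def mtbf_from_failure_log(operating_hours_at_failure: Sequence[float]) -> float:
    """Average operating time between consecutive failures.

    Input: cumulative operating hours recorded at each failure, in order.
    """
    if len(operating_hours_at_failure) < 2:
        raise ValueError("at least two failures are needed to compute MTBF")
    readings = list(operating_hours_at_failure)
    # Operating time elapsed between each failure and the next one.
    gaps = [later - earlier for earlier, later in zip(readings, readings[1:])]
    if any(gap <= 0 for gap in gaps):
        raise ValueError("readings must be strictly increasing")
    return mean(gaps)


# Hypothetical log: failures at 1,200, 2,500 and 4,300 operating hours.
print(mtbf_from_failure_log([1200, 2500, 4300]))  # prints 1550
```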
Mean Time to Failure (MTTF)
MTTF, on the other hand, is a measure of reliability that is applicable for non-repairable equipment.
Put simply, MTTF tells you the average lifespan of a device. It is calculated by averaging the time to failure across a population of units of the same type, observed in production or operation.
Interconnection between reliability and maintainability
Reliability and maintainability are closely related statistical terms that are often studied together. Maintainability is defined as the ease of performing maintenance. The easier it is to perform repairs and maintenance on an asset, the higher its maintainability.
People without a background in reliability theory tend to confuse the two, saying that if equipment has high reliability it will also have high maintainability (and vice versa). While that may hold for some equipment under specific conditions, it is not always true.
For example, mean time to repair (MTTR) – a measure of maintainability – can be improved by reducing the number of actions the technician has to perform during the repair. While this will cut down the repair time, skipping important steps will eventually lead to more failures, reducing reliability in the process.
An optimized maintenance program understands the difference and balances trade-offs between reliability and maintainability of the equipment.
7 ways to improve equipment reliability
Designing highly reliable equipment requires subject matter knowledge of the concepts of reliability, the working principles of the equipment under consideration, the desired levels of performance, and the context in which the equipment is intended to operate.
Below are seven ways equipment reliability can be improved at the design and operational phases.
1) Improve data quality
The presence of high-quality data represents the single most important thing to have at any stage of the asset lifecycle.
Oftentimes, the equipment failure and maintenance data is either not available or it is contaminated with errors and biases. This lack of quality data leads to reliability engineers judging equipment performance based solely on their experience – resulting in decision-making that is not able to squeeze the maximum value out of the available assets.
The best way to ensure data quality is to simplify, standardize, and automate equipment data collection and reporting.
All of that can be done by implementing mobile-enabled CMMS software like Limble. It serves as your centralized data repository with instant access to maintenance information like:
- Who has performed what, when, and for how long
- An overview of all work requests
- List of tools and parts used during the repair process
- Detailed equipment maintenance log with technicians’ notes for each asset (which can be used to identify common problems and failure modes)
- Costs associated with each part, asset, vendor, and contractor
- Various maintenance metrics and KPIs you set up and decide to track (for example, Limble automatically calculates metrics such as MTBF, MTTR, downtime, etc.)
Reliability and maintenance engineers can use this granular information to improve future equipment/component designs and develop maintenance strategies and schedules that address the most problematic failure modes.
2) Rank assets based on criticality
An industrial facility can feature hundreds or even thousands of equipment pieces. Performing reliability analysis for each one is neither feasible nor cost-effective.
The equipment has to be prioritized based on a criticality score. The most common approach is to perform Failure Mode, Effects, and Criticality Analysis (FMECA), which can be used to rank assets based on the severity of the impact of their failures on the overall operation.
For example, you will not spend the same amount of resources on tracking the condition of the traction converter for the locomotive versus the small manual transfer switch installed on the same locomotive.
As pointed out above, the presence of quality data plays a crucial role in performing criticality analysis.
Some of the components which may look non-critical at first sight can have a significant impact on the overall plant reliability. A detailed level of impact and criticality can only be understood if the component level breakdown information is available.
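As a rough illustration of ranking by criticality, the sketch below scores each asset as severity multiplied by likelihood of occurrence and sorts the list. This is a deliberately simplified stand-in for a full FMECA, and the 1 to 10 scales, the scores, and the cooling fan entry are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Asset:
    name: str
    severity: int    # 1 (negligible impact) to 10 (stops the operation); assumed scale
    occurrence: int  # 1 (rare) to 10 (frequent); assumed scale

    @property
    def criticality(self) -> int:
        # Simplified score; a real FMECA works per failure mode, not per asset.
        return self.severity * self.occurrence


def rank_by_criticality(assets: Iterable[Asset]) -> List[Asset]:
    """Return assets ordered from most to least critical."""
    return sorted(assets, key=lambda a: a.criticality, reverse=True)


# Hypothetical scores for components on the same locomotive.
fleet = [
    Asset("manual transfer switch", severity=3, occurrence=2),
    Asset("traction converter", severity=9, occurrence=4),
    Asset("cooling fan", severity=6, occurrence=5),
]
for asset in rank_by_criticality(fleet):
    print(f"{asset.name}: {asset.criticality}")
```

The cooling fan scoring close to the traction converter shows how a component that looks minor can rank high once failure frequency is taken into account.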
3) Improve the effectiveness of maintenance work
The quality of performed maintenance work on an asset will have a direct impact on its reliability.
There are many ways to improve the quality of executed maintenance work:
- Ensure that maintenance techs and mechanics are properly trained and have access to the right maintenance tools
- Implement condition monitoring sensors, non-destructive testing, and predictive maintenance
- Use SOPs and maintenance checklists to standardize maintenance work according to best practices
- Use a CMMS system to set up maintenance schedules (and stick to them!)
- Use CMMS and other analytics to gather and analyze data and improve upon every point mentioned above
Getting all of this right will reduce both unexpected downtime and unnecessary planned downtime, and thus improve the overall levels of reliability and equipment availability on your plant floor.
4) Develop metrics that track reliability
It is hard to improve something you do not measure. Use metrics like MTBF, MTTF, MTTR, and availability to estimate and improve the reliability of critical equipment.
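Availability ties these metrics together. A common steady-state formula is availability = MTBF / (MTBF + MTTR), sketched below. The input figures are hypothetical, and the formula ignores planned downtime and logistics delays, so treat it as a first approximation.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the equipment is expected to be operable.

    Inherent availability: MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    if mttr_hours < 0:
        raise ValueError("mttr_hours must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical asset: fails every 1,550 operating hours on average, takes 8 hours to repair.
print(f"{availability(1550, 8):.2%}")  # prints 99.49%
```

The formula makes the two levers visible: availability goes up either by failing less often (higher MTBF) or by repairing faster (lower MTTR).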
In some cases, critical equipment comes with sensors that provide a real-time overview of equipment performance and health. With these sensors, the equipment time of operation can be automatically logged – along with the time and frequency of failures.
The real-time display of reliability measures provides situational awareness to the equipment operator and enables proactive corrective actions. Early warning lets the team intervene within the P-F interval (the window between a detectable potential failure and functional failure), before the equipment reaches complete failure, which improves equipment reliability.
5) Increase equipment redundancy
One of the methods to improve the reliability of any system is to introduce redundancies that eliminate single points of failure.
Take an oil refining facility as an example. If the facility has only one specific mainline crude oil pump, it has to continuously operate to ensure the sustained operation of that facility. If it fails, the entire plant has to be shut down.
With the use of redundancy, a plant operator can eliminate this scenario. They can install a similar-sized pump as a standby that can automatically take the load if the other one fails.
It is a proven way to improve reliability, albeit a fairly expensive one. The investment is justified in capital-intensive industries where the impact of equipment failure on overall plant reliability is enormous.
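Because each additional standby unit costs money, it helps to estimate how many units are actually needed to reach a target. The sketch below assumes identical units that fail independently and a perfect automatic switchover, neither of which is guaranteed in a real plant; the reliability figures are hypothetical.

```python
def units_needed(unit_reliability: float, target_reliability: float) -> int:
    """Smallest number of identical parallel units that meets the target.

    System reliability with n units is 1 - (1 - r)^n, assuming independent failures.
    """
    if not 0 < unit_reliability < 1:
        raise ValueError("unit_reliability must be between 0 and 1 (exclusive)")
    if not 0 < target_reliability < 1:
        raise ValueError("target_reliability must be between 0 and 1 (exclusive)")
    units = 1
    all_fail_probability = 1 - unit_reliability
    # Keep adding units until the chance that all of them fail is small enough.
    while 1 - all_fail_probability < target_reliability:
        units += 1
        all_fail_probability *= 1 - unit_reliability
    return units


# Hypothetical pump with 95% reliability over the period, plant target of 99.9%.
print(units_needed(0.95, 0.999))  # prints 3
```

In this example one standby pump raises reliability from 95% to 99.75%, and a second standby is needed to clear 99.9%, which illustrates the diminishing return on each extra unit.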
The same principle can be applied while designing parts and equipment. Engineers can use redundancies to address specific failure modes and create more fault-tolerant systems.
6) Improve training and skills of equipment operators
Improving the knowledge and skill of equipment operators and other people that come in contact with the equipment is among the most cost-effective and easy ways to improve equipment reliability.
Most of the machines operated in industrial settings have to interact with humans in one way or another. No matter how sophisticated the controls for tracking machine health are, if the operators are not properly trained, the equipment is bound to experience frequent failures.
There are several ways to reduce the chance of human error and limit its impact on equipment reliability at both the design and operational stages.
At the design stage, reliability engineers can work with ergonomic consultants to simplify the design from a human perspective. In other words, to balance equipment efficiency with its ease of use and maintainability.
Once the human factors are incorporated in the designs, the next phase is often to make sure that operators get proper training and education.
The error-free operation of the equipment and its effective and quick troubleshooting produce a direct impact on reliability by reducing unscheduled downtimes and other unpleasant surprises. As such, they should be one of the building blocks of a solid asset management program.
7) Improve the reliability culture
Most maintenance and plant managers should already know the importance of equipment reliability. This is because they have a broader understanding of the cash flows and profits that equipment is generating for them.
Front-line workers, on the other hand, often fail to realize why it is such a big deal for management when equipment that was failing once every six months has all of a sudden started to fail once every four months. The workers may see this change as insignificant.
You need to educate people at all levels of the organization about the importance of asset reliability. It is the only way to foster a culture of reliability and continuous improvement. Ideally, every person in the organization should have a clear understanding of how the machine they operate contributes to achieving the overarching business objectives.
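A quick calculation shows why management reads that change differently. Converting the interval between failures into failures per year makes the size of the shift explicit; the function name is illustrative.

```python
def failures_per_year(months_between_failures: float) -> float:
    """Convert an average interval between failures into an annual failure count."""
    if months_between_failures <= 0:
        raise ValueError("months_between_failures must be positive")
    return 12 / months_between_failures


before = failures_per_year(6)  # 2.0 failures per year
after = failures_per_year(4)   # 3.0 failures per year
increase = (after - before) / before
print(f"{before:.0f} -> {after:.0f} failures per year ({increase:.0%} more)")  # 50% more
```

Going from a six-month to a four-month interval means 50% more failures each year, along with the downtime and repair costs that come with them.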
Developing an equipment reliability program
Developing an equipment reliability program is the foundational activity to systematically achieve the reliability of any asset. The equipment reliability program will be different for equipment manufacturers and plant managers as they have different objectives.
The manufacturer would look to improve reliability at the design stage while the plant manager would look to improve the reliability during the operations (in-service stage).
The following steps, in general, should be taken to develop a well-articulated reliability program.
1) Planning
The planning activity would involve understanding the context of the equipment and the operation within which the equipment is or will be installed. During the planning stage, the performance requirements will be studied and the corresponding level of reliability desired will be established.
For example, to ensure that a train is never delayed by more than five (5) minutes at any station along its route, the locomotive equipment and subsystems have to attain a specific level of failure-free operation. In short, the plan of action has to consider the end-user requirements.
2) Analysis
A detailed equipment level reliability analysis can be performed to understand the possible failure modes – and failure history – that can arise throughout the operations phase of the equipment. Some of the common analyses include FMECA, Fault Tree Analysis (FTA), and Reliability Centered Maintenance (RCM).
At this stage, the organization will develop maintenance strategies and schedules that correspond to the criticality of the equipment – all in an effort to achieve the desired level of reliability, availability, and maintainability.
3) Implementation
Once maintenance is strategized, the next step is to put those plans into action.
Organizations that are serious about reliability will often seek help in the form of CMMS software. They will use it to set up and organize timely execution of maintenance work, as well as to gather and store valuable data along the way – with optimization being the ultimate goal.
At the end of the day, we can all agree that plans which can’t be followed aren’t worth very much.
4) Continuous improvement
The purpose of continuous improvement is to track reliability and maintainability metrics and act on what they show. As equipment gets older and becomes more worn out, maintenance schedules and reliability improvement initiatives have to be adjusted to prolong its useful life.
Since the operating context and the equipment performance characteristics often change over time, continuous improvement is here to ensure the equipment continues to be reliable over its entire life cycle.
Equipment reliability is a team effort
Asset reliability can’t be the responsibility of reliability engineers alone. Every stakeholder, be it a designer, reliability engineer, maintenance mechanic, or equipment operator, has an impact on the reliability of the equipment they are responsible for.
Effective organizations that excel in reliability recognize this and make sure that each stakeholder has the right tools and knowledge to do their job to the best of their abilities.
Start your reliability journey today by implementing Limble CMMS. It will offer you unprecedented access to the data you need to establish strong maintenance and reliability programs.