Asset performance metrics like MTTR, MTBF, and MTTF are essential for any organization with equipment-reliant operations. Only by tracking these critical KPIs can an enterprise maximize uptime and keep disruptions to a minimum.
Tracking the reliability of assets is a challenge that engineering and maintenance managers face every day. Failure metrics can be very useful here, but to use them effectively you need to know what their acronyms stand for, how to distinguish between them, how to calculate them, and what they tell you about your assets.
That’s why we decided to create a simple-to-follow guide to failure metrics that will help you avoid costly mistakes and successfully monitor equipment performance.
Introduction to Failure Metrics
Even the most efficient maintenance teams experience equipment failures. That’s why it’s critical to plan for them.
But first, what does equipment failure look like?
Failure exists in varying degrees (e.g. partial or total failure) but in the most basic terms, failure simply means that a system, component, or device can no longer produce specific desired results. Even if a piece of manufacturing equipment is still running and producing items, it has failed if it doesn’t deliver the expected quantities.
Managing failure correctly can help you significantly reduce its negative impact. To do that, there are a number of critical metrics you should monitor. Understanding these metrics reduces guesswork and gives maintenance managers the hard data they need to make informed decisions.
Which failure metrics should be tracked? Across industries and applications, we’ve found that the most useful are MTTR, MTBF, and MTTF. We’ll discuss what each of these acronyms means and how you can use them to improve your operations.
But before that, we need to discuss one thing that is often overlooked: the importance of having reliable data behind your failure metrics.
The Importance of Reliable Data
In order to make data-backed improvements in equipment failure, it’s crucial for the right data to be collected and for that data to be accurate.
High-level failure statistics require a significant amount of meaningful data. As we’ll show in the failure metrics calculations below, the following inputs must be collected as part of your maintenance history:
- Labor hours spent on maintenance
- Number of breakdowns
- Operational time (calculated as total expected operating hours per week minus total equipment downtime)
As tedious as recording maintenance figures can be, it’s an essential part of improving operations. The process can be painfully time-consuming when done manually, but a mobile CMMS like Limble makes it simple by letting you quickly log reliable labor hour and downtime data on your phone while you perform maintenance tasks. Limble also calculates MTTR and MTBF for you automatically.
Collecting inaccurate data can cause a lot of issues. For example, maintenance technicians might occasionally write down the wrong figure. A potentially much bigger problem is neglecting to record tasks at all, which leads to incomplete data.
If data is missing or inaccurate, your failure metrics will be useless in informing decisions on improving operations. Worse still, if you are unaware that the data is unreliable, you might end up making operational decisions that could actually be counterproductive and harmful.
Now that we’ve got that out of the way, let’s focus on the things you actually came for.
What is Mean Time To Repair (MTTR)?
Mean Time To Repair (MTTR) is the average time required to repair a system and restore it to full functionality.
The MTTR clock starts ticking when the repairs start and it goes on until operations are restored. This includes repair time, testing period, and return to the normal operating condition.
How do you calculate MTTR?
To calculate MTTR, divide the total maintenance time by the total number of maintenance actions over a given period of time.
Imagine a pump that fails three times over the span of a workday, and the total time spent repairing those three breakdowns is one hour. In that case, MTTR would be 1 hour / 3 = 20 minutes.
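If you keep maintenance records in a spreadsheet export or a script, the formula fits in a few lines of Python. This is a minimal illustration, not how any particular CMMS computes it: the `mttr` function and its parameter names are hypothetical, and it assumes repair time has already been totaled for the period.

```python
def mttr(total_repair_hours: float, repair_actions: int) -> float:
    """Mean Time To Repair: total maintenance time / number of maintenance actions."""
    if repair_actions <= 0:
        raise ValueError("MTTR is undefined without at least one repair action")
    return total_repair_hours / repair_actions


# Pump example: three breakdowns, one hour of repair time in total.
pump_mttr = mttr(total_repair_hours=1.0, repair_actions=3)
print(f"MTTR = {pump_mttr * 60:.0f} minutes")  # MTTR = 20 minutes
```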
A couple of things to note:
- Failures typically vary in severity: some incidents require days to repair, while others take mere minutes to fix. MTTR therefore gives an average of what to expect.
- To obtain reliable results, every repair should be handled by competent, trained personnel who follow well-defined procedures.
An efficient maintenance team should always look for ways to reduce MTTR as much as possible. That can be done in a few different ways.
One approach is to track spare parts and inventory levels, so that downtime isn’t extended while technicians source parts.
Another way is to implement proactive maintenance strategies like predictive maintenance. Predictive maintenance (PdM) will, among other things, allow you to better monitor the condition of in-service equipment and predict potential failure more accurately by using condition-monitoring sensors mounted directly on those components that are prone to failure.
These sensors can alert maintenance managers well in advance of an expected failure. At that point, the repair is no longer reactive but predictive, because the manager has enough time to arrange all the resources needed to execute the job.
Why is MTTR helpful?
Taking too long to repair a system or piece of equipment can seriously hurt business results, especially in processes that are particularly sensitive to failure. Slow repairs often lead to production downtime, missed deadlines, lost revenue, and so on.
MTTR is an important metric for any organization because it tells you how efficiently you can respond to and repair issues with your assets. Most organizations try to decrease MTTR by supporting an in-house maintenance team with the necessary resources, tools, spare parts, and CMMS software.
Maintenance managers can use MTTR to inform maintenance decisions such as:
- when to repair or replace assets
- quantity of parts and inventory to have on hand
- whether to lease or buy equipment
Mean Time To Repair vs Mean Time To Recovery
There are several commonly used terms for the acronym “MTTR.” The two most common are “mean time to repair” (discussed above) and “mean time to recovery.”
Mean Time To Recovery is a measure of the time between the point at which the failure is first discovered until the point at which the equipment returns to operation. So, in addition to repair time, testing period, and return to normal operating condition, it captures failure notification time and diagnosis.
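A timestamp-based sketch makes the difference concrete. It assumes a hypothetical incident record holding three times (when the failure was discovered, when repair work began, and when the equipment was back in normal operation). The `Incident` class, the function names, and the two incidents are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    """Hypothetical record of one breakdown."""
    detected: datetime        # failure first discovered or reported
    repair_started: datetime  # technician begins the repair
    restored: datetime        # equipment back in normal operation


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def mean_time_to_repair(incidents: list[Incident]) -> float:
    """Average time from the start of repair work to restored operation."""
    return sum(_hours(i.repair_started, i.restored) for i in incidents) / len(incidents)


def mean_time_to_recovery(incidents: list[Incident]) -> float:
    """Average time from discovery of the failure to restored operation.

    This also includes notification and diagnosis time.
    """
    return sum(_hours(i.detected, i.restored) for i in incidents) / len(incidents)


# Two made-up incidents on the same day.
day = datetime(2024, 1, 15)
incidents = [
    Incident(day.replace(hour=8), day.replace(hour=8, minute=30), day.replace(hour=9)),
    Incident(day.replace(hour=13), day.replace(hour=14), day.replace(hour=15, minute=30)),
]
print(mean_time_to_repair(incidents))    # 1.0
print(mean_time_to_recovery(incidents))  # 1.75
```

For the same two incidents, mean time to repair is 1 hour and mean time to recovery is 1.75 hours. The difference comes from recovery also counting the wait between discovery and the start of the repair.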
Although the two terms are often used interchangeably, the distinction becomes important in the context of Service Level Agreements (SLAs) and maintenance contracts.
Hence, all parties to such contracts need to agree on exactly what they are measuring.
What is Mean Time Between Failures (MTBF)?
The second failure metric we’ll cover is Mean Time Between Failures. MTBF measures the average time that passes between one failure of a mechanical or electrical system and the next during normal operation; in other words, the time between one system breakdown and the next.
The expectation that failure will occur at some point is an essential part of MTBF.
The term MTBF is used for repairable systems, but it does not count units being shut down for routine scheduled maintenance (recalibration, servicing, lubrication) or routine preventive parts replacement. Rather, it captures unplanned failures, arising from design conditions, that take the unit out of operation until it can be repaired.
So, while MTTR speaks to availability (how quickly an asset returns to service), MTBF speaks to both availability and reliability. The higher the MTBF, the longer the system will likely run before failing.
How do you calculate MTBF?
To calculate MTBF, divide the total operational time by the number of failures over the same period.
Returning to the pump from the MTTR example: out of an expected runtime of ten hours, it ran for nine hours and was down for one hour, spread over three breakdowns. So, MTBF = 9 hours / 3 = 3 hours.
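The MTBF calculation, including the rule that scheduled maintenance is not counted as a failure, can be sketched in Python as follows. The `DowntimeEvent` log format and the function names are hypothetical. The split of the pump’s one hour of downtime into three stops is assumed, since only the total is given. The sketch also computes inherent availability, MTBF / (MTBF + MTTR), a standard reliability formula that is not otherwise used in this guide, to show how the two pump figures fit together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    """One stop in a hypothetical maintenance log."""
    hours: float   # how long the asset was not running
    planned: bool  # True for scheduled work (lubrication, recalibration, ...)


def operational_time(expected_hours: float, events: list[DowntimeEvent]) -> float:
    """Expected operating hours minus all recorded downtime, planned or not."""
    return expected_hours - sum(e.hours for e in events)


def mtbf(expected_hours: float, events: list[DowntimeEvent]) -> float:
    """Operational time divided by the number of unplanned failures.

    Planned stops still reduce operational time, but they are not counted
    as failures, because MTBF excludes routine scheduled maintenance.
    """
    failures = sum(1 for e in events if not e.planned)
    if failures == 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    return operational_time(expected_hours, events) / failures


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected share of time the asset is running: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Pump example: 10 expected hours, one hour of unplanned downtime over three
# breakdowns. The 0.25 / 0.5 / 0.25 split is assumed; the text gives only the total.
pump_log = [
    DowntimeEvent(0.25, planned=False),
    DowntimeEvent(0.5, planned=False),
    DowntimeEvent(0.25, planned=False),
]
pump_mtbf = mtbf(10, pump_log)                           # 9 h / 3 = 3.0 h
availability = inherent_availability(pump_mtbf, 1 / 3)  # MTTR of 20 minutes
print(pump_mtbf, round(availability, 3))                # 3.0 0.9
```

The availability of 0.9 matches the pump running nine of its ten expected hours. Adding a stop with `planned=True` to the log would shorten the operational time without adding to the failure count.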
Apart from design conditions mentioned earlier, there are other common factors that tend to influence the MTBF of systems in the field.
One of the major factors is human interaction. For instance, a low MTBF could indicate either poor handling of the asset by its operators or a poorly executed repair job in the past.
Why is MTBF helpful?
MTBF is an important marker in reliability engineering and has its roots in the aviation industry, where airplane failure can result in fatalities.
For critical assets such as airplanes, safety equipment, and generators, MTBF is an important indicator of expected performance. Therefore, manufacturers use it as a quantifiable reliability metric and as an essential tool during the design and production stages of many products. It is commonly used today in mechanical and electronic systems design, safe plant operations, product procurement and so on.
Even everyday decisions like buying a particular brand of car or computer are affected by the buyer’s desire for a product with a higher MTBF than what the next brand has to offer.
Although MTBF does not consider planned maintenance, it can still be used for things like setting the frequency of inspections and preventive replacements.
If you know that an asset will likely run for a certain number of hours before its next failure, scheduling preventive actions like lubrication or recalibration within that window can help delay the failure and extend the asset’s uptime.
What is Mean Time To Failure (MTTF)?
Mean Time To Failure (MTTF) is a very basic measure of reliability used for non-repairable systems. It represents the length of time that an item is expected to last in operation until it fails.
MTTF is what we commonly refer to as the lifetime of a product or device. Its value is calculated by observing a large number of items of the same kind over an extended period of time and averaging how long they run before failing.
In the manufacturing industry, MTTF is one of many metrics commonly used to evaluate the reliability of manufactured products. However, MTTF and MTBF are still often confused because their definitions are similar. The good news is that the confusion is easily resolved by remembering that MTBF is used only for repairable items, while MTTF is used for non-repairable items.
When using MTTF as a failure metric, repair of the asset is not an option.
How do you calculate MTTF?
MTTF is calculated as the total hours of operation, divided by the total number of items being tracked.
Let’s assume we tested three identical pumps until all of them failed. The first pump system failed after eight hours, the second one failed at ten hours, and the third failed at twelve hours. MTTF in this instance would be (8 + 10 + 12) / 3 = 10 hours.
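Here is a minimal Python sketch of the same calculation. The `mttf` function is a hypothetical name, and it assumes every tracked item was run until it failed, as the three pumps were.

```python
from __future__ import annotations


def mttf(hours_to_failure: list[float]) -> float:
    """Mean Time To Failure: total operating hours / number of items tracked.

    Assumes every tracked item ran until it failed (as in the three-pump test).
    """
    if not hours_to_failure:
        raise ValueError("MTTF needs at least one item")
    return sum(hours_to_failure) / len(hours_to_failure)


# Three identical pumps that failed after 8, 10 and 12 hours.
print(mttf([8, 10, 12]))  # 10.0
```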
This suggests that this particular type and model of pump will need to be replaced, on average, every 10 hours.
The most direct way to increase MTTF is to source higher-quality items made from more durable materials.
Why is MTTF helpful?
MTTF is an important metric used to estimate the lifespan of products that are not repairable. Common examples of these products range from items like fan belts in automobiles to light bulbs in our homes and offices.
In particular, MTTF is important to reliability engineers when they need to estimate how long a component would last as part of a larger piece of equipment. This is especially true where the entire business process is sensitive to the failure of the equipment in question.
In such cases, MTTF becomes the primary indicator of the equipment’s reliability, and the aim is to maximize asset lifetime. A shorter MTTF means more frequent downtime and disruptions.
One of the top priorities of maintenance managers is to ensure maximum operational availability of their equipment, as well as keeping equipment operations safe and efficient. Understanding the calculations and use of failure metrics will enable maintenance professionals to determine, with greater accuracy, when a critical asset is most likely to fail.
Based on their findings, they can proceed to develop better asset management strategies and improve their overall maintenance processes.
By calculating failure metrics and planning maintenance based on those results, they can also reduce their organization’s dependence on reactive maintenance in favor of planned (predictive) maintenance, which can be just the thing they need to spark their business’s growth.