MTTR vs MTBF vs MTTF? – A Simple Guide To Failure Metrics
Measuring failure metrics is an integral part of asset management. MTTR can tell us how efficient our maintenance team is, MTBF points to the reliability of our equipment, and MTTF tries to estimate the average lifespan of non-repairable assets.
Tracking and managing equipment and device failures is essential for any organization that relies on physical assets to deliver its product or service. It is the only way to keep operational disruptions down to a minimum.
For each of the stated metrics, we will:
explain what it measures and why it is helpful
provide a graphical representation
use an example to show how it is calculated
and discuss what you can do to improve it
If you are only interested in one particular metric, use the table of contents to quickly navigate to that section of the article.
Introduction to failure metrics
Even the most efficient maintenance teams experience equipment failures. That’s why it’s critical to plan for them.
But first, what does equipment failure look like?
Failure exists in varying degrees (e.g. partial or total failure). In the most basic terms, failure simply means that a system, component, or device can no longer produce specific desired results. Even if a piece of manufacturing equipment is still running and producing items, it has failed if it doesn’t deliver the expected quantities.
Managing failure correctly means minimizing its negative impact. To help you effectively manage failures, several critical metrics should be monitored. Understanding these metrics will eliminate guesswork and empower maintenance managers with the hard data they need to make informed decisions.
What are the key failure metrics to pay attention to? We’ll discuss 3 of them:
MTTR (Mean Time To Repair)
MTBF (Mean Time Between Failures)
MTTF (Mean Time To Failure)
There is a more detailed graphical representation of each metric in its respective sections of the article. The above graphic is just a tease to present the relationship between these metrics.
Before we dive in, let’s briefly touch on the importance of having reliable data behind your failure metrics.
The importance of reliable data
To make data-backed improvements in equipment failure, it’s crucial for the right data to be collected and for that data to be accurate.
High-level failure statistics require a significant amount of meaningful data. As we’ll show in the calculations below, the following inputs must be collected as part of your maintenance history:
labor hours spent on maintenance
number of breakdowns and repairs
operational time (total expected operating hours per week minus total equipment downtime)
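The operational time input from the list above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `operating_hours` and the weekly figures are assumptions made up for the example.

```python
def operating_hours(expected_hours: float, downtime_hours: float) -> float:
    """Operational time = expected operating hours minus recorded downtime."""
    if expected_hours < 0 or downtime_hours < 0:
        raise ValueError("hours must be non-negative")
    if downtime_hours > expected_hours:
        raise ValueError("downtime cannot exceed expected operating hours")
    return expected_hours - downtime_hours


# Hypothetical week: 40 scheduled hours, 2.5 hours of recorded downtime.
weekly_operating_hours = operating_hours(40, 2.5)
print(weekly_operating_hours)  # 37.5
```

The checks at the top are there because a negative or impossible value usually points to a logging mistake rather than a real result.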
As tedious as recording maintenance figures can be, it’s an essential part of improving maintenance operations – identifying items with a high failure rate and finding the root cause of those failures.
This process can be painfully time-consuming when done manually. A mobile CMMS like Limble simplifies it by letting you log labor hours and downtime on your phone while you’re performing maintenance tasks. Limble also runs the MTTR and MTBF calculations automatically for you.
Collecting inaccurate data can cause a lot of issues. A maintenance technician occasionally writing down the wrong figure is just one example. A potentially much bigger problem is neglecting to record tasks, which leads to incomplete data.
If data is missing or inaccurate, your failure metrics will be useless in informing decisions on improving operations. Worse still, if you are unaware that the data is unreliable, you might end up making operational decisions that could be counterproductive and harmful.
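One way to catch missing or obviously wrong entries before they reach your metrics is a simple review pass over the maintenance log. The sketch below is only an illustration: the `RepairRecord` fields and the two checks are assumptions, and a real log will likely need rules of its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepairRecord:
    asset: str
    labor_minutes: Optional[float]     # None means the value was never logged
    downtime_minutes: Optional[float]  # None means the value was never logged


def find_suspect_records(records):
    """Return (record, reason) pairs for entries that should be reviewed."""
    suspects = []
    for record in records:
        if record.labor_minutes is None or record.downtime_minutes is None:
            suspects.append((record, "missing value"))
        elif record.labor_minutes < 0 or record.downtime_minutes < 0:
            suspects.append((record, "negative duration"))
    return suspects


# Hypothetical log entries for a single pump.
records = [
    RepairRecord("pump", 30, 30),
    RepairRecord("pump", None, 15),   # technician forgot to log labor time
    RepairRecord("pump", -15, 15),    # typo in the labor figure
]

for record, reason in find_suspect_records(records):
    print(record.asset, reason)
```

Flagging records for review, instead of silently dropping them, keeps the decision about what counts as bad data with the people who know the equipment.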
Now that we’ve got that out of the way, let’s focus on the things you came for.
What is MTTR (Mean Time To Repair)?
Mean Time To Repair (MTTR) refers to the amount of time required to repair a system and restore it to full functionality.
The MTTR clock starts ticking when the repairs start and keeps running until operations are restored. Among other things, this includes:
time to troubleshoot and diagnose the problem
time to assemble and start up the asset
How do you calculate MTTR?
To calculate MTTR, divide the total maintenance time by the total number of maintenance actions over a given period of time. In other words, you need to sum up the time you’ve spent on the repairs and divide it by the number of repairs you performed.
Imagine a pump that fails three times throughout a workday. The first repair lasted for 30 minutes, while the other two repairs lasted only 15 minutes. In this case:
MTTR = (30 + 15 + 15) / 3
MTTR = 60 / 3
MTTR = 20 minutes
So, the average time for performing repairs on that pump is 20 minutes.
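The same pump calculation can be written as a short Python sketch. The function name `mean_time_to_repair` is our own choice for illustration; the repair times are the ones from the example above.

```python
def mean_time_to_repair(repair_minutes):
    """Total repair time divided by the number of repairs."""
    if not repair_minutes:
        raise ValueError("at least one repair is needed to calculate MTTR")
    return sum(repair_minutes) / len(repair_minutes)


# The pump from the example: one 30-minute repair and two 15-minute repairs.
pump_repairs = [30, 15, 15]
print(mean_time_to_repair(pump_repairs))  # 20.0
```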
A few things to note:
Typically, every instance of failure will vary in severity, so while some incidents will require days to diagnose and repair, others could take mere minutes to fix. Hence, MTTR gives an average of what to expect.
If your sample size is small, you might need to remove outliers that would skew the results. For example, a rare, serious mistake during a repair might prolong by a couple of days a task that is otherwise finished in less than an hour.
To obtain reliable results, every repair should be handled by competent and trained personnel who can follow well-defined procedures.
Modern CMMS solutions will often track and automatically calculate MTTR for you. Technicians will be prompted to confirm how much time they took to perform the repair when they are closing a work order.
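To see how much a single outlier can skew a small sample, here is a rough Python sketch. The rule it uses (flag any repair longer than ten times the median) is an assumption chosen purely for illustration, as are the repair times; pick a rule that makes sense for your own data.

```python
from statistics import mean, median


def split_outliers(repair_minutes, factor=10.0):
    """Separate repairs longer than `factor` times the median (illustrative rule)."""
    typical = median(repair_minutes)
    kept = [t for t in repair_minutes if t <= factor * typical]
    flagged = [t for t in repair_minutes if t > factor * typical]
    return kept, flagged


# Hypothetical repairs in minutes; the last one dragged on for two days.
repairs = [30, 45, 50, 40, 2880]
kept, flagged = split_outliers(repairs)

print("MTTR with outlier:", mean(repairs))
print("MTTR without outlier:", mean(kept))
print("Flagged for review:", flagged)
```

With these made-up numbers, the average drops from over ten hours to about 41 minutes once the two-day repair is set aside, which is much closer to what a technician would actually expect.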
Why is MTTR helpful?
If an asset is under repair it means it is experiencing downtime. Frequent breakdowns and prolonged downtime periods lower equipment availability and equipment uptime.
That, in turn, has an adverse impact on business results. This is especially the case for processes that are particularly sensitive to failure. In a manufacturing environment, long mean time to repair leads to missed production deadlines, increased labor costs, loss of revenue, and a variety of operational issues.
Understanding MTTR is an important tool for any organization because it tells you how efficiently you can respond to and repair any issues with your assets. Most organizations seek to decrease MTTR with an in-house maintenance team supported with the necessary resources, tools, spare parts, and CMMS software.
Maintenance managers can use MTTR to inform maintenance decisions such as:
repair vs replace analysis
whether they need to invest more resources into training maintenance staff
whether they need to upgrade operating procedures and workflow
Ways to reduce MTTR
Every efficient maintenance system needs to look at how to reduce MTTR as much as possible. That can be done in a few different ways:
Use condition-monitoring sensors to track machine health and performance. While sensors should be utilized to prevent unexpected failures, sensor data can also be used to speed up the diagnosing and troubleshooting process. Additionally, tracking deterioration signs can give maintenance personnel more time to arrange for all the resources needed to execute the repair.
Implement CMMS software. Mobile CMMS solutions like Limble allow technicians quick access to maintenance history (logs, reports, notes from previous repairs…) which can speed up the repair process.
Streamline the repair process. Create clear standard operating procedures and maintenance checklists for repairs that are performed regularly.
Ensure proper training. If you want the job done correctly and in a reasonable timeframe, technicians performing the repairs need to be qualified and know what they are doing.
Mean Time To Repair vs Mean Time To Recovery
The acronym MTTR has A LOT of different meanings. The two most relevant for our discussion are “mean time to repair” (discussed above) and “mean time to recovery.”
Mean Time To Recovery measures the time from the point at which the failure is first discovered until the point at which the equipment returns to operation. So, in addition to repair time, testing period, and return to normal operating condition, it captures failure notification time.
Although both terms are often used interchangeably, the need for distinction becomes important in the context of Service Level Agreements (SLAs) and maintenance contracts.
Hence, all parties to such contracts will need to agree on what exactly they are measuring.
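The difference between the two readings of MTTR comes down to where you start the clock. The Python sketch below shows both on the same incidents. The `Incident` fields and the timestamps are hypothetical; your CMMS or SLA may define the start and end points differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    discovered_at: datetime      # failure is first noticed
    repair_started_at: datetime  # technician starts working on it
    restored_at: datetime        # equipment is back in operation


def mean_timedelta(deltas):
    """Average a list of timedelta values."""
    return sum(deltas, timedelta()) / len(deltas)


def mean_repair_time(incidents):
    """Mean time to repair: clock starts when the repair starts."""
    return mean_timedelta([i.restored_at - i.repair_started_at for i in incidents])


def mean_recovery_time(incidents):
    """Mean time to recovery: clock starts when the failure is discovered."""
    return mean_timedelta([i.restored_at - i.discovered_at for i in incidents])


# Two made-up incidents on the same day.
incidents = [
    Incident(datetime(2024, 1, 8, 8, 0), datetime(2024, 1, 8, 8, 20), datetime(2024, 1, 8, 8, 50)),
    Incident(datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 13, 10), datetime(2024, 1, 8, 13, 25)),
]

print("Mean time to repair:", mean_repair_time(incidents))
print("Mean time to recovery:", mean_recovery_time(incidents))
```

In this made-up data, the recovery figure is 15 minutes longer than the repair figure, which is exactly the average delay between discovering the failure and starting the repair.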
What is MTBF (Mean Time Between Failures)?
Mean Time Between Failures measures the predicted time that passes between one failure of a mechanical/electrical system and the next failure during normal operation. In simpler terms, MTBF helps you predict how long an asset can run before the next unplanned breakdown happens.
The expectation that failure will occur at some point is an essential part of MTBF.
MTBF calculation is used for repairable systems and it does not take into account units that are shut down for preventive maintenance (re-calibration, servicing, lubrication) or routine preventive parts replacement. Rather, it captures failures that occur due to design conditions that make it necessary to take the unit out of operation before it can be repaired.
So, while MTTR impacts availability, MTBF measures availability and reliability. The higher the figure of the MTBF, the longer the system will likely run before failing.
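The link between these two metrics and availability is often expressed with the commonly used steady-state relation availability = MTBF / (MTBF + MTTR). That formula is not spelled out in this article, so treat the Python sketch below as a supplementary illustration. The function name and the figures (180 minutes between failures, 20 minutes per repair) are assumptions for the example.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Expected fraction of uptime: MTBF / (MTBF + MTTR), both in the same unit."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


# Illustrative figures in minutes: 180 between failures, 20 per repair.
print(f"{availability(180, 20):.0%}")  # 90%
```

Notice that you can raise availability from either side: by making failures rarer (higher MTBF) or by fixing them faster (lower MTTR).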
How do you calculate MTBF?
Expressed mathematically, the average time from one failure to the next is calculated as the sum of operating time divided by the number of failures.
Looking at the example of the pump we mentioned under MTTR, out of the expected runtime of ten hours, it ran for nine hours and was down for a total of one hour spread over three failures. So:
MTBF = 9 hours / 3 failures
MTBF = 3 hours
In conclusion, the pump fails after every 3 hours of operation on average.
Keep in mind this is a very simplified example. You need to have a much bigger sample to make any applicable conclusions.
As you can see from the example above, the repair time is not included in the calculation of MTBF.
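Here is the pump example as a small Python sketch. The function name `mean_time_between_failures` is our own; the numbers come from the example above. Note that downtime is subtracted before dividing, which is how repair time stays out of the result.

```python
def mean_time_between_failures(operating_hours, failure_count):
    """Total operating time divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to calculate MTBF")
    return operating_hours / failure_count


# The pump from the example: 10 expected hours, 1 hour of downtime, 3 failures.
expected_runtime_hours = 10
downtime_hours = 1
pump_mtbf = mean_time_between_failures(expected_runtime_hours - downtime_hours, 3)
print(pump_mtbf)  # 3.0
```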
Apart from the design conditions mentioned earlier, other common factors tend to influence the MTBF of systems in the field.
A major one of these factors is human interaction. For instance, low MTBF could either indicate poor handling of the asset by its operators or a poorly-executed repair job in the past.
For critical assets such as airplanes, safety equipment, and generators, MTBF is an important indicator of expected performance. If the MTBF value is low, it means you are experiencing a significant number of breakdowns, which will negatively impact overall equipment effectiveness and other performance metrics.
Therefore, manufacturers can use the mean time between failures as a quantifiable reliability metric and as an essential tool during the design and production stages of many products. It is commonly used today in mechanical and electronic systems design, safe plant operations, product procurement, and so on.
Although MTBF does not consider planned maintenance, it can still be used for things like calculating the frequency of inspections for preventive replacements.
If it is known that an asset will likely run for a certain number of hours before the next failure, introducing preventive actions like lubrication or recalibration can help prevent that failure.
Ways to increase MTBF
There are many small things organizations can do to increase the time between failures. Some of them are:
Do more proactive maintenance work. Assets that are properly maintained are less likely to experience critical malfunctions. Use a CMMS to create and adhere to maintenance schedules.
Use quality replacement parts. The system is only as strong as its weakest link. Looking for the cheapest items is never the best long-term decision.
Use recommended input material. Whether it is the size of the chicken in the poultry processing system or the thickness of the foil used for product packaging, every machine is designed to work within certain parameters. Respect those parameters.
Ensure proper working conditions. Constantly pushing machines beyond their limits is a surefire way to decrease their useful life and MTBF.
Have a solid onboarding program for machine operators. Assets should be used the way they were designed to be used. Improper handling is bound to shorten MTBF.
Understand the kinks of old equipment and aging assets. Whenever possible, maintenance technicians should give tips to machine operators in terms of which actions they should avoid doing with old assets to manage avoidable recurring issues.
What is MTTF (Mean Time To Failure)?
Mean Time To Failure is a very basic measure of reliability used for non-repairable systems. It represents the length of time that an item is expected to last in operation until it needs to be replaced.
MTTF can be used to represent the lifetime of a product or device. Its value is calculated by looking at a large number of the same kind of items over an extended period and tracking how long they last.
In the manufacturing industry, MTTF is one of the many metrics commonly used to evaluate the reliability of manufactured products. However, there is still a lot of confusion in differentiating between MTTF and MTBF because they are both somewhat similar in definition. This is resolved by remembering that MTBF is used when referring to repairable items, while MTTF is used for non-repairable items.
When using MTTF as a failure metric, repair of the asset is not an option.
How do you calculate MTTF?
MTTF is calculated as the total time of operation, divided by the total number of items being tracked.
Let’s assume we tested 3 desktop hard drives. The first one failed after 500 000 hours, the second one failed after 600 000 hours, and the third hard drive failed after 700 000 hours in use. MTTF in this instance would be:
MTTF = (500 000 + 600 000 + 700 000) / 3
MTTF = 1 800 000 / 3
MTTF = 600 000 hours
This would lead us to the conclusion that this particular type and model of hard drive lasts 600 000 hours of use on average before it fails.
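The hard drive example works out as follows in Python. The function name `mean_time_to_failure` is an assumption for illustration; the lifetimes are the three values from the example above.

```python
def mean_time_to_failure(hours_to_failure):
    """Total operating time divided by the number of items tracked."""
    if not hours_to_failure:
        raise ValueError("at least one item is needed to calculate MTTF")
    return sum(hours_to_failure) / len(hours_to_failure)


# The three hard drives from the example, in hours of use before failure.
drive_lifetimes = [500_000, 600_000, 700_000]
print(mean_time_to_failure(drive_lifetimes))  # 600000.0
```

As with the other metrics, three items is far too small a sample for real decisions. The calculation stays the same when you track hundreds of units.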
Why is MTTF helpful?
MTTF is an important metric used to estimate the lifespan of products that are not repairable. Common examples of these products range from items like fan belts in automobiles to light bulbs in our homes and offices.
MTTF is particularly useful as a reliability metric. Engineers can use it to estimate how long a component would last as part of a larger piece of equipment. This is especially true where the entire business process is sensitive to the failure of the equipment in question.
Shorter MTTF means more frequent downtime as the failing items need to be replaced.
The only surefire way to increase MTTF is to look for better quality items: ones that are made from more durable materials and have gone through a thorough quality control process.
Other ways to impact MTTF are actions that improve the asset lifespan of any type of physical item. You should make sure that the devices are used for their intended purposes and in the conditions (humidity, heat, pressure, voltage…) they are designed for. It can also be helpful to double-check if the device was properly installed/retrofitted before it is used.
Scheduling maintenance is not really helpful in this case as there is usually nothing to maintain. You can maybe wipe off the dust, but you are not going to perform preventative maintenance on a computer hard drive or a lightbulb. You are just going to replace them when they stop working properly.
Understanding the calculations and use of failure metrics helps maintenance professionals find holes in their maintenance programs and practices. Based on their findings, they can proceed to develop better asset management strategies and improve their overall maintenance processes.
If you want more insights into how Limble CMMS can help you track and calculate these metrics, you can reach out to us via email. For everything else, feel free to join the discussion in the comments below!