MTTR vs MTBF vs MTTF: A Simple Guide To Failure Metrics
Before we can get into MTTR, MTBF and MTTF, we want to be clear that maintenance metrics are not the same thing as maintenance KPIs. Maintenance metrics support the achievement of KPIs, which, in turn, support the business’s overall strategy. With so many maintenance performance metrics out there, it’s hard to know which one to choose.
In this article, we focus on the three most common failure metrics. We’ll make it simple for you to understand what they are, how they work, and when to use them. If you want to know more about the rest, we suggest visiting this piece on maintenance-related metrics and KPIs.
Introduction to failure metrics
To understand how to use MTTR, MTBF, and MTTF to your advantage, you need to understand the bigger picture of failure metrics.
What is equipment failure?
Failure exists in varying degrees (e.g., partial or total failure). In the most basic terms, equipment failure simply means that a system, component, or device can no longer produce desired results.
Even if a piece of manufacturing equipment is still running and producing items, it fails when it stops delivering the expected quantities or quality of products.
Why should I be tracking it?
Even the most efficient maintenance teams experience equipment failures. That’s why it’s critical to plan for them. Managing failure correctly means minimizing its negative impact. To help you effectively manage losses, several critical metrics should be monitored.
Tracking and understanding these metrics will eliminate guesswork and empower maintenance managers with the hard data they need to make informed decisions – helping you keep operational disruptions down to a minimum. This is your secret weapon, the source of your superpower!
What are the key failure metrics to pay attention to? We’ll discuss 3 of them:
MTTR (Mean Time To Repair)
MTBF (Mean Time Between Failures)
MTTF (Mean Time To Failure)
Capturing reliable data from the get-go
Data is important. High-level failure statistics require a large amount of meaningful data. As we’ll show in the calculations below, the following inputs must be collected as part of your maintenance history.
Labor hours spent on maintenance
If you’re working with a modern CMMS like Limble, it will track maintenance hours automatically. Technicians report how much time it takes to complete a job every time they finish a WO. Over time, this data becomes invaluable. And it doesn’t take any extra time or work to obtain.
Technicians are prompted to enter “time spent” when closing a WO
The number of breakdowns and repairs
Limble gives you real-time views of all the maintenance tasks that are scheduled and completed — including ones that cause downtime. This lets you see what’s happening with all your assets easily and determine where there is room to improve efficiency.
Limble helps you track tasks that caused downtime
You can calculate total operational time by subtracting asset downtime from the number of expected operating hours per week. (To make it easy on you, Limble lets you see how long it takes to get assets up and running again if they do fail).
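As a rough illustration of that subtraction, here is a minimal Python sketch. The function name and the weekly figures are made up for the example; they are not taken from Limble.

```python
def operating_hours(expected_hours: float, downtime_hours: float) -> float:
    """Return the hours an asset actually ran: expected hours minus downtime."""
    if expected_hours < 0 or downtime_hours < 0:
        raise ValueError("hours must be non-negative")
    if downtime_hours > expected_hours:
        raise ValueError("downtime cannot exceed expected operating hours")
    return expected_hours - downtime_hours


# Hypothetical week: 40 expected operating hours, 3.5 hours of downtime.
weekly_uptime = operating_hours(40.0, 3.5)
print(weekly_uptime)  # 36.5
```

The guard clauses are there because a negative or oversized downtime figure usually points to a data-entry problem, which is worth catching before it reaches your metrics.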
An example of asset uptime report in Limble CMMS
Recording maintenance figures can be tedious. Still, it’s an essential part of improving maintenance operations – identifying items with a high failure rate and finding the root cause of those failures.
Doing this manually is painfully time-consuming. But it’s made simple with a mobile CMMS like Limble. Limble lets you quickly and easily log reliable data for labor hours and downtime on your phone while you’re performing maintenance tasks. More than that, Limble runs all the calculations of MTTR and MTBF automatically for you.
MTTR and MTBF report inside Limble CMMS
Collecting inaccurate data can cause a lot of issues. If data is missing or wrong, your failure metrics will be useless in informing decisions on improving operations. Worse still, if you don’t know that the information is unreliable, you might make operational decisions that could damage or slow production down.
Now that we’ve got that out of the way, let’s focus on what you came here for.
What is MTTR (Mean Time To Repair)?
Mean Time To Repair (MTTR) refers to the average amount of time required to repair a system and restore it to full functionality. The MTTR clock starts ticking when the repairs start, and it keeps running until operations are restored. That includes:
Time to troubleshoot and diagnose the problem
Time to assemble and start up the asset
MTTR is a measure, but it’s not magic
MTTR is a metric you can use to demonstrate operational excellence. You cannot, however, expect it to solve all your problems. It needs to be paired with other metrics to build a strong and valuable KPI that speaks directly to the wider company strategy.
MTTR can easily be distorted by outliers. If you have a single incident with a vastly different resolution or repair than others, your data might be skewed.
For example, let’s say the bulk of the water heaters in your building tend to suffer from broken thermostats. For most of them, the fix is relatively easy and inexpensive. But one stands out from the rest. It’s making strange noises, has mineral buildup, and needs to be drained and repaired before it causes a costly leak or explosion. That unit takes a lot longer to fix. As a result, your mean time to repair a thermostat will seem unusually high.
Furthermore, MTTR is not time-bound. It cannot account for peak or off-peak usage times, meaning that it cannot accurately show how busy or quiet periods affect repair times.
Because there are so many ways to interpret MTTR, success will only come when you have a clear definition of what it means within your organization. You’ll need to combine this with a well-trained team and the systems to manage the information. Find the right set of metrics that give you the complete picture you are looking for.
Why is MTTR helpful?
Assets under repair equal downtime. Regular system failures and lengthy downtime periods have a huge effect on productivity. That, in turn, has an even bigger impact on business results. This is especially the case for processes that are particularly sensitive to failure.
In a manufacturing environment, long mean time to repair leads to missed production deadlines, increased labor costs, loss of revenue, and various operational issues.
Understanding MTTR is an important tool for any organization because it tells you how well you are responding to issues with your assets. Most organizations work to shorten MTTR with an in-house maintenance team supported with the necessary resources, tools, spare parts, and CMMS software.
How to calculate MTTR?
To calculate MTTR, you need to add all the time you’ve spent on the repairs and divide it by the number of repairs you performed.
Imagine a pump that fails three times throughout a workday. The first repair lasted for 30 minutes, while the other two repairs lasted only 15 minutes. In this case:
MTTR = (30 + 15 + 15) / 3
MTTR = 60 / 3
MTTR = 20 minutes
The average time for performing repairs on that pump is 20 minutes.
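The same arithmetic can be sketched in a few lines of Python. The function name is our own choice for illustration; the repair times are the pump figures from the example above.

```python
def mean_time_to_repair(repair_minutes: list[float]) -> float:
    """Total repair time divided by the number of repairs."""
    if not repair_minutes:
        raise ValueError("at least one repair is required")
    return sum(repair_minutes) / len(repair_minutes)


# The pump from the example: one 30-minute repair and two 15-minute repairs.
pump_repairs = [30, 15, 15]
pump_mttr = mean_time_to_repair(pump_repairs)
print(pump_mttr)  # 20.0
```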
A special note about MTTR calculation —
Each failure will have a different severity level, so while some will require days to diagnose and repair, others could take minutes to fix. MTTR can give you an average of what to expect.
In Limble, you can get two views of MTTR. They can help draw different conclusions:
1.) MTTR for a specific asset, which is calculated based on how many tasks have caused downtime for this asset only.
2.) Combined MTTR for all assets, which is calculated based on how many tasks have caused downtime for all assets within a certain time frame.
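To make the difference between the two views concrete, here is a small Python sketch. It is not how Limble computes its reports; the asset names, repair times, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical downtime-causing tasks: (asset name, repair time in minutes).
downtime_tasks = [
    ("Pump A", 30),
    ("Pump A", 15),
    ("Pump A", 15),
    ("Conveyor B", 60),
    ("Conveyor B", 120),
]


def mttr_by_asset(tasks):
    """MTTR per asset: average repair time over that asset's tasks only."""
    minutes_by_asset = defaultdict(list)
    for asset, minutes in tasks:
        minutes_by_asset[asset].append(minutes)
    return {asset: sum(m) / len(m) for asset, m in minutes_by_asset.items()}


def combined_mttr(tasks):
    """Combined MTTR: average repair time over every task, whatever the asset."""
    tasks = list(tasks)
    if not tasks:
        raise ValueError("at least one task is required")
    return sum(minutes for _, minutes in tasks) / len(tasks)


print(mttr_by_asset(downtime_tasks))  # {'Pump A': 20.0, 'Conveyor B': 90.0}
print(combined_mttr(downtime_tasks))  # 48.0
```

Notice that the combined figure of 48 minutes describes neither asset well, which is why the two views can lead you to different conclusions.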
If you are only working with a small number of units, you may want to remove any data points that stand out, because they can skew your results.
For example, somebody could make a huge mistake during the repair process, like accidentally cutting through a wire or breaking a piece of the unit while fixing the original problem. That could turn a small repair into one that lasts a few days if you don’t have the part in stock or the know-how to fix it.
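Here is a hedged Python sketch of how one botched repair distorts the average. The repair times are invented, and the "three times the median" cutoff is an arbitrary assumption for illustration, not an industry rule. Pick a threshold that makes sense for your own data.

```python
from statistics import mean, median

# Hypothetical repair times in minutes; the last one is the botched repair.
repair_minutes = [25, 30, 20, 35, 30, 480]

typical = median(repair_minutes)   # the middle value, barely moved by one outlier
threshold = 3 * typical            # assumed cutoff, chosen only for this example

outliers = [t for t in repair_minutes if t > threshold]
trimmed = [t for t in repair_minutes if t <= threshold]

print(round(mean(repair_minutes), 1))  # 103.3 (skewed by the 480-minute repair)
print(typical)                         # 30.0
print(outliers)                        # [480]
print(mean(trimmed))                   # 28
```

Rather than silently deleting the outlier, you may prefer to report MTTR both ways and note why the long repair happened.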
What can MTTR tell you?
The MTTR figure itself is useful, but a lot more data analysis is required to get to a specific action. MTTR can tell you:
Repair vs. replace. MTTR is a particularly good tool to help you decide when it’s time to finally stop repairing an asset and replace it. When you notice it taking more and more time to repair and the costs keep climbing, you can use MTTR as one of the reports to help you make the case for investing in new equipment.
More training. Even the best-trained staff make mistakes. MTTR can highlight gaps in the training or skill of certain staff members or teams. Suppose you see an unusually high MTTR for a certain individual or group. In that case, you may want to look at their training more closely and consider a refresher.
Better processes. As with training, workflows and operating procedures can have a big impact on MTTR. These should be evaluated regularly, regardless of performance. Using your MTTR report, you can easily spot issues with assets.
When to use MTTR?
Use MTTR when you want to:
Measure and improve the average time your team takes to repair assets
Understand how much time you should be scheduling for repairs so that you aren’t putting too much pressure on teams
Help reduce downtime in areas of the factory or business that seem to be constantly on hold due to repairs
Pick out anomalies in incident management
Tactics to reduce MTTR
Every efficient maintenance system needs to look at how to reduce MTTR as much as possible. That can be done in a few different ways:
Use condition-monitoring sensors to track machine health and performance. While sensors should be utilized to prevent unexpected failures, sensor data can also speed up the diagnosing and troubleshooting process. Knowing your baseline and tracking deterioration signs can give your team more time to arrange for all the resources needed to complete the repair.
Implement CMMS software. Mobile CMMS solutions like Limble allow technicians quick access to maintenance history (logs, reports, notes from previous repairs…), which can speed up the repair process and shorten both planned and unplanned downtime.
Proper training. If you want the job done properly and in the shortest time possible, your technicians need to be qualified and know what they are doing. Limble allows you to track the productivity of each technician. If you find there are deficiencies, you can swoop in and train up the team members who need it, ensuring you give the best quality service every day.
Mean Time to Repair vs. Mean Time to Recovery
MTTR has A LOT of different meanings. The two most relevant for our discussion are “mean time to repair” and “mean time to recovery.”
Mean Time To Recovery measures the time from when the failure is first discovered until the equipment returns to operation. So, in addition to repair time, testing period, and return to normal operating condition, it captures failure notification time.
Although these terms are often used interchangeably, they need to be more clearly defined when it comes to Service Level Agreements (SLAs) and maintenance contracts so that all parties agree on exactly what they mean and what they are measuring.
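One way to see the difference is to compute both figures from the same incident log. The sketch below assumes you record three timestamps per incident; the class, field names, and sample incidents are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    detected_at: datetime        # failure first discovered
    repair_started_at: datetime  # technician starts working on it
    restored_at: datetime        # equipment back in operation


def mean_minutes(durations) -> float:
    """Average a collection of timedeltas and return the result in minutes."""
    durations = list(durations)
    if not durations:
        raise ValueError("at least one incident is required")
    return sum(d.total_seconds() for d in durations) / len(durations) / 60


incidents = [
    Incident(datetime(2024, 3, 4, 8, 0), datetime(2024, 3, 4, 8, 30), datetime(2024, 3, 4, 9, 0)),
    Incident(datetime(2024, 3, 6, 13, 0), datetime(2024, 3, 6, 13, 10), datetime(2024, 3, 6, 13, 40)),
]

# Repair clock: from the start of repairs until operations are restored.
mean_time_to_repair = mean_minutes(i.restored_at - i.repair_started_at for i in incidents)
# Recovery clock: from discovery of the failure until operations are restored.
mean_time_to_recovery = mean_minutes(i.restored_at - i.detected_at for i in incidents)

print(mean_time_to_repair)    # 30.0
print(mean_time_to_recovery)  # 50.0
```

The 20-minute gap between the two numbers is the notification and response time, which is exactly the part an SLA needs to spell out.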
What is MTBF (Mean Time Between Failures)?
Mean Time Between Failures measures the time it takes from one equipment failure to the next time it fails. This gives you a better idea of how long equipment can stay running over a given period between unplanned breakdowns. It’s a way for you to plan around the unexpected.
So, while MTTR impacts availability, MTBF measures availability and reliability. The higher the figure of the MTBF, the longer the system will likely run before failing.
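A commonly used relationship ties the two metrics together: availability is roughly MTBF divided by the sum of MTBF and MTTR. This guide does not spell that formula out, so treat the sketch below as an illustration; it uses the pump figures from the calculation examples in this article (an MTBF of 3 hours and an MTTR of 20 minutes), and the function name is our own.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability estimate: MTBF / (MTBF + MTTR)."""
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Pump example: MTBF of 3 hours, MTTR of 20 minutes (converted to hours).
pump_availability = availability(mtbf_hours=3.0, mttr_hours=20 / 60)
print(f"{pump_availability:.1%}")  # 90.0%
```

That matches the pump running nine hours out of an expected ten. Raising MTBF or lowering MTTR both push the figure up.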
Why is MTBF helpful?
Because equipment failure can be expensive and damaging to the organization, you need to be on top of unexpected breakdowns as much as possible. MTBF is an important indicator of expected performance. If the MTBF value is low, it means you are experiencing a significant number of breakdowns, which likely means there’s a deeper issue to uncover.
Manufacturers can use the mean time between failures as a quantifiable reliability metric during many product design and production stages. It is commonly used today in mechanical and electronic systems design, safe plant operations, product procurement, etc.
MTBF does not consider planned maintenance, but it can still be used to calculate the frequency of inspections for preventive replacements.
If it is known that an asset will likely run for a certain number of hours before the next failure, introducing preventive actions like lubrication or recalibration can help prevent that failure.
Essentially, it helps save you money, reduces downtime, and makes you look good at your job (and who doesn’t want that?).
How to calculate MTBF?
The equation for MTBF is simple. It is the sum of operating time divided by the number of failures.
Building on the example of the pump we mentioned under MTTR, out of the expected runtime of ten hours, it ran for nine hours. It was down for a total of one hour spread over three occasions. So:
MTBF = 9 hours / 3 failures
MTBF = 3 hours
On average, the pump fails after every 3 hours of operation.
Keep in mind this is a very simplified example. You will generally want a much bigger sample of information to work with to get a more accurate prediction. As you can see from the example above, we did not include the repair time in the calculation of MTBF.
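Here is the pump calculation as a short Python sketch. The function and variable names are ours, chosen for illustration.

```python
def mean_time_between_failures(operating_hours: float, failures: int) -> float:
    """Total operating time divided by the number of failures."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


expected_runtime_hours = 10.0
downtime_hours = 1.0  # repair time is excluded from operating time

pump_mtbf = mean_time_between_failures(expected_runtime_hours - downtime_hours, 3)
print(pump_mtbf)  # 3.0
```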
Other common factors can influence the MTBF of systems in the field. A big one is the fact that we have humans doing the work. For example, low MTBF could either indicate poor handling of the asset by its operators or a poorly executed repair job in the past.
When calculating MTBF, you won’t take predictive or preventive maintenance or routine parts replacements into account. Although predictive and preventative maintenance can sometimes cause brief outages, they are not breakdowns. You will simply calculate the time from the end of the last breakdown, when the machine was back in service, to the following breakdown.
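If your maintenance history stores timestamps, that rule can be sketched as follows. The event log and its layout are hypothetical. One simplifying assumption to be aware of: planned outages are simply ignored here, so their duration is not subtracted from the time between breakdowns.

```python
from datetime import datetime

# Hypothetical event log: (went down at, back in service at, was it planned?)
events = [
    (datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 9, 0), False),
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 10, 0), True),   # planned PM
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 11, 0), False),
    (datetime(2024, 1, 8, 11, 0), datetime(2024, 1, 8, 12, 0), False),
]


def mtbf_hours(event_log) -> float:
    """Average time from the end of one breakdown to the start of the next."""
    # Keep only unplanned breakdowns, ordered by the time they started.
    breakdowns = sorted((e for e in event_log if not e[2]), key=lambda e: e[0])
    if len(breakdowns) < 2:
        raise ValueError("need at least two breakdowns to measure a gap")
    gaps = [
        (nxt[0] - prev[1]).total_seconds() / 3600
        for prev, nxt in zip(breakdowns, breakdowns[1:])
    ]
    return sum(gaps) / len(gaps)


print(mtbf_hours(events))  # 84.0
```

The two gaps are 96 and 72 hours, so the average is 84 hours between breakdowns.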
You need to come in with the expectation that failure will happen at some point.
What can MTBF tell you?
Because of where MTBF falls in the process, it is often coupled with other maintenance strategies. MTBF can help inform your decisions by telling you:
Cost of breakdowns. Pairing MTBF with MTTR and failure codes can help you avoid expensive breakdowns by planning ahead based on the data at hand. This can have a big impact on the bottom line.
Frequency of failures. MTBF measures how frequently you can expect a failure to happen. The higher the figure of the MTBF, the longer the system will likely run before failing.
When to use MTBF?
Use it to start conquering downtime. This is by far the most important application of MTBF. Using MTBF, you will also be able to predict, prevent and prevail over the bulk of your unplanned breakdowns. You’ll be able to use it for:
Planning your maintenance schedule
Gauging PM performance
Checking the quality of the information you have in your system and how it is being used
Tactics to increase MTBF
There are small things you can do to increase the time between failures. Some of them are:
Do more proactive maintenance work. Assets that are well maintained are less likely to have critical failures. By using Limble as your CMMS, you can create monthly maintenance schedules in minutes.
Use quality replacement parts. The cheapest part is not always the best long-term. Make sure that you are using quality, proven parts in your work. It will save you a ton in the long run.
Use recommended input material. Whether it is the size of the chicken in the poultry processing system or the thickness of the foil used for product packaging, every machine is designed to work within specific parameters. Respect those parameters.
Ensure proper working conditions. Don’t push machines beyond their limits to make your productivity numbers look good. Misusing machines is a surefire way to decrease their useful life and MTBF.
Have a solid onboarding program for machine operators. Assets should be used the way they were designed to be used. Improper handling is bound to shorten MTBF. Limble allows the user to log work orders and add detail about how the unit was being used at the time of failure, so you’ll be able to monitor asset usage.
Understand the kinks of old equipment and aging assets. Whenever possible, maintenance technicians should give machine operators tips on which actions they should avoid doing with old assets to manage avoidable recurring issues. Limble keeps a complete maintenance history, asset log, and maintenance notes. This makes it easy for anyone on your team to step in and save the day, even if the previous technician is no longer with your organization.
What is MTTF (Mean Time To Failure)?
Mean Time To Failure is a very basic measure of reliability used for non-repairable systems. It represents the length of time that an item is expected to last in operation until it needs to be replaced.
MTTF can be used to represent the lifetime of a product or device. Its value is calculated by looking at a large number of the same kind of items over an extended period and tracking how long they last.
In the manufacturing industry, MTTF is one of the many metrics commonly used to evaluate the reliability of manufactured products. However, there is still a lot of confusion in differentiating between MTTF and MTBF because they are both somewhat similar in definition. This is resolved by remembering that MTBF is used when referring to repairable items, while MTTF is used for non-repairable items.
Why is MTTF helpful?
MTTF is important because it helps estimate the lifespan of products that are not repairable. Some common examples of these products range from items like fan belts in automobiles to light bulbs in our homes and offices.
MTTF is particularly useful as a reliability metric. Engineers can use it to estimate how long a component would last as part of a larger piece of equipment. This is especially true where the entire business process is sensitive to the failure of the equipment in question. Shorter MTTF means more frequent downtime as the failing items need to be replaced.
How to calculate MTTF?
MTTF is calculated as the total time of operation, divided by the total number of units being tracked.
Let’s assume we tested 3 desktop hard drives. The first one failed after 500,000 hours, the second one failed after 600,000 hours, and the third hard drive failed after 700,000 hours in use. MTTF in this instance would be:
MTTF = (500,000 + 600,000 + 700,000) / 3 units
MTTF = 1,800,000 / 3
MTTF = 600,000 hours
We can now estimate that this particular type and model of hard drive will fail, on average, after 600,000 hours of use.
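In Python, the hard drive example looks like this. It is a minimal sketch with a function name of our own choosing.

```python
def mean_time_to_failure(hours_to_failure: list[float]) -> float:
    """Total hours of operation divided by the number of units tracked."""
    if not hours_to_failure:
        raise ValueError("at least one unit is required")
    return sum(hours_to_failure) / len(hours_to_failure)


# The three hard drives from the example above.
drive_lifetimes = [500_000, 600_000, 700_000]
drive_mttf = mean_time_to_failure(drive_lifetimes)
print(drive_mttf)  # 600000.0
```

With only three units the estimate is rough. As noted earlier, MTTF is meant to be calculated from a large number of identical items.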
What can MTTF tell you?
Mean Time to Failure can be used for:
Inventory lead time. Understanding your MTTF can help you plan for replacement equipment, making sure that you are never stuck waiting for new equipment to come in.
Quality control. MTTF that gets shorter and shorter can be an indicator of quality issues from your suppliers. Use this information to have conversations or know when to look for new suppliers.
Misuse of equipment. As with quality control, a change to your MTTF could be an indication that users are not using the equipment correctly, meaning that there is a need for better training. It could also indicate an increase in general usage or an upstream problem that negatively affects this particular part. Either way, these are important pieces of information to note.
When to use MTTF?
Suppose you are looking at investing in new equipment that will replace your current equipment. It’s important to know how long it is expected to last. This will be a core component when you are putting your budget together.
MTTF will inform a lot of the spending in both your capex and opex budgets. How often will you be replacing things, and at what cost?
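For a back-of-the-envelope budget figure, you can divide the total hours your units run each year by their MTTF. The sketch below is a rough steady-state approximation, and every number in it (unit count, usage hours, MTTF, and price) is invented for illustration.

```python
def expected_replacements_per_year(
    units: int, hours_per_unit_per_year: float, mttf_hours: float
) -> float:
    """Rough estimate: total yearly operating hours divided by MTTF."""
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return units * hours_per_unit_per_year / mttf_hours


# Hypothetical: 200 light bulbs, each on 4,000 hours a year, MTTF of 10,000 hours.
replacements = expected_replacements_per_year(200, 4_000, 10_000)
annual_cost = replacements * 5.00  # assumed price of 5.00 per bulb

print(replacements)  # 80.0
print(annual_cost)   # 400.0
```

Real failures cluster and drift over an asset’s life, so use a figure like this as a starting point for the conversation rather than a forecast.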
You can sometimes find information about MTTF from the OEM. While we are always optimistic that they are correct, they can sometimes be misleading. Reach out to other maintenance managers to get their input if you can. Always run your own MTTF reporting to keep track of your asset lifespans.
Tactics to increase MTTF
Use the best quality items you can find: ones that are made from more durable materials and have gone through a thorough quality control process. Limble lets you keep track of your preferred parts and vendors in each asset record so you can always source the right parts and service.
Take action to improve the lifespan of the asset. Make sure that the devices are used for their intended purposes and in the conditions (humidity, heat, pressure, voltage, etc.) they are designed for. You can store the equipment operating and maintenance manual in Limble to double-check if the device was properly installed/retrofitted before it is used.
Use your time, budget, and resources wisely. Because you are not planning to fix anything, scheduling maintenance is not an option here. You can maybe wipe off the dust or run high-level diagnostics to estimate remaining useful life. Still, you are not going to perform preventative maintenance on a computer hard drive or a lightbulb. You are just going to replace them when they stop working properly.
Other noteworthy maintenance metrics related to failure
There are at least 10 different metrics, if not more, with overlap between many of them. This article has covered the three most popular. Still, there are a few others we’d like to introduce you to so that you can make an informed decision about the metric set that’s right for you:
MDT (Mean Down Time) is what Finance will think about first. How long is something out of action? This is what they care about.
MTTD (Mean Time to Detect) is helpful when monitoring how long it takes to detect and report an issue. This can be helpful when you have a situation where there are multiple assets dependent on the functionality of one another. Using tracking devices, you can get your reporting down to the shortest possible time and keep the system running smoothly.
MTTI (Mean Time to Identify) is a metric that focuses on reducing the length of time it takes your team to identify the issue so they can fix it.
MTTA (Mean Time to Acknowledge) is a great key performance indicator. It helps you track your team’s incident response time and see how they are reacting to the workload over a period of time. If your team is overloaded, their time to acknowledge will be much slower.
Luckily, Limble allows the people responsible for assigning work to see this in real-time. They can quickly and easily reshuffle work order assignments to help alleviate overload and keep your team as productive as possible.
Tracking maintenance metrics with Limble CMMS
Tracking maintenance metrics sounds like a lot of work, and it can be if it’s not automated.
Our system handles this both at the global level for all assets and for each individual asset.
As long as the customer updates each task with all the data, these maintenance metrics will be tracked automatically.
When it comes to maintenance metrics and your KPIs, Limble lets you build out custom dashboards for every report you need. Your team does the work; Limble does the rest.
An example of Limble’s custom dashboard
When you use Limble’s Custom Dashboards, you’ll be able to:
Create your own KPIs.
See your critical maintenance metrics like MTTR, MTBF, MTTF, and more.
View a live custom dashboard that your team and the rest of the company can see.
Be able to see how much an asset is costing you at any time and why.
Know exactly where your budget is being spent at all times. (Imagine how this could support your relationship with Finance, especially when you need to talk about replacing assets!)
Using Limble, your maintenance team can access all the information about an asset’s history and past repairs to quickly get to the root cause of the problem and cut down on the amount of troubleshooting it takes to find the solution.
Limble is mobile-friendly, making it easy for you to access important information on your mobile device at all times. No more running between your desktop and the job to get the information you need, making your jobs less frustrating to execute and speeding up your responsiveness.
At Limble, we know that it’s up to you to make sure the equipment is available, in perfect working order. You are the hero of uptime, as well as plant operations and safety.
When you have a handle on the failure metrics and calculations to get to them, you can build better asset management strategies and improve your overall maintenance processes.
Let Limble CMMS take the guesswork out of metrics measurement and automate the process for you. Reach out to us via email, and we’ll get in touch to see how we can make your life easier by placing your maintenance management in the palm of your hand.