Reliability Centered Maintenance

Everything you ever needed to know about reliability centered maintenance.

What is reliability centered maintenance?

Reliability-centered maintenance (RCM) provides a logical and structured framework to identify critical organizational assets, understand the effects of their failure, and select the most cost-effective maintenance methods to minimize those effects. 

RCM focuses on preventing system failures rather than preserving the equipment. 

It provides a structured process for developing a maintenance program that will provide an acceptable level of asset operation while assuming an acceptable level of risk. 

Developing the maintenance program depends on understanding system functions in depth, analyzing the consequences of their failure, and identifying the modes by which they fail. Such knowledge enables you to select the best-fit maintenance strategy to prevent or predict each failure.

While the aerospace, nuclear, and defense industries have long used RCM, other asset-intensive industries such as oil and gas, rail, automotive, and food manufacturing are looking into RCM to reduce maintenance costs and improve overall equipment effectiveness (OEE).

The main benefits of reliability centered maintenance

Industries using reliability centered maintenance report safety, reliability, cost, scheduling, and efficiency improvements across the board. Here’s why.

Safety is naturally prioritized

RCM uses a criticality metric to focus on possible failure modes and their effects on the business. This focus implicitly prioritizes maintenance to reduce system unavailability, preventing hidden failures and failures that affect operating safety.

Focus on continuous improvement increases reliability

RCM focuses on incremental asset improvement by constantly analyzing in-service experience to update equipment specifications and adjust the maintenance program for increased effectiveness. The constant improvement loop reduces equipment failure and increases equipment availability.

Optimization brings cost reductions

As failures decrease, repair costs fall, and newer strategies such as condition monitoring replace older preventive maintenance tasks. Lower repair and total maintenance costs are the primary benefit, while a secondary benefit is often reflected in lower energy costs.

Targeted interventions lead to efficiency improvements

Reliability centered maintenance emphasizes cost-effectiveness by considering equipment’s criticality and applying a proportionate maintenance strategy. The process ensures that you only perform the necessary maintenance, increasing the effectiveness of your team and the utilization of your maintenance resources.

While RCM reduces long-term costs, you should prepare for maintenance costs to increase in the first 12 to 24 months due to increased training, equipment, and implementation requirements. From there, cost recovery occurs rapidly.

NASA provides an interesting case study of the gains it experienced from implementing RCM in 1996. Between 1996 and 2000, NASA estimated its cost avoidance from RCM to be $33,643,000, providing a return on investment (ROI) of 2.2. NASA reports this ROI is consistent with that experienced in similar RCM implementations by British Petroleum and the Electric Power Research Institute (EPRI).

Outlining a standard reliability centered maintenance process

A typical RCM process comprises seven steps, which can be segmented into three distinct phases: Decision, Analysis, and Act.

Phase 1 – Decision

The first phase is where you assemble a team, select the equipment for analysis, and identify its functionality. Team selection is an important step, requiring a cross-functional representation from key departments like maintenance, engineering, management, and subject matter experts.

Select assets that are critical to production/operation. An asset is particularly suitable if the cost of repair and maintenance is causing discussions on its replacement.

Identifying functionality should begin with the asset’s designed or required operating functionality rather than in-service performance. Be specific and note temperatures, pressures, speeds, and acceptable tolerances.

Phase 2 – Analysis

During the analysis phase, you identify ways the asset may fail to perform its intended function, evaluate the failure’s effects, and identify root causes.

Evaluation of the failure should focus on what occurs when the failure happens, its effect on production, and observed events. You’ll need to document all impacts on safety, operations, and the environment, along with the subsequent economic impacts. In practice, this means performing a failure mode, effects, and criticality analysis (FMECA). FMECA is a structured process to identify how equipment may fail and rank the criticality of each failure by severity and likelihood.

Establishing the cause of failure in an RCM process is usually based on the FMECA and a root cause analysis (RCA) process.

Phase 3 – Act

The final phase consists of two main steps. 

Using data-driven decision-making, you need to devise a maintenance strategy (or group of strategies) appropriate to the criticality ranking that you believe will provide the required performance at minimum cost. 

Having taken action, you’ll need to provide ongoing monitoring of in-service experience to compare with previous maintenance data and gauge the effectiveness of the implemented changes.

How to conduct RCM analysis for a specific asset

While reliability-centered maintenance has become a catchphrase in maintenance circles, it is not an abstract concept. Nor does it simply mean using specific maintenance approaches such as predictive maintenance (PdM) or condition-based maintenance (CBM). 

A maintenance process must meet clearly defined criteria to align and comply with the RCM philosophy. The SAE JA1011 standard (JA1011_199908) establishes the seven questions a process must answer to be considered reliability-centered maintenance compliant.

Let’s go through those questions with an example of a centrifugal pump.

1. What is the item supposed to do and the associated performance standards?

The objective of reliability centered maintenance is to preserve system function. Here, the team must define a complete list of all the functions the item performs. One way of getting started is to brainstorm the system’s outputs and place those outputs into a function statement beginning with a verb. Within that statement, be sure to note any applicable performance standards.

Example: If a multi-stage centrifugal boiler water feed pump was the selected piece of equipment, our list of functions might include:

  • To maintain a water flow rate between 215 and 269 US gallons per minute
  • To maintain a discharge pressure of 25.12 bar within a tolerance of +17% and -10%

2. In what ways can it fail to provide the required functions?

We define functional failure as the failure of the asset to fulfill one or more of its intended functions to the performance standard required and defined by the user. While the asset may still operate, it is classed as a failure if it operates outside the defined thresholds of acceptable operation or carries out an additional unintended function (such as a hydraulic ram hunting before finally latching in its retracted position).

It’s important to understand that one function may have multiple failure modes, creating different effects from different causes. We must also be careful not to apply generic statements to assets of identical type as their operating context will vary, with different user requirements and failure effects.

Example: For our centrifugal pump, the team may include the following failures related to the discharge pressure function:

  • Unable to discharge water
  • Discharge pressure drops below 22.61 bar
  • Discharge pressure exceeds 29.4 bar
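
The thresholds in this example follow directly from the performance standard in step 1. As a minimal sketch (the class and method names below are hypothetical, and treating a zero reading as "unable to discharge water" is an assumption), the limits and the resulting functional failure could be derived like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PressureStandard:
    """Performance standard for the discharge pressure function."""
    nominal_bar: float
    upper_tolerance: float  # fraction above nominal, e.g. 0.17 for +17%
    lower_tolerance: float  # fraction below nominal, e.g. 0.10 for -10%

    @property
    def low_limit(self) -> float:
        return self.nominal_bar * (1 - self.lower_tolerance)

    @property
    def high_limit(self) -> float:
        return self.nominal_bar * (1 + self.upper_tolerance)

    def classify(self, measured_bar: float) -> str:
        """Return which functional failure, if any, a reading represents."""
        if measured_bar <= 0:
            # Assumption: no measurable pressure means no discharge at all.
            return "unable to discharge water"
        if measured_bar < self.low_limit:
            return "discharge pressure too low"
        if measured_bar > self.high_limit:
            return "discharge pressure too high"
        return "within standard"


standard = PressureStandard(nominal_bar=25.12, upper_tolerance=0.17, lower_tolerance=0.10)
print(round(standard.low_limit, 2), round(standard.high_limit, 2))
print(standard.classify(21.9))
```

Deriving the limits from the nominal value and tolerances, rather than typing them in separately, keeps the failure definitions consistent with the function statement if the standard is ever revised.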

3. What are the events that cause each failure?

To carry out this step successfully, each failure mode should be identified at a level granular enough that a maintenance action can prevent or diminish the potential failure.

You may need to drill down through several levels to reach the core reason for failure. However, SAE JA1012 is clear on the need to avoid paralysis by analysis. You’ve gone too far if you reach a level beyond your organization’s ability to influence through a failure management policy.

Example: Let’s take the “discharge pressure drops below 22.61 bar” failure. The list of possible reasons for low discharge pressure might include:

  • Input water excessively hot
  • Impeller damaged
  • Impeller is loose on its shaft
  • Internal leakage

4. What happens when each failure occurs?

To fully understand what happens when a failure mode occurs requires the team to ask wide-ranging and open-ended questions such as:

  • What will be visually evident when the failure occurs?
  • Will you be able to detect the failure via smell or sounds?
  • What impact does the failure have on the environment?
  • What impact will the failure have on safety?
  • What physical change will occur to the equipment?
  • What changes may the failure induce in adjacent equipment?
  • What alarms or notifications will occur?

To avoid going too broad, it helps to focus the questions at three different levels:

  1. What is the local effect on the individual component?
  2. What is the effect at the sub-system level?
  3. What effect will occur on the system?

Example: For our centrifugal pump, we identified impeller damage as one possible cause of failure. Below is a breakdown of the effects that failure might have at each level:

  1. Local effects of impeller damage: Pump vibration, a drop in pumping efficiency, a reduction in suction power…
  2. Sub-system effects of impeller damage: Boiler low, boiler trip…
  3. Overall system effects of impeller damage: Reduced efficiency of the steam system, system trip…  
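
One way to keep steps 1 to 4 organized is to record them as a nested worksheet: function, functional failure, failure mode, and effects at each level. The sketch below is only an illustration; the class names and the completeness check are hypothetical and not part of any standard or CMMS API. It uses the pump data from the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """One cause of a functional failure, with effects at three levels."""
    cause: str
    local_effects: list[str] = field(default_factory=list)
    subsystem_effects: list[str] = field(default_factory=list)
    system_effects: list[str] = field(default_factory=list)

    def has_all_effect_levels(self) -> bool:
        return bool(self.local_effects and self.subsystem_effects and self.system_effects)


@dataclass
class FunctionalFailure:
    description: str
    modes: list[FailureMode] = field(default_factory=list)


@dataclass
class AssetFunction:
    statement: str
    failures: list[FunctionalFailure] = field(default_factory=list)

    def modes_missing_effects(self) -> list[str]:
        """List causes whose effects are not yet recorded at every level."""
        return [
            mode.cause
            for failure in self.failures
            for mode in failure.modes
            if not mode.has_all_effect_levels()
        ]


low_pressure = FunctionalFailure(
    description="Discharge pressure drops below 22.61 bar",
    modes=[
        FailureMode("Input water excessively hot"),
        FailureMode(
            "Impeller damaged",
            local_effects=["pump vibration", "drop in pumping efficiency",
                           "reduction in suction power"],
            subsystem_effects=["boiler low", "boiler trip"],
            system_effects=["reduced efficiency of the steam system", "system trip"],
        ),
        FailureMode("Impeller is loose on its shaft"),
        FailureMode("Internal leakage"),
    ],
)
pressure_function = AssetFunction(
    statement="To maintain a discharge pressure of 25.12 bar (+17% / -10%)",
    failures=[low_pressure],
)

# Causes that the team still needs to analyze in step 4.
print(pressure_function.modes_missing_effects())
```

A check like `modes_missing_effects` gives the team a simple to-do list, so no failure mode moves on to consequence ranking before its effects have been described.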

5. In what way does each failure matter?

At the beginning of this process, we carefully chose an asset that was critical to production. Each failure will have consequences, with the RCM process categorizing these into four groups:

  • Hidden failure: No direct impact but may cause multiple serious failures.
  • Safety and environmental: It may kill or hurt someone or breach environmental standards.
  • Operational: It will affect output, quality, customer service, or operating costs.
  • Non-operational: No impact on safety or production, “just” financial costs from repair.

Each consequence will have a criticality ranking applied from an agreed severity table, forming the basis for subsequent maintenance decisions. This ranking keeps your maintenance activities focused on preventing operational, safety, and environmental issues. 

Evaluating and ranking consequences removes our automatic assumption that all failures are bad and we must spend money to prevent them. It allows us to focus maintenance effort and resources on high criticality issues. 

A criticality table may look like this:

Read our in-depth guide on how to perform a criticality assessment to learn more.  

Example: Let’s see what the criticality ranking for our centrifugal pump might look like.
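
As a rough illustration of how such a ranking could be computed, the sketch below scores each failure cause by severity and likelihood, as FMECA does. The 1-to-5 scales, the band thresholds, and the scores assigned to each cause are illustrative assumptions, not values from an agreed severity table.

```python
def criticality(severity: int, likelihood: int) -> tuple[int, str]:
    """Score a failure on assumed 1-5 scales and place it in a band."""
    for name, value in (("severity", severity), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    score = severity * likelihood
    # Assumed thresholds; a real team would agree on its own bands.
    if score >= 15:
        band = "high"
    elif score >= 8:
        band = "medium"
    else:
        band = "low"
    return score, band


# Hypothetical (severity, likelihood) scores for the pump's failure causes.
causes = {
    "Impeller is loose on its shaft": (5, 3),
    "Impeller damaged": (4, 3),
    "Internal leakage": (3, 2),
    "Input water excessively hot": (2, 2),
}

ranked = sorted(
    ((cause, *criticality(sev, lik)) for cause, (sev, lik) in causes.items()),
    key=lambda row: row[1],
    reverse=True,
)
for cause, score, band in ranked:
    print(f"{score:>2}  {band:<6}  {cause}")
```

Multiplying severity by likelihood is only one common scoring convention. Some teams use a lookup matrix instead, so that a rare but catastrophic failure can never fall into a low band.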

6. What task can be performed to proactively prevent or diminish the consequences of the failure?

RCM emphasizes failure management rather than prevention. It divides failure management techniques into proactive tasks and default actions. 

Proactive tasks are actions taken before failure, including condition monitoring, on-condition maintenance, time-life maintenance, and predictive maintenance strategies. However, RCM requires each proactive maintenance task to be both technically feasible and worth doing. If it doesn’t meet these criteria, you should choose a suitable default action (explained in the next section).

A task is technically feasible if it can significantly reduce or eliminate the risk of failure. For safety risks and hidden failures, that risk reduction is the only test. For operational and cost risks, the task is worth doing only if its cost over time is less than the cost of the operational consequences and/or repairs over the same period.

Example: In the previous step, we concluded that the centrifugal pump failure caused by the impeller being loose on the shaft has a high criticality. It has to be spotted and repaired as soon as possible. 

Therefore, monitoring flows or vibrations as part of a CBM or predictive maintenance program may be a suitable solution. In both scenarios, you would be able to identify a developing trend, allowing you to schedule maintenance or repair during planned shutdowns. 
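
The worth-doing test for operational and cost risks is a straightforward cost comparison over the same period. The sketch below shows one way to express it; the field names and all figures are hypothetical, and the model assumes every failure costs the same amount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskEconomics:
    """Annual cost comparison for one proactive task (illustrative model)."""
    task_cost: float                       # cost of one monitoring round
    tasks_per_year: float
    failures_per_year_without_task: float
    failures_per_year_with_task: float
    cost_per_failure: float                # operational consequences plus repair

    def annual_cost_with_task(self) -> float:
        return (self.task_cost * self.tasks_per_year
                + self.failures_per_year_with_task * self.cost_per_failure)

    def annual_cost_without_task(self) -> float:
        return self.failures_per_year_without_task * self.cost_per_failure

    def is_worth_doing(self) -> bool:
        return self.annual_cost_with_task() < self.annual_cost_without_task()


# Hypothetical monthly vibration check that catches most developing failures.
vibration_check = TaskEconomics(
    task_cost=150.0,
    tasks_per_year=12,
    failures_per_year_without_task=0.5,
    failures_per_year_with_task=0.125,
    cost_per_failure=40_000.0,
)
print(vibration_check.is_worth_doing())

# Hypothetical sensor package that gives no usable warning of the failure.
late_warning_sensors = TaskEconomics(
    task_cost=500.0,
    tasks_per_year=12,
    failures_per_year_without_task=0.5,
    failures_per_year_with_task=0.5,
    cost_per_failure=40_000.0,
)
print(late_warning_sensors.is_worth_doing())
```

This comparison applies only to operational and non-operational consequences. For safety risks and hidden failures, the deciding factor is whether the task reduces risk to an acceptable level, not its cost.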

7. What actions should occur if you cannot find a suitable preventive task?

We use default actions when it is impossible to identify a proactive task that will be effective enough. Default actions focus on managing the failure and include:

  • Equipment redesign
  • Equipment modification
  • Failure-finding initiatives 
  • Run-to-failure maintenance strategy

Example: The cost may be prohibitive to implement a proactive task to identify the centrifugal pump impeller being loose on the shaft. 

Hypothetically, let’s assume that once the impeller shifts, the damage is already done, and adding sensors and monitoring systems simply adds to the repair costs. We may choose to use a default action to manage the failure as it occurs.

We might implement a run-to-failure strategy, supported by tactical inventory management through kitting and a minimum stock holding of critical spares to minimize downtime when the failure occurs. Parallel to this, we could work with the equipment OEM to remove the failure mode through component modification or redesign.
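
The choice among default actions usually depends on the consequence category from step 5. The sketch below is one simplified mapping, offered as an assumption rather than the rule set of SAE JA1011 or of this guide; the function and enum names are hypothetical.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "safety and environmental"
    OPERATIONAL = "operational"
    NON_OPERATIONAL = "non-operational"


def choose_default_action(consequence: Consequence,
                          failure_finding_feasible: bool = False) -> str:
    """Pick a default action when no proactive task is suitable.

    Simplified assumption: hidden failures get a failure-finding task where
    one is feasible, safety and environmental risks force a design change,
    and purely economic failures are allowed to run to failure.
    """
    if consequence is Consequence.HIDDEN:
        if failure_finding_feasible:
            return "failure-finding task"
        return "redesign or modification"
    if consequence is Consequence.SAFETY_ENVIRONMENTAL:
        return "redesign or modification"
    return "run-to-failure, with redesign as an option if costs stay high"


print(choose_default_action(Consequence.OPERATIONAL))
print(choose_default_action(Consequence.HIDDEN, failure_finding_feasible=True))
```

In the pump example, the loose impeller is treated as an operational failure, which is why run-to-failure supported by spares, with a parallel redesign effort, is a defensible choice.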

Requirements for implementing an RCM program

Before implementing reliability centered maintenance in your organization, it pays to check your expectations and confirm that you have sufficient maintenance maturity, skills, time, and budget to execute the process. Over 60% of all RCM projects fail due to one of those factors.

Organizational maintenance maturity

Review your maintenance program and processes. If you have good in-service experience with running a preventive maintenance program and historical data to back it up, you are a good candidate for considering RCM. 

If your maintenance system is unstable or non-existent, will you have the necessary insight into your equipment to enable meaningful analysis?

In-house skills

Many organizations will hire consultants to help with RCM training and implementation. However, your maintenance and engineering staff still needs to have a good understanding of your equipment and operation. If they don’t, the process will suffer. 

Resource availability

The RCM process will take time and tie up key staff from multiple departments. Limit the number of assets for the first round, and don’t expect rapid results. Staff will need to focus on regular RCM initiatives over a long period; who will cover for them in their absence?

Inadequate resourcing is another key reason companies terminate RCM initiatives early, citing unexpectedly high resource requirements and project duration.

Realistic budget expectations

Be prepared for your maintenance costs to increase. NASA calls this the Implementation Bow-wave.

Figure: Effects of implementing RCM on maintenance and repair costs. Source: NASA RCM guide

You’ll incur extra expenses for training, technological tools, and equipment condition baselines. As more sophisticated testing occurs, you’ll find more faults, and your repair costs will increase temporarily.

In NASA’s case, this normalized after two to three years, after which savings began to flow. The costs continued to drop for the next five years.


Common mistakes and obstacles to avoid

While the RCM process is clearly defined, some traps regarding implementation can increase the likelihood of failure. 

Selecting too many assets

We tend to get so excited about the benefits of reliability centered maintenance that we want to analyze all of our assets. Unfortunately, this asset overload increases the project’s chances of ending prematurely through cost overruns and resource drain. 

RCM is a process for identifying the most cost-effective activities to manage system failure. By identifying the most critical asset in your organization with substantial maintenance costs, you can target the systems or subsystems likely to return the most benefit. As the benefits accrue, you can move to another critical asset. 

Overloading and distracting your RCM team is not a recipe for success.

Not baselining equipment performance

RCM is a change initiative funded with an expectation of a return on the initial investment. You should be able to draw a straight line from pre-RCM to post-RCM performance by looking at financial, quality, safety, or environmental metrics. 

It is impossible to compare improvement action results without establishing a baseline of system performance.
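
A baseline can be as simple as a few metrics captured for the same asset over comparable periods. The sketch below uses mean time between failures (MTBF) and availability, two widely used reliability measures; the record structure and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodRecord:
    """Performance of one asset over one reporting period."""
    operating_hours: float
    downtime_hours: float
    failures: int
    maintenance_cost: float

    @property
    def mtbf_hours(self) -> float:
        # Mean time between failures: operating time divided by failure count.
        if self.failures == 0:
            return float("inf")
        return self.operating_hours / self.failures

    @property
    def availability(self) -> float:
        total = self.operating_hours + self.downtime_hours
        return self.operating_hours / total


def compare(baseline: PeriodRecord, current: PeriodRecord) -> dict[str, float]:
    """Return the change in each metric from the baseline period."""
    return {
        "mtbf_change_hours": current.mtbf_hours - baseline.mtbf_hours,
        "availability_change": current.availability - baseline.availability,
        "cost_change": current.maintenance_cost - baseline.maintenance_cost,
    }


# Hypothetical figures for the year before and the year after the RCM changes.
before = PeriodRecord(operating_hours=8000, downtime_hours=400,
                      failures=8, maintenance_cost=120_000)
after = PeriodRecord(operating_hours=8200, downtime_hours=200,
                     failures=4, maintenance_cost=95_000)

for metric, change in compare(before, after).items():
    print(f"{metric}: {change:+.3f}")
```

Whichever metrics you choose, record them before the first RCM change is made and keep the definitions fixed, so that later comparisons measure the program rather than a change in how the numbers are counted.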

Lack of employee involvement

Implementing reliability centered maintenance is a fundamental change in how you do business. Before beginning the process:

  • Communicate with and educate your employees.
  • For those not directly involved, help them understand the reasons and objectives for the initiative.
  • For the team carrying out the process, invest in training and support to assist them in staying on track through a process that can be distracting.

Not sponsored by senior management

It may be tempting to announce an RCM initiative, task a middle manager with implementation, and wait for the results. However, you won’t be able to pull off this project without support from the top. 

Failure to have a senior-level sponsor actively involved in the day-to-day RCM process risks a lack of buy-in from staff and suppliers and a gradual loss of team direction, influence, and budget.

Use Limble CMMS to streamline your RCM processes

Efficient RCM implementation requires considerable data from in-service experience, manufacturers’ technical specifications, maintenance logs, and breakdowns. A modern CMMS like Limble provides the data integrity and performance baseline needed to make accurate comparisons before and after an RCM initiative. 

A CMMS provides input to each stage of the RCM process: 

  1. It supports asset selection during the Decision phase.
  2. It helps you identify failure effects and causes in the Analysis phase.
  3. It helps you track cost, safety, and performance metrics in the Act phase.

All of that is really just the icing on the cake. A cloud-based CMMS’s real value is its ability to vastly simplify and automate maintenance planning and the implementation and execution of any type of maintenance. 

So, whichever maintenance actions you decide to apply after RCM, you can be sure that a CMMS will be there to help you schedule them, allocate the necessary resources, execute them on time, and track the cost and effectiveness of your entire asset management process.

Schedule a demo or start a free trial to see Limble in action. You have nothing to lose and a lot to gain.
