Root Cause Analysis
Everything you ever needed to know about Root Cause Analysis.
What is Root Cause Analysis?
Root cause analysis is the process of finding the underlying cause for an effect we observe or experience. In the context of equipment failure analysis, RCA is used to find the root cause of frequent machine malfunctions or significant machine breakdowns.
The goal is to determine:
- What happened
- Why it happened
- How to prevent it from happening again
RCA is a reactive process, meaning it is performed after the event occurs. But once a root cause analysis is done, its findings become a proactive tool for anticipating and preventing similar problems before they occur.
If you fix a symptom of the problem, but you don’t fix the actual cause of the problem, there’s a high chance the failure will happen again. For example, suppose you replace a broken belt but don’t correct the misaligned part causing the belt to overheat and break in the first place. In that case, you could bet your paycheck that the belt is going to fail again. RCA follows the chain of causes and effects to pinpoint the problem that will make all the other faults disappear when finally eliminated.
The RCA process and outcomes
Conducting root cause analysis can be very complicated. It involves a vast amount of data collection and review. The result of a root cause analysis isn’t always black and white. It can’t always tell you if the problem you identified is the true root of the issue. You will often get only a strong correlation between cause and effect and not the exact cause. From there, you’ll have to use your experience and professional knowledge to judge whether to investigate further or not.
RCA is a craft that requires specialized knowledge and in-the-field experience, which means you’re likely the best person for the job here. Without that expertise, any fixes implemented will likely be just a cosmetic solution to the problem. In the worst-case scenario, the changes made could actually make the situation worse.
Despite these limitations, RCA is still a powerful tool for understanding and improving the fundamental nature of systems and procedures.
Different types of RCA
RCA comes in different forms depending on the problem you’re trying to solve. Here’s what they look like:
- Safety-based RCA comes from the field of occupational safety and health, as well as accident analysis. This type of root cause analysis is used to determine why an accident happened at work (e.g., why someone cut themselves or why a worker at height accidentally dropped a part).
- Production-based RCA is used in the field of manufacturing to ensure quality control. You might use this to find out why the injection-molded plastic parts are coming off the line warped.
- Process-based RCA is used in business and manufacturing to determine the fault in a process or a system. This might be used in accounting to determine why vendors aren’t getting paid on time.
- Failure-based RCA is used in engineering and maintenance to determine the root cause of any type of equipment failure.
- Systems-based RCA originated as a combination of some of the root cause analysis techniques listed above. It combines two or more methods of RCA and can be used in a wide variety of fields and applications.
Examples of Root Cause Analysis
RCA example #1: The case of the faulty parts
Injection molding machines are widely used around the world to create plastic parts in almost any shape or form. The part the machine produces should match specifications within the allowable tolerance.
Let’s say there is a high incidence rate of faulty products, and we need to get to the bottom of it.
First, the problem needs to be well defined. This includes describing the exact defect in the plastic output. By observing the output, we can determine if it is one of the four primary defects in injection molding. They are:
- Flash
- Gassing & venting
- Part distortion
- Short mold
Let’s presume that the defect is part distortion. First, write down the problem, including the number of defects occurring as a percentage. Once that is completed, collect all the available data: pull any maintenance logs from your CMMS, review manuals from the injection molding machine manufacturer, etc.
Collect information on each defective product. From this, measure the deviation from specifications. Take the heat signature of the product once it comes out of the mold, then measure the temperature of molten plastic in the barrel.
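As a rough illustration of the two measurements described above, the sketch below computes a defect rate as a percentage and flags parts whose measured dimension falls outside the allowed tolerance. The record layout, part IDs, nominal size, and tolerance are all made-up values for the example, not figures from a real production line.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One measured dimension of a molded part (hypothetical record layout)."""
    part_id: str
    measured_mm: float


def defect_rate_percent(defective: int, produced: int) -> float:
    """Share of defective parts, as a percentage of all parts produced."""
    if produced <= 0:
        raise ValueError("produced must be positive")
    return 100.0 * defective / produced


def out_of_tolerance(parts, nominal_mm, tolerance_mm):
    """Return (part_id, deviation) for parts outside nominal +/- tolerance."""
    flagged = []
    for p in parts:
        deviation = p.measured_mm - nominal_mm
        if abs(deviation) > tolerance_mm:
            flagged.append((p.part_id, round(deviation, 3)))
    return flagged


# Illustrative numbers only (assumed), not real production data.
parts = [
    Measurement("A1", 50.02),
    Measurement("A2", 50.31),
    Measurement("A3", 49.60),
    Measurement("A4", 49.95),
]
flagged = out_of_tolerance(parts, nominal_mm=50.0, tolerance_mm=0.1)
rate = defect_rate_percent(len(flagged), len(parts))
print(flagged)
print(f"defect rate: {rate:.1f}%")
```

Writing the problem down in this form gives you a baseline, so you can check later whether the fix actually reduced the defect rate.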
We know that part distortion almost always occurs due to temperature problems. But we cannot be sure where the temperature problem is. Is it in the barrel while heating or in the mold while cooling?
By analyzing the data you collected, you would be able to identify that. For this example, we’ll assume the heat signature of the finished product is different from the expected one.
This indicates that the problem is in the cooling process. Further investigation concludes that the root cause is an unsuitable spatial arrangement of the cooling liquid conduits.
Changing the conduit arrangement to one that best fits the mold currently in use will solve the problem of part distortion.
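The heating-versus-cooling reasoning in this example can be sketched as a simple comparison of measured and expected temperatures. The function name, the 5 °C tolerance, and the sample readings are assumptions made for illustration. A real investigation would use the limits from the machine and material documentation.

```python
def locate_temperature_problem(barrel_temp_c, barrel_expected_c,
                               part_temp_c, part_expected_c,
                               tolerance_c=5.0):
    """Point to the process stage whose temperature deviates from expectation.

    tolerance_c is an assumed threshold, not a manufacturer value.
    """
    barrel_off = abs(barrel_temp_c - barrel_expected_c) > tolerance_c
    part_off = abs(part_temp_c - part_expected_c) > tolerance_c

    if barrel_off and part_off:
        # The part may only be off because the melt was already off.
        return "heating (barrel) first, then recheck cooling"
    if barrel_off:
        return "heating (barrel)"
    if part_off:
        return "cooling (mold)"
    return "no temperature deviation found"


# Assumed readings: the melt is on target, but the ejected part is too hot.
result = locate_temperature_problem(
    barrel_temp_c=231.0, barrel_expected_c=230.0,
    part_temp_c=95.0, part_expected_c=70.0,
)
print(result)
```

A check like this only narrows down where to look. Finding the conduit arrangement as the root cause still takes the hands-on investigation described above.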
RCA example #2: The mystery of the blown fuse
Next, let’s say a machine stopped because it overloaded and the fuse blew.
Investigation shows that the machine was overloaded because a bearing wasn’t being sufficiently lubricated.
Your investigation continues, and you find that the automatic lubrication mechanism had a pump that was not pumping sufficiently. A review of the pump shows that it has a worn shaft. Investigating why the shaft was worn reveals that there isn’t an adequate mechanism in place to prevent metal scraps from getting into the pump. This enabled scraps to get into the pump and damage it.
The apparent root cause of the problem is metal scrap contaminating the lubrication system. Fixing this problem should prevent the whole sequence of events from happening again. Going one step further, the real root cause could be a design issue if there is no filter to keep metal scrap out of the system. If there is a filter but it was blocked due to a lack of routine maintenance, then the actual root cause is a maintenance issue.
Compare this with an investigation that does not find the causal factor: replacing the fuse, the bearing, or the lubrication pump will probably allow the machine to go back into operation for a while. But there is a risk that the problem will simply recur until the root cause is dealt with.
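The chain of causes in this example can be written down as a simple lookup that is followed from the visible symptom until no deeper cause is recorded. This is a minimal sketch with hypothetical names (`CAUSE_OF`, `trace_to_root`). The cause descriptions are paraphrased from the example above.

```python
# Each observed effect maps to the cause the investigation found for it.
CAUSE_OF = {
    "fuse blew": "machine overloaded",
    "machine overloaded": "bearing insufficiently lubricated",
    "bearing insufficiently lubricated": "lubrication pump not pumping sufficiently",
    "lubrication pump not pumping sufficiently": "pump shaft worn",
    "pump shaft worn": "metal scrap entering the pump",
}


def trace_to_root(symptom, cause_of):
    """Follow effect -> cause links until no further cause is recorded."""
    chain = [symptom]
    seen = {symptom}
    while chain[-1] in cause_of:
        cause = cause_of[chain[-1]]
        if cause in seen:
            raise ValueError(f"circular cause chain at {cause!r}")
        chain.append(cause)
        seen.add(cause)
    return chain


chain = trace_to_root("fuse blew", CAUSE_OF)
for depth, step in enumerate(chain):
    print("  " * depth + step)
print("apparent root cause:", chain[-1])
```

The last entry is only the apparent root cause. As the example notes, asking "why" once more (no filter, or a blocked filter) may extend the chain to a design or maintenance issue.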
When to perform a Root Cause Analysis?
When you’re doing an RCA to determine the source of a fault, you’ll usually find three basic types of causes:
- Physical causes
- Human causes
- Organizational causes
You can also do a root cause analysis if you want to drill down and find out exactly why a process or procedure is producing better-than-average results. By identifying the cause of a positive event, you could presumably replicate it and see those results elsewhere. Even if it’s time-intensive, one round of RCA can mean a lot of bang for your buck.
Keep in mind that RCA requires a significant investment of time, manpower, and money. And it will likely cause further disruption in the specific production line or the system you’re working on. So bearing that in mind, you don’t need to (and you shouldn’t) do RCA for every single fault.
Unfortunately, there is no cut-and-dried rule for when to run an RCA and when not to. As the expert and the experienced professional, you’re generally the best person to determine whether or not to run a root cause analysis.
Persistent faults
If the same fault occurs over and over, it’s worth investigating. If the same defect is repeatedly happening, you can assume that it won’t be cleared simply by fixing the visible problem. There is an underlying reason for the recurring faults. These types of incidents need to be investigated with RCA.
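One way to spot persistent faults is to count how often the same fault shows up on the same asset in your fault history. The sketch below assumes a simple list of (asset, fault) pairs, such as you might export from a CMMS. The asset names and the threshold of three occurrences are illustrative assumptions, not a recommended standard.

```python
from collections import Counter


def recurring_faults(fault_log, min_occurrences=3):
    """Return {(asset, fault): count} for faults seen at least min_occurrences times."""
    counts = Counter(fault_log)
    return {key: n for key, n in counts.items() if n >= min_occurrences}


# Hypothetical fault history as (asset, fault) pairs.
fault_log = [
    ("conveyor-2", "belt broke"),
    ("press-1", "fuse blew"),
    ("conveyor-2", "belt broke"),
    ("conveyor-2", "belt broke"),
    ("press-1", "sensor fault"),
]
candidates = recurring_faults(fault_log)
print(candidates)
```

Anything this returns is a candidate for RCA rather than another quick repair.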
Critical failure
To determine if a failure is critical, you can look at the cost to the plant or the total downtime due to the particular failure. When a critical failure occurs, it needs to be investigated to identify the root cause to help avoid this situation in the future. Explosions at an oil rig and airplane crashes are examples of critical failures that need to be investigated.
Failure impact
There are critical machines and critical subprocesses in any system. A failure of these types of machines will halt the entire operation because there may not be a backup or mitigation plan for that particular machine. In this case, how critical the machine is will determine whether or not to do RCA.
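The critical-failure and failure-impact criteria can be turned into a rough screening check. Everything below is an assumption for illustration, including the field names, the 8-hour downtime limit, and the 10,000 cost limit. Set your own limits based on what downtime and repairs actually cost your plant.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Summary of one failure event (hypothetical fields)."""
    asset: str
    downtime_hours: float
    cost: float
    asset_has_backup: bool


def rca_reasons(failure, max_downtime_hours=8.0, max_cost=10_000.0):
    """List the reasons, if any, why this failure deserves an RCA."""
    reasons = []
    # Critical failure: judged by total downtime or cost to the plant.
    if failure.downtime_hours >= max_downtime_hours or failure.cost >= max_cost:
        reasons.append("critical failure")
    # Failure impact: a machine with no backup halts the whole operation.
    if not failure.asset_has_backup:
        reasons.append("failure impact")
    return reasons


minor = Failure("label printer", downtime_hours=0.5, cost=200.0, asset_has_backup=True)
major = Failure("main compressor", downtime_hours=12.0, cost=4_000.0, asset_has_backup=False)
print(rca_reasons(minor))
print(rca_reasons(major))
```

An empty list does not mean the failure is unimportant, only that it did not hit these particular thresholds. Your own judgment still has the final say.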
Now is not the time to cut corners
Root cause analysis is complex and should not be done on a whim. Your team might decide to cut corners to save on time and speed up the process. But if you want to get to the bottom of any complex event, rushing the process can be detrimental to the whole project. When you have a good reason to conduct RCA, it is in your best interest to create an environment where the process can be executed successfully.
If you want to know how a CMMS could make your job less stressful, get started with Limble on a free trial, or set up a demo with our team.