Navigating the complex landscape of system failures requires a methodical approach. Fault Tree Analysis (FTA) offers a systematic way to examine potential failure points, helping organizations preemptively mitigate risks. In this guide, we will walk you through the steps to conduct a thorough FTA, from building a diverse team to developing risk mitigation strategies.
Step 1: Build a diverse team
When dealing with complex systems, you want different voices in the room.
Experienced professionals in the field can draw on past experience and will know which technical aspects of the system affect them the most. Team members with less technical knowledge can contribute by pitching out-of-the-box ideas and asking questions the specialists might overlook.
Brainstorming sessions and meetings need a leader, someone who has experience in conducting FTA. Engineers from the relevant disciplines, industrial engineers, and system design specialists should be part of any FTA team.
Step 2: Identify failure causes
FTA works from the top down. Start with the top event, then try to identify the various failures that could cause or contribute to it. If you keep digging to build off of each event, it will eventually lead you to the root causes (now that’s what we call getting your hands dirty!). You will be left with a beautiful fault tree.
To start and complete the process, you have to define the potential failures along with their characteristics, duration, and impacts. Take fire doors in a high-traffic area or factory as an example.
These doors are held open until the power fails or the fire alarm is triggered. If the fire alarm is faulty, there is an issue with the wiring, the backup batteries have run low, or someone has tampered with it, the alarm will trigger the doors to close when they are not supposed to. The result is a low-level failure, but one that can cause massive frustration and interrupt the entire organization.
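The top-down decomposition of the fire door example can be sketched as a small nested structure. This is a minimal Python illustration only: the event wording and the `root_causes` helper are assumptions made for this sketch, not part of any FTA standard or tool.

```python
# Hypothetical fault tree for the fire door example.
# Each node is (event description, list of child nodes).
# A node with no children is treated as a root cause.
fire_door_tree = (
    "Fire doors close when they should not",
    [
        ("Fire alarm triggers falsely", [
            ("Alarm unit is faulty", []),
            ("Wiring issue", []),
            ("Backup batteries have run low", []),
            ("Someone tampered with the alarm", []),
        ]),
    ],
)


def root_causes(node):
    """Walk the tree from the top event down and return the leaf events."""
    event, children = node
    if not children:
        return [event]
    causes = []
    for child in children:
        causes.extend(root_causes(child))
    return causes


for cause in root_causes(fire_door_tree):
    print(cause)
```

Each leaf is a candidate root cause that the team can investigate or break down further if it still hides more than one failure mode.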
Step 3: Understand the inner workings of the system
The team performing FTA needs to have a deep understanding of the inner workings of the system. The engineers working at the system level will have a good idea of how everything works and what failures you will want to avoid. Other team members can then raise questions that result in an expanded list of failure causes worth exploring.
Someone with knowledge and expertise of the system should be in charge of guiding the discussion. The goal is to get a good grasp of the system’s requirements, connections, and dependencies.
Your team should collect the schematics of the system, specifications of different components, and other available manufacturer information. If you’re using a CMMS like Limble, these asset specifications are available at the touch of a button. Studying these materials should build an understanding of how the sub-systems and components are connected to each other.
Step 4: Draw the FTA diagram
Once the team understands the system’s inner workings, the next step is to graphically present a functional map of the system using Boolean logic. Using the fault tree symbols and structure above, your team can draw a graphical representation of the system's events and how they are all connected.
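To make the Boolean logic behind the diagram concrete, here is a minimal Python sketch of a fault tree built from AND and OR gates. The `Event` and `Gate` classes and the pump and power events are hypothetical, chosen only for illustration. Real FTA tools support more gate types than these two.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Event:
    """A basic event (a leaf of the fault tree)."""
    name: str

    def occurs(self, state: Dict[str, bool]) -> bool:
        # A basic event occurs only if the state marks it as True.
        return state.get(self.name, False)


@dataclass
class Gate:
    """An intermediate or top event that combines its inputs with AND or OR."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", Event]] = field(default_factory=list)

    def occurs(self, state: Dict[str, bool]) -> bool:
        results = [child.occurs(state) for child in self.inputs]
        if self.kind == "AND":
            return all(results)
        if self.kind == "OR":
            return any(results)
        raise ValueError(f"Unknown gate kind: {self.kind}")


# Hypothetical system: cooling is lost if power is lost OR both pumps fail.
both_pumps_fail = Gate("both_pumps_fail", "AND",
                       [Event("pump_a_fails"), Event("pump_b_fails")])
top_event = Gate("cooling_lost", "OR", [Event("power_loss"), both_pumps_fail])

print(top_event.occurs({"pump_a_fails": True}))                        # False
print(top_event.occurs({"pump_a_fails": True, "pump_b_fails": True}))  # True
```

An OR gate means any single input is enough to cause the event above it, while an AND gate means all inputs must occur together, which is how redundancy shows up in the diagram.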
Step 5: Identify MCS, MPS, or CCF
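After the fault trees are complete, your team can identify minimal cut sets (MCS), minimal path sets (MPS), or common cause failures (CCF), depending on what it wants to accomplish.
- MCS or minimal cut sets are the smallest combinations of failures that cause the top event. They are identified to know the most vulnerable parts of the system.
- MPS or minimal path sets are the smallest sets of components that must keep working for the system to remain operational. They are determined to identify the core components and subsystems.
- CCF or common cause failures are identified to find the components or causes that lead to the largest number of failures.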
Your reason for performing FTA in the first place will determine whether the team needs to find MCS, MPS, CCF, or a combination of the three.
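As a rough illustration of how cut sets and path sets can be derived from a tree, the sketch below expands a small fault tree from the top down. The nested-tuple tree format, the function names, and the pump and power events are all assumptions made for this example. It only handles AND and OR gates, and it obtains path sets by swapping the gate types (the dual tree).

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a node as a set of frozensets of basic events."""
    if isinstance(node, str):
        return {frozenset([node])}
    kind, children = node
    child_sets = [cut_sets(child) for child in children]
    if kind == "OR":
        # Any cut set of any input is enough to trigger the gate.
        return set().union(*child_sets)
    if kind == "AND":
        # One cut set from every input must occur at the same time.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"Unknown gate kind: {kind}")


def minimal(sets):
    """Drop every set that is a strict superset of another set."""
    return {s for s in sets if not any(other < s for other in sets)}


def dual(node):
    """Swap AND and OR gates; cut sets of the dual tree are path sets."""
    if isinstance(node, str):
        return node
    kind, children = node
    swapped = "AND" if kind == "OR" else "OR"
    return (swapped, [dual(child) for child in children])


# Hypothetical tree: a basic event is a string, a gate is (kind, inputs).
tree = ("OR", [
    "power_loss",
    ("AND", ["pump_a_fails", "pump_b_fails"]),
    ("AND", ["power_loss", "pump_a_fails"]),  # redundant, removed by minimal()
])

mcs = minimal(cut_sets(tree))
# In a path set, each name means "this failure does NOT occur".
mps = minimal(cut_sets(dual(tree)))

print(sorted(sorted(s) for s in mcs))
print(sorted(sorted(s) for s in mps))
```

In this example the single-event cut set containing `power_loss` marks a single point of failure, while the path sets show that the system stays up as long as power is available and at least one pump keeps working.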
Optional step: Assess the probability of failure
More often than not, you’ll find multiple pathways that can lead to the same failure event. For an extensive system, it would be nearly impossible to address all failure causes at once.
To prioritize which events to address first, the team can calculate the probability of each failure for the different critical sets. The critical set with the highest chance of failure should be given top priority.
This is an optional but valuable step. If you know the probability of each failure, it will be worth the time to use it!
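A simple way to run these numbers is sketched below in Python. It assumes that all basic events are independent and that no event is repeated across branches, so an AND gate multiplies probabilities and an OR gate uses one minus the product of the complements. The tree, the event names, and the probability values are made up for illustration.

```python
import math


def probability(node, p):
    """Probability of a node, assuming independent, non-repeated basic events."""
    if isinstance(node, str):
        return p[node]
    kind, children = node
    probs = [probability(child, p) for child in children]
    if kind == "AND":
        # All inputs must occur: multiply their probabilities.
        return math.prod(probs)
    if kind == "OR":
        # At least one input occurs: 1 minus the chance that none occurs.
        return 1.0 - math.prod(1.0 - q for q in probs)
    raise ValueError(f"Unknown gate kind: {kind}")


def rank_sets(sets, p):
    """Sort sets of basic events by joint probability, highest first."""
    scored = [(s, math.prod(p[e] for e in s)) for s in sets]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical tree and failure probabilities over some fixed time period.
tree = ("OR", ["power_loss", ("AND", ["pump_a_fails", "pump_b_fails"])])
p = {"power_loss": 0.01, "pump_a_fails": 0.1, "pump_b_fails": 0.05}

critical_sets = [
    frozenset({"power_loss"}),
    frozenset({"pump_a_fails", "pump_b_fails"}),
]

print(f"Top event probability: {probability(tree, p):.5f}")
for events, prob in rank_sets(critical_sets, p):
    print(sorted(events), f"{prob:.4f}")
```

With these made-up numbers, the power loss set ranks above the double pump failure, so it would be addressed first. If events are not independent, for example because of a common cause, this gate-by-gate arithmetic no longer holds and the result should be treated as a rough estimate at best.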
Step 6: Develop risk mitigation strategies
Now it is time to use your Fault Tree Analysis to minimize your risk of failure.
- Give high priority to protecting the MPS (the minimum set of components needed to keep the system operational).
- Follow strict maintenance schedules for CCFs, as they can cause a multitude of issues.
One potential risk mitigation strategy, especially for CCFs, is preventive maintenance.
A CMMS like Limble can help you ensure adherence to required maintenance schedules. This includes following the best practices for spare parts management, so the maintenance team always has replacement components in stock. This effort is what it takes to minimize the probability of failure.
As you can tell, a lot of research and expertise has gone into developing the Fault Tree Analysis process. If you would like to dive deeper into this subject, check out these additional resources:
- Book: Fault Tree Analysis Primer by Clifton A Ericson II
- Book: Fault Tree Analysis A Complete Guide by Gerardus Blokdyk
- Coursera lecture on FTA
- FTA lecture on YouTube by the Department of Industrial and Systems Engineering at IIT Kharagpur
- Another FTA lecture on YouTube by xSeriCon, an engineering consultancy and safety training firm.