How To Properly Perform DFMEA & PFMEA [Examples Included]
A system is only as strong as its weakest link.
Whenever you’re trying to create an efficient process or design a new product, you have to make sure that every step of the process and every component of the product can efficiently do what they are supposed to do.
In complex processes and products, finding and improving the weakest links is easier said than done. Luckily, this is where we can turn to reliability risk assessment techniques like PFMEA and DFMEA to help us find potential problems in theory so we can minimize (or eliminate) the chance of their occurrence in practice.
DFMEA and PFMEA are fairly complex technical processes, but we will do our best to:
explain what they are
when you should use them
how to conduct DFMEA and PFMEA (outline the steps)
and give a practical example for each
Part of the FMEA family
Before we can explain DFMEA and PFMEA, we have to backtrack a bit and touch on the original maintenance reliability analysis tool they both originated from: FMEA.
FMEA stands for Failure Mode and Effects Analysis and it represents a step-by-step approach one can take to identify all possible failures (in a certain design, product, process, or service) and assess the possible effects of those failures. Ultimately, the FMEA helps you determine which steps you can take to mitigate some of the risks that come with the failure of any system.
While the execution is similar, there are some specifics that differentiate how someone approaches identifying design versus process failures – which is why FMEA is split into different categories like DFMEA and PFMEA.
We realize that this might sound confusing to someone who isn’t familiar with the subject matter, but don’t worry, we will explain it in detail! Let’s take a deeper dive into each method to get a better idea of how these techniques are executed in practice.
Design FMEA explained
What is DFMEA?
DFMEA stands for Design Failure Mode and Effect Analysis. Many industries worldwide adopt it as a qualitative tool that puts their designs under the microscope. It serves to review designs to identify what might go wrong, how the failure would manifest, and what its consequences would be.
Key decision-makers then use this information in the product design and development phases to improve the quality of a product by making the recommended modifications before execution. However, not all “negatives” of a design will be corrected. In some cases, design flaws are reworked to reduce the likelihood of a failure without entirely eliminating it.
With DFMEA, material properties, geometry, regulatory requirements, tolerances, interactions, and how the item behaves in its environment are all things that need to be considered. When conducted properly, DFMEA improves the performance and safety of the final product.
There are three main instances in which one might decide to perform Design FMEA:
creating a new design with different design standards, regulations or requirements
changing an existing design
using an existing design in a new jurisdiction or environment
Now that we’ve defined DFMEA, let’s take an in-depth look at how it’s implemented.
The outline of the DFMEA process
The approach to DFMEA has to be methodical. Best practices suggest following the steps outlined in a standard DFMEA worksheet.
Each step of DFMEA builds on the previous one, supplying key indicators that will be used to make crucial decisions.
STEP #1: REVIEW THE DESIGN
In this step, the multi-disciplinary team goes through every system, subsystem, interface, and component to determine what could go wrong. These details are then entered in the items, functions, and requirements section of the DFMEA worksheet.
The aim is to see just how good a design is and whether changes should be made to make it better.
Breakdown of system, subsystem, and component in DFMEA
The design dictates the direction that the project will take and therefore plays a vital role in knowing how to decouple each system/component from the whole. During this decoupling, the DFMEA worksheet is completed in stages closely tied to the project’s design timeline. This helps to effectively isolate each item so that interactions are revealed, which helps us to identify failure modes more easily.
It’s then possible to consider collective or selective remedial actions at later stages.
STEP #2: IDENTIFY FAILURE MODES
Once systems/components are decoupled, the next step is to thoroughly assess each one to determine all the possible ways it can fail and record these in column 4 of the DFMEA worksheet.
Failure modes generally fall into five categories:
Full failure: The system or component is no longer functional and needs to be removed entirely or replaced.
Partial failure: There is still some functionality, but the system or component is not operating as it should.
Intermittent failure: Where the malfunction occurs on an irregular basis.
Degraded failure: Frequent usage leads to fatigue, which weakens the functionality of an item.
Unintentional failure: Failure from one item affects another.
A component or system may have more than one failure mode, and it’s important to document each at this stage.
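To make the idea of documenting several failure modes per item concrete, here is a minimal Python sketch. The enum mirrors the five categories above; the `Item` and `FailureMode` structures and the "down conductor" entries are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureModeType(Enum):
    """The five failure-mode categories described above."""
    FULL = "full"                    # no longer functional; remove or replace
    PARTIAL = "partial"              # some function remains, but not as intended
    INTERMITTENT = "intermittent"    # malfunction occurs irregularly
    DEGRADED = "degraded"            # function weakens through fatigue from use
    UNINTENTIONAL = "unintentional"  # failure of one item affects another


@dataclass
class FailureMode:
    description: str
    mode_type: FailureModeType


@dataclass
class Item:
    """A system, subsystem, or component under review (hypothetical structure)."""
    name: str
    failure_modes: list[FailureMode] = field(default_factory=list)

    def add_failure_mode(self, description: str, mode_type: FailureModeType) -> None:
        # Record every failure mode separately, even when they share an item.
        self.failure_modes.append(FailureMode(description, mode_type))


# Hypothetical example: one component with two documented failure modes.
conductor = Item("down conductor")
conductor.add_failure_mode("conductor breaks", FailureModeType.FULL)
conductor.add_failure_mode("joint corrodes over time", FailureModeType.DEGRADED)
print(len(conductor.failure_modes))  # 2
```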
Several factors can affect failure modes:
Operating conditions: a look at the environment (hot, cold, humid, covered or exposed…)
Usage: look at the function of the item
Service operations: look at whether the item will be accessible, what errors could occur if it’s replaced with the wrong item, and whether it’s challenging to construct.
STEP #3: LIST THE POTENTIAL EFFECTS OF EACH FAILURE MODE
A system or component rarely fails without triggering aftershocks. These aftershocks – or effects – can either be minor or devastating. A minor effect would be a lightbulb suddenly failing after an electric surge, whereas a catastrophic effect would be a massive fire that results in lost property due to the failure of a lightning protection system. Both have different consequences and are weighted differently in Design FMEA.
Put simply, failure effects can be loss of life, environmental damage, financial loss, property destruction, impact on regulatory requirements, as well as public scrutiny.
STEP #4: ASSIGN SEVERITY RANKING
In this step, you want to know the consequence and impact of a failure mode. In other words, just how serious are the effects of each failure mode on the system, component, user, and other people? What is the bad news you would least want to hear if the failure occurred?
The degree of severity you can assign is relative: what you might consider to be very severe might not be severe on another project.
Expert opinions that are guided by experience help to narrow down the severity of each failure mode. To make this easier, DFMEA uses the following scale:
9-10: Very severe consequences where regulatory and safety factors cannot be overlooked. Often results in a change of the design direction.
7-8: Loss or degradation of the primary function of the component/system studied.
5-6: Loss or degradation of the secondary function of the component/system studied.
2-4: Annoyance that doesn’t affect functionality.
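The scale above can be encoded as a simple lookup so that every severity rank on a worksheet maps to the same description. This is a sketch that copies only the bands listed here; the scale as written does not describe rank 1, so the function labels it explicitly rather than guessing.

```python
def severity_band(rank: int) -> str:
    """Map a 1-10 severity rank to the bands listed in the scale above."""
    if not 1 <= rank <= 10:
        raise ValueError(f"severity rank must be 1-10, got {rank}")
    if rank >= 9:
        return "Very severe: regulatory and safety factors cannot be overlooked"
    if rank >= 7:
        return "Loss or degradation of the primary function"
    if rank >= 5:
        return "Loss or degradation of the secondary function"
    if rank >= 2:
        return "Annoyance that doesn't affect functionality"
    # Rank 1 is not described by the scale above, so say so explicitly.
    return "Not described by this scale"


print(severity_band(8))  # Loss or degradation of the primary function
```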
STEP #5: DEFINE THE CAUSE/MECHANISMS OF FAILURE
In column 8 of the DFMEA worksheet, we are getting closer to knowing just how good our design is and whether we’ll still have a valid design after all the failure modes are considered. A weakness in the design will result in failure, so the more weaknesses we identify and resolve, the fewer failures we will have to consider.
Conciseness and completeness are the keywords at this stage. You don’t want to get lost in long descriptions where it takes a lot of time to understand what the causes or mechanisms are. Every cause or mechanism you can think of should be listed, but staying focused on pertinent ones will ensure you don’t go off track.
STEP #6: ASSESS CURRENT DESIGN CONTROLS
Returning to our weakest-link analogy, we want to know how the current design controls work to prevent and detect the weakest links. The design controls should be robust enough to prevent failure modes.
The two types of design controls considered in columns 9 and 11 of the DFMEA worksheet are prevention controls and detection controls, and they provide valuable information to guide corrective actions later on:
Prevention control aims to eliminate the cause/mechanism from occurring. Examples of prevention control include design requirements, engineering requirements, material requirements, and documentation. These are usually carved in stone to ensure that certain regulatory and project considerations are followed.
Detection control seeks to identify the cause/mechanisms that lead to failure and is used in prototyping, functional testing, reliability testing, and simulations.
The aim of each of these is to detect failures before the items go into operation mode.
STEP #7: ASSIGN OCCURRENCE RANKING
Occurrence ranking in DFMEA seeks to establish the likelihood that a specific cause will result in a failure mode based on a scale of 1 to 10, as represented in the table below.
You are looking into the future and using experience from your team and yourself to guide your judgment. With occurrence ranking, we don’t talk about absolute values. Instead, we talk about a relative deduction supported by experience, research, brainstorming, and other information-gathering techniques.
STEP #8: ASSIGN DETECTION RANKING
In column 12 of the DFMEA worksheet, you are still working with your team to assess the design controls already in place. In this phase, you are working on the assumption that the failure has occurred. The exercise aims to create different scenarios that help to sift through the complexities and give you a clear vision of how well the current design detects failures.
Detection ranking uses a scale of 1 to 10, where 1 means the current design controls are almost certain to detect the failure, while 10 means the failure is almost impossible to detect.
Now, you might be curious: Do I choose prevention control or detection control for my project? The answer is that you should always aim to have robust prevention control in place where possible. As explained earlier, prevention control eliminates the cause or mechanism of failure, which has an impact on the occurrence. Detection control identifies when failure has occurred but does not prevent it.
STEP #9: CALCULATE THE RISK PRIORITY NUMBER (RPN)
The Risk Priority Number is calculated simply by multiplying severity, occurrence, and detection (which were covered in steps #4, #7, and #8 respectively).
The value you get will be between 1 and 1000 (as all 3 factors are ranked on a scale of 1 to 10). The higher the RPN, the higher the risk involved. This is the point where design modifications are considered, which takes us to our next step: determining corrective actions.
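The RPN arithmetic is simple enough to express directly. The sketch below multiplies the three rankings and rejects anything outside the 1 to 10 scale, which is also why the result always falls between 1 and 1000. The function name and the validation rule are our own choices, not part of any standard worksheet.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number = severity x occurrence x detection."""
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


print(rpn(10, 8, 10))   # 800
print(rpn(1, 1, 1))     # 1, the lowest possible RPN
print(rpn(10, 10, 10))  # 1000, the highest possible RPN
```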
STEP #10: DETERMINE CORRECTIVE ACTIONS
A high RPN means that an unchanged design will have numerous risks. How do you resolve this?
A good starting point is to determine what is an acceptable RPN for your organization.
A low RPN on a particular item does not mean there are no risks. Good design practice would be to look at the design holistically and see where the RPN can be lowered – even in cases where it is already low.
Critical components that have a high RPN require special care with specific recommendations. These recommendations can take the form of revisiting the design to check for engineering guidelines that may have been missed, starting from scratch with a new design, or taking additional recommendations into consideration. All these actions aim to see how severity, occurrence, and detection can be adjusted.
As a guide, severity should only change if the failure has been removed – which may involve a complete design change.
To lower the occurrence ranking, you’ll need to go back to the basics where the causes/mechanisms are listed in column 8 of the DFMEA worksheet. Assess each cause or mechanism carefully, with a focus on removing it or putting control measures in place.
To lower the detection ranking, you can add or strengthen detection controls (for example, more prototyping, testing, or simulation), bearing in mind that these might increase costs.
STEP #11: ASSIGN, PLAN, AND EXECUTE
Once you have decided which actions you’d like to take to adjust the RPN, the next steps are to identify who will do what and agree on a completion date. That being said, things often won’t go according to plan and column 17 of the DFMEA worksheet reminds you of that.
You can note the actual finished date so that, as you progress throughout the sheet with design changes, you can keep track of where you were before project execution.
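Tracking the owner, the agreed completion date, and the actual finished date can be sketched as a small record. The field names and the `days_late` helper below are hypothetical; they simply show how comparing the target and actual dates reveals where the plan slipped.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CorrectiveAction:
    """One tracked corrective action (field names are hypothetical)."""
    description: str
    owner: str
    target_date: date
    actual_date: Optional[date] = None  # filled in once the action is finished

    def days_late(self) -> Optional[int]:
        """Days past the target date, 0 if on time, None if not finished yet."""
        if self.actual_date is None:
            return None
        return max(0, (self.actual_date - self.target_date).days)


# Hypothetical action for illustration only.
action = CorrectiveAction("Add corrosion-resistant coating", "design team",
                          target_date=date(2024, 3, 1))
print(action.days_late())  # None, still open
action.actual_date = date(2024, 3, 11)
print(action.days_late())  # 10
```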
STEP #12: REANALYZE THE RPN
So you’ve concluded that the RPN is not what you wanted based on a threshold you had. You decided to change some aspects of the design to remove or lower the failure modes and effects. After the new calculation, there is a slight decrease in occurrence and detection. But you are still not satisfied. Do you continue recalculating and redesigning?
There will come a point where further changes to a particular item are no longer cost-effective, even if they would lower its RPN. At this stage, you’ll need to assess your risks carefully to determine which are critical and which aren’t. Further consultations with a multi-disciplinary team may be necessary.
At the end of the day, the goal is to have an optimal design with tolerable or no risks.
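The reanalysis step boils down to comparing the RPN before and after a change against your threshold. In this sketch the S/O/D values are hypothetical and mirror the scenario above: occurrence and detection drop slightly, severity stays the same, and the item still misses the threshold. Treating "acceptable" as "at or below the threshold" is our assumption.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    return severity * occurrence * detection


def reanalyze(before: tuple[int, int, int], after: tuple[int, int, int],
              threshold: int) -> dict:
    """Compare RPNs before and after a design change (hypothetical helper)."""
    old, new = rpn(*before), rpn(*after)
    return {
        "old_rpn": old,
        "new_rpn": new,
        "reduction_pct": round(100 * (old - new) / old, 1),
        # Assumption: an RPN at or below the threshold counts as acceptable.
        "meets_threshold": new <= threshold,
    }


# Hypothetical: occurrence drops 8 -> 6 and detection 10 -> 8; severity unchanged.
result = reanalyze(before=(10, 8, 10), after=(10, 6, 8), threshold=200)
print(result)
# {'old_rpn': 800, 'new_rpn': 480, 'reduction_pct': 40.0, 'meets_threshold': False}
```

A 40% reduction looks good on paper, but the item is still well above the threshold, which is exactly when the cost-effectiveness question above comes into play.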
After a lot of theory, it’s time to apply everything to an example. As a quote often attributed to Albert Einstein puts it, “Example isn’t another way to teach, it is the only way to teach.”
The example we’ll consider is the one already mentioned in step #3: the lightning arrester and lightning protection system (LPS). To keep our example short and understandable, we’ll only consider items 1 and 3 of the LPS, and we won’t be recalculating the RPN. We are also not considering whole items, subparts, or interfaces.
We can see from the table above that the RPN is 800 in all cases. If our predetermined RPN threshold were 200, for example, we would have to find ways to reduce the risk of our failure modes by looking at the severity, occurrence, and detection of each item.
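Checking every item against a threshold is a one-line filter. The S/O/D values below are hypothetical, picked only so that each RPN equals 800 as in the example; they are not the values from the table.

```python
def items_over_threshold(items: dict[str, tuple[int, int, int]],
                         threshold: int) -> list[str]:
    """Return the names of items whose RPN (S x O x D) exceeds the threshold."""
    return [name for name, (s, o, d) in items.items() if s * o * d > threshold]


# Hypothetical S/O/D values chosen only so that each RPN equals 800;
# they are not the values from the example table.
items = {
    "item 1": (10, 8, 10),
    "item 3": (8, 10, 10),
}
print(items_over_threshold(items, threshold=200))  # ['item 1', 'item 3']
```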
Process FMEA explained
What is PFMEA?
PFMEA stands for Process Failure Mode and Effect Analysis. Unlike DFMEA, which focuses on design, PFMEA focuses on processes. It is also a qualitative tool that zooms in on current processes to see where they can be improved.
The PFMEA can be used in many different scenarios:
before control plans are developed for a new or modified process
when a new process, method, requirements or technology is introduced
when an existing process is slated for improvements
when there’s a new way of implementing an existing process
during the execution of the process for quality control measures
There is a lot of back-and-forth information between PFMEA and DFMEA. This exchange is crucial for removing risks from a project and improving both design and process quality. Some risks will not be seen during a DFMEA, but they can appear while preparing the PFMEA.
The outline of the PFMEA process
The steps in PFMEA begin with the worksheet where information in each column gradually works its way to the final calculation of the RPN.
STEP #1: REVIEW THE PROCESS
PFMEA begins with reviewing the process by using a process map. This map is a detailed flowchart that identifies what the process does and doesn’t do, a.k.a. a process flow diagram.
The diagram shows from input to output all the activities associated with each process, including interfaces. To put it simply, the process map guides the team from the moment a process starts to the moment it ends by providing a logical manner to assess what details are involved in its entire journey.
In the PFMEA worksheet, the process function and requirements columns can be completed at this stage. But before we move to step #2, let’s see what’s involved in the process function and requirements:
Process function: Defines the intent of the operation. Put simply, it specifies the reason/purpose of an action.
Process requirement: In this column, the team defines the What for each process. This “What” includes all the inputs for each process.
STEP #2: IDENTIFY POTENTIAL FAILURE MODES
In PFMEA, potential failure modes are the weak links that cause a process to fail. It is assumed that the materials and the design meet acceptable standards. However, if defects, quality issues, design flaws, or historical records suggest otherwise, that evidence takes precedence over the assumption.
The five failure modes mentioned earlier in DFMEA are also applied here.
STEP #3: LIST THE EFFECTS OF EACH FAILURE
Process failure effects impact end-users, internal customers, external customers, subsequent operations, locations, timeline, planning, and execution. Therefore, a lot of “How” and “What” questions are necessary.
These impacts should be consistent with what was already decided by the team during the DFMEA brainstorming. They should also be noted separately, even if they are already mentioned in the DFMEA worksheet.
STEP #4: ASSIGN A SEVERITY RANKING
The severity column in the PFMEA worksheet looks at the criticality of the effects of each failure mode. It reflects how bad the situation might become if the failure mode is not resolved. The severity column assigns a scale of 1 to 10 to measure just how serious the end result would be.
Again, this is a relative scaling based on the experience and knowledge of the multidisciplinary team. We already mentioned how each number is classified in DFMEA, but to refresh your memory, here it is again:
9 to 10: Very severe consequences where regulatory/safety factors cannot be overlooked.
7 to 8: Loss or degradation of the primary function of the component/system studied.
5 to 6: Loss or degradation of the secondary function of the component/system studied.
2 to 4: Annoyance that doesn’t affect functionality.
The classification column helps to flag the failure modes that require urgent attention, such as those involving special products (material sensitivity in certain environments) or special requirements (legislation).
STEP #5: IDENTIFY CAUSE/MECHANISMS OF FAILURE
Moving along the PFMEA worksheet, you now want to know “How could this happen?” in the cause/mechanisms column. This “How” can stem from a weakness in either the design or the process. The aim is to identify these weaknesses to see what should and can be done to correct them.
To achieve this, broad statements that do not provide any specific detail about the “How” should be avoided. As with the DFMEA worksheet, conciseness and completeness are two essential parameters that define how each cause/mechanism is best detailed.
STEP #6: CURRENT PROCESS CONTROLS
Prevention and detection are the two types of process controls that appear in Process FMEA. The associated definitions are the same as in DFMEA, but for thoroughness, let’s have a quick look at how each applies to PFMEA:
Prevention control: During the input and output journey of a process many weak links can cause failure to occur. If there are measures to stop, remove, and eliminate this possibility along a process, then we will succeed at lowering the number of failures and risks involved.
Detection control: Sometimes it isn’t possible to prevent a failure during a particular process. If it is not a critical failure, measures should be in place to detect it. By recognizing it, you also open the possibility of adding prevention controls if needed.
Some degree of assumption is needed to know if the current process controls are robust enough. Therefore, you’ll create what-if scenarios. You will imagine that a particular process has failed to try and see how the other processes might be affected. Doing so narrows down the risks involved and helps to identify how different processes are interconnected.
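A what-if scenario of this kind can be modeled as a walk over the process map: mark one step as failed and list every downstream step that depends on it. This is a minimal breadth-first sketch; the step names and their dependencies are hypothetical and are not taken from the example in this article.

```python
from collections import deque

# Hypothetical process map: each step lists the steps that directly depend on it.
process_map = {
    "survey site": ["install air terminals", "install earth electrodes"],
    "install air terminals": ["run down conductors"],
    "install earth electrodes": ["run down conductors"],
    "run down conductors": ["bond and test system"],
    "bond and test system": [],
}


def affected_by(failed_step: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every downstream step reachable from failed_step (breadth-first)."""
    seen = {failed_step}
    order = []
    queue = deque([failed_step])
    while queue:
        step = queue.popleft()
        for nxt in graph.get(step, []):
            if nxt not in seen:  # visit each step once, even if reached twice
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(affected_by("install earth electrodes", process_map))
# ['run down conductors', 'bond and test system']
```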
STEP #7: ASSIGN OCCURRENCE RANKING
Before calculating the RPN, you need to ask: How frequently will the causes/mechanisms that lead to failure modes occur, on a scale of 1 to 10? If the likelihood is very high, choose rank 10; if it is highly unlikely, rank 1 is best.
STEP #8: CALCULATE RISK PRIORITY NUMBER (RPN)
Simply multiply severity, occurrence, and detection together to get the RPN. Detection is ranked on the same 1 to 10 scale as in DFMEA, based on how well the current process controls detect the failure. The meaning of your final result will depend on what you defined as acceptable and unacceptable. A high RPN indicates high risk, so steps should be taken to lower it.
STEP #9: TAKE CORRECTIVE ACTIONS
We now have a lot of information about the processes, the associated failure modes, and the effects of each. We also know just how severe the failure mode will be, how often it will occur, and what we have already implemented to detect or prevent different failure types.
Now we need to prioritize our findings to decide which are urgent and should be corrected by appropriate process changes. Naturally, processes with high risks should be at the top of the priority list.
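Building the priority list can be sketched as a sort by RPN, highest first. Breaking ties by severity is our own assumption (it puts the more serious failure first when two RPNs are equal), and the entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProcessFailureMode:
    process_step: str
    failure_mode: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[ProcessFailureMode]) -> list[ProcessFailureMode]:
    """Highest RPN first; ties broken by higher severity (an assumed rule)."""
    return sorted(modes, key=lambda m: (m.rpn, m.severity), reverse=True)


# Hypothetical entries for illustration only.
modes = [
    ProcessFailureMode("step A", "wrong fastener used", 4, 5, 8),  # RPN 160
    ProcessFailureMode("step B", "joint not tightened", 7, 5, 4),  # RPN 140
    ProcessFailureMode("step C", "conductor damaged", 8, 5, 4),    # RPN 160
]
for m in prioritize(modes):
    print(m.process_step, m.rpn)
# step C 160
# step A 160
# step B 140
```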
STEP #10: ASSIGN, PLAN, AND EXECUTE
With the corrective actions defined, the next step is to have a plan of action. We want to remove weak links in our processes, and we need to look at both our design and our process flow chart. We should ask:
Who will be responsible for putting the corrective actions in place?
When can we expect these changes to be made?
How will these changes affect the available resources we have for other processes and design changes?
What are the highest priorities?
After the changes have been implemented, we can recalculate the RPN to see whether we were able to reduce it.
STEP #11: RECALCULATE RISK PRIORITY NUMBER
An RPN recalculation provides information about the effect of each of our actions on the weakest link. We defined our action plan in step #9 and decided on ways to implement each action in step #10. Now we want to know what effect our changes had on the process and design, and whether we were able to address all failure modes with high severity in the best way possible.
Once we know that the changes address all the serious risks, and we have identified which remaining risks aren’t serious, it’s time to agree with the team and wrap things up.
We take another look at our previous example, but this time from a process rather than a design point of view. To keep the example short and easy to understand, we consider only the processes involved in 1, 2, and 3, and we won’t recalculate the RPN.
We look at all the activities involved, as well as the interfaces. Different questions are asked at each step to reveal where the RPN requires attention. In steps 1 to 7, we define the failure modes and their associated effects if the correct process is not adopted to erect the Lightning Protection System. The “requirements” column lists mostly design-related factors that must be optimal to avoid frequent failures.
PROCESS FMEA EXAMPLE STEPS #8 TO #10
From our calculations, we see that the RPN is 160 in some cases and 700 in others. What we choose to do with either will depend on the RPN threshold we use as a measuring stick.
It is worth noting that an RPN below a certain threshold may still need to be addressed if the severity is high and there are no control methods in place to prevent or detect certain failure modes. While time-consuming, this holistic approach helps optimize both design and processes.
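That rule of thumb can be written as a flag that looks beyond the RPN alone. In this sketch, the default threshold of 200 and the severity cutoff of 9 (the start of the "very severe" band) are assumptions you would replace with your own values.

```python
def needs_attention(severity: int, occurrence: int, detection: int,
                    has_controls: bool, rpn_threshold: int = 200,
                    severity_cutoff: int = 9) -> bool:
    """Flag a failure mode if its RPN exceeds the threshold, or if its
    severity is high and no prevention or detection controls exist.
    The default threshold and cutoff are assumptions, not fixed values."""
    rpn = severity * occurrence * detection
    if rpn > rpn_threshold:
        return True
    # Below the threshold, but still flagged if severe and uncontrolled.
    return severity >= severity_cutoff and not has_controls


print(needs_attention(10, 2, 5, has_controls=False))  # True: RPN 100, but severity 10 and no controls
print(needs_attention(10, 2, 5, has_controls=True))   # False
print(needs_attention(4, 5, 8, has_controls=True))    # False: RPN 160, low severity
```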
As all people in technical fields know, theory is not worth much if it doesn’t translate into practice. If your process relies on visual inspections and other preventive maintenance work, maintenance teams need to follow that up by creating and following their preventive maintenance schedules. CMMS is here to help you coordinate those activities and ensure your PFMEA and DFMEA efforts are not wasted.
Despite their many advantages, Design FMEA and Process FMEA will not solve every failure. If that were the case, maintenance managers and reliability engineers would never have to concern themselves with failure metrics.
The types of failures identified, and the associated solutions, are closely tied to the experience and knowledge of the multi-disciplinary team tasked with the analysis, as well as its access to historical records. One way to gather historical data is to use CMMS software like Limble, which saves data about different failures and their causes in the asset logs.
If you are able to handle a somewhat steep learning curve, DFMEA and PFMEA are invaluable risk assessment tools that can have a big impact on your bottom line.
We hope that this provided a nice overview of what is required to use these risk assessment techniques. We were not able to include every detail of every step, as this article would be extremely long. For those who are seriously thinking of using DFMEA and PFMEA, we suggest looking for relevant courses and certifications that can provide all of the necessary details.