Fault Detection And Diagnostics In Equipment Maintenance

Understanding equipment failures and developing strategies to detect and diagnose them is one of the key elements of equipment maintenance. 

The purpose of this article is to present an overview of Fault Detection and Diagnostics as they are applied to improve the equipment maintenance process and boost asset reliability. 

The story behind fault detection and diagnostics

In the early days, equipment maintenance was restricted to repairing faulty assets and performing basic routine maintenance based on rigid time intervals. Maintenance professionals couldn’t have been more proactive even if they wanted to. Their capability to collect, store and analyze data on equipment health and performance was simply too limited. 

However, due to consistent advancements in microprocessor-based controls, automation, real-time data acquisition, and systems like Fault Detection and Diagnostics (FDD), the way in which we perform equipment maintenance has been significantly transformed. 

FDD in equipment maintenance

The objective of Fault Detection and Diagnostics in the context of equipment maintenance is to optimize maintenance costs while still improving the reliability, availability, maintainability and safety (RAMS) of the equipment. 

The FDD functions by continuously monitoring and analyzing condition monitoring data and detecting any anomalies (if present). The equipment condition datasets are then processed by fault diagnostics algorithms, sometimes embedded within the equipment itself, to produce failure alerts for the equipment operators and enable timely maintenance intervention. 

In some cases, the algorithms are sophisticated enough to even initiate failure containment actions to auto-correct the failure itself and restore the equipment to its healthy condition.

The Essential Guide to CMMS

The Essential Guide to CMMS

Key elements of the fault detection and diagnostics system 

The FDD, as the name implies, contains the detection and diagnosis of equipment failures. The diagnosis of the failure can be broken down into failure isolation and identification. 

The failure evaluation is often added within the scope of FDD as it helps to understand the severity of failure on system performance – an important aspect of maintenance management

Nevertheless, the Fault Detection and Diagnostics algorithm for any equipment should contain at least the four key processes listed below (these can constitute a nonlinear process as well, provided that some steps happen at the same time):

Key elements of FDD

We need to discuss each element in more detail to really understand how fault detection and diagnostics work.

1. Fault detection

Fault detection is the process of discovering the presence of a fault in any equipment before it manifests itself in the form of a breakdown. It is the most important stage of FDD as all of the downstream processes depend on its accuracy. 

If the equipment is unable to discover the right failure mode (or if detection is incorrect and triggers false alarms), then the isolation, identification and evaluation will also be ineffective. 

There are two main approaches to fault detection:

  1. Model-based fault detection: It is carried out through mathematical modelling of signals and processes. 
  2. Knowledge-based fault detection: It is a method that leverages historical data on equipment performance. 

Model-based fault detection

In model-based detection, we define a set of engineering rules that are written in line with physical laws that define the relationships of subsystems and components within the equipment. Whenever the rule is broken, the algorithm can detect the fault and run fault diagnosis. 

One example of model-based fault detection is the use of time-domain reflectometry (TDR) to detect faults in underground cables. In TDR, the signal is sent across the test cable and is received after being reflected from the point of fault. 

If the cable has a discontinuity or high impedance, the portion of the signal will be reflected back to the test equipment or receiver. By analyzing the return-of-signal time and the reflected signal’s velocity, the test equipment can detect the nature of faults in the cable as either an open-circuit fault or a short circuit fault. 

Another simple rule-based detection example comes from the series operation of bottle filling, capping, and packaging system on a conveyor belt system. A simple rule can be established that indicates the hierarchy of processes such as:

  • the bottle cannot be capped until the bottles are filled with liquid
  • the bottles cannot be packaged unless they are filled and capped 

In case of a fault in the bottle capping mechanism, the algorithm will detect the incoming disruption in the packaging system. It will notify the packaging operator well ahead of time. The necessary preparation can be made to minimize operational losses on the packaging side of the conveyor belt.

Knowledge-based fault detection

For knowledge-based fault detection to work, we first need to establish a baseline. This is done by retrieving the parameters of equipment performance such as voltage, current, vibration, temperature, pressure and other relevant process variables – while the equipment is working under normal conditions. 

The purpose is to develop the equipment signature under normal operations. 

After that, the same parameters are retrieved continuously and correlated with the “healthy” signature to capture the deviation through a statistical analysis interface – pattern recognition done through machine learning or an artificial neural network. 

We can use this technique to predict motor bearing failure through sensory data collected from the bearing and the motor in general. 

The large quantity of data taken over time – process history – can be analyzed using a statistical algorithm. This helps us understand the impact of the different conditions the motor is subjected to, such as  thermal rating, mechanical stress, or some other operating conditions that occur in special circumstances. 

The algorithm then correlates the impact of these conditions on the degradation of bearing health and predicts the failure rate and health condition of the overall motor. 

Based on these data signatures, the analysis can be made to predict the future health of the equipment. Moreover, the necessary alarms can be triggered and fault diagnosis can be conducted, so the operator/technician can take appropriate action. 

The same data can be used to establish a predictive maintenance strategy over the remaining life of the motor.

2. Fault isolation

The goal of the fault isolation process is to localize the fault to the lowest component that can be replaced. In some applications, fault detection and isolation go hand in hand; they can, of course, be separate modules of the process. This is because the processes of detecting and localizing the fault are happening at basically the same time, both done by the Fault Detection and Isolation (FDI) algorithm. 

For instance, consider the example of TDR testing for underground cable. The returned pulse signal from the cable simultaneously indicates the presence and the location of fault through time and velocity of the returned pulsed signal.   

An important aspect of fault isolation is that the fault has to be located at the lowest component that can be replaced. This is done to improve the accuracy of isolation and reduce the impact of downtime. 

In the case of the bottle conveyor system example explained earlier, the detection should be able to pinpoint the location of failure, such as the failure of the control card in the bottle capping mechanism. 

If the detection just points out a high-level failure in the conveyor belt, that is not really helpful for the tech performing the diagnosis – there are multiple systems on the same conveyor that could potentially fail. 

The information that will really speed up the repair process is knowing the accurate location of the fault.      

3. Fault identification

The purpose of fault identification is to understand the underlying failure mode, determine the size of the fault, and find its root cause. Fault diagnosis methods may differ, but the steps to follow are generally the same.

Components of fault identification

Understanding the underlying failure mode

In-depth understanding of the failure mode requires work: 

  • we need to analyze how the fault behaves at different times
  • so we can develop the time-variant signature of the failure mode 
  • and classify it into different categories 

Determining the size of the fault

Regardless of the fault detection method applied, the size or magnitude of the fault plays an important role in defining what is the desired level of fault tolerance that needs to be built into the design of the equipment. 

If the fault magnitude is low, the system just needs to be able to endure the fault for an extra time until the fault is cleared by itself. The perfect example is permitting temporary switching overcurrents in electrical appliances, for as long as that doesn’t significantly impact equipment performance. 

Now, if the fault magnitude is really high, a different methodology is required: engineers have to use active or passive redundancies to enhance fault tolerance on their devices.

Finding root causes 

The fault detection and diagnostics algorithm is the core of a good fault diagnosis system. It is based on machine learning principles, and can be used to identify anomalies in the data streams originating from the equipment, determining the root cause behind it

Identifying some failure modes is really straightforward, while others can be challenging and require extensive mathematical computations. 

Let’s use a high voltage and high power three-phase AC induction motor as an example. 

More often than not, the underlying failure modes are mechanical in nature and associated with the rotary part of the motor: shorted rotor windings, bearing failures, and rotor breakdown. Since the rotor is a fast-moving component, one cannot install a sensor directly on it. 

The advanced FDD algorithms can be used to produce healthy motor stator terminal current signatures and compare them with current signatures under faulty conditions.

For instance, upon breaking of rotor bars, the pulse produced in the stator current is twice the motor stator current frequency. There is an indirect correlation between the mechanical breaking of rotor bars and the fluctuations in the stator current. 

Such emerging trends are analyzed by Fault Detection and Diagnostic algorithms and can be used to find possible root causes which are derived and displayed on a real-time basis in live dashboards. 

The usage of such fault identification algorithms has significantly reduced the amount of time techs need to troubleshoot equipment and reach the root cause of the failures. Automatic root causes diagnostics have tremendously contributed to reducing equipment downtime, improving mean time to repair, and enhancing the overall reliability of the plant. 

4. Fault evaluation 

Once the failure modes and the associated root causes are identified, the next step is to evaluate the impact of that fault type on the overall performance of the system. 

We need to consider factors such as:

  • the impact of the fault on the environment and the rest of the system 
  • the impact of the fault on system safety
  • the financial loss due to downtime
  • the need to make capital replacement decisions (in case the severity of failure is enough to warrant the replacement of equipment as opposed to fixing it) 


Fault evaluation is a significant element of the overall process as it aims to understand the severity of the fault. This helps reliability engineers provide equipment validation and calculate the risk of failures, which will both have a big impact on maintenance requirements, recommendations, and optimization. 

For example, the result of the FDD for one piece of equipment could imply the rapidly increasing failure rates. However, the impact of that fault could be minimal on the overall system performance, thus making the overall risk to be moderate. In this case, the less stringent maintenance strategy such as run-to-failure or preventive maintenance could be sufficient to manage the risk. 

Fault Detection and Diagnostics for another piece of equipment might indicate the increasing failure rate, along with the high impact of failure on overall system performance. In this case, the most stringent predictive maintenance program should be adopted despite its high cost. This is because the increased cost of maintenance is warranted to prevent major fallout that will be way more costly. 

Optimizing maintenance with FDD

In short, fault detection and diagnostics play a decisive role in optimizing the maintenance regime for any piece of equipment, across its lifecycle

With the advent of fast computing technologies, big data processing, and advanced learning algorithms, traditional fault detection has evolved into automatic fault management systems that not only detect faults, but also identify the root cause and implement corrective actions to avoid future recurrence. 

Such automation of a series of manual processes has enabled reliability and maintenance engineers to apply predictions on equipment health, derive future equipment performance, and shape optimal maintenance intervals. 

The only thing they have left to do is fire up their computerized maintenance management software (CMMS), track the condition of their critical assets, and schedule appropriate maintenance work. 

Comments are closed.

Request a Demo

Give us a call or fill in the form below and we will contact you. We endeavor to answer all inquiries within 24 hours on business days.