Troubleshooting 101: General Principles For Fixing Any Device
Be it at home, at your office, or at a manufacturing plant, devices we use are bound to malfunction at some point. When rebooting the system doesn’t solve the problem, we (or a technician) have to test other hypotheses to identify and fix the issue. That is troubleshooting in a nutshell.
This article will be oriented towards the industrial audience, but the troubleshooting steps and general principles we discuss can be applied in most situations.
What is troubleshooting?
Troubleshooting is the process of investigating a problem with a device with the goal of putting it back into operation as soon as possible. It can also be done when a machine is simply not performing up to expectations. Effective troubleshooting is an integral part of asset management, commonly used during the diagnosis and repair process.
A machine that is properly operated and regularly maintained has a very low chance of experiencing a major breakdown. But that chance is never zero. In one form or another, troubleshooting is a common occurrence at any facility that depends on physical assets.
Troubleshooting is not limited to manufacturing plants and industry professionals. A cable company trying to find and fix the cause of transmission loss in a fiber line is troubleshooting. The same goes for a regular person trying to solve an internet connectivity issue at home. In a way, troubleshooting has become a catch-all term for identifying a cause and solving a problem.
Why and when to troubleshoot?
Broadly speaking, troubleshooting is done in three instances:
Machine failure: This is the most urgent instance where troubleshooting is required. The machine is out of commission and urgent repair has to be done to return it to operation.
Unexpected operation: Every machine has an expected range within which it has to operate. It also has to deliver the expected output. Any deviation from either has to be investigated. This is not as urgent as complete machine failure but could be indicative of an underlying problem. If it is not fixed in time, it can lead to a major breakdown.
Other anomalies: The machine is working within the ideal operating range and is delivering the expected output. However, an operator has spotted some anomaly. It could be a strange sound, a weird smell, visible smoke, excessive vibration, etc. Such anomalies should also be investigated within an appropriate time window.
The three instances are given in descending order of priority. Complete machine failure has the highest priority and should be tackled as soon as it happens. The other two situations have more leeway, but should not be overlooked.
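The priority order above can be expressed as a simple triage queue. The following is a minimal sketch, not a prescribed implementation: the `IssueType` and `Issue` names, the machine names, and the descriptions are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class IssueType(IntEnum):
    # Lower value = higher priority, mirroring the order of the three instances.
    MACHINE_FAILURE = 1
    UNEXPECTED_OPERATION = 2
    OTHER_ANOMALY = 3


@dataclass
class Issue:
    machine: str           # hypothetical asset name
    issue_type: IssueType
    description: str


def triage(issues):
    """Return the issues sorted so the most urgent category comes first."""
    return sorted(issues, key=lambda issue: issue.issue_type)


queue = triage([
    Issue("Press 2", IssueType.OTHER_ANOMALY, "burning smell near motor"),
    Issue("Conveyor 1", IssueType.MACHINE_FAILURE, "belt will not start"),
    Issue("Mixer 4", IssueType.UNEXPECTED_OPERATION, "output below target"),
])
for issue in queue:
    print(issue.issue_type.name, issue.machine)
```

In practice the ordering would probably also account for how critical each machine is, which this sketch leaves out.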
In the case of unexpected operations and anomalies, troubleshooters will place a lot of focus on reproducing the symptoms. Reproducible problems can be reliably isolated and resolved. It is one of the core principles of troubleshooting.
While equipment manuals can include useful guidelines, the intuitive sense of identifying possible problems can only be developed with years of hands-on experience. Knowing the inner workings of the asset and its common quirks is extremely helpful when trying to diagnose a problem.
For some machinery, equipment manufacturers provide technical training sessions to educate the operations staff about the machine. They might also provide troubleshooting techniques and checklists. This goes a long way, but it is no match for past troubleshooting experience.
In some cases, experienced machine operators can perform troubleshooting instead of maintenance techs. Some organizations have seen great success training their operators to perform visual inspections, troubleshooting, and other maintenance tasks. It is an approach known as autonomous maintenance.
Today’s manufacturers are facing a serious staffing challenge. The number of skilled maintenance workers is decreasing as experienced technicians go into retirement. In the future, factories will have to rely more on technology for machine monitoring and troubleshooting.
General troubleshooting steps
The efficacy of the troubleshooting process is dependent upon the experience of the person performing it and their ability to be methodical. Nonetheless, both seasoned and inexperienced professionals should follow the general troubleshooting steps outlined below.
The above image showcases an analytical framework in the form of a flowchart. It can be used as a guideline to troubleshoot any device. Let’s see what goes into each step.
Step 1: Problem definition
The first step of solving any problem is to define it well. It is especially true in the case of troubleshooting.
First, we have to define what kind of problem it is: a machine failure, an unexpected operation, or some other anomaly. The system or device has to present some kind of symptom for troubleshooting to take place. Oftentimes, signaling and alarm systems will flag the problem, for example a visible red light or a warning sound when a certain machine part overheats.
Step 2: Information gathering
All available information regarding the machine and its operation is collated. This includes the machine manual, operating data, the maintenance history, the problem report, etc. If communication with the original equipment manufacturer (OEM) is possible, the issue can be discussed with them first.
Step 3: Information processing
Gathered information has to be organized and analyzed to identify the root cause of the problem. Reason, logic, and technical know-how have to be employed to catch the underlying problem. The vast experience of the operator/technician will be helpful to arrive at a solution faster. After all, it is much easier to solve a problem you have seen before.
During the information analysis, a lot of focus will be placed on recent changes. Did we use a new replacement part or install any upgrades? Did we change the type of input material the asset uses? Has the device been operated or used in a different way than usual? Did we have an electrical surge recently? More often than not, recently introduced changes to the system or the environment it operates in will hold a clue to why the problem manifested.
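One way to put the focus on recent changes into practice is to filter a change log down to the entries made shortly before the symptom appeared. This is only a sketch under assumptions: the log format, the `recent_changes` helper, the 14-day window, and the sample entries are all made up for illustration.

```python
from datetime import datetime, timedelta


def recent_changes(change_log, symptom_time, window_days=14):
    """Return changes made within window_days before the symptom, newest first."""
    cutoff = symptom_time - timedelta(days=window_days)
    hits = [c for c in change_log if cutoff <= c["when"] <= symptom_time]
    return sorted(hits, key=lambda c: c["when"], reverse=True)


# Hypothetical change log for a single asset.
change_log = [
    {"when": datetime(2024, 1, 5), "what": "annual lubrication"},
    {"when": datetime(2024, 3, 2), "what": "replacement bearing installed"},
    {"when": datetime(2024, 3, 9), "what": "switched input material supplier"},
]
suspects = recent_changes(change_log, symptom_time=datetime(2024, 3, 10))
for change in suspects:
    print(change["when"].date(), change["what"])
```

The newest changes come first because they are usually the first ones worth ruling out. The window length is a judgment call and would differ from asset to asset.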
If we still have no clue what caused the problem after analyzing the data, we need to go back to step 2 and collect more information. It is always possible to overlook certain data or disregard something as irrelevant during the initial information gathering process.
After this exercise, the person performing troubleshooting should be able to form a hypothesis and propose a solution.
Step 4: Solution proposal and testing
The root cause identified in step 3 will indicate potential solutions. The right solution is found through the process of elimination. The most suitable solution should be tested first. Taking into account the criticality of the machine, operating conditions, and problem at hand, the suitability of the proposed solutions will be decided based on:
required resources and associated costs
difficulty of implementation
the long-term outlook for the machine
personal biases of the person performing the troubleshooting
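The first three criteria can be turned into a rough ranking, which also helps keep personal bias in check. The sketch below assumes each factor is rated from 1 (favorable) to 5 (unfavorable); the weights, the ratings, and the candidate fixes are invented for illustration and are not taken from any real asset.

```python
def rank_solutions(candidates, weights):
    """Sort candidates by weighted score; a lower score means test it sooner."""
    def score(candidate):
        return sum(weights[factor] * candidate[factor] for factor in weights)
    return sorted(candidates, key=score)


# Hypothetical weights and ratings (1 = favorable, 5 = unfavorable).
weights = {"cost": 0.4, "difficulty": 0.3, "long_term_risk": 0.3}
candidates = [
    {"name": "replace gearbox", "cost": 5, "difficulty": 4, "long_term_risk": 1},
    {"name": "realign coupling", "cost": 1, "difficulty": 2, "long_term_risk": 2},
    {"name": "tighten mounts", "cost": 1, "difficulty": 1, "long_term_risk": 4},
]
ranked = rank_solutions(candidates, weights)
for candidate in ranked:
    print(candidate["name"])
```

A score like this is only a starting point for discussion. An experienced technician may still have good reasons to reorder the list.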
For complex problems and machinery, the proposed solution can be tested on a small scale or a model.
If none of the proposed solutions pass the test, the operator will have to revisit step 3 to analyze the information again and generate new ideas. This cycle repeats until a proposed solution passes the test.
Step 5: Implementing the fix
Once the proposed solution passes the test, it needs to be implemented on the actual machine if the test was performed on a model. If troubleshooting was performed directly on the affected machine, the solution is implemented during the testing process.
The successful solution should be noted down and attached to the asset log so it can be used for future reference.
Tools and resources maintenance professionals use to streamline the troubleshooting process
Even though troubleshooting is not performed just on modern assets, advanced technologies are starting to gain traction in the maintenance and repair processes. We discuss commonly used troubleshooting tools and resources in this section, and advanced solutions in the next.
Checklists
Checklists are an effective way of approaching common problems methodically. There is a plethora of digital tools available for creating checklists.
Some maintenance platforms like Limble also allow you to create and store troubleshooting checklists which can be accessed via mobile devices and used while in the field. Maintenance engineers can work with experienced techs to identify problematic assets. They can use Limble to create step-by-step troubleshooting instructions that include warnings and annotated images for specific assets/issues. When finished, they can attach the checklist to the corresponding asset.
Resources from OEMs
As mentioned earlier, the original equipment manufacturers can provide manuals, troubleshooting guides, and best practices for maintaining their products. Today, most OEMs provide both physical and electronic copies. If a technician is facing a very specific and uncommon issue, getting on a call with the OEM or checking industry forums are alternative ways to find potential solutions.
CMMS to aid troubleshooting
A CMMS (computerized maintenance management system) is used to streamline, organize, and automate maintenance operations at any plant or facility. As a centralized repository of maintenance data, it holds a lot of information that is useful during the troubleshooting process, such as:
contact information for machine and parts vendors
historical maintenance records (maintenance logs and reports)
details of the work request sent to report the problem
past and recent machine-condition and performance data gathered through CBM sensors
Having quick and easy access to all of this information can significantly speed up the troubleshooting process. It is one of many reasons why more and more organizations are implementing cloud-based maintenance solutions.
The future of troubleshooting on the plant floor
With factories becoming more automated, plant floors need fewer and fewer machine operators. It is not crazy to expect that, in the future, factories will employ more technicians (for troubleshooting and equipment maintenance) than machine operators.
At the same time, technology is making troubleshooting easier, faster, and less dangerous. Here are a few emerging solutions that could find their way to many plant floors.
Machine learning and prescriptive analytics
Machine learning can be used to predict potential failures and is a major part of predictive maintenance. Even in troubleshooting, machine learning algorithms can be used to quickly analyze vast amounts of data needed to identify possible causes.
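Real machine learning pipelines are well beyond a short example, but the underlying idea of scanning sensor data for readings that do not fit the normal pattern can be shown with a plain statistical check. The sketch below is a simple z-score test, not a machine learning model. The `flag_anomalies` helper, the threshold of three standard deviations, and the vibration readings are assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from recent that lie more than
    threshold standard deviations away from the baseline mean."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(i, x) for i, x in enumerate(recent) if abs(x - mu) > threshold * sigma]


# Hypothetical vibration readings (mm/s) taken while the machine was healthy.
baseline = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1]
# Hypothetical readings from the period under investigation.
recent = [2.0, 2.1, 3.5, 1.9]

anomalies = flag_anomalies(baseline, recent)
print(anomalies)
```

A flagged reading does not identify the cause by itself. It only tells the troubleshooter where in the data to start looking.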
Some organizations are already taking things a step further and testing something called prescriptive analytics. In the context of troubleshooting, prescriptive analytics aims to help machines diagnose themselves and then present possible solutions based on that self-diagnosis.
Augmented reality
Augmented reality will provide an additional layer of information to the machine that is being inspected.
For example, when a component is inspected, all the information regarding the component will be displayed on the augmented reality screen. When you move on to the next component it will automatically change what is displayed to suit the current object observed. It might include tips, warnings, and next steps, ensuring the quality and safety of the troubleshooting process.
Virtual reality and simulations
Virtual reality can be used to place the operator or technician in a simulated environment. It is a very useful training tool. In the virtual environment, they can neither damage expensive equipment nor harm themselves. It can be a great way to practice troubleshooting processes before handling real machines.
Digital twins
Digital twins are operated by OEMs. They have a digital copy of all machines that are under operation. If a failure happens at your location, the OEM will already have the data and can compare it against the data from all the other machines of the same kind. Based on this, the OEM can identify if a similar incident happened to a machine at some other plant.
In other words, OEMs could immediately notify technicians/operators and let them know about common causes and solutions.
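The fleet comparison described above boils down to looking up past incidents on the same kind of machine and counting which causes were recorded most often. The following sketch assumes a flat list of incident records. The record fields, the `common_causes` helper, and the sample data are hypothetical and do not reflect any particular OEM's system.

```python
from collections import Counter


def common_causes(fleet_incidents, model, symptom, top=3):
    """Return the most frequently recorded causes for this model and symptom."""
    causes = [
        incident["cause"]
        for incident in fleet_incidents
        if incident["model"] == model and incident["symptom"] == symptom
    ]
    return Counter(causes).most_common(top)


# Hypothetical incident history gathered from machines at several plants.
fleet_incidents = [
    {"model": "X200", "symptom": "overheating", "cause": "clogged filter"},
    {"model": "X200", "symptom": "overheating", "cause": "failed fan"},
    {"model": "X200", "symptom": "overheating", "cause": "clogged filter"},
    {"model": "X200", "symptom": "vibration", "cause": "loose mount"},
    {"model": "Z10", "symptom": "overheating", "cause": "low coolant"},
]
likely = common_causes(fleet_incidents, "X200", "overheating")
print(likely)
```

The result is a ranked list of causes to check first, which is the kind of hint an OEM could pass on to the technician at the affected plant.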
In the best-case scenario, a malfunctioning device will result in a mild annoyance. In the worst-case scenario, it can cause a safety incident and have a debilitating effect on a business’s bottom line.
Being able to quickly deal with equipment issues is a reflection upon the maintenance department and how well it was able to organize work and train its employees. Since skill and experience are so important, businesses should make extra effort to reduce the turnover of their technicians and machine operators.
If you have any troubleshooting questions, jump to the comment section below. If you want to learn more about Limble CMMS, you can contact us directly.