Root Cause Analysis: What It Is & How to Perform One

  • 07 Mar 2023

The problems that affect a company’s success don’t always result from not understanding how to solve them. In many cases, their root causes aren’t easily identified. That’s why root cause analysis is vital to organizational leadership.

According to research described in the Harvard Business Review, 85 percent of executives believe their organizations are bad at diagnosing problems, and 87 percent think that flaw carries significant costs. As a result, more businesses seek organizational leaders who avoid costly mistakes.

If you’re a leader who wants to problem-solve effectively, here’s an overview of root cause analysis and why it’s important in organizational leadership.

What Is Root Cause Analysis?

According to the online course Organizational Leadership, taught by Harvard Business School professors Joshua Margolis and Anthony Mayo, root cause analysis is the process of articulating problems’ causes to suggest specific solutions.

“Leaders must perform as beacons,” Margolis says in the course. “Namely, scanning and analyzing the landscape around the organization and identifying current and emerging trends, pressures, threats, and opportunities.”

By working with others to understand a problem’s root cause, you can generate a solution. If you’re interested in performing a root cause analysis for your organization, here are eight steps you must take.

8 Essential Steps of an Organizational Root Cause Analysis

1. Identify Performance or Opportunity Gaps

The first step in a root cause analysis is identifying the most important performance or opportunity gaps facing your team, department, or organization. Performance gaps are the ways in which your organization falls short or fails to deliver on its capabilities; opportunity gaps reflect something new or innovative it can do to create value.

Finding those gaps requires leveraging the “leader as beacon” form of leadership.

“Leaders are called upon to illuminate what's going on outside and around the organization,” Margolis says in Organizational Leadership, “identifying both challenges and opportunities and how they inform the organization's future direction.”

Without those insights, you can’t reap the benefits an effective root cause analysis can produce because external forces—including industry trends, competitors, and the economy—can affect your company’s long-term success.

2. Create an Organizational Challenge Statement

The next step is writing an organizational challenge statement explaining what the gap is and why it’s important. The statement should be three to four sentences and encapsulate the challenge’s essence.

It’s crucial to explain where your organization falls short, what problems that poses, and why it matters. Describe the gap and why you must urgently address it.

A critical responsibility is deciding which gap requires the most attention, then focusing your analysis on it. Concentrating on too many problems at once can dilute positive results.

To prioritize issues, consider which are the most time-sensitive and mission-critical, followed by which can make stakeholders happy.

3. Analyze Findings with Colleagues

It's essential to work with colleagues to gain different perspectives on a problem and its root causes. This involves understanding the problem, gathering information, and developing a comprehensive analysis.

While this can be challenging when you’re a new organizational leader, using the double helix of leadership (the coevolutionary process of executing organizational leadership's responsibilities while developing the capabilities to perform them) can help foster collaboration.

Research shows diverse ideas improve high-level decision-making, which is why you should connect with colleagues with different opinions and expertise to enhance your root cause analysis’s outcome.

4. Formulate Value-Creating Activities

Next, determine what your company must do to address your organizational challenge statement. Establish three to five value-creating activities for your team, department, or organization to close the performance or opportunity gap you’ve identified.

This requires communicating organizational direction: a clear and compelling path forward that ensures stakeholders know and work toward the same goal.

“Setting direction is typically a reciprocal process,” Margolis says in Organizational Leadership. “You don't sit down and decide your direction, nor do you input your analysis of the external context into a formula and solve for a direction. Rather, setting direction is a back-and-forth process; you move between the value you'd like to create for customers, employees, investors, and your grasp of the context.”

5. Identify Necessary Behavior Changes

Once you’ve outlined activities that can provide value to your company, identify the behavior changes needed to address your organizational challenge statement.

“Your detective work throughout your root cause analysis exposes uncomfortable realities about employee competencies, organizational inefficiencies, departmental infighting, and unclear direction from leadership at multiple levels of the company,” Mayo says in Organizational Leadership.

Factors that can affect your company’s long-term success include:

  • Ineffective communication skills
  • Resistance to change
  • Problematic workplace stereotypes

Not all root cause analyses reveal behaviors that must be eliminated. Sometimes you can identify behaviors to enhance or foster internally, such as:

  • Collaboration
  • Innovative thinking
  • Creative problem-solving

6. Implement Behavior Changes

Although behaviors might be easy to pinpoint, putting them into practice can be challenging.

To ensure you implement the right changes, gauge whether they’ll have a positive or negative impact. According to Organizational Leadership, you should consider the following factors:

  • Motivation: Do the people at your organization have a personal desire for and commitment to change?
  • Competence: Do they have the skills and know-how to implement change effectively?
  • Coordination: Are they willing to work collaboratively to enact change?

Based on your answers, decide what behavior changes are plausible for your root cause analysis.

7. Map Root Causes

The next step in your analysis is mapping the root causes you’ve identified to the components of organizational alignment. Doing so helps you determine which components to adjust or change to implement employee behavior changes successfully.

Three root cause categories unrelated to behavior changes are:

  • Systems and structures: The formal organization component, including talent management, product development, and budget and accountability systems
  • People: Individuals’ profiles and the workforce’s overall composition, including employees’ skills, experience, values, and attitudes
  • Culture: The informal, intangible part of your organization, including the norms, values, attitudes, beliefs, preferences, common practices, and habits of its employees

8. Create an Action Plan

Using your findings from the previous steps, create an action plan for addressing your organizational problem’s root cause and consider your role in it.

To make the action plan achievable, ensure you:

  • Identify the problem’s root cause
  • Create measurable results
  • Ensure clear communication among your team

“One useful way to assess your potential impact on the challenge is to understand your locus of control,” Mayo says in Organizational Leadership, “or the extent to which you can personally drive the needed change or improvement.”

The best way to illustrate your control is by using three concentric circles: the innermost circle being full control of resources, the middle circle representing your ability to influence but not control, and the outermost circle alluding to shifts outside both your influence and control.

Consider these circles when implementing your action plan to ensure your goals don’t overreach.

The Importance of Root Cause Analysis in Organizational Leadership

Root cause analysis is a critical organizational leadership skill for effectively addressing problems and driving change. It helps you understand shifting conditions around your company and confirm that your efforts are relevant and sustainable.

As a leader, you must not only effect change but understand why it’s needed. Taking an online course, such as Organizational Leadership, can enable you to gain that knowledge.

Using root cause analysis, you can identify the issues behind your organization’s problems, develop a plan to address them, and make impactful changes.

Case Study: New York City Helicopter Crash

Everyone seems to agree that root cause analysis is about solving problems, but there’s no agreement as to how a root cause analysis is done.

NTSB Preliminary Findings

On March 11, 2018, a sightseeing helicopter made an emergency landing in the East River in New York City. The pilot survived, but the five passengers drowned when they were unable to escape the submerged helicopter. This tragedy resulted from a combination of several different factors. No single factor would have produced the incident on its own, but together, they resulted in tragedy. This is an important aspect of root cause analysis, and it is what James Reason describes with his Swiss cheese model. The holes in the cheese (the causes) all align where a straight line passes through each slice. To prevent the incident, just one slice needs to be misaligned.
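The Swiss cheese idea can be sketched in a few lines of Python. This is only an illustration, not part of the NTSB findings or the Cause Map diagram: the `Barrier` class, the `incident_occurs` function, and the four barrier labels are hypothetical simplifications of the areas discussed in this case study.

```python
from dataclasses import dataclass


@dataclass
class Barrier:
    """One slice of cheese: a safeguard that either holds or fails."""
    name: str
    failed: bool  # True means this slice has a hole lined up with the others


def incident_occurs(barriers):
    """In this toy model, the incident happens only if every barrier fails."""
    return all(barrier.failed for barrier in barriers)


# Hypothetical, simplified labels for illustration only.
barriers = [
    Barrier("engine keeps running", failed=True),
    Barrier("landing site", failed=True),
    Barrier("flotation system", failed=True),
    Barrier("quick harness release", failed=True),
]

print(incident_occurs(barriers))  # True: all the holes line up

# Misalign just one slice and the straight line no longer passes through.
barriers[2].failed = False
print(incident_occurs(barriers))  # False
```

The point of the sketch is the `all(...)` check: every cause is required, so changing any single one changes the outcome.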

I’m going to highlight four areas of this tragedy: harnesses, flotation system, water landing and engine failure. Each one begins with a straight-line cause-and-effect analysis. These four different analyses will combine into one complete explanation containing all the information available to date. Once the NTSB investigation is finished, additional detail can be added to the Cause Map diagram.

The passengers were unable to quickly escape because they were wearing fall protection harnesses. This was a “doors off” flight where passengers could sit on the edge of the helicopter with their feet hanging over New York. The fall protection harnesses were intended to prevent passengers from falling out of the helicopter. However, in this incident, they are causally related to the passengers being trapped underwater. It’s a tragic example of unintended consequences. A solution to prevent one problem inadvertently created a different problem.

The pilot was not wearing a fall protection harness, so he was able to release the buckle on the standard seat belt, which allowed him to escape the helicopter as it turned over in the water.

5-Why Cause Map™ Diagram – Harnesses

Flotation System

Unlike a commercial airplane, most helicopters do not have pressurized cabins. If a helicopter lands in water, it then sinks. Because of this, helicopters that fly over water are required to have a flotation system – either fixed utility floats or emergency floats attached to the landing skids that inflate from pressurized cylinders. This helicopter had three emergency floats on each side. In the video of this incident, the floats on the right side of the helicopter don’t appear to fill properly. The helicopter landed squarely in the water but quickly rolled to its right side. If the floats had inflated evenly, keeping the helicopter upright on the water, the passengers may have had sufficient time to remove their harnesses.

Water Landing

The helicopter landed in the East River because the pilot decided to move away from populated Manhattan after he lost power over Central Park. The concern may have been to avoid the risk of casualties on the ground. The pilot may have thought a water impact would be more cushioned and the floats would keep the helicopter upright, at least temporarily.

What’s amazing about a helicopter's design is that, in an emergency, it can land without power. During powered flight, the blades push air downward, giving the helicopter lift. When an engine failure occurs, the helicopter begins falling. Air now flows up through the blades, which keeps the rotor turning and slows the falling helicopter. This is called auto-rotation. Single-engine helicopters are required by the Federal Aviation Administration (FAA) to be able to land in auto-rotation.

(People tend to be surprised to learn that a helicopter is designed to land safely with an engine failure.  Watch an autorotation landing of the same model of the helicopter, an AS350 (time is 2:50).  This is another training video of a student performing an autorotation (time is 1:08).  Read here about the basics of autorotation on the Wikipedia page.)

This helicopter auto-rotated toward the East River. An engine failure involves split-second decisions, but there may have been an option for an auto-rotation landing in Central Park. There are several ball fields and large open areas in the park. The Central Park option may be only clear in hindsight, though. There may have been other factors the pilot considered in that instant.

You can download a two-page PDF of the incident's root cause analysis by clicking on the thumbnail below.

Engine Failure

As soon as the engine lost power, the pilot radioed a distress call indicating an engine failure. He began auto-rotation toward the river. One of the steps when performing an emergency landing is shutting off the fuel just before touchdown. As the pilot approached the water, he found the emergency fuel shutoff lever already in the OFF position. The lanyard of the front passenger's harness was underneath the lever. He positioned the fuel shutoff lever back to the ON position. The engine began responding, but not quickly enough, so the pilot turned the fuel back off before contacting the water.

In the initial NTSB interview, the pilot reported the front passenger slid sideways and leaned back to take a picture of his feet outside the helicopter. The lanyard on the fall protection harness may have hooked on the fuel shutoff lever. At this point in the flight, the pilot heard a low rotor RPM in his headset and observed the warning lights for fuel and engine pressure. The fall protection harness was intended to mitigate the risk of someone falling out of the helicopter, but instead, it appears to be causally related to both the loss of the engine (fuel supply) and the five passengers being trapped underwater.

This is not the first fatal accident for this model of helicopter where a passenger inadvertently interfered with the fuel controls to the engine. See the timeline in the video summary.

15-Why Cause Map™ Diagram

The two-page PDF shows how the different linear Cause Map diagrams can be combined into one complete analysis. Each cause of this incident had to happen. They all contributed to the loss of those five lives. One cause on its own would not have produced this disaster. But changing just one cause would change the outcome. That simple point is missed within most root cause analysis investigations. Most companies become fixated on “the root cause” and miss other viable solutions. It’s important for organizations to understand the options for mitigating risk within their operations. See the Straw that Broke the Camel’s Back for more on this concept.

Notice that an incident with 15 causes doesn’t need 15 solutions. A more detailed analysis is essential to thoroughly understand how and why a complex issue occurred, so the best solutions can be found. The purpose of an investigation is to provide a complete explanation to identify specifically what needs to be done to minimize risk going forward.

Additional resources:

NTSB 1st Report – The National Transportation Safety Board’s preliminary report

1st Reporting

Conducting A Root Cause Analysis: Incident To Final Report

Posted 3.02.21 by: Bond Seidel

Incidents can happen in the blink of an eye. And reporting these incidents helps to drive progressive change to a safer workplace. But what happens when the cause of an incident is unclear? A root cause analysis suddenly isn’t just appropriate. It’s crucial. And knowing how to conduct a root cause analysis is as important as knowing how to report the incident in the first place.

A root cause analysis is conducted in six steps, beginning with the incident report and ending with a comprehensive final report. Understanding each stage of the root cause analysis is vital for creating and implementing successful preventive action plans.

Root cause analysis, also known as RCA, is the investigation process following an incident. The incident may or may not have caused harm to a person or property. In some cases, no incident occurred at all, and someone instead reported a dangerous situation. Either way, a root cause analysis should be completed following any form of incident.

This article will discuss the steps necessary to conduct a root cause analysis to a successful end. Tracing the incident’s steps and following the reporting and root cause analysis process to the end, we will find many useful tips and tricks to help facilitate your company’s reporting processes. Let’s jump right into why we want to do a root cause analysis in the first place to get us going.

Why Conduct A Root Cause Analysis If There Is An Incident Report?

It may seem duplicative, even counter-productive, to produce a root cause analysis report following a previously reported incident. And in a sense, it may be slightly repetitive, but this is no reason not to complete a root cause analysis.

The sole purpose of a root cause analysis is to determine all the factors that contributed to an incident’s occurrence. It is a proactive management tool used to serve the process of corrective action.

An incident report documents an incident. However, those who complete incident reports are often more involved in reactive management of an incident than proactive management of the event. It is merely the nature of the beast.

To perform a root cause analysis, step out of reactive management and move toward proactive incident management.

So, why conduct a separate root cause analysis? Couldn’t we fold it into the incident report instead? You could, but why complicate a situation where reactive management is vital in controlling further damage or injury? Incidents often require a certain amount of reactive action merely to contain a hazard, and that reactive response is often best left to its own devices.

Maintaining a separate procedure for root cause analysis allows for a more focused approach to proactive incident management. It also keeps the pressures of the reactive response, which can obscure the real root cause in the heat of the moment, from distorting the analysis of why an incident occurred in the first place.

6 Steps To Completing A Root Cause Analysis

Analyzing industry-specific responses to incidents, we find that a complete root cause analysis procedure is completed best with a predefined set of steps.  Wikipedia decomposes the RCA into four steps:

  • Identify and describe the event.
  • Establish a timeline from ordinary events to the incident event.
  • Distinguish between the root and causal factors.
  • Establish a causal graph connecting the root cause and the event/problem.

We believe that a more generalized, less industry-specific approach, with more depth and detail, is appropriate. Here’s our take on completing a root cause analysis in most industries:

  • The Incident Report Analysis
  • Determining Leading Events
  • Analyzing Leading Conditions
  • Documenting Further Witness Information
  • Analyzing Completed Data Collection
  • Determining Corrective Action

The Incident And Report Analysis

Beginning with identifying an incident, we analyze the incident to determine its characteristics. The incident is often presented via an incident report, but there are many possible sources of information. For example, an RCA may be initiated following a customer complaint, a risk management referral, or even a complaint presented by HR. No matter what prompted the RCA, you have to start by identifying the problem or incident.

As many RCAs are generated following incident reports, let’s go a layer deeper into these reports. An incident report should include the following:

  • Administrative details
  • Incident information
  • Witness accounts and observations
  • Actions and recommendations

An incident report ought to include as much information as possible. By the time you perform a root cause analysis, the incident will already have been recorded in the report. Analyze the information provided and look for holes and omissions, which are sometimes present.

For more information on what to include in incident reports, please read our article: 12 Things To Include In An Incident Report (With 5 Tips).

Determining Leading Events

As you analyze the incident report, a story should start to form in your mind about how the incident occurred. Aside from acts of nature, most incidents have precursor events. It is the classic cause and effect scenario, and you need to determine those causes.

When an incident occurs, we can identify any events whose removal would have prevented the incident itself.

Analyzing Leading Conditions

Although events that lead up to incidents are apparent contributing factors, sometimes conditions are the prerequisites for incidents. At this stage in the root cause analysis, you should examine the conditions that surrounded the incident. Careful analysis of available data might reveal clues that further establish the root cause or causes. The information may also help us sort out the root causes from the other causal factors.

Documenting Further Witness Information

In certain incident types, witness information may require you to conduct a follow-up investigation. Depending on the nature of the incident, information might have been omitted from the incident report simply because of assumptions made by those involved at the time. Hindsight is 20/20, as they say.

Further witness information may include more than the statements of those who were present to perceive the incident. The information may come from other forms of witnesses, such as the electronic sort. Incidents are often recorded by video surveillance equipment, and you can use the footage to review them. If any other sources of information are available, you should collect and review them as well.

Analyzing Completed Data Collection

The fifth stage of the root cause analysis is analyzing the completed sets of data. By now, you should have determined your leading events, the conditions surrounding and leading up to the incident, and any witness information.

This stage of the root cause analysis requires looking at all of the data you have collected and reviewed. Determine the essential factors and events that led to the incident. Separate the essential factors from those that are coincidental or only partially responsible.

A simple way to accomplish this stage is to ask yourself: if that factor were removed, would the incident still have occurred? Would it have been better or worse with that element removed from the equation? If removing the factor means the incident would not have happened, it is an essential contributing factor.
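The removal test above can be written down as a small Python sketch. Everything here is hypothetical: the `essential_factors` helper, the `incident_occurs` model, and the forklift scenario are assumptions made for illustration. In a real RCA, the judgment about whether the incident would still occur comes from the investigator, not from a formula.

```python
def essential_factors(factors, incident_occurs):
    """Return the factors whose removal alone would have prevented the incident.

    `incident_occurs` is a callable that takes a set of factors and returns
    True if, in the analyst's judgment, the incident would still happen.
    """
    return {
        factor
        for factor in factors
        if not incident_occurs(factors - {factor})
    }


# Hypothetical scenario: a forklift skids into shelving.
observed = {"wet floor", "no warning sign", "worn tires", "night shift"}


def incident_occurs(factors):
    # Assumed model: the skid needs all three of these conditions together.
    return {"wet floor", "no warning sign", "worn tires"} <= factors


print(sorted(essential_factors(observed, incident_occurs)))
# ['no warning sign', 'wet floor', 'worn tires']
```

In this toy model, "night shift" is coincidental: removing it leaves the incident in place, so it is not reported as essential.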

Once all contributing factors are organized, determining the root cause or causes of the incident should come naturally. And this drives us to the inevitable conclusion: how could the incident have been avoided?

Determining Corrective Action – The Final Report

Take a look at the root cause or causes you’ve determined during the course of your analysis. How could these factors have been manipulated to avoid an incident?

Most incidents are preventable. Whether you’re in security, manufacturing, medical care, or any other field, most of the time, we can avoid or prevent hazards from becoming damaging or threatening situations.

Analyze how the incident could have been prevented and document any possible and plausible solutions. For best results, document all ideas first, then eliminate those that fail on safety or feasibility. By using a methodology to brainstorm possible corrective actions, we can sometimes create a solution that exceeds what standard actions will achieve.

Your final root cause analysis report needs to be concise, comprehensive, and provide solutions. Preventive actions and strategies are always more effective than reactive actions. And you might be able to save someone from injury or worse.

Measures To Follow Through After The Root Cause Analysis

At this point, you’ve completed your RCA if you have followed the steps. But have you followed through on the recommendations? Has anyone completed corrective and preventive actions? The RCA becomes entirely pointless if nothing is done about the incident after all.

During the course of the RCA process, you may be asking yourself what the best method for determining the root causes is. Sure, it’s easy to say you need to figure something out, but how should you go about it? Is brainstorming the best option? If not, what is? Let’s find out.

Root Cause Analysis Methodologies

If root cause analysis were a topic of study, it would be the study of cause and effect, with a major in investigative reporting. But many companies and organizations use a visual charting process as an effective means of communicating the root cause analysis. It is one of several methodologies used in the root cause analysis process. Let’s take a quick look at a few of the most efficient root cause analysis methods.

  • Why 5 Analysis

The concept behind the ‘Why 5 Analysis’ (commonly known as the 5 Whys) is to ask the question “why” five times. For example, one might ask why a car crashed. The answer might be that a left tire blew out. Then ask why that happened, answer, and repeat. The idea is to ask the question multiple times to keep diving deeper towards the problem’s root.

Although this method is brilliant in its simplicity, that simplicity is also its curse. Many have argued that this method simplifies situations that you should not simplify. The method may also miss entire branches of thought through misdirection.

During the “Why 5 Method”, if the second “why” results in an answer that starts to lead away from the actual root cause, every later answer follows it, and this domino effect could skew the results.

Due to the potential for misleading results, the “Why 5” method is best used in parallel, multiple times for a single incident. You may also find that branching in the question chain is not only possible but often more probable than a single cause being at fault.
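To make the branching point concrete, here is a minimal Python sketch that follows every answer to each “why” rather than only one. The `causes` dictionary and the `why_chains` function are hypothetical, and the answers in the car crash example are invented for illustration.

```python
# Hypothetical cause tree: each effect maps to the answers to "why did this happen?"
causes = {
    "car crashed": ["left tire blew out"],
    "left tire blew out": ["tire was badly worn", "tire hit road debris"],
    "tire was badly worn": ["maintenance check was skipped"],
    "maintenance check was skipped": ["no maintenance schedule existed"],
}


def why_chains(effect, causes, max_depth=5):
    """Return every chain of answers starting at `effect`, asking at most `max_depth` whys."""
    answers = causes.get(effect, [])
    if max_depth == 0 or not answers:
        return [[effect]]
    chains = []
    for answer in answers:
        # Follow each answer separately so no branch is silently dropped.
        for chain in why_chains(answer, causes, max_depth - 1):
            chains.append([effect] + chain)
    return chains


for chain in why_chains("car crashed", causes):
    print(" <- ".join(chain))
```

With this sample data the sketch prints two chains, one ending at the missing maintenance schedule and one ending at the road debris, which is the parallel use of the method described above.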

  • Pareto Analysis

The idea behind the Pareto analysis is the Pareto principle, also called the 80/20 rule: roughly eighty percent of the effects come from twenty percent of the causes. Another way of looking at the Pareto analysis method is to compare it to a magnifying glass.

A magnifying glass takes a small area of view and enhances it by presenting it on a larger scale. The concept of the 80/20 rule is similar.

The Pareto analysis breaks observations down into percentages by cause and then represents them graphically. It is likely best suited to root cause analyses that involve large amounts of collected data. Because this form of analysis relies on a statistical methodology to reach its conclusions, it may only be relevant for specific RCA applications.
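As a rough sketch of the arithmetic behind a Pareto chart, the Python below counts observations by cause and computes the cumulative percentage. The function names, the 80 percent threshold parameter, and the incident log are all assumptions made for illustration.

```python
from collections import Counter


def pareto_table(observations):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


def vital_few(observations, threshold=80.0):
    """Return the top causes whose cumulative share first reaches `threshold` percent."""
    selected = []
    for cause, _count, cumulative in pareto_table(observations):
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Hypothetical incident log: 20 reports, each tagged with one cause.
log = ["slip"] * 11 + ["machine guard"] * 5 + ["lifting"] * 2 + ["electrical"] + ["other"]

for row in pareto_table(log):
    print(row)
print(vital_few(log))  # ['slip', 'machine guard']
```

In this made-up log, two of the five causes account for 80 percent of the reports, which is where corrective effort would be focused first.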

  • Change Analysis

Change analysis is well suited to situations involving evolving events or conditions. For example, analyzing the change in roadway conditions over time may help determine a root cause of a single-vehicle car accident. Or perhaps a facility records notes on equipment conditions over time, and the changes in these conditions are analyzed. The idea is to analyze how conditions or events evolved in order to determine the root cause of an incident.
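At its simplest, change analysis is a comparison between a known-good baseline and the conditions recorded at the time of the incident. The sketch below assumes conditions are stored as simple key-value records; the `changed_conditions` function and the equipment log are hypothetical.

```python
def changed_conditions(baseline, at_incident):
    """Return {condition: (before, after)} for every condition that differs."""
    keys = baseline.keys() | at_incident.keys()
    return {
        key: (baseline.get(key), at_incident.get(key))
        for key in sorted(keys)
        if baseline.get(key) != at_incident.get(key)
    }


# Hypothetical equipment notes from a routine check and from the day of the incident.
baseline = {"belt tension": "normal", "guard installed": True, "oil level": "full"}
at_incident = {"belt tension": "loose", "guard installed": True, "oil level": "low"}

print(changed_conditions(baseline, at_incident))
# {'belt tension': ('normal', 'loose'), 'oil level': ('full', 'low')}
```

The conditions that changed are candidates for further investigation, not proven causes. The unchanged guard drops out of consideration in this example.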

  • Brainstorming

Our most basic and one of our most potent methods for root cause analysis is brainstorming. Because of its power, this method is the one method described in the six steps of conducting an RCA, as mentioned earlier.

The brainstorming method allows freedom of thought when trying to determine the possible root causes of an incident. Rough brainstorming followed by a period of elimination is one of humanity’s best abilities. It uses the best of our creativity and real-world experience. The downside is that brainstorming can sometimes end up being mono-directional, depending on the mindset of the person or persons involved in the process.

Brainstorming Tip: When using the brainstorming method to determine the root cause of an incident, use a minimum of three people to help with brainstorming. This method works best when there are multiple perspectives to help come up with ideas. It also helps prevent mono-directionality.

Conclusions On Conducting Incident Report Based Root Cause Analysis

From the information you’ve read thus far, you must realize that the root cause analysis, as simple or complex a process as you make it out to be, has three primary goals.

  • To discover the primary root cause or causes of a problem or incident.
  • To fully comprehend the nature of the incident and how it can be fixed or prevented.
  • To apply resolutions as a proactive management tool to prevent the repeat of the incident.

If these three goals are met, then the root cause analysis may be considered a completed process.

Using an incident report as the basis for a root cause analysis is inherently wise from a safety process standpoint, although, depending on the industry, the practice may be set aside. Take the medical industry, for example. Many hospitals are inundated daily with hundreds, even thousands, of incident reports.

The truth is that as industry leaders, we each need a process that sorts incidents by level of priority and thus brings the flood of incidents under control. If the most severe incidents are triaged into a root cause analysis, there may be hope of achieving a successful reporting system after all.

In most industries, the hope is that nowhere near as many incident reports are filed as in the healthcare industry. Most businesses shy away from adding further paperwork to their plates, and for good reason. But there is a solution to the paperwork dilemma regarding incident reporting and root cause analysis.

Using a digital solution like that offered by 1ST Incident Reporting is the solution to the seemingly never-ending paperwork. With digitally based incident reports, not only can you set up instant notifications, you can access reports previously completed with lightning speed. What better way to do a root cause analysis than have digital access to the incident report?

Root cause analysis (RCA) is the quality management process by which an organization searches for the root of a problem, issue or incident after it occurs.

Issues and mishaps are inevitable in any organization, even in the best of circumstances. While it could be tempting to simply address symptoms of the problem as they materialize, addressing symptoms is an inherently reactive process that all but guarantees a recurring—and often worsening—series of problems.

Ethical, proactive, well-run companies and organizations with a reactive approach will both encounter problems, but the former will experience fewer and recover faster because they prioritize root cause analyses.     

Root cause analysis helps organizations decipher the root cause of the problem, identify the appropriate corrective actions and develop a plan to prevent future occurrences. It aims to implement solutions to the underlying problem for more efficient operations overall.

Organizations perform root cause analyses when a problem arises or an incident occurs, but there are any number of issues that need an RCA. Triggers for a root cause analysis fall into three broad categories.

When real-world materials or equipment fails in some way (for example, a desktop computer stops working or a component from a third-party vendor delivers substandard performance).

When people make mistakes or fail to complete required tasks (for example, an employee fails to perform regular maintenance on a piece of equipment, causing it to break down).

A breakdown in a system, process or policy people use to make decisions (for example, a company fails to train team members on cybersecurity protocols, leaving the company vulnerable to cyberattacks).

Organizations can conduct root cause analyses for a range of reasons, from commonplace email service disruptions to catastrophic equipment failures. Regardless of the nature or scope of the issue, performing root cause analysis should include the same fundamental steps.

If you have decided to conduct a root cause analysis, your department or organization is likely experiencing some acute issue, or at least looking to make substantive improvements to a particular process. Therefore, the first step of the root cause analysis process should be identifying and defining the problem that you want to address. Without a clearly defined problem, it is impossible to correctly identify the root causes.

When the department has a clear idea of the problem, it’s time to draft a problem statement spelling out the issue for everyone who will help with the RCA.

Once the issue is identified and clearly articulated to all involved parties, leadership should create a project charter, which will assemble a team to complete the analysis. The team should include a facilitator to lead the team through the analysis and any team members with either personal or professional knowledge of the systems, processes and incidents that you will investigate.

Data collection is the foundation of the problem-solving process. It is vital, at this stage, to find every piece of information that can help you identify contributing factors and ultimately the root causes of the issue. This can include collecting photographs and incident reports, conducting interviews with affected parties and reviewing existing policies and procedures. Some questions that you may want to ask during data collection:

  • When did the problem start and how long has it been going on?
  • What symptoms has the team observed?
  • What documentation does the organization or department have to prove that an issue exists?
  • How will the issue affect employees and other stakeholders?
  • Who is harmed or otherwise affected by the existence of this problem?

This is the most important step in the RCA process. At this point, the team has collected all necessary information and starts to brainstorm for causal factors. Effective root cause analyses require openness to all potential underlying causes of an issue, so everyone on the RCA team should enter the brainstorming stage with an open mind. Avoid attempts to determine root causes until every possibility is identified and vetted; starting the incident investigation process with preconceived notions may bias the results and make it more difficult to determine the real root cause.

Once the RCA team has an exhaustive list of possible causes and contributing factors, it is time to determine the root causes of the issue. Analyze every possible cause and examine the actual impact of each one to figure out which possibilities are the most problematic, which ones have similarities and which ones can be altogether eliminated. Be prepared for the possibility that there are multiple root causes to the issue.

After the team narrows the list of possibilities, rank the remaining potential root causes by their impact and the likelihood they are the root cause of the problem. Leadership will examine and analyze each possibility and collaborate with the RCA team to determine the actual root causes.

Once the team settles on root causes and has laid out all the details of the issue, they must start brainstorming solutions. The solution should directly address the root causes, with consideration for the logistics of executing the solution and any potential obstacles the team may encounter along the way. These elements will comprise the action plan that will help the team address the current problem and prevent recurrences.

While all RCAs will include the same basic steps, there are myriad root cause analysis methods that can help an organization collect data efficiently and effectively. Typically, a company will select a method and use root cause analysis tools, such as analysis templates and software, to complete the process.

The 5 Whys approach is rooted in the idea that asking five “Why?” questions can get you to the root cause of anything. 5 Whys implores problem solvers to avoid assumptions and continue to ask “why” until they identify the root cause of a problem. In the case of a formalized organizational root cause analysis, a team may only need to ask three whys to find the root cause, but they may also need to ask 50 or 60. The purpose of 5 Whys is to push the team to ask as many questions as is necessary to find the correct answers.

A failure mode and effects analysis is one of the most rigorous approaches to root cause analysis. Similar to a risk analysis, FMEA identifies every possibility for system/process failure and examines the potential impact of each hypothetical failure. The organization then addresses every root cause that is likely to result in failure.
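The text above does not say how an FMEA decides which failures are "likely" enough to address. One common convention in FMEA practice is to score each failure mode for severity, occurrence and detection (often on a 1 to 10 scale) and multiply the three into a risk priority number (RPN). The sketch below assumes that convention; the failure modes and their scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost always caught) to 10 (almost never caught)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes for a pumping station.
modes = [
    FailureMode("Pump seal leak", severity=7, occurrence=4, detection=3),
    FailureMode("Sensor drift", severity=5, occurrence=6, detection=8),
    FailureMode("Power supply failure", severity=9, occurrence=2, detection=2),
]

# Highest RPN first: these are the failure modes to address first.
ranked = sorted(modes, key=lambda mode: mode.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name:22} RPN={mode.rpn}")
```

Note that the most severe failure mode in this made-up list ranks last because it is rare and easy to detect, which is why some teams review high-severity items separately from the RPN ranking.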

Pareto charts combine the features of bar charts and line charts to understand the frequency of the organization’s most common root causes. The chart displays root causes in descending order of frequency, starting with the most common and probable. The team then addresses the root cause whose solution provides the most significant benefit to the organization.

An impact analysis allows an organization to assess both the positive and negative potential impacts of each possible root cause.

Change analyses are helpful in situations where a system or process’s performance changed significantly. When conducting this type of RCA, the department looks at how the circumstances surrounding the issue or incident have changed over time. Examining changes in personnel, information, infrastructure, or data, among other factors, can help the organization understand which factors caused the change in performance.
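At its simplest, a change analysis compares a snapshot of conditions from when the process performed well with a snapshot from when it did not. The following is a minimal sketch under that assumption; the `changed_factors` helper, the factor names and their values are hypothetical.

```python
def changed_factors(before: dict, after: dict) -> dict:
    """Return {factor: (old value, new value)} for every factor that differs."""
    factors = sorted(before.keys() | after.keys())
    return {
        factor: (before.get(factor), after.get(factor))
        for factor in factors
        if before.get(factor) != after.get(factor)
    }


# Hypothetical snapshots of a production line before and after defects rose.
before = {
    "operator": "day shift crew",
    "firmware": "2.1",
    "supplier": "Vendor A",
    "line_speed": 40,
}
after = {
    "operator": "day shift crew",
    "firmware": "2.3",
    "supplier": "Vendor B",
    "line_speed": 40,
}

changes = changed_factors(before, after)
for factor, (old, new) in changes.items():
    print(f"{factor}: {old} -> {new}")
```

The output only narrows the search: each changed factor is a candidate that the team still has to investigate, not a confirmed cause.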

An event analysis is commonly used to identify the cause of a major, single-event problem, like an oil spill or building collapse. Event analyses rely on quick (but thorough) evidence-gathering processes to recreate the sequence of events that led to the incident. Once the timeline is established, the organization can more easily identify the causal and contributing factors.
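Recreating the sequence of events usually starts with putting evidence gathered out of order onto a single timeline. The sketch below shows one simple way to do that; the log entries, the timestamp format and the helper names are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical evidence, recorded in the order it was collected.
raw_events = [
    ("2024-03-01 14:20", "Alarm acknowledged by operator"),
    ("2024-03-01 14:05", "Pressure rises above limit"),
    ("2024-03-01 14:32", "Relief valve fails to open"),
    ("2024-03-01 14:35", "Pipe ruptures"),
]


def build_timeline(raw):
    """Parse the timestamps and return the events in chronological order."""
    parsed = [
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M"), description)
        for stamp, description in raw
    ]
    return sorted(parsed)


def gaps_in_minutes(timeline):
    """Return the minutes elapsed between each pair of consecutive events."""
    return [
        int((later[0] - earlier[0]).total_seconds() // 60)
        for earlier, later in zip(timeline, timeline[1:])
    ]


timeline = build_timeline(raw_events)
gaps = gaps_in_minutes(timeline)
for moment, description in timeline:
    print(moment.strftime("%H:%M"), description)
print(gaps)
```

Long gaps between events, such as the 15 minutes between the pressure rise and the acknowledgment in this invented log, are often where investigators look for contributing factors.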

Also known as a causal factor analysis, a causal factor tree analysis allows an organization to record and visually display—using a causal factor tree—every decision, event or action that led to a particular problem.

An Ishikawa diagram (or Fishbone diagram) is a cause-and-effect style diagram that visualizes the circumstances surrounding a problem. The diagram resembles a fish skeleton, with a long list of causes grouped into related subcategories.

DMAIC is an acronym for the Define, Measure, Analyze, Improve and Control process. This data-driven process improvement methodology serves as a part of an organization’s Six Sigma practices.

This RCA methodology proposes finding the root cause of an issue by moving through a four-step problem solving process. The process starts with situation analysis and continues with problem analysis and solution analysis, concluding with potential problem analysis.

A fault tree analysis (FTA) allows an organization to visually map potential causal relationships and identify root causes using Boolean logic.
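To make the Boolean logic concrete, a fault tree can be written as AND and OR gates over basic events. The tree below is a hypothetical example (a cooling system that stops working), and the function and event names are invented; it is a sketch of the idea, not a full FTA tool.

```python
def and_gate(*inputs: bool) -> bool:
    """The output event occurs only if every input event occurs."""
    return all(inputs)


def or_gate(*inputs: bool) -> bool:
    """The output event occurs if any input event occurs."""
    return any(inputs)


BASIC_EVENTS = [
    "primary_pump_failed",
    "backup_pump_failed",
    "grid_outage",
    "breaker_tripped",
]


def cooling_lost(state: dict) -> bool:
    """Top event of the hypothetical tree."""
    # Pumping is lost only if both pumps fail.
    no_pumping = and_gate(state["primary_pump_failed"], state["backup_pump_failed"])
    # Power is lost if either the grid or the breaker fails.
    no_power = or_gate(state["grid_outage"], state["breaker_tripped"])
    return or_gate(no_pumping, no_power)


def single_points_of_failure() -> list:
    """Basic events that trigger the top event entirely on their own."""
    result = []
    for name in BASIC_EVENTS:
        state = {event: (event == name) for event in BASIC_EVENTS}
        if cooling_lost(state):
            result.append(name)
    return result


print(single_points_of_failure())
```

In this made-up tree, either power fault alone causes the top event, while a pump failure matters only in combination with the other pump failing.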

Barrier analyses are based on the idea that proper barriers can prevent problems and incidents. This type of RCA, often used in risk management, examines how the absence of appropriate barriers led to an issue and makes suggestions for installing barriers that prevent the issue from reoccurring.

Companies that use the RCA process want to put an end to “firefighting” and treating the symptoms of a problem. Instead, they want to optimize business operations, reduce risk and provide a better customer experience. Investing in the root cause analysis process provides a framework for better overall decision-making and allows an organization to benefit from:

Continuous improvement : Root cause analysis is an iterative process, seeking not only to address acute issues, but also to improve the entire system over time, starting with the underlying cause. The iterative nature of root cause analysis empowers organizations to prioritize continuous process improvement.

Increased productivity : Preventing downtime, delays, worker attrition and other production issues within an organization saves employees time, freeing up bandwidth to focus on other critical tasks.

Reduced costs : When equipment breaks down or software bugs cause delays, organizations lose money and workers get frustrated. Root cause analysis helps eliminate the cost of continually fixing a recurring issue, resulting in a more financially efficient operation overall.

Better defect detection : When companies fail to address underlying issues, they can inadvertently affect the quality of the end product. Addressing persistent problems before they snowball protects the organization from revenue and reputational losses that are associated with product defects down the line.

Reduced risks : Improving business processes and systems keeps equipment running safely and helps workers avoid safety hazards in the workplace.

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.

Root cause analysis and medical error prevention.

Gunjan Singh ; Raj H. Patel ; Joshua Boster .

Last Update: May 30, 2023 .

  • Continuing Education Activity

The term "medical error" encompasses diverse events that vary in magnitude and can potentially harm the patient. According to the 2019 World Health Organization (WHO) Patient Safety Factsheet, adverse events due to unsafe patient care are among the top 10 causes of death and disability worldwide. However, it is essential to understand that healthcare delivery involves multiple variables in a dynamic environment, with many critical decisions made quickly. As such, the healthcare system cannot implement rigid protocols used by other high-risk industries, such as aviation. Reducing medical errors requires a multifaceted approach at various levels of healthcare. In the event of a sentinel occurrence or adverse patient outcomes, a thorough evaluation is warranted to prevent such events. Root cause analyses provide a method of evaluation for these situations so that a system-based intervention can be implemented rather than blaming individual providers. This activity reviews the root cause analysis process in medical error prevention. The course highlights the interprofessional team's role in performing this analysis to prevent medical errors and improve clinical outcomes.

  • Demonstrate effective root cause analysis of a sentinel event and implement strategies for its prevention.
  • Apply root cause analysis reporting standards in accordance with the Joint Commission requirements.
  • Identify the indications for reporting sentinel events to the Joint Commission and the steps that should be taken following the occurrence of such incidents.
  • Collaborate within an interprofessional team to prevent the most common types of clinical errors and improve clinical outcomes.
  • Introduction

Medical error is an unfortunate reality of the healthcare industry and a topic that is continuously discussed due to its grave impact on patient care and outcomes. In a 1999 publication by the Institute of Medicine (IOM), it was highlighted that deaths resulting from medical error exceeded those attributed to motor vehicle accidents, breast cancer, or AIDS. [1]  

Subsequent reports that discuss potential etiologies of medical errors have blamed systemic issues. Others have focused attention on certain groups of patients that may be more vulnerable to medical error than others. [2] [3]  Recently, the impact of medical errors on patient family members and healthcare professionals has been emphasized due to its effects on exacerbating burnout, poor work performance, mental health decline, and even suicidality. [4] [5]  

Though it may be challenging to pinpoint the definitive cause of medical error in certain situations, it is important to evaluate strategies that can be used to mitigate and prevent these adverse events from occurring in the first place. One such method is root cause analysis, which has been previously shown to reduce clinical and surgical errors in various specialties by establishing a quality improvement framework. [6]  This article will discuss the application of root cause analysis in medical error prevention and strategies for maintaining continuous quality improvement in the healthcare setting.

The Institute of Medicine defines a medical error as "the failure of a planned action to be completed as intended or the use of a wrong plan to achieve an aim." [1]  It is essential to recognize the differences between medical malpractice and medical error. An adverse event in a healthcare setting may be attributed to medical error while not meeting the threshold of malpractice or negligence. Medical errors generally result from the improper execution of a plan or improper planning of a method of execution. Medical errors can also occur during preventative care measures, for example, if a provider overlooks a patient's allergy when administering medication. Thus, the complexity of the occurrence of a medical error can range widely and manifest in any aspect of patient care, from admission to discharge and in the outpatient setting. It is essential to recognize that medical errors may occur without causing direct harm to the patient. Regardless, it is critical to evaluate the cause of all medical errors, whether or not the patient is harmed, and develop guidelines and strategies to prevent future occurrences.

If medical errors harm the patient, they are classified as preventable adverse events or sentinel events. Sentinel events are preventable adverse outcomes that warrant urgent investigation to determine the cause of the error. [7]  These events are not only debilitating to patients but can also impact the livelihood of healthcare providers. It is important to note that sentinel events are unrelated to the patient's underlying medical condition and are attributable to improper medical intervention or improper technique. If a patient receives medication and experiences an anaphylactic reaction, it must be determined whether the reaction was due to the medication itself or the provider's failure to review the patient's allergies before administration. Thus, these cases must be critically reviewed to delineate whether or not the etiology of the error was preventable, which is often a challenging task. 

Root cause analysis (RCA) is a process for identifying the causal factors underlying variations in performance. In the case of medical error, this variation in performance may result in a sentinel event. A standardized RCA process is mandated by the Joint Commission to identify the cause of medical errors and thus allow healthcare institutions to develop strategies to mitigate future errors. [7]  Despite its wide adoption in the business, engineering, and industrial sectors, its use in the medical field has been limited. It is important to note that the RCA process aims not to assign individual blame but to identify lapses in system-level processes that can be restructured to prevent patient harm and reduce the likelihood of future sentinel events. Thus, identifying the root cause of a medical error can better direct the need for additional training and resources.

Applying Root Cause Analysis

For accreditation purposes, the Joint Commission requires that healthcare institutions have a comprehensive process for systematically analyzing sentinel events. The RCA process is one of the most commonly utilized tools for this purpose. Through the RCA process, healthcare institutions can optimize patient care and enact measures to mitigate adverse events that compromise patient safety. In addition to improving patient safety and quality metrics, an RCA's purpose includes optimizing process flow and outcomes.

RCA emphasizes lapses in system-level processes. It does not emphasize individual actions. A designated RCA team must be assembled to review and identify necessary changes at the systematic level that can improve performance and reduce the likelihood of a repeat sentinel event. [8]  Failure to perform an RCA within 45 days of a sentinel event may result in the healthcare institution being placed on an 'accreditation watch,' which is public information. Repeat violations may result in an onsite review by the Joint Commission that may jeopardize accreditation. [9]
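A team tracking the 45-day window can compute the deadline with simple date arithmetic. The sketch below assumes the window is counted in calendar days starting from the date of the sentinel event; the function names and the dates are illustrative, and the exact counting rules should be confirmed against the Joint Commission's current policy.

```python
from datetime import date, timedelta

RCA_WINDOW_DAYS = 45  # window stated in the text above


def rca_due_date(event_date: date) -> date:
    """Last day to complete the RCA, assuming calendar days from the event."""
    return event_date + timedelta(days=RCA_WINDOW_DAYS)


def days_remaining(event_date: date, today: date) -> int:
    """Days left before the deadline; negative once the deadline has passed."""
    return (rca_due_date(event_date) - today).days


# Hypothetical sentinel event on January 10, 2024, checked on February 1, 2024.
event = date(2024, 1, 10)
print(rca_due_date(event))
print(days_remaining(event, today=date(2024, 2, 1)))
```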

The first step of an RCA is to form an interprofessional team to analyze and define the problem. There should be a designated process to communicate with senior leadership throughout the journey while meeting deadlines internally and with the Joint Commission. After identifying the problem, the team should evaluate systematic factors that may have contributed to the error. Throughout the process, collecting data regarding the potential underlying causes is important. The team should propose and implement immediate changes so that a repeat sentinel event does not occur during the RCA process. Next, the team should evaluate the list of root causes and consider their interrelationships. During the RCA process, the team will explore risk-reduction and process improvement strategies to prevent future errors at the systematic level. After identifying process improvement strategies, the team must communicate with senior leadership and key stakeholders to evaluate whether the proposed process modifications are acceptable.  

The Joint Commission has created a framework and series of 24 questions to aid in organizing an RCA. This framework should be utilized as a general template when preparing the RCA report that will eventually be submitted to the Joint Commission after thorough evaluation. The 24-question framework recommended by the Joint Commission considers various situational factors that may have contributed to a sentinel event. This includes examining the systematic process, human factors, equipment malfunctions, environmental factors, uncontrollable external factors, organizational factors, staffing and qualifications, contingency plans, performance expectations, informational disruptions, communication, environmental risks, training, and technology. [7]  

With detailed consideration of each of these topics, an in-depth analysis of the cause of the sentinel event can occur. One factor that makes an appearance in several questions is communication. Communication within the team and with leadership is critical to maintaining organizational structure. It can be difficult to convey messages effectively and efficiently without proper communication systems. Environmental factors should also be examined to determine if any situational issues were ongoing at the time of the sentinel event that may have impacted the outcome. Staffing is another important topic that should be examined during an RCA review to determine if the staff were appropriately qualified, competent, and portioned for their assigned duties. 

After discussion, evaluation, and analysis, corrective actions should be developed, identifying areas for targeted improvement. While utilizing the 24-question framework, it is important to always consider causative etiologies because it will help determine the specific area that can be restructured to reduce risk. The root cause analysis should be clear and precise while providing appropriate depth and scope. 

The Joint Commission has identified a series of adverse events subject to their purview. Primarily, this would be a sentinel event that has resulted in death or permanent loss of function unrelated to any underlying medical conditions. Alternatively, a sentinel event can also be considered as one of the following, even if the event did not cause death or severe harm:

  • Patient suicide of any patient receiving care (including emergency department care), treatment, or services within the healthcare setting or 72 hours following their discharge
  • Full-term infant having an unanticipated death
  • An infant discharged to the wrong family
  • Abduction of any patient receiving care, treatment, or services
  • Elopement of a patient within a healthcare setting, leading to their harm
  • Hemolytic transfusion reaction requiring administration of blood products
  • Rape, assault, or homicide of anyone on scene at the healthcare premises
  • Wrong patient, site, or procedure for all invasive procedures, including surgery
  • Unintended retention of a foreign body in a patient following surgery
  • Severe neonatal hyperbilirubinemia
  • Prolonged fluoroscopy with cumulative dose to the wrong body region
  • Fire, flame, or unanticipated smoke, heat, or flashes during patient care
  • Intrapartum maternal death
  • Severe maternal morbidity

The finalized RCA report must follow a set standard to meet the Joint Commission's requirement. It must include the following:

  • Participation of the organization's leadership and key stakeholders involved in the process/system under review
  • Thorough explanation of all findings
  • Consideration of any relevant or applicable literature
  • Internal accuracy and consistency, without contradictions or unanswered questions

Case Illustrations with RCA Interventions

Case Example 1

A 42-year-old primigravida woman at 34 weeks gestation was brought to the obstetric emergency department at midnight with complaints of severe headache, blurry vision, and right upper quadrant pain for the last 5 to 6 hours. She noted gradually increasing lower extremity edema and facial swelling as well. She has a history of gestational hypertension and was prescribed labetalol 200 mg twice a day a week before this presentation. On initial evaluation, her blood pressure was 190/110 mm Hg on 2 separate occasions, 5 minutes apart. She had gained 2 kilograms since her last antenatal checkup in the clinic a week ago.

The patient was diagnosed with severe preeclampsia. The senior obstetric resident ordered a loading dose of magnesium sulfate to prevent imminent seizure. The hospital protocol used an intravenous (IV) and intramuscular (IM) regimen where the patient receives a 4-gram (20% concentration) intravenous solution bolus and a 10-gram intramuscular dose (50% concentration) administered as 5 grams in each buttock. The senior resident verbally provided the order for magnesium sulfate administration to the junior resident, who subsequently verbally communicated the order to the nurse.

This magnesium sulfate dosing regimen is complex, with multiple doses in different locations, and the nurse, who felt rushed in an urgent situation, prepared it incorrectly. A chart showing how to prepare magnesium sulfate was posted in the drug preparation room but had become faded. Therefore, the nurse prepared the medication from memory. Before administering the medicine to the patient, as a part of the protocol, she repeated the dose strength aloud to another nurse, who cross-checked it against a printed chart and picked up the error in time. The senior resident also identified the error when the dose was communicated aloud and stopped the drug from being administered.

RCA with Corrective Measures

A root cause analysis was performed, and measures were taken to avoid this problem in the future. Magnesium sulfate was marked as a high-alert medication, as recommended by the Institute for Safe Medication Practices. Furthermore, premixed solutions prepared by the pharmacy for the bolus dosing were instituted instead of requiring nurses to mix this high-risk medication on the unit. The second nurse verification measure was retained, with the second nurse instructed to double-check all doses, pump settings, drug names, and concentrations before administration of any drugs.

Moreover, the RCA recommended that all medication orders be provided in writing and/or entered in the electronic medical record using computerized provider order entry (CPOE) systems, regardless of the urgency of the situation, to avoid any dosing errors. The RCA team emphasized that verbal communication for medication administration should always be avoided. If verbal communications are necessary or unavoidable, the RCA recommended that the nurse taking the order should read back the order given to the prescribing physician to minimize any prescribing errors. 

Case Example 2 (The names and dates of birth used in this example are fictitious, used for illustrative purposes, and do not represent actual patients. Any similarities, if noted, are purely coincidental.)

Anna Joy (date of birth October 30, 1991) was admitted to a busy obstetric ward. She was a primigravida woman at 30 weeks of gestation with complaints of intermittent cramping abdominal pain. She had come to visit her sister living in Boston from Spain. The patient's ability to communicate in English was limited, and she preferred speaking Spanish. However, her husband and sister were fluent in English and assisted with translation throughout the history, exam, and admission. The patient was seen by an obstetrician who advised routine investigations for threatened preterm labor and observation.

Another patient, Ann Jay (date of birth September 30, 1991), was also admitted to the same obstetric ward. She was at 34 weeks of gestation and was admitted because of gestational diabetes mellitus with hyperglycemia. Her obstetrician advised an endocrinology referral, and the endocrinologist advised glucose monitoring and insulin administration. The nurse taking care of the patient was provided with the instructions, performed a finger-stick blood glucose check, and informed the endocrinologist about the results over the phone. The endocrinologist advised six units of regular insulin before lunch. The nurse also informed the obstetrician that the patient felt a decrease in fetal movements. The obstetrician advised ongoing observation and fetal kick counts.

The family members of the first patient, Anna Joy, informed the nurse that they were going to lunch. The morning shift nurse later required a half-day leave because of personal issues and quickly handed over her patients to another nurse. The ward was busy and running at full capacity. The new nurse decided to give the insulin injection first as the patient was about to receive her lunch. She did not know that Anna Joy preferred communication in Spanish. The nurse asked a few questions and rushed through patient identification with the help of two unique patient identifiers. She administered the insulin injection to the first patient and later realized it was supposed to be given to the second patient, Ann Jay. The attending obstetrician and the endocrinologist were informed. They took the necessary measures and closely monitored the patient for the next few hours. No adverse effects were noted.

A root cause analysis was performed, and measures were taken to avoid this problem in the future. The RCA team noted that the nurse caring for both patients had worked in the hospital for 5 years and was recently transitioned to the obstetric ward. This had never happened to her before. The team recognized that the modern patient care delivery process relies on the efficient and effective integration of an interprofessional care team. A clear, consistent, and standardized communication method between the team members contributes to safe patient care and minimizes the risk of adverse outcomes. The RCA team did not lay blame on the nurse involved. They instead instituted a standardized handoff platform and required all patient handoffs to occur using this format in the future. During shift change, the handoff between clinicians and nurses is pivotal in providing high-quality care. The aim should be to provide the oncoming team with up-to-date, accurate, and complete information. The RCA team outlined clinical education programs for nurses and clinicians to ensure high-quality and effective handoff occurs at every shift change and patient handoff. 

They also instituted mandatory use of hospital-based interpreters when communicating with patients who are not fluent in English. The hospital procedure for verifying patient identification using two unique patient identifiers, the name and the date of birth, was retained. However, an additional mandatory step of verifying the patient's identity using an arm-band barcode was instituted before every medication administration. They also instituted the highlighting of patient charts and rooms when patients had similar names and dates of birth.
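The last measure, highlighting patients with similar names and dates of birth, lends itself to a simple automated check. The following is a minimal sketch rather than the hospital's actual system: the `Patient` type, the `lookalike_pairs` function, the 0.75 name-similarity threshold, and the rule that two of the three date components must match are all assumptions chosen for illustration. The third patient is likewise invented.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Patient:
    name: str
    dob: date


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def dob_overlap(a: date, b: date) -> int:
    """Number of matching components (year, month, day) between two dates."""
    return sum((a.year == b.year, a.month == b.month, a.day == b.day))


def lookalike_pairs(patients, name_threshold=0.75, min_dob_overlap=2):
    """Return pairs whose names and dates of birth are easy to confuse."""
    flagged = []
    for p, q in combinations(patients, 2):
        if (name_similarity(p.name, q.name) >= name_threshold
                and dob_overlap(p.dob, q.dob) >= min_dob_overlap):
            flagged.append((p, q))
    return flagged


ward = [
    Patient("Anna Joy", date(1991, 10, 30)),
    Patient("Ann Jay", date(1991, 9, 30)),
    Patient("Maria Lopez", date(1985, 2, 14)),  # hypothetical third patient
]
for p, q in lookalike_pairs(ward):
    print(f"Highlight charts: {p.name} / {q.name}")
```

With these settings, only the Anna Joy and Ann Jay pair is flagged. A check of this kind supplements, and does not replace, the arm-band barcode scan at the bedside.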

Case Example 3

A 26-year-old primigravida woman with labor pains was admitted to a busy hospital's labor and delivery suite at 39 weeks of gestation. There were no associated high-risk factors. The patient was admitted to the labor ward and managed according to routine protocol. She progressed in spontaneous labor, but the cardiotocograph showed prolonged fetal bradycardia lasting for 3 and a half minutes at 4 centimeters (cm) cervical dilatation. The fetal bradycardia did not resolve with initial conservative measures.

The patient was transferred to the operating room for a category one emergent cesarean section. A category one cesarean section means the baby should be delivered within 30 minutes of the decision to operate. It is done when there is an immediate threat to the life of the mother or the baby. The baby was delivered in good condition, with no intraoperative complications. Before closure, the operating obstetrician asked the scrub nurse to perform a surgical count. The scrub nurse reported that a gauze piece might be missing from the surgical trolley. The count was then repeated several times by the scrub nurse and the floor nurse. A second on-call obstetrician was called to assist the primary surgeon in checking the surgical field for the missing gauze piece.

The surgical gauze had a heat-bonded barium sulfate marker embedded in the fabric to assist with x-ray identification. An intraoperative x-ray was obtained to evaluate for intraperitoneal gauze, and the results were negative. The case was discussed with the department chief, and abdominal closure was performed. Due to the associated delays, the operative time was increased significantly (2 hours and 30 minutes).

An RCA of the event revealed that there were inconsistent practices regarding surgical count before the initiation of a procedure. Moreover, only one person (the scrub nurse) was charged with making this count. The RCA team highlighted that the surgical count is critical and must be performed in a standardized fashion to eliminate variation and minimize the possibility of human error. They highlighted international standards that recommend standardizing the counting process and systematically tracking the instruments, gauze, and sponges in the sterile field. They instituted World Health Organization's Surgical Safety Checklist as a mandatory step for all procedures regardless of the urgency of the procedure. They also recommended that the counting process be concurrently audible and visual to eliminate errors. The RCA recommended that the counting process should be performed by the scrub nurse and the circulating nurse independently, both before and after every procedure. They emphasized that the best practices for surgical count should always be followed regardless of the clinical situation. 
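The recommendation that two nurses count independently, both before and after the procedure, amounts to a three-way reconciliation: the opening count and the two closing counts must all agree. A minimal sketch of that comparison is shown below; the function name `reconcile_counts` and the example tallies are hypothetical and not taken from the case.

```python
from collections import Counter


def reconcile_counts(baseline: Counter, scrub: Counter, circulating: Counter) -> dict:
    """Return every item whose three counts do not all agree.

    baseline    -- count agreed before the procedure started
    scrub       -- closing count by the scrub nurse
    circulating -- independent closing count by the circulating nurse
    """
    discrepancies = {}
    for item in set(baseline) | set(scrub) | set(circulating):
        # Counter returns 0 for items that were never tallied.
        b, s, c = baseline[item], scrub[item], circulating[item]
        if not (b == s == c):
            discrepancies[item] = {"baseline": b, "scrub": s, "circulating": c}
    return discrepancies


# Hypothetical tallies for illustration.
opening = Counter(gauze=20, needle=6, clamp=12)
closing_scrub = Counter(gauze=19, needle=6, clamp=12)
closing_circulating = Counter(gauze=19, needle=6, clamp=12)

print(reconcile_counts(opening, closing_scrub, closing_circulating))
```

An empty result means the counts agree. Any entry returned would call for a recount and a search before closure. Without a reliable opening count, as in this case, there is no baseline to reconcile against.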

Case Example 4

A 25-year-old man presented for bilateral LASIK surgery at a same-day surgery center. The patient was examined by the operating surgeon, a community-based surgeon who does not routinely operate at this facility. Informed consent was obtained by the operating surgeon preoperatively. The refractive error was -4 D for the right eye and -5 D for the left eye. The plan was to correct the refractive error completely. There was a timeout to ensure the correct patient and procedure. The LASIK was started by making corneal flaps on both eyes, which was completed uneventfully. The second step was the excimer laser-guided corneal power correction.

The patient was adjusted on the operating microscope so that the first eye was directly under the excimer laser, and iris recognition was attempted. The machine did not recognize the iris pattern after 3 attempts. The surgeon decided to proceed without iris recognition. The technician thought that this was rare and that they had good iris recognition rates for this month (>98%). However, he did not want to contradict the surgeon.

Before the procedure, the circulating nurse noted that the patient's table was adjusted to the wrong side, and the left eye was under the laser instead of the right. She pressed the emergency stop button, and the treatment was terminated. After identifying the mistake, the surgeon and technician restarted the machine to treat the correct eyes in the correct sequence.

Compared to unilateral procedures, bilateral procedures are especially challenging, particularly if the treatment varies between the 2 sides. An example is LASIK, where both eyes are typically corrected simultaneously, and there is no obvious pathology on the eye except for the refractive error. The correction is determined preoperatively, and the result is not immediately titrated. There is a significant chance for wrong-site procedures, given these ambiguities. To avoid this disaster, the RCA team implemented a verification procedure where the optometrist, technician, and surgeon were ALL required to verify each eye's refractive error before the procedure and after programming the laser.
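The verification step described above has two conditions: the values programmed into the laser must match the plan for the eye under the laser, and all three roles must have confirmed them. A minimal sketch of that gate follows. The `EyePlan` type, the `cleared_to_treat` function, and the role labels are hypothetical names introduced for illustration; the refractive errors are those given in the case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EyePlan:
    eye: str             # "right" or "left"
    correction_d: float  # planned correction in diopters


REQUIRED_ROLES = frozenset({"optometrist", "technician", "surgeon"})


def cleared_to_treat(plan: EyePlan, programmed: EyePlan, signoffs) -> bool:
    """True only if the laser settings match the plan and every role signed off."""
    return plan == programmed and REQUIRED_ROLES <= set(signoffs)


right_plan = EyePlan("right", -4.0)
left_plan = EyePlan("left", -5.0)

# Laser programmed for the left eye while the right eye is planned first.
print(cleared_to_treat(right_plan, left_plan, REQUIRED_ROLES))
# Settings match, but the technician has not confirmed.
print(cleared_to_treat(right_plan, EyePlan("right", -4.0), {"optometrist", "surgeon"}))
# Settings match and all three roles have confirmed.
print(cleared_to_treat(right_plan, EyePlan("right", -4.0), REQUIRED_ROLES))
```

Requiring every role to sign off also gives the technician in this case a formal point at which to raise a concern rather than defer to the surgeon.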

Some advanced laser machines have an inbuilt layer of defense where the iris pattern of the eye is uniquely identified via iris recognition, which helps determine the correct eye and enhances the treatment fidelity. Some treatments, however, do not include iris recognition, and therefore the onus lies on the technicians, nurses, and surgeons to identify the appropriate eye correctly. 

Case Example 5

A community clinic treats approximately 110 patients per day. The clinic is run by 2 primary care physicians, with the assistance of 2 nurses and scribes. A 10-year-old boy was brought to the clinic by his parents. The child had a runny nose for the last ten days. On examination, the primary care physician noted simple allergic rhinitis and advised them to use over-the-counter cetirizine. One of the scribes had called in sick that day, so a secretary was assisting the physician. The physician advised the parents that cetirizine is an over-the-counter medication, and they can go to their pharmacy of choice to obtain the medication. After 2 days, the patient's mother returned to the clinic and reported that the child was lethargic. The clinic's front desk stated that they would convey the information to the physician, who was very busy that day. The physician said it is typical for children taking cetirizine to be slightly sleepy. He said that they should inform the parents to ask the child to avoid going to school for the next few days. The message was conveyed to the mother.

The patient's mother, however, decided to take the child to another specialist as she was concerned regarding the sedation. At this visit, it was noted that the child was taking a 10-mg cetirizine tablet 2 times a day, which is higher than typically recommended. 

An RCA review was performed at the primary clinic. It was noted that there was a typographical error in the instructions given to the patient, saying 10 mg twice a day instead of 5 mg twice a day, which the physician had intended. The RCA recommended a verification procedure for all prescription recommendations made during the clinic visit. They instituted verbal and written verification with the prescribing physicians of all drugs and doses transcribed by the scribes and/or office personnel to avoid this error in the future. The RCA team also recommended that the physician and the team should read prescription and over-the-counter drug recommendations with their intended doses to the patient/attendant in the clinic from the summary instructions and verify that it matches their notes.

The RCA also mandated a document review for all patient callbacks or return visits before any patient communication is made to avoid such errors in the future.

Case Example 6

All-Eyes Laser Center is a busy same-day ophthalmic laser center with multiple laser procedures being performed throughout the day. The center specializes in retinal and anterior segment lasers.

A 60-year-old man, JM, suffers from chronic angle-closure glaucoma and has been advised to undergo a YAG (yttrium-aluminum-garnet) laser iridotomy. This procedure involves creating a small hole in the peripheral part of the iris to increase the aqueous flow between the anterior chamber and the posterior chamber to prevent a possible angle-closure attack and/or further glaucoma progression.

This was an unusually busy day at the laser center. The laser surgeon was running behind. There were 5 patients ahead of JM, and there was an anticipated delay of around 2 hours. As is the practice at the center, the nurse practitioner prepares the patients before the laser, and then the laser surgeon performs the procedure. The preparation involves checking the history, confirming the examination findings, and then instilling eye drops to prepare the eyes for the procedure. This laser surgeon does 2 types of laser procedures. YAG iridotomy needs the pupils constricted with 2% pilocarpine eye drops, which ensures a good exposure of the peripheral iris crypts where the laser is directed to create a small iridotomy. The second procedure is a YAG capsulotomy, in which the posterior capsule in a pseudophakic eye is lasered to create an opening to counter an after-cataract posterior capsular opacity and improve vision. The YAG laser platform is a combined platform where both procedures can be performed with one machine.

The surgeon arrived at the laser suite and started the lasers. When JM's turn came, a proper timeout was confirmed, including the correct eye and procedure.  However, when the patient was positioned at the laser machine, the surgeon noticed that the pupil was dilated rather than constricted. The surgeon again verified the patient's tag and name and the correct procedure. It was confirmed that the patient was indeed the correct one, and the procedure intended was YAG iridotomy. It would have been dangerous to attempt an iridotomy in a dilated pupil. The surgeon did not proceed with the procedure, and the patient was transferred out of the laser suite. The patient was counseled regarding the error and instructed that he would be rescheduled for the correct procedure in a few days. The error was misattributed to the nurse administering the wrong eye drop, secondary to high patient volume and practice inconsistencies. 

A root cause analysis was performed, and measures were taken to avoid this problem in the future. This error did not result in harm to the patient. However, there is a significant chance of the wrong type of procedure being performed. Considering this, the RCA team recommended segregating patients for YAG capsulotomy and YAG iridotomy to different seating areas that were clearly labeled. The 2 eye drops, tropicamide and pilocarpine, were kept only in these areas, and the staff was not allowed to carry these drops out from the designated area. A barcode-based verification was also instituted to be used each time the drop was instilled.

There are precautions in place for similar-sounding medications and similar-sounding patient names. However, in a mixed clinic where multiple procedures are being performed with a relatively quick turnover, the pre-procedure medications can be mixed, especially if there is no designated 'bedside area' for the patient. Therefore using the precautions noted above can avoid incorrect medication administration.

Issues of Concern

The Institute of Medicine (IOM) identifies medical errors as a leading cause of death and injury. [1]  According to the 2019 World Health Organization (WHO) Patient Safety Factsheet, adverse events due to unsafe patient care are among the top ten causes of death and disability worldwide. Preventable adverse events in the United States of America (US) cause an estimated 44,000 to 98,000 hospital deaths annually. [1]  This exceeds the number of deaths attributable to motor vehicle accidents and is estimated to cost the community between 37.6 and 50 billion dollars in added health care costs, disability, and loss of productivity. [1]

Patients and their families face the most critical and severe consequences of medical errors. Therefore, identifying system processes that lead to medical errors and implementing corrective measures is the primary goal in treating this problem. An RCA and response can help identify system-based measures that can minimize the risk of adverse events and improve clinical outcomes. 

Types of Medical Errors

It is essential to recognize that medical errors constitute diverse events. The "error" is not always a human miscalculation or miscommunication, as outlined by the cases above. Some errors are inherent to clinical situations, such as patient falls in hospital settings and healthcare-associated infections. The commonly recognized "types" of medical errors are outlined below.

  • Medication error is widely accepted as the most common and preventable cause of patient injury. [10]  Medication errors include giving the wrong drug or dose, via the wrong route, at an incorrect time, or to the wrong patient. The reported incidence of medication error-associated adverse events in acute hospitals is around 6.5 per 100 admissions. [10]  Medication errors in the peri-discharge from an acute care facility are the most easily overlooked or missed errors. [10]
  • Another common medical error is a diagnostic error with failure to correctly identify the cause of the clinical condition promptly. [10]  Diagnostic errors are "missed opportunities to make a correct or timely diagnosis based on the available evidence, regardless of patient harm." [11]
  • In hospitalized patients, wound infections, pressure ulcers, falls, healthcare-associated infections, and technical complications constitute another group of preventable medical errors. [10]  
  • The most common systems-error is failure to disseminate drug knowledge and patient information. This, in essence, is a communication failure, whether with the patient or other providers. [10]  
  • Failure to employ indicated tests is another medical error that can lead to diagnostic delays or errors. [1]
  • Similarly, using outdated tests or treatments or failing to respond to the results of tests or monitoring also constitutes a type of medical error. [1]
  • Treatment errors include errors during the performance of a test or procedure and inappropriate treatment. [1]

When applying root cause analysis for medical error prevention, it is essential to consider several patient-related factors and underlying issues that may hinder or impede the ability to generate an efficacious root cause analysis. Awareness of particular safety hazards for specific patient demographics and groups can often help mitigate common medical errors and encourage patients to take responsibility for their safety.

Elderly patients represent such a group as various common medical illnesses may result from age-related changes within this group. Elderly patients tend to be prone to falls due to their age-related changes in vision or cardiovascular problems. This patient group also tends to be prone to balance issues and muscle weakness over time, leading to ambulatory dysfunction. Having fall-prevention protocols in place, identifying potential high-risk areas within the home, and mitigating them through safety measures can improve patient safety and outcomes. [7]  Age-associated hearing and cognitive decline increase the likelihood of communication errors regarding medications. Ensuring appropriate communication skills tailored to distinct patient groups is key to preventing such errors. Young children and infants are similarly prone to common medical errors due to the lack of direct participation in decision-making and patient care. Thus, specialized communication is needed to convey medical instructions to this population. It is essential to involve both the family and the child to ensure no lapses in communication.

Reducing diagnostic errors requires a more comprehensive approach. Common conditions misdiagnosed yearly include cancer, coronary artery disease, and surgical complications. [12] [13]  Clinicians within these specialties must be aware of the high rate of misdiagnoses and attempt to combat this through additional measures. Many of these misdiagnoses are easily preventable by implementing standardized protocols, which can be integrated into electronic medical record software. [14]  According to a 2015  New England Journal of Medicine  article, "trigger tools" are essential in reducing this type of medical error. [15]  "Trigger tools" are electronic algorithms that identify potential adverse events. This is accomplished by searching electronic health records and flagging specific occurrences. The use of trigger tools has been shown to decrease the rate of misdiagnoses in recent studies. [15]  
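The idea of a trigger tool, as described above, can be reduced to a set of named rules applied to each record, with any record that matches a rule flagged for human review. The sketch below illustrates only that mechanism. The record fields, the two rules, and their thresholds are hypothetical placeholders, not validated clinical triggers, and a real tool would query an electronic health record rather than a list of dictionaries.

```python
from typing import Callable

Record = dict
Trigger = tuple[str, Callable[[Record], bool]]

# Illustrative rules only; names and thresholds are assumptions.
TRIGGERS: list[Trigger] = [
    ("low_glucose",
     lambda r: r.get("glucose_mg_dl") is not None and r["glucose_mg_dl"] < 50),
    ("reversal_agent_given",
     lambda r: "naloxone" in r.get("medications", [])),
]


def flag_records(records, triggers=TRIGGERS) -> dict:
    """Map each record id to the names of the triggers it fired."""
    flagged = {}
    for record in records:
        hits = [name for name, rule in triggers if rule(record)]
        if hits:
            flagged[record["id"]] = hits
    return flagged


records = [
    {"id": "A", "glucose_mg_dl": 42, "medications": ["insulin"]},
    {"id": "B", "glucose_mg_dl": 110, "medications": ["naloxone"]},
    {"id": "C", "glucose_mg_dl": 95, "medications": []},
]
print(flag_records(records))
```

A fired trigger is a prompt for chart review, not a finding of error: a reviewer still has to decide whether an adverse event occurred and why.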

Clinicians should also be aware of the value of an interpreter in aiding effective communication. A skilled medical interpreter may be crucial in effectively communicating instructions and information to the patient. Physicians need to utilize an unbiased and neutral medical interpreter, as family members may often be biased in communication.

Communication deficits among medical staff members are another essential root cause of medical errors that can be mitigated through standardized protocols. [16]  The healthcare institution should include all staff members in developing communication protocols and should identify processes for clinicians and pharmacists to exchange information regarding medication orders. Training staff in error recognition and medication safety is another valuable tool that can be implemented within a healthcare institution. Controlling the storage, access, and labeling of medications is a further strategy that can be implemented and monitored. Storing medications in the accepted manner and ensuring that similar medications are properly labeled help prevent mix-ups. Managing the availability of information within the healthcare organization is also important. Ensuring staff members can readily access important updates and protocol changes can help prevent unnecessary medical errors.

Clinical Significance

RCA has important implications in helping healthcare organizations study events that resulted in patient harm or undesired clinical outcomes and identify strategies to reduce future errors and improve patient care and safety. Most notably, RCA can help identify medication errors such as illegible handwritten prescriptions, similar name packaging or misleading drug strength or dosage presentations, ineffective control of prescription labels, and lapsed concentration due to interruptions. [17]  Clinician participation in root cause analysis is vital as these initiatives recognize and address important patient care aspects.

Through a review of data gathered by the Joint Commission, six common categories of clinical error resulting in patient death, which can be prevented through root cause analysis, have been identified. These sentinel events account for a significant proportion of morbidity and mortality within the hospital setting. The six most common categories of clinical errors resulting in patient deaths include: [7]  

  • Wrong-site surgeries
  • Patient suicide
  • Surgical complications
  • Medical treatment delays
  • Medication errors
  • Patient falls.

Wrong-site surgery is a major cause of medical errors that can be mitigated through various safety checkpoints preoperatively and has been the subject of a sentinel event alert by the Joint Commission. [7] [18]  This type of error has most commonly been noted in orthopedic surgeries. [19]  Risk factors include several surgeons involved in surgical care or transfers to another surgeon for patient care, multiple procedures on a single patient, time constraint pressures, and unique circumstances requiring unusual or special positioning during a surgical procedure. [18]  This error can easily be mitigated by ensuring proper preoperative measures, such as labeling the correct surgical site with an indelible pen or distinctively marking the nonsurgical site before the surgery. Intraoperative radiography can also help confirm the correct surgical site during the procedure.

Patient suicide is an unfortunate cause of death commonly seen in psychiatric care settings. [7]  Several risk-reduction methods can be implemented for this adverse event, including ensuring a controlled environment free of hazardous materials, frequent patient observation, effective communication, adequate staffing in the facility, suicide assessment upon admission, regular psychiatric evaluation, and assessment for the presence of contraband.

Delays in medical treatment are preventable adverse events that may result in patient death and permanent injuries. These delays may result from misdiagnoses, delayed diagnostic test results, lack of staffing or physician availability, delays in order fulfillment, inadequate treatment, and delays within the emergency department. It is important to recognize this root cause and implement steps to improve the timeliness, completeness, and accuracy of medical communication to prevent such errors.

Medication administration errors are a common and avoidable adverse event that can occur at various patient care levels, involving many individuals in a multidisciplinary patient care team. [17]  The primary tool of prevention for this type of error is communication. A standardized protocol for communication between the physician, nurse, pharmacist, and other clinicians involved in patient care is essential to ensure that patients receive the correct medication at the appropriate dosage, route, and frequency.

Similarly, patient falls are a constant source of error within healthcare facilities. It is important to recognize patients at high risk for falls and take appropriate safety precautions. Standardized protocols can reduce fall rates by ensuring a safe environment for risk-prone patients. Patient factors contributing to falls include advanced age, mobility impairment, and surgery. [20]  Organizational factors contributing to falls include nurse staffing and the proportion of new nurses. [20]  Instituting fall prevention protocols in hospitals and long-term care facilities has significantly reduced these errors. Studies have shown that fall risk assessments using standardized scales such as the Morse Fall Scale can decrease patient falls. [21]  Institutional interventions such as staff education, patient mobility training with rehabilitation professionals, and nutritionist support have also been shown to reduce patient falls. [21]
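A standardized scale such as the Morse Fall Scale works by adding fixed points for each of six observations. The sketch below shows that additive structure using the item weights as they are commonly published; the answer labels, the function names, and the default high-risk cutoff of 45 are assumptions for illustration, since institutions set their own cutoffs, and the official instrument should be consulted before any real use.

```python
# Item weights as commonly published for the Morse Fall Scale.
# Answer labels are shorthand chosen for this sketch.
MORSE_ITEMS = {
    "history_of_falling": {"no": 0, "yes": 25},
    "secondary_diagnosis": {"no": 0, "yes": 15},
    "ambulatory_aid": {"none_bedrest_nurse_assist": 0,
                       "crutches_cane_walker": 15,
                       "furniture": 30},
    "iv_or_heparin_lock": {"no": 0, "yes": 20},
    "gait": {"normal_bedrest_wheelchair": 0, "weak": 10, "impaired": 20},
    "mental_status": {"oriented_to_own_ability": 0, "forgets_limitations": 15},
}


def morse_score(answers: dict) -> int:
    """Sum the item weights; every item must be answered."""
    missing = set(MORSE_ITEMS) - set(answers)
    if missing:
        raise ValueError(f"unanswered items: {sorted(missing)}")
    return sum(MORSE_ITEMS[item][answers[item]] for item in MORSE_ITEMS)


def is_high_risk(score: int, cutoff: int = 45) -> bool:
    """Assumed cutoff; institutions calibrate their own."""
    return score >= cutoff


# Hypothetical patient.
answers = {
    "history_of_falling": "yes",
    "secondary_diagnosis": "yes",
    "ambulatory_aid": "crutches_cane_walker",
    "iv_or_heparin_lock": "yes",
    "gait": "weak",
    "mental_status": "oriented_to_own_ability",
}
score = morse_score(answers)
print(score, is_high_risk(score))
```

For this hypothetical patient the score is 85, which the assumed cutoff classifies as high risk.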

Enhancing Healthcare Team Outcomes

Medical errors are undeniably an important cause of patient morbidity and mortality within the United States healthcare system. These errors are widespread, and their consequences can be severe for the patient, family members, and clinicians. The interprofessional healthcare team plays an invaluable role in preventing medical errors; team effort is crucial in identifying strategies and solutions to reduce the burden of medical error on the healthcare system. Nurses, pharmacists, rehabilitation professionals, nutritionists, and physicians are integral to the patient care team and crucial in preventing medical errors. Practitioners who work in error-prone environments must recognize their roles as healthcare team members who are responsible for reducing unnecessary errors. [22]  The RCA team should include professionals from all disciplines to ensure an effective and accurate RCA occurs. [Level 5]

Clinicians should not hesitate to provide their peers with assistance in recognizing particular sources of common medical errors to deliver better patient care. Equal accountability and responsibility of all healthcare team members are critical in preventing errors and providing superior patient safety. [1]  

Quality assurance teams should employ RCAs with every sentinel event, especially in situations when the identification of medical errors becomes difficult or complex due to many underlying factors. RCAs can help identify factors within the healthcare delivery process that may impede the ability to provide quality patient care. Given the preventable nature of most medical errors, a thorough RCA can improve patient safety and allow healthcare organizations to serve as a model for others.

Healthcare professionals should be aware of common medical error sources and work as a team to identify possible risks when they become apparent. Doing so will increase the quality and efficiency of the healthcare industry and patient trust in the healthcare system. When an RCA is performed, the cooperation of all healthcare team members and clinicians involved in patient care is critical to understanding the "Why" behind the source of medical error and identifying future strategies to mitigate such errors and improve patient outcomes.[Level 5]


Disclosure: Gunjan Singh declares no relevant financial relationships with ineligible companies.

Disclosure: Raj Patel declares no relevant financial relationships with ineligible companies.

Disclosure: Joshua Boster declares no relevant financial relationships with ineligible companies.

This book is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) ( http://creativecommons.org/licenses/by-nc-nd/4.0/ ), which permits others to distribute the work, provided that the article is not altered or used commercially. You are not required to obtain permission to distribute this article, provided that you credit the author and journal.

  • Cite this Page Singh G, Patel RH, Boster J. Root Cause Analysis and Medical Error Prevention. [Updated 2023 May 30]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2024 Jan-.


Review Article: A Systematic Review on Machine Learning Methods for Root Cause Analysis Towards Zero-Defect Manufacturing

  • 1 Department of Energy Systems, University of Thessaly, Larisa, Greece
  • 2 Centre for Research and Technology Hellas, Information Technologies Institute, Thessaloniki, Greece
  • 3 Foundation for Research and Technology Hellas (FORTH), Heraklion, Greece

The identification of defect causes plays a key role in smart manufacturing as it can reduce production risks, minimize the effects of unexpected downtimes, and optimize the production process. This paper implements a literature review protocol and reports the latest advances in Root Cause Analysis (RCA) toward Zero-Defect Manufacturing (ZDM). The most recent works are reported to demonstrate the use of machine learning methodologies for root cause analysis in the manufacturing domain. The popularity of these technologies is then summarized and presented in the form of visualizing graphs. This enables us to identify the most popular and prominent methods used in modern industry. Although artificial intelligence gains more and more attraction in smart manufacturing, machine learning methods for root cause analysis seem to be under-explored. The literature survey revealed that only limited reviews are available in the field of RCA towards zero-defect manufacturing using AI and machine learning; thus, it attempts to fill this gap. This work also presents a set of open challenges to determine future developments.

1 Introduction

1.1 Motivation and scope

With the onset of Industry 4.0, manufacturing companies are in need of a continuous upgrade of their manufacturing processes in terms of products and operations to become more competitive. In their effort to provide an optimized production operation and consistent delivery of better products, detecting defects and identifying defect causes upstream becomes a crucial factor for the industry. More specifically, defect detection is used by the industry operators to conduct a quality inspection in the production line, while defect source identification is used to further conduct smart quality control by identifying defects or anomalies per root cause.

Product quality improvement has been the cornerstone of Industry 4.0. The technological advances of the modern industry have brought new challenges in the quality improvement stage, which traditional quality control methodologies cannot handle. New concepts have been born, like the development of policies toward Zero-Defect Manufacturing (ZDM) ( Psarommatis et al., 2020a ), which support the migration to this new era ( Psarommatis et al., 2022 ). ZDM heavily relies on new technologies like virtual metrology, i.e., the ability to inspect product quality from production data without physically measuring it ( Dreyfus et al., 2022 ). An extensive review of ZDM opportunities and shortcomings has been presented in ( Psarommatis et al., 2020b ). ZDM can be further boosted by the integration of numerous Artificial Intelligence (AI) methods into traditional technologies, such as metrology, digital twins, internet of things, computer vision, augmented reality, quality control, and predictive maintenance; at least 15 EU-funded projects (mostly FoF-11 projects) and numerous individual publications may be identified in the literature ( OPTIMAI, 2021–2023 ; Papageorgiou et al., 2021 ).

Defect detection aims to monitor the production line and assess the quality of products. The quality requirements and defect specifications are provided by the end-users on a per-case level. Defect detection is mainly oriented along the following two directions i): defect detection from images, which include photographs, 3D scans, point clouds, and every other data format that can be directly converted into a picture, and ii) anomaly detection, through the analysis of time-series data, feature analysis and discovery of unexpected events. In the era of data-driven smart manufacturing ( Tao et al., 2018 ), defect detection procedures may benefit from AI technologies and transform manual operations into semi- or even fully automatic.

Industry 4.0 takes defect detection to the next level by pursuing not only defects but also their causes. Traditional defect detection aims only at capturing faults and defects; thus, it provides no information about how to avoid recurrence or how a defect relates to the production processes. Defects may often go unnoticed and propagate along multiple production stages before being captured. Corrective actions are then taken at a later stage, which is probably unrelated to the defect’s actual cause. A systematic methodology, termed Root Cause Analysis (RCA), has been established within smart manufacturing to identify the sources of defects. This methodology searches across the production stages and determines the primary cause responsible for delivering defective products. The goal of RCA is to prevent the recurrence of failures. RCA can be thought of as an optimization process in smart manufacturing that focuses on minimizing scrap by addressing “what caused the failure?” rather than “what is the failure?” Thus, RCA can become a key asset to ZDM for industrial applications that involve many sequential and complex processes.

Traditional RCA methodologies such as Pareto Analysis, Fishbone Diagram, and Five-Whys have already been established in manufacturing ( Murugaiah et al., 2010 ; Jayswal et al., 2011 ; Ma et al., 2021 ). However, these methodologies are highly correlated with expertise and knowledge and thus are hindered by biases, individualism (as they cannot be saved or transferred), and time inefficiency. Moreover, the major drawback of traditional methodologies is the under-exploitation of the information that exists in the data from production processes. The large volume of data that is being gathered through production processes in industry 4.0 is considered very important for ZDM. Nevertheless, research questions arise about how these data can be exploited and what are the proper models for processing them, as these data are often multisensorial, multidimensional, and highly non-linear. Machine learning has proved that it can efficiently treat such types of data and thus, should be considered for AI-based RCA.
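As an illustration of the traditional, expert-driven end of this spectrum, the following minimal sketch tabulates a Pareto analysis in Python. The inspection log, the cause names, and the 80% cutoff are hypothetical values chosen for illustration; they do not come from any of the reviewed studies.

```python
from collections import Counter

def pareto_table(defect_causes, cutoff=0.8):
    """Rank causes by frequency and flag the 'vital few'.

    A cause is flagged while the cumulative share of the causes ranked
    before it is still below `cutoff` (an assumed 80% rule of thumb).
    """
    counts = Counter(defect_causes)
    total = sum(counts.values())
    rows, cumulative = [], 0
    for cause, count in counts.most_common():
        vital = cumulative / total < cutoff
        cumulative += count
        rows.append((cause, count, cumulative / total, vital))
    return rows

# Hypothetical inspection log: one entry per defective part.
log = (["misalignment"] * 50 + ["worn tool"] * 30
       + ["bad material"] * 15 + ["operator error"] * 5)

for cause, count, share, vital in pareto_table(log):
    flag = "vital few" if vital else ""
    print(f"{cause:15s} {count:3d} {share:6.1%} {flag}")
```

The ranking depends entirely on how an expert labelled each defect in the log, which is the reliance on expertise noted above.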

The aim of this work is to conduct a systematic review survey demonstrating the use of artificial intelligence methods for RCA in smart manufacturing, with a focus on Machine Learning Methods for RCA toward Zero-Defect Manufacturing. A key objective of the paper is to present the employed survey methodology in detail so as to not only present the current state of the art in a repeatable way but also guide readers on how to extend it in a consistent way and according to their needs.

An extensive literature review revealed the existence of only two surveys that regard RCA in manufacturing ( e Oliveira et al., 2022 ; Solé et al., 2017 ), neither of which deals with AI technologies. The first survey covers a spectrum of RCA techniques applied in various industrial domains, such as the semiconductor and chemical industries ( e Oliveira et al., 2022 ). According to the survey, the most popular methodologies include association/classification rules, control charts, regression models, and principal component analysis. The second survey follows a different approach and categorizes available methodologies based on causality; deterministic and probabilistic methodologies are thoroughly reviewed, targeting mainly information technology applications ( Solé et al., 2017 ). According to this survey, Bayesian networks are among the most popular learning models in this area. The current paper focuses on the integration of ML methodologies into RCA models applied in smart manufacturing, thus filling a gap in the current literature. The performed survey covers the most recent advances in a 5-year period, from 2017 up to date.

1.2 Models for ML-based root cause analysis

There is a wide range of AI/ML models that could be involved in the RCA process. These models belong either to the deterministic or the probabilistic group of AI-based methodologies. Each class has different attributes, may have different implementations, and has different performance implications ( Solé et al., 2017 ). On the one hand, deterministic models are developed by applying statistical learning techniques, which allow these models to identify patterns in the data automatically. To accomplish that, the models need to be trained on large datasets to uncover boundaries and relationships in the data. In general, the more data are used to train the model, the higher the predictive accuracy. On the other hand, probabilistic models can be constructed hierarchically from data, which allows their wide use for RCA. They enable reasoning about the uncertainties inherent to most data, thus allowing for fully coherent inferences over complex data structures. Representative probabilistic models are those based on Bayesian Networks and probabilistic Fuzzy Cognitive Maps (FCMs). When no domain knowledge is available to build these models, the only way is to use learning algorithms that exploit the raw data of the examined system. Various learning algorithms have been developed, oriented toward either learning both the structure and the parameters of the model or just learning the parameters of a given structure.

Deep Learning (DL) is a division of ML which has recently displayed remarkable applicability in a range of different applications, as well as in smart manufacturing. In smart manufacturing, DL has found significant applicability for processing and analyzing big manufacturing data. In most cases, DL networks can be trained using supervised learning with large sets of training data. The most popular DL methods are the following:

1) Deep Neural Networks (DNNs). A DNN resembles an Artificial Neural Network (ANN) with many hidden layers; the difference lies in the training process. As a class of machine learning algorithms, deep learning has the following main aspects: (a) it uses a cascade of multiple layers of non-linear processing units for feature extraction and transformation, (b) it learns in supervised (e.g., classification) and/or unsupervised (e.g., pattern analysis) manners, and (c) it learns multiple levels of representations that correspond to different levels of abstraction; the levels form a hierarchy of concepts. DNNs have more than three layers trained to model non-linear problems.

2) Convolutional Neural Networks (CNNs). They are among the most powerful deep learning techniques, with notable capabilities in analyzing and classifying images. They are mainly employed in image processing applications (semantic segmentation, image classification, instance segmentation, object detection, etc.). Their neuron architecture is based on the features of the images they process (width, height, depth, etc.). Typical CNNs have a similar structure to ANNs and consist of one or more filters (i.e., convolutional layers), followed by aggregation/pooling layers that extract features for classification tasks. Since a CNN has similar characteristics to a standard ANN, it is trained with gradient descent and backpropagation; unlike a standard ANN, it contains pooling layers along with convolutional layers. The vector at the end of the network architecture delivers the final outputs.

3) Residual Neural Networks (Res-Nets). They are an extension of DNNs. They are highly considered in industrial applications where precision is vital for machinery health-state diagnosis. Res-Nets typically perform better than CNN-based approaches.

4) Recurrent neural networks (RNN). These are ANNs that utilize connections between units in order to form a directed graph along a sequence. RNNs use their internal memory to process such sequences, something that is not met in feed-forward ANNs. However, RNNs suffer from short-term memory and the problem of vanishing gradient during backpropagation. This is resolved by the Long Short-Term Memory (LSTM) algorithm, presented in the following.

5) Long Short-Term Memory networks (LSTMs). LSTMs excel over the original RNN due to their specific cell structure, which allows the algorithm to add or remove information from this cell using entities called gates. These gates control this memorizing process by allowing the model to learn which information to store in the long memory and which to discard. The cell state resembles a conveyor belt. It runs straight down the entire chain, with only some minor linear interactions. Gates are a way to optionally let information through. They are composed of a sigmoid neural net layer and a pointwise multiplication operation. LSTMs have been applied in predictive maintenance and prognostics in manufacturing processes.
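The gate mechanism described in item 5 can be made concrete with a minimal sketch of one step of a scalar LSTM cell (a single input, a single hidden unit). It follows the standard LSTM formulation; the weight layout, the gate names used as dictionary keys, and the numerical values are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    `w` maps a gate name to (input weight, recurrent weight, bias);
    this layout is a hypothetical choice for the sketch.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))    # share of the old cell state to keep
    i = sigmoid(pre("input"))     # share of the candidate to write
    o = sigmoid(pre("output"))    # share of the cell state to expose
    g = math.tanh(pre("cand"))    # candidate values
    c = f * c_prev + i * g        # "conveyor belt": pointwise gating only
    h = o * math.tanh(c)          # new hidden state
    return h, c

# With the forget gate saturated open and the input gate saturated closed,
# the cell state passes through the step essentially unchanged.
weights = {
    "forget": (0.0, 0.0, 50.0),
    "input": (0.0, 0.0, -50.0),
    "output": (0.0, 0.0, 0.0),
    "cand": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=weights)
print(h, c)
```

In a trained network the gate weights are learned, so the model itself decides which information stays in the cell state and which is discarded.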

DL techniques make it possible to i) automatically learn from data, ii) detect underlying patterns, and eventually iii) make efficient decisions. With automatic feature learning and high-volume modeling capabilities, DL provides an advanced analytics tool for smart manufacturing in the big data era. It uses a cascade of layers of non-linear processing to learn the representations of data corresponding to different levels of abstraction. The hidden patterns underlying the data are then identified and predicted through end-to-end optimization. Thus, DL offers great potential to boost data-driven manufacturing applications. Several review papers in the related literature show actual implementations of ML and DL methods in factory operations within the smart manufacturing domain.

2 Systematic literature review

The presented review is based on the “Preferred Reporting Items for Systematic reviews and Meta-Analyses” (PRISMA) principles ( Moher et al., 2010 ). This methodology is globally accepted in the research community as it leads to well-structured article surveys, allowing investigators to perform accurate systematic reviews. According to PRISMA, a predefined set of questions needs to be defined. Then, identified documents are collected, filtered, analyzed, and critically evaluated.

2.1 Research questions

The goals of this review can be outlined as i) to determine what AI-based technologies have been exploited toward RCA within smart manufacturing; ii) to investigate specific applications and find out how these technologies have been implemented; and thus, iii) to shed light on current practices so that further improvements may be built upon them. To achieve these goals, the following research questions (RQs) have been posed:

1) What AI algorithms have been employed? This RQ reveals the available tools in the quiver of current smart manufacturing.

2) What is the accommodated industrial application? This RQ reveals the field of application.

3) What is the popularity of each employed methodology? This RQ reveals the trends in the employment of AI for RCA; popular and well-established methods will probably be the foundation for future developments.

Each article found during the search has been reviewed to answer these questions. The answers were compiled in a comprehensive way to give the reader a clear picture of the current state of the art and its potential for future developments.

2.2 Review protocol

The review protocol includes the selection of proper sources, the definition of search terms, and the definition of acceptance/rejection criteria. These are described in the following.

2.2.1 Search sources

Amongst the many available databases and search engines, Google Scholar and Scopus were preferred because i) they are among the most popular in the research community, ii) they provide consistent and reproducible search results, and iii) Google Scholar provides free access, while Scopus is available to most scientists and researchers through institutional subscriptions.

2.2.2 Search terms

Predetermined search terms were employed for searching the most suitable articles; search terms were pursued in title, abstract, and keywords. The literature search strategy was conducted by utilizing the keywords “defect identification,” “root cause analysis,” “deep learning,” “machine learning,” “artificial intelligence,” “industry,” and “manufacturing,” using the depicted query string ( Figure 1 ).

FIGURE 1 . Query string for literature search on RCA.

Raw search returned a vast number of results; thus, the inclusion/exclusion criteria needed to be defined so that only the most relevant works were considered for evaluation. The inclusion/exclusion criteria are outlined in the following paragraphs and summarized in Table 1 in the form of acceptance/rejection rules.

TABLE 1 . Inclusion/Exclusion rules.

2.2.3 Inclusion criteria

This study focuses on ML-based methods for RCA in smart manufacturing. Thus, studies in consideration should include RCA methodologies applied in industrial environments and applications and implementing AI, DL, or ML in any of their flavors. Since the state of the art is investigated, only the latest works within the last 5 years were pursued, namely works published between 1 January 2017, and 15 April 2022 (date of literature search). Only publications in English were considered, as all primary studies are published in English.

The inclusion criteria include: 1) papers published; 2) the process of root cause analysis and defect-cause identification in industry; 3) ML-based algorithms, including both traditional ML and DL techniques incorporated for defect-cause analysis; 4) ML methods for RCA in smart manufacturing.

2.2.4 Exclusion criteria

To reduce the number of articles for investigation and keep this survey within scope, several exclusion criteria were set. First, articles published before 2017 were not considered, as they do not reflect the current state of the art. Furthermore, RCA methodologies had to be explicitly based on AI/ML methods; otherwise, they were considered irrelevant. Material of questionable quality was also excluded; such material includes websites and online material, student theses, book chapters, editorials, commentaries, non-original research articles, protocols, meta-analyses, etc., as well as all non-peer-reviewed content. Journal and conference reviews were omitted as well. Finally, articles published in languages other than English were eliminated by this search.
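The acceptance/rejection rules can be read as a simple record filter. The sketch below is one possible encoding in Python, not the authors' screening tool: the `Record` fields, the document-type labels, and the sample records are hypothetical, while the date window is the one stated in the inclusion criteria.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    # Hypothetical metadata fields for one search result.
    title: str
    published: date
    language: str
    peer_reviewed: bool
    doc_type: str            # e.g. "article", "review", "thesis"
    uses_ml_for_rca: bool

# Publication window stated in the inclusion criteria.
START, END = date(2017, 1, 1), date(2022, 4, 15)

# Assumed labels for the excluded material types.
EXCLUDED_TYPES = {
    "website", "thesis", "book chapter", "editorial", "commentary",
    "protocol", "meta-analysis", "review",
}

def accept(record):
    """Return True only if the record passes every rule."""
    return (
        START <= record.published <= END
        and record.language == "English"
        and record.peer_reviewed
        and record.doc_type not in EXCLUDED_TYPES
        and record.uses_ml_for_rca
    )

records = [
    Record("A", date(2019, 5, 1), "English", True, "article", True),
    Record("B", date(2016, 12, 31), "English", True, "article", True),   # too old
    Record("C", date(2020, 3, 9), "English", True, "review", True),      # a review
    Record("D", date(2021, 7, 2), "English", True, "article", False),    # no ML-based RCA
]
print([r.title for r in records if accept(r)])
```

Encoding the rules this way keeps the screening repeatable, which is the stated purpose of documenting the protocol.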

2.3 Literature collection

After a thorough review of all mined articles, only those fulfilling the established criteria in the industrial or manufacturing domain were considered. Following the PRISMA method, only those articles that were explicit about the subject of this short review were retained. The overall search process is graphically illustrated in Figure 2 .

FIGURE 2 . Research screening process.

Finally, 30 research papers were selected for further analysis. The selected articles are listed in Table 2 , along with the year of publication, the method employed, the scope of the study, and the industrial process in which the methodology is applied.

TABLE 2 . Collection of reviewed articles.

The review for AI-based RCA was divided into two families according to the applied models’ specifications.

The first family of AI methods incorporated for automatic RCA is based on probabilistic models, which comprise several techniques, each one having different implementations and certain performance implications. In this category, the main representatives are the Bayesian and Hybrid Bayesian networks. A Diagnostic Hybrid Bayesian Network is built in ( Chigurupati and Lassar, 2017 ) to model the cause-effect relationship between the degradation parameters (cause) and failure modes (effect) that occur in order to capture the cause-symptom relationship within the examined hardware system. The required step of assigning conditional probabilities for building up the Bayesian network topology is accomplished with the deployment of the linearly varying Weibull and Lognormal distributions. This model was applied in two real-life field use cases concerning a small batch of hardware modules. The utilization of a Bayesian Network is also discussed in ( Lokrantz et al., 2018 ) as part of the proposed framework for automatic root cause analysis and failure diagnostics in two simulated manufacturing processes, which consist of three and five process steps. To build the Bayesian network, various algorithms were utilized for structure learning, parameter learning, and inference. Regarding model training on historical data, the inference is conducted on the causal nodes, whereas the root causes of possible new process failures were determined. Finally, the result of inference was given in the form of conditional probabilities of the desired variables. The next use case involves the determination of possible causes in the manufacturing process of a bottle opener, termed the “Lion’s Jaw” ( Brundage et al., 2017 ). This article introduces a framework for the formal, systematic manufacturing diagnosis of problems arising in manufacturing systems. A Bayesian network was selected because it models the cause-effect (causal) relationship between nodes, increasing the system’s accuracy. 
The required probabilities for model training were obtained from the Simio simulation model ( simi.com ).
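To illustrate how inference yields conditional probabilities over candidate root causes, the sketch below applies Bayes' rule to a single observed symptom. It is a deliberately reduced, one-layer stand-in for a Bayesian network (no structure learning, mutually exclusive causes), and every cause name and probability is a hypothetical value, not data from the cited studies.

```python
def posterior(priors, likelihoods):
    """P(cause | symptom) for mutually exclusive causes via Bayes' rule."""
    joint = {cause: priors[cause] * likelihoods[cause] for cause in priors}
    evidence = sum(joint.values())          # P(symptom)
    return {cause: p / evidence for cause, p in joint.items()}

# Hypothetical prior probability of each cause.
priors = {"tool wear": 0.10, "miscalibration": 0.05, "no fault": 0.85}

# Hypothetical P(symptom observed | cause).
likelihoods = {"tool wear": 0.70, "miscalibration": 0.90, "no fault": 0.02}

for cause, p in sorted(posterior(priors, likelihoods).items(),
                       key=lambda item: item[1], reverse=True):
    print(f"{cause:15s} {p:.3f}")
```

A full Bayesian network generalizes this calculation to many interdependent nodes, with the conditional probability tables learned from historical or simulated data.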

The second family of methods reported in this review comprises ML and Artificial Neural Networks, which are referred to as deterministic models since there is no involvement of randomness in calculating the output state of the model. In this regard, an ANN classifier tuned with an intelligent Genetic Algorithm (GA) is proposed in ( Arias Velásquez and Mejía Lara, 2020 ) to improve the root cause analysis and diagnosis of faults in power transformers. Moreover, the authors in ( Ma et al., 2021 ) develop a big-data-driven root cause analysis system utilizing ML techniques. More specifically, they apply K-Nearest Neighbor (K-nn) and Neural Network (NN) classifiers to improve the performance of RCA in their effort to enhance product quality performance and reduce quality risk. The proposed framework comprises three distinct modules: Problem Identification (PI), Root Cause Identification (RCI), and Permanent Corrective Action. In the RCI Module, a supervised ML method is deployed to detect possible root causes of the defined quality problem. Then, the Multi-Layer Perceptron (MLP) model is employed to define quality problems and identify the root causes of the quality problems.

The contribution of another NN for industrial root cause diagnosis is presented in ( Chen et al., 2020 ). The Sparse Causal Residual Neural Network (SCRNN) model is a ML-based method which seeks to figure out the causal relationships and causality lags between multiple variables. It seeks to predict the future state of a target variable using as input the previous state of the multiple time series, behaving in a regressive way, thus directly determining the causal relationship between variables after optimization. SCRNN comprises two modules: Variable selection and Fitting. In the second module, the fault variables are determined, and then the RCA follows by deploying the SCRNN, ending up in the isolation and recovery of the faults.

In the field of manufacturing industry, another ML model was built, which deployed several anomaly detection techniques for improving product quality in two assembly lines ( Abdelrahman and Keikhosrokiani, 2020 ). These techniques include Histogram-Based Outlier Score (HBOS), IForest, K-nn, Cluster-Based Local Outlier Factor (CBLOF), One Class Support Vector Machine (OCSVM), Local Outlier Factor (LOF), and Angular-Based Outlier Detector (ABOD). Their behaviour was assessed through the application of two performance metrics. Among these models, K-nn and ABOD showed the highest performance. The authors performed an RCA using the Pareto chart to identify those variables that cause the anomalies. Also, 2 ML algorithms, namely Random Forest (RF) and Support Vector Machine (SVM) were deployed in ( Sarkar et al., 2020 ) to accurately classify accident reports in a steel plant, using text classification approaches and evaluate their usefulness. The proposed RCA implementation aimed to find hidden causal factors that would help the steel plant company to take proper precautionary measures to minimize injuries, as RCA provides a much deeper insight into the root causes behind the incidents. In the case of RF, such a model was built in ( Steurtewagen and Van den Poel, 2019 ) and further compared with an ANN to determine potential root causes of machinery breakdowns and, more specifically, to identify possible breakage points of compressor units that constitute the examined case study. RCA was performed to implement predictive modelling as well as to accurately predict compressor behaviour based on sensor data. Two more approaches based on RF models were developed in ( Gonzalez et al., 2017 ; Berges et al., 2021 ) on two different case studies. 
The first deals with the identification of root causes of errors that happen in a network and is based on a historical dataset of events, while the second refers to the detection of the signals triggering defect occurrence in the semiconductor industry of automotive products.

Isolation Forest (IF) is another ML approach, and it is proposed in ( Carletti et al., 2019 ) for the task of Anomaly Detection. Specifically, the IF algorithm is involved in the Depth-based Isolation Forest Feature Importance (DIFFI) framework, which is proposed for defining and evaluating feature importance in industrial scenarios and further enables simple RCA. IF is part of the isolation procedure, which defines a tree-like model of decisions, called an Isolation Tree, in which each node is linked with a variable, and its children are determined based on a splitting value. According to it, a feature is defined as important for anomaly detection when it can isolate samples, meaning that it can induce isolation of outliers at small depths and does not contribute to the isolation of inliers. RCA can then be performed based on features that are marked as the most relevant by the proposed approach.
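The isolation idea can be sketched in a few lines of standard-library Python: a sample is split off by random feature/threshold choices, and samples that become isolated at small depths are treated as anomalous. This is a simplified illustration of isolation depth only (no subsampling, no depth normalization), not the DIFFI framework itself; the data set, the depth limit, and the number of trees are assumptions.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=12):
    """Depth at which `point` is separated from `data` by random splits."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    feature = rng.randrange(len(point))             # random feature
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:                                    # cannot split on this feature
        return isolation_depth(point, data, rng, depth + 1, max_depth)
    split = rng.uniform(lo, hi)                     # random splitting value
    same_side = [row for row in data
                 if (row[feature] < split) == (point[feature] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth of `point` over many random trees."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Hypothetical data: 50 inliers in the unit square plus one sample
# that is anomalous only in its second feature.
gen = random.Random(42)
inliers = [(gen.random(), gen.random()) for _ in range(50)]
outlier = (0.5, 10.0)
data = inliers + [outlier]

print("outlier:", mean_depth(outlier, data))
print("inlier: ", mean_depth(inliers[0], data))
```

Recording which feature produced the split that isolates an outlier at a small depth is the intuition behind attributing importance to features, which in turn points to candidate root causes.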

Among the frameworks that belong in the same ML and ANN family of methods, a scheme that consists of a moving window using kernel principal component analysis (KPCA) and an information geometric causal inference (IGCI) is reported in ( Sun et al., 2021 ) and concerns the adaptive fault detection and RCA. Another method for fault root diagnosis is based on Recurrent Neural Network (RNN) and Granger Causality (GC), as proposed in ( Shen et al., 2021 ). The framework comprises three steps. First, a Principal Component Analysis is performed to detect faults. Then, Dynamic Time Warping (DTW) is used to group fault candidate variables based on similarity and perform causality testing. Lastly, the combination of RNN and GC models is used to locate the root cause of faults. In addition, a new methodology for adaptive anomaly detection and RCA was explored in ( Steenwinckel et al., 2021 ), using ML along with expert knowledge. The developed Fused-AI interpretable Anomaly Generation System (FLAGS) framework combines the advantages of both data- and knowledge-driven techniques towards optimizing anomaly detection, fault recognition, and RCA, while providing interpretable causes for the occurred anomalies. The proposed methodology was tested using a predictive maintenance case in the railway domain. This method seems to reduce downtime and provide more insight into frequently occurring problems while it gives the operator a new tool to investigate possible errors in the system.

A ML-guided methodology concerning tree-based models (Decision Tree, RF, and XGBoost) for RCA is demonstrated in ( Huang and Li, 2021 ) to identify influencing production parameters for repeatability improvement and quality evaluation of additive manufacturing printed parts of the laser powder bed fusion (L-PBF) technology. An unsupervised RCA method deploying a decision-tree model is also demonstrated in ( Pan et al., 2020 ), which is combined with frequent-pattern mining to cluster the data. Two case studies from the industry are involved, adopting real-world test data from network systems. In ( Wasfi et al., 2019 ), a Decision Tree (DT) algorithm along with a Gradient Boosting (GB) model were selected to implement pattern recognition algorithms, which target the recognition of those transmission nodes characterized by unreliable data. On the other hand, the GB model was proposed in ( Tiensuu et al., 2020 ) to find root causes behind the center line deviation of the steel strips. Using feature extraction and domain knowledge, the authors performed data reduction and new features construction to train the GB model. Their case study showed a correlation of errors between a former procedure (the hot rolling process) and a latter procedure (the RAP-line process).
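Tree-based models rank production parameters by how well splits on them separate good parts from defective ones. The sketch below uses a single-split (decision stump) Gini impurity reduction as a simplified stand-in for the importance scores of Decision Tree, RF, or XGBoost models; the parameter names and measurements are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = defective)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_gain(values, labels):
    """Largest impurity reduction achievable with one threshold split."""
    parent = gini(labels)
    best = 0.0
    for threshold in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        best = max(best, parent - weighted)
    return best

# Hypothetical process log: six printed parts, two parameters each.
parameters = {
    "laser_power": [180, 190, 200, 210, 220, 230],
    "scan_speed": [900, 1100, 950, 1050, 1000, 1150],
}
defective = [1, 1, 1, 0, 0, 0]

ranking = sorted(
    ((name, best_split_gain(values, defective)) for name, values in parameters.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, gain in ranking:
    print(f"{name:12s} {gain:.3f}")
```

Ensemble models aggregate such reductions over many trees and splits, which makes the resulting ranking more robust than a single stump.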

In addition to individual ML methods, combinations of such methods are also reported in the related literature. For example, a NN ensemble technique was developed in ( Diren et al., 2019 ) to determine the root cause of uncontrolled situations in a Multivariate Manufacturing Process in the automotive sector. Five different root causes were identified in the process of painting seats, door panels, and bumper modules, paying attention to surface quality and fluidity. In ( Pan et al., 2021a ), another AI method based on ensemble learning is built to facilitate transfer learning in order to select the valuable samples from a source product similar to the target product. The examined case studies refer to two industry designs. The aim of the proposed model is to improve the RCA accuracy on the target product.

Apart from the two AI-based families of methodologies, as reported above, there is one more group of models that has a significant contribution to RCA, demonstrating notable performance. This group includes the DL and CNNs models, which are developed and applied in certain case studies as presented in the following lines. More specifically, the authors in ( Crocco and O’Hern, 2018 ) perform a statistical RCA using CNNs for manufacturing quality improvement of an image sensor array. They investigate defects classification in an image sensor product to their corresponding origin. Root cause failure analysis was performed for all pixels exhibiting failures. In another study, a novel process for RCA utilizing unsupervised machine learning techniques for clustering and a CNN deep learning network for classification is proposed ( Weber et al., 2021 ). The use case concerns a known root-cause for specific defect patterns in wafer maps. For automatic defect identification, an end-to-end CNN architecture is also proposed in ( Xie et al., 2021 ), termed fusion feature CNN (FFCNN), consisting of three modules: feature extraction, feature fusion and decision-making. This intelligent machine-vision-based system was developed to detect surface defects on magnetic tiles during the production stage. AlexNet, VGG-16, and Resnet-50 were investigated for the development of the appropriate network. Another CNN model, namely the BiLSTM-CNN classifier, was built in ( Javanbakht et al., 2022 ) to analyze alarm data of the Tennessee Eastman chemical process. After its training, the neural network was used for online fault detection, identifying the root cause of the alarms by the first five alarms of each fault scenario. Its structure includes six layers; input layer, 1D-CNN, BiLSTM, Self-Attention, Dense layer, and output layer.

In an effort to enhance the detection rate and automatically interpret the cause of an anomaly, the author in ( Steenwinckel et al., 2018 ) adds prior experts’ knowledge into ML systems. One such technique is the Relational Graph Convolutional Network (RGCN), which operates on realistic knowledge graphs, fusing both ML and semantics to improve anomaly detection together with the ability to identify root causes inside a stream of data accurately. In another study, a Graph Convolutional Neural Network (GCNN) is also utilized as a part of the proposed model termed Process Estimator Neural Network (PEN), which was developed to tackle the non-linear issue of the state-sparse model ( Leonhardt et al., 2021 ). PEN is actually a NN that uses a single graph convolution layer followed by two fully connected layers and constitutes a novel RCA methodology targeting modern multistage assembly lines for increasing product quality and implementing zero-defect manufacturing.

In order to perform root cause detection, the authors in ( Shah et al., 2018a ) used an RNN to extract two types of causal relationships (interdependencies and lagged dependencies) among the time-series of the examined system. They used dynamic dependency graphs that have been extracted from multivariate time series data.
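The notion of a lagged dependency can be illustrated without a neural network. The sketch below swaps in plain lagged Pearson correlation as a simple stand-in for the RNN-based extraction used in the cited study; the signal names and values are hypothetical, and a high correlation at some lag is only a hint of dependency, not proof of causation.

```python
# Illustrative only: lagged correlation as a simple stand-in for the
# RNN-based dependency extraction described above. Data are hypothetical.
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def best_lag(cause, effect, max_lag):
    """Return (lag, correlation) for the lag at which effect[t] best tracks cause[t - lag]."""
    results = []
    for lag in range(max_lag + 1):
        x = cause[: len(cause) - lag]   # drop the last `lag` samples of the candidate cause
        y = effect[lag:]                # drop the first `lag` samples of the effect
        results.append((lag, pearson(x, y)))
    return max(results, key=lambda r: abs(r[1]))

valve = [0, 1, 0, 0, 3, 0, 2, 0, 0, 1, 0, 0]
pressure = [0, 0] + valve[:-2]   # pressure repeats the valve signal two steps later
lag, corr = best_lag(valve, pressure, max_lag=4)
```

Running the same search over every ordered pair of signals would yield one lagged edge per pair, which is roughly the kind of structure a dependency graph encodes.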

A CNN is also utilized in the automatic Segmentation of Cells and Defect Detection (SCDD) system, proposed in ( Lin et al., 2021 ), that aims to visually inspect defects in Electroluminescence (EL) images of single-crystalline silicon solar panels in photovoltaic (PV) industries. ResNet50 was utilized as a classifier, while YOLOv4 contributed as a detector for the panel-based defect detection. By applying cutting-edge deep CNNs, this approach achieves highly accurate defect detection rates from limited training samples of cell images. Finally, two ML models, termed STR (Structure transfer) and SER (Structure Expansion Reduction), and an ensemble (MIX) model were proposed to conduct a failure analysis through defect detection on nanoscale field-effect transistors (FET) of the semiconductor industry ( Pan et al., 2021b ). These ML models were trained on the same two defect datasets (FinFET and GAA-FET), providing notable accuracy in the identification of failures of the devices, guiding the acceleration of the production process.

The taxonomy of employed ML-based models is shown in Figure 3 .

FIGURE 3 . Taxonomy of employed AI-based models for RCA.

To sum up, detecting the root causes of defective parts in the production process is a highly demanding task in the manufacturing industry and requires extensive expert knowledge to perform the analyses. However, this entails high costs and offers low flexibility ( Mueller et al., 2018 ). In this direction, ML-based techniques can model vast amounts of process data empirically, contributing to automated root cause analysis while reducing costs and the expert knowledge required. In contrast to manual RCA, which previously also relied on predefined root causes as training, ML-based algorithms can analyze complex data of different sources and types, providing an automated way to perform root cause analysis. The abovementioned literature review thus brings together a number of ML methods that have been exploited toward automatic RCA within smart manufacturing.

4 Discussion

4.1 Findings

Through the current systematic review, valuable outcomes have been extracted, as illustrated in the following graphs, and further analyzed, evaluated and discussed, so that certain insights are elicited, highlighting the contribution of ML methods for RCA in smart manufacturing. To begin with, Figure 4 shows the distribution of publications during the investigated period. There is an evident increase in the number of publications, reflecting the growing interest in AI/ML-based RCA. As for the limited number of publications in 2022, this is due to the fact that this review was conducted at the beginning of 2022. However, an increasing tendency in the number of publications is anticipated.

FIGURE 4 . The distribution of publications for 2017-2022.

The literature analysis regarding the utilization of various AI technologies indicates that deterministic methods (ML-, DL-, and NN-based) are the most popular choice ( Figure 5 ), with a 90% share against the family of probabilistic models. In particular, ML methods and their combinations are by far the most popular choice (60%), followed by DL and CNNs (30%), whereas probabilistic methods (Bayesian and their Hybrid modules) are the least employed methodologies for RCA, with only 10%. The major categories (ML, DL and CNNs) are therefore present in almost all case studies (90%) regarding AI-based RCA in smart manufacturing. This preference is attributed to the efficient learning capacity and performance of ANNs and to the adequate performance of traditional ML technologies with smaller amounts of input data.

FIGURE 5 . Categorization of AI models employed for RCA.

The distribution of AI-based model usage shown in Figure 5 is further quantified in Figure 6 . CNNs are the most popular architecture (23%), as reported in the reviewed literature. ANNs and Tree/Forest-based methods follow with 20% each, while the third most popular architecture is ML with 17%. Clustering is also well preferred (13%), either in its traditional forms (K-NN, etc.) or combined with some AI architectures. Bayesian methods exhibit the same popularity. On the other hand, SVM demonstrates reduced popularity (7%), whereas RNNs, ResNets and GA seem to be the least favored methods (3%). The rest of the examined studies are scattered among other less popular methodologies. This categorization takes into account that some of the works investigated employ more than one AI method; thus, the overall sum of the individual ratios exceeds 100%.
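The remark that the ratios sum to more than 100% follows directly from counting each study once per method it uses. The short sketch below first adds up the shares quoted above (assuming the 3% figure applies to each of RNNs, ResNets and GA, and that Bayesian matches Clustering at 13%), and then reproduces the effect on a small set of hypothetical studies that are not taken from the review.

```python
from collections import Counter

# Shares quoted in the text (percent of reviewed studies). Assumptions:
# Bayesian equals Clustering (13%), and 3% applies to RNN, ResNet and GA each.
shares = {"CNN": 23, "ANN": 20, "Tree/Forest": 20, "ML": 17, "Clustering": 13,
          "Bayesian": 13, "SVM": 7, "RNN": 3, "ResNet": 3, "GA": 3}
total = sum(shares.values())   # a lower bound: less popular methods are not listed

# Hypothetical toy data: each study lists every method it employs.
studies = [{"CNN", "Clustering"}, {"ANN"}, {"CNN"}, {"Bayesian", "ANN"}]
counts = Counter(method for study in studies for method in study)
toy_shares = {method: 100 * n / len(studies) for method, n in counts.items()}
toy_total = sum(toy_shares.values())
```

Here `total` comes to 122 and `toy_total` to 150: in both cases the excess over 100 reflects studies that are counted under more than one method.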

FIGURE 6 . Popularity of AI methodologies.

It is noteworthy that AI methods have been exploited in a wide variety of industrial sectors. Figure 7 shows the distribution in the percentage of all the industrial applications involved, considering the reviewed studies. The manufacturing sector is the leader in the use of AI methods as most reviewed papers (33%) concern some sort of manufacturing process. Quality Control/Assessment, Tennessee Eastman Process, and Data Analytics are equally (13%) employing AI techniques. The semiconductor industry follows (10%) along with Imaging applications (7%). One paper is related to Incident Analysis, targeting accident prevention. The rest of the papers (27%) are scattered among various other industrial sectors. It needs to be mentioned again, that the sum of the partial ratios exceeds 100%, as in some papers, more than one industrial sector is involved.

FIGURE 7 . Usage of AI methods in various industrial sectors.

Furthermore, the applications that currently benefit most from the ML-based RCA are summarized in Figure 8 . The majority of the ML-based RCA methods seem to be most embraced by Manufacturing (50%) and Quality Control (27%) applications. Manufacturing is further analyzed and categorized into general product fabrication applications (18%), semiconductor fabrication (14%), hardware and devices manufacturing (14%), and assembly line monitoring (4%).

FIGURE 8 . Industrial applications currently using ML-based RCA.

4.2 Lessons learned

This systematic review was conducted to investigate the potential of using AI/ML for RCA in smart manufacturing. An increasing trend is demonstrated ( Figure 4 ) regarding the number of relevant publications in the literature, which proves a growing interest in this area. This review was directed by a set of research questions; the answers to these questions provided insight regarding the developments and popularity of employed methods. First, a taxonomy of AI algorithms and methods has been compiled ( Figure 3 ). This taxonomy lists the available tools, in a manner that considers the relevance among each other. That is, if a tool exhibits weaknesses, a similar one in the list may be exploited to overcome the observed limitations of the previous tool. Second, AI/ML-based RCA is not limited to a few specialized fields, but rather spans a wide range of industrial applications. The heterogeneity of industrial sectors ( Figure 7 ) reveals a growing endeavor for novel AI tools for RCA. Third, Neural Networks in their many variants seem to be the dominant technology in AI-based RCA. This is attributed to their exceptional learning capabilities, their architectural flexibility, and their extraordinary capacity to discover underlying patterns.

4.3 Challenges

Despite the successful use of AI in RCA, there are still challenges to be addressed:

1) Explainability. The behavior of AI models, especially in the process of decision-making, is often opaque to humans. This originates from the models’ high complexity and ambiguity, the growing number of data sources, and the inexplicit learning methods employed. Hence, the produced results cannot be directly and adequately explained by humans, which hinders the application of AI models to a number of critical problems and the configuration of the best decision-making process. In this direction, Explainable AI (XAI) is a promising domain that allows humans to uncover the reasoning of AI systems, in an effort to comprehend how AI models behave at certain tasks and how they arrive at a specific decision. On that basis, incorporating XAI into RCA methods would strengthen confidence in the derived decisions.

2) Quality of Training. The performance of AI algorithms in RCA strongly depends on the availability of high-quality data. However, a lack of adequate and suitable data sources, as well as data scarcity, are common issues in the training of AI models. Four papers were identified that deal with data for AI-based RCA, including big data, feature importance, data streams, etc. Thus, innovative data processing methods (e.g., data augmentation) are highly needed for efficient RCA and merit deeper investigation.

3) Standardization and Interoperability. As demonstrated in the presented graphs, there is a broad range of industrial applications that already benefit from AI-based RCA. Each operator may, of course, use their own protocols and practices. However, the establishment of standard rules and compatible interfaces and protocols would enable broad collaborations and joint efforts.

4) Data Privacy. The progress in AI and especially in DL is the result of a collaboration of multiple contributions, including open-source data offered by the community. Open-source datasets e.g., ImageNet ( Deng et al., 2009 ), played a key role in establishing AI methods and boosted their development and optimization. AI-based RCA can also benefit from such datasets. However, the nature of RCA requires datasets from different production stages that could potentially expose sensitive information about the industrial provider. Thus, obtaining a real industrial benchmark dataset to further evaluate the effectiveness of AI methods for RCA is a real challenge.

5) Security. The industrial environment requires secure transactions. On the one hand, data security would prevent the leakage of sensitive or confidential information. On the other hand, data integrity is also necessary to prevent corruption and information loss. Such issues may be dealt with using current technologies like blockchain, but this certainly is a topic that deserves its own research.

6) New technologies. The development of AI-based methods is progressing at a fast pace and new tools appear every day. Digital twins provide a means to create a digital replica of the manufacturing line, which can be employed to tailor and optimize production parameters without affecting the running production. Digital twins may be exploited towards RCA, but this requires future research on this topic.
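For the data-integrity concern raised under Security above, the core idea behind ledger-style approaches can be sketched with a simple hash chain: each record stores the hash of its predecessor, so a later modification becomes detectable. This is a minimal illustration only, not a blockchain (there is no distribution or consensus) and not a confidentiality mechanism; the record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record

def record_hash(payload, prev_hash):
    """SHA-256 over the payload and the previous record's hash."""
    body = json.dumps({"payload": payload, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def add_record(chain, payload):
    """Append a record that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev_hash,
                  "hash": record_hash(payload, prev_hash)})

def is_intact(chain):
    """Recompute every hash; any edited payload or broken link returns False."""
    prev_hash = GENESIS
    for record in chain:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != record_hash(record["payload"], prev_hash):
            return False
        prev_hash = record["hash"]
    return True

# Hypothetical process measurements from two stations.
log = []
add_record(log, {"station": "A", "torque": 4.2})
add_record(log, {"station": "B", "torque": 3.9})
ok_before = is_intact(log)
log[0]["payload"]["torque"] = 9.9   # simulate tampering with stored data
ok_after = is_intact(log)
```

In this sketch `ok_before` is true and `ok_after` is false, because the altered measurement no longer matches its stored hash. Preventing leakage of the data would additionally require encryption and access control, which are outside the scope of the example.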

4.4 Concluding remarks

AI technologies have demonstrated remarkable efficiency in the implementation of RCA in smart manufacturing. A wide range of AI models is increasingly involved in manufacturing processes that strive for optimized production and consistent delivery of better products. To achieve that, quality inspection and smart quality control become crucial processes in the production line for detecting defects and identifying their sources. This systematic literature review aims to demonstrate the integration of various AI methods into RCA models applied in smart manufacturing, investigating their implementation methodology as well as current practices. To accomplish the goals of this work, the PRISMA method was followed, and suitable database sources, including Google Scholar and Scopus, were selected to mine scientific papers published from 2017 up to 2022. The review protocol also included the search term and the inclusion and exclusion criteria set by the authors. The list of retrieved papers contains the authors, the year of publication, the AI method employed, the scope of the AI model’s implementation, and the industrial process in which the AI method was applied for defect source identification. The results have shown an increasing trend in the number of publications that report on the contribution of AI methods to RCA in smart manufacturing. The first goal, regarding the availability of AI tools in current smart manufacturing, was accordingly answered. Based on our study, several deterministic and probabilistic AI technologies have been applied for RCA in industry to identify defect sources at an early stage. Such technologies include ML-, DL- and NN-based models as well as Bayesian and their Hybrid modules. In line with the second goal of this review, a variety of industrial processes were also reported. These were related to the manufacturing sector, semiconductor industry, data analysis, quality control and several others. Furthermore, this study revealed the popularity of the AI technologies that have been employed for RCA in smart manufacturing so far, which was the third research question to be answered. The findings showed that CNNs are the most popular architecture, followed by ANNs and Tree/Forest-based methods, while ML, Clustering, and Bayesian methods contribute a smaller percentage to RCA in Industry 4.0. SVM, RNNs, ResNets, and GA seem to be the least favored methods in the examined field. In addition, this study pinpoints a number of challenges that need to be considered in the future implementation of AI methodologies in RCA. Among them, explainability, quality of training, interoperability, privacy, and security play a significant role when AI-based technologies are exploited towards RCA within smart manufacturing, and thus they need to be further investigated.

To sum up, many AI technologies have already been successfully incorporated within the framework for RCA toward zero-defect manufacturing. In fact, there is a growing interest in further development, as implied by the number of publications per year. To accomplish an efficient literature survey, a set of research questions were initially posed. Then, a systematic literature review protocol was properly chosen to answer the posed queries. The conducted survey was focused on the most popular online, well-respected, and highly available databases, covering a wide range of disciplines, from the theoretical to the applied. Taking a close look at the produced outcomes, it is noted that ML and DL have shown a major contribution to RCA for smart industry applications. Seeking further, it seems that Neural Networks in their many variants (ANNs, CNNs, RNNs, etc. ) are the current trend for automatic RCA in smart manufacturing using AI models. However, despite the current success, there are still open issues regarding explainability, data quality, standardization, data security, integrity, etc. These challenges deserve papers of their own and will be dealt with in future extensions of this work.

Author contributions

KP laid the foundation and coordinated with the team to bring out the paper’s content. His significant contributions are establishing the Introduction, AI-based models, systematic literature review, Results, and Discussion of this review paper. AR and TT worked on the systematic literature review and wrote the Results and Discussion. EP, TT, and KP contributed significantly to the sections Discussion of Results and Conclusion. EP made all the edits, prepared the referencing, abstract, and conclusion, handled communication with the group, and brought the manuscript into its current shape. TT contributed greatly during the conceptualization phase, to articulating the contents of the manuscript, and to performing the revisions. ND, DT, and GM contributed to editing. All authors were involved in discussing the contents, design, and flow of the paper. All authors contributed to the article and approved the submitted version.

Funding

This work has been supported by the European Commission through project OPTIMAI, funded by the European Union (H2020-NMBP-TR-IND-2020-singlestage, Topic: DT-FOF-11-2020, GA 958264).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Author disclaimer

The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the European Commission.

References

Abdelrahman, O., and Keikhosrokiani, P. (2020). Assembly line anomaly detection and root cause analysis using machine learning. IEEE Access 8, 189661–189672. doi:10.1109/access.2020.3029826

Arias Velásquez, R. M., and Mejía Lara, J. V. (2020). Root cause analysis improved with machine learning for failure analysis in power transformers. Eng. Fail. Anal. 115, 104684. doi:10.1016/j.engfailanal.2020.104684

Berges, C., Bird, J., Shroff, M. D., Rongen, R., and Smith, C. (2021). “Data analytics and machine learning: Root-cause problem-solving approach to prevent yield loss and quality issues in semiconductor industry for automotive applications,” in 2021 IEEE International Symposium on the Physical and Failure Analysis of Integrated Circuits (IPFA) , Singapore, Singapore ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9617238/ (Accessed on April 18, 2022). doi:10.1109/IPFA53173.2021.9617238

Brundage, M. P., Kulvatunyou, B., Ademujimi, T., and Rakshith, B. (2017). “Smart manufacturing through a framework for a knowledge-based diagnosis system.” in Manufacturing Equipment and Systems , 3. Los Angeles, California, USA : American Society of Mechanical Engineers ASME . Available at: https://asmedigitalcollection.asme.org/MSEC/proceedings/MSEC2017/50749/Los%20Angeles,%20California,%20USA/269585 .

Carletti, M., Masiero, C., Beghi, A., and Susto, G. A. (2019). “Explainable machine learning in industry 4.0: Evaluating feature importance in anomaly detection to enable root cause analysis,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) , Bari, Italy ( IEEE ). Available at: https://ieeexplore.ieee.org/document/8913901/ (Accessed on Feb 18, 2022). doi:10.1109/SMC.2019.8913901

Chen, J., Zhao, C., and Sun, Y. (2020). “Sparse causal residual neural network for linear and nonlinear concurrent causal inference and root cause diagnosis,” in 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV) , Shenzhen, China ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9305508/ (Accessed on Feb 18, 2022). doi:10.1109/ICARCV50220.2020.9305508

Chigurupati, A., and Lassar, N. (2017). “Root cause analysis using artificial intelligence,” in 2017 Annual Reliability and Maintainability Symposium (RAMS) , Orlando, FL, USA , 23-26 January 2017 ( IEEE ). Available at: http://ieeexplore.ieee.org/document/7889651/ (Accessed on Feb 16, 2022). doi:10.1109/RAM.2017.7889651

Crocco, J. D., and O’Hern, J. R. P. R. (2018). Manufacturing quality improvement through statistical root cause analysis using convolution neural networks. US 2018/0293722 A. Available at: https://patentimages.storage.googleapis.com/ae/88/9b/7085b4580621de/US20180293722A1.pdf (Accessed on Oct 11, 2022).

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., and Fei-Fei, L. (2009). “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition , Miami, FL, USA ( IEEE ), 248–255. doi:10.1109/CVPR.2009.5206848

Diren, D. D., Boran, S., Selvi, I. H., and Hatipoglu, T. (2019). “Root cause detection with an ensemble machine learning approach in the multivariate manufacturing process,” in Industrial engineering in the big data era . Editors F. Calisir, E. Cevikcan, and H. Camgoz Akdag (Cham: Springer International Publishing ), 163–174. (Lecture Notes in Management and Industrial Engineering). Available at: http://link.springer.com/10.1007/978-3-030-03317-0_14 (Accessed on April 18, 2022).

Dreyfus, P. A., Psarommatis, F., May, G., and Kiritsis, D. (2022). Virtual metrology as an approach for product quality estimation in industry 4.0: A systematic review and integrative conceptual framework. Int. J. Prod. Res. 60 (2), 742–765. doi:10.1080/00207543.2021.1976433

e Oliveira, E., Miguéis, V. L., and Borges, J. L. (2022). Automatic root cause analysis in manufacturing: An overview & conceptualization. J. Intell. Manuf . Available at: https://link.springer.com/10.1007/s10845-022-01914-3 (Accessed on April 14, 2022). doi:10.1007/s10845-022-01914-3

Gonzalez, J. M. N., Jimenez, J. A., Lopez, J. C. D., and Parada, G. H. A. (2017). Root cause analysis of network failures using machine learning and summarization techniques. IEEE Commun. Mag. 55 (9), 126–131. doi:10.1109/mcom.2017.1700066

Huang, D. J., and Li, H. (2021). A machine learning guided investigation of quality repeatability in metal laser powder bed fusion additive manufacturing. Mater. Des. 203, 109606. doi:10.1016/j.matdes.2021.109606

Javanbakht, N., Neshastegaran, A., and Izadi, I. (2022). Alarm-based root cause analysis in industrial processes using deep learning. Available at: https://arxiv.org/abs/2203.11321 (Accessed on April 18, 2022).

Jayswal, A., Li, X., Zanwar, A., Lou, H. H., and Huang, Y. (2011). A sustainability root cause analysis methodology and its application. Comput. Chem. Eng. 35 (12), 2786–2798. doi:10.1016/j.compchemeng.2011.05.004

Leonhardt, V., Claus, F., and Garth, C. (2021). Pen: Process Estimator neural Network for root cause analysis using graph convolution. J. Manuf. Syst. 62, 886–902. doi:10.1016/j.jmsy.2021.11.008

Lin, H. H., Dandage, H. K., Lin, K. M., Lin, Y. T., and Chen, Y. J. (2021). Efficient cell segmentation from electroluminescent images of single-crystalline silicon photovoltaic modules and cell-based defect identification using deep learning with pseudo-colorization. Sensors 21 (13), 4292. doi:10.3390/s21134292

Lokrantz, A., Gustavsson, E., and Jirstrand, M. (2018). Root cause analysis of failures and quality deviations in manufacturing using machine learning. Procedia CIRP 72, 1057–1062. doi:10.1016/j.procir.2018.03.229

Ma, Q., Li, H., and Thorstenson, A. (2021). A big data-driven root cause analysis system: Application of Machine Learning in quality problem solving. Comput. Industrial Eng. 160, 107580. doi:10.1016/j.cie.2021.107580

Moher, D., Liberati, A., Tetzlaff, J., and Altman, D. G. (2010). Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Int. J. Surg. 8 (5), 336–341. doi:10.1016/j.ijsu.2010.02.007

Mueller, T., Greipel, J., Weber, T., and Schmitt, R. H. (2018). Automated root cause analysis of non-conformities with machine learning algorithms. J. Mach. Eng. 18, 60–72. doi:10.5604/01.3001.0012.7633

Murugaiah, U., Benjamin, S. J., Marathamuthu, M. S., and Muthaiyah, S. (2010). Scrap loss reduction using the 5-whys analysis. Int. J. Qual. Reliab. Manag. 27 (5), 527–540. doi:10.1108/02656711011043517

OPTIMAI (2021–2023). OPTIMAI project. Available from: https://optimai.eu/ . GA 958264.

Pan, J., Low, K. L., Ghosh, J., Jayavelu, S., Ferdaus, M. M., and Lim, S. Y. (2021). Transfer learning-based artificial intelligence-integrated physical modeling to enable failure analysis for 3 nanometer and smaller silicon-based CMOS transistors. ACS Appl. Nano Mat. 4 (7), 6903–6915. doi:10.1021/acsanm.1c00960

Pan, R., Li, X., and Chakrabarty, K. (2021). “Unsupervised root-cause analysis with transfer learning for integrated systems,” in 2021 IEEE 39th VLSI Test Symposium (VTS) , San Diego, CA, USA ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9441030/ (Accessed on April 18, 2022). doi:10.1109/VTS50974.2021.9441030

Pan, R., Zhang, Z., Li, X., Chakrabarty, K., and Gu, X. (2020). “Unsupervised root-cause analysis for integrated systems,” in 2020 IEEE International Test Conference (ITC) , Washington, DC, USA ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9325268/ (Accessed on April 18, 2022). doi:10.1109/ITC44778.2020.9325268

Papageorgiou, E., Theodosiou, T., Papageorgiou, K., Casanovas, P., Charalampous, P., and Dimitriou, N. (2021). State of the art survey. Available from: https://optimai.eu/wp-content/uploads/2021/07/OPTIMAI-D2.3-State-of-Art_V2.0_FINAL_SUBMISSION.pdf .

Psarommatis, F., May, G., Dreyfus, P. A., and Kiritsis, D. (2020). Zero defect manufacturing: State-of-the-art review, shortcomings and future directions in research. Int. J. Prod. Res. 58 (1), 1–17. doi:10.1080/00207543.2019.1605228

Psarommatis, F., Prouvost, S., May, G., and Kiritsis, D. (2020). Product quality improvement policies in industry 4.0: Characteristics, enabling factors, barriers, and evolution toward zero defect manufacturing. Front. Comput. Sci. 2, 26. doi:10.3389/fcomp.2020.00026

Psarommatis, F., Sousa, J., Mendonça, J. P., and Kiritsis, D. (2022). Zero-defect manufacturing the approach for higher manufacturing sustainability in the era of industry 4.0: A position paper. Int. J. Prod. Res. 60 (1), 73–91. doi:10.1080/00207543.2021.1987551

Sarkar, S., Ejaz, N., Kumar, M., and Maiti, J. (2020). “Root cause analysis of incidents using text clustering and classification algorithms,” in Proceedings of ICETIT 2019. Lecture notes in electrical engineering . Editors P. K. Singh, B. K. Panigrahi, N. K. Suryadevara, S. K. Sharma, and A. P. Singh (Cham: Springer International Publishing ), 605, 707–718. Available at: http://link.springer.com/10.1007/978-3-030-30577-2_63 (Accessed on April 18, 2022).

Shah, S. Y., Dang, X. H., and Zerfos, P. (2018). “Root cause detection using dynamic dependency graphs from time series data,” in 2018 IEEE International Conference on Big Data (Big Data) , Seattle, WA, USA ( IEEE ), 1998–2003. doi:10.1109/BigData.2018.8622059

Shah, S. Y., Dang, X. H., and Zerfos, P. (2018). “Root cause detection using dynamic dependency graphs from time series data,” in 2018 IEEE International Conference on Big Data (Big Data) , Seattle, WA, USA ( IEEE ). Available at: https://ieeexplore.ieee.org/document/8622059/ (Accessed on April 4, 2022). doi:10.1109/BigData.2018.8622059

Shen, G., Wang, P., Hu, K., and Ye, Q. (2021). “fault root cause diagnosis method based on recurrent neural network and granger causality,” in 2021 CAA Symposium on Fault Detection, Supervision, and Safety for Technical Processes (SAFEPROCESS) , Chengdu, China ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9693579/ . doi:10.1109/SAFEPROCESS52771.2021

Solé, M., Muntés-Mulero, V., Rana, A. I., and Estrada, G. (2017). Survey on models and techniques for root-cause analysis. Available at: https://arxiv.org/abs/1701.08546 (Accessed on April 14, 2022).

Steenwinckel, B. (2018). “Adaptive anomaly detection and root cause analysis by fusing semantics and machine learning,” in The semantic web: ESWC 2018 satellite events. Lecture notes in computer science . Editors A. Gangemi, A. L. Gentile, A. G. Nuzzolese, S. Rudolph, M. Maleshkova, and H. Paulheim (Cham: Springer International Publishing ). Available at: http://link.springer.com/10.1007/978-3-319-98192-5_46 . doi:10.1007/978-3-319-98192-5_46

Steenwinckel, B., De Paepe, D., Vanden Hautte, S., Heyvaert, P., Bentefrit, M., and Moens, P. (2021). Flags: A methodology for adaptive anomaly detection and root cause analysis on sensor data streams by fusing expert knowledge with machine learning. Future Gener. Comput. Syst. 116, 30–48. doi:10.1016/j.future.2020.10.015

Steurtewagen, B., and Van den Poel, D. (2019). “Root cause analysis of compressor failure by machine learning,” in 2019 Petroleum and Chemical Industry Conference Europe (PCIC EUROPE) , Paris, France ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9011628/ .

Sun, Y., Qin, W., Zhuang, Z., and Xu, H. (2021). An adaptive fault detection and root-cause analysis scheme for complex industrial processes using moving window KPCA and information geometric causal inference. J. Intell. Manuf. 32 (7), 2007–2021. doi:10.1007/s10845-021-01752-9

Tao, F., Qi, Q., Liu, A., and Kusiak, A. (2018). Data-driven smart manufacturing. J. Manuf. Syst. 48, 157–169. doi:10.1016/j.jmsy.2018.01.006

Tiensuu, H., Tamminen, S., Haapala, O., and Röning, J. (2020). Intelligent methods for root cause analysis behind the center line deviation of the steel strip. Open Eng. 10 (1), 386–393. doi:10.1515/eng-2020-0041

Wasfi, B., Alqasimi, M., Al Kadem, M., and Alghamdi, A. (2019). “Innovative machine learning method to locate the root cause of the unreliable data coming from intelligent field equipment,” in Abu Dhabi International Petroleum Exhibition & Conference , Abu Dhabi, UAE , November 12, 2019 . Available at: https://onepetro.org/SPEADIP/proceedings/19ADIP/2-19ADIP/Abu%20Dhabi,%20UAE/216802 . doi:10.2118/197270-MS

Weber, C., Tripuramallu, A., Czerner, P., and Fathi, M. (2021). “Clustering wafer defect patterns within the semiconductor industry based on wafer Maps, using an agile unsupervised deep learning approach,” in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC) [Internet] , Melbourne, Australia ( IEEE ). Available at: https://ieeexplore.ieee.org/document/9658907/ . doi:10.1109/SMC52423.2021.9658907

Xie, L., Xiang, X., Xu, H., Wang, L., Lin, L., and Yin, G. (2021). FFCNN: A deep neural network for surface defect detection of magnetic tile. IEEE Trans. Ind. Electron. 68 (4), 3506–3516. doi:10.1109/tie.2020.2982115

Keywords: smart manufacturing (Industry 4.0), root cause analysis, defect identification, artificial intelligence, deep learning, machine learning

Citation: Papageorgiou K, Theodosiou T, Rapti A, Papageorgiou EI, Dimitriou N, Tzovaras D and Margetis G (2022) A systematic review on machine learning methods for root cause analysis towards zero-defect manufacturing. Front. Manuf. Technol. 2:972712. doi: 10.3389/fmtec.2022.972712

Received: 18 June 2022; Accepted: 14 October 2022; Published: 28 October 2022.

Copyright © 2022 Papageorgiou, Theodosiou, Rapti, Papageorgiou, Dimitriou, Tzovaras and Margetis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Elpiniki I. Papageorgiou, [email protected]

This article is part of the Research Topic

Zero Defect Manufacturing in the Era of Industry 4.0 for Achieving Sustainable and Resilient Manufacturing

COMMENTS

  1. Root Cause Analysis Examples

    The following root cause analysis example incidents demonstrate how Cause Mapping can be used to document problems and identify solutions in various industries. Select an industry on the left to view its case studies on the right. Each example has a downloadable PDF to accompany the write-up.

  2. Root Cause Analysis with 5 Whys Technique (With Examples)

    Step 1: Identify the Problem. Before diving into a 5 Whys analysis, it's crucial to clearly identify the problem or issue at hand. This step sets the stage for the entire process and ensures that the focus remains on addressing the right concern. Take the time to gather relevant data, observe patterns, and consult with team members or ...

  3. Root Cause Analysis: A Complete Guide With Examples (2023)

    There are three fundamental types of root causes: Environmental root cause. These are causes related to external factors such as moisture levels, weather, or geography. Individual root cause. These are causes related to an individual's behaviour, personal choices, ability, or circumstance. Organisational root cause.

  4. Root Cause Analysis: Methods, Tools, and Case Studies

    At its core, Root Cause Analysis operates on a principle of causality. Every problem or failure has an origin, and RCA's goal is to trace back to that origin. This involves a combination of investigative techniques, data analysis, brainstorming sessions, and sometimes even a bit of intuition. Once the root cause is pinpointed, measures are ...

  5. Challenger Explosion

    Our case studies demonstrate how root cause analysis applies to a variety of problematic scenarios. This study covers the Challenger explosion. ... For decades, conventional root cause analysis has defined a root cause as a special type of cause. Cause and effect naturally splits into multiple causal ...

  6. PDF Root cause analysis (RCA) of accidents

    Case study selection: the who. Case studies were selected to cover various industries and organizational structures ... Based on root cause analysis. Investigational (after accident). Finding II: responsibilities differ. Organizations with regulatory oversight function:

  7. Root Cause Analysis: Definition, Examples & Methods

    The first goal of root cause analysis is to discover the root cause of a problem or event. The second goal is to fully understand how to fix, compensate, or learn from any underlying issues within the root cause. The third goal is to apply what we learn from this analysis to systematically prevent future issues or to repeat successes.

  8. Root Cause Analysis: What It Is & How to Perform One

    8 Essential Steps of an Organizational Root Cause Analysis. 1. Identify Performance or Opportunity Gaps. The first step in a root cause analysis is identifying the most important performance or opportunity gaps facing your team, department, or organization. Performance gaps are the ways in which your organization falls short or fails to deliver ...

  9. Case Study: New York City Helicopter Crash

    This root cause analysis example uses information from the NTSB preliminary report. Our cause-and-effect analysis starts with a simple 1-Why Cause Map™ diagram, then builds into a 5-Why, a 15-Why and eventually a more detailed 30-Why Cause Map diagram.

  10. What Is Root Cause Analysis?

    Root cause analysis is a form of quality management, often used in organizational management, quality control, and in healthcare fields like nursing. Root cause analysis can be a helpful study tool for students, too, when used for brainstorming or memorization exercises. Table of contents.

  11. How Much of Root Cause Analysis Translates into Improved Patient Safety

    Root cause analysis (RCA) emerged in the health care field almost 20 years ago. This technique is used worldwide to understand the remote and direct factors favouring the occurrence of an avoidable adverse event (AAE) [1], and improvement of patient safety [2]. Three studies have analysed the utility and limitations of this technique [3, 4 ...

  12. PDF Root Cause Analysis in Healthcare A Case Study

    ANALYZE. Identify the mistakes, errors, and failures that directly led to the incident or failed to mitigate the consequences (Causal Factors). Effective root cause analysis for Causal Factor to find fixable root causes. Definition of a Root Cause: The absence of a best practice or knowledge that would have prevented the problem or ...

  13. What is Root Cause Analysis (RCA)?

    A root cause is defined as a factor that caused a nonconformance and should be permanently eliminated through process improvement. The root cause is the core issue, the highest-level cause, that sets in motion the entire cause-and-effect reaction that ultimately leads to the problem(s). Root cause analysis (RCA) is defined as a collective ...

  14. Root Cause Analysis Case Studies

    WEB Aruba called Sologic to help facilitate a root cause analysis. The investigation identified previously unknown technology and cultural issues, along with solutions designed not only to prevent the same problem from recurring, but to have a broader positive impact on the business. Read the case study.

  15. Conducting A Root Cause Analysis: Incident To Final Report

    Root cause analysis, also known as RCA, is the investigation process following an incident. The incident may or may not have caused harm to a person or property. Incidents may not have occurred at all, but instead, someone reported a dangerous situation. Either way, a root cause analysis ought to find completion following any form of incident.

  16. What Is a Root Cause Analysis?

    Root cause analysis (RCA) is the quality management process by which an organization searches for the root of a problem, issue or incident after it occurs. Issues and mishaps are inevitable in any organization, even in the best of circumstances. While it could be tempting to simply address symptoms of the problem as they materialize, addressing ...

  17. PDF 60 Empirical Case Studies of the Root Cause Analysis Method in

    Empirical Case Studies of the Root Cause Analysis Method in Information Security. Niclas Hellesen, Henrik Miguel Nacarino Torres, and Gaute Wangen, Norwegian University of Science and Technology, Gjøvik, Norway. Abstract: Root cause analysis is a methodology that comes

  18. A case study: the benefits and challenges of root cause analysis

    Root cause analysis (RCA) is a method of problem solving used for identifying the root cause or causes of a problem, with the purpose of mitigating the root cause to eliminate or reduce the like... A case study: the benefits and challenges of root cause analysis presented through a real world example from the rail industry: Safety and ...

  19. Root Cause Analysis and Medical Error Prevention

    Root cause analysis (RCA) is a process for identifying the causal factors underlying variations in performance. In the case of medical error, this variation in performance may result in a sentinel event.

  20. A systematic review on machine learning methods for root cause analysis

    This paper implements a literature review protocol and reports the latest advances in Root Cause Analysis (RCA) toward Zero-Defect Manufacturing (ZDM). The most recent works are reported to demonstrate the use of machine learning methodologies for root cause analysis in the manufacturing domain. ... Their case study showed a correlation of ...

  21. Root Cause Analysis (RCA) Tools & Learning Resources

    Root Cause Analysis for Beginners, Part 1 and Part 2 (Webcast) ASQ Fellow Jim Rooney walks you through the basics of root cause analysis. Quality Revolution Reduces Defects, Drives Sales Growth at 3M (PDF) Case study looking at how 3M combined lean Six Sigma and a Top-200 customer focus to improve its belt fabrication processes and reduce ...

  22. Root Cause Analysis Case Study

    Empower your team with powerful root cause analysis to quickly identify, analyze, and resolve system incidents, enhancing efficiency and minimizing impact. Unified Health View Gain a comprehensive overview of your system's health with real-time monitoring data fusion and insight extraction, enabling informed decision-making and proactive ...

  23. AEROSPACE COMPANY USES ROOT CAUSE ANALYSIS TO IMPROVE ...

    Case study: Aerospace company improves enterprise (IT) problem management. For more information on Sologic's industry-leading RCA training and investigation services and their benefits, visit www.sologic.com. Sologic provides root cause analysis (RCA) training,

  24. PDF The Importance of Root Cause Analysis During Incident Investigation

    It is important to consider all possible "what," "why," and "how" questions to discover the root cause(s) of an incident. In this case, a root cause analysis may have revealed that the root cause of the spill was a failure to have an effective mechanical integrity program (one that includes inspection and repair) that would prevent ...

  25. PDF SPOTLIGHT

    that firms' analysis of the root cause(s) has been helpful in determining the appropriate actions to remedy repeated or persistent criticisms from our inspections. The nature and extent of the root cause process will differ significantly based on a firm's size and structural complexity. Successfully performed RCA may be helpful

  26. Root cause analysis (RCA) is a systematic approach used to identify the

    Root cause analysis (RCA) is a systematic approach used to identify the underlying causes of problems or incidents within an organization. The primary goal of RCA is to prevent the recurrence of issues by addressing their root causes rather than merely treating symptoms. This executive summary provides an overview of RCA, its importance, and key steps involved.

  27. ROOT-CAUSE ANALYSIS AND SAFETY IMPROVEMENT PLAN.docx

    Safety Improvement Plan with Evidence-Based Strategies In this case, the Safety Improvement Plan will aid in reducing the likelihood of future drug mistakes. This must be established following the root cause analysis and aids in the development of many strategies. A proper analysis of root cause with appropriate safety.