Failure Modes and Effects Analysis in Operations Management

Failure Modes and Effects Analysis (FMEA) is a procedure used in operations management to identify the potential failure modes of a system, rate their severity, and determine the effect of each failure on the system (the Effects Analysis). Failure modes are the errors or defects, actual or potential, in a design, process, or item that affect the customer. FMEA also analyzes how often these errors occur and how they can be detected.

FMEA also documents the actions taken in response to failure risks, which supports continuous improvement. Ideally, FMEA is started during the process design stage and maintained throughout the product life cycle.

The aim of FMEA is to take remedial actions that reduce or eliminate failures in order of priority. It is also used to evaluate risk management priorities.

If potential failures are found in a design, the engineer can consider alternative ways of developing the product so that it meets the set standards. FMEA also provides a simple tool for prioritizing risks.

It is used in many quality systems, such as ISO/TS 16949 and QS-9000.

The FMEA process is divided into three phases, each with its own defined actions. First, some pre-work is needed so that the analysis takes past history and robustness into account. Boundary diagrams, parameter diagrams, and interface matrices can be used for the robustness analysis, because many failures are caused by noise factors and by interfaces shared with other parts or systems.

Process Steps

FMEA begins with a description of the system and its function. The engineer has to consider both intended and unintended uses of the system. Every FMEA includes a block diagram, so the next step is to create one; it outlines the process steps and the logical relations between them, and the FMEA is developed around it. A coding system for identifying the different system elements can also be created.
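The coding system for system elements can be sketched as hierarchical codes assigned to the blocks of the diagram. This is a minimal sketch: the "1.2.1"-style numbering, the `Element` type, and the desk-lamp breakdown are all hypothetical choices for illustration, not a prescribed FMEA convention.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Element:
    """One block in the system block diagram (hypothetical structure)."""
    name: str
    children: list[Element] = field(default_factory=list)


def assign_codes(element: Element, prefix: str = "1") -> dict[str, str]:
    """Give every element a hierarchical code such as '1', '1.1', '1.2.1'."""
    codes = {prefix: element.name}
    for index, child in enumerate(element.children, start=1):
        # A child's code extends its parent's code with its position.
        codes.update(assign_codes(child, f"{prefix}.{index}"))
    return codes


# Hypothetical system: a desk lamp broken down into functional blocks.
lamp = Element("Desk lamp", [
    Element("Power supply", [Element("Plug"), Element("Cable")]),
    Element("Switch"),
    Element("Bulb holder"),
])

for code, name in assign_codes(lamp).items():
    print(code, name)
```

Each failure mode recorded later can then reference the code of the element it belongs to, which keeps the worksheet traceable to the block diagram.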

Step 1 – Severity

In this step, all failure modes associated with the product's functions, and their effects, are identified. Examples of failure modes include corrosion, deformation, and electrical short-circuits. Because one failure mode can set off a chain of others, each failure mode should be listed in technical terms against the function it affects. The effect of each failure mode is then analyzed; examples of failure effects include noise, injury to the user, and degraded performance.

Each failure effect is assigned a Severity Number (S) from 1 (no danger) to 10 (critical), which helps in prioritizing failure modes and effects. A severity rating of 9 or 10 means the failure could injure the user or lead to litigation. Such failure modes must be eliminated immediately by changing the design.
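A severity entry can be sketched as a small record that validates S and flags the 9-10 cases that call for an immediate design change. The `FailureEffect` class and the example ratings below are hypothetical and illustrative only; real ratings come from the team's severity scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureEffect:
    """A failure mode and its effect, with a Severity Number (S) from 1 to 10."""
    function: str
    failure_mode: str
    effect: str
    severity: int

    def __post_init__(self) -> None:
        # S is a ranking: 1 = no danger, 10 = critical.
        if not 1 <= self.severity <= 10:
            raise ValueError(f"severity must be 1-10, got {self.severity}")

    @property
    def requires_design_change(self) -> bool:
        # Ratings of 9 or 10 mean possible injury to the user or litigation.
        return self.severity >= 9


# Illustrative entries; the severity values are assumptions, not real data.
effects = [
    FailureEffect("Conduct current", "Electrical short-circuit", "Injury to user", 10),
    FailureEffect("Support load", "Deformation", "Degraded performance", 6),
    FailureEffect("Rotate shaft", "Corrosion", "Noise", 3),
]

for e in effects:
    print(e.failure_mode, e.severity, e.requires_design_change)
```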

Step 2 – Occurrence

In this step, the causes of each failure and their frequency are identified and documented in technical terms. Documentation from earlier, similar processes is valuable here. Examples of failure causes include excessive voltage, improper operating conditions, and erroneous algorithms; such causes indicate weaknesses in the design. Each failure mode is assigned an Occurrence Ranking (O) from 1 to 10, which can also be expressed as a percentage.

Action must be determined whenever the occurrence is high, that is, when the Occurrence Ranking is greater than 4 for a non-safety failure mode, or greater than 1 for a failure mode whose Severity Number is 9 or 10. The Occurrence Ranking depends on the product and on the customer's specifications. This step is known as the detailed development section of FMEA.
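The occurrence thresholds in this step can be written as a small decision rule. This sketch assumes that any failure mode with S of 9 or 10 counts as safety-related and every other one as non-safety; the function name is hypothetical.

```python
def occurrence_requires_action(severity: int, occurrence: int) -> bool:
    """Apply the Step 2 occurrence thresholds (simplified).

    Assumption: S >= 9 is treated as safety-related, anything lower as
    non-safety. Non-safety modes need action when O > 4; safety-related
    modes need action as soon as O > 1.
    """
    for name, value in (("severity", severity), ("occurrence", occurrence)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be 1-10, got {value}")
    if severity >= 9:
        return occurrence > 1
    return occurrence > 4


print(occurrence_requires_action(severity=6, occurrence=4))   # False
print(occurrence_requires_action(severity=6, occurrence=5))   # True
print(occurrence_requires_action(severity=10, occurrence=2))  # True
```

The same occurrence of 2 is acceptable for a moderate failure mode but triggers action for a safety-critical one, which is the point of using a lower threshold when S is 9 or 10.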

Step 3 – Detection

In this step, the planned actions are tested for their effectiveness. Design verification is carried out, and suitable inspection methods are chosen. To do this, the engineer examines the current system controls and assesses how well they prevent failure modes or detect them before they reach the customer. It helps to identify the testing, monitoring, analysis, and other techniques used to detect failures in similar systems.

Each inspection or planned test is assigned a Detection Number (D) according to its ability to detect or prevent failures. D measures the risk that a failure escapes detection, so a higher D ranking means a lower chance of detecting the failure.

Once S, O, and D have been ranked, the Risk Priority Number (RPN) is calculated by multiplying the three numbers:

RPN = S × O × D
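As a minimal sketch, the formula can be coded with a range check on each ranking. Because S, O, and D each run from 1 to 10, the RPN runs from 1 to 1000. The function name is hypothetical.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return RPN = S * O * D for rankings on the 1-10 scale."""
    for name, value in (("S", severity), ("O", occurrence), ("D", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be 1-10, got {value}")
    return severity * occurrence * detection


print(risk_priority_number(7, 4, 6))  # 168
```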

The RPN pinpoints the areas of greatest concern. RPNs should be tracked over the entire life cycle, so that the failure modes with the highest RPN are corrected. Note that some failure modes are less severe but occur more often and are harder to detect, so they can still have a high RPN. Once the values have been allocated, the next step is to record the recommended actions, with targets and implementation dates.
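Ranking a worksheet by RPN can be sketched as below. The `Row` type and the three entries are hypothetical, illustrative values chosen to show a less severe but frequent, hard-to-detect failure mode rising to the top.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One line of a simplified FMEA worksheet (hypothetical layout)."""
    failure_mode: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Illustrative rankings, not taken from a real analysis.
worksheet = [
    Row("Short-circuit", 9, 2, 3),
    Row("Loose connector", 5, 6, 7),
    Row("Surface scratch", 2, 5, 2),
]

# Highest RPN first: these are the candidates for corrective action.
ranked = sorted(worksheet, key=lambda r: r.rpn, reverse=True)
for row in ranked:
    print(f"{row.failure_mode:<16} S={row.severity} O={row.occurrence} "
          f"D={row.detection} RPN={row.rpn}")
```

Here the loose connector (RPN 210) outranks the short-circuit (RPN 54) even though the short-circuit is far more severe, which is why the severity rule from Step 1 still applies independently of the RPN ordering.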

These actions may include testing, specific inspections, quality procedures, redesign, and limiting environmental stresses. After implementation, the RPN is recalculated to confirm the improvement, and the results can be recorded in graphical form. In short, when faced with a failure mode, first try to eliminate it, then minimize its severity, then reduce its frequency, and finally improve its detectability.
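Checking the new RPN after the actions can be sketched as a before/after comparison. The rankings are illustrative assumptions: the redesign is supposed to lower O and an added inspection to lower D, while S stays the same because the failure mode itself has not been eliminated.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = S * O * D."""
    return severity * occurrence * detection


# Illustrative (S, O, D) rankings for one failure mode.
before = (5, 6, 7)
after = (5, 3, 4)  # assumed effect: redesign lowers O, extra inspection lowers D

old_rpn = rpn(*before)
new_rpn = rpn(*after)
reduction = 100 * (old_rpn - new_rpn) / old_rpn

print(f"RPN {old_rpn} -> {new_rpn} ({reduction:.0f}% lower)")
```

With these values the RPN drops from 210 to 60, roughly a 71% reduction.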

The following are different types of FMEA:

Design: analysis of products prior to production
Process: analysis of assembly and manufacturing processes
Equipment: analysis of equipment and machinery design prior to purchase
Concept: analysis of subsystems or systems in the initial stages of design concept
Software: analysis of software functions
Service: analysis of service industry processes before they reach the customer
System: analysis of global system functions

The FMEA should be updated whenever:

There is a new cycle (product/process).
A change has been made to the design.
A change is made to the operating conditions.
A new regulation comes into force.
A problem is indicated in the customer feedback.

The following are uses of FMEA:

Development of a system that minimizes the occurrence of failures.
Development of test systems and design methods that ensure failures are eliminated.
Evaluation of customer requirements so that they do not give rise to potential failures.
Identification of design characteristics that create failures, so that they can be eliminated or minimized.
Identification and correction of potential risks in design, so that they can be avoided in future projects.
Ensuring that failure events will not seriously affect the system or the customer.
Production of world-class quality products.

The following are the advantages of using FMEA:

Quality, safety, and reliability of the product/process are improved.
The competitiveness and image of the organization are enhanced.
Customer satisfaction is improved.
The duration and cost of system development are reduced.
Information and engineering know-how collected by using FMEA can be used to reduce future failures.
Warranty issues are reduced.
Potential failure modes are identified and eliminated at an early stage, which also reduces the chance of their occurring in future.
Late changes to the product/process, and their associated costs, are minimized.
It serves as a catalyst for teamwork and encourages the exchange of ideas between functions.

The use of FMEA is limited by the committee members' experience of previous failures. If they cannot identify a failure mode, they may have to bring in consultants. Documentation is also an important factor in implementing FMEA.

When used as a top-down tool, FMEA may identify only the major failure modes in a system; Fault Tree Analysis (FTA) is better suited to top-down analysis.

When used as a bottom-up tool, FMEA can complement FTA. However, it cannot identify complex failure modes that involve multiple failures within a subsystem, nor can it report the expected failure intervals of a particular failure mode up to the higher-level subsystem or system.

There is also a risk that multiplying the S, O, and D rankings produces rank reversals, in which a less serious failure mode receives a higher RPN than a more serious one. This happens because the rankings are on an ordinal scale: they show whether one ranking is higher than another, but not by how much.
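A small numeric sketch makes the rank-reversal problem concrete. The three (S, O, D) sets below are illustrative values, not data from a real analysis.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = S * O * D."""
    return severity * occurrence * detection


# Illustrative (S, O, D) rankings.
modes = {
    "A: severe, rare, easy to detect": (10, 2, 2),
    "B: moderate, frequent, hard to detect": (4, 4, 3),
    "C: same RPN as A, much lower severity": (2, 5, 4),
}

for name, (s, o, d) in modes.items():
    print(f"{name:<40} S={s:>2} RPN={rpn(s, o, d)}")
```

Mode B (S = 4) gets an RPN of 48 and so outranks mode A (S = 10, RPN 40), while mode C (S = 2) ties with A at 40. The product alone cannot distinguish a critical failure from a minor one.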

Many national and international standards require FMEA to be used in evaluating product integrity.