AI in Maintenance

AI-Generated FMEA: Faster, More Complete Failure Analysis

January 31, 2026 · 10 min read · Dovient Learning

A reliability team sits in a conference room with a whiteboard, a stack of OEM manuals, and a spreadsheet with 200 rows. They are building an FMEA for a critical compressor. The facilitator asks: "What are the failure modes for the thrust bearing?" The senior mechanic lists three from memory. The engineer adds one from the manual. Nobody mentions the fourth failure mode that caused a $140,000 unplanned shutdown two years ago, because the mechanic who was on shift that day has since retired and the repair log just says "replaced thrust bearing assembly."

This is the central weakness of manual FMEA. It depends on who is in the room, what they remember, and what the facilitator thinks to ask. AI-generated FMEA flips this: it starts from your actual maintenance history, your failure data, your repair logs, and builds the failure mode list from what has actually happened, not from what people remember happening. This article explains how it works, when to trust it, and when to override it.

What Is FMEA?

FMEA stands for Failure Mode and Effects Analysis. It is a structured method for identifying everything that can go wrong with a piece of equipment, ranking those failure modes by severity and likelihood, and deciding what to do about each one. A completed FMEA tells you: what can fail, how it fails, what happens when it fails, how bad the consequences are, how likely it is, and what actions you should take to prevent or detect it.
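In worksheet terms, each failure mode becomes one row scored on three 1-10 scales, and the Risk Priority Number is their product. A minimal sketch in Python (the field names and example values are illustrative, not a prescribed schema):

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    """One failure mode in an FMEA worksheet."""
    failure_mode: str
    effect: str
    severity: int    # 1-10: how bad the consequences are
    occurrence: int  # 1-10: how likely the failure is
    detection: int   # 1-10: how hard it is to catch early (10 = hardest)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

row = FmeaRow("Thrust bearing seizure", "Compressor trip, extended downtime",
              severity=8, occurrence=4, detection=6)
print(row.rpn)  # 192
```

Ranking rows by RPN is what turns the worksheet into a prioritized action list.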

FMEA is the foundation of a good PM program. Without it, PMs are based on OEM recommendations (which are often conservative), calendar-based schedules (which ignore actual equipment condition), or gut feel (which varies by person). With a proper FMEA, every PM task is tied to a specific failure mode, and every failure mode has a documented prevention or detection strategy. For context on how to use failure analysis more broadly, see our guide on root cause analysis.

The Problem with Manual FMEA

Manual FMEA is time-consuming, expertise-dependent, and incomplete. Here is why:

  • Time. A thorough FMEA on a complex asset (compressor, turbine, packaging line) takes 20-40 hours of workshop time with 4-6 people. That is 80-240 person-hours per asset. A plant with 50 critical assets needs 4,000-12,000 person-hours of FMEA work. Most plants never complete FMEAs for all their critical equipment because the time cost is prohibitive.
  • Expertise dependency. The quality of the FMEA depends entirely on who is in the room. If the experienced vibration analyst is absent, vibration-related failure modes get missed. If nobody in the room has worked on this specific equipment model, the FMEA is generic instead of plant-specific.
  • Memory gaps. People forget. A failure that happened 4 years ago may be the most important one to include, but if nobody in the room remembers it, it does not make the list. Repair logs might document it, but nobody reads through 5 years of repair logs during an FMEA workshop.
  • Inconsistency. Two FMEA workshops on identical equipment, facilitated by different people, will produce different results. Risk Priority Numbers (RPNs) vary widely depending on how the facilitator calibrates severity, occurrence, and detection scores. What one team calls "severity 7" another calls "severity 5."
  • Static output. A manual FMEA is a document that is finished on a specific date and then rarely updated. Six months later, new failure modes have occurred, equipment has been modified, and the FMEA is already out of date. Nobody schedules a follow-up workshop to update it because the original one was so painful.

How AI Generates FMEA

AI-generated FMEA does not replace the reliability team. It gives them a 90%-complete draft to review instead of a blank spreadsheet to fill. The AI does the data mining and pattern recognition. The humans do the judgment and validation.

[Figure: AI FMEA generation pipeline. Data sources (work orders with 5+ years of history, repair logs, OEM manuals, condition data such as vibration, temperature, and oil analysis, and industry failure-rate data) feed four steps: (1) failure mode extraction, where the AI reads all records and identifies distinct failure modes with evidence; (2) frequency and severity analysis, calculating occurrence rates, downtime impact, and cost per failure; (3) RPN scoring, with data-driven severity, occurrence, and detection scores and justifications; (4) recommended actions, including PM tasks, CBM strategies, and redesign suggestions based on what worked before. Output: a draft FMEA with source citations for team review.]

The process follows four stages:

Stage 1: Data Mining

The AI ingests all available data for the target equipment: work order history, repair logs, condition monitoring data, OEM manuals, and any tribal knowledge entries. It reads every record, extracts failure events, and clusters them into distinct failure modes. "Thrust bearing failure" might appear as "replaced thrust bearing," "high axial vibration leading to bearing replacement," "brg failure on thrust side," and "compressor tripped on high temp, found thrust brg damaged." The AI recognizes these as instances of the same failure mode, even though the language varies.
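Production systems do this clustering with semantic embeddings, but a toy keyword-overlap sketch illustrates the idea. The abbreviation map and stop-word list below are invented for the example:

```python
import re

# Toy abbreviation map; a real system learns plant jargon from the data.
ABBREVIATIONS = {"brg": "bearing", "vib": "vibration", "comp": "compressor"}
STOP_WORDS = {"replaced", "found", "on", "high", "to", "side", "assembly"}

def normalize(entry: str) -> set[str]:
    """Lowercase a log entry, expand shop abbreviations, drop filler words."""
    words = re.findall(r"[a-z]+", entry.lower())
    return {ABBREVIATIONS.get(w, w) for w in words} - STOP_WORDS

def same_mode(a: str, b: str, threshold: float = 0.3) -> bool:
    """Treat two log entries as the same failure mode when their
    normalized vocabularies overlap enough (Jaccard similarity)."""
    wa, wb = normalize(a), normalize(b)
    return len(wa & wb) / len(wa | wb) >= threshold

print(same_mode("replaced thrust bearing", "brg failure on thrust side"))  # True
```

The point is not the string matching itself but the grouping: four differently worded log entries collapse into one failure mode with four evidence records behind it.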

This is where AI has a fundamental advantage over humans: it reads everything. Every work order, every repair log, every note. A human in a workshop draws on memory and whatever documents they brought. The AI draws on the complete history. That failure from 4 years ago that nobody remembers? The AI finds it.

Stage 2: Frequency and Severity Calculation

For each identified failure mode, the AI calculates actual occurrence frequency from the data. Thrust bearing failure on this compressor: 3 occurrences in 7 years, MTBF of 28 months. Mechanical seal leak: 5 occurrences in 7 years, MTBF of 17 months. These are not estimates. They are calculated from your actual failure data.

Severity is estimated from the documented consequences: downtime duration, repair cost, production loss, safety incidents. If the thrust bearing failure caused a 72-hour shutdown and cost $140,000, the AI uses that data to score severity. If the mechanical seal leak was a 4-hour repair at $3,200, the severity score reflects that difference.
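The arithmetic here is simple once the failure events are extracted. A sketch using made-up failure records that mirror the compressor example above:

```python
from datetime import date
from statistics import mean

# Hypothetical failure events extracted in Stage 1 for one failure mode.
failures = [
    {"date": date(2019, 3, 10),  "downtime_h": 70, "cost": 48_000},
    {"date": date(2021, 8, 2),   "downtime_h": 72, "cost": 52_000},
    {"date": date(2023, 12, 18), "downtime_h": 62, "cost": 35_000},
]

service_months = 7 * 12  # 7 years in service
mtbf_months = service_months / len(failures)

avg_downtime = mean(f["downtime_h"] for f in failures)
total_cost = sum(f["cost"] for f in failures)

print(f"MTBF: {mtbf_months:.0f} months")      # MTBF: 28 months
print(f"Avg downtime: {avg_downtime:.0f} h")  # Avg downtime: 68 h
print(f"Total cost: ${total_cost:,}")         # Total cost: $135,000
```

These computed values, not anyone's recollection, become the inputs to the RPN scoring in the next stage.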

Stage 3: RPN Scoring

Risk Priority Numbers in traditional FMEA are subjective. One facilitator scores severity as 7, another scores it as 5. The AI generates RPN scores based on data, with transparent reasoning. "Severity: 8. Based on 3 occurrences averaging 68 hours downtime and $135,000 total cost. Occurrence: 4. Based on MTBF of 28 months. Detection: 6. Current vibration monitoring detected 1 of 3 failures before catastrophic damage."

Every score comes with a justification. The reliability team can see exactly why the AI assigned each number and decide whether to agree or adjust. This is fundamentally different from a workshop where someone says "I think that is a 7" and nobody can explain why.
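The mappings from raw data to 1-10 scores are where calibration lives. A sketch with illustrative bands (your plant's risk tolerance determines the real thresholds, so the exact scores a given mapping produces will vary):

```python
def occurrence_score(mtbf_months: float) -> int:
    """Map MTBF to a 1-10 occurrence score.
    The bands are illustrative; calibrate them to your plant."""
    bands = [(6, 9), (12, 7), (24, 5), (36, 4), (60, 3), (120, 2)]
    for upper_limit, score in bands:
        if mtbf_months <= upper_limit:
            return score
    return 1

def detection_score(detected: int, total: int) -> int:
    """Scale the historical miss rate onto 1-10.
    A higher score means the failure is harder to detect."""
    miss_rate = 1 - detected / total
    return max(1, min(10, round(1 + 9 * miss_rate)))

# Thrust bearing example: MTBF 28 months, 1 of 3 failures caught early.
occ = occurrence_score(28)   # 4, matching the article's score
det = detection_score(1, 3)  # 7 with this particular toy mapping
```

Because the mapping is an explicit function rather than a gut call, two people scoring the same failure mode get the same number, and disagreements become arguments about the bands, which is a far more productive conversation.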

Stage 4: Recommended Actions

Based on the failure mode, its RPN, and what has worked in the past, the AI recommends prevention and detection strategies. If vibration monitoring detected 1 of 3 thrust bearing failures, the AI might recommend: "Increase vibration monitoring frequency from monthly to weekly. Add axial displacement measurement. Historical data suggests onset-to-failure window of 3-6 weeks, which monthly monitoring can miss."

The recommendations are grounded in your plant's actual experience, not generic textbook advice. If your team has tried a particular PM approach and it did not prevent failures, the AI knows that and suggests alternatives.

AI vs Manual FMEA

Manual FMEA vs AI-Generated FMEA

  • Time to complete. Manual: 20-40 hour workshop plus weeks of scheduling. AI: 2-4 hours of generation plus 4-6 hours of team review.
  • Completeness. Manual: limited by who is in the room; misses forgotten failures. AI: reads the complete history; finds failures humans forget.
  • RPN consistency. Manual: varies by facilitator; same failure, different scores. AI: data-driven and reproducible; every score has a justification.
  • Updates. Manual: rarely updated after the initial workshop; stale within 6 months. AI: auto-updates with new data; always reflects the current state.
  • Cost per asset. Manual: $4,000-$12,000 in labor (4-6 people x 20-40 hours). AI: $500-$1,500 in review time (2-3 people x 4-6 hours).
  • Plant coverage. Manual: 10-20% of assets, only the most critical equipment, limited by time. AI: 80-100% of assets feasible; scale limited by review capacity.

AI does the data mining. Humans do the judgment. Both are required for a good FMEA.

The comparison comes down to this: manual FMEA is better at capturing knowledge that is not in the data (design intent, operating philosophy, future plans). AI FMEA is better at finding everything that IS in the data and presenting it consistently. The best approach uses both: AI generates the draft, and the human team reviews it, adding their expertise and judgment where the data is insufficient.

The Validation Process

An AI-generated FMEA should never go straight into production without human review. The AI is doing data analysis, not engineering judgment. Here is how the validation process works:

Step 1: Review Failure Mode List

The reliability team checks: Did the AI find all the failure modes we know about? Did it identify any we missed? Are any of the listed failure modes actually the same thing described differently (duplicates that the AI did not merge)? This step usually takes 1-2 hours and almost always uncovers at least one failure mode that the team had forgotten about.

Step 2: Validate RPN Scores

Each RPN score comes with a data-driven justification. The team reviews these justifications and adjusts where their expertise says the data is misleading. For example, the AI might score a failure mode as low severity because the historical failures were caught early. But the team knows that if that failure mode goes undetected, the consequences are catastrophic. They override the severity score upward and document why.

Step 3: Review Recommended Actions

The AI recommends PM tasks and detection strategies based on what has worked in the past. The team reviews these for practicality: Can we actually perform weekly vibration monitoring on this equipment? Do we have the instruments? Is the equipment accessible during operation? They adjust recommendations to fit the plant's actual capabilities and resources.

Step 4: Add Context the Data Does Not Contain

This is the step where human expertise is irreplaceable. The AI does not know that this compressor is scheduled for replacement in 18 months, so spending $50,000 on a redesign makes no sense. It does not know that the operations team is planning to increase throughput by 15% next quarter, which will change the loading on the thrust bearing. The team adds this context and adjusts the FMEA accordingly.

When to Trust the AI

The AI-generated FMEA is most trustworthy when:

  • There is plenty of data. Equipment that has been in service for 5+ years with good work order records gives the AI enough history to identify failure patterns accurately. The more data, the more complete the failure mode list.
  • The failure modes are well-documented. If your plant has detailed repair logs and consistent failure coding, the AI's failure mode extraction is highly accurate. It can distinguish between "bearing failure due to lubrication starvation" and "bearing failure due to overload" because the repair logs describe the root cause.
  • The question is "what has happened." The AI is excellent at answering "what failure modes has this equipment experienced?" It is less good at answering "what failure modes could this equipment experience that have never happened before?" For that, you need engineering judgment.
  • RPN trends matter more than absolutes. The AI's RPN scores are most useful for ranking failure modes relative to each other (which ones are higher risk). The absolute numbers are less meaningful because scoring calibration varies. Use the AI's rankings to prioritize, but calibrate the absolute scores to your plant's risk tolerance.

When to Override the AI

Override the AI's output in these situations:

  • Low-frequency, high-consequence failures. If a failure mode has only happened once in 10 years but caused a $2 million loss, the AI might underweight its occurrence because the sample size is small. Your engineering judgment says this failure is a real risk, and you should score it accordingly.
  • Changed operating conditions. The AI builds its analysis from historical data. If operating conditions have changed (new product, higher throughput, different raw material), the historical failure patterns may not apply. You need to adjust the FMEA for the current reality.
  • Design or modification knowledge. The AI does not know about design weaknesses that have not yet caused a failure. If you know from industry experience or engineering analysis that a specific component is undersized, add that failure mode even if the data does not show it.
  • Regulatory and safety requirements. The AI scores based on data. Regulatory requirements may demand specific prevention strategies regardless of what the data says. If your process safety management program requires annual inspection of a pressure vessel, that goes in the FMEA even if the data suggests a 5-year inspection interval would be sufficient.

Keeping the FMEA Current

The biggest advantage of AI-generated FMEA over manual FMEA is that it stays current. Every new work order, every new failure event, every new repair log feeds back into the analysis. The AI can flag when a failure mode's occurrence rate has changed: "Mechanical seal failure rate on P-107 has increased from once every 18 months to once every 8 months over the past 2 years. Current FMEA occurrence score of 4 may need to be revised to 6."
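The trigger logic behind that kind of alert can be sketched simply: compare the mean interval between recent failures to the long-run mean. The window size and ratio threshold below are illustrative choices, and `P-107`'s dates are invented to match the example:

```python
from datetime import date
from statistics import mean

def months_between(a: date, b: date) -> float:
    return (b - a).days / 30.44  # average month length in days

def flag_rate_change(failure_dates: list[date], recent_n: int = 3,
                     ratio: float = 1.5) -> bool:
    """Flag a failure mode when the mean interval between the last
    `recent_n` failures is much shorter than the long-run mean."""
    dates = sorted(failure_dates)
    gaps = [months_between(a, b) for a, b in zip(dates, dates[1:])]
    if len(gaps) <= recent_n:
        return False  # not enough history to compare
    historical = mean(gaps[:-recent_n])
    recent = mean(gaps[-recent_n:])
    return historical / recent >= ratio

# P-107 seal failures: roughly every 18 months, then every 8 months.
history = [date(2017, 1, 1), date(2018, 7, 1), date(2020, 1, 1),
           date(2021, 7, 1), date(2022, 3, 1), date(2022, 11, 1),
           date(2023, 7, 1)]
print(flag_rate_change(history))  # True
```

A flag like this does not rewrite the FMEA on its own; it tells the reliability team which occurrence score deserves a second look.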

This kind of automatic monitoring is impossible with manual FMEA. A manual FMEA gets reviewed once a year if you are disciplined, once every 3 years if you are typical, and never if you are honest. AI-generated FMEA sends you alerts when something changes, so you can update the analysis when it matters instead of on an arbitrary calendar. For how reliability improvements like these show up in plant-level performance metrics, see our guide on OEE.

Getting Started

You do not need perfect data to start with AI-generated FMEA. Here is a practical path:

  1. Pick one critical asset. Choose something with at least 3-5 years of work order history and a few known failure modes. A pump, compressor, or gearbox that your team knows well is ideal.
  2. Generate the AI draft. Feed the historical data into the system and generate the initial FMEA. This takes minutes, not weeks.
  3. Review with your team. Schedule a 2-4 hour session (not 20-40 hours) to review the AI's output. Focus on: What did the AI miss? What did the AI find that we forgot? Are the RPN scores reasonable?
  4. Finalize and deploy. Make your adjustments, finalize the FMEA, and use it to update your PM program. Link the recommended actions to actual PM work orders in your CMMS.
  5. Expand. Once you have validated the process on one asset, expand to more. Because the AI does the heavy lifting, you can do 5-10 FMEAs per month instead of 1-2 per year.

Try it yourself with Dovient's FMEA Generator tool, which creates a structured FMEA from equipment data and failure history. It is a good way to see what AI-generated FMEA looks like before committing to a full implementation.

Where Dovient Fits

Dovient's MissingDots engine includes FMEA generation as part of its reliability analytics. It reads your work order history and maintenance records using semantic search, identifies failure modes, calculates data-driven RPN scores, and produces a draft FMEA with full source citations. Every failure mode links back to the specific work orders and repair logs that evidence it.

The output integrates with your existing reliability workflow. You review the draft, add your engineering judgment, and finalize an FMEA that is both data-complete and expert-validated. As new maintenance data comes in, the system flags when the analysis may need updating.

To see how AI-generated FMEA works with your equipment data, schedule a conversation with our team.

