AI in Maintenance

Machine Learning for Predictive Maintenance: A Practical Introduction

January 30, 2026 · 13 min read · Dovient Learning

A reliability engineer hears the term "machine learning for predictive maintenance" at a conference and thinks: that sounds expensive, complicated, and like it requires hiring a data science team we cannot afford. They are half right. Machine learning IS the technical engine behind predictive maintenance. But you do not need to be a data scientist to use it, understand it, or make decisions about it. You need to understand what it does, what data it needs, and where it works well versus where it does not.

This article is written for maintenance and reliability people, not data scientists. If you know how a pump works but not how an algorithm works, this is for you. By the end, you will understand enough to have a real conversation with a vendor, evaluate a predictive maintenance proposal, and decide whether your plant is ready for it.

What Machine Learning Actually Does

Machine learning is pattern recognition at scale. That is it. A machine learning model looks at a large amount of data, finds patterns in that data, and uses those patterns to make predictions about new data it has not seen before.

In maintenance terms: you give the model thousands of sensor readings from a motor. Some of those readings came from periods when the motor was healthy. Some came from periods just before the motor failed. The model learns the difference between "healthy pattern" and "about to fail pattern." Then, when it sees new sensor readings from a different motor of the same type, it can say: "This pattern looks like the pre-failure pattern. This motor will probably fail in 2-4 weeks."

That is predictive maintenance. Instead of running a motor until it breaks (reactive maintenance) or replacing parts on a fixed calendar regardless of condition (preventive maintenance), you replace parts when the data says they are actually degrading. You get the equipment life that calendar-based PM leaves on the table, without the unplanned failures that reactive maintenance guarantees.

Supervised vs Unsupervised Learning

There are two main approaches to ML for equipment monitoring. Which one works for your situation depends entirely on whether you have labeled failure data.

Supervised Learning: "I Know What Failure Looks Like"

Supervised learning requires labeled examples. You show the model data from periods when the equipment was healthy and data from periods when the equipment was failing, and you label each period: "healthy" or "failing." The model learns to distinguish between the two.

This works well when:

  • You have a decent number of historical failures (at least 10-20 for the same failure mode)
  • You have sensor data from the period leading up to those failures
  • The failure mode is consistent (bearing wear always looks roughly the same in the data)

A common example: bearing failure on a motor. You have vibration data from 30 motors over 5 years. 8 of those motors had bearing failures. You can label the vibration data from the 6-8 weeks before each failure as "pre-failure" and everything else as "healthy." The supervised model learns what pre-failure vibration patterns look like and applies that knowledge to all 30 motors going forward.

The limitation: you need failures to have happened, and you need data from those failure periods. If a failure mode has only occurred once, or if you did not have sensors installed when it happened, supervised learning does not have enough examples to learn from.
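The labeling step described above can be sketched in a few lines of pandas. This is an illustration only: the motor tag, failure date, and seven-week pre-failure window are made-up values standing in for your own work order history.

```python
import pandas as pd

# Daily vibration readings for one motor (placeholder values; real data varies).
readings = pd.DataFrame({
    "timestamp": pd.date_range("2024-05-01", periods=120, freq="D"),
    "vibration_mm_s": 2.0,
})

# Failure date pulled from the work order history (hypothetical).
failure_date = pd.Timestamp("2024-07-15")
prefailure_window = pd.Timedelta(weeks=7)  # label roughly the 6-8 weeks before failure

# Label each reading: 1 = pre-failure, 0 = healthy.
in_window = (readings["timestamp"] >= failure_date - prefailure_window) & \
            (readings["timestamp"] < failure_date)
readings["label"] = in_window.astype(int)

print(readings["label"].sum())  # number of readings labeled pre-failure: 49
```

Repeat this per failure event per machine and you have the labeled dataset a supervised model trains on.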

Unsupervised Learning: "I Know What Normal Looks Like"

Unsupervised learning does not require labeled failure examples. Instead, it learns what "normal" looks like and flags anything that deviates from normal. You feed it a long period of data from healthy operation, and it builds a model of the equipment's normal behavior. When the equipment starts behaving differently, the model detects the deviation and raises an alert.

This works well when:

  • You have good data from normal operation (at least a few months)
  • You do not have enough labeled failures for supervised learning
  • You want to catch unknown or unexpected failure modes

The advantage: it can detect problems that have never happened before. If a new failure mode causes the equipment to behave abnormally, the unsupervised model will catch it even though it was never trained on that specific failure.

The limitation: it tells you something is wrong, but not what is wrong. A supervised model says "bearing failure in 3 weeks." An unsupervised model says "this equipment is behaving differently than usual." You still need a human or an additional diagnostic system to figure out what the anomaly means. For connecting that anomaly to an actual diagnosis, see our article on AI-powered repair diagnostics.

[Figure: ML model training flow for predictive maintenance]

Phase 1: Data collection. Sensor data (vibration, temperature, pressure, current), operating context (load, speed, ambient conditions, product type), maintenance records (work orders, repairs, failures, parts replaced), and failure labels (what failed, when, and root cause if known).

Phase 2: Model training. Supervised path: learns "this pattern = failure in X days"; needs 10+ labeled failure examples. Unsupervised path: learns "this is what normal looks like"; needs months of healthy operation data.

Phase 3: Validation and deployment. Test on held-out data: did it predict known failures correctly? Then shadow-mode monitoring: run alongside human judgment and compare. Then live: predictions feed into the work order system, e.g. "Motor M-203 bearing: estimated 3-5 weeks remaining. Schedule replacement during planned shutdown."

Data Requirements

Data is the fuel for ML. Without good data, the best algorithm in the world produces garbage. Here is exactly what you need, broken into "must have," "should have," and "nice to have."

Must Have

  • Sensor data at reasonable frequency. At minimum, you need readings at intervals short enough to capture degradation patterns. For vibration on rotating equipment, that means at least daily readings (hourly or continuous is better). For temperature and pressure, hourly readings are usually sufficient. The readings must be timestamped.
  • Work order history with dates. You need to know when failures occurred so you can correlate them with sensor data. "Bearing replaced on Motor M-203 on 2024-07-15" lets the model look at the sensor data from the weeks before that date and learn the pre-failure pattern.
  • Equipment identification. The sensor data must be linked to a specific piece of equipment. "Vibration reading of 4.2 mm/s from the drive-end bearing of Motor M-203 at 2024-06-20 14:00" is useful. "Vibration reading of 4.2 mm/s" with no equipment tag is not.

Should Have

  • Operating context data. Equipment behaves differently under different conditions. A motor running at 80% load has different vibration than the same motor at 40% load. If you do not account for load, the model might confuse "high load" with "degrading bearing." Process parameters (flow rate, speed, product type, ambient temperature) are the most valuable context variables.
  • Failure mode labels. Knowing that a bearing failed is useful. Knowing that it failed due to lubrication starvation versus overload is more useful. The more specific your failure labels, the more specific the model's predictions.
  • Multiple failure instances. One example of a failure is not enough for supervised learning. You need at least 10-20 instances of the same failure mode across your fleet. If you only have 2-3 instances, use unsupervised learning (anomaly detection) instead.

Nice to Have

  • High-frequency continuous data. Instead of daily or hourly snapshots, continuous streaming data (every second or every minute) gives the model much more to work with. This is especially valuable for fast-developing failure modes.
  • Oil analysis results. Particle counts, viscosity, water content, and wear metal analysis are strong predictors of mechanical degradation. If you do regular oil analysis, include those results.
  • Similar equipment across multiple sites. If you have 50 identical pumps across 3 plants, the model can learn from all 50 instead of just the 5 in your plant. More data means better predictions.

[Figure: Data readiness checklist for predictive maintenance]

Must have (required): sensor data (daily minimum), work orders with failure dates, equipment-to-sensor mapping, 6+ months of history.

Should have (recommended): operating context (load, speed), failure mode labels, 10+ failure instances per mode, 2+ years of history.

Nice to have (bonus): continuous streaming data, oil analysis results, fleet data across sites, infrared thermography data.

Quick assessment: all four "must have" items? You can start with anomaly detection today. Add the "should have" items? You can build supervised models for specific failure modes. Add the "nice to have" items? You can build a comprehensive predictive maintenance program.

Common Algorithms Explained Simply

You do not need to build these algorithms. Your vendor or software platform handles that. But understanding what they do helps you ask better questions and evaluate claims.

Random Forest

Imagine 100 experienced technicians, each looking at different aspects of the sensor data. One focuses on vibration trends. Another looks at temperature patterns. A third examines current draw. Each technician makes their own prediction: "I think this motor will fail in the next 30 days" or "I think this motor is fine." The random forest takes a vote. If 78 out of 100 technicians say "failing," the prediction is "failing" with 78% confidence.

This is literally what a random forest does, except with mathematical decision trees instead of technicians. Each tree looks at a random subset of the data, makes a prediction, and the final answer is the majority vote. It works well for structured sensor data and is relatively easy to interpret (you can see which sensor readings contributed most to the prediction).
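The voting analogy maps directly onto scikit-learn's implementation. Below is a hedged sketch on synthetic data: the sensor values and cluster centers are invented for illustration, and with fully grown trees the predicted probability is exactly the fraction of trees voting "failing."

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic training data: each row = [vibration, temperature, current draw].
# Healthy rows cluster low; pre-failure rows cluster high (illustrative only).
healthy = rng.normal(loc=[2.0, 60.0, 10.0], scale=0.3, size=(200, 3))
failing = rng.normal(loc=[4.5, 75.0, 12.0], scale=0.3, size=(200, 3))
X = np.vstack([healthy, failing])
y = np.array([0] * 200 + [1] * 200)  # 0 = healthy, 1 = failing

# 100 trees = 100 "technicians"; the forest takes a majority vote.
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# A new reading with elevated vibration and temperature.
new_reading = [[4.2, 72.0, 11.5]]
vote_fraction = model.predict_proba(new_reading)[0, 1]  # share of trees voting "failing"
print(f"{vote_fraction:.0%} of trees vote 'failing'")

# Interpretability: which sensor channels contributed most to the decision.
print(dict(zip(["vibration", "temperature", "current"],
               model.feature_importances_.round(2))))
```

The `feature_importances_` output is what lets you answer "why did the model flag this motor?" in a vendor review meeting.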

Best for: Equipment with clear, measurable degradation patterns. Rotating equipment (motors, pumps, fans) where vibration and temperature follow predictable degradation curves.

LSTM (Long Short-Term Memory)

LSTM is a type of neural network designed for time-series data. Where random forest looks at individual snapshots ("what do the sensor readings look like right now?"), LSTM looks at sequences ("what has happened over the last 30 days, and what does that trajectory predict for the next 30 days?").

Think of it as the difference between looking at a single photograph versus watching a video. The photograph tells you the motor is vibrating at 4.2 mm/s right now. The video tells you the motor was vibrating at 2.1 mm/s a month ago, 3.0 mm/s two weeks ago, and 4.2 mm/s today, which is a clear upward trend that will likely reach the alarm threshold in 2-3 weeks.

Best for: Equipment where the trajectory matters more than the current value. Slow-degrading components (bearings, gears, seals) where the rate of change is the strongest predictor.
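The "video, not photograph" idea comes down to feeding the model sequences rather than snapshots. A real LSTM would be built in Keras or PyTorch; the sketch below only shows the windowing step that precedes it, plus the trend arithmetic the sequence view makes possible. The 30-day trend, 7-day window, and 5.5 mm/s alarm threshold are invented for illustration.

```python
import numpy as np

# A rising vibration trend over 30 days (illustrative, mm/s): 2.1 up to 4.2.
vibration = np.linspace(2.1, 4.2, 30)

def make_windows(series, window=7):
    """Turn a 1-D series into overlapping sequences a sequence model would consume."""
    return np.array([series[i:i + window] for i in range(len(series) - window + 1)])

sequences = make_windows(vibration, window=7)
print(sequences.shape)  # (24, 7): 24 overlapping 7-day trajectories

# A snapshot model sees only today's 4.2 mm/s; the sequence view also sees the slope.
daily_change = np.diff(vibration).mean()
alarm_threshold = 5.5  # hypothetical alarm level
days_to_alarm = (alarm_threshold - vibration[-1]) / daily_change
print(round(days_to_alarm))  # about 18 days, i.e. roughly 2-3 weeks at this rate
```

Each row of `sequences` becomes one training sample; the LSTM learns which trajectories precede failure, not which single values do.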

Anomaly Detection (Isolation Forest, Autoencoders)

Anomaly detection algorithms learn what "normal" looks like and flag anything that deviates. They do not need labeled failure examples. They just need enough normal data to build a baseline.

The analogy: imagine listening to an engine every day for 6 months. You learn its normal sound. One day it sounds different. You cannot necessarily say what is wrong, but you know something has changed. Anomaly detection works the same way, except it is listening to 50 sensor channels simultaneously and can detect changes too subtle for human senses.

Best for: Equipment with limited failure history. New or highly reliable equipment where failures are rare. Also good as a first layer of defense that catches unusual behavior early, before a supervised model can classify the specific failure mode.
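The "learn normal, flag deviation" workflow can be sketched with scikit-learn's Isolation Forest. Everything here is synthetic: the three sensor channels, the baseline values, and the drifting reading are stand-ins for your own historian data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Months of "normal" readings across 3 sensor channels (illustrative values).
normal = rng.normal(loc=[2.0, 60.0, 10.0], scale=0.2, size=(1000, 3))

# Train on normal data only: no failure labels needed.
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# Two new readings: one typical, one drifting away from the baseline.
typical = [[2.0, 60.1, 10.0]]
drifting = [[3.1, 66.0, 10.8]]

print(detector.predict(typical))   # 1 = looks normal
print(detector.predict(drifting))  # -1 = anomaly: "something changed"
```

Note what the `-1` does not tell you: which failure mode is developing. That diagnosis step still belongs to a person or a supervised model.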

Which Algorithm Should You Use?

Short answer: your vendor should recommend the right one based on your data. But here is a decision guide:

  • 10+ labeled failures and structured sensor data: Start with Random Forest
  • 10+ labeled failures and time-series data where trends matter: Use LSTM
  • Fewer than 10 failures or no labeled failures: Use Anomaly Detection
  • You are not sure: Start with Anomaly Detection. It works with the least data and catches the broadest range of problems. Add supervised models for specific failure modes as you accumulate more labeled data.
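The decision guide above is simple enough to write down as a rule of thumb. This is a sketch of the heuristic in this article, not a hard law; your vendor's recommendation on your actual data takes precedence.

```python
def recommend_algorithm(labeled_failures: int, trends_matter: bool) -> str:
    """Rule-of-thumb algorithm choice from the decision guide above."""
    if labeled_failures < 10:
        # Too few labels for supervised learning; learn "normal" instead.
        return "anomaly detection"
    return "LSTM" if trends_matter else "random forest"

print(recommend_algorithm(3, False))   # anomaly detection
print(recommend_algorithm(15, True))   # LSTM
print(recommend_algorithm(15, False))  # random forest
```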

Getting Started Without a Data Science Team

You do not need to hire data scientists to use predictive maintenance ML. Here is a practical path for a maintenance team that has sensor data and wants to get started.

Step 1: Pick Your Best-Instrumented, Highest-Impact Equipment

Choose 3-5 assets that already have sensors (vibration probes, temperature sensors, pressure transmitters) and have a history of failures that cost real money. Do not start with equipment that rarely fails or that has no sensors. Start where you have the most data and the most to gain.

Step 2: Get Your Data in Order

You need sensor data and work order data in a format that can be analyzed. If your sensor data lives in a historian (such as OSIsoft PI or AVEVA Wonderware) and your work orders are in a CMMS, you need to be able to export both and align them by timestamp and equipment ID. This is often the hardest part. The data exists, but it is in separate systems that do not talk to each other.
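The alignment step looks roughly like this in pandas. The equipment tag, dates, and column names are hypothetical; your historian and CMMS exports will have their own schemas to map onto.

```python
import pandas as pd

# Export from the historian: timestamped sensor readings per equipment tag.
sensors = pd.DataFrame({
    "equipment_id": ["M-203"] * 4,
    "timestamp": pd.to_datetime(["2024-07-01", "2024-07-05",
                                 "2024-07-10", "2024-07-14"]),
    "vibration_mm_s": [2.9, 3.3, 3.8, 4.2],
})

# Export from the CMMS: work orders with failure dates and modes.
work_orders = pd.DataFrame({
    "equipment_id": ["M-203"],
    "failure_date": pd.to_datetime(["2024-07-15"]),
    "failure_mode": ["bearing"],
})

# Join the two systems on equipment ID, then measure proximity to failure.
merged = sensors.merge(work_orders, on="equipment_id", how="left")
merged["days_before_failure"] = (merged["failure_date"] - merged["timestamp"]).dt.days
print(merged[["timestamp", "vibration_mm_s", "days_before_failure"]])
```

Once every reading carries a `days_before_failure` value, both labeling and model evaluation become straightforward.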

Step 3: Start with Anomaly Detection

Anomaly detection is the easiest entry point because it does not require labeled failure data. Point it at 6-12 months of sensor data from your target equipment, let it learn the normal patterns, and start monitoring for deviations. You will get alerts when something changes. Some will be false alarms (process changes, sensor drift). Some will be real early warnings. The maintenance team evaluates each alert and provides feedback: "real issue" or "false alarm." This feedback improves the model.
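At its simplest, this entry point can be a statistical baseline rather than a full ML platform. The sketch below learns a mean and standard deviation from a healthy period and flags large deviations; the 180-day baseline, drift shape, and 4-sigma threshold are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Six months of stable daily readings, then a drift over the final two weeks.
baseline = rng.normal(2.0, 0.1, size=180)
drift = 2.0 + np.linspace(0.0, 1.0, 14) + rng.normal(0.0, 0.1, size=14)
series = np.concatenate([baseline, drift])

# Learn "normal" from the baseline period, then alert on large deviations.
mean, std = baseline.mean(), baseline.std()
alerts = np.flatnonzero(series > mean + 4 * std)
print(alerts)  # indices of alert days: all fall inside the drift period
```

Real anomaly detection platforms do this across many channels with adaptive baselines, but the feedback loop is the same: a human marks each alert "real issue" or "false alarm," and the thresholds improve.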

Step 4: Build Supervised Models Where You Have Data

As you accumulate labeled failure data (both from new failures and from historical data that you have retroactively labeled), build supervised models for your most important failure modes. Bearing failures on critical motors, seal leaks on reactor pumps, whatever costs you the most money. These models will be more specific and accurate than anomaly detection for those particular failure modes.

Step 5: Close the Loop with Your CMMS

The final step is connecting model predictions to your work order system. When the model predicts a bearing failure in 3-5 weeks, it should automatically generate a work order (or at least a notification) in your CMMS so the planner can schedule the replacement during the next planned downtime. Without this connection, predictions just sit in a dashboard that nobody checks. For more on how AI can automate the work order side, see our article on AI-enhanced work order management.
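Closing the loop usually means translating a model prediction into whatever payload your CMMS's API accepts. The sketch below uses made-up field names and a hypothetical `to_work_order` mapping; the actual schema depends entirely on your CMMS.

```python
import json

# A prediction from the model (illustrative values).
prediction = {
    "equipment_id": "M-203",
    "component": "drive-end bearing",
    "estimated_weeks_remaining": (3, 5),
    "confidence": 0.78,
}

def to_work_order(pred: dict) -> dict:
    """Translate a model prediction into a CMMS work order payload.
    Field names here are hypothetical; map them to your CMMS's real API."""
    low, high = pred["estimated_weeks_remaining"]
    return {
        "equipment_id": pred["equipment_id"],
        "priority": "high" if low <= 2 else "medium",
        "description": (
            f"{pred['component']} on {pred['equipment_id']}: "
            f"estimated {low}-{high} weeks remaining "
            f"(model confidence {pred['confidence']:.0%}). "
            "Schedule replacement during next planned shutdown."
        ),
        "source": "predictive-model",
    }

payload = to_work_order(prediction)
print(json.dumps(payload, indent=2))
```

Whether this creates a work order directly or just a planner notification is a process decision, but the translation step is what keeps predictions out of the unwatched dashboard.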

What Predictive Maintenance Does Not Do

Honest expectations matter more than hype. Here is what ML-based predictive maintenance cannot do:

  • It does not predict sudden failures. If a foreign object enters a pump and destroys the impeller in 5 seconds, no model is going to predict that 2 weeks in advance. ML predicts gradual degradation, not sudden catastrophic events.
  • It does not work without sensors. If you have no sensors on the equipment, you have no data for the model. You cannot predict bearing failure from work order text alone. You need physical measurements.
  • It does not replace all other maintenance. Calendar-based PMs for things like lubrication, filter changes, and inspections still make sense. You do not need to predict when oil needs changing. You change it on a schedule. ML is for failure modes that benefit from condition-based decision-making.
  • It is not 100% accurate. Even the best models miss failures and generate false alarms. A good predictive maintenance program catches 70-85% of predictable failures and has a false alarm rate low enough that people still pay attention to the alerts. Expecting perfection is unrealistic. Expecting significant improvement over calendar-based PM is realistic.
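The two numbers worth tracking here are easy to compute from a year of alert outcomes. The counts below are invented for illustration; the point is the arithmetic, not the values.

```python
# One year of alert outcomes (illustrative counts).
caught_failures = 8   # failures the model flagged in time
missed_failures = 2   # failures that arrived with no warning
false_alarms = 5      # alerts that turned out to be nothing
total_alerts = caught_failures + false_alarms

catch_rate = caught_failures / (caught_failures + missed_failures)
precision = caught_failures / total_alerts

print(f"catch rate: {catch_rate:.0%}")  # share of predictable failures caught
print(f"precision: {precision:.0%}")    # share of alerts that were real
```

A program in the 70-85% catch-rate range with a tolerable false alarm load is performing well; demanding 100% on either number is how predictive maintenance programs get cancelled.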

The Business Case

The financial case for predictive maintenance is simple: you avoid unplanned downtime, which is 5-10x more expensive than planned downtime, while also avoiding unnecessary PM work on equipment that is still in good condition.

Maintenance strategy and typical cost profile:

  • Reactive (run to failure): lowest routine cost, highest emergency repair cost. Unpredictable downtime. Secondary damage common.
  • Preventive (calendar-based): moderate routine cost. Replaces parts with remaining life. Still gets some unplanned failures.
  • Predictive (condition-based): higher monitoring cost. Lower part replacement cost (parts run their full life). Lowest emergency repair cost. Planned downtime only.

A typical manufacturing plant spending $2 million per year on maintenance can expect to save 10-25% by adding predictive maintenance to their most critical equipment. That is $200,000-$500,000 per year, primarily from avoiding unplanned downtime and eliminating unnecessary PM tasks. The ROI timeline is typically 6-12 months.

Where Dovient Fits

Dovient's platform connects sensor data, CMMS records, and AI models into a single system. The semantic search engine reads your maintenance history to label failure events. The ML layer builds anomaly detection and supervised models from your sensor data. And the work order system turns predictions into planned maintenance actions.

You do not need a data science team. The platform handles model training, validation, and deployment. Your reliability engineers focus on reviewing predictions and making maintenance decisions, not on coding algorithms.

To evaluate whether your plant's data is ready for predictive maintenance, schedule a conversation with our team.
