
Predictive Maintenance- Manufacturing Analytics

September 23, 2024

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of AiSPRY and 360DigiTMG. An IIT and ISB alumnus with more than 18 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is a prominent IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, with more than ten years of experience, and has been making the IT transition journey easy for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.

Problem Description

For companies in asset-intensive industries such as manufacturing, the high cost of production delays caused by mechanical problems is a major concern. Most of these organizations want to predict such problems so that they can prevent them before they occur, which reduces the costly impact of downtime.

This article outlines the steps for implementing a predictive model for a scenario synthesized from several real business problems. The example brings together data elements that are common to many predictive maintenance use cases, and the data was generated using data simulation methods.

The main aim of the experiment is to estimate the probability that a machine will fail in the near future due to the failure of a given component. A machine learning algorithm is used to build a predictive model that learns from historical data collected from the machines. The problem is formulated as a multi-class classification problem. The sections that follow cover the steps of implementing such a model: feature engineering, label construction, training, and evaluation.

The following are typical data sources for predictive maintenance issues:

• Failure histories: The failure histories of a machine or a machine component.

• Maintenance history: The history of repairs made to a machine, such as error codes, prior maintenance tasks, or component replacements.

• Machine usage and conditions: A machine's operational circumstances, such as information gathered through sensors.

• Machine features: A machine's attributes, such as the location, make, and model of the engine.

• Operator features: The characteristics of the operator, such as gender and previous employment.

The information used in this example was gathered from four different sources: real-time telemetry data from the machines, error messages, previous maintenance records that include failures, and machine details like kind and age.

Telemetry

The first data source is the telemetry time-series data, which comprises real-time measurements of voltage, rotation, pressure, and vibration collected from 100 machines and averaged over each hour of 2015. The dataset's first 10 records are shown here. There is also a summary of the entire dataset available.

Errors:

The error logs are the second important data source. These are non-breaking errors raised while the machine is still operational, so they do not count as failures. Since the telemetry data is collected at an hourly rate, the error dates and times are rounded to the nearest hour.

Maintenance:

These are the scheduled and unscheduled maintenance records, which correspond to both routine component inspections and failures. A record is created whenever a component is replaced, either during a scheduled inspection or because of a breakdown. The records created because of breakdowns are called failures; this term is defined in more detail in the following sections. The maintenance data contains records from 2014 and 2015.

Machines

This data set includes details about the machines, such as the model type and age (years in service).

Failures

These are the records of component replacements that were made because of failures. Each entry includes the date, time, machine ID, and failed component type.

Feature Engineering:

Feature engineering, the initial step in predictive maintenance applications, entails combining the data from many sources to develop features that best capture a machine's state of health at a specific moment in time. Several feature engineering techniques are employed in the following sections to produce features based on the characteristics of each data source.

Lag Features from Telemetry

Telemetry data is well suited to computing lag features because it almost always comes with time stamps. A common practice is to choose a window size for the lag features and compute rolling aggregate measures such as the mean, standard deviation, minimum, and maximum, which represent the short-term history of the telemetry over the lag window. In this example, the rolling mean and standard deviation of the telemetry data are calculated over the most recent 3-hour lag window, every 3 hours.
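The rolling aggregation described above can be sketched with pandas as follows. This is a minimal illustration rather than the original notebook's code: the column names (datetime, machineID, volt, rotate, pressure, vibration), the helper name telemetry_lag_features, and the synthetic readings are all assumptions. It also assumes hourly readings with no gaps, so that a window of N rows equals N hours.

```python
import numpy as np
import pandas as pd

# Assumed column names; the article does not list them.
SENSORS = ["volt", "rotate", "pressure", "vibration"]


def telemetry_lag_features(telemetry, window_hours, step_hours=3):
    """Rolling mean/std over the last `window_hours` hourly readings,
    kept once every `step_hours` hours for each machine."""
    frames = []
    for machine_id, grp in telemetry.groupby("machineID"):
        ts = grp.sort_values("datetime").set_index("datetime")[SENSORS]
        roll = ts.rolling(window=window_hours)  # NaN until the window is full
        feats = pd.concat(
            [
                roll.mean().add_suffix(f"_mean_{window_hours}h"),
                roll.std().add_suffix(f"_std_{window_hours}h"),
            ],
            axis=1,
        )
        # Keep one record every `step_hours` hours and drop incomplete windows.
        feats = feats[feats.index.hour % step_hours == 0].dropna()
        feats.insert(0, "machineID", machine_id)
        frames.append(feats.reset_index())
    return pd.concat(frames, ignore_index=True)


# Synthetic stand-in data: two machines, 48 hourly readings each.
rng = np.random.default_rng(0)
hours = pd.Timestamp("2015-01-01") + pd.to_timedelta(np.arange(48), unit="h")
telemetry = pd.concat(
    [
        pd.DataFrame(
            {
                "datetime": hours,
                "machineID": m,
                **{s: rng.normal(100, 5, len(hours)) for s in SENSORS},
            }
        )
        for m in (1, 2)
    ],
    ignore_index=True,
)

feat_3h = telemetry_lag_features(telemetry, window_hours=3)
feat_24h = telemetry_lag_features(telemetry, window_hours=24)
```

The same helper covers both the short and the long lag window; only window_hours changes. Each record here summarises the window that ends at its own timestamp, which is one reasonable convention among several.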

To capture a longer-term effect, 24-hour lag features are calculated in the same way.

Lag Features from Errors

Like the telemetry data, errors come with timestamps. One important difference is that error IDs are categorical values and, unlike telemetry measurements, should not be averaged over time intervals. Instead, we count the number of errors of each type within a lagged window. We start by reformatting the error data so that each machine has a single entry for each time at which at least one error occurred.
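A possible way to do the reformatting and the windowed count is sketched below. The column names (datetime, machineID, errorID), the error labels, the helper name error_count_features, and the 24-hour window are assumptions made for illustration; the article does not state the window used for errors.

```python
import pandas as pd


def error_count_features(errors, timestamps, window_hours=24):
    """Count errors of each type per machine over the trailing window."""
    # One row per (machine, hour) on which at least one error occurred,
    # with one count column per error type.
    dummies = pd.get_dummies(errors["errorID"]).astype(int)
    error_cols = list(dummies.columns)
    per_hour = (
        pd.concat([errors[["machineID", "datetime"]], dummies], axis=1)
        .groupby(["machineID", "datetime"], as_index=False)
        .sum()
    )
    # Align to the full hourly grid so hours without errors count as zero.
    full = timestamps.merge(per_hour, on=["machineID", "datetime"], how="left")
    full[error_cols] = full[error_cols].fillna(0)
    full = full.sort_values(["machineID", "datetime"]).reset_index(drop=True)
    # Rolling sum per machine; min_periods=1 counts whatever history exists.
    rolled = full.groupby("machineID")[error_cols].transform(
        lambda s: s.rolling(window=window_hours, min_periods=1).sum()
    )
    rolled.columns = [f"{c}_count" for c in error_cols]
    return pd.concat([full[["machineID", "datetime"]], rolled], axis=1)


# Synthetic stand-in data: one machine, 30 hourly timestamps, four errors.
hours = pd.Timestamp("2015-01-01") + pd.to_timedelta(list(range(30)), unit="h")
timestamps = pd.DataFrame({"machineID": 1, "datetime": hours})
errors = pd.DataFrame(
    {
        "machineID": [1, 1, 1, 1],
        "datetime": [hours[2], hours[5], hours[5], hours[28]],
        "errorID": ["error1", "error1", "error2", "error1"],
    }
)

error_feats = error_count_features(errors, timestamps)
```

The result has one row per machine and hour; it can be thinned to one row every 3 hours to line up with the telemetry features.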

Days Since Last Replacement from Maintenance

The maintenance records, which contain the component replacement records, are a key data set in this example. One feature that could be derived from this data set is the number of replacements of each component in the last three months, which captures replacement frequency. A more relevant feature, however, is how long it has been since a component was last replaced: this should correlate better with component failures, since the longer a component is in use, the more degradation should be expected.

Note that creating lagging features from maintenance data is not as straightforward as for telemetry and errors, so these features are generated in a more custom way. This kind of ad-hoc feature engineering is very common in predictive maintenance, because domain knowledge plays a large role in understanding the predictors of a problem. In the sections that follow, the days since the last component replacement are calculated for each component type as features from the maintenance data.
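One way to compute such a feature is a backward as-of join between the feature timestamps and the replacement dates, sketched below. The column names (datetime, machineID, comp), the helper name days_since_last_replacement, and the sample records are assumptions for illustration only.

```python
import pandas as pd


def days_since_last_replacement(maint, timestamps):
    """For each timestamp, days elapsed since each component type was last replaced."""
    out = timestamps.sort_values("datetime").reset_index(drop=True)
    for comp, grp in maint.groupby("comp"):
        repl = (
            grp[["machineID", "datetime"]]
            .rename(columns={"datetime": "last_repl"})
            .sort_values("last_repl")
        )
        # Most recent replacement at or before each timestamp, per machine.
        matched = pd.merge_asof(
            out[["machineID", "datetime"]],
            repl,
            left_on="datetime",
            right_on="last_repl",
            by="machineID",
            direction="backward",
        )
        elapsed = matched["datetime"] - matched["last_repl"]
        # NaN where no earlier replacement is on record.
        out[f"days_since_{comp}"] = elapsed.dt.total_seconds() / 86400.0
    return out.sort_values(["machineID", "datetime"]).reset_index(drop=True)


# Synthetic stand-in data: one machine, 72 hourly timestamps.
hours = pd.Timestamp("2015-01-01") + pd.to_timedelta(list(range(72)), unit="h")
timestamps = pd.DataFrame({"machineID": 1, "datetime": hours})
maint = pd.DataFrame(
    {
        "machineID": [1, 1, 1],
        "datetime": [
            hours[0] - pd.Timedelta(days=2),   # comp1 replaced before the period
            hours[24],                         # comp1 replaced again on day 2
            hours[0] - pd.Timedelta(days=10),  # comp2 replaced before the period
        ],
        "comp": ["comp1", "comp1", "comp2"],
    }
)

repl_feats = days_since_last_replacement(maint, timestamps)
```

Replacement records that predate the feature period matter here: without them the early timestamps would have no previous replacement to measure from.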

Machine Features

The machine features can be used without further modification. They describe each machine's model type and age (number of years in service). If the age information had instead been recorded as a "first use date" for each machine, a transformation would have been required to convert those dates to numeric values representing years in service.

Label Construction

When multi-class classification is used to predict the failure of an asset due to a problem, labelling is done by choosing a time window before the failure, marking all feature records that fall into that window as "about to fail due to a problem," and marking all other records as "normal." This window should be chosen based on the business case: in some situations it may be enough to predict failures hours in advance, while in others days or weeks may be needed, for example to allow for the arrival of replacement parts.

In this example scenario, the prediction task is to estimate the probability that a machine will fail in the near future due to the failure of a specific component. More precisely, the goal is to compute the probability that a machine will fail in the next 24 hours due to the failure of a given component (component 1, 2, 3, or 4). The label is created as a categorical failure feature. Records that fall within the 24-hour window before a failure of component 1 have failure=comp1, and likewise for components 2, 3, and 4; records that are not within 24 hours of a component failure have failure=none.

In the sample of records with the failure=comp4 label in the failure column, the first eight records all fall within the 24-hour window before the first recorded failure of component 4. The next eight records fall within the 24 hours before the next failure of component 4.
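The labelling rule can be sketched with a forward as-of join from each feature record to the next failure on the same machine. The column names (datetime, machineID, failure), the helper name add_failure_labels, and the sample data are assumptions for illustration. The window is treated as ending at the failure time and excluding the record exactly 24 hours earlier, which yields eight records per failure when features are spaced 3 hours apart.

```python
import pandas as pd


def add_failure_labels(features, failures, window_hours=24):
    """Label each record with the component that fails within the next window, else 'none'."""
    left = features.sort_values("datetime").reset_index(drop=True)
    right = (
        failures[["machineID", "datetime", "failure"]]
        .rename(columns={"datetime": "failure_time"})
        .sort_values("failure_time")
    )
    # Next failure at or after each record, per machine.
    labeled = pd.merge_asof(
        left,
        right,
        left_on="datetime",
        right_on="failure_time",
        by="machineID",
        direction="forward",
    )
    gap = labeled["failure_time"] - labeled["datetime"]
    in_window = gap < pd.Timedelta(hours=window_hours)  # missing gaps compare False
    labeled["failure"] = labeled["failure"].where(in_window, "none")
    return (
        labeled.drop(columns="failure_time")
        .sort_values(["machineID", "datetime"])
        .reset_index(drop=True)
    )


# Synthetic stand-in data: one machine, one record every 3 hours for 3 days.
times = pd.Timestamp("2015-01-01") + pd.to_timedelta(
    [3 * i for i in range(24)], unit="h"
)
features = pd.DataFrame({"machineID": 1, "datetime": times})
failures = pd.DataFrame(
    {"machineID": [1], "datetime": [times[16]], "failure": ["comp4"]}
)

labeled = add_failure_labels(features, failures)
label_counts = labeled["failure"].value_counts()
```

If two components fail on the same machine at the same time, this sketch keeps only one of them as the label; handling that case would need an explicit rule.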

Modelling

Either Azure Machine Learning Studio or this notebook can be used to generate a predictive model after the feature engineering and labelling processes. The Predictive Maintenance Modelling Guide Experiment is the suggested Azure Machine Learning Studio experiment and can be found in the Cortana Intelligence Gallery. Here, we'll outline the modelling procedure and offer a sample Python model.

Training, Validation and Testing

When working with time-stamped data, as in this example, records should be split into training, validation, and test sets carefully to avoid overestimating the performance of the models. In predictive maintenance, features are typically created using lagged aggregates, so records from the same time window are likely to have similar feature values and identical labels. These correlations can give a model an unfair advantage when it predicts on a test set record that shares its time window with a training set record. We therefore divide records into training, validation, and test sets in large chunks, to minimise the number of time intervals shared between them.

Predictive models have no advance knowledge of future chronological trends; in practice, such trends are likely to exist and to reduce the model's accuracy. To obtain an accurate assessment of a predictive model's performance, we recommend training on older records and validating/testing on more recent records.

For both of these reasons, a time-dependent record splitting approach is a good choice for predictive maintenance models. The split is made by selecting a timepoint based on the required sizes of the training and test sets: all records before the timepoint are used for training the model, and all records after it are used for testing. (The timeline could be further segmented, if needed, to produce validation sets for parameter choice.) To prevent any records in the training set from sharing time windows with records in the test set, we remove the records at the boundary, in this case by disregarding 24 hours' worth of data before the timepoint.
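The split with a boundary gap can be sketched as follows. The helper name time_split, the column name datetime, the cutoff date, and the sample records are assumptions chosen for illustration.

```python
import pandas as pd


def time_split(labeled, cutoff, gap_hours=24):
    """Train on records older than (cutoff - gap), test on records after cutoff."""
    cutoff = pd.Timestamp(cutoff)
    train_end = cutoff - pd.Timedelta(hours=gap_hours)
    train = labeled[labeled["datetime"] < train_end]
    test = labeled[labeled["datetime"] > cutoff]
    return train, test


# Synthetic stand-in data: one record every 3 hours for 10 days.
times = pd.Timestamp("2015-01-01") + pd.to_timedelta(
    [3 * i for i in range(80)], unit="h"
)
records = pd.DataFrame({"machineID": 1, "datetime": times})

train, test = time_split(records, cutoff="2015-01-08 00:00")
dropped = len(records) - len(train) - len(test)
```

Records inside the 24-hour gap, and the record exactly at the timepoint, end up in neither set, so no training record shares a 24-hour lag window with a test record.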

Evaluation

In predictive maintenance, machine failures are usually rare over the lifetime of an asset compared with normal operation. This leads to an imbalance in the label distribution, which typically results in poor performance: algorithms tend to classify majority class examples more accurately at the expense of minority class examples, because the overall misclassification error improves greatly when the majority class is labelled correctly. The result is low recall even though accuracy may be high, and this becomes a larger problem when false alarms are very costly to the organisation. To address this issue, sampling strategies such as oversampling the minority cases are typically used, along with more advanced techniques (not discussed in this notebook).
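As a rough illustration of the oversampling idea, the sketch below resamples every minority class with replacement until it matches the size of the largest class. The helper name oversample_minority, the column name failure, and the sample counts are assumptions; this is plain random oversampling, not a method taken from the original notebook.

```python
import pandas as pd


def oversample_minority(train, label_col="failure", random_state=0):
    """Randomly resample each smaller class up to the size of the largest class."""
    target = train[label_col].value_counts().max()
    parts = []
    for _, grp in train.groupby(label_col):
        if len(grp) < target:
            grp = grp.sample(n=target, replace=True, random_state=random_state)
        parts.append(grp)
    return pd.concat(parts, ignore_index=True)


# Synthetic stand-in data: 20 normal records and a handful of failures.
train = pd.DataFrame(
    {
        "feature": range(25),
        "failure": ["none"] * 20 + ["comp1"] * 3 + ["comp2"] * 2,
    }
)

balanced = oversample_minority(train)
balanced_counts = balanced["failure"].value_counts()
```

Resampling of this kind should be applied to the training set only, so that the validation and test sets keep the real label distribution.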

Because of the class imbalance, it is also important to look at evaluation metrics other than accuracy alone, and to compare those metrics with the baseline metrics obtained when predictions are made by random chance rather than by a machine learning model. The comparison highlights the benefit of using a machine learning model. In the sections that follow, we use an evaluation function that computes many important evaluation metrics along with the standard metrics for classification problems. For a fuller explanation of the metrics, please refer to the scikit-learn documentation and a related blog article (with examples in R, not Python).
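An evaluation function along these lines might look like the sketch below, which uses scikit-learn's metric helpers. The function name evaluate, the choice of metrics, and the toy labels are assumptions, not the original notebook's function. The example scores a trivial predictor that always answers "none" to show why accuracy alone is misleading on imbalanced labels.

```python
import pandas as pd
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)


def evaluate(y_true, y_pred, labels):
    """Return the confusion matrix, per-class metrics, and summary metrics."""
    cm = pd.DataFrame(
        confusion_matrix(y_true, y_pred, labels=labels),
        index=labels,    # rows: true class
        columns=labels,  # columns: predicted class
    )
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0
    )
    per_class = pd.DataFrame(
        {"precision": precision, "recall": recall, "f1": f1, "support": support},
        index=labels,
    )
    summary = {
        "accuracy": accuracy_score(y_true, y_pred),
        "macro_precision": precision.mean(),
        "macro_recall": recall.mean(),
        "macro_f1": f1.mean(),
    }
    return cm, per_class, summary


# Toy imbalanced labels: 95 normal records, 5 failures of component 1.
labels = ["none", "comp1"]
y_true = ["none"] * 95 + ["comp1"] * 5
y_pred = ["none"] * 100  # trivial baseline that never predicts a failure

cm, per_class, summary = evaluate(y_true, y_pred, labels)
```

Here the baseline reaches 95% accuracy while its recall on comp1 is zero, which is the pattern the per-class metrics are meant to expose. A model's predictions can be passed through the same function and compared against this baseline.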