4 sets of best practices to use Multivariate Anomaly Detector when monitoring your equipment

Microsoft

Jun 10, 2022

Multivariate Anomaly Detector overview

The multivariate anomaly detection feature from Cognitive Services is designed to solve one of the most important and difficult problems in the industry: predictive maintenance. It enables developers to easily integrate advanced anomaly detection AI into their applications with little machine learning knowledge and without labeled data. In addition to considering abnormal behaviors observed for individual sensors separately, this feature also takes dependencies and inter-correlations among different signals into account. With this holistic view, you can protect your mission-critical systems and physical assets, such as machines in the production line, spacecraft, and equipment on oil rigs, from failures.
Imagine there are 20 sensors on an automotive engine generating 20 different signals, e.g., vibration, temperature, etc. Individually, the readings of those signals may not tell you much about system-level risks, but when viewed together, they can represent the health of the engine. When the interplay of those signals turns odd, the multivariate anomaly detection feature can sense the anomaly like a seasoned floor expert. Moreover, the AI models are trained and customized on your data so that they understand your business well.

Best practices to get more accurate results with minimal noise

Over the past few months, we have gathered from existing customers some best practices which have allowed them to achieve even more accurate anomaly detection results with less noise. To help you get the most out of the Multivariate Anomaly Detector (MVAD) feature, we recommend that you refer to the common best practices we summarized in four pillars: data preparation, tuning, post-processing, and retraining.

Data preparation

Data volume and quality

When training a model, try to use a dataset in which abnormal data points make up less than 1% of the data. This assumes that your datasets include labels for known downtime or maintenance time, which should not be used for model training.
In general, more data points will help enhance model performance, but keep in mind that data freshness is as important as data volume. Find the right balance between the two.
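
As a rough illustration of the two checks above, the sketch below drops rows that fall inside labeled downtime and then measures the share of abnormal points that remain. It is a minimal example, not part of MVAD: the row layout, the `is_abnormal` label, and the function names are assumptions made for this example.

```python
from datetime import datetime, timedelta

def drop_labeled_downtime(rows, downtime_windows):
    """Keep only rows whose timestamp lies outside every (start, end) window."""
    return [
        row for row in rows
        if not any(start <= row["timestamp"] <= end for start, end in downtime_windows)
    ]

def abnormal_ratio(rows):
    """Fraction of rows carrying the (hypothetical) 'is_abnormal' label."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row["is_abnormal"]) / len(rows)

# Toy data: 200 one-minute readings with four labeled abnormal points,
# three of which fall inside a known maintenance window.
base = datetime(2022, 1, 1)
rows = [
    {
        "timestamp": base + timedelta(minutes=i),
        "value": 20.0,
        "is_abnormal": i in {50, 120, 121, 122},
    }
    for i in range(200)
]
downtime_windows = [(base + timedelta(minutes=120), base + timedelta(minutes=125))]

training_rows = drop_labeled_downtime(rows, downtime_windows)
ratio = abnormal_ratio(training_rows)
print(f"{len(training_rows)} rows kept, abnormal ratio {ratio:.2%}")
if ratio >= 0.01:
    print("Abnormal ratio is at or above 1%; consider a cleaner training period.")
```

In this toy dataset the ratio falls from 2% to about 0.5% once the maintenance window is removed.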

Data pre-processing

Exclude data when equipment/sensors are off or out-of-service.
Exclude data before or after equipment/sensors restart. There are usually irregular fluctuations right after a piece of equipment or a sensor restarts, so including this data in model training may negatively impact the model’s performance.
Multivariate Anomaly Detector accepts only numerical data, so if there is categorical data in your dataset, we recommend that you transform it into numerical values. For instance, the “off” and “on” equipment statuses could be transformed into “0” and “1”, which could help the model learn the correlations between different signals better.
Timestamps should be properly aligned for all your variables through aggregation or re-indexing.
- For example, if your sensors record readings every minute but not always at the exact same second, you should align them to the same minute (see below).

- If your sensors’ readings come at different frequencies, some users prefer to convert all time series to the same frequency. For example, if some of your sensors record data every 5 minutes and others every 10 minutes, you can aggregate all data to 10-minute intervals by taking the sum/mean/min/max, etc., over each 10-minute span for the sensors that originally have a 5-minute frequency. Alternatively, MVAD can fill in the missing data points when you join variables with different granularity; you can specify the fill-in method: linear, previous, subsequent, zero, or a fixed number.
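
The pre-processing steps above can be sketched with the Python standard library alone (in practice a data frame library such as pandas offers equivalent resampling). The example encodes a categorical status, aggregates readings into 10-minute buckets, and joins the variables with a "previous value" fill. The sensor names, the `STATUS_CODES` mapping, and the helper functions are assumptions for illustration, and the fill here is done locally rather than by MVAD.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

STATUS_CODES = {"off": 0, "on": 1}  # hypothetical categorical-to-numeric mapping

def floor_to(ts, minutes):
    """Round a timestamp down to its bucket start (minutes must divide 60)."""
    return ts - timedelta(
        minutes=ts.minute % minutes, seconds=ts.second, microseconds=ts.microsecond
    )

def aggregate(readings, minutes, how=mean):
    """Group (timestamp, value) pairs into fixed buckets and aggregate each one."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[floor_to(ts, minutes)].append(value)
    return {ts: how(values) for ts, values in sorted(buckets.items())}

def join_with_previous_fill(series_by_name):
    """Outer-join series on timestamp, filling gaps with the previous value."""
    all_ts = sorted(set().union(*(s.keys() for s in series_by_name.values())))
    last = {name: None for name in series_by_name}
    table = []
    for ts in all_ts:
        row = {"timestamp": ts}
        for name, series in series_by_name.items():
            if ts in series:
                last[name] = series[ts]
            row[name] = last[name]  # stays None until a first value is seen
        table.append(row)
    return table

t0 = datetime(2022, 6, 10, 8, 0)
temperature = [  # roughly every 5 minutes, never on the exact second
    (t0 + timedelta(seconds=7), 70.0),
    (t0 + timedelta(minutes=5, seconds=2), 72.0),
    (t0 + timedelta(minutes=10, seconds=5), 74.0),
    (t0 + timedelta(minutes=15, seconds=1), 78.0),
]
pressure = [  # roughly every 10 minutes, with the 08:10 reading missing
    (t0 + timedelta(seconds=31), 1.0),
    (t0 + timedelta(minutes=20, seconds=12), 1.4),
]
raw_status = [
    (t0 + timedelta(seconds=3), "on"),
    (t0 + timedelta(minutes=10, seconds=4), "off"),
    (t0 + timedelta(minutes=20, seconds=2), "on"),
]
status = [(ts, STATUS_CODES[label]) for ts, label in raw_status]

table = join_with_previous_fill({
    "temperature": aggregate(temperature, 10),       # mean of each 10-minute span
    "pressure": aggregate(pressure, 10),
    "status": aggregate(status, 10, how=max),
})
for row in table:
    print(row)
```

Which aggregation and fill method suit a given sensor depends on what the signal means; mean and "previous" are used here only to keep the example short.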

Feature engineering (optional)

Some of our users have developed more advanced ways to generate features to achieve the best possible outcome. These advanced methods are highly dependent on your business context and may require rounds of experimentation to identify the optimal feature set. We encourage you to consult a subject matter expert (SME) or data scientist in your organization for recommendations. Please note that these advanced approaches are not required to obtain accurate model predictions, and we have seen success in users who did not apply these advanced pre-processing steps.

Value aggregation for a specific sensor: for certain types of sensors, the pattern over a period of time is sometimes more meaningful than the raw readings when predicting system-level failures. Consider creating a new variable that summarizes a given sensor over a time window (e.g., sum/mean/min/max) and using this calculated variable as a feature for model training (see example below).
Customized formulas: There are other more advanced ways to generate features (e.g., sensor 1 value divided by sensor 2 value), but these formulas will be more business- and scenario-dependent and require experimentation to identify what could be the best feature set.

Feature generation: If you do not have enough variables to learn the pattern of your scenario, you can generate more variables with packages like ‘tsfresh’ in Python.
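
To make the first two ideas concrete, here is a small sketch of a trailing-window aggregate and a ratio feature written in plain Python. It does not use ‘tsfresh’; the sensor names, the window size of 3, and the helper functions are assumptions chosen for illustration.

```python
from statistics import mean

def rolling(values, window, how=mean):
    """Aggregate over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(how(values[i + 1 - window : i + 1]))
    return out

def ratio(numerators, denominators):
    """Element-wise ratio; None where the denominator is zero."""
    return [n / d if d != 0 else None for n, d in zip(numerators, denominators)]

vibration = [2.0, 4.0, 6.0, 8.0, 10.0]
pressure = [1.0, 2.0, 0.0, 4.0, 5.0]

vibration_mean_3 = rolling(vibration, window=3)           # value aggregation
vibration_max_3 = rolling(vibration, window=3, how=max)
vibration_per_pressure = ratio(vibration, pressure)       # customized formula

print(vibration_mean_3)
print(vibration_max_3)
print(vibration_per_pressure)
```

Because only numerical data is accepted, positions that come out as None would need to be dropped or filled before the derived variables are used for training.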

Tuning

The screenshot above is a sample output from our model. Use ‘isAnomaly’ first to find the anomalous timestamps, then use ‘score’ or ‘severity’ to sift out anomalies that are not that severe for your business.

This filter allows you to find the balance between true anomalies and false alerts that makes sense in your scenario. Note: severity and score come from the same algorithm, but severity is normalized to a number between 0 and 1. We suggest that you use severity first for filtering; if you have an advanced need to aggregate anomalies or perform calculations across anomalies, you can use ‘score’. The right threshold for such filters varies from business to business. Consider using key metrics such as the number of false positives per month, true positive rate (i.e., recall), and average forewarning time to identify the optimal threshold.
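
One way to explore thresholds is to replay past results against known failures and compare the metrics at each candidate value. The sketch below assumes the results have been flattened into simple records with ‘timestamp’, ‘isAnomaly’, ‘severity’, and ‘score’ keys (the actual response layout may differ), and the failure set, thresholds, and function names are made up for the example.

```python
def filter_alerts(results, min_severity):
    """Keep results flagged as anomalies whose severity reaches the threshold."""
    return [r for r in results if r["isAnomaly"] and r["severity"] >= min_severity]

def recall_and_false_positives(alerts, known_failures):
    """Compare alert timestamps with a set of known failure timestamps."""
    alert_ts = {r["timestamp"] for r in alerts}
    hits = alert_ts & known_failures
    recall = len(hits) / len(known_failures) if known_failures else 0.0
    return recall, len(alert_ts - known_failures)

# Toy, flattened results; the values are invented for illustration.
results = [
    {"timestamp": "2022-06-10T08:00:00Z", "isAnomaly": False, "severity": 0.0, "score": 0.12},
    {"timestamp": "2022-06-10T08:01:00Z", "isAnomaly": True, "severity": 0.25, "score": 0.41},
    {"timestamp": "2022-06-10T08:02:00Z", "isAnomaly": True, "severity": 0.62, "score": 0.57},
    {"timestamp": "2022-06-10T08:03:00Z", "isAnomaly": True, "severity": 0.91, "score": 0.83},
]
known_failures = {"2022-06-10T08:03:00Z"}

summary = {}
for threshold in (0.2, 0.5, 0.8):
    alerts = filter_alerts(results, threshold)
    summary[threshold] = recall_and_false_positives(alerts, known_failures)
    recall, false_positives = summary[threshold]
    print(f"severity >= {threshold}: recall={recall:.0%}, false positives={false_positives}")
```

With real data you would run the same comparison over weeks or months of history and add business metrics such as forewarning time.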

Post-processing of inference

There are some good practices you can apply to further refine the inference results based on business-specific knowledge that is beyond what MVAD can learn from training data. For example,

If you have known information and logic to determine when the equipment is down or under maintenance, you can set up a rule to suppress the anomalies identified during that period.
If your equipment is coming out of a recent restart, try to suppress the anomalies in inference results within X number of hours after restart as the signals could take time to stabilize.
If your SME or business process has preferences like “I don’t want to get repeated anomalies within X number of hours”, you can group the anomalies identified within a window of X hours into one incident so that no repeated alerts are triggered.
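
These three rules can be sketched as two small functions applied to the anomaly timestamps that come out of inference. This is a minimal illustration under assumed inputs: the maintenance windows, restart times, the two-hour values, and the function names are all hypothetical and would come from your own systems and SMEs.

```python
from datetime import datetime, timedelta

def suppress(anomalies, maintenance_windows, restarts, settle):
    """Drop anomalies during maintenance or within `settle` after a restart."""
    kept = []
    for ts in anomalies:
        in_maintenance = any(start <= ts <= end for start, end in maintenance_windows)
        settling = any(restart <= ts < restart + settle for restart in restarts)
        if not (in_maintenance or settling):
            kept.append(ts)
    return kept

def group_incidents(anomalies, quiet_period):
    """Merge anomalies into incidents so each incident raises a single alert.

    An anomaly joins the current incident if it occurs less than
    `quiet_period` after that incident's first anomaly.
    """
    incidents = []
    for ts in sorted(anomalies):
        if incidents and ts - incidents[-1][0] < quiet_period:
            incidents[-1].append(ts)
        else:
            incidents.append([ts])
    return incidents

day = datetime(2022, 6, 10)
anomalies = [
    day + timedelta(hours=1),               # during maintenance
    day + timedelta(hours=3),               # one hour after a restart
    day + timedelta(hours=6),
    day + timedelta(hours=6, minutes=30),
    day + timedelta(hours=7, minutes=45),
    day + timedelta(hours=12),
]
maintenance_windows = [(day + timedelta(minutes=30), day + timedelta(hours=1, minutes=30))]
restarts = [day + timedelta(hours=2)]

kept = suppress(anomalies, maintenance_windows, restarts, settle=timedelta(hours=2))
incidents = group_incidents(kept, quiet_period=timedelta(hours=2))
for incident in incidents:
    print(f"incident starting {incident[0]} with {len(incident)} anomalies")
```

Here the window is measured from the first anomaly of each incident; measuring from the most recent anomaly instead is an equally valid choice and depends on how your team wants alerts to behave.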

Retraining

When to retrain: Data drift is one of the top reasons why model accuracy degrades over time. Consider retraining your models when there is known data drift in your inference dataset compared to your original training dataset.
Common causes of data drift:
Upstream process changes, such as a sensor being replaced that changes the units of measurement from inches to centimeters.
Data quality issues, such as a broken sensor always reading 0.
Natural drift in the data, such as mean temperature changing with the seasons.
Identify data drift: You can also use the Univariate Anomaly Detector feature of Anomaly Detector to monitor potential data drift.
Data for retraining: refer to the training data guidance in the Data preparation section above. It is recommended to:
- remove timestamps that fall within known equipment downtime or maintenance periods
- add the latest data and remove some of the oldest data.
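
As a simple illustration of both points, the sketch below flags a shift in a sensor's mean and builds a sliding retraining window. The drift check is a basic heuristic chosen for this example, not the Univariate Anomaly Detector; the threshold of 3, the integer hour timestamps, and the function names are assumptions.

```python
from statistics import mean, stdev

def mean_shift(train_values, recent_values):
    """Distance of the recent mean from the training mean, in training std devs."""
    spread = stdev(train_values)
    gap = abs(mean(recent_values) - mean(train_values))
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread

def retraining_window(rows, downtime_windows, max_rows):
    """Drop rows inside downtime windows, then keep the most recent max_rows."""
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    usable = [
        row for row in ordered
        if not any(start <= row["timestamp"] <= end for start, end in downtime_windows)
    ]
    return usable[-max_rows:]

# A replaced sensor starts reporting centimeters instead of inches.
train_inches = [10.0, 10.2, 9.8, 10.1, 9.9]
recent_cm = [25.4, 25.9, 24.9]
recent_stable = [10.1, 9.9, 10.0]

DRIFT_THRESHOLD = 3.0  # hypothetical cut-off, to be tuned per sensor
print("unit change drifted:", mean_shift(train_inches, recent_cm) > DRIFT_THRESHOLD)
print("stable data drifted:", mean_shift(train_inches, recent_stable) > DRIFT_THRESHOLD)

# Timestamps are plain hour indexes here to keep the example short.
rows = [{"timestamp": hour, "value": 10.0} for hour in range(10)]
window = retraining_window(rows, downtime_windows=[(3, 4)], max_rows=5)
print([row["timestamp"] for row in window])
```

A mean shift catches the unit change and the stuck sensor described above, but slower seasonal drift may need a comparison over longer periods.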