Microsoft Developer Community Blog
Overview of SR-CNN algorithm in Azure Anomaly Detector

Tony_Xing
Nov 05, 2019

Author: Tony Xing (@XingGuodong), AI Platform, C + AI

 

In the previous post, “Introducing Azure Anomaly Detector API”, I didn't go into detail on one of the algorithms because its paper was still going through the publishing process. The paper has since been accepted by KDD 2019 for oral presentation, so this post gives an overview of the SR-CNN algorithm; readers who want more detail can always read the paper. By the way, we have a 2-minute video here.

Problem definition

Before we go into details, let us revisit the problem definition of time series anomaly detection.

Challenges

Any time-series anomaly detection system that operates in production at large scale faces quite a few challenges, especially in the three areas below:

1. Lack of labels - Signals are generated from clients, services, and sensors every second, and the sheer volume makes it infeasible to label the data manually.

2. Generalization - Real-world data contains many different types of time series with different characteristics, which makes it hard to generalize and find a silver bullet that solves every problem. Some examples can be found in the figure below.

3. Efficiency - For any online anomaly detection system, efficiency is one of the key challenges. The system is expected to have low compute cost and low latency for serving.

Inspiration

In the computer vision domain, there is a concept called “visual saliency detection”. Saliency is what “stands out” in a photo or scene, enabling our eye-brain to quickly focus on the most important regions, as shown in the figures below.

Fig. Original image

Fig. The salient part of the original image

 

When we look at a time series chart, the parts that stand out most are the anomalies. This similarity is where we got the inspiration, and it turned out to produce great results.

Algorithm

Our solution borrows Spectral Residual (SR) from the visual saliency detection domain and then applies a CNN to the results produced by the SR model.

As you can see from the algorithm architecture, the SR transformation magnifies the anomalies, and the resulting signal is easier to generalize. This gives us a way to train the CNN with synthetic data.

 

Spectral Residual

The spectral residual algorithm consists of three major steps:

 

  1. Fourier Transform to get the log amplitude spectrum
  2. Calculation of spectral residual
  3. Inverse Fourier Transform that transforms the sequence back to the spatial domain
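
The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the service's implementation: it assumes the usual SR formulation in which the spectral residual is the log amplitude spectrum minus its local average, and the function name `spectral_residual` and the parameters `window` and `eps` are hypothetical names chosen for this example.

```python
import numpy as np

def spectral_residual(values, window=3, eps=1e-8):
    """Return a saliency map for a 1-D series (illustrative sketch only)."""
    x = np.asarray(values, dtype=float)

    # Step 1: Fourier Transform, then split into log amplitude and phase.
    spectrum = np.fft.fft(x)
    log_amplitude = np.log(np.abs(spectrum) + eps)  # eps avoids log(0)
    phase = np.angle(spectrum)

    # Step 2: spectral residual = log amplitude minus its moving average.
    # `window` is an assumed filter size, not a value taken from the post.
    kernel = np.ones(window) / window
    averaged = np.convolve(log_amplitude, kernel, mode="same")
    residual = log_amplitude - averaged

    # Step 3: Inverse Fourier Transform back to the original domain,
    # keeping the original phase; the magnitude is the saliency map.
    return np.abs(np.fft.ifft(np.exp(residual + 1j * phase)))

# Usage: a clean sine wave with one injected spike at index 60.
series = np.sin(2 * np.pi * np.arange(128) / 16)
series[60] += 5.0
saliency = spectral_residual(series)
spike_index = int(np.argmax(saliency))
```

In this toy series the largest saliency value lands on the injected spike, which is the "magnifying" effect described earlier. A simple threshold on the saliency map would already flag it; SR-CNN replaces that threshold with a CNN.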

Benefits

  • SR is unsupervised, efficient, and has good generality.
  • The problem becomes much easier based on the output of the SR model.
  • We can train the CNN on the SR output using fully synthetic data generated with a simple rule:
    • Randomly select several points in the saliency map and calculate the injection value to replace the original point.
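
The synthetic-data rule in the last bullet could look roughly like the sketch below. The injection formula here (a local average plus the global mean, scaled by the variance and a random draw, then added to the original value) is an assumption made for illustration; the exact rule is in the paper. The names `inject_synthetic_anomalies`, `num_anomalies`, `lookback`, and `seed` are hypothetical.

```python
import numpy as np

def inject_synthetic_anomalies(values, num_anomalies=3, lookback=5, seed=0):
    """Replace a few randomly chosen points with injected values.

    Returns the modified series and a 0/1 label array that can serve as
    training targets. Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(values, dtype=float)
    modified = original.copy()
    labels = np.zeros(len(original), dtype=int)

    # Pick distinct positions that have `lookback` preceding points.
    candidates = np.arange(lookback, len(original))
    positions = rng.choice(candidates, size=num_anomalies, replace=False)

    mean, var = original.mean(), original.var()
    for pos in positions:
        local_avg = original[pos - lookback:pos].mean()
        r = rng.normal()  # random draw from a standard normal
        # Assumed injection rule; see the lead-in above.
        modified[pos] = (local_avg + mean) * (1.0 + var) * r + original[pos]
        labels[pos] = 1
    return modified, labels

# Usage: label a clean synthetic series with four injected points.
clean = np.sin(np.arange(200) / 8.0)
noisy, targets = inject_synthetic_anomalies(clean, num_anomalies=4)
```

Because the labels come for free from the injection step, no manual labeling is needed, which addresses the "lack of labels" challenge listed earlier.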

Result

In both online and offline experiments, SR-CNN consistently outperformed state-of-the-art methods on open datasets and internal production datasets.

 

Updated Nov 07, 2019
  • MartinLavoie
    Copper Contributor

    Thanks for the details. I found out about the application of this algorithm through the preview feature in PowerBI. In the world of SPC, XmR charts (control charts) have historically been used to detect 'signal'. While a few vendors have created XmR visualization add-ons in PowerBI, I wonder how the anomaly detection compares to traditional XmR methods.

  • BenjaminOgden
    Copper Contributor

    Thanks for sharing this. I find anomaly detection particularly interesting as it changes the way I play cards (when it's my brain detecting the anomalies). I factor in variable change and its effect on the probable outcomes when making strategic decisions.

  • Ivan_Lai
    Copper Contributor

    Thanks for sharing this algorithm.

    I wonder how to calculate the expected value, average value, and expected max and min values in Power BI. Are all of these values explained in the paper, or are they confidential parts of this function?

    Apologies if I missed something.

     

    Thanks again.