Looking for unknown anomalies - what is normal? Time Series analysis & its applications in Security
Published May 15, 2019

We previously blogged about Machine learning powered detections with Kusto query language in Azure Sentinel and Time series analysis applied in a security hunting context.

This article provides a practical outline for using Time Series analysis to surface anomalies on security event log data sources, visualizing and alerting on anomalies for further investigation in Azure Sentinel. 

 

We will describe the various functions used in compiling the query and show how to use those KQL queries either to visualize the output or to transform it into tabular outputs for configuring alerts on specific anomalies.

 

What is Time Series data?

Per Wikipedia, a time series is a series of data points indexed (or listed or graphed) in time order. The data points are often discrete numeric values, such as counts or frequencies of occurrences computed against a column of the dataset.

Time series analysis and forecasting is used effectively in multiple sectors such as IoT and service monitoring to identify meaningful statistics from historical data as well as to predict future outcomes.

We will keep the overview short and simple for the scope of this article and focus on applied techniques on security event log data sources.

 

When to use Time Series analysis:

By analyzing time series data over an extended period, we can identify time-based patterns (e.g. seasonality, trend etc.) in the data and extract meaningful statistics which can help in flagging outliers.  A particular example in a security context is user logon patterns over a period of time exhibiting different behavior after hours and on weekends: computing deviations from these changing patterns is rather difficult in traditional atomic detections with static thresholds. KQL built-in functions can automatically identify such seasonality and trend from the input data and take it into consideration when flagging anomalies.

To prepare data for time series analysis, select the data sources and the data points associated with them.

These data points can then be transformed into a series of aggregated values over continuous time intervals from the original data.

Ideally, the dataset in scope should have consistent values populated for most of the time windows. If there are too many missing data points, the outlier results may be affected.

However, the make-series operator used to prepare the time series data inserts values for missing time window intervals. By default it sets missing values to 0; you also have control to change this to avg() or any other constant by setting the default argument of make-series.

 

A typical time series analysis workflow involves the below steps:

  • Select the data source table which contains raw events defined for the scope of the analysis.
  • Define the field (such as sourceip, accountname, hostname etc) from the schema against which numeric data points (such as count of outbound network connections, count of logons etc.) will be calculated.
  • Use the built-in make-series operator to convert the input data into a time series, transforming base logs into a series of aggregated values of the specified data points against time windows (e.g. count of logons per hour, outbound data transfer per day, etc.).
  • Use time series functions (e.g. series_decompose and series_decompose_anomalies) to apply decomposition transformation on an input data series and extract anomalous points.
  • You can plot the output in a time chart by splitting the seasonal, trend, and residual components in the data, or you can expand the output to filter for anomalies and flag them as alerts.

Data transformation and Time series analysis with KQL built-in functions and operators:

The KQL operators and functions below are generally used to compile time series-based detections in Kusto. Check the reference section for more operators to customize the analysis or to read more about their syntax and usage.

Apart from using the KQL built-in functions, one can also load the input data into a data frame and then use other time series methods/libraries in Python or R via a Jupyter notebook. However, the built-in functions natively available in KQL are optimized for performance and work efficiently on data at large scale.

The most convenient way of using these functions is to apply them to the results of the make-series operator, which transforms your input data into a multi-value series ready either to plot on a time chart or to alert on outliers.

 

make-series:

This operator creates a series of specified aggregated values along a specified axis. It is used to aggregate data points in a series by a specified column available in the data source schema, transforming multi-row table output into one row per group, with the values stored as multi-value arrays. The operator also fills in 0 automatically for missing values, or you can override this via the default option (for example, with an average).
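As a minimal sketch (the table, event ID, and time range here are illustrative, not part of the article's scenario), the following builds an hourly logon-count series per account and fills any empty hour with 0 via the default argument:

// Illustrative: hourly logon counts (EventID 4624, successful logon) per account over the last 7 days.
// Hours with no events are filled with 0 (the value supplied to "default").
SecurityEvent
| where EventID == 4624
| make-series LogonCount = count() default = 0 on TimeGenerated from ago(7d) to now() step 1h by Account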

 

series_decompose() :

This function applies a decomposition transformation on a series. It takes time series data prepared by make-series and decomposes it into seasonal, trend, and residual components.
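A minimal sketch, assuming TimeSeriesData and Total are the output of a make-series statement (such as the one built in Scenario 1 below); the parameter values are illustrative:

// Decompose each series: -1 auto-detects seasonality, 'linefit' extracts a linear trend.
// The output columns are the baseline, seasonal, trend, and residual component series.
TimeSeriesData
| extend (baseline, seasonal, trend, residual) = series_decompose(Total, -1, 'linefit')
| render timechart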

 

series_decompose_anomalies() :

This function detects anomalies based on series decomposition (refer to series_decompose()).

It takes an expression containing a series (dynamic numerical array) as input and extracts anomalous points with scores. The anomaly detection used by this function is based on Tukey’s test. The function has several parameters, including AD_method, threshold, seasonality, and trend, which are explained below; a short sketch putting them together follows the parameter list.

  • The AD_method parameter controls the anomaly detection method applied to the residual time series. Available options are ctukey (the default, using the 10th-90th percentile range) and tukey (standard, using the 25th-75th percentile range).
  • An anomaly score above the threshold value or below its negative flags a spike or dip anomaly, respectively, in the corresponding element of the input. If the default threshold (1.5) still produces many false-positive results, you can set it to 3, which corresponds to a far outlier.
  • The seasonality parameter controls the seasonal analysis. It can have one of the below values.
    • -1 : autodetect seasonality
    • Period: a positive integer specifying the number of bins per period, e.g. if the input data is hourly, a weekly period will be 168 bins.
    • 0 : no seasonality (skip extracting component)
  • The trend parameter controls the trend analysis. It can be set to one of the below values.
    • ‘avg’ : define trend component as average(x) [default]
    • ‘linefit’: extracts trend component using linear regression.
    • ‘none’ : no trend, skip extracting this component.
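Putting these parameters together, a minimal sketch looks like the below (the threshold, seasonality, trend, and AD_method values are illustrative; TimeSeriesData and Total come from a make-series statement such as the one in Scenario 1 below):

// Score each hourly bin: threshold 1.5, auto-detected seasonality (-1),
// linear-regression trend ('linefit'), 0 test points, and the default ctukey method.
TimeSeriesData
| extend (anomalies, score, baseline) = series_decompose_anomalies(Total, 1.5, -1, 'linefit', 0, 'ctukey')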

mv-expand :

This operator is primarily used to expand the results of the time series decomposition functions, which are originally collections of multi-value arrays, into individual rows of associated timestamps and data points with total, baseline counts, and score. This output is useful to filter down to just the anomalies, which can then be used for alerting or joined against other tables on the timestamp columns to gather additional context around the anomalies.
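Continuing the sketch above, the expansion step would look like this (a compact, illustrative version of the pattern used in Scenario 1 below):

TimeSeriesData
| extend (anomalies, score, baseline) = series_decompose_anomalies(Total, 1.5, -1, 'linefit')
// Expand the multi-value arrays back into one row per time bin.
| mv-expand TimeGenerated to typeof(datetime), Total to typeof(double), anomalies to typeof(double), score to typeof(double), baseline to typeof(long)
// Keep only the bins flagged as spike anomalies.
| where anomalies > 0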

 

Decomposing Time series as a Kusto Query:

Below is a representation of the various sections of a time series analysis and the corresponding Kusto query templates. The queries consist of different steps, such as data preparation, visualizing the results, and alerting on the outliers.

 

Preparing the Time Series Data

1-Data Preparation.png
2-Data Preparation.png

 

Visualizing the decomposition and anomalies from the results of the Time Series Data

3-Data Visualization.png
4-Data Visualization.png

 

Configure alerts on specific outliers from the results of the Time Series Analysis

 

5-Alerting.png

Investigate anomalies by joining them against base logs to populate additional fields

6-Investigation.png

 

Practical Time Series Analysis applications on Security Event Log Data sources:

As part of security monitoring and incident response, analysts often develop detections based on static thresholds within a specified time window. Traditionally this threshold is identified manually from the historical trend of events and configured as a static value in the detection, e.g. a brute force detection may use logic such as 50 logon failures in 1 minute.
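For contrast, a static-threshold rule of the kind described above might look like this minimal sketch (the event ID, window, and threshold values are illustrative):

// Traditional static detection: flag any account with more than 50 failed logons
// (EventID 4625) within a 1-minute window, regardless of time of day or weekday.
SecurityEvent
| where EventID == 4625
| summarize FailureCount = count() by Account, bin(TimeGenerated, 1m)
| where FailureCount > 50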

It is often cumbersome to take patterns at different time intervals, such as after hours and weekends, into consideration when flagging anomalies; this results in false positives and is often addressed by manually maintaining a whitelist.

In addition, when the static threshold is only slightly reached or exceeded, the results are often uninteresting and generate false positives for analysts. As part of triage, analysts improve detections via whitelisting over time to reduce the false positive rate.

This approach does not scale, and time series analysis-based detections can effectively replace these static detections. The results are more robust, as the analysis considers seasonality and historical trend when flagging an anomaly. These functions also perform very well at scale, as their vectorized implementation can process thousands of time series in seconds.

 

Scenario 1: Time Series anomaly for process execution frequency.

GitHub Link

 

For demonstration purposes, we will run it against sample data from a lab environment and split the query to display the various results step by step.

 

Query:

The first part of the query prepares the time series data for each process by using the make-series operator.

// Analysis window: roughly the last 7 days up to yesterday, in 1-hour bins.
let starttime = 7d;
let endtime = 1d;
let timeframe = 1h;
let TotalEventsThreshold = 5;
// Sensitive processes to monitor for anomalous execution frequency.
let ExeList = dynamic(["powershell.exe","cmd.exe","wmic.exe"]);
let TimeSeriesData=
SecurityEvent
// EventID 4688 - a new process has been created.
| where EventID == "4688"
| extend Process = tolower(Process)
| where TimeGenerated between (startofday(ago(starttime))..startofday(ago(endtime)))
| where Process in (ExeList)
| project TimeGenerated, Computer, AccountType, Account, Process
// Hourly execution count per process; missing hours are filled with 0 by make-series.
| make-series Total=count() on TimeGenerated from ago(starttime) to ago(endtime) step timeframe by Process;
TimeSeriesData

Results:

Sample results will look like the below. The Total and TimeGenerated columns are multi-value arrays per process, holding the associated hourly timestamp windows and the execution count in each hour. You can also notice the 0 values that the make-series operator has filled in for missing values.

1-TimeSeriesData-Results.png

 

For this article, we will not look into visualizing the results, but you can try out the queries described in the previous section on your own logs. We will cover the visualization aspect in a follow-up blog post with another use case.

 

Query:

The next part of the query detects seasonality and trend in the data automatically and uses them to flag spikes as anomalies based on the provided parameters (1.5 as the threshold, -1 to auto-detect seasonality, and linefit for trend analysis).

let TimeSeriesAlerts=TimeSeriesData
// Decompose each series and score anomalies: threshold 1.5, auto seasonality (-1), linefit trend.
| extend (anomalies, score, baseline) = series_decompose_anomalies(Total, 1.5, -1, 'linefit')
// Expand the multi-value arrays back into one row per hourly bin.
| mv-expand Total to typeof(double), TimeGenerated to typeof(datetime), anomalies to typeof(double), score to typeof(double), baseline to typeof(long)
// Keep spike anomalies only, and drop low-volume hours below the event threshold.
| where anomalies > 0
| project Process, TimeGenerated, Total, baseline, anomalies, score
| where Total > TotalEventsThreshold;
TimeSeriesAlerts

Results:

Sample results will look like the below. The Total column indicates the actual count observed in that hour, and the baseline column is the expected count for that hour. Please note that since the timestamps are aggregated per hour, the timestamp shows the hour at which the total value crossed the baseline. The results are spikes flagged as anomalies against the historical baseline count.

2-TimeSeriesAlerts.png

 

To investigate the anomalies, you can join the results with the base table of Windows event logs to gather additional context. You can also use other machine learning capabilities related to clustering, such as autocluster and diffpatterns, to perform automated root cause analysis associated with an anomaly; this will be covered in follow-up blogs.
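As an illustrative sketch (the timestamps are placeholders standing in for an anomalous hour surfaced above), the raw events from that hour could also be passed to the autocluster plugin to surface the dominant attribute combinations:

// Hypothetical: cluster the 4688 events from one anomalous hour to highlight common patterns.
SecurityEvent
| where EventID == "4688"
| where TimeGenerated between (datetime(2019-04-19 06:00:00) .. datetime(2019-04-19 07:00:00))
| project Computer, Account, Process, CommandLine
| evaluate autocluster()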

 

Query:

The last part of the query joins the anomaly results with the base data to populate additional fields and gather additional context to determine whether the activity is malicious or not.

TimeSeriesAlerts
| join (
    SecurityEvent
    | where EventID == "4688"
    | extend Process = tolower(Process)
    // Aggregate the raw 4688 events into the same hourly bins used by the time series.
    | summarize CommandlineCount=count() by bin(TimeGenerated, 1h), Process, CommandLine, Computer, Account
) on Process, TimeGenerated
| extend AnomalyTimeattheHour = TimeGenerated
| project AnomalyTimeattheHour, Computer, Account, Process, CommandLine, CommandlineCount, Total, baseline, anomalies, score

Results:

Sample results may look like the below, along with the host, account name, process, and command line details seen in the respective hour. The below screenshot is filtered to interesting events for demo purposes, but in practice the query returns all events that happened during that hour, and the analyst has to review what additional events were observed during the time window and whether anything is malicious or worth investigating.

3-AlertResults.png

 

You can find additional time series based detection queries in the Azure Sentinel GitHub repo.

Below are direct links for reference.

Scenario 2: Time series anomaly for Data exfiltration:

Github Link

Scenario 3: Time Series anomaly for total volume in network logs:

Github Link

 

Conclusion

Time series analysis is an effective technique for understanding time-based patterns in your data. Applying it to various security data sources provides a unique capability compared to traditional detection mechanisms, which are atomic or static in nature. It helps in finding deviations from normal patterns using KQL built-in functions, which automatically detect seasonality and trend at scale from the input time series data.

In this article, we looked into a practical example where we analyzed process execution data for sensitive processes that are often leveraged as attack vectors. By analyzing the trend in execution frequency over 30 days, we detected seasonality and trend in the data, which helped identify two anomalies on 19th Apr at the 6 and 7 UTC hours. The deviation from the baseline count alone is not necessarily an indication of malicious activity, but gathering more context around the timestamp of the anomaly, such as command line activity, helps uncover additional events seen during that time window. In this case, we saw records of PowerShell running mimikatz on multiple hosts.

In the traditional scenario, calculating the right threshold for the range of execution counts would have been challenging, resulting in either too many false positives or missed detections. The built-in functions also provide the ability to customize how outliers far from the normal baseline are flagged (by adjusting the threshold value or applying a custom score threshold) so as to reduce the false positive rate if required.

 

Feel free to submit pull requests for other time series analysis based queries on the Azure Sentinel GitHub repo. Happy Hunting.

 

References:
