Time series analysis applied in a security hunting context
Published May 01 2019 08:34 AM

This article expands on the time series analysis example given in the "Machine learning powered detections with Kusto query language in Azure Sentinel" Azure blog post. 

Scenario: identify user accounts authenticating from an unexpectedly large number of locations. The intuition is that these accounts may be of security interest, and potentially compromised.

 

A Kusto tutorial on time series analysis discusses how to investigate change patterns in data using the make-series operator and the series_fit_line function of the Kusto query language used in Azure Log Analytics. This post describes one possible application of those techniques in a security context.

 

Note that for simplicity we do not evaluate whether one sign-in location is reachable from another. Clearly that is an important consideration, and indeed Azure Active Directory runs sophisticated analysis to raise events and alerts for such impossible travel scenarios.

 

For the purposes of this example we restrict ourselves to the count of distinct locations and to hunting for ‘the most unusual’ sign-in activity – even if that is below the threshold that would result in an alert. 

A typical organization may have many users and many applications using Azure Active Directory for authentication. Some applications (for example Office 365 Exchange Online) may have many more authentications than others (say Visual Studio) and thus dominate the data. Users may also have a different location profile depending on the application: high location variability may be expected for email access, but less so for development activity associated with Visual Studio authentications. For both these reasons it may be desirable to track location variability for every user/application combination and then investigate just some of the most unusual cases.

 

Analysis 

The make-series operator and the series_fit_line function allow just that. Our starting point is the Azure Active Directory sign-in logs, stored in the SigninLogs table in Azure Log Analytics:

SigninLogs
| extend locationString = strcat(tostring(LocationDetails["countryOrRegion"]), "/", tostring(LocationDetails["state"]), "/", tostring(LocationDetails["city"]), ";")
| project TimeGenerated, AppDisplayName, UserPrincipalName, locationString

 

The next steps are: 

  1. Create the series of events of interest – in this instance distinct location count for every combination of user and application in the data: 

<previous query text> 

| make-series dLocationCount = dcount(locationString)
    on TimeGenerated from datetime(2019-01-01) to datetime(2019-01-31) step 1d
    by UserPrincipalName, AppDisplayName

 

Each series vector in the result set holds the daily count of distinct locations for a given account/application pair:

 Series.png
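To make the shape of these series concrete, here is a minimal Python sketch (outside Kusto) of what the make-series step computes: one vector per user/application pair, with one distinct-location count per day and zeros for days without sign-ins. The function name, the sample events, and the account names are hypothetical. The sketch assumes the upper time bound is exclusive and uses exact sets, whereas dcount in Kusto returns an estimate.

```python
from collections import defaultdict
from datetime import date


def daily_distinct_counts(events, start, end):
    """Return {(user, app): [distinct location count per day]}.

    events: iterable of (day, user, app, location) tuples.
    Days run from start (inclusive) to end (exclusive, an assumption);
    days with no events get a count of 0.
    """
    n_days = (end - start).days
    seen = defaultdict(lambda: [set() for _ in range(n_days)])
    for day, user, app, location in events:
        idx = (day - start).days
        if 0 <= idx < n_days:
            seen[(user, app)][idx].add(location)
    return {pair: [len(s) for s in days] for pair, days in seen.items()}


# Hypothetical sign-in events: (day, user, application, locationString)
events = [
    (date(2019, 1, 1), "alice@contoso.com", "Exchange Online", "US/WA/Redmond;"),
    (date(2019, 1, 1), "alice@contoso.com", "Exchange Online", "US/WA/Redmond;"),
    (date(2019, 1, 1), "alice@contoso.com", "Exchange Online", "GB/England/London;"),
    (date(2019, 1, 3), "alice@contoso.com", "Exchange Online", "US/WA/Seattle;"),
    (date(2019, 1, 2), "bob@contoso.com", "Visual Studio", "US/WA/Redmond;"),
]

series = daily_distinct_counts(events, date(2019, 1, 1), date(2019, 1, 5))
for pair, counts in series.items():
    print(pair, counts)
```

Repeated sign-ins from the same location on the same day count once, which is why the series tracks location variability rather than sign-in volume.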

  2. Compute the best fit line for each series: 

<previous query text> 

| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit) = series_fit_line(dLocationCount)
// Chart the 3 most interesting lines
// 0 slope corresponds to completely stable over time
| top 3 by Slope desc
| render timechart

 

A completely stable profile over time – constant number of locations – will lead to a horizontal line – i.e. a slope of zero.  

 

An increase in the number of sign-in locations over the period translates to a positive slope value. Each best-fit line corresponds to a particular user/application combination, so of all the best-fit lines we can pick those with the largest slope values.
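The link between a location-count series and its slope can be illustrated with a minimal Python sketch (outside Kusto) of an ordinary least-squares line fit. It is not the series_fit_line implementation. It assumes the x axis is the 0-based day index, so the slope is expressed in locations per day. The function name and the three sample series are hypothetical.

```python
def fit_line_slope(values):
    """Least-squares slope of values against their 0-based index."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


# Hypothetical 30-day series of daily distinct location counts
series = {
    "stable": [1] * 30,                          # constant profile
    "late_spike": [1] * 25 + [6, 7, 8, 7, 6],    # jump at the end of the window
    "early_spike": [6, 7, 8, 7, 6] + [1] * 25,   # jump at the start of the window
}

slopes = {name: fit_line_slope(values) for name, values in series.items()}

# Rank series by slope, largest first (the analogue of "top N by Slope desc")
ranked = sorted(slopes, key=slopes.get, reverse=True)
for name in ranked:
    print(name, round(slopes[name], 3))
```

In this sketch the constant series has a slope of zero and the series that jumps towards the end of the window ranks first. The same jump at the start of the window produces a negative slope, so ranking by largest slope favours accounts whose location count rose during the period.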

 

The top slope values across all the best fit lines in a sample test set were around 0.2 – 0.3: 

 Slopes.png

The graph below shows the location count for these users over time: the typical pattern of 0 or 1 sign-in locations daily for these user accounts increased to 6-8 sign-in locations daily. Whether these locations are legitimate is the starting point for investigation.

 TimeSeriesGraph.png

 

 Tim Burrell, Microsoft Threat Intelligence Center 

April 2019 

 

Appendix 

 

Final consolidated query described in the main text 

SigninLogs
| extend locationString = strcat(tostring(LocationDetails["countryOrRegion"]), "/", tostring(LocationDetails["state"]), "/", tostring(LocationDetails["city"]), ";")
| project TimeGenerated, AppDisplayName, UserPrincipalName, locationString
// Create time series
| make-series dLocationCount = dcount(locationString)
    on TimeGenerated from datetime(2019-01-01) to datetime(2019-01-31) step 1d
    by UserPrincipalName, AppDisplayName
// Compute best fit line for each entry
| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit) = series_fit_line(dLocationCount)
// Chart the 3 most interesting lines
// 0 slope corresponds to completely stable over time
| top 3 by Slope desc
| render timechart

 

Last update: Nov 02 2021 05:43 PM