Anomaly detection is a powerful tool in cybersecurity and IT monitoring, helping detect unusual patterns in data that could indicate a breach, failure, or unexpected behaviour. Microsoft Sentinel, using Kusto Query Language (KQL), offers built-in functions like series_decompose_anomalies and series_decompose_forecast that make this process both effective and approachable.
In this article, we break down the math behind anomaly detection, explain it in simple terms, and walk through practical use cases using sample data such as user login attempts and firewall logs.
Understanding the Basics: Key Math Concepts
- Trend Line / Line of Best Fit: A straight line that shows the overall direction of data. It is calculated using linear regression with the formula: y = mx + c.
- Linear Regression: A method to calculate the best-fit line that minimizes the difference between actual values and the predicted line.
- Standard Deviation (SD): Tells us how much the values deviate from the average. A small SD means values are tightly clustered; a large SD means more variability.
- Z-Score: Measures how far a point is from the expected value, in terms of standard deviations. A Z-score > 3 usually indicates an anomaly.
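In formula terms, the Z-score of a value x is Z = (x − mean) / SD, so a Z-score of 3 means the value sits three standard deviations away from the average.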
Simple Explanation of Anomaly Detection
Imagine tracking your daily steps. If every day you walk around 5,000–6,000 steps, and suddenly one day you walk 12,000 — that’s an anomaly. Anomaly detection automates this spotting process using statistical thresholds.
In KQL: series_decompose_anomalies()
series_decompose_anomalies(series, threshold, seasonality, trend, test_points)
- threshold: How far from the expected baseline (in anomaly-score terms, roughly standard deviations) a point must be to get flagged. The function's default is 1.5; higher values flag only stronger outliers.
- seasonality: Set to -1 for automatic detection, or to a specific period in data points, such as 7 (weekly) or 30 (monthly) when the step is one day.
- trend: The trend model used for the decomposition. The queries below use 'linefit', i.e. a linear-regression baseline.
- test_points: How many of the most recent points to exclude from training and treat as test data (to check for anomalies). A minimal sketch of the call follows.
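Here is that sketch, using a hand-made series of seven points (the values and the spike are invented for illustration; only the function and its parameter order come from the KQL documentation):
// A hand-made series of seven points; the last value is an obvious spike
print LoginSeries = dynamic([12, 10, 14, 13, 11, 12, 150])
| extend (Anomalies, AnomalyScore, Baseline) =
    series_decompose_anomalies(LoginSeries, 1.5, -1, 'linefit')
// Anomalies holds one flag per point: 0 = normal, 1 = spike above baseline, -1 = dip below it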
Use Case 1: User Login Activity
Here’s a simplified dataset of login attempts:
| Date | Login Attempts |
| --- | --- |
| Mar 18 | 12 |
| Mar 19 | 10 |
| Mar 20 | 14 |
| Mar 21 | 13 |
| Mar 22 | 11 |
| Mar 23 | 12 |
| Mar 24 | 150 ← Anomaly |
| ... | ... |
| Apr 03 | 180 ← Anomaly |
Most days, the number of logins is normal — between 10 and 14 logins per day. If you calculate the average (mean), it comes out to about 12. Also, the data does not jump around too wildly — the standard deviation (how much the numbers usually vary) is only about 1.3.
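If you want to verify those numbers, here is a small, self-contained check in KQL (the array is simply the non-spike days from the table above, copied by hand):
print NormalDays = dynamic([12, 10, 14, 13, 11, 12, 13, 12, 14, 11, 10, 12, 13, 14, 13, 11, 12, 13, 14])
| extend Stats = series_stats_dynamic(NormalDays)   // returns min, max, avg, stdev, variance in one object
| project Mean = Stats.avg, SD = Stats.stdev        // ≈ 12.3 and ≈ 1.3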
But then suddenly:
- March 24: 150 logins
- April 3: 180 logins
That is far more than usual: not a little bump, but a huge spike.
To measure how unusual these values are, we use the Z-score — it tells us how many "steps" away these numbers are from the average.
- If something is 3 steps (3 standard deviations) away, it’s already suspicious.
- But these values were more than 100 steps away — that’s massively abnormal.
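To put numbers on it: with a mean of about 12 and an SD of about 1.3, the Z-score for March 24 is (150 − 12) / 1.3 ≈ 106, and for April 3 it is (180 − 12) / 1.3 ≈ 129. Both are far beyond the usual cut-off of 3.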
On a bell curve, almost all the normal values fall near the center. These spikes? They’d be way out in the flat, far end of the curve — the kind of values you'd almost never expect.
Sample KQL to Explore Anomalies – Adjust the Values and Observe the Changes
let UserLoginActivity = datatable(TimeGenerated: datetime, UserId: string, LoginAttempts: int)
[
datetime(2025-03-18), "UserA", 12,
datetime(2025-03-19), "UserA", 10,
datetime(2025-03-20), "UserA", 14,
datetime(2025-03-21), "UserA", 13,
datetime(2025-03-22), "UserA", 11,
datetime(2025-03-23), "UserA", 12,
datetime(2025-03-24), "UserA", 150,
datetime(2025-03-25), "UserA", 13,
datetime(2025-03-26), "UserA", 12,
datetime(2025-03-27), "UserA", 14,
datetime(2025-03-28), "UserA", 11,
datetime(2025-03-29), "UserA", 10,
datetime(2025-03-30), "UserA", 12,
datetime(2025-03-31), "UserA", 13,
datetime(2025-04-01), "UserA", 14,
datetime(2025-04-02), "UserA", 13,
datetime(2025-04-03), "UserA", 180,
datetime(2025-04-04), "UserA", 11,
datetime(2025-04-05), "UserA", 12,
datetime(2025-04-06), "UserA", 13,
datetime(2025-04-07), "UserA", 14
];
UserLoginActivity
| make-series LoginAttempts=sum(LoginAttempts) default=0
on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(LoginAttempts, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated, LoginAttempts, Anomalies, AnomalyScore, Baseline
| where Anomalies != 0
| project TimeGenerated, LoginAttempts, Baseline, AnomalyScore
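To see the spikes in context rather than as a filtered table, the same series can also be charted. This is just one way to visualize it (run it together with the UserLoginActivity definition above); the anomalychart render highlights the flagged points:
UserLoginActivity
| make-series LoginAttempts = sum(LoginAttempts) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(LoginAttempts, 1.5, -1, 'linefit', 1)
| render anomalychart with (anomalycolumns=Anomalies, title="UserA daily login attempts")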
Use Case 2: Firewall Logs with BytesIn/BytesOut
| Date | SourceIP | DestinationIP | BytesIn | BytesOut |
| --- | --- | --- | --- | --- |
| Mar 18 | 10.0.0.5 | 8.8.8.8 | 12000 | 10000 |
| Mar 19 | 10.0.0.5 | 8.8.8.8 | 12500 | 9800 |
| ... | ... | ... | ... | ... |
| Mar 24 | 10.0.0.5 | 203.0.113.20 | 11500 | 145000 ← Anomaly |
| ... | ... | ... | ... | ... |
| Apr 03 | 10.0.0.5 | 198.51.100.99 | 12000 | 160000 ← Anomaly |
A typical BytesOut value is ~10,000. The values on Mar 24 and Apr 3 stand out as large outliers.
Sample KQL to Explore Anomalies – Adjust the Values and Observe the Changes
let FirewallLogs = datatable(TimeGenerated: datetime, SourceIP: string, DestinationIP: string, BytesIn: long, BytesOut: long, Action: string)
[
datetime(2025-03-18), "10.0.0.5", "8.8.8.8", 12000, 10000, "Allowed",
datetime(2025-03-19), "10.0.0.5", "8.8.8.8", 12500, 9800, "Allowed",
datetime(2025-03-20), "10.0.0.5", "8.8.8.8", 13000, 10200, "Allowed",
datetime(2025-03-21), "10.0.0.5", "8.8.8.8", 12800, 10100, "Allowed",
datetime(2025-03-22), "10.0.0.5", "8.8.8.8", 12700, 9700, "Allowed",
datetime(2025-03-23), "10.0.0.5", "8.8.8.8", 12900, 9900, "Allowed",
datetime(2025-03-24), "10.0.0.5", "203.0.113.20", 11500, 145000, "Allowed",
datetime(2025-03-25), "10.0.0.5", "8.8.8.8", 12400, 10000, "Allowed",
datetime(2025-03-26), "10.0.0.5", "8.8.8.8", 12300, 10100, "Allowed",
datetime(2025-03-27), "10.0.0.5", "8.8.8.8", 12600, 9800, "Allowed",
datetime(2025-03-28), "10.0.0.5", "8.8.8.8", 12200, 9500, "Allowed",
datetime(2025-03-29), "10.0.0.5", "8.8.8.8", 12100, 9900, "Allowed",
datetime(2025-03-30), "10.0.0.5", "8.8.8.8", 12500, 10100, "Allowed",
datetime(2025-03-31), "10.0.0.5", "8.8.8.8", 12400, 10000, "Allowed",
datetime(2025-04-01), "10.0.0.5", "8.8.8.8", 12600, 10200, "Allowed",
datetime(2025-04-02), "10.0.0.5", "8.8.8.8", 12700, 10300, "Allowed",
datetime(2025-04-03), "10.0.0.5", "198.51.100.99", 12000, 160000, "Allowed",
datetime(2025-04-04), "10.0.0.5", "8.8.8.8", 12400, 9900, "Allowed",
datetime(2025-04-05), "10.0.0.5", "8.8.8.8", 12200, 10100, "Allowed",
datetime(2025-04-06), "10.0.0.5", "8.8.8.8", 12300, 10000, "Allowed",
datetime(2025-04-07), "10.0.0.5", "8.8.8.8", 12400, 9900, "Allowed"
];
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0 on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(TotalBytesOut, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated to typeof(datetime), TotalBytesOut to typeof(long), Anomalies to typeof(double), AnomalyScore to typeof(double), Baseline to typeof(long)
| where Anomalies != 0
| project TimeGenerated, TotalBytesOut, Baseline, AnomalyScore
| sort by TimeGenerated
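This sample only has one source IP, but a real firewall table has many. As a hedged sketch of how the same pattern scales, make-series can build one series per entity with a by clause, and the rest of the pipeline stays the same:
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
    by SourceIP
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(TotalBytesOut, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated to typeof(datetime), TotalBytesOut to typeof(long), Anomalies to typeof(double), AnomalyScore to typeof(double), Baseline to typeof(long)
| where Anomalies != 0
| project TimeGenerated, SourceIP, TotalBytesOut, Baseline, AnomalyScore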
Forecasting with series_decompose_forecast()
In addition to detecting anomalies, KQL allows you to forecast future values based on historical trends. This is useful for proactive monitoring and capacity planning.
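For reference, the call shape is series_decompose_forecast(series, points): points is the number of points at the end of the series to predict; those points are excluded from the learning (regression) step and filled in with the forecast instead.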
In cybersecurity, forecasting can help by:
- Anticipating abnormal behavior before it becomes critical (e.g., sudden spikes in outbound traffic or failed login attempts)
- Early warning for potential data exfiltration, DDoS activity, or malware spreading
- Capacity planning for VPN usage, firewall throughput, or bandwidth during predictable high-traffic windows (e.g., month-end backups)
- Setting dynamic thresholds instead of fixed values — more adaptive detection logic
- Predicting user login activity based on historical patterns
To better understand how forecasting works, let’s step away from security for a moment — and explore a fun example: forecasting pizza shop sales based on hourly order data.
Sample KQL – Adjust the Values and Observe the Changes
// Step 1: Simulate 4 weeks of hourly pizza sales
let PizzaSales =
range hour from 1 to 24*7*4 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * hour
| extend
HourlySales =
10 + (hour / 100.0) + // gradual growth
iff((hour / 24) % 7 in (5,6,0), 15.0, 5.0) + // weekend boost
(-1.5 * pow((hour % 24 - 12)/8.0, 2)) + // night dip
(rand() * 5.0) // random noise
| extend
HourlySales = iff(hour in (150, 200, 780), HourlySales - 10.0, HourlySales) // rainy dip
| extend
HourlySales = iff(hour in (300, 400, 600), HourlySales + 10.0, HourlySales) // promo spike
| project Timestamp, HourlySales;
// Step 2: Build time series and forecast next week's hourly sales
PizzaSales
| make-series Sales = max(HourlySales)
on Timestamp
from datetime(2018-03-01 05:00)
to datetime(2018-03-01 05:00) + 24*7*5h // 5 weeks (4 real + 1 empty)
step 1h
| extend SalesForecast = series_decompose_forecast(Sales, 24*7) // forecast 7 days
| render timechart with(title = "Pizza Shop Hourly Sales + Forecast")
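The same idea applies straight back to the security data. As a sketch, assuming the FirewallLogs sample from Use Case 2 and an axis extended roughly one week past the last data point (so there are empty bins for the function to fill):
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-15) step 1d   // axis runs ~1 week past the data
| extend Forecast = series_decompose_forecast(TotalBytesOut, 7)                  // predict the last 7 points of the series
| render timechart with (title = "Daily BytesOut + one-week forecast")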
Conclusion
With just a few lines of KQL, we can unlock powerful anomaly detection and forecasting capabilities directly within Microsoft Sentinel. From catching suspicious login spikes to identifying potential data exfiltration events, and even predicting future usage trends, these tools allow analysts to move from reactive investigation to proactive defense.
By combining statistical understanding (like standard deviation and regression) with real-world examples and automated functions like series_decompose_anomalies() and series_decompose_forecast(), teams can stay ahead of threats and optimize their security operations.
In short: let the math work for you — and focus your attention where it matters most.