Anomaly detection is a powerful tool in cybersecurity and IT monitoring, helping detect unusual patterns in data that could indicate a breach, failure, or unexpected behaviour. Microsoft Sentinel, using Kusto Query Language (KQL), offers built-in functions like series_decompose_anomalies and series_decompose_forecast that make this process both effective and approachable.
In this article, we break down the math behind anomaly detection, explain it in simple terms, and walk through practical use cases using sample data such as user login attempts and firewall logs.
Understanding the Basics: Key Math Concepts
- Trend Line / Line of Best Fit: A straight line that shows the overall direction of data. It is calculated using linear regression with the formula: y = mx + c.
- Linear Regression: A method to calculate the best-fit line that minimizes the difference between actual values and the predicted line.
- Standard Deviation (SD): Tells us how much the values deviate from the average. A small SD means values are tightly clustered; a large SD means more variability.
- Z-Score: Measures how far a point is from the expected value, in terms of standard deviations. A Z-score > 3 usually indicates an anomaly.
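In formula terms, the Z-score of a value x is Z = (x − mean) / SD, so a Z-score of 3 means the value sits three standard deviations away from the average.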
Simple Explanation of Anomaly Detection
Imagine tracking your daily steps. If every day you walk around 5,000–6,000 steps, and suddenly one day you walk 12,000 — that’s an anomaly. Anomaly detection automates this spotting process using statistical thresholds.
In KQL: series_decompose_anomalies()
series_decompose_anomalies(series, threshold, seasonality, trend, test_points)
- threshold: How far from the expected baseline (in anomaly-score terms, roughly standard deviations) a point must be to get flagged. The function's default is 1.5; higher values flag only stronger outliers.
- seasonality: Set to -1 for automatic detection, or to a specific period in data points, such as 7 (weekly) or 30 (monthly) when the step is one day.
- trend: The trend model used for the decomposition. The queries below use 'linefit', i.e. a linear-regression baseline.
- test_points: How many of the most recent points to exclude from training and treat as test data (to check for anomalies). A minimal sketch of the call follows.
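Here is that sketch, using a hand-made series of seven points (the values and the spike are invented for illustration; only the function and its parameter order come from the KQL documentation):
// A hand-made series of seven points; the last value is an obvious spike
print LoginSeries = dynamic([12, 10, 14, 13, 11, 12, 150])
| extend (Anomalies, AnomalyScore, Baseline) =
    series_decompose_anomalies(LoginSeries, 1.5, -1, 'linefit')
// Anomalies holds one flag per point: 0 = normal, 1 = spike above baseline, -1 = dip below it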
Use Case 1: User Login Activity
Here’s a simplified dataset of login attempts:
| Date | Login Attempts |
| --- | --- |
| Mar 18 | 12 |
| Mar 19 | 10 |
| Mar 20 | 14 |
| Mar 21 | 13 |
| Mar 22 | 11 |
| Mar 23 | 12 |
| Mar 24 | 150 ← Anomaly |
| ... | ... |
| Apr 03 | 180 ← Anomaly |
Most days, the number of logins is normal — between 10 and 14 logins per day. If you calculate the average (mean), it comes out to about 12. Also, the data does not jump around too wildly — the standard deviation (how much the numbers usually vary) is only about 1.3.
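If you want to verify those numbers, here is a small, self-contained check in KQL (the array is simply the non-spike days from the table above, copied by hand):
print NormalDays = dynamic([12, 10, 14, 13, 11, 12, 13, 12, 14, 11, 10, 12, 13, 14, 13, 11, 12, 13, 14])
| extend Stats = series_stats_dynamic(NormalDays)   // returns min, max, avg, stdev, variance in one object
| project Mean = Stats.avg, SD = Stats.stdev        // ≈ 12.3 and ≈ 1.3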
But then suddenly:
- March 24: 150 logins
- April 3: 180 logins
That is far more than usual: not a little bump, but a huge spike.
To measure how unusual these values are, we use the Z-score — it tells us how many "steps" away these numbers are from the average.
- If something is 3 steps (3 standard deviations) away, it’s already suspicious.
- But these values were more than 100 steps away — that’s massively abnormal.
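To put numbers on it: with a mean of about 12 and an SD of about 1.3, the Z-score for March 24 is (150 − 12) / 1.3 ≈ 106, and for April 3 it is (180 − 12) / 1.3 ≈ 129. Both are far beyond the usual cut-off of 3.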
On a bell curve, almost all the normal values fall near the center. These spikes? They’d be way out in the flat, far end of the curve — the kind of values you'd almost never expect.
Sample KQL to Explore Anomalies – Adjust the Values and Observe the Changes
let UserLoginActivity = datatable(TimeGenerated: datetime, UserId: string, LoginAttempts: int)
[
datetime(2025-03-18), "UserA", 12,
datetime(2025-03-19), "UserA", 10,
datetime(2025-03-20), "UserA", 14,
datetime(2025-03-21), "UserA", 13,
datetime(2025-03-22), "UserA", 11,
datetime(2025-03-23), "UserA", 12,
datetime(2025-03-24), "UserA", 150,
datetime(2025-03-25), "UserA", 13,
datetime(2025-03-26), "UserA", 12,
datetime(2025-03-27), "UserA", 14,
datetime(2025-03-28), "UserA", 11,
datetime(2025-03-29), "UserA", 10,
datetime(2025-03-30), "UserA", 12,
datetime(2025-03-31), "UserA", 13,
datetime(2025-04-01), "UserA", 14,
datetime(2025-04-02), "UserA", 13,
datetime(2025-04-03), "UserA", 180,
datetime(2025-04-04), "UserA", 11,
datetime(2025-04-05), "UserA", 12,
datetime(2025-04-06), "UserA", 13,
datetime(2025-04-07), "UserA", 14
];
UserLoginActivity
| make-series LoginAttempts=sum(LoginAttempts) default=0
on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(LoginAttempts, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated, LoginAttempts, Anomalies, AnomalyScore, Baseline
| where Anomalies != 0
| project TimeGenerated, LoginAttempts, Baseline, AnomalyScore
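To see the spikes in context rather than as a filtered table, the same series can also be charted. This is just one way to visualize it (run it together with the UserLoginActivity definition above); the anomalychart render highlights the flagged points:
UserLoginActivity
| make-series LoginAttempts = sum(LoginAttempts) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(LoginAttempts, 1.5, -1, 'linefit', 1)
| render anomalychart with (anomalycolumns=Anomalies, title="UserA daily login attempts")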
Use Case 2: Firewall Logs with BytesIn/BytesOut
| Date | SourceIP | DestinationIP | BytesIn | BytesOut |
| --- | --- | --- | --- | --- |
| Mar 18 | 10.0.0.5 | 8.8.8.8 | 12000 | 10000 |
| Mar 19 | 10.0.0.5 | 8.8.8.8 | 12500 | 9800 |
| ... | ... | ... | ... | ... |
| Mar 24 | 10.0.0.5 | 203.0.113.20 | 11500 | 145000 ← Anomaly |
| ... | ... | ... | ... | ... |
| Apr 03 | 10.0.0.5 | 198.51.100.99 | 12000 | 160000 ← Anomaly |
A typical BytesOut value is ~10,000. The values on Mar 24 and Apr 3 stand out as large outliers.
Sample KQL to Explore Anomalies – Adjust the Values and Observe the Changes
let FirewallLogs = datatable(TimeGenerated: datetime, SourceIP: string, DestinationIP: string, BytesIn: long, BytesOut: long, Action: string)
[
datetime(2025-03-18), "10.0.0.5", "8.8.8.8", 12000, 10000, "Allowed",
datetime(2025-03-19), "10.0.0.5", "8.8.8.8", 12500, 9800, "Allowed",
datetime(2025-03-20), "10.0.0.5", "8.8.8.8", 13000, 10200, "Allowed",
datetime(2025-03-21), "10.0.0.5", "8.8.8.8", 12800, 10100, "Allowed",
datetime(2025-03-22), "10.0.0.5", "8.8.8.8", 12700, 9700, "Allowed",
datetime(2025-03-23), "10.0.0.5", "8.8.8.8", 12900, 9900, "Allowed",
datetime(2025-03-24), "10.0.0.5", "203.0.113.20", 11500, 145000, "Allowed",
datetime(2025-03-25), "10.0.0.5", "8.8.8.8", 12400, 10000, "Allowed",
datetime(2025-03-26), "10.0.0.5", "8.8.8.8", 12300, 10100, "Allowed",
datetime(2025-03-27), "10.0.0.5", "8.8.8.8", 12600, 9800, "Allowed",
datetime(2025-03-28), "10.0.0.5", "8.8.8.8", 12200, 9500, "Allowed",
datetime(2025-03-29), "10.0.0.5", "8.8.8.8", 12100, 9900, "Allowed",
datetime(2025-03-30), "10.0.0.5", "8.8.8.8", 12500, 10100, "Allowed",
datetime(2025-03-31), "10.0.0.5", "8.8.8.8", 12400, 10000, "Allowed",
datetime(2025-04-01), "10.0.0.5", "8.8.8.8", 12600, 10200, "Allowed",
datetime(2025-04-02), "10.0.0.5", "8.8.8.8", 12700, 10300, "Allowed",
datetime(2025-04-03), "10.0.0.5", "198.51.100.99", 12000, 160000, "Allowed",
datetime(2025-04-04), "10.0.0.5", "8.8.8.8", 12400, 9900, "Allowed",
datetime(2025-04-05), "10.0.0.5", "8.8.8.8", 12200, 10100, "Allowed",
datetime(2025-04-06), "10.0.0.5", "8.8.8.8", 12300, 10000, "Allowed",
datetime(2025-04-07), "10.0.0.5", "8.8.8.8", 12400, 9900, "Allowed"
];
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0 on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(TotalBytesOut, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated to typeof(datetime), TotalBytesOut to typeof(long), Anomalies to typeof(double), AnomalyScore to typeof(double), Baseline to typeof(long)
| where Anomalies != 0
| project TimeGenerated, TotalBytesOut, Baseline, AnomalyScore
| sort by TimeGenerated
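This sample only has one source IP, but a real firewall table has many. As a hedged sketch of how the same pattern scales, make-series can build one series per entity with a by clause, and the rest of the pipeline stays the same:
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-07) step 1d
    by SourceIP
| extend (Anomalies, AnomalyScore, Baseline) = series_decompose_anomalies(TotalBytesOut, 1.5, -1, 'linefit', 1)
| mv-expand TimeGenerated to typeof(datetime), TotalBytesOut to typeof(long), Anomalies to typeof(double), AnomalyScore to typeof(double), Baseline to typeof(long)
| where Anomalies != 0
| project TimeGenerated, SourceIP, TotalBytesOut, Baseline, AnomalyScore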
Forecasting with series_decompose_forecast()
In addition to detecting anomalies, KQL allows you to forecast future values based on historical trends. This is useful for proactive monitoring and capacity planning.
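For reference, the call shape is series_decompose_forecast(series, points): points is the number of points at the end of the series to predict; those points are excluded from the learning (regression) step and filled in with the forecast instead.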
In cybersecurity, forecasting can help by:
- Anticipating abnormal behavior before it becomes critical (e.g., sudden spikes in outbound traffic or failed login attempts)
- Early warning for potential data exfiltration, DDoS activity, or malware spreading
- Capacity planning for VPN usage, firewall throughput, or bandwidth during predictable high-traffic windows (e.g., month-end backups)
- Setting dynamic thresholds instead of fixed values — more adaptive detection logic
- Predicting user login activity based on historical patterns
To better understand how forecasting works, let’s step away from security for a moment — and explore a fun example: forecasting pizza shop sales based on hourly order data.
Sample KQL – Adjust the Values and Observe the Changes
// Step 1: Simulate 4 weeks of hourly pizza sales
let PizzaSales =
range hour from 1 to 24*7*4 step 1
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * hour
| extend
HourlySales =
10 + (hour / 100.0) + // gradual growth
iff((hour / 24) % 7 in (5,6,0), 15.0, 5.0) + // weekend boost
(-1.5 * pow((hour % 24 - 12)/8.0, 2)) + // night dip
(rand() * 5.0) // random noise
| extend
HourlySales = iff(hour in (150, 200, 780), HourlySales - 10.0, HourlySales) // rainy dip
| extend
HourlySales = iff(hour in (300, 400, 600), HourlySales + 10.0, HourlySales) // promo spike
| project Timestamp, HourlySales;
// Step 2: Build time series and forecast next week's hourly sales
PizzaSales
| make-series Sales = max(HourlySales)
on Timestamp
from datetime(2018-03-01 05:00)
to datetime(2018-03-01 05:00) + 24*7*5h // 5 weeks (4 real + 1 empty)
step 1h
| extend SalesForecast = series_decompose_forecast(Sales, 24*7) // forecast 7 days
| render timechart with(title = "Pizza Shop Hourly Sales + Forecast")
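The same idea applies straight back to the security data. As a sketch, assuming the FirewallLogs sample from Use Case 2 and an axis extended roughly one week past the last data point (so there are empty bins for the function to fill):
FirewallLogs
| make-series TotalBytesOut = sum(BytesOut) default=0
    on TimeGenerated from datetime(2025-03-18) to datetime(2025-04-15) step 1d   // axis runs ~1 week past the data
| extend Forecast = series_decompose_forecast(TotalBytesOut, 7)                  // predict the last 7 points of the series
| render timechart with (title = "Daily BytesOut + one-week forecast")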
Conclusion
With just a few lines of KQL, we can unlock powerful anomaly detection and forecasting capabilities directly within Microsoft Sentinel. From catching suspicious login spikes to identifying potential data exfiltration events, and even predicting future usage trends, these tools allow analysts to move from reactive investigation to proactive defense.
By combining statistical understanding (like standard deviation and regression) with real-world examples and automated functions like series_decompose_anomalies() and series_decompose_forecast(), teams can stay ahead of threats and optimize their security operations.
In short: let the math work for you — and focus your attention where it matters most.