Jun 21 2023 03:57 AM
Hi everyone,
I want an analytics rule or automation rule that raises an alert in Sentinel every time a certain connector (e.g. some firewall connector) goes down.
I've been looking at various alternatives, but so far I can't find anything I can get working in my organization.
Does anyone have a suggestion, ideally something you've implemented before and that is working right now?
Thank you.
Jun 21 2023 03:15 PM
Solution
There are lots of scenarios for this. The most common solution is to monitor for a time delay: if there is no data for, say, 15 minutes, then the connector is probably down. However, the source could just as easily have had nothing to send in that period, so you may also have to check the same period the day or week before to see whether the silence is unusual. You may need different thresholds for each connector/table, so a watchlist can help.
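As a rough sketch of the time-delay check with per-table thresholds: the inline Thresholds table below stands in for a watchlist, and the table names and silence limits are placeholder assumptions, not recommendations.

```kql
// Hypothetical per-table silence limits; in practice these could come from a watchlist.
let Thresholds = datatable(TableName: string, MaxSilence: timespan)
[
    "CommonSecurityLog", 15m,
    "Syslog", 30m
];
Thresholds
| join kind=leftouter (
    // Latest record seen per table within the last day.
    union withsource=TableName isfuzzy=true CommonSecurityLog, Syslog
    | where TimeGenerated > ago(1d)
    | summarize LastLog = max(TimeGenerated) by TableName
) on TableName
// Keep tables with no data at all, or whose latest record is older than the limit.
| where isnull(LastLog) or now() - LastLog > MaxSilence
| project TableName, MaxSilence, LastLog
```

Used in a scheduled rule, this would need a query period of at least 1 day and a trigger of "Is greater than" 0, so that any returned row raises an alert.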
Anomaly detection can help here as well; look at series_decompose_anomalies(). However, in a rule you are limited to a 14-day lookback, which often isn't enough to detect seasonal patterns.
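A minimal sketch of the anomaly approach, assuming hourly buckets over CommonSecurityLog and the function's default threshold of 1.5; the bucket size, threshold, and trend mode are assumptions to tune per source.

```kql
CommonSecurityLog
| where TimeGenerated > ago(14d)
// Hourly event counts; empty hours become 0. The current partial hour is excluded.
| make-series Events = count() default = 0
    on TimeGenerated from bin(ago(14d), 1h) to bin(now(), 1h) step 1h
// Flag is +1 for a spike, -1 for a drop, 0 for normal.
| extend (Anomalies, Score, Baseline) = series_decompose_anomalies(Events, 1.5, -1, "linefit")
| mv-expand TimeGenerated to typeof(datetime), Events to typeof(long),
    Anomalies to typeof(long), Score to typeof(double), Baseline to typeof(double)
// Only report a drop in the most recent complete hour.
| where TimeGenerated >= bin(now(), 1h) - 1h and Anomalies == -1
```

Because the baseline is learned from the series itself, a source that has been silent for most of the lookback will stop looking anomalous, so this works best alongside a simple "no data" check.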
If the data is from Syslog/CommonSecurityLog, you may actually want to monitor the log collector server(s) using the Heartbeat table. For example, if one server out of 4 fails, you still have 75% of capacity online and data keeps flowing, so monitoring only the connector/table would alert only when all 4 fail (or stop sending data).
There are some basic examples in the Queries pane for Heartbeat.
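Something along these lines would flag an individual collector that has stopped reporting; the computer names and the 15-minute limit are placeholders, not values from this thread.

```kql
Heartbeat
| where TimeGenerated > ago(1d)
// Hypothetical log collector host names; replace with your own.
| where Computer in ("collector01", "collector02", "collector03", "collector04")
| summarize LastHeartbeat = max(TimeGenerated) by Computer
// Agents normally send a heartbeat every minute, so 15 minutes of silence is suspicious.
| where LastHeartbeat < ago(15m)
```

A collector that has been silent for longer than the 1-day window drops out of the results entirely, so a longer lookback (or a fixed list joined in) is needed to catch that case.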
Jun 27 2023 02:16 AM
Jun 27 2023 04:59 AM
Jun 27 2023 06:17 AM
@Kaaamil Not quite like that, still trying to figure it out..
I'm using this query:
let Now = now();
let queryResult = range TimeGenerated from (Now - 1d) to (Now - 4h) step 4h
| extend Count = 0
| union isfuzzy=true
(CommonSecurityLog
| where DeviceVendor == "connector_name_here"
| summarize Count = count() by bin(TimeGenerated, 8h))
| union (
range x from (Now - 1d) to (Now - 4h) step 8h
| project TimeGenerated = x, Count = 0
)
| summarize Count = max(Count) by bin(TimeGenerated, 8h)
| sort by TimeGenerated
| project Value = iff(isnull(Count), 0, Count), Time = TimeGenerated, Legend = "connector_name_here";
queryResult
I'm trying something like this, with the alert threshold set to "Is equal to" 0.
But it isn't working: the connector is returning 0 values and no alert is opened.
Jun 27 2023 07:34 AM
Try this one - very basic but does the work 🙂
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"workspace": {
"type": "String"
}
},
"resources": [
{
"id": "[concat(resourceId('Microsoft.OperationalInsights/workspaces/providers', parameters('workspace'), 'Microsoft.SecurityInsights'),'/alertRules/8c6e05a5-26ad-49ae-9cd6-a3e0f9df305b')]",
"name": "[concat(parameters('workspace'),'/Microsoft.SecurityInsights/8c6e05a5-26ad-49ae-9cd6-a3e0f9df305b')]",
"type": "Microsoft.OperationalInsights/workspaces/providers/alertRules",
"kind": "Scheduled",
"apiVersion": "2022-11-01-preview",
"properties": {
"displayName": "No logs from CommonSecuritylog from last 1 hour",
"description": "Rule triggers when Sentinel doesn't receive commonsecurity logs",
"severity": "High",
"enabled": true,
"query": "CommonSecurityLog\r\n|summarize Events = count()\r\n|where Events ==0",
"queryFrequency": "PT1H",
"queryPeriod": "PT1H",
"triggerOperator": "GreaterThan",
"triggerThreshold": 0,
"suppressionDuration": "PT5H",
"suppressionEnabled": false,
"startTimeUtc": null,
"tactics": [],
"techniques": [],
"alertRuleTemplateName": null,
"incidentConfiguration": {
"createIncident": true,
"groupingConfiguration": {
"enabled": false,
"reopenClosedIncident": false,
"lookbackDuration": "PT5H",
"matchingMethod": "AllEntities",
"groupByEntities": [],
"groupByAlertDetails": [],
"groupByCustomDetails": []
}
},
"eventGroupingSettings": {
"aggregationKind": "SingleAlert"
},
"alertDetailsOverride": null,
"customDetails": null,
"entityMappings": null,
"sentinelEntitiesMappings": null,
"templateVersion": null
}
}
]
}
Jul 03 2023 01:51 AM
Jul 03 2023 01:57 AM
Jul 03 2023 09:03 AM
Jul 07 2023 12:27 AM
You can import the JSON file:
Import and export Microsoft Sentinel analytics rules | Microsoft Learn
Let me explain the logic in more detail.
I want to check whether the CommonSecurityLog table has no logs:
CommonSecurityLog
|summarize Events = count()
|where Events ==0
If the query returns no results, it means that CommonSecurityLog is not empty for the last X amount of time.
Look how many log entries I have for the last 30 minutes:
So let's check whether we have 0 logs for the last 30 minutes:
Events == 0 is false, so the rule won't be triggered. If it were true, it would mean no logs for the last 30 minutes, and an incident would be triggered 🙂
Jul 07 2023 01:35 AM
Jul 12 2023 11:53 AM - edited Jul 12 2023 11:56 AM
I dug this up from when I was a KQL beginner back in 2020. It still works for many of our use cases, though. I added a logging threshold because for some log sources I would still get heartbeats, or something else was "just wrong" with the log source. Alerting on zero logs is easy.
I'm not sure if this is elegant or a mess, but it works! 😃
let CurrentLog_lookBack = 1h;
let MinimumThresh_lookBack = 1d;
let HistoricalLog_lookBack = 1d;
CommonSecurityLog
| where DeviceVendor == "YourVendorHere"
//Change the *0.03 to *0.06 in the line below to raise AverageHourlyLogThreshold above normal for testing.
| summarize Total24HRcount=count(TimeGenerated > ago(HistoricalLog_lookBack)), CurrentHRCount=count(TimeGenerated > ago(CurrentLog_lookBack)), AverageHourlyLogThreshold=count(TimeGenerated > ago(MinimumThresh_lookBack*0.03))
| extend Percentofaverage = iif( CurrentHRCount < AverageHourlyLogThreshold, "Logging has dropped below threshold - Check Log Source", "Logging Normal" )
| extend Code = iif( CurrentHRCount < AverageHourlyLogThreshold, "1", "" )
| project CurrentHRCount, Total24HRcount, Percentofaverage, Code, AverageHourlyLogThreshold
Change "YourVendorHere" to the vendor in your logs. "Code" is empty if logs are above the set threshold and 1 if they fall below. You can use that to generate an alert with a playbook or however you like.
Normal
Below threshold (I didn't have a sample, so I just changed the threshold for an example)
Here is the ChatGPT explanation of how it works 😃
1. `let CurrentLog_lookBack = 1h; let MinimumThresh_lookBack = 1d; let HistoricalLog_lookBack = 1d;`: These are variable declarations. The `let` keyword in KQL allows you to create a variable and assign it a value. `CurrentLog_lookBack` is set to 1 hour, `MinimumThresh_lookBack` is set to 1 day, and `HistoricalLog_lookBack` is also set to 1 day. These are used to set the time frames for the queries.
2. `CommonSecurityLog | where DeviceVendor == "YourVendorHere"`: This line is querying logs from the `CommonSecurityLog` data source, specifically filtering to only include logs where the `DeviceVendor` is "YourVendorHere".
3. `| summarize Total24HRcount=count(TimeGenerated > ago(HistoricalLog_lookBack)), CurrentHRCount=count(TimeGenerated > ago(CurrentLog_lookBack)), AverageHourlyLogThreshold=count(TimeGenerated > ago(MinimumThresh_lookBack*0.03))`: This line is summarizing the data in a few ways. It's getting a count of the logs in the past 24 hours (`Total24HRcount`), the past hour (`CurrentHRCount`), and a threshold (`AverageHourlyLogThreshold`), which is the count of logs newer than `ago(MinimumThresh_lookBack*0.03)`, i.e. the most recent 0.03 days (about 43 minutes). Note that the 0.03 scales the time window, not the count.
4. `| extend Percentofaverage = iif( CurrentHRCount < AverageHourlyLogThreshold, "Logging has dropped below threshold - Check Log Source", "Logging Normal" )`: This line is creating a new column (`Percentofaverage`) that contains a message about whether the current hour's log count has dropped below the average hourly log threshold. If it has, the message is "Logging has dropped below threshold - Check Log Source"; otherwise, it's "Logging Normal".
5. `| extend Code = iif( CurrentHRCount < AverageHourlyLogThreshold, "1", "" )`: This line is creating another new column (`Code`) that contains "1" if the current hour's log count has dropped below the average hourly log threshold, and an empty string otherwise.
6. `| project CurrentHRCount, Total24HRcount, Percentofaverage, Code, AverageHourlyLogThreshold`: This line is limiting the output of the query to just the columns specified: `CurrentHRCount`, `Total24HRcount`, `Percentofaverage`, `Code`, and `AverageHourlyLogThreshold`.
In summary, the script is checking whether the number of logs from a "YourVendorHere" device in the past hour has fallen below a threshold (the number of logs in the most recent 0.03 days, about 43 minutes). If it has, a warning message and code are generated. The final output includes the counts of logs in the past hour and day, the threshold, and the warning message and code.
Jul 13 2023 04:05 AM
Oh, thank you a lot! It looks really nice! And I can just put this code in an analytics rule; I will try! I just have to figure out what rule threshold I need to set so that it generates an alert in my SIEM. Do you have an idea?
Jul 13 2023 10:27 AM
@miguelfac Thanks!
Add this additional line to the query.
| where Code == "1"
That makes it so it only returns a result if the code is 1, which is when your logs are below the threshold.
Then just select "Is Greater than" 0 or "Is Equal to" 1 for your analytic rule.
Jul 13 2023 01:11 PM
I made this simpler; that old thing was such a mess lol. This does basically the same thing with the same result: it returns a row if the log count for the past hour falls below 1% of the count for the past 24 hours. You can change the percentage from 1% to 5% by changing the 0.01 to 0.05 to fit your needs. Have fun!
let averageCount = toscalar(
CommonSecurityLog
| where TimeGenerated >= ago(24h)
| summarize count()
);
CommonSecurityLog
| where TimeGenerated >= ago(1h)
| summarize LogCount = count()
| extend isBelowThreshold = iff(LogCount < averageCount * 0.01, 1, 0)
| where isBelowThreshold == 1
Jul 14 2023 01:13 AM
Jul 14 2023 01:14 AM
Jul 14 2023 09:00 AM - edited Jul 14 2023 09:01 AM
Yep! Just make sure you add it to both places.
let averageCount = toscalar(
CommonSecurityLog
| where DeviceVendor == "YourVendor"
| where TimeGenerated >= ago(24h)
| summarize count()
);
CommonSecurityLog
| where DeviceVendor == "YourVendor"
| where TimeGenerated >= ago(1h)
| summarize LogCount = count()
| extend isBelowThreshold = iff(LogCount < averageCount * 0.01, 1, 0)
| where isBelowThreshold == 1
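One edge case worth sketching: if the source has been completely silent for the whole 24 hours, both counts are 0, `0 < 0` is false, and no row is returned. The variant below is only an illustration of one way to cover that. It adds an explicit zero check and compares against an hourly average instead of the 24-hour total; the `hourlyAverage` name and the 0.25 factor are assumptions (1% of the 24-hour total is roughly 24% of the average hour).

```kql
// Average events per hour over the past 24 hours (0 if the source was silent).
let hourlyAverage = toscalar(
    CommonSecurityLog
    | where DeviceVendor == "YourVendor"
    | where TimeGenerated >= ago(24h)
    | summarize count()
) / 24.0;
CommonSecurityLog
| where DeviceVendor == "YourVendor"
| where TimeGenerated >= ago(1h)
| summarize LogCount = count()
// Fire on total silence, or when the last hour is well below the hourly average.
| where LogCount == 0 or LogCount < hourlyAverage * 0.25
```

As with the original, set the rule to trigger when the number of results "Is greater than" 0.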
Jul 14 2023 09:24 AM