Forum Discussion
How to Prevent Duplicate Incidents from Being Generated due to Long Data Look Back
- Oct 29, 2022
Step 1:
Let's do a simple query over the past 2hrs to find distinct computers that start with "S" in their nameHeartbeat | where TimeGenerated between (now(-2h) .. now()) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer
This returns two computers that match:
SQL12.na.contosohotels.comSQL00.na.contosohotels.comStep 2:Now we do similar for the past 14days, minus the most recent 2hrsHeartbeat | where TimeGenerated between (ago(14d) .. ago(2h)) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer
This returns 3 computers:SQL12.na.contosohotels.comSQL00.na.contosohotels.comSQL01.na.contosohotels.com
Step 3 (join it all together, in one query)// // Look back in the past 2 hours for a list of distinct Computers - // I only wanted to show named ones that start with an "S" to keep it simple // Heartbeat | where TimeGenerated between (now(-2h) .. now()) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer // // Now do the same for the past 14days minus the last 2 hours - this is key so we dont process the same data!!! // | join kind=rightanti ( Heartbeat | where TimeGenerated between (ago(14d) .. ago(2h)) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer ) on Computer​
This returns the single computer that is only found in the last 2hrs but not in the previous 14days, usig a JOIN and rightanti :
SQL01.na.contosohotels.com
Click here to see this in action (it works at the time of writing but as the hours change, the filter I used on "S" may mean it won't be a good demo).Go to Log Analytics and run query
An alternative would be a similar query but lookup the found email or IP in the past two hours in the Incident/Alert in the SecurityIncident/SecurityAlert tables
One suggestion would be to add those entities to a watchlist and just exclude watchlist entries in your query.
And you could add a timestamp column to the watchlist to filter out entries that are x days old.
- Clive_WatsonOct 28, 2022Bronze ContributorThis is a classic look back query. What you will need to do is split the query, part 1 is to look at the past 2hrs or whatever the rule frequency is set to. Then if that gets a result compare to 14days minus the look back period...which will remove the duplicates
- Browtastic670Oct 28, 2022Copper ContributorHey Clive!
Thank you so much for your response.
The method that you are proposing seems very interesting (and a bit confusing) to us. How exactly do we go about implementing it?
For reference we have our original query below:
TheatIntel Table
| summarize by indicator_s
| join kind=rightsemi (
Web_Traffic_Table
| summarize by User=Email_s,Action_s,DestinationIP_s,HTTPMethod_s,HTTPHost_s,URL_s,Referer_s,UserAgent_s,DownloadedFileNames_s,UploadedFileNames_s
) on $left.indicator_s==$right.HTTPHost_s
Here what we are trying to do is correlate any matching Threat Intel Feeds with the Web Browsing Logs of our users.
Another solution that stems from SocInABox's comment is that we can add all the Threat Intel Feeds into a Watchlist instead of a table and that way we won't even have to worry about the time range.
But the issue of duplicate alerts will still remain 😕- Clive_WatsonOct 29, 2022Bronze Contributor
Step 1:
Let's do a simple query over the past 2hrs to find distinct computers that start with "S" in their nameHeartbeat | where TimeGenerated between (now(-2h) .. now()) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer
This returns two computers that match:
SQL12.na.contosohotels.comSQL00.na.contosohotels.comStep 2:Now we do similar for the past 14days, minus the most recent 2hrsHeartbeat | where TimeGenerated between (ago(14d) .. ago(2h)) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer
This returns 3 computers:SQL12.na.contosohotels.comSQL00.na.contosohotels.comSQL01.na.contosohotels.com
Step 3 (join it all together, in one query)// // Look back in the past 2 hours for a list of distinct Computers - // I only wanted to show named ones that start with an "S" to keep it simple // Heartbeat | where TimeGenerated between (now(-2h) .. now()) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer // // Now do the same for the past 14days minus the last 2 hours - this is key so we dont process the same data!!! // | join kind=rightanti ( Heartbeat | where TimeGenerated between (ago(14d) .. ago(2h)) //| summarize min(TimeGenerated), max(TimeGenerated) | where Computer startswith "S" | distinct Computer ) on Computer​
This returns the single computer that is only found in the last 2hrs but not in the previous 14days, usig a JOIN and rightanti :
SQL01.na.contosohotels.com
Click here to see this in action (it works at the time of writing but as the hours change, the filter I used on "S" may mean it won't be a good demo).Go to Log Analytics and run query
An alternative would be a similar query but lookup the found email or IP in the past two hours in the Incident/Alert in the SecurityIncident/SecurityAlert tables