Forum Discussion
mrboxx
Sep 14, 2020 | Brass Contributor
Anomaly detection - how to
Hi - I would like to detect anomalies across multiple fields that are not numeric (e.g. looking for unusual Azure AD sign-in events using source IP, app name, account name, client name). To the best ...
mergene
Sep 17, 2020 | Brass Contributor
Could you be more specific? What kind of anomaly do you want to detect exactly? An example would help. If you are trying to detect anomalies based on numbers, you can count by IP address and other fields and then apply anomaly detection to those counts. There are also some ML functions you can use: evaluate basket() and evaluate autocluster() can be used to detect anomalies.
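A rough sketch of both routes, in case it helps. It assumes the standard SigninLogs columns UserPrincipalName, AppDisplayName and LocationDetails; the 14d/7d windows and the names SigninCount, Flag, Score, Expected and Country are arbitrary choices for illustration. Keep in mind that autocluster() and basket() surface the frequent attribute combinations, so the unusual sign-ins are the ones that fall outside the returned patterns, not the patterns themselves.

```kusto
// Route 1 (numeric): daily sign-in count per user, scored with built-in series anomaly detection.
SigninLogs
| where TimeGenerated > ago(14d)
| make-series SigninCount = count() default = 0
    on TimeGenerated from ago(14d) to now() step 1d
    by UserPrincipalName
| extend (Flag, Score, Expected) = series_decompose_anomalies(SigninCount)
| mv-expand TimeGenerated to typeof(datetime), SigninCount to typeof(long), Flag to typeof(int)
| where Flag != 0   // keep only the days flagged as anomalous

// Route 2 (categorical): summarise the common (user, app, country) patterns.
SigninLogs
| where TimeGenerated > ago(7d)
| extend Country = tostring(LocationDetails.countryOrRegion)   // illustrative alias
| project UserPrincipalName, AppDisplayName, Country
| evaluate autocluster()
// Swap the last line for "| evaluate basket()" to list every combination
// above the frequency threshold instead of a small set of clusters.
```

Run the two queries separately. Route 1 only sees volume changes, so it will not notice a single odd location on an otherwise normal day.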
mrboxx
Sep 17, 2020 | Brass Contributor
Hi Cyb3rMonk, I want to identify unusual sign-in activity in Azure AD logs so that it can be investigated as a sign of potentially compromised accounts.
As a really simple example, I want to consider two event fields: (i) the UPN and (ii) the country from the location field. I consider an event unusual when a sign-in comes from a country that is not typical for that user. For example, I rarely travel and live in one isolated country, so my sign-ins each day always come from that one country. If a sign-in happens from a different country, that's an anomaly that needs to be investigated.
In practice, by considering the event fields UPN, AppDisplayName and the location (or even better the IP ASN), a small number of unusual events can be identified. I typically use the same set of apps at work (corp network), on the bus (cell phone carrier) and at home (residential xDSL).
All of the examples that I've seen using Sentinel (e.g. https://github.com/Azure/Azure-Sentinel/blob/master/Hunting%20Queries/SigninLogs/AnomalousUserAppSigninLocationIncrease.yaml) summarise events to a numeric series (e.g. number of locations that a user signed in from per day) and then look for outliers in the count. In practice this approach is fallible, because one of the locations in the count could be highly unusual while the count is still numerically normal. Our most important users do travel regularly, so their normal pattern of use is more complex than most people's, which makes count-based approaches less effective and more likely to miss something significant.
mergene
Sep 19, 2020 | Brass Contributor
mrboxx, you can create baseline data and compare the last 1d of data with your baseline by using a join. There are several ways to accomplish this. Below is an example:
// Logic: create a baseline by using data from 15 days ago until 1 day ago.
// Compare the last 1d of data with the baseline.
let startdate = 15d;
let enddate = 1d;
let baseline = materialize (
    SigninLogs
    | where TimeGenerated between (ago(startdate) .. ago(enddate))
    | where OperationName == "Sign-in activity"
    | extend countryOrRegion_ = tostring(LocationDetails.countryOrRegion)
    //| summarize Country_=make_set(countryOrRegion_) by Identity, bin(TimeGenerated, 1d)
    | summarize max(TimeGenerated) by Identity, countryOrRegion_, bin(TimeGenerated, 1d)
);
let countries_by_identity = baseline
    | summarize previous_countries = make_set(countryOrRegion_) by Identity;
let existing_users = baseline
    | summarize make_list(Identity);
SigninLogs
| where TimeGenerated > ago(1d)
| where OperationName == "Sign-in activity"
| where Identity in~ (existing_users) // to remove the false positive where an identity is first seen.
| extend countryOrRegion_ = tostring(LocationDetails.countryOrRegion)
| summarize LastSigninActivity = max(TimeGenerated) by Identity, countryOrRegion_
| join kind=leftanti baseline on Identity, countryOrRegion_
| join kind=inner countries_by_identity on Identity
| project-away Identity1
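The same baseline-and-anti-join idea could be widened to the field combination asked about earlier in the thread (UPN, AppDisplayName and ASN). The following is only a sketch under some assumptions: that your SigninLogs table has a populated AutonomousSystemNumber column, and that UserPrincipalName is a better key than Identity for your tenant. The names baseline_start, baseline_end, known_users, ASN, BaselineCount, LastSeen and Events are made up for illustration.

```kusto
// Sketch: flag (user, app, ASN) combinations seen in the last day but absent from the baseline.
let baseline_start = 15d;
let baseline_end = 1d;
let baseline = materialize (
    SigninLogs
    | where TimeGenerated between (ago(baseline_start) .. ago(baseline_end))
    | where OperationName == "Sign-in activity"
    | extend ASN = tostring(AutonomousSystemNumber)   // assumption: column exists and is populated
    | summarize BaselineCount = count() by UserPrincipalName, AppDisplayName, ASN
);
// Users with any baseline activity; first-time users are skipped to avoid false positives.
let known_users = baseline | distinct UserPrincipalName;
SigninLogs
| where TimeGenerated > ago(baseline_end)
| where OperationName == "Sign-in activity"
| where UserPrincipalName in (known_users)
| extend ASN = tostring(AutonomousSystemNumber)
| summarize LastSeen = max(TimeGenerated), Events = count() by UserPrincipalName, AppDisplayName, ASN
| join kind=leftanti baseline on UserPrincipalName, AppDisplayName, ASN
```

Because the match is on the combination rather than on a count, a frequent traveller who signs in from a new carrier still shows up even when the number of locations that day looks normal. A longer baseline window should reduce noise for users with complex patterns, at the cost of a heavier query.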