Approximate, partial and combined lookups in Azure Sentinel
Published May 16 2020 09:10 AM 14K Views
Microsoft

Many of you found Nir Gafni's "Implementing Lookups in Azure Sentinel" article useful. A common question you made was how to lookup partial values. For example, does the account name matches a list of patterns? Or is an IP address in a list of subnets? Partial lookups are a significant challenge with lookups in other SIEM products and often require reverting to hard to maintain methods such as nested filters.

 

In this blog post, I will show you how to implement partial lookups with Azure Sentinel. I will also venture into more advanced areas such as combined multi-condition lookups.

 

This post of part of a series of blog posts on writing rules in Azure Sentinel:

 

Phrase lookup: the has_any operator

 

The simplest and most efficient way to do partial lookups in Azure Sentinel is to use the "has_any" operator. While the examples in Implementing Lookups in Azure Sentinel used the "in" and "!in" operators which do an exact match, the "has_any" operator searches for any one of a list of lookup phrases in the target field.

 

Let's look into a user watchlist example in Linux Syslog messages. We want to alert only when specific users, stored in a list, fail to login with a wrong password.

Let's start with the lookup table:

 

let keywords_table = datatable (w: string) ['postgres','nagios','doker'];

 

I am using the datatable operator to create the table for convenience so that you can use the example. However, you can use any lookup table source as described in implementing Lookups in Azure Sentinel, such as using the "externaldata" data operator, custom tables, or storing the "datatable" operator in a function.  

 

Next, we need to convert the table to a list:

 

let lookup = toscalar(keywords_table| summarize l=make_list(w));

 

This command could be combined with the previous or the next and is separated here to make it more readable.

 

Lastly, we use the has_any operator to look for only failed password events for the users in the lookup table:

 

Syslog
| where SyslogMessage startswith "Failed password"
| where SyslogMessage has_any (lookup)

 

The "has_any" operator is handy, but is limited to a phrase delimited by word boundaries and would not match any substring. We need another solution for substring matching and other partial matching operators.

 

Watch listing by IP ranges: the mv-apply operator

 

A widespread watch list scenario that requires partial lookups is selecting events based on IP ranges. IP ranges are usually represented in a CIDR notation. For example, the internal IP address range 192.168.x.x is represented by the CIDR notation 192.168.0.0/16.

 

As an example, let's implement an IP range watch list using Azure Sentinel that selects only events that originated in an internal network. The lookup table is the following table, which includes private IP address ranges:

 

let private_ranges = datatable (ip_range: string) ['192.168.0.0/16', '172.16.0.0/12', '10.0.0.0/8'];

 

As mentioned, the “datatable” operator can be “externaldata”, a custom table, or a function.  

Here again, we need to convert the table to a list, using the following command:

 

let lookup = toscalar(private_ranges| summarize l=make_list(ip_range));

 

Now, that we have the watch list ready, we can use it:

 

CommonSecurityLog
| where TimeGenerated > ago(5m)
| where …
| mv-apply l=lookup to typeof(string) on
(
  where ipv4_is_match (SourceIP, l)
)
| project-away l

 

The important operator here is "mv-apply", which applies the query (in red) to every value in the lookup list over every record.  

 

Since mv-apply applies every value in the lookup table to every record, it is performance and memory intensive. If not used carefully, your query may fail. Therefore:

  • Ensure that you use it at the end of your query after you filtered by other criteria that are less demanding.
  • Limit the timespan on which you perform the query as much as possible.
  • Use a reasonably sized lookup table.
  • Lastly, the slightly more complex method presented below for allow-listing can be used for watch lists as well and has better performance.

 

A corner case you may encounter is duplicate results. The result set includes a record for each match, so if an input record matches two elements, it appears twice in the result set. The duplication cannot happen in the example above as the lookup values are mutually exclusive.

 

In some cases, duplicate records are useful, as you would like to analyze each lookup value the record matched. If not, you might need to add something along the following lines. The query requires that a set of fields uniquely identify an event. For SecurityEvents, "EventOriginId" does the trick:

 

| summarize arg_max(TimeGenerated, *) by EventOriginId

 

The solution presented for allow-lists below also overcomes this issue.

There are also a few subtleties when using mv-apply

  • You need to explicitly say what the type of "l" is using the "to typeof()" phrase is.
  • "l" is added to every record and has to be "projected-away"

 

Allow-listing by username

 

Since it returns a record for every match, the technique above cannot be used for allow-listing. For example, if you try to exclude private IP ranges by reversing the logic to:

 

where not(ipv4_is_match (SourceIP, l))

 

You get 3 repeats of every record that does not match the ranges, and 2 repeats for each one that matches one of them. Certainly not the intended result.

 

So, how can we allow-list?

 

The solution is to find using the technique presented for watchlists the list of matching values and apply it using a regular, exact, lookup.

 

Setting up the lookup table is similar to the examples above. In this case, I combined the table and list creation into one statement:

 

let lookup=toscalar(datatable(p:string) ['user','admin'] | summarize l=make_list(p));

 

Next, we use the same mv-apply to check the Account names against the lookup list:

 

let matched_users = toscalar(
SecurityEvent
| where TimeGenerated > ago(5m)
| where …
| summarize by Account
| mv-apply l=lookup to typeof(string) on
(
  where Account contains l
)
| summarize make_list(Account));

 

Notice that here we used the “contains” operator. Any operator, complex expressions, or even full queries, can be used in an “mv-apply” sub-query.

 

Since we do not need the events but only user names to continue the analysis, we can summarize by account (in green) to get a list of unique account names, making this version much more efficient. Lastly, we have created a list of the resulting values (orange), which is a list of account names that contain any of the lookup values.

 

We use the list generated above as a allow list using the "!in" operator (blue)

 

SecurityEvent
| where TimeGenerated > ago(5m)
| where …
| where Account !in (matched_users)

 

We used the same event filter when generating the exclusion list and when using it. While in a simple Azure Sentinel analytics rule which does not filter explicitly by the time this would work, it runs a risk, in some circumstance, that the two runs will not apply to the same event set. The following variant uses the "materialize" operator (magenta) to resolve that:

 

let lookup=toscalar(datatable(p:string) ['user','admin'] | summarize l=make_list(p));
let events=materialize(SecurityEvent | where TimeGenerated  > ago(5m) | where …);
let matched_users = toscalar(events
| summarize by Account
| mv-apply l=lookup to typeof(string) on
(
  where Account contains l
)
| summarize make_list(Account));
events
| where Account !in (matched_users)

 

Handling multiple conditions

 

Ready to get the full strength of Azure Sentinel? The following example demonstrated how to take the technique presented in this blog post a step further. It adds a couple of dimensions:

  • Allowing a different operator to be selected for each lookup value
  • Combining conditions over several fields

 

First, the lookup table includes, in addition to the username pattern, also an operator. The query supports “contains”, “startswith” and “endswith”. It also includes the condition for the additional field we want to lookup, AccountType in this case.

 

let lookup=toscalar(datatable(op:string, account:string, accounttype:string) [
    'startswith', '\\admin', 'User',
    'endswith', 'dc$', 'Machine']
| summarize l=make_list(pack('op',op,'account',account, 'accounttype',accounttype)));

 

Notice that to make all the values available to the mv-apply operation which requires a list input, the list is now make of tuples using KQL “dynamic” type and the “pack” function which creates the dynamic value.

 

The main template used below for the query is the user allow-listing described above. However, the sub-query used by the mv-apply operator now handles both the operator selection and the additional condition, demonstrating the capabilities available when using mv-apply:

 

 

let events=materialize(SecurityEvent | where TimeGenerated > ago(5m) | where …);
let matched_users = toscalar(
events
| summarize by Account, AccountType
| mv-apply l=lookup on
(
    extend match = case (
        l['op'] == "contains",
            Account contains l['account'] and AccountType == l['accounttype'],
        l['op'] == "startswith",
            Account startswith l['account'] and AccountType == l['accounttype'],
        l['op'] == "endswith",
            Account endswith l['account'] and AccountType == l['accounttype'],
        False
    )
)
| where match | summarize make_list(strcat(AccountType, Account)));
events | where strcat(AccountType, Account) in (matched_users)

 

Summary

While exact lookups are useful, many use cases require partial lookups, which challenge every SIEM system. In this article, I presented how to implement such approximate and partial lookups in Azure Sentinel. I hope you will find it useful.

4 Comments
Copper Contributor

Hi @Ofer_Shezaf 
Thank you for this post.


May I ask you how to query public/ remote IP events in CommonSecurityLogs ? if my ranges are stored in WatchList ?

_GetWatchlist('PrivateIPs')
PrivateIPs are ranges e.g 192.168.0.0//16, 172.16.0.0//12

in other words, Im interested in the opposite case that you mentioned in this post, something like this but more dynamic/ from Watchlist
CommonSecurityLog
| where not(ipv4_is_match (SourceIP, '10.111.40.51'))
| where not(ipv4_is_match (SourceIP, '172.16.0.0/12'))
 
Regards,
Iron Contributor

Hi everyone,

I'd like to use NOT has_any (watchist)

How do you say NOT with has_any?

 

Thanks!

Iron Contributor

figured it out just put the not outside parentheses, thanks.

Iron Contributor

Is there a way to find the country for an IP?

eg. Fortinet has SourceIP, DestinationIP. Can I map those to a country or country abbreviation?

 

I've seen this geoData (below) used somewhere, but I don't know how to map this to the provided subnet to source, destination IP addresses, and get the resulting country information.

 

let geoData =
materialize (externaldata(network:string,geoname_id:string,continent_code:string,continent_name:string,
country_iso_code:string,country_name:string,is_anonymous_proxy:string,is_satellite_provider:string)
[@"https://raw.githubusercontent.com/datasets/geoip2-ipv4/master/data/geoip2-ipv4.csv"] with
(ignoreFirstRecord=true, format="csv"));
let lookup = toscalar( geoData | summarize list_CIDR=make_set(network) );


Version history
Last update:
‎Jul 06 2020 11:18 AM
Updated by: