Jul 30 2020 04:02 AM
Hi everyone! I'm working with Sentinel and I've run into some join behaviour that either I don't understand or is broken. I'm using a join to compare data in an IoC watchlist (Anomali stored in ThreatIntelligenceIndicator) and some basic firewall logs in CommonSecurityLog.
If I run the query below I get one result per "hit" on the watch list. I also get a warning about this query utilising resources and it takes a while to execute.
CommonSecurityLog
| join ( ThreatIntelligenceIndicator | where ThreatType=="WatchList" | project-rename DestinationIP = NetworkIP ) on DestinationIP
If I run this query where I "reverse the join" I get each individual firewall event, which is multiple hits per target IP on the watchlist. This query is also faster!
ThreatIntelligenceIndicator | where ThreatType == "WatchList"
| join ( CommonSecurityLog | project-rename NetworkIP = DestinationIP ) on NetworkIP
| project TimeGenerated1, DeviceAction, SourceIP, NetworkIP, DestinationPort, ReceivedBytes, SentBytes, Message
Obviously, the second query and its multiple results are preferable. But is this expected behaviour and why?
Thanks!
Jul 30 2020 04:30 AM
@JKatzmandu The inner join you are doing also does a de-duplication (removing any duplicate rows) before trying to join with the table on the right. Since you are not doing any sort of filtering in the first query, that alone will take some time since it has to look at all the rows in the selected time period. It will then perform the inner join with the right table. My guess is there is only one entry in the left table, after the rows were de-duplicated, that matches any rows in the right table hence there is only one row returned. Take a look at the row being matched in the left table. Does it have duplicate entries?
By switching the table order you are now doing the de-duplication on a table that does have a filter so that step alone will be faster. You then match everything on the left table with the matching values on the right table. Since the right table is no longer being de-duplicated there are more matches (and I am guessing at that part).
Does that make sense?
Jul 30 2020 04:33 AM
@Gary Bushey Understood! It's just me figuring out proper join behaviour, that's all :D
Aug 02 2020 06:25 AM
Two additional points to @Gary Bushey's excellent answer:
~ Ofer