Forum Discussion
KQL Performance Optimization
I suspect you have read this: https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/best-practices
Is the summarize that's having the issue within the Join or outside the join? If the data is Syslog, I assume its the Syslog table and you have that on the right of the join - see link above (the datatable being the smaller table on the left?). Are you able to reduce the data from syslog, maybe use a bin if its across many hours or days?
Can you offer a sample of the datatable and Syslog syntax?
Yes, i read it.
Summarize is outside the JOIN, and i confirmed that it's the culprit, not the JOIN. Any ideas to improve the performance of JOIN, given that the dataset it's acting on it very huge (tens of millions of logs).
//PART1: Logs preparation, note that syslog1 is a function created elsewhere:
Syslog1
| where TimeGenerated > startDatetime and TimeGenerated < endDatetime
| where Activity == "TRAFFIC" or Activity == "THREAT"
| project-rename signature = ThreatName
| project-rename threat_category = ThreatCategory
| project-rename category= ThreatType, severity = SeverityLevel, action = DeviceAction, content_version = ContentVersion
| extend app = tostring(Application), day = dayofmonth(TimeGenerated), month = getmonth(TimeGenerated), year = getyear(TimeGenerated)
| extend traffic = toint(Bytes)
| extend parsed_src_IP = parse_ipv4(SourceIP)
| extend src_zone = case(parsed_src_IP between ( parse_ipv4("192.168.0.0") .. parse_ipv4("192.168.255") ),"Guest",
parsed_src_IP between ( parse_ipv4("127.0.0.0") .. parse_ipv4("127.0.0.1") ),"Corporate",
parsed_src_IP between ( parse_ipv4("10.0.0.0") .. parse_ipv4("10.255.255.255") ),"Corporate",
parsed_src_IP between ( parse_ipv4("172.16.0.0") .. parse_ipv4("172.31.255.255") ),"Corporate",
//else
"Public")
| extend parsed_dst_IP = parse_ipv4(DestinationIP)
| extend dst_zone = case(parsed_dst_IP between ( parse_ipv4("192.168.0.0") .. parse_ipv4("192.168.255") ),"Guest",
parsed_dst_IP between ( parse_ipv4("127.0.0.0") .. parse_ipv4("127.0.0.1") ),"Corporate",
parsed_dst_IP between ( parse_ipv4("10.0.0.0") .. parse_ipv4("10.255.255.255") ),"Corporate",
parsed_dst_IP between ( parse_ipv4("172.16.0.0") .. parse_ipv4("172.31.255.255") ),"Corporate",
//else
"Public")
// Problematic Part which is the following summarize statement
| summarize sum(traffic), count(app) by app, src_zone, dst_zone
// PART3:
JOIN
- Ofer_ShezafJul 28, 2020
Microsoft
majo01 :
I think there is little to do with "summarize" beyond shuffle. Is the performance prior to summarize reasonable?
One option is to filter before parsing. If it is easy to check Activity on raw syslog, it should accelerate things considerably. I woudl first check by including the parser in the query. If it works, the parser can be omdified to be a parameter function that can be applied after initial filtering.
- majo01Aug 04, 2020Brass Contributor
Thank you.
I tried a bit to make a parser as parameter function, something is going wrong. How can we call the tabular parameter from within the function ?
Below is sample:
//PART1: prepare the filtered dataset that's to be passed to the parser
let filteredDataSet = Syslog| where SyslogMessage contains ",TRAFFIC,";//PART2: the parameterized function with tablular parameter. The function will perform Extend/Project actions on the table / message passed to it as parameterlet parserFunc = (T:(*) ){T | extend testField = SyslogMessage // here it fails to call SyslogMessage, even if SyslogMessage is passed as a strandalone string};// PART3: invoke the function and pass the filtered dataset as parameterparserFunc(filteredDataSet)Note that the function seems to be able to read the passed table as a whole, but not to invoke a specific column from the table.This func returns the filtered dataset properly:let parserFunc = (T:(*) ){T //| extend testField = SyslogMessage};- Ofer_ShezafAug 05, 2020
Microsoft
majo01 : the function needs to be defined with parameters for the fields that you will reference; the table you pass in will require those fields but can have others.