Jan 16 2019 11:39 AM
We're having a discussion here and I thought I'd ask the larger group. When would you NOT want a shuffle hint when doing a join or summarize? And if you always want to use it, why isn't it the default?
Jan 17 2019 12:30 AM
The full docs for shuffle join and shuffle summarize are here:
https://docs.microsoft.com/en-us/azure/kusto/query/shufflejoin
https://docs.microsoft.com/en-us/azure/kusto/query/shufflesummarize
Docs say:
"Shuffle summarize strategy can provide significant performance benefit when the 'by' clause has columns with high cardinality which may be causing the regular summarize strategy to hit query limits."
When not to use it: when the key has low cardinality.
For example, if table 'Data' has a column Level whose values are only "Error", "Info", or "Warning" (cardinality = 3), you don't want shuffle summarize: it moves the data between the nodes executing the query, and with only 3 distinct keys at most 3 nodes end up doing the aggregation, so you pay for the data movement without gaining parallelism.
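Similar logic applies to join: shuffle pays off when the join key has high cardinality, and is not worth the data movement when it is low.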
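To make the cardinality argument concrete, here is a toy Python model of a shuffle, not Kusto's actual implementation. It assumes rows are routed to nodes by hashing the key (zlib.crc32 is a hypothetical stand-in for whatever hash the engine uses) and that the cluster has 8 nodes; the table contents are made up for illustration. It counts how many nodes actually receive rows after the shuffle.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_nodes: int) -> int:
    """Route a key to a node by hashing it (crc32 is a hypothetical stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def rows_per_node(keys, num_nodes):
    """Number of rows each node receives when rows are shuffled by key."""
    counts = Counter(partition_of(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


def busy_nodes(keys, num_nodes):
    """How many nodes end up with any rows to aggregate."""
    return sum(1 for c in rows_per_node(keys, num_nodes) if c > 0)


NUM_NODES = 8  # hypothetical cluster size

levels = ["Error", "Info", "Warning"] * 1000   # low cardinality: 3 distinct keys
user_ids = [f"user{i}" for i in range(3000)]   # high cardinality: 3000 distinct keys

print(busy_nodes(levels, NUM_NODES))    # at most 3 of the 8 nodes do any work
print(busy_nodes(user_ids, NUM_NODES))  # work spreads across the cluster
```

Every key lands on exactly one node, so the number of busy nodes can never exceed the number of distinct keys. With 3 keys, all the data is still moved but at most 3 nodes share the aggregation; with thousands of keys, the same movement buys parallelism across the whole cluster.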