When NOT to use the shuffle hint?

Microsoft

We are having a discussion here and thought I'd ask the larger group. When would you NOT want a shuffle hint when doing a join or a summarize? And if you always want to use it, why is it not the default?

1 Reply

The full docs for shuffle join and shuffle summarize are here:

https://docs.microsoft.com/en-us/azure/kusto/query/shufflejoin

https://docs.microsoft.com/en-us/azure/kusto/query/shufflesummarize

The docs say:

"Shuffle summarize strategy can provide significant performance benefit when the 'by' clause has columns with high cardinality which may be causing the regular summarize strategy to hit query limits."

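When not to use it: when the cardinality of the key is low.

For example, if you have a table 'Data' with a column Level whose value is one of "Error", "Info", or "Warning" (cardinality = 3), you don't want to use shuffle summarize: it moves the data between the nodes executing the query, and with only three distinct keys there is very little work to spread across them, so the data movement is unlikely to pay for itself.

Similar logic applies to join: the shuffle hint pays off when the join key has high cardinality, not when it has only a few distinct values.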
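
To make the low-cardinality argument concrete, here is a toy Python model. It is only a sketch and does not reflect how Kusto implements shuffling internally: it assumes rows are routed to a node by a hash of the key, and the cluster size of 16, the helper names, and the "user-N" identifier column are all made up for illustration.

```python
import zlib

def partition_counts(keys, num_nodes):
    """Rows per node if each row is routed by hash(key) % num_nodes (toy model)."""
    counts = [0] * num_nodes
    for key in keys:
        # crc32 is used only because it is deterministic across runs.
        node = zlib.crc32(str(key).encode("utf-8")) % num_nodes
        counts[node] += 1
    return counts

def busy_nodes(keys, num_nodes):
    """Number of nodes that receive at least one row."""
    return sum(1 for rows in partition_counts(keys, num_nodes) if rows > 0)

NUM_NODES = 16  # hypothetical cluster size

# Low cardinality: the Level column from the example (3 distinct values).
levels = ["Error", "Info", "Warning"] * 10_000
# High cardinality: a hypothetical identifier column with 30,000 distinct values.
ids = [f"user-{i}" for i in range(30_000)]

low = busy_nodes(levels, NUM_NODES)
high = busy_nodes(ids, NUM_NODES)
print(f"Level: {low} of {NUM_NODES} nodes receive rows")
print(f"Id:    {high} of {NUM_NODES} nodes receive rows")
```

With three distinct keys, at most three nodes can ever receive rows, no matter how many rows or nodes there are, so every row is moved and most of the cluster still sits idle. With a high-cardinality key the rows spread out, which is the case where the shuffle has a chance of paying off.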