Partition strategy to improve performance


Hi,

I am querying a Log Analytics workspace from a workbook. By the nature of the solution, the workspace will hold a large number of records, and the queries are fairly complex, with a number of joins and summarize operators.

Most of my queries work fine, but a few of them cause memory usage to peak, and the query execution is then aborted.

Hence I looked at https://docs.microsoft.com/en-us/azure/kusto/query/shufflequery to improve performance by using the hint.shufflekey hint. I could see the difference: the query that used to get aborted now runs (but occasionally fails).
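For readers following along, here is a minimal sketch of how the shuffle hint from that page is written on a summarize and on a join. The Update and Heartbeat tables, the one-day time filter, and the choice of SourceComputerId as the shuffle key are assumptions for illustration only (the column names are borrowed from the example at the end of this post); this is not the actual workbook query.

```kusto
// Sketch only: table names, time range, and shuffle key are assumed for illustration.

// Shuffle a summarize on one of its group-by keys.
Update
| where TimeGenerated > ago(1d)
| summarize hint.shufflekey = SourceComputerId arg_max(TimeGenerated, UpdateState) by SourceComputerId, UpdateID

// Shuffle a join on its join key.
Update
| where TimeGenerated > ago(1d)
| join hint.shufflekey = SourceComputerId (
    Heartbeat
    | summarize LastHeartbeat = max(TimeGenerated) by SourceComputerId
) on SourceComputerId
```

The shuffle key should be one of the summarize group-by keys or the join key, and it tends to help most when that key has high cardinality.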

The above link mentions hint.num_partitions, which lets us specify the number of partitions/clusters used to execute the query in parallel.
1. How many clusters are allocated to a Log Analytics workspace? Is this configurable?
2. In some of the log queries I noticed "hint.strategy=partitioned", but I couldn't find any details about it. Could you please explain or provide pointers?
(E.g. | summarize hint.strategy=partitioned arg_max(TimeGenerated, UpdateState) by SourceComputerId, UpdateID)



1 Reply
I think this has been answered offline, @Vino55?