Forum Discussion
Remove duplicates from query
- Dec 01, 2017 So does arg_max() do the job? From my understanding, it is the function that will work for you.
Thanks Stan, sorry for the bad description.
So my original query to get information about deleted App Services returns multiple events for the deletion of the same App, although these all have the same CorrelationId. With my limited knowledge of the query language, I was therefore trying to return only one record per CorrelationId while still getting all the columns the event contains.
I'm trying to use this to drive a webhook and trigger a runbook to do some house cleaning on both creation and deletion of App Services.
My thinking is that the best way to drive the runbook is to use the query language to filter out all the data that the runbook does not need for its logic.
Hopefully this makes more sense?
If one ignores time for the moment, this almost gets me there, but it retains only CorrelationId in the result output, and I need all the columns (or to use project to choose the ones I need):
AzureActivity | where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' | summarize by CorrelationId
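Editor's note: `summarize by CorrelationId` with no aggregate function keeps only the grouping key, which is why the other columns disappear. If it really does not matter which of the duplicate events is kept, one possible sketch is to ask for an arbitrary row per group. This assumes the workspace supports take_any() (older workspaces exposed the same aggregate as any()); the filter is copied unchanged from the query above.

```kusto
// Sketch only: keep one arbitrary event per CorrelationId, with all its columns.
// Which of the duplicate events survives is not defined.
AzureActivity
| where OperationName == 'Delete website'
    and ActivityStatus == 'Succeeded'
    and ResourceProvider == 'Azure Web Sites'
| summarize take_any(*) by CorrelationId
```

If the newest event is the one that should survive, arg_max(TimeGenerated, *) is the more predictable choice.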
- Morten Lerudjordet, Dec 01, 2017 (Copper Contributor)
I think I may have figured it out.
Something like this seems to work:
AzureActivity | where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' | project ResourceId, CorrelationId, TimeGenerated | summarize arg_max(TimeGenerated, *) by CorrelationId
- Dec 01, 2017 Yes, I forgot the '*' in arg_max(), sorry. It is best practice to project at the end, after summarize.
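Editor's note: a minimal sketch of the "project after summarize" ordering suggested in this reply. The filter and column names are taken from the queries in this thread; the particular columns projected at the end are only an illustration of what a runbook might need.

```kusto
// Deduplicate first, keeping every column of the latest event per CorrelationId,
// then narrow the output to the columns the consumer actually uses.
AzureActivity
| where OperationName == 'Delete website'
    and ActivityStatus == 'Succeeded'
    and ResourceProvider == 'Azure Web Sites'
| summarize arg_max(TimeGenerated, *) by CorrelationId
| project CorrelationId, TimeGenerated, ResourceId
```

Projecting last keeps the deduplication step independent of the output shape, so adding another column later only means editing the final line.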
- Morten Lerudjordet, Dec 01, 2017 (Copper Contributor)
It seems I can also do this, so I don't need to project at the end:
AzureActivity | where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' | summarize arg_max(TimeGenerated, ResourceId, Resource, ResourceGroup, OperationName) by CorrelationId
Thanks for steering me in the right direction.
- Dec 01, 2017
Hmm, the query you posted last is different from the one I posted: it does not contain arg_max(). arg_max() with '*' retains all columns. If you do not want to use arg_max(), what is the difference between the two results you get? Which one of the two do you want displayed?
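Editor's note: to make the "which one of the two" question concrete, here is a small self-contained sketch that can be pasted into a query window without touching AzureActivity. The rows, CorrelationId values, and resource paths are made up for illustration.

```kusto
// Hypothetical sample data: two events share CorrelationId "aaa".
datatable(TimeGenerated: datetime, CorrelationId: string, ResourceId: string)
[
    datetime(2017-12-01 10:00:00), "aaa", "/sites/app1",
    datetime(2017-12-01 10:00:05), "aaa", "/sites/app1",
    datetime(2017-12-01 11:00:00), "bbb", "/sites/app2"
]
// For each CorrelationId, keep the row with the greatest TimeGenerated
// and return all of that row's columns.
| summarize arg_max(TimeGenerated, *) by CorrelationId
```

This yields one row per CorrelationId: for "aaa" the 10:00:05 event is kept and the 10:00:00 duplicate is dropped. Row order in the output is not guaranteed unless a sort is added.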