Forum Discussion

Morten Lerudjordet's avatar
Morten Lerudjordet
Copper Contributor
Nov 30, 2017
Solved

Remove duplicates from query

Hi, hope somebody can help me as I'm a bit stuck in my understanding of the query language.

So I'm trying to get some creation events for App Services, though there seems to be multiple entries for the same App. Therefore I'm trying to find a way to remove duplicates on a column but retain the rest of the columns in the output / or a defined set of columns. Though after dodging distinct on a specific column only this is retained in the output.

This is my query:

 

AzureActivity 
| where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' 

Though this produces two entires for the same deletion, so I tired this:

 

AzureActivity 
| where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' 
| distinct CorrelationId

Though this only leaves the CorrelationId in the output but I need the Resource, ResourceID,OperationName also to be shown in the output. Any tips on how to get the syntax correct?

 

Thanks

 

 

 

  • Stanislav_Zhelyazkov's avatar
    Stanislav_Zhelyazkov
    Dec 01, 2017
    So does arg_max() does the job? From my understanding is the thing that will work for you.

11 Replies

  • Hi There is some basics that you need to understand about Log Analytics. Data in Log Analytics is stored with different time stamp (TimeGenerated column). So basically when a solution sends new data it does not delete the old one it just writes a new record with new date. That way you can view historically the data. With that said for example if you want to get the latest record for something there is function for that arg_max(). You can read about it here:

     

    https://docs.loganalytics.io/docs/Language-Reference/Aggregation-functions/arg_max()

     

    I am not sure what you want to achieve but may be it is something like this:

     

    AzureActivity 
    | where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' 
    | summarize arg_max(TimeGenerated, *) by CorrelationId

     You might need to aggregated on another column or more than one column depending on your scenario. 

    • Morten Lerudjordet's avatar
      Morten Lerudjordet
      Copper Contributor

      Thanks Stan, sorry for the bad description.

       

      So my original query to get information about deleted App Services returns multiple events for the deletion of the same App. Though these have the same CorrelationId. Therefore I was trying with my limited knowledge of the query language to only return 1 record that has the same CorrelationId. Though still get all the columns the event contains. 

       

      I'm trying to use this to drive a webhook and trigger a runbook to do some house cleaning on both creation and deletion of App Services. 

       

      My thinking is that the best way to drive the runbook was to use the query language to filter out all the data that the runbook does not need to make its logic work. 

       

      Hopefully this makes more sense?

      • Morten Lerudjordet's avatar
        Morten Lerudjordet
        Copper Contributor

        If one just ignores time at the moment this almost gets me there but it will only retain CorrelationId on the result output, but I need to get all the columns (or use project to choose the ones I need)

         

        AzureActivity 
        | where OperationName == 'Delete website' and ActivityStatus == 'Succeeded' and ResourceProvider == 'Azure Web Sites' 
        | summarize by CorrelationId

Resources