I need to report on Azure Backup products (MABS, Azure Backup Agent, Azure VM Backup)

I'm having real difficulty with the Kusto language and the relatively undocumented fields used in Azure Monitor.

Here is what I'd like to do: I need a simple dashboard that tells me whether any of my systems has gone more than 24 hours without a successful backup. I cannot use the Log Analytics alerts feature for this because I have multiple jobs that fail over and over, yet end up backing up successfully after 4 or 5 failures, so I am trying not to spam our poor Offshore team with 500 alerts a day.

Is a query like this reasonable for Log Analytics?

1 Reply

Hi @amk_19238,

It depends on how you monitor your backups.

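If you're sending Recovery Services vault logs to Log Analytics, you can analyze them with the queries shared in our GitHub repo (https://github.com/microsoft/AzureMonitorCommunity/tree/b44477a3e21f988accffdb21635e5f53cc670ec6/Azure%20Services/Recovery%20Services%20vaults/Queries).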

Another option is to review the AzureDiagnostics table. This query will return all failed jobs:

// Failed backup jobs
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.RECOVERYSERVICES" and Category == "AzureBackupReport"
| where OperationName == "Job" and JobOperation_s == "Backup" and JobStatus_s == "Failed"
| project TimeGenerated, JobUniqueId_g, JobStartDateTime_s, JobOperation_s, JobOperationSubType_s, JobStatus_s, JobFailureCode_s, JobDurationInSecs_s, AdHocOrScheduledJob_s
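
Because a job that is retried several times produces one row per failure, it may be easier to read the failures grouped per resource. The following is a minimal sketch using the same fields as the query above; the 24-hour window and the column names FailedJobs, LastFailure and FailureCodes are assumptions chosen for illustration.

```kusto
// Sketch: one row per resource instead of one row per failed job
AzureDiagnostics
| where TimeGenerated > ago(24h)   // assumed reporting window
| where ResourceProvider == "MICROSOFT.RECOVERYSERVICES" and Category == "AzureBackupReport"
| where OperationName == "Job" and JobOperation_s == "Backup" and JobStatus_s == "Failed"
// Collapse repeated failures: count them, keep the latest time and the distinct failure codes
| summarize FailedJobs = count(),
            LastFailure = max(TimeGenerated),
            FailureCodes = make_set(JobFailureCode_s)
    by _ResourceId
| order by FailedJobs desc
```

A resource that failed five times then shows up once with FailedJobs equal to 5, which is easier to scan than five separate rows.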

But since you don't want to spam your teams, this query lists resources that had a successful backup within the last 3 days, but not within the last 24 hours:

// Latest successful backup job per resource over the last 3 days
let LastSuccessfulBackup =
AzureDiagnostics
| where TimeGenerated > ago(3d)
| where ResourceProvider == "MICROSOFT.RECOVERYSERVICES" and Category == "AzureBackupReport"
| where OperationName == "Job" and JobOperation_s == "Backup" and JobStatus_s == "Completed"
| summarize arg_max(TimeGenerated, *) by _ResourceId
| project TimeGenerated, _ResourceId, JobUniqueId_g, JobStartDateTime_s, JobOperation_s, JobOperationSubType_s, JobStatus_s, JobFailureCode_s, JobDurationInSecs_s, AdHocOrScheduledJob_s;
LastSuccessfulBackup
// Keep only resources whose latest success is more than 24 hours old.
// Resources with no success at all in the 3-day window do not appear here.
| where TimeGenerated < ago(24h)
| summarize by _ResourceId
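
For a dashboard tile, it may also help to show how stale each resource's last successful backup is, not just which resources are overdue. The following is a minimal sketch that assumes the same AzureDiagnostics fields as the queries above; Threshold, Lookback, LastSuccess and HoursSinceLastSuccess are illustrative names, and the 24h and 3d values are assumptions you would adjust to your backup schedule.

```kusto
// Sketch: hours since the last successful backup, for resources that are overdue
let Threshold = 24h;  // assumed: how old the last success may be before it is flagged
let Lookback = 3d;    // assumed: how far back to search for a successful job
AzureDiagnostics
| where TimeGenerated > ago(Lookback)
| where ResourceProvider == "MICROSOFT.RECOVERYSERVICES" and Category == "AzureBackupReport"
| where OperationName == "Job" and JobOperation_s == "Backup" and JobStatus_s == "Completed"
// Time of the most recent successful job per resource
| summarize LastSuccess = max(TimeGenerated) by _ResourceId
// Flag resources whose most recent success is older than the threshold
| where LastSuccess < ago(Threshold)
| extend HoursSinceLastSuccess = round((now() - LastSuccess) / 1h, 1)
| order by HoursSinceLastSuccess desc
```

Since only successful jobs are considered, a job that fails four or five times and then completes never appears in the result. As with any lookback-based query, a resource with no successful job inside the Lookback window is missing from the output, so a longer Lookback may be worth the extra scan cost.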