Alerts on suspended update management jobs


I'm using Azure Arc and Update Management. When a machine retries a certain update too many times, that update job gets "Suspended". This marks the update job as failed, but since there has been no update failure, "UpdateRunProgress" won't catch it.

I'm able to find the suspended jobs using the query below in Log Analytics:
"
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION" and ResourceType == "AUTOMATIONACCOUNTS"
| where ResultDescription has_any ("[AUM][IS][level=Error][message=Get-StatusFromException:","Status = FailedToStart. Exception: Job was suspended. ")
"

The issue is that the query above does not give me a computer/device name in cleartext, so an alert based on it would not tell me anything except "a job was suspended".

I've noticed that I can find the computer's name in cleartext by searching for the JobId_g that the query above returns, but I don't know enough about Kusto to put both steps in the same "search".
(If it were PowerShell, I'd save it in a variable and then search for that specific JobId_g to find the computer name.)

Query used to find the device name:
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION" and ResourceType == "AUTOMATIONACCOUNTS"
| where ResultType == "Started"
| distinct JobId_g, ResultType , ResultDescription, RunOn_s

Is what I'm trying to do possible in Log Analytics, or should I look at other means to catch this error?

1 Reply
So, it's not exactly what I wanted, but it does the job...
I found a solution using the "let" statement and the "join" operator, so I now get alerts for these issues.
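
The exact query was not posted in the thread, so the following is only a rough sketch of how "let" and "join" could combine the two queries from the question. It assumes that the suspended-job record and the "Started" record for the same job share the same JobId_g, and that RunOn_s holds the machine name, as described in the question. The name SuspendedJobs is made up for illustration.

```kusto
// Sketch only, not the query actually used in the reply.
// Step 1: collect the job IDs of suspended jobs (filter copied from the question).
// "SuspendedJobs" is a hypothetical name for this intermediate result.
let SuspendedJobs = AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.AUTOMATION" and ResourceType == "AUTOMATIONACCOUNTS"
    | where ResultDescription has_any ("[AUM][IS][level=Error][message=Get-StatusFromException:", "Status = FailedToStart. Exception: Job was suspended. ")
    | distinct JobId_g;
// Step 2: look up the "Started" records for those job IDs to get the machine name.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.AUTOMATION" and ResourceType == "AUTOMATIONACCOUNTS"
| where ResultType == "Started"
| join kind=inner (SuspendedJobs) on JobId_g
| distinct JobId_g, RunOn_s
```

Here "let" plays the role of the PowerShell variable mentioned in the question, and the inner join keeps only the "Started" rows whose JobId_g also appears among the suspended jobs, so an alert rule built on this query would list the affected machines.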