04-15-2019 09:02 AM
Surely there is a better solution for this? My use case doesn't work:
1. Create a computer group
2. Alert when an agent in computer group has not "heartbeated" for over 24 hours.
By the logic in Alerts, even if I set the query as I do below, the time span that I define is ignored because of the "Period" in Alerts:
Heartbeat
| project TimeGenerated, Computer
| where TimeGenerated < now()
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(24h)
This query--when run outside of Alerts--returns several machines that have no heartbeat in the last 24 hours, going back to as long as we've been collecting the data. But because Alerts confines me to a max 24 hour period to check against, I get 0 results.
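To make the effect concrete, here is a minimal Python sketch of the same logic (not Kusto, and not how Azure Monitor is implemented). The `stale_computers` helper, the machine names, and the timestamps are all made up for illustration; the `lookback` parameter is an assumption standing in for the alert "Period", applied before the rest of the query runs.

```python
from datetime import datetime, timedelta


def stale_computers(heartbeats, now, lookback=None, threshold=timedelta(hours=24)):
    """Return computers whose latest *visible* heartbeat is older than `threshold`.

    heartbeats: iterable of (computer, timestamp) pairs.
    lookback:   optional window; rows older than now - lookback are dropped
                first, mimicking the alert "Period" (an assumption of this sketch).
    """
    cutoff = now - threshold
    last_seen = {}
    for computer, ts in heartbeats:
        if lookback is not None and ts < now - lookback:
            continue  # outside the alert period: the query never sees this row
        if ts >= now:
            continue  # mirrors `where TimeGenerated < now()`
        # mirrors `summarize max(TimeGenerated) by Computer`
        if computer not in last_seen or ts > last_seen[computer]:
            last_seen[computer] = ts
    # mirrors `where ["Last Heartbeat"] < ago(24h)`
    return sorted(c for c, ts in last_seen.items() if ts < cutoff)


# Hypothetical sample data
now = datetime(2019, 4, 15, 9, 0)
heartbeats = [
    ("web01", now - timedelta(minutes=5)),  # healthy
    ("web01", now - timedelta(days=2)),
    ("db01", now - timedelta(days=3)),      # silent for 3 days
    ("app01", now - timedelta(hours=30)),   # silent for 30 hours
]

unrestricted = stale_computers(heartbeats, now)
alert_window = stale_computers(heartbeats, now, lookback=timedelta(hours=24))

print(unrestricted)  # ['app01', 'db01']
print(alert_window)  # []
```

With a 24-hour window, every row that survives the window is by definition newer than the 24-hour cutoff, so the final filter can never match anything.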
I essentially want an alert generated every 24 hours as a "nag alert" with a list of the machines that have not sent heartbeat data in over 24 hours.
Is there any way to get around this extremely limiting design?
04-15-2019 09:24 AM Solution
Alerts are designed to look back over at most 24 hours, as you state.
For a report of this nature, I'd suggest a Logic App. It fires at a pre-set time (recurrence), runs your query, then sends an email (you could send to Teams, Slack, ServiceNow, etc. instead or in parallel).
I also like the example availability rate query:
// Availability rate
// Calculate the availability rate of each connected computer
Heartbeat
// bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
| summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
| extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
| summarize totalAvailableHours = countif(availablePerHour == true) by Computer
| extend availabilityRate = totalAvailableHours*100.0/24
| project-away totalAvailableHours
| render barchart
Note: I added the last two lines, as I prefer how it looks as a chart
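For anyone who wants to see what the availability query computes, here is a rough Python sketch of the same arithmetic. The `availability_rate` helper and the sample data are hypothetical, and the sketch assumes the data is limited to the last 24 hours (in the portal that comes from the selected time range, not from the query itself).

```python
from datetime import datetime, timedelta


def availability_rate(heartbeats, now, hours=24):
    """Percentage of the last `hours` one-hour bins in which each computer
    sent at least one heartbeat.

    heartbeats: iterable of (computer, timestamp) pairs.
    """
    start = now - timedelta(hours=hours)
    bins_seen = {}
    for computer, ts in heartbeats:
        if not (start <= ts < now):
            continue  # assumed time range: only the last `hours` hours
        # Bins are aligned to `start`, like bin_at(TimeGenerated, 1h, ago(24h))
        bin_index = (ts - start) // timedelta(hours=1)
        bins_seen.setdefault(computer, set()).add(bin_index)
    # An hour counts once, however many heartbeats arrived in it
    return {c: len(b) * 100.0 / hours for c, b in bins_seen.items()}


# Hypothetical sample data
now = datetime(2019, 4, 15, 9, 0)
start = now - timedelta(hours=24)
heartbeats = []
for i in range(24):  # web01 reports in every hour
    heartbeats.append(("web01", start + timedelta(hours=i, minutes=10)))
for i in range(6):   # app01 reports twice an hour, but only for the first 6 hours
    heartbeats.append(("app01", start + timedelta(hours=i, minutes=10)))
    heartbeats.append(("app01", start + timedelta(hours=i, minutes=40)))
heartbeats.append(("db01", now - timedelta(days=3)))  # nothing in the window

rates = availability_rate(heartbeats, now)
print(rates)  # {'web01': 100.0, 'app01': 25.0}
```

Note that `db01` does not appear in the result at all. As far as I can tell the query behaves the same way, because it only summarizes rows that exist, so a machine that was silent for the whole window is missing from the chart rather than shown at 0%.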
04-16-2019 05:40 AM - edited 04-16-2019 05:41 AM
Thanks Clive.
This is a pretty straightforward recommendation. It does, however, stray from our deliberate move to Azure Monitor for all alerting (taking advantage of Action Groups and automation). It would be nice to have the option to remove some of these "guardrails" for Alerts... or at the very least, have a viable explanation as to why the guardrail is necessary.
cc: @Daniel Thilagan
04-16-2019 07:13 AM
The best public explanation I've seen is:
https://feedback.azure.com/forums/267889-log-analytics/suggestions/32043751-alerting-timewindow-limi... which gives an explanation - you could add your 'vote' to this?