Forum Discussion
Alerting on Heartbeat issue
Surely there is a better solution for this? My use case doesn't work:
1. Create a computer group
2. Alert when an agent in computer group has not "heartbeated" for over 24 hours.
By the logic in Alerts, even if I set the query as I do below, the time span that I define is ignored because of the "Period" in Alerts:
Heartbeat
| project TimeGenerated, Computer
| where TimeGenerated < now()
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(24h)
This query--when run outside of Alerts--returns several machines that have no heartbeat in the last 24 hours, going back to as long as we've been collecting the data. But because Alerts confines me to a max 24 hour period to check against, I get 0 results.
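The zero-result behavior described above can be reproduced outside Azure. The following is a minimal Python sketch, not the Alerts engine itself: it assumes the alert "Period" simply drops every row older than the window before the query logic runs. The function name, machine names, and timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def stale_computers(heartbeats, now, lookback, threshold=timedelta(hours=24)):
    """Mimic the query: latest heartbeat per computer, keep those older than threshold.

    `lookback` plays the role of the alert rule's Period (an assumption):
    rows older than now - lookback are dropped before the query logic runs.
    """
    window_start = now - lookback
    last_seen = {}
    for computer, ts in heartbeats:
        if ts < window_start or ts >= now:
            continue  # outside the window the query is allowed to see
        if computer not in last_seen or ts > last_seen[computer]:
            last_seen[computer] = ts
    cutoff = now - threshold
    return sorted(c for c, ts in last_seen.items() if ts < cutoff)

now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)  # made-up "current" time
heartbeats = [
    ("vm-healthy", now - timedelta(minutes=5)),
    ("vm-silent", now - timedelta(days=3)),  # last heartbeat three days ago
]

print(stale_computers(heartbeats, now, lookback=timedelta(days=30)))   # ['vm-silent']
print(stale_computers(heartbeats, now, lookback=timedelta(hours=24)))  # []
```

With a 24-hour window, every row that survives is by definition newer than ago(24h), so the final filter can never match and the silent machine drops out of the result entirely.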
I essentially want an alert generated every 24 hours as a "nag alert" with a list of the machines that have not sent heartbeat data in over 24 hours.
Is there any way to get around this extremely limiting design?
- CliveWatson (Microsoft)
Alerts are designed to look back over 24 hours, as you state.
For a report of this nature, I'd suggest a Logic App, something like this mock-up. It fires at a pre-set time (recurrence), runs your query, then sends an email (you could send to Teams, Slack, ServiceNow, etc. instead or in parallel).
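A Logic App is assembled in the designer rather than written as code, so the following Python sketch only mirrors the three steps mentioned above (recurrence fires, query runs, notification goes out). `run_query` and `send` are hypothetical stand-ins for the query step and the email/Teams step, and the message layout is an assumption.

```python
from datetime import datetime, timedelta, timezone

def format_nag_message(stale, now):
    """Build the body of the daily reminder from (computer, last_heartbeat) rows."""
    if not stale:
        return "All computers in the group have sent a heartbeat in the last 24 hours."
    lines = ["Computers with no heartbeat in over 24 hours:"]
    for computer, last in sorted(stale):
        hours = int((now - last).total_seconds() // 3600)
        lines.append(f"- {computer}: last heartbeat {last:%Y-%m-%d %H:%M} UTC ({hours}h ago)")
    return "\n".join(lines)

def run_scheduled_report(run_query, send, now):
    """One firing of the recurrence: run the query, then hand the result to a notifier."""
    message = format_nag_message(run_query(), now)
    send(message)
    return message

# Usage with fake steps: the "query" returns one made-up row, "send" appends to a list.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
sent = []
run_scheduled_report(lambda: [("vm-silent", now - timedelta(days=3))], sent.append, now)
print(sent[0])
```

Because the query step runs on its own schedule rather than inside an alert rule, it can look back as far as the data goes.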
I also like the example availability rate query:
// Availability rate
// Calculate the availability rate of each connected computer
Heartbeat
// bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
| summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
| extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
| summarize totalAvailableHours = countif(availablePerHour == true) by Computer
| extend availabilityRate = totalAvailableHours*100.0/24
| project-away totalAvailableHours
| render barchart
Note: I added the last two lines, as I prefer how it looks as a chart
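To make the arithmetic in the availability rate query concrete, here is a rough Python equivalent. It assumes the input is already limited to the last 24 hours (in the portal, the time range picker would do that); the function name and sample machines are invented for the example.

```python
from datetime import datetime, timedelta, timezone

def availability_rate(heartbeats, now, hours=24):
    """Percentage of the last `hours` one-hour bins in which each computer sent a heartbeat."""
    start = now - timedelta(hours=hours)  # bins are aligned to this instant, like bin_at(..., ago(24h))
    bins_seen = {}  # computer -> set of hour-bin indexes with at least one heartbeat
    for computer, ts in heartbeats:
        if not (start <= ts < now):
            continue  # assumption: only the last `hours` hours are considered
        bin_index = (ts - start) // timedelta(hours=1)
        bins_seen.setdefault(computer, set()).add(bin_index)
    return {c: len(b) * 100.0 / hours for c, b in bins_seen.items()}

now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)  # made-up "current" time
heartbeats = (
    [("vm-a", now - timedelta(hours=h, minutes=30)) for h in range(24)]    # one per hour, all day
    + [("vm-b", now - timedelta(hours=h, minutes=30)) for h in range(12)]  # only the last 12 hours
    + [("vm-b", now - timedelta(minutes=45))]                              # second hit in the same hour
)

print(availability_rate(heartbeats, now))  # {'vm-a': 100.0, 'vm-b': 50.0}
```

As in the query, extra heartbeats within the same hour do not raise the rate, and a computer with no heartbeat at all in the window does not appear in the output.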
- ScottAllison (Iron Contributor)
Thanks Clive.
This is a pretty straightforward recommendation. It does, however, stray from our deliberate move to Azure Monitor for all alerting (taking advantage of Action Groups and automation). It would be nice to have the option to remove some of these "guardrails" for Alerts... or at the very least, have a viable explanation as to why the guardrail is necessary.
cc: Daniel Thilagan
- CliveWatson (Microsoft)
Hi ScottAllison
The best public explanation I've seen is:
https://feedback.azure.com/forums/267889-log-analytics/suggestions/32043751-alerting-timewindow-limitation-of-24-hours-makes-a which gives an explanation - you could add your 'vote' to this?
Thanks Clive