Forum Discussion

Iron Contributor

Apr 15, 2019

Solved

Alerting on Heartbeat issue

Surely there is a better solution for this? My use case doesn't work: 1. Create a computer group 2. Alert when an agent in computer group has not "heartbeated" for over 24 hours. By the log...

azure monitor

CliveWatson

Apr 15, 2019

ScottAllison

Alerts are designed to look back at most 24 hours, as you state.

For a report of this nature, I'd suggest a Logic App, something like this mock-up: it fires at a pre-set time (recurrence), runs your query, then sends an email (you could send to Teams/Slack/ServiceNow etc. instead or in parallel).
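
As a rough illustration of the kind of query the Logic App could run for the original use case (agents that have not sent a heartbeat for over 24 hours), here is a minimal sketch. The 7-day lookback, the column names LastHeartbeat and HoursSilent, and the group alias MyComputerGroup are assumptions for illustration only and do not come from this thread.

```kusto
// Sketch: list computers whose most recent heartbeat is more than 24 hours old.
// Assumption: a 7-day lookback, so agents that went quiet days ago are still found.
Heartbeat
| where TimeGenerated > ago(7d)
// Hypothetical: restrict to a saved computer group by its function alias.
// | where Computer in (MyComputerGroup)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(24h)
// datetime_diff returns (first datetime - second datetime) in the given unit.
| extend HoursSilent = datetime_diff('hour', now(), LastHeartbeat)
| project Computer, LastHeartbeat, HoursSilent
| order by LastHeartbeat asc
```

The lookback has to be longer than 24 hours for this to return anything: a machine that has been silent for more than a day has no Heartbeat rows inside a 24-hour window, which is probably why the use case does not work as a log alert.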

I also like the example availability rate query:

// Availability rate
// Calculate the availability rate of each connected computer
Heartbeat
// bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
| summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
| extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
| summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
| extend availabilityRate = totalAvailableHours*100.0/24
| project-away totalAvailableHours 
| render barchart

Note: I added the last two lines (project-away and render barchart), as I prefer how it looks as a chart.
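
To make the arithmetic in the availabilityRate step concrete, here is a small self-contained check that uses made-up rows instead of the Heartbeat table. The computer names and hour counts are invented for illustration.

```kusto
// Sketch: the same rate formula applied to hypothetical per-computer hour counts.
datatable(Computer:string, totalAvailableHours:long)
[
    "vm-a", 24,   // heartbeats seen in all 24 hourly bins
    "vm-b", 18,   // heartbeats seen in 18 of 24 bins
    "vm-c", 6     // heartbeats seen in 6 of 24 bins
]
| extend availabilityRate = totalAvailableHours*100.0/24
// availabilityRate: vm-a = 100, vm-b = 75, vm-c = 25
```

Two things to keep in mind, as far as I can tell: the example query has no explicit TimeGenerated filter, so dividing by 24 only makes sense when the query's time range is the last 24 hours, and a computer with no heartbeats at all in that range will not appear in the result rather than showing 0.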

ScottAllison
Iron Contributor
Apr 16, 2019
Thanks Clive.

This is a pretty straightforward recommendation. It does, however, stray from our deliberate move to Azure Monitor for all alerting (taking advantage of Action Groups and automation). It would be nice to have the option to remove some of these "guardrails" for Alerts... or at the very least, have a viable explanation as to why the guardrail is necessary.

cc: Daniel Thilagan
- CliveWatson
  Former Employee
  Apr 16, 2019
  Hi ScottAllison
  
  The best public explanation I've seen is:
  
  https://feedback.azure.com/forums/267889-log-analytics/suggestions/32043751-alerting-timewindow-limitation-of-24-hours-makes-a which gives an explanation - you could add your 'vote' to this?
  
  Thanks Clive
  - ScottAllison
    Iron Contributor
    Apr 17, 2019
    Voted and commented.