Alerting on Heartbeat issue

Scott Allison · Apr 15 2019

Surely there is a better solution for this? My use case doesn't work:

1. Create a computer group

2. Alert when an agent in the computer group has not "heartbeated" for over 24 hours.

By the logic in Alerts, even if I set the query as I do below, the time span that I define is ignored because of the "Period" in Alerts:

Heartbeat
| project TimeGenerated, Computer
| where TimeGenerated < now()
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(24h)

This query--when run outside of Alerts--returns several machines that have no heartbeat in the last 24 hours, going back to as long as we've been collecting the data. But because Alerts confines me to a max 24 hour period to check against, I get 0 results.
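To illustrate why I get nothing back, here is a sketch of what the alert effectively evaluates. It assumes the "Period" behaves like an implicit time filter applied ahead of my query; that extra `where` line is my own illustration of the behavior, not something the alert shows you.

```kusto
Heartbeat
// Assumption: the alert Period acts like this implicit filter on the source data
| where TimeGenerated > ago(24h)
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
// Every remaining row is newer than ago(24h), so this condition can never hold
| where ["Last Heartbeat"] < ago(24h)
```

A machine that went silent two days ago has no rows left in the window at all, so it drops out of the results instead of showing up as stale.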

I essentially want an alert generated every 24 hours as a "nag alert" with a list of the machines that have not sent heartbeat data in over 24 hours.

Is there any way to get around this extremely limiting design?

CliveWatson · Apr 15 2019

@Scott Allison

Alerts are designed to look back 24 hours at most, as you state.

For a report of this nature, I'd suggest a Logic App. It fires at a pre-set time (recurrence), then runs your query, then sends an email (you could send to Teams/Slack/ServiceNow etc. instead or in parallel).
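As a rough sketch of the query step, the Logic App could run something like the following. The 30-day lookback and the `LastHeartbeat` and `SilentFor` column names are my assumptions for illustration (pick a lookback your retention allows), and COMPUTERGROUP is the same placeholder as in your query.

```kusto
Heartbeat
// Assumed lookback: wide enough to catch machines that went silent days ago
| where TimeGenerated > ago(30d)
| where Computer in (COMPUTERGROUP)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
// Keep only machines with no heartbeat in the last 24 hours
| where LastHeartbeat < ago(24h)
// Time elapsed since the last heartbeat, for the email body
| extend SilentFor = now() - LastHeartbeat
| order by LastHeartbeat asc
```

Because the lookback is set in the query itself, the result should list every machine in the group that has been silent for more than 24 hours, oldest first, which is the "nag" list you described.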

I also like the example availability rate query:

// Availability rate
// Calculate the availability rate of each connected computer
Heartbeat
// bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
| summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
| extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
| summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
| extend availabilityRate = totalAvailableHours*100.0/24
| project-away totalAvailableHours 
| render barchart

Note: I added the last two lines, as I prefer how it looks as a chart

Scott Allison · Apr 16 2019

Thanks Clive.

This is a pretty straightforward recommendation. It does, however, stray from our deliberate move to Azure Monitor for all alerting (taking advantage of Action Groups and automation). It would be nice to have the option to remove some of these "guardrails" for Alerts... or at the very least, have a viable explanation as to why the guardrail is necessary.

cc: @Daniel Thilagan

CliveWatson · Apr 16 2019

Hi @Scott Allison

The best public explanation I've seen is:

https://feedback.azure.com/forums/267889-log-analytics/suggestions/32043751-alerting-timewindow-limi... (you could add your 'vote' to this).

Thanks, Clive

Scott Allison · Apr 17 2019

Voted and commented.
