Forum Discussion

ScottAllison
Iron Contributor
Apr 15, 2019

Alerting on Heartbeat issue

Surely there is a better solution for this? My use case doesn't work:

 

1. Create a computer group

2. Alert when an agent in computer group has not "heartbeated" for over 24 hours

 

By the logic in Alerts, even if I write the query as I do below, the time span that I define is ignored because the alert's "Period" setting overrides it:


Heartbeat
| project TimeGenerated, Computer
| where TimeGenerated < now()
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(24h)

This query, when run outside of Alerts, returns several machines whose last heartbeat is more than 24 hours old, looking back as far as we have been collecting the data. But because Alerts confines me to a maximum 24-hour period to check against, I get 0 results.

I essentially want an alert generated every 24 hours as a "nag alert" with a list of the machines that have not sent heartbeat data in over 24 hours. 

Is there any way to get around this extremely limiting design?

  • ScottAllison 

     

    As you state, Alerts are designed to look back over at most 24 hours.
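
    To see why your query comes back empty inside an alert, here is a rough sketch of what I believe is happening. It assumes the alert's Period behaves like an extra time filter applied before the rest of the query runs; that is my reading of the behavior you describe, not a confirmed implementation detail:

    ```kusto
    Heartbeat
    // Assumption: the 24-hour Period effectively adds this filter
    | where TimeGenerated > ago(24h)
    | summarize LastHeartbeat = max(TimeGenerated) by Computer
    // Every surviving row is newer than ago(24h), so this can never match
    | where LastHeartbeat < ago(24h)
    ```

    A machine that has been silent for more than 24 hours has no rows left after the first filter, so it never reaches the summarize step at all.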

     

    For a report of this nature, I'd suggest a Logic App, something like this mock-up. It fires at a pre-set time (recurrence), runs your query, then sends an email (you could send to Teams, Slack, ServiceNow, etc. instead or in parallel).
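
    For the query step in the Logic App, a minimal sketch might look like the following. The 30-day lookback and the names lookback, threshold, and HoursSilent are my own assumptions for illustration; COMPUTERGROUP is the placeholder from your post:

    ```kusto
    // Hypothetical values: adjust the lookback to fit your workspace retention
    let lookback = 30d;
    let threshold = 24h;
    Heartbeat
    // Bound the search explicitly instead of relying on a default time range
    | where TimeGenerated > ago(lookback)
    | where Computer in (COMPUTERGROUP)
    | summarize LastHeartbeat = max(TimeGenerated) by Computer
    // Keep only machines whose most recent heartbeat is older than the threshold
    | where LastHeartbeat < ago(threshold)
    // Whole hours since the last heartbeat, handy for the email body
    | extend HoursSilent = datetime_diff('hour', now(), LastHeartbeat)
    | order by LastHeartbeat asc
    ```

    One caveat with this approach: a machine that sent no heartbeat at all within the lookback window will not appear in the results, so the lookback should be at least as long as the longest outage you want to be nagged about.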

     

     

    I also like the example availability rate query:

    // Availability rate
    // Calculate the availability rate of each connected computer
    Heartbeat
    // bin_at is used to set the time grain to 1 hour, starting exactly 24 hours ago
    | summarize heartbeatPerHour = count() by bin_at(TimeGenerated, 1h, ago(24h)), Computer
    | extend availablePerHour = iff(heartbeatPerHour > 0, true, false)
    | summarize totalAvailableHours = countif(availablePerHour == true) by Computer 
    | extend availabilityRate = totalAvailableHours*100.0/24
    | project-away totalAvailableHours 
    | render barchart 

    Note: I added the last two lines, as I prefer how it looks as a chart  
