Forum Discussion

Brass Contributor

Mar 21, 2018

Solved

Heartbeat Azure Monitor OMS VMs

Hi all, I am trying to create an alert for heartbeats that fires if a VM hasn't sent a heartbeat in the last 15 minutes... Here is what I did, and unfortunately it didn't fire an alert. So I created a new alert rule in Az...

azure monitor

Query Language

Stanislav_Zhelyazkov
Mar 22, 2018
Hi

Your query is correct, but you should probably remove

| where TimeGenerated > ago(1d)

because when the query is used in an alert, the time span is defined by the alert itself. For a heartbeat alert, the evaluation period needs to be longer than 15 minutes. Make it at least one hour, though 24 hours is probably better, since that was the time span in your query. With that setting you should get an alert about 15 minutes after the VM goes down. Keep in mind that the VM has to be down for at least 15 minutes. If it goes down for only 5 minutes, you will probably not be alerted, because heartbeat events will start being sent again before the last heartbeat is ever 15 minutes old.

Let me know if you have further questions.
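
To illustrate why the evaluation period has to be longer than the 15-minute threshold, here is a minimal Python sketch (not Azure's implementation) of how the rule behaves for a single VM. It assumes one heartbeat per minute and that the alert period is the only time filter applied to the query; `alert_fires`, `beats`, and the timestamps are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta

def alert_fires(heartbeats, now, period, threshold=timedelta(minutes=15)):
    """Return True if the rule would return a row for this VM at time `now`.

    Models: summarize max(TimeGenerated), then keep the row only if that
    maximum is older than `threshold`. Only heartbeats inside the alert
    period [now - period, now] are visible to the query.
    """
    visible = [t for t in heartbeats if now - period <= t <= now]
    if not visible:
        # No rows in the window, so the VM does not appear in the results at all.
        return False
    return max(visible) < now - threshold

start = datetime(2018, 3, 22, 0, 0)  # arbitrary reference time

def beats(first_minute, last_minute):
    """One heartbeat per minute (assumed interval), inclusive range."""
    return [start + timedelta(minutes=m) for m in range(first_minute, last_minute + 1)]

now = start + timedelta(minutes=140)
down_20_min = beats(0, 120)                   # VM stopped 20 minutes ago
blip_5_min = beats(0, 120) + beats(126, 140)  # 5-minute gap, then recovered

print(alert_fires(down_20_min, now, timedelta(minutes=15)))    # False: window holds no heartbeats
print(alert_fires(down_20_min, now, timedelta(minutes=60)))    # True
print(alert_fires(down_20_min, now, timedelta(minutes=1440)))  # True
print(alert_fires(blip_5_min, now, timedelta(minutes=60)))     # False: heartbeats resumed
```

In this model, a period that is no longer than the threshold can never produce a result: any heartbeat that falls inside the window is by definition newer than the threshold, and a VM with no heartbeat in the window produces no row.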

ScottAllison
Iron Contributor
Sep 13, 2018
Question... is the following configuration fine for alerting? Or should I increase the frequency?

Query:
Heartbeat
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(15m)

Based on:
Number of results
Condition:
Greater than
Threshold:
0

Evaluated based on:
Period: 1440 minutes
Frequency: 15 minutes
- Stanislav_Zhelyazkov
  MVP
  Sep 13, 2018
  Keep in mind that if, for example, a server goes down and stays unavailable for an hour, you will receive roughly 4-5 alerts for the same server within that hour. This is because your period is 24 hours, so every evaluation still sees the stale last heartbeat. My recommendation is to set Period and Frequency to the same value, for example 15 minutes. That way you will not receive so many alerts for the same thing.
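
The repeated alerts described above can be sketched with a small Python simulation. This is a simplified model, not Azure's scheduler: it assumes a period long enough to cover the whole outage (such as the 1440 minutes in the configuration above), no ingestion delay, and a hypothetical `first_eval` offset for when the first evaluation happens to run. `count_alerts` is a made-up helper name.

```python
def count_alerts(outage_minutes, frequency=15, threshold=15, first_eval=10):
    """Count how many evaluations fire for one server during an outage.

    Minute 0 is the last heartbeat before the server goes down; heartbeats
    resume at `outage_minutes`. The rule is evaluated every `frequency`
    minutes, starting at `first_eval` (hypothetical offset).
    """
    fired = 0
    now = first_eval
    while now < outage_minutes + frequency:
        # Latest heartbeat visible at this evaluation: stale during the
        # outage, current once the server is back.
        last_heartbeat = now if now >= outage_minutes else 0
        if now - last_heartbeat > threshold:
            fired += 1
        now += frequency
    return fired

print(count_alerts(60))   # one-hour outage
print(count_alerts(120))  # two-hour outage
```

The exact count depends on how the evaluations line up with the outage and on ingestion delay, so treat the numbers as illustrating the pattern (one alert per evaluation while the last heartbeat stays stale) rather than as a prediction.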
  - ScottAllison
    Iron Contributor
    Apr 11, 2019
    Surely there is a better solution for this? My use case doesn't work:
    
    1. Create a computer group
    2. Alert when an agent in computer group has not "heartbeated" for over 24 hours.
    
    By the logic in Alerts, even if I set the query as I do below, the time span that I define is ignored because of the "Period" in Alerts:
    
    Heartbeat
    | project TimeGenerated, Computer
    | where TimeGenerated < now()
    | where Computer in (COMPUTERGROUP)
    | summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
    | where ["Last Heartbeat"] < ago(24h)
    
    Is there any way to get around this extremely limiting design?