Forum Discussion
Heartbeat Azure Monitor OMS VMs
- Mar 22, 2018
Hi
Your query is correct. You should probably remove
| where TimeGenerated > ago(1d)
because when the query is used in an alert, the time span is defined in the alert itself. In the heartbeat alert you want the evaluation period to be longer than 15 minutes. Make it at least one hour, though 24 hours would probably be better, as that was the time span in your query. With that setting you should get an alert about 15 minutes after the VM goes down. Keep in mind that the VM has to be down for at least 15 minutes. If it goes down for only 5 minutes you will probably not be alerted, because heartbeat events start being sent again, so the condition that the last heartbeat was more than 15 minutes ago is never met.
Let me know if you have further questions.
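To make the timing concrete, here is a rough Python simulation of how the alert condition behaves for a short and a long outage. It is only a sketch: the one-minute heartbeat interval, the 15-minute evaluation frequency, and the function names are assumptions for illustration, not something taken from the alert engine itself.

```python
THRESHOLD = 15       # minutes, mirrors `ago(15m)` in the query
FREQUENCY = 15       # assumed minutes between alert evaluations
PERIOD = 1440        # minutes of data each evaluation looks at (24 hours)
HEARTBEAT_EVERY = 1  # assumed: one heartbeat per minute


def heartbeat_times(outage_start, outage_minutes, horizon):
    """Minutes at which a heartbeat arrives, skipping the outage."""
    outage = range(outage_start, outage_start + outage_minutes)
    return [t for t in range(0, horizon, HEARTBEAT_EVERY) if t not in outage]


def firing_evaluations(outage_minutes, outage_start=60, horizon=180):
    """Evaluation times (in minutes) at which the alert condition is met."""
    beats = heartbeat_times(outage_start, outage_minutes, horizon)
    fired = []
    for now in range(FREQUENCY, horizon + 1, FREQUENCY):
        # Only heartbeats inside the evaluation period are visible to the query.
        visible = [t for t in beats if now - PERIOD <= t <= now]
        # Same test as: max(TimeGenerated) < ago(15m)
        if visible and max(visible) < now - THRESHOLD:
            fired.append(now)
    return fired


print(firing_evaluations(5))   # []
print(firing_evaluations(40))  # [75, 90]
```

In this sketch the 5-minute outage never triggers, while the 40-minute outage triggers at the first evaluation that falls more than 15 minutes after the last heartbeat.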
- ScottAllison (Iron Contributor), Sep 13, 2018
Question... is the following configuration fine for alerting? Or should I increase the frequency?
Query:
Heartbeat
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(15m)
Based on: Number of results
Condition: Greater than
Threshold: 0
Evaluated based on:
Period: 1440 minutes
Frequency: 15 minutes
- Sep 13, 2018
Keep in mind that if, for example, a server goes down and is unavailable for an hour, you will receive roughly 4-5 alerts for the same server within that hour. This is because your period is 24 hours. My recommendation is to set Period and Frequency to the same value, for example 15 minutes. That way you will not receive so many alerts for the same thing.
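The repeated alerts can be estimated with a small Python sketch. It assumes a 24-hour period that always contains the last heartbeat before the outage, one heartbeat per minute while the server is up, and evaluations every 15 minutes; the function name and its parameters are made up for this illustration. The exact count depends on those assumptions, but the pattern is one alert per evaluation for as long as the server stays down.

```python
def repeated_alerts(outage_minutes, outage_start=60, frequency=15, threshold=15):
    """Count evaluations that report the same down server.

    Assumes heartbeats arrive every minute outside the outage and that the
    evaluation period is long enough to still contain the last heartbeat.
    """
    last_before = outage_start - 1               # last heartbeat before the outage
    back_online = outage_start + outage_minutes  # first heartbeat after recovery
    count = 0
    now = frequency
    while now < back_online:
        # During the outage the newest visible heartbeat is `last_before`.
        if now >= outage_start and last_before < now - threshold:
            count += 1
        now += frequency
    return count


for minutes in (10, 60, 90):
    print(minutes, repeated_alerts(minutes))
```

Under these assumptions a 10-minute outage produces no alert, and longer outages produce one more alert for every additional evaluation that runs while the server is still down.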
- ScottAllison (Iron Contributor), Apr 11, 2019
Surely there is a better solution for this? My use case doesn't work:
1. Create a computer group
2. Alert when an agent in computer group has not "heartbeated" for over 24 hours.
By the logic in Alerts, even if I set the query as I do below, the time span that I define is ignored because of the "Period" in Alerts:
Heartbeat
| project TimeGenerated, Computer
| where TimeGenerated < now()
| where Computer in (COMPUTERGROUP)
| summarize ["Last Heartbeat"]=max(TimeGenerated) by Computer
| where ["Last Heartbeat"] < ago(24h)
Is there any way to get around this extremely limiting design?