Jun 10 2019 - last edited on Apr 07 2022
We have around 500 Azure VMs in our subscription. What is the best way to monitor them for the alerts below?
Can we use a Log Analytics workspace to monitor them (with queries), or is there a way to use Azure Monitor with metrics? I don't see Guest metrics when creating new alert rules, even though I enabled them in Diagnostics settings and can see them in the Metrics tab.
Jun 10 2019 11:18 AM
For 1, 2 and 3, Metric alerts are on the fastest pipeline, so you will see them quickly, subject to being able to set Guest metric alerts in your case. However, in all three cases, being able to view and query the data in Log Analytics can also provide benefits. E.g. alerts just look at the past 24 hours, so you will miss patterns that occur beyond that; a log query can go back and look at whatever data you have retained.
| Counter | Time (UTC) | Value |
|---|---|---|
| % Processor Time | 2019-06-04T05:00:00Z | 60.91787148115488 |
| % Used Memory | 2019-06-04T05:00:00Z | 31.629809780248415 |
| % Free Space | 2019-06-04T05:00:00Z | 82.89545989470517 |
| % Free Space | 2019-06-04T02:00:00Z | 82.91151748380445 |
| % Processor Time | 2019-06-04T02:00:00Z | 59.627356628337175 |
| % Used Memory | 2019-06-04T02:00:00Z | 31.471773493138787 |
| % Processor Time | 2019-06-04T03:00:00Z | 59.52993177313613 |
| % Free Space | 2019-06-04T03:00:00Z | 82.8896254469986 |
| % Used Memory | 2019-06-04T03:00:00Z | 31.54788573413215 |
| % Processor Time | 2019-06-04T04:00:00Z | 59.35017016026048 |
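Rows like the ones above (one value per counter per hour) are what an hourly-average query over the `Perf` table would typically return. The following is only a sketch under assumptions: the 7-day lookback, the constant name `HOURLY_AVG_QUERY` and the helper `hourly_averages` are made up for illustration, and the query relies on the standard `Perf` columns `CounterName`, `CounterValue` and `TimeGenerated`. The Python helper mirrors the `summarize ... by bin(TimeGenerated, 1h)` step locally, so the logic can be checked without a workspace.

```python
from collections import defaultdict
from datetime import datetime, timezone

# KQL sketch (assumption: standard Perf table schema; 7d lookback is arbitrary).
HOURLY_AVG_QUERY = """
Perf
| where TimeGenerated > ago(7d)
| where CounterName in ("% Processor Time", "% Used Memory", "% Free Space")
| summarize AvgValue = avg(CounterValue) by CounterName, bin(TimeGenerated, 1h)
"""


def hourly_averages(samples):
    """Average (counter, timestamp, value) samples per counter and hour.

    Local stand-in for the summarize step in the query above.
    """
    buckets = defaultdict(list)
    for counter, ts, value in samples:
        # Truncate the timestamp to the start of its hour, like bin(..., 1h).
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[(counter, hour)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


if __name__ == "__main__":
    demo = [
        ("% Processor Time", datetime(2019, 6, 4, 5, 10, tzinfo=timezone.utc), 60.0),
        ("% Processor Time", datetime(2019, 6, 4, 5, 40, tzinfo=timezone.utc), 62.0),
    ]
    for (counter, hour), avg in hourly_averages(demo).items():
        print(counter, hour.isoformat(), avg)
```

Because the query reads retained log data, widening `ago(7d)` lets you look for patterns well beyond the window an alert evaluates.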
Neither method does #4 well. There are some data points in the logs, such as Heartbeat, but it's not a reliable up/down indicator on its own, e.g. the Heartbeat can fail while the server is still up. You can check certain EventIDs, but if you get a crash and the server never comes back up, you might not see what the last EventID was.
Queries that can show Heartbeat are:
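A minimal sketch of one such check, assuming the standard `Heartbeat` table columns `TimeGenerated` and `Computer`: take the latest heartbeat per computer and flag the ones that have been quiet for longer than a threshold. The 5-minute threshold and the names `STALE_HEARTBEAT_QUERY` and `stale_computers` are illustrative choices, not part of any Azure API. The Python function applies the same rule to local data so it can be tried without a workspace.

```python
from datetime import datetime, timedelta, timezone

# KQL sketch (assumption: standard Heartbeat table; 5m threshold is arbitrary).
STALE_HEARTBEAT_QUERY = """
Heartbeat
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)
"""


def stale_computers(heartbeats, now, threshold=timedelta(minutes=5)):
    """Return computers whose most recent heartbeat is older than threshold.

    heartbeats is an iterable of (computer, timestamp) pairs.
    """
    last_seen = {}
    for computer, ts in heartbeats:
        # Keep only the newest heartbeat per computer, like max(TimeGenerated).
        if computer not in last_seen or ts > last_seen[computer]:
            last_seen[computer] = ts
    return sorted(c for c, ts in last_seen.items() if now - ts > threshold)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        ("vm-a", now - timedelta(minutes=1)),   # hypothetical machine names
        ("vm-b", now - timedelta(minutes=20)),
    ]
    print(stale_computers(sample, now))
```

As noted above, a missing heartbeat only says the agent stopped reporting; the VM itself may still be up, so treat the result as a hint rather than a definitive up/down signal.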
More examples are shown in the Logs portal when you open a new query tab.