Azure Observability Blog

How to leverage Azure Monitor to meet functional and non-functional requirements - No.2 Compute

Microsoft

Aug 22, 2022

This article is part of a series on Azure Monitor. Please read "How to leverage Azure Monitor to meet functional and non-functional requirements - No.1 overview" before reading this post. This post takes a close look at the Compute category, which is highlighted in blue among the monitoring categories below.

Article No.	Monitoring category	Monitoring target	Note
2	Compute	Reboot	monitor reboot frequency
		CPU	monitor CPU usage
		Memory	monitor memory usage
3	Compute/Inside OS	Log file	monitor event log and syslog
		Process	monitor available process
4	Storage/Disk	Disk	monitor disk usage
		folder/file	monitor folder usage and file size
5	Endpoint/IPv4 address	response/service	monitor specific address and port
	Web site	Scenario	monitor web scenario
6	Network	Connectivity	monitor vNIC and VNET peering
		Firewall	monitor Azure Firewall rule usage
7	Backup	Backup	monitor backup status
	Azure Resources	Resource health	monitor resource availability

The Compute category has the following three monitoring targets.

Reboot
CPU
Memory

Each target can be monitored in more than one way. Let's look at each of them in turn.

1. Reboot monitoring

We will try the following three options for reboot monitoring.

Azure Monitor for VMs
Activity Log
Resource Health

1.1 Azure Monitor for VMs for reboot monitoring

Once Azure Monitor is configured for your VM, logs usually become available in the Log Analytics workspace within 10 to 30 minutes. Note that it might take 8 to 10 hours right after the Log Analytics workspace is first set up.

Run the following Kusto query to retrieve the records from the last 5 minutes in the "InsightsMetrics" table where "Name" is "Heartbeat" and "Computer" is "CentOSVM01".

InsightsMetrics
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(5m) 
| order by TimeGenerated

The results show that the agent sends a heartbeat about once a minute. Each record indicates that the OS on the VM is running, so we need an alert that sends a notification when the heartbeats stop arriving. The minimum granularity for alerts is 1 minute. Select an existing action group or create a new one, and configure it to send notifications to your email address. You can check the status of this alert rule on the Log Analytics page.
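
As a minimal sketch of the alert condition described above, the query below returns a row only when the most recent heartbeat from the VM is older than 5 minutes. It assumes that the alert rule fires when the number of results is greater than 0. The 30-minute lookback is an arbitrary choice, and the column name LastHeartbeat is made up for this example.

```kusto
// Sketch only: return the VM if its latest heartbeat is older than 5 minutes.
InsightsMetrics
| where TimeGenerated > ago(30m)        // lookback window (assumed value)
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| where LastHeartbeat < ago(5m)         // no heartbeat for 5 minutes or more
```

Note that a VM that has been silent for longer than the lookback window returns no rows at all, so this form only catches a VM that stopped recently.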

The email was delivered about 9 minutes after the VM was shut down. Although we set the granularity to 5 minutes, the total time until the notification arrived was about 9 minutes.

The email uses a default template, but you can send a customized email by using PaaS services such as Logic Apps or Azure Automation. While the VM stays down, the email keeps being delivered every 5 minutes, so you can disable the rule in the portal as follows. Don't forget to enable the rule again after restarting your VM.

If you haven't specified a time range in the Kusto query, you can set one in the portal as follows. You can visualize the data as a chart by selecting the "Chart" tab, and you can download the logs as CSV or Excel files.
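
If you prefer to set the time range and the chart in the query itself, a sketch along the following lines may help. The 1-hour range, the 5-minute bin, and the column name Heartbeats are illustrative choices, not values taken from the setup above.

```kusto
// Sketch only: count heartbeats per 5-minute bin over the last hour and chart them.
InsightsMetrics
| where TimeGenerated > ago(1h)         // explicit time range in the query
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| summarize Heartbeats = count() by bin(TimeGenerated, 5m)
| render timechart
```

With about one heartbeat per minute, each 5-minute bin should hold roughly five records while the VM is running, so a gap in the chart points to a period when the OS was not reporting.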

An action group can also trigger other services through its actions. This means that you can invoke Azure Functions or Logic Apps when an alert is detected, and so extend your operations with your own apps on these PaaS services.

https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/action-groups-logic-app
https://docs.microsoft.com/en-us/azure/app-service/tutorial-send-email?tabs=dotnet

In short, reboot monitoring can send a notification and run a primary action from the same trigger, rather than only sending a template email. You can also send a customized email by using Logic Apps.

1.2 Activity Log for reboot monitoring

On the "Monitor | Activity Log" page, configure the ServiceHealth and ResourceHealth categories to be sent to the Log Analytics workspace. Refer to the following screenshot for the settings. Azure Blob Storage might be enough if you only need to store the logs, but storing them in a Log Analytics workspace also lets you query and analyze them.

A reboot log entry is generated only when a user runs a reboot operation from the Azure portal, so this log can't be used to detect reboots caused by OS issues, OS updates, or reboot commands run inside the OS.

AzureActivity
| where CategoryValue == "ResourceHealth" or CategoryValue == "ServiceHealth"
| where Properties contains "Rebooted"
| where TimeGenerated between(datetime("2022-07-03 00:00:00") .. datetime("2022-08-11 17:00:00"))
| order by TimeGenerated desc
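
Portal-initiated restarts are also recorded as administrative operations, so the following sketch looks for them directly. It assumes that the Administrative category is sent to the workspace as well, which is an addition to the ServiceHealth and ResourceHealth settings above, and the 7-day range is an arbitrary choice.

```kusto
// Sketch only: list restart operations that users requested through Azure.
AzureActivity
| where TimeGenerated > ago(7d)         // assumed time range
| where CategoryValue == "Administrative"
| where OperationNameValue =~ "Microsoft.Compute/virtualMachines/restart/action"
| project TimeGenerated, Caller, ActivityStatusValue, _ResourceId
| order by TimeGenerated desc
```

The Caller column shows who requested the restart, which can help tell planned operations apart from unexpected ones.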

1.3 Resource Health

Resource Health, which you open from Service Health in the portal, is useful for checking the status of Azure resources. You can add an alert rule there.

These alert rules can send an email when the Azure platform recognizes the resource as unhealthy, for example because of a reboot or shutdown.

Finally, here are the check results for reboot monitoring.

Type	Category	Goal and outcome	Result
1	monitoring	Azure Monitor can satisfy functional requirements	OK
2		Azure Monitor can set up a short detection granularity	1 min
3		Azure Monitor can set up detection thresholds	OK
4		Azure Monitor can set up detection retries	OK
5		Azure Monitor can suspend and resume threshold checking	OK
6		Azure Monitor can send an email with detection results	OK
7	statistics	Azure Monitor can retrieve workspace logs for a specific duration	OK
8		Azure Monitor can visualize statistical data	OK
9	automation	Azure Monitor can run a primary action based on alert rules	OK
10		Azure Monitor can send validation results	OK

2. CPU monitoring

Here is an option to monitor CPU usage.

"Percentage CPU" VM metric

2.1 "Percentage CPU" VM metric

Choose the "Percentage CPU" metric from your VM's menu. Select the average ("Avg") or maximum ("Max") aggregation, and on "New alert rule" configure a notification to be sent when the average or maximum CPU usage is xx% or higher.

You can reuse the action group that you created for reboot monitoring. If you use dynamic thresholds, Azure Monitor fires alerts based on thresholds that it tailors to the metric's past behavior.

https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-dynamic-thresholds

Choose a static threshold if you have to comply with company or system policies, for example alerting when CPU usage is 80% or higher.
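
If you would rather apply the same static threshold in a log query, the following sketch shows one possible form. It assumes that Azure Monitor for VMs is collecting processor data into the "InsightsMetrics" table as set up in section 1.1. The 1-hour range, the 5-minute bin, and the column names AvgCpu and MaxCpu are illustrative.

```kusto
// Sketch only: return the 5-minute bins in which average CPU usage was 80% or higher.
InsightsMetrics
| where TimeGenerated > ago(1h)         // assumed time range
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| where Computer == "CentOSVM01"
| summarize AvgCpu = avg(Val), MaxCpu = max(Val) by bin(TimeGenerated, 5m)
| where AvgCpu >= 80                    // static threshold from the example policy
```

Log data arrives later than platform metrics, so the "Percentage CPU" metric alert is likely to remain the faster way to detect high CPU usage.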

It took about 4 minutes to receive an email after CPU usage on the VM became high. Here is an example of putting a heavy CPU load on the VM.

You can disable your rules in the Azure portal.

You can set any time range for the graphs by choosing "Custom" for "Time range" on the virtual machine's metrics page.

Finally, here are the check results for CPU monitoring.

Type	Category	Goal and outcome	Result
1	monitoring	Azure Monitor can satisfy functional requirements	OK
2		Azure Monitor can set up a short detection granularity	1 min
3		Azure Monitor can set up detection thresholds	OK
4		Azure Monitor can set up detection retries	OK
5		Azure Monitor can suspend and resume threshold checking	OK
6		Azure Monitor can send an email with detection results	OK
7	statistics	Azure Monitor can retrieve workspace logs for a specific duration	OK
8		Azure Monitor can visualize statistical data	OK
9	automation	Azure Monitor can run a primary action based on alert rules	OK
10		Azure Monitor can send validation results	OK

3. Memory usage monitoring

Here are some options to monitor memory usage.

Performance counter on Log Analytics agent
(Preview) VM metrics

3.1 Performance counter on Log Analytics agent

Configure the performance counters on "Agents configuration" in Log Analytics. Then find the entries for memory usage by typing "memory" in the search box.
After a while, the LogManagement tables are populated based on this configuration. "% Available Memory" is the percentage of memory that is available, and "Used Memory MBytes" is the amount of memory in use, in MB.

Here is an example query that finds records where the VM has less than 20% available memory. A healthy VM, with 20% or more available memory, returns no records.

Perf
| where Computer == "CentOSVM01"
| where CounterName == "% Available Memory"
| where CounterValue < 20
| order by TimeGenerated desc

Be careful about which value you enter as the alert threshold. In the alert rule settings, the threshold is the number of rows returned by the query, so set the condition to fire when one or more rows are returned. The xx% memory value belongs in the query, not in the threshold.
You can disable your rules in the Azure portal and set any time range for the graphs by choosing "Custom" for "Time range" on the virtual machine's metrics page, as in the CPU monitoring scenario.
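
To make the row-count threshold concrete, here is a variation of the memory query that averages the counter over 5-minute bins, which may reduce alerts caused by a single low sample. The 1-hour range, the bin size, and the column name AvgAvailablePct are assumptions for this sketch.

```kusto
// Sketch only: return a row for each 5-minute bin whose average available memory is below 20%.
Perf
| where TimeGenerated > ago(1h)         // assumed time range
| where Computer == "CentOSVM01"
| where CounterName == "% Available Memory"
| summarize AvgAvailablePct = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AvgAvailablePct < 20            // the percentage lives in the query
```

The alert rule threshold stays at "number of results greater than 0". Only the 20 in the last line changes when you want a different memory limit.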

3.2 (Preview) VM metrics

Choose the "Available Memory Bytes (Preview)" metric from your VM's menu. The settings are almost the same as for CPU usage.

Finally, here are the check results for memory monitoring.
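
Because this metric is reported in bytes rather than as a percentage, a percentage-based policy has to be converted before you enter the threshold. The following sketch does the arithmetic for a hypothetical VM with 8 GiB of memory and a 20% limit. Both numbers are assumptions, so replace them with your own.

```kusto
// Sketch only: convert "20% available memory" into a byte threshold for an 8 GiB VM.
print TotalMemoryBytes = 8.0 * 1024 * 1024 * 1024   // assumed VM memory size
| extend ThresholdBytes = TotalMemoryBytes * 0.20   // assumed policy: alert below 20% available
```

For these values the threshold is about 1.72 billion bytes (1.6 GiB), and the alert condition would be "less than" that value.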

Type	Category	Goal and outcome	Result
1	monitoring	Azure Monitor can satisfy functional requirements	OK
2		Azure Monitor can set up a short detection granularity	1 min
3		Azure Monitor can set up detection thresholds	OK
4		Azure Monitor can set up detection retries	OK
5		Azure Monitor can suspend and resume threshold checking	OK
6		Azure Monitor can send an email with detection results	OK
7	statistics	Azure Monitor can retrieve workspace logs for a specific duration	OK
8		Azure Monitor can visualize statistical data	OK
9	automation	Azure Monitor can run a primary action based on alert rules	OK
10		Azure Monitor can send validation results	OK