How to leverage Azure Monitor to meet functional and non-functional requirements - No.2 Compute
Published Aug 22 2022 05:48 PM
Microsoft

This article is part of a series on Azure Monitor. Please read "How to leverage Azure Monitor to meet functional and non-functional requirements - No.1 overview" before this post. This post takes a close look at the Compute category (article No. 2 in the list of monitoring categories below).

| Article No | Monitoring category | Monitoring target | Note |
|---|---|---|---|
| 2 | Compute | Reboot | monitor reboot frequency |
|  |  | CPU | monitor CPU usage |
|  |  | Memory | monitor memory usage |
| 3 | Compute/Inside OS | Log file | monitor event log and syslog |
|  |  | Process | monitor available process |
| 4 | Storage/Disk | Disk | monitor disk usage |
|  |  | folder/file | monitor folder usage and file size |
| 5 | Endpoint/IPv4 address | response/service | monitor specific address and port |
|  | Web site | Scenario | monitor web scenario |
| 6 | Network | Connectivity | monitor vNIC and VNET peering |
|  |  | Firewall | monitor Azure Firewall rule usage |
| 7 | Backup | Backup | monitor backup status |
|  | Azure Resources | Resource health | monitor resource availability |

 

The Compute monitoring objective has three monitoring targets:

  1. Reboot
  2. CPU
  3. Memory

Each target can be monitored in more than one way. Let's look at them in turn.

1. Reboot monitoring

We will try the three options below for reboot monitoring.

  1. Azure Monitor for VMs
  2. Activity Log
  3. Resource Health

1.1 Azure Monitor for VMs for reboot monitoring

If you have configured Azure Monitor for your VM, logs usually become available in the Log Analytics workspace within 10 to 30 min. Note that it might take 8 to 10 hours right after you set up a Log Analytics workspace.

daisami_0-1660290426475.png

Run a Kusto query on the "InsightsMetrics" table to retrieve the records from the last 5 min whose "Name" is "Heartbeat" and whose "Computer" is "CentOSVM01".

daisami_1-1660290590639.png

 

InsightsMetrics
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(5m) 
| order by TimeGenerated

 

You can see that the agent sends a heartbeat about once per minute. Each record indicates that the OS on the VM is running. We need an alert that sends a notification when the heartbeats stop showing up. The minimum granularity for alerts is 1 min. Select an existing action group or create a new one, and configure it to send notifications to your email address. You can check the status of this alert rule on the Log Analytics page.
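One way to express the "heartbeats stopped" condition as a log alert query is sketched below. It assumes the same InsightsMetrics table and computer name as the query above. The 5 min window is chosen only for illustration, and the alert rule is assumed to fire when the query returns one or more rows.

```kusto
// Sketch: return one row when CentOSVM01 has sent no heartbeat in the last 5 minutes.
// The 5m window is an assumption; align it with the evaluation period of your alert rule.
InsightsMetrics
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(5m)
// summarize without a group key returns a single row, even when no records match
| summarize HeartbeatCount = count()
| where HeartbeatCount == 0
```

While the VM is healthy the query returns no rows, so the alert stays quiet. Once the heartbeats stop, it returns a single row.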

daisami_0-1660293271407.png

This email was delivered 9 min after the VM was shut down. We set the granularity to 5 min, but delivery took about 9 min in total.
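The article does not break down where the extra minutes go. One possible contributor is the time it takes for records to be ingested into the workspace. The sketch below estimates that latency for the heartbeat records, assuming the same table and computer name as above and a 1 hour look-back chosen for illustration.

```kusto
// Sketch: estimate how long heartbeat records take to become queryable.
// ingestion_time() returns the time at which a record was ingested into the workspace.
InsightsMetrics
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(1h)
// dividing a timespan by 1s yields the latency as a number of seconds
| extend LatencySeconds = (ingestion_time() - TimeGenerated) / 1s
| summarize AvgLatencySeconds = avg(LatencySeconds), MaxLatencySeconds = max(LatencySeconds)
```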

daisami_2-1660294912906.png

The email is based on a default template, but you can send a customized mail by using PaaS services such as Logic Apps or Azure Automation. The email kept being delivered every 5 min while the VM was stopped, so you can disable the rule on the portal as follows. Don't forget to enable the rule again after restarting your VM.

daisami_0-1660294281563.png

If you haven't specified a time range in the Kusto query, you can set it on the portal as follows. You can visualize the data as charts on the "Chart" tab, and you can download the logs as CSV or Excel files.
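The time range and the chart can also be set in the query itself. The sketch below assumes the same heartbeat records as above. The 1 hour range and 5 min bins are illustrative values, not settings taken from the original setup.

```kusto
// Sketch: count heartbeats per 5-minute bin over the last hour and draw them as a time chart.
InsightsMetrics
| where Name == "Heartbeat"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(1h)
| summarize Heartbeats = count() by bin(TimeGenerated, 5m)
| render timechart
```

A bin with a noticeably lower count than its neighbors suggests a period in which the agent was not reporting.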

daisami_1-1660294845158.png

An action in an action group can act as a trigger. This means that you can invoke Azure Functions or Logic Apps when an alert fires, and thus extend your operations with your own apps on those PaaS services.

daisami_0-1660295162314.png

https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/action-groups-logic-app
https://docs.microsoft.com/en-us/azure/app-service/tutorial-send-email?tabs=dotnet

With reboot monitoring, an alert can send a notification and trigger a primary action at the same time, rather than only sending a template mail. You can also send a customized mail by using Logic Apps.

 

1.2 Activity Log for reboot monitoring

On the "Monitor | Activity Log" page, configure the ServiceHealth and ResourceHealth categories to be sent to a Log Analytics workspace. Refer to the screenshot below for the setup. Azure Blob Storage might be enough if you only need to store the logs, but storing them in a Log Analytics workspace allows you to retrieve and analyze them.

daisami_0-1660296031200.png

A reboot log is generated only when a user runs a reboot operation from the Azure Portal, so this log can't be used to detect OS issues, OS updates that involve a reboot, or reboot operations performed inside the OS.

daisami_1-1660296817742.png

 

AzureActivity
| where CategoryValue == "ResourceHealth" or CategoryValue == "ServiceHealth"
| where Properties contains "Rebooted"
| where TimeGenerated between(datetime("2022-07-03 00:00:00") .. datetime("2022-08-11 17:00:00"))
| order by TimeGenerated desc
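
User-initiated restarts are also recorded as operations in the Activity Log. The sketch below lists them. It assumes that the Administrative category is sent to the workspace in addition to the two categories configured above, which is not part of the original setup, and the 30 day look-back is illustrative.

```kusto
// Sketch: list VM restart operations recorded in the Administrative category.
// Assumption: the Administrative category is also exported to the workspace.
AzureActivity
| where CategoryValue == "Administrative"
// =~ compares case-insensitively, because the casing of operation names can vary
| where OperationNameValue =~ "Microsoft.Compute/virtualMachines/restart/action"
| where TimeGenerated > ago(30d)
| project TimeGenerated, Caller, ResourceGroup, _ResourceId, ActivityStatusValue
| order by TimeGenerated desc
```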

 

1.3 Resource Health

Resource Health, which is found under Service Health, is useful to validate the status of Azure resources. You can add an alert rule on Service Health.

daisami_0-1660375864488.png

Alert rules on Service Health can send a mail when the Azure platform recognizes the resource as unhealthy, for example because of a reboot or a shutdown.

 

Finally, here is the check result of reboot monitoring.

| No. | Category | Goal and outcome | Result |
|---|---|---|---|
| 1 | monitoring | Azure Monitor can satisfy functional requirements | OK |
| 2 |  | Azure Monitor can setup short granularity for detections | 1 min |
| 3 |  | Azure Monitor can setup thresholds detections | OK |
| 4 |  | Azure Monitor can setup retry detections | OK |
| 5 |  | Azure Monitor can suspend and resume for checking threshold | OK |
| 6 |  | Azure Monitor can send a mail for detection results | OK |
| 7 | statistics | Azure Monitor can retrieve workspace logs with specific duration | OK |
| 8 |  | Azure Monitor can visualize statistic data | OK |
| 9 | automation | Azure Monitor can have primary action based on alert rules | OK |
| 10 |  | Azure Monitor can send validation results | OK |

 

2. CPU monitoring

Here is an option to monitor CPU usage.

  1. "Percentage CPU" VM metric

2.1 "Percentage CPU" VM metric

Choose the "Percentage CPU" metric on your VM menu. Choose "Avg" or "Max", and on "New alert rule" configure a notification for when the average or maximum CPU usage is xx% or higher.

daisami_0-1660376532276.png

You can reuse the action group that you created for reboot monitoring. If you use Dynamic Thresholds, Azure Monitor fires alerts based on thresholds tailored to the metric's own behavior.

daisami_1-1660377273517.png

https://docs.microsoft.com/en-us/azure/azure-monitor/alerts/alerts-dynamic-thresholds
Choose a static threshold if you have to align with company policies or specific system policies, for example "CPU usage is 80% or higher".
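
The same static threshold can also be checked against log data. The sketch below is a log-based alternative to the "Percentage CPU" platform metric, not the method used in this section. It assumes that the VM sends VM insights data to the InsightsMetrics table as in section 1.1, and the 80% value and 5 min bins are illustrative.

```kusto
// Sketch: 5-minute CPU averages from VM insights data, keeping only bins at or above 80%.
// Assumption: the "Processor" / "UtilizationPercentage" records are collected for this VM.
InsightsMetrics
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(1h)
| summarize AvgCpu = avg(Val), MaxCpu = max(Val) by bin(TimeGenerated, 5m)
| where AvgCpu >= 80
| order by TimeGenerated desc
```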

It takes about 4 min to receive a mail after CPU usage on your VM becomes high. Here is an example of putting a heavy CPU load on the VM.

daisami_0-1660377879206.png

You can disable your rules on the Azure Portal.

daisami_1-1660377922094.png

You can set any time range for the graphs by choosing "Custom" for "Time range" on the Virtual Machine metrics page.

daisami_2-1660378031540.png

 

Finally, here is the check result of CPU monitoring.

| No. | Category | Goal and outcome | Result |
|---|---|---|---|
| 1 | monitoring | Azure Monitor can satisfy functional requirements | OK |
| 2 |  | Azure Monitor can setup short granularity for detections | 1 min |
| 3 |  | Azure Monitor can setup thresholds detections | OK |
| 4 |  | Azure Monitor can setup retry detections | OK |
| 5 |  | Azure Monitor can suspend and resume for checking threshold | OK |
| 6 |  | Azure Monitor can send a mail for detection results | OK |
| 7 | statistics | Azure Monitor can retrieve workspace logs with specific duration | OK |
| 8 |  | Azure Monitor can visualize statistic data | OK |
| 9 | automation | Azure Monitor can have primary action based on alert rules | OK |
| 10 |  | Azure Monitor can send validation results | OK |

 

3. Memory usage monitoring

Here are some options to monitor memory usage.

  1. Performance counter on Log Analytics agent
  2. (Preview) VM metrics

3.1 Performance counter on Log Analytics agent

Configure performance counters on "Agents configuration" of the Log Analytics workspace. Then find the counters for memory usage by typing "memory" in the search box.
The LogManagement tables are populated based on this configuration after a while. "% Available Memory" is the percentage of memory that is still available, and "Used Memory MBytes" is the amount of memory in use (MB).
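
To confirm which memory counters have actually arrived in the workspace, a query along the following lines can help. It is a sketch that assumes the counters land in the Perf table under the "Memory" object, with the same computer name as elsewhere in this post and an illustrative 1 hour look-back.

```kusto
// Sketch: list the memory counters collected for the VM and when each was last reported.
Perf
| where Computer == "CentOSVM01"
| where ObjectName == "Memory"
| where TimeGenerated > ago(1h)
| summarize Samples = count(), Latest = max(TimeGenerated) by CounterName
| order by CounterName asc
```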

daisami_0-1660418408114.png

Here is an example query that searches for records in which the VM has less than 20% available memory. A healthy VM, which has 20% or more available memory, won't show up in the results.

 

Perf
| where Computer == "CentOSVM01"
| where CounterName == "% Available Memory"
| where CounterValue < 20
| order by TimeGenerated desc

 

daisami_1-1660418702272.png

Do not confuse the values when you configure the alert threshold. In the alert rule settings, the threshold is specified as the number of rows in the query result, so set the condition to fire when one or more rows are returned. Note that the xx% memory value is not what you set as the threshold; it belongs in the query.
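
The row-count threshold described above can be made easier to reason about by returning one row per affected VM. The sketch below assumes the same "% Available Memory" counter; the 5 min window and the 20% value are illustrative. With this query, an alert condition of "one or more rows" means that at least one VM is below 20% available memory on average.

```kusto
// Sketch: one row per VM whose average available memory over the last 5 minutes is below 20%.
// The percentage lives in the query; the alert rule only counts the returned rows.
Perf
| where CounterName == "% Available Memory"
| where TimeGenerated > ago(5m)
| summarize AvgAvailablePct = avg(CounterValue) by Computer
| where AvgAvailablePct < 20
```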
As in the CPU monitoring scenario, you can disable your rules on the Azure Portal and set any time range for the graphs by choosing "Custom" for "Time range" on the Virtual Machine metrics page.


3.2 (Preview) VM metrics

Choose the "Available Memory Bytes (Preview)" metric on your VM menu. The settings are almost the same as for CPU usage.
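
The platform metric reports available memory as an absolute value, while policies are often written as a percentage. The sketch below is a log-based way to derive a percentage and is not part of the metric setup in this section. It assumes that VM insights data is collected as in section 1.1, that available memory is reported as "Memory" / "AvailableMB", and that the total memory size is carried in the "vm.azm.ms/memorySizeMB" tag; verify these names against your own data before relying on them.

```kusto
// Sketch: available memory as a percentage of total memory, averaged per 5-minute bin.
// Assumption: Tags is a JSON string that carries the total memory size in MB.
InsightsMetrics
| where Namespace == "Memory" and Name == "AvailableMB"
| where Computer == "CentOSVM01"
| where TimeGenerated > ago(1h)
| extend TotalMB = todouble(parse_json(Tags)["vm.azm.ms/memorySizeMB"])
| extend AvailablePct = Val / TotalMB * 100
| summarize AvgAvailablePct = avg(AvailablePct) by bin(TimeGenerated, 5m)
| order by TimeGenerated desc
```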

daisami_2-1660418850792.png

 

Finally, here is the check result of memory monitoring.

| No. | Category | Goal and outcome | Result |
|---|---|---|---|
| 1 | monitoring | Azure Monitor can satisfy functional requirements | OK |
| 2 |  | Azure Monitor can setup short granularity for detections | 1 min |
| 3 |  | Azure Monitor can setup thresholds detections | OK |
| 4 |  | Azure Monitor can setup retry detections | OK |
| 5 |  | Azure Monitor can suspend and resume for checking threshold | OK |
| 6 |  | Azure Monitor can send a mail for detection results | OK |
| 7 | statistics | Azure Monitor can retrieve workspace logs with specific duration | OK |
| 8 |  | Azure Monitor can visualize statistic data | OK |
| 9 | automation | Azure Monitor can have primary action based on alert rules | OK |
| 10 |  | Azure Monitor can send validation results | OK |

 

With this post, the detailed part of this series on Azure Monitor is under way. In the next post, we will dive deep into the "Compute/Inside OS" monitoring objective.

 

 

Special thanks for this post.

daisami_1-1661216193478.png Avanade Japan K.K., Director - Japan Microsoft Azure Platform Services Lead & Japan Azure CoE Lead.
Last update: Sep 02 2022 11:00 AM