SOLVED

Monitor Azure Resource Only

Hi All,

 

I am using Azure Log Analytics (LA) for VM performance monitoring, but it includes on-prem servers as well. When we run CPU, memory, or disk utilization alerts, they include all servers, both in Azure and on-prem.

 

We want to monitor only Azure resources, not on-prem servers. How can we achieve that?

 

Queries:

 

CPU 

Perf
| where CounterName == "% Processor Time"
and CounterValue > 95
and ObjectName == "Processor"
and InstanceName == "_Total"
| summarize arg_max(TimeGenerated, CounterValue) by Computer, CounterName
 
Memory 

Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| where CounterValue > 80
| summarize (TimeGenerated, Free_Memory_Percent)=arg_max(TimeGenerated, CounterValue) by Computer

 

Disk 

Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where CounterValue < 10
| summarize (TimeGenerated, Free_Space_Percent)=arg_max(TimeGenerated, CounterValue) by Computer, InstanceName
| where InstanceName contains ":"

 

Thanks in advance.

6 Replies

Hello @Rahul_Mahajan

 

You can add a join to the Heartbeat table, as it has an Azure / non-Azure value in the ComputerEnvironment column. You'll need to do the same for the other two examples.

 

Heartbeat
| where ComputerEnvironment =="Azure"
| distinct Computer
| join (
    Perf
    | where CounterName == "% Processor Time"
        and CounterValue > 95
        and ObjectName == "Processor"
        and InstanceName == "_Total"
    | summarize arg_max(TimeGenerated, CounterValue) by Computer, CounterName
) on Computer
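
For the other two examples, the same join might look like the sketch below. It keeps the original memory and disk query bodies unchanged and only wraps them in the Heartbeat join; the output column names are left at their defaults, and the two queries are meant to be run separately.

```kusto
// Memory: restrict the original query to computers that Heartbeat reports as Azure
Heartbeat
| where ComputerEnvironment == "Azure"
| distinct Computer
| join (
    Perf
    | where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
    | where CounterValue > 80
    | summarize arg_max(TimeGenerated, CounterValue) by Computer
) on Computer

// Disk: same join, keeping one row per computer and drive letter
Heartbeat
| where ComputerEnvironment == "Azure"
| distinct Computer
| join (
    Perf
    | where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
    | where CounterValue < 10
    | where InstanceName contains ":"
    | summarize arg_max(TimeGenerated, CounterValue) by Computer, InstanceName
) on Computer
```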

 

To test it, go to Log Analytics and run the query.

 

best response confirmed by Stanislav Zhelyazkov (MVP)
Solution

Hi @Rahul_Mahajan

Another way is to use the _ResourceId column to find which VMs are in Azure.

Currently all Azure VMs have a _ResourceId, so you can do:

Perf
| where CounterName == "% Processor Time" and _ResourceId contains "virtualmachines" 

Additionally, I would like to point out that all of your queries are written incorrectly. You should not filter on CounterValue before summarizing. First summarize, and then filter on the threshold for the resulting value. If you will use the queries for alerts, then do not filter in the query at all, as the threshold is applied via the alert configuration. If you do not follow these rules, you will get false positives on alerts.

Thanks @CliveWatson and @Stanislav Zhelyazkov, both work completely fine.

A question for @Stanislav Zhelyazkov: if I use _ResourceId contains "virtualmachines", it will only query VMs and not SQL DBs, should I send their logs to LA in the future. How can I include them as well, or is it better to keep SQL CPU usage in a different alert?

Also, I tried your suggestion to summarize first and then filter the threshold on the desired value, but I was unable to get results as I am not good at writing queries. Can you please edit the query above into the correct form so I can do the same for the others, and help me understand it?

@Rahul_Mahajan You should have a single alert per resource type and performance counter. Creating one alert for multiple resource types that have nothing in common, and for multiple metrics, is not an approach I would recommend. In queries that you will use to visualize data rather than for alerts, you can do whatever correlation you want.

 

Below is an example query based on your first posted query. The remaining queries will follow similar logic. Here is the logic of this query:

  • You get data from the Perf table
  • You filter on a specific counter by specifying CounterName, ObjectName and InstanceName
  • You summarize the value in CounterValue using the avg function. The summarization is done for every computer in buckets of 5 minutes
  • The summarized value is put into the column AggregatedValue
Perf
| where CounterName == "% Processor Time"
and ObjectName == "Processor"
and InstanceName == "_Total"
| summarize AggregatedValue = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AggregatedValue > 95

As I have mentioned before, if you will use the above query in an alert, then you should remove the last filter 'where AggregatedValue > 95' because you will set the threshold in the alert definition.
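
As a rough sketch of how the memory and disk queries could follow the same logic, the two queries below summarize first and filter on the threshold afterwards, combined with the _ResourceId filter from the accepted answer. The 80 and 10 thresholds come from the original post; reusing the 5-minute bucket for these counters is an assumption, and the final 'where' line would again be dropped when the query backs an alert.

```kusto
// Memory: average committed bytes per Azure VM in 5-minute buckets
Perf
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
| where _ResourceId contains "virtualmachines"
| summarize AggregatedValue = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AggregatedValue > 80   // remove when the alert rule applies the threshold

// Disk: average free space per Azure VM and drive letter in 5-minute buckets
Perf
| where ObjectName == "LogicalDisk" and CounterName == "% Free Space"
| where InstanceName contains ":"   // keep drive letters such as C:, skip _Total
| where _ResourceId contains "virtualmachines"
| summarize AggregatedValue = avg(CounterValue) by Computer, InstanceName, bin(TimeGenerated, 5m)
| where AggregatedValue < 10   // remove when the alert rule applies the threshold
```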

If I remember correctly, there is documentation on how to get started with the Kusto query language, alerting, etc. This forum also contains many example queries for a lot of things.

 

@Stanislav Zhelyazkov Got it now. One last issue: when we run the query, it gives multiple results for the same computer name. How can we make sure we get only one result per server, the latest one?

 

Test.PNG

@Rahul_Mahajan This is how the bin function works: you get one row per computer for every 5-minute bucket in the query's time range. You will need to read the documentation to understand it better: https://docs.microsoft.com/en-us/azure/azure-monitor/log-query/get-started-queries . The interval of the buckets is also important when you create alerts, as there you set the frequency and time window of the alert.
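
For ad hoc viewing rather than alerting, one possible way to keep only the most recent bucket per computer is a second summarize with arg_max on top of the binned averages. This is a sketch, not part of the answers in this thread; note that the newest bucket may be only partially filled, depending on when the query runs.

```kusto
Perf
| where CounterName == "% Processor Time"
    and ObjectName == "Processor"
    and InstanceName == "_Total"
// One averaged row per computer per 5-minute bucket
| summarize AggregatedValue = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
// Collapse to the latest bucket for each computer
| summarize arg_max(TimeGenerated, AggregatedValue) by Computer
| where AggregatedValue > 95
```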
