repeated alerts

roopesh_shetty · ‎May 22 2019

Hi guys,

Using the query below, I have enabled an alert for processor utilization with a threshold of 80%, targeting multiple Windows servers in my Azure Monitor Log Analytics workspace.

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 80

The problem is that I am getting repeated alerts for the same servers every 5 minutes, because the query frequency is set to 5 minutes. Can someone please guide me on how to stop this behavior, so that a new alert is triggered only when a new server breaches the 80% threshold?

CliveWatson · ‎May 22 2019

Take a look at

https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-unified-log#example-of-metric-m...

to only alert when 2 or more breaches occur?

roopesh_shetty · ‎May 23 2019

Hi Clive,

Thanks for the response. Actually, my requirement is different; let me clarify it again.

What we have done:

We have around 200 servers reporting to a Log Analytics workspace. We created CPU usage alerts with an 80% threshold for them using the query below:

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 80

We have set the frequency for this alert to 5 minutes, so the query is executed every 5 minutes.

The issue we are facing is:

At 10:00 AM we get alerts for around 20 servers, because their processor usage is above 80%.

At 10:05 AM we again get alerts for the same 20 servers, because their processor usage is still above 80%.

At 10:10 AM we get alerts for around 25 servers (5 new servers plus the same 20 servers as before), because their processor usage is above 80%.

At 10:15 AM we again get alerts for the same 25 servers, because their processor usage is still above 80%.

What we are looking for: for every server, an alert should be triggered only once (until the issue resolves), and a new alert should be triggered only when a new server breaches the threshold.

Any suggestions on how to accomplish this?

CliveWatson · ‎May 23 2019

@roopesh_shetty

I think the challenge is the 5-minute window: the alert only sees the data within the past 5 minutes and has no concept of what happened before, hence it will fire the alert again. I'm happy to be corrected here, but you'll probably need to add a longer window or use something like dynamic thresholds: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-dynamic-thresholds#what-do-the-...
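
One possible way to work around the alert having no memory of the previous run is to let the query itself look at two adjacent windows and return only the computers that breach in the latest window but did not breach in the one before. This is a minimal sketch, not a tested alert rule: the names threshold, window, cpu, current, and previous are made up for illustration, and it assumes the alert rule's query period covers at least 10 minutes of data.

```kusto
// Sketch only: report a computer when it breaches in the latest window
// but did not breach in the window immediately before it.
let threshold = 80;   // CPU threshold from the thread
let window = 5m;      // assumed to match the alert frequency
let cpu = Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time";
// Computers above the threshold in the most recent window.
let current = cpu
    | where TimeGenerated > ago(window)
    | summarize AggregatedValue = avg(CounterValue) by Computer
    | where AggregatedValue > threshold;
// Computers that were already above the threshold one window earlier.
let previous = cpu
    | where TimeGenerated > ago(2 * window) and TimeGenerated <= ago(window)
    | summarize PreviousValue = avg(CounterValue) by Computer
    | where PreviousValue > threshold;
// leftanti keeps rows from "current" that have no match in "previous".
current
| join kind=leftanti previous on Computer
```

With this shape, a server that stays above 80% would be returned only for the first window in which it breaches. A server that drops below the threshold for a full window and then climbs back would be returned again.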

My other thought was some logic to check the Alerts. It is still a work in progress (I just took 10 random records, but we need to match the computer names with past alerts), but it might help?

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
    AlertHistory
    | limit 10
) on $left.Computer == $right.SourceDisplayName

roopesh_shetty · ‎May 23 2019

Hi,

I tried to run the query you provided, but I am getting this error:

'take' operator: Failed to resolve table or column expression named 'AlertHistory' Support id: 6b982987-9b2b-4b24-b555-9b6ee8787e87

Query:

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
AlertHistory
| limit 10
) on $left.Computer == $right.SourceDisplayName

What could be wrong with this query?

CliveWatson · ‎May 24 2019

@roopesh_shetty

Hi, just change AlertHistory to Alert. It will only show results if you have some alerts.

Alert
| where TimeGenerated > ago(30d)
| summarize by Computer, AlertName

Go to Log Analytics and Run Query

roopesh_shetty · ‎May 24 2019

@CliveWatson

Hi Clive,

This query's output is always blank. Where do we need to specify the 80% threshold in this query?

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
Alert
| limit 10
) on $left.Computer == $right.SourceDisplayName

CliveWatson · ‎May 24 2019

Hi, I was just giving you (and others) some KQL suggestions, hence the basic query. This isn't a fully working solution: it will need extra logic, and I don't even know if it will work...
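
As one hedged illustration of the extra logic mentioned above: the threshold can be applied with a where clause after summarize, and a leftanti join can drop computers that already have a recent record in Alert. This is a sketch under assumptions, not a verified solution. It assumes the Alert table in the workspace actually receives a record for each fired alert, and that SourceDisplayName holds the computer name (as in the join earlier in the thread). The names threshold and lookback, and the one-hour value, are hypothetical.

```kusto
// Sketch only: apply the 80% threshold, then drop computers that already
// appear in the Alert table within an assumed look-back period.
let threshold = 80;   // CPU threshold from the thread
let lookback = 1h;    // hypothetical suppression period
Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by Computer
| where AggregatedValue > threshold
// leftanti keeps only computers with no matching recent Alert record.
| join kind=leftanti (
    Alert
    | where TimeGenerated > ago(lookback)
    | distinct SourceDisplayName
) on $left.Computer == $right.SourceDisplayName
```

If the Alert table is empty, as the blank output reported earlier suggests it may be, the anti-join removes nothing and the query returns every computer above the threshold.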
