repeated alerts


Hi guys,

Using the query below, I have enabled an alert for processor utilization with a threshold of 80%, targeting multiple Windows servers in my Azure Monitor Log Analytics workspace.

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 80

The problem is that I am getting repeated alerts for the same servers every 5 minutes, because the query frequency is set to 5 minutes. Can someone please guide me on how to stop this behavior, so that a new alert is triggered only when a new server breaches the 80% threshold?

7 Replies

Take a look at https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-unified-log#example-of-metric-measurement-type-log-alert to only alert when 2 or more breaches occur?

Hi Clive,

Thanks for the response. Actually, my requirement is different; let me clarify it again.

What we have done is:

We have around 200 servers reporting to a Log Analytics workspace. We have created CPU usage alerts with an 80% threshold for them using the query below:

 

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 5m), Computer
| where AggregatedValue > 80

 

We have set the frequency for this alert to 5 minutes, so the query is executed every 5 minutes.

 

The issue we are now facing is:

At 10:00 AM we get alerts for around 20 servers, because their processor usage is above 80%.

At 10:05 AM we get alerts again for the same 20 servers, because their processor usage is still above 80%.

At 10:10 AM we get alerts for around 25 servers (5 new servers plus the same 20 as before), because their processor usage is above 80%.

At 10:15 AM we get alerts again for the same 25 servers, because their processor usage is still above 80%.

 

What we are looking for is: for every server, an alert should be triggered only once (until the issue resolves), and a new alert should be triggered only when a new server breaches the threshold.

 

Any suggestions on how to accomplish this?

@roopesh_shetty

 

I think the challenge is the 5-minute window: the alert only sees the data from the past 5 minutes and has no concept of what happened before, hence it fires again. I'm happy to be corrected here, but you'll probably need to use a longer window or something like dynamic thresholds: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-dynamic-thresholds#what-do-the-advanced-settings-in-dynamic-thresholds-mean
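
A minimal sketch of the longer-window idea, not a tested solution: look back 10 minutes instead of 5, average each server's CPU separately for the most recent 5 minutes and for the 5 minutes before that, and return only servers that are above 80% now but were not above it in the previous window. The names threshold, CurrentAvg and PreviousAvg are made up for illustration, and the sketch assumes the alert rule is configured with a 10-minute period and a 5-minute frequency.

```kusto
// Hypothetical sketch: report only servers that newly crossed the threshold.
let threshold = 80;
Perf
| where TimeGenerated > ago(10m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize
    CurrentAvg = avgif(CounterValue, TimeGenerated > ago(5m)),
    PreviousAvg = avgif(CounterValue, TimeGenerated <= ago(5m))
    by Computer
// Keep servers breaching now...
| where CurrentAvg > threshold
// ...that did not breach (or had no data) in the previous 5 minutes.
| where isnull(PreviousAvg) or isnan(PreviousAvg) or PreviousAvg <= threshold
```

With this shape, a server that stays above 80% should appear only in the first evaluation after it crosses the threshold; if it drops below and later climbs again, it is reported again. The two windows are relative to the time the query runs, so results can shift slightly if evaluations are delayed.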

 

My other thought was some logic to check past alerts. It is still a work in progress (I just grabbed 10 random records, whereas we need to match the computer names against past alerts), but it might help?

 

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
    AlertHistory
    | limit 10
) on $left.Computer == $right.SourceDisplayName

Hi,

 

I tried to run the query you provided, but I am getting this error:

'take' operator: Failed to resolve table or column expression named 'AlertHistory' Support id: 6b982987-9b2b-4b24-b555-9b6ee8787e87

 

Query :

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
AlertHistory
| limit 10
) on $left.Computer == $right.SourceDisplayName

 

What could be wrong with this query?

@roopesh_shetty 

 

Hi, just change AlertHistory to Alert. It will only return rows if you have some alerts in the workspace.

 

Alert
| where TimeGenerated > ago(30d)
| summarize by Computer, AlertName


 

 

@CliveWatson 

 

 

Hi Clive,

 

The output of this query is always blank. Where do we need to specify the 80% threshold in this query?

 

Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by bin(TimeGenerated, 1m), Computer
| join (
Alert
| limit 10
) on $left.Computer == $right.SourceDisplayName

 

Hi, I was just giving you (and others) some KQL suggestions, hence the basic query. This isn't a fully working solution: it will need extra logic, and I don't even know if it will work...
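
On the two open questions above, a hedged sketch rather than a confirmed fix: the 80% threshold can go in a where clause right after summarize, and a leftanti join (instead of the default join, which keeps only rows that match on both sides) drops servers that already have a recent row in the Alert table. This assumes the Alert table in your workspace actually receives a row when the rule fires and that SourceDisplayName holds the same name as Computer in Perf; neither is confirmed in this thread. The 1-hour look-back and the names threshold and suppressFor are arbitrary choices for illustration.

```kusto
// Hypothetical sketch: servers over the threshold that have no recent alert record.
let threshold = 80;
let suppressFor = 1h; // arbitrary look-back for servers that already alerted
Perf
| where TimeGenerated > ago(5m)
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize AggregatedValue = avg(CounterValue) by Computer
| where AggregatedValue > threshold // the 80% threshold goes here
| join kind=leftanti (
    Alert
    | where TimeGenerated > ago(suppressFor)
    | distinct SourceDisplayName
) on $left.Computer == $right.SourceDisplayName
```

Because leftanti keeps the left-hand rows that have no match on the right, an empty Alert table no longer produces a blank result; every server over the threshold is returned instead.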