Oct 11 2019 01:18 AM
When you say high load, do you mean CPU/memory on the Windows or Linux server? If so, you need to capture that info into the Perf table using Log Analytics, i.e. the "Process" counters for the agent, or the counters for the server as a whole. You may also want to consider Azure Monitor metric alerts, as they are near real-time.
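As a rough sketch of the Perf approach, something like the query below could surface servers with sustained high CPU. It assumes the "% Processor Time" counter is already being collected into the Perf table for your workspace; the 80% threshold and the 5-minute bin are hypothetical values, so please adjust to suit.

```kusto
// Sketch only: average CPU per computer in 5-minute bins over the last hour.
// Assumes the "% Processor Time" counter is collected into the Perf table.
Perf
| where TimeGenerated > ago(1h)
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AggregatedValue = avg(CounterValue) by Computer, bin(TimeGenerated, 5m)
| where AggregatedValue > 80   // hypothetical threshold, adjust to suit
```

The same pattern should work for memory or per-process counters by changing the ObjectName and CounterName filters to whichever counters you have enabled.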
However, if it's delays you are looking for, then the Heartbeat table (and all other tables) provides the ingestion and latency info. Some examples from: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time
// https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time
Heartbeat
| where TimeGenerated > ago(8h)
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| extend AgentLatency = _TimeReceived - TimeGenerated
| summarize percentiles(E2EIngestionLatency,50,95), percentiles(AgentLatency,50,95) by Computer
| top 20 by percentile_E2EIngestionLatency_95 desc

Heartbeat
| where TimeGenerated > ago(24h)
| extend E2EIngestionLatencyMin = todouble(datetime_diff("Second",ingestion_time(),TimeGenerated))/60
| extend AgentLatencyMin = todouble(datetime_diff("Second",_TimeReceived,TimeGenerated))/60
| summarize percentiles(E2EIngestionLatencyMin,50,95), percentiles(AgentLatencyMin,50,95) by bin(TimeGenerated,30m), Computer
| limit 3
Note: latency can be caused by many factors.
Oct 13 2019 11:10 AM
1. To alert, you typically need an AggregatedValue; this gives the Azure Monitor alert a value to display and to evaluate its threshold against.
union withsource = tt *
| where TimeGenerated < now()
| where isnotempty(Type)
| summarize maxTimeGenerated=max(TimeGenerated) by Type
| where maxTimeGenerated < ago(24h)
| extend SolutionName = strcat(Type, ': LatestData: ', maxTimeGenerated)
| summarize AggregatedValue = count() by SolutionName, maxTimeGenerated
So if I run the above, I would create an alert by pressing the "Add New Alert Rule" button.
Please see https://docs.microsoft.com/en-us/azure/azure-monitor/learn/tutorial-response
and also this series of posts (this is Post 7, but start at #1): https://cloudadministrator.net/2019/10/07/azure-monitor-alert-series-part-7/?fbclid=IwAR0pBvGLhqmZFI...
2. Maybe something like this, please modify to suit.
Heartbeat
| where TimeGenerated > ago(1h)
| extend E2EIngestionLatency = ingestion_time() - TimeGenerated
| extend AgentLatency = _TimeReceived - TimeGenerated
| summarize avgLatency = avg(AgentLatency) by Computer, E2EIngestionLatency
| where avgLatency > E2EIngestionLatency
| extend avgLatencyBreachedfor = strcat(Computer, ' : ', avgLatency)
| summarize AggregatedValue = count() by avgLatencyBreachedfor