Monitoring & Ingestion

Anyone got any suggestions for getting around the lengthy ingestion times you sometimes get with Log Analytics? Alerting on Heartbeat provides a simple way of checking that a VM is up and running, but we've seen delays of up to an hour before the latest Heartbeat is available for querying in Log Analytics.

So you either have a lengthy period to check for (e.g. alert if no Heartbeat has been received for > 60 mins) or you face plenty of false positives if you set the threshold to, say, 10 mins.

Any ideas?

@JK_UK 

You can measure the latency as well; see here: https://docs.microsoft.com/en-us/azure/azure-monitor/platform/data-ingestion-time
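As a rough illustration of what the linked article measures: ingestion latency for each record is the gap between when it was generated (TimeGenerated) and when Log Analytics ingested it (in KQL, the ingestion_time() function). The Python sketch below works on made-up sample pairs rather than a live workspace, and the function names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def latencies_minutes(rows):
    """Ingestion latency of each (time_generated, time_ingested) pair, in minutes."""
    return [(ingested - generated).total_seconds() / 60 for generated, ingested in rows]


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical sample data: five Heartbeat records one minute apart,
# the last of which took 45 minutes to be ingested.
base = datetime(2020, 5, 1, 10, 0)
sample = [
    (base, base + timedelta(minutes=1)),
    (base + timedelta(minutes=1), base + timedelta(minutes=3)),
    (base + timedelta(minutes=2), base + timedelta(minutes=3)),
    (base + timedelta(minutes=3), base + timedelta(minutes=6)),
    (base + timedelta(minutes=4), base + timedelta(minutes=49)),
]

lat = latencies_minutes(sample)
print(lat)                  # [1.0, 2.0, 1.0, 3.0, 45.0]
print(percentile(lat, 50))  # 2.0
print(percentile(lat, 95))  # 45.0
```

Looking at a high percentile rather than the average shows how long the occasional slow record takes, which is what an alert threshold has to tolerate.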

The product group is always looking at improving latency. Also note from the above link that tables differ in their upload frequency, so you may need some monitoring logic that handles each of your data sources as a separate case.

Generally the agent data is quick, but factors such as the agent's location relative to the Azure region, network topology, and time of day may affect this:
"To ensure the Log Analytics agent is lightweight, the agent buffers logs and periodically uploads them to Azure Monitor. Upload frequency varies between 30 seconds and 2 minutes depending on the type of data. Most data is uploaded in under 1 minute. Network conditions may negatively affect the latency of this data to reach Azure Monitor ingestion point."
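Putting the quoted upload figures together, one way to size a "no Heartbeat" threshold is to add how often a record is produced, the longest agent upload interval, and an allowance for ingestion latency (for example a high percentile measured as the linked article describes). The sketch below assumes a Heartbeat is produced about once a minute; that interval, the function name and the latency figures are illustrative assumptions, not documented values.

```python
def alert_threshold_minutes(send_interval, max_upload_interval, latency_allowance, margin=0.0):
    """Minutes without a new record before alerting.

    All arguments are in minutes. The result is the longest gap a healthy
    machine should show under these (assumed) inputs, plus an optional margin.
    """
    return send_interval + max_upload_interval + latency_allowance + margin


# Assumed: one Heartbeat per minute; uploads at most every 2 minutes (per the quote).
typical = alert_threshold_minutes(1, 2, latency_allowance=3)
worst = alert_threshold_minutes(1, 2, latency_allowance=45)
print(typical)  # 6.0
print(worst)    # 48.0
```

The gap between the two results is the trade-off from the original question: a threshold sized for typical latency raises false positives whenever a slow batch arrives, while one sized for the worst case delays genuine alerts.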

If the machine is also in Azure, consider an Azure Monitor Metric alert, as that will give you a second check. Metric alerts have low latency (https://docs.microsoft.com/en-us/azure/azure-monitor/platform/alerts-metric-overview), and some people check both.

@CliveWatson 

Thanks Clive, much appreciated. We've looked at measuring ingestion time and might have to look at that further. My concern is whether LA can measure ingestion time for a resource that is taking a long time to report in. For example, if a VM last reported its Heartbeat 30 mins ago and the ingestion time for that record was 1 minute, would LA just measure that and say everything's fine?

Surely LA won't know about a slow ingestion time until the latest Heartbeat (or whatever) arrives, and by then it's too late?
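The distinction being asked about can be sketched in a few lines: latency can only be computed for a record that has already arrived, whereas staleness (now minus the newest TimeGenerated seen) keeps growing while nothing arrives. The Python below uses hypothetical timestamps and function names to show a case where the last record's latency looks fine but staleness crosses a 10-minute threshold.

```python
from datetime import datetime, timedelta


def ingestion_latency(time_generated, time_ingested):
    """Delay for a record that has already arrived."""
    return time_ingested - time_generated


def staleness(latest_time_generated, now):
    """How long ago the newest record we have was generated."""
    return now - latest_time_generated


# Hypothetical timestamps: last Heartbeat generated at 10:00, ingested at 10:01,
# and nothing newer has arrived by 10:30.
last_generated = datetime(2020, 5, 1, 10, 0)
last_ingested = datetime(2020, 5, 1, 10, 1)
now = datetime(2020, 5, 1, 10, 30)
threshold = timedelta(minutes=10)

print(ingestion_latency(last_generated, last_ingested))  # 0:01:00
print(staleness(last_generated, now) > threshold)        # True
```

A staleness check does fire in this case, but on its own it cannot tell a stopped VM from a delayed ingestion pipeline; that is where a second, independent signal helps.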

I'll also take a look at Metric Alerts, but I think my hands are a bit tied there. We need to send a custom JSON payload when an alert is triggered, and unfortunately you can't do that with Metric Alerts.

As an update, I'm looking into this further. If I enable the Dependency Agent extension on the appropriate VMs, that seems to push ServiceMap data through to Azure Monitor; is that correct? And if so, would querying VMConnection by Computer name give another way to check whether the VM is running?

So in simple terms:

Heartbeat: check that a Heartbeat has been received within the last 5 mins

VMConnection: check that the Computer has sent through 'some' data within the last 5 mins

If either of those is true, then we're happy the VM is running. The big question, though, is whether ingestion time would affect VMConnection and Heartbeat data in different ways. I think you're saying (based on the article you mentioned above) that it would, which is a good thing.
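The "either signal is recent" rule above can be sketched as follows. It assumes, as the post suggests, that VMConnection rows appear for the machine once the Dependency Agent is enabled, and it takes the latest TimeGenerated from each table as inputs; the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def seen_recently(last_seen: Optional[datetime], now: datetime, window: timedelta) -> bool:
    """True if a record was generated within `window` of `now`."""
    return last_seen is not None and now - last_seen <= window


def vm_considered_running(
    last_heartbeat: Optional[datetime],
    last_vmconnection: Optional[datetime],
    now: datetime,
    window: timedelta = timedelta(minutes=5),
) -> bool:
    """The VM counts as running if either table has recent data."""
    return seen_recently(last_heartbeat, now, window) or seen_recently(
        last_vmconnection, now, window
    )


now = datetime(2020, 5, 1, 10, 30)
# Heartbeat delayed (12 minutes old) but VMConnection recent: no alert.
print(vm_considered_running(now - timedelta(minutes=12), now - timedelta(minutes=2), now))  # True
# Heartbeat stale and VMConnection never seen: alert.
print(vm_considered_running(now - timedelta(minutes=12), None, now))  # False
```

The OR only reduces false positives if the two tables are delayed independently; if a delay hits both at once, this check will still fire.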