Real time metric monitoring solution
Hi James,
Can you post your query? I suspect something in it is causing that level of lag. In my experience, about 20 minutes is the lag I reliably see from condition to alert, so this sounds like either a problem with the query you're using or latency elsewhere in the system.
I have this query for latency that runs as an alert, and it pretty reliably gives me an idea when things are slow in the system:
Heartbeat
| order by TimeGenerated desc
| limit 1
I alert on that when the number of results is less than 1, that is, when no heartbeat has arrived within the alert's time window.
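If you want to look at the lag directly rather than infer it from missing heartbeats, something like the following might help. It's only a minimal sketch: it assumes the Heartbeat table is populated and that the ingestion_time() function is available in your workspace. The one-hour window, the 5-minute bins, and the names IngestionLag, AvgLag, and MaxLag are my own choices for illustration.

```kusto
// Sketch: estimate ingestion lag per 5-minute bin over the last hour.
Heartbeat
| where TimeGenerated > ago(1h)
// ingestion_time() is when the record became available to query
| extend IngestionLag = ingestion_time() - TimeGenerated
| summarize AvgLag = avg(IngestionLag), MaxLag = max(IngestionLag) by bin(TimeGenerated, 5m)
| order by TimeGenerated desc
```

Note that this only covers the time from record creation to ingestion. It doesn't include however long the alert rule takes to evaluate and notify, so the end-to-end delay will be somewhat higher.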
As for your cloud-based solution: OMS/Azure Log Analytics isn't well suited to endpoint monitoring (such as URLs and DNS responses). I've turned to Anturis for endpoint monitoring in the past, as it is very low cost and can monitor anything with a URL attached to it.
Out of curiosity, how close to real time do you need this to be? The last time I checked the OMS SLA, latency of up to 2 hours was within it, but we've recently moved over to Azure Log Analytics and I haven't seen its SLA yet.