We've confirmed that all systems are back to normal with no customer impact as of 07/14, 03:10 UTC. Our logs show the incident started on 07/13, 23:30 UTC, and that during the 3 hours and 40 minutes it took to resolve the issue, a subset of customers using Azure Log Analytics in the West US 2 region experienced intermittent log data latency and incorrect alert activation.
Root Cause: The failure was caused by unhealthy nodes in the backend service.
Incident Timeline: 3 Hours & 40 minutes - 07/13, 23:30 UTC through 07/14, 03:10 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
Update: Wednesday, 14 July 2021 02:24 UTC
Root cause has been isolated to some nodes in the backend service that became unhealthy, leading to a queue buildup that impacted data latency for Azure Log Analytics customers in the West US 2 region. To address this issue, we restarted the affected nodes and scaled out the clusters to drain the backlog queue. Some customers may experience intermittent log data latency and incorrect alert activation starting from 07/13, 23:30 UTC, and we estimate the issue will be resolved within the next 4 hours.
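For customers who want to gauge whether their workspace is still seeing ingestion delay, the following is a minimal sketch using the azure-monitor-query Python SDK and the ingestion_time() KQL function against the Heartbeat table; the workspace ID, lookback window, and table choice are placeholders you would adapt to your own environment, not part of the incident mitigation itself.

```python
# Minimal sketch: estimate end-to-end ingestion latency for a Log Analytics
# workspace. Workspace ID and time window below are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<your-workspace-id>"  # placeholder

# ingestion_time() is when a record became available for queries; the
# difference from TimeGenerated approximates end-to-end ingestion latency.
QUERY = """
Heartbeat
| where TimeGenerated > ago(1h)
| extend LatencySeconds = datetime_diff('second', ingestion_time(), TimeGenerated)
| summarize AvgLatencySeconds = avg(LatencySeconds),
            P95LatencySeconds = percentile(LatencySeconds, 95)
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id=WORKSPACE_ID,
    query=QUERY,
    timespan=timedelta(hours=1),
)

# Print each result row as a column-name -> value mapping.
for table in response.tables:
    for row in table.rows:
        print(dict(zip(table.columns, row)))
```

If the reported latency remains elevated well after the estimated resolution window, customers should continue to follow updates on the Azure status page for this incident.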