Experiencing Data Latency Issue in Azure Log Analytics for West US 2 Region - 07/14 - Resolved

Former Employee

Jul 14, 2021

Final Update: Wednesday, 14 July 2021 03:24 UTC

We've confirmed that all systems are back to normal with no customer impact as of 07/14, 03:10 UTC. Our logs show the incident started on 07/13, 23:30 UTC and that during the 3 hours and 40 minutes that it took to resolve the issue subset of customers using Azure Log Analytics in West US 2 Region experienced issues with intermittent Log Data Latency and Incorrect Alert Activation .

Root Cause: The failure was due to issues with unhealthy nodes in the backend services.
Incident Timeline: 3 Hours & 40 minutes - 07/13, 23:30 UTC through 07/14, 03:10 UTC

We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.

-Jayadev

Update: Wednesday, 14 July 2021 02:24 UTC

Root cause has been isolated to issue related to some of the nodes in the backend service which went unhealthy and that led to queue buildup which was impacting data latency for Azure Log Analytics Customers in West US 2 Region. To address this issue we restarted the affected nodes and scaled out the clusters to drain the backlog queue. Some customers may experience issues with intermittent log data latency and incorrect alert activation starting from 07/13 23:30 UTC and we estimate the issue to be resolved in the next 4 hours.