Final Update: Sunday, 05 September 2021 05:26 UTC
We've confirmed that all systems are back to normal with no customer impact as of 09/05, 04:40 UTC. Our logs show the incident started on 09/03, 20:30 UTC and that during the 32 hours and 10 minutes that it took to resolve the issue, some customers may have experienced intermittent data latency and incorrect alert activation.
- Root Cause: The failure was due to a backend compute cluster that became unhealthy, which caused incoming data to queue up and go unprocessed.
- Incident Timeline: 32 hours and 10 minutes - 09/03, 20:30 UTC through 09/05, 04:40 UTC
-Soumyajeet
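As a reference for readers who want to check whether their own workspace saw delayed ingestion during the window above, a query can compare each record's ingestion_time() with its TimeGenerated timestamp over the incident period. The sketch below is illustrative only and not part of the official mitigation; it assumes the azure-monitor-query Python SDK, a workspace that collects Heartbeat records, and a placeholder workspace ID.

```python
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

# Placeholder -- replace with your own Log Analytics workspace ID.
WORKSPACE_ID = "<your-workspace-id>"

# Incident window from the timeline above (UTC).
INCIDENT_START = datetime(2021, 9, 3, 20, 30, tzinfo=timezone.utc)
INCIDENT_END = datetime(2021, 9, 5, 4, 40, tzinfo=timezone.utc)

# KQL: count records ingested more than 10 minutes after they were generated,
# bucketed per hour. Spikes inside the window suggest the data sat in the backlog.
QUERY = """
Heartbeat
| extend IngestionDelay = ingestion_time() - TimeGenerated
| where IngestionDelay > 10m
| summarize DelayedRecords = count(), MaxDelay = max(IngestionDelay) by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=(INCIDENT_START, INCIDENT_END))

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(row)
```

The Heartbeat table is used here only because most workspaces with connected agents populate it steadily; any table your workspace ingests continuously would work the same way.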
Update: Sunday, 05 September 2021 04:04 UTC
Backlogged data continues to be drained. Customers will continue to see intermittent delayed data and incorrect alert activation. We expect that the remaining backlogged data will be completely ingested by 09/05 at 06:30 UTC.
- Next Update: Before 09/05 07:00 UTC
Update: Sunday, 05 September 2021 01:04 UTC
The Log Analytics ingestion team has now fully recovered the problematic backend cluster and has scaled out further to drain the queue of backlogged data as quickly as possible. Customers will continue to see intermittent delayed data and incorrect alert activation.
- Next Update: Before 09/05 04:30 UTC
Update: Saturday, 04 September 2021 22:16 UTC
The Log Analytics ingestion team continues to mitigate the incident, and the backlog of data continues to drain.
- Next Update: Before 09/05 01:30 UTC
Update: Saturday, 04 September 2021 18:38 UTC
Root cause has been isolated to the failure of a backend compute cluster, which caused incoming data to queue up and go unprocessed. To address this issue, we restarted the cluster. Ingestion is now working as expected and the backlog of data is draining. However, because there is a large amount of backlogged data, some customers will continue to experience data latency and incorrect alert activation. We estimate it will take several hours before all the backlogged data has been ingested.
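For customers who want to gauge the residual delay while the backlog drains, one option is to compare recent records' ingestion_time() against their TimeGenerated timestamps. The sketch below is illustrative only; it assumes the azure-monitor-query Python SDK, a workspace that collects Heartbeat records, and a placeholder workspace ID.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

# Placeholder -- replace with your own Log Analytics workspace ID.
WORKSPACE_ID = "<your-workspace-id>"

# KQL: average and worst-case gap between record generation and ingestion over
# the last hour, in 5-minute buckets. A shrinking gap means the backlog is draining.
QUERY = """
Heartbeat
| extend IngestionDelay = ingestion_time() - TimeGenerated
| summarize AvgDelay = avg(IngestionDelay), MaxDelay = max(IngestionDelay) by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(hours=1))

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(row)
```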
- Next Update: Before 09/04 22:00 UTC
Initial Update: Saturday, 04 September 2021 18:24 UTC
We are aware of issues within Log Analytics and are actively investigating. Some customers may experience delayed or missed Log Search Alerts and temporary unavailability of data.
- Next Update: Before 09/04 19:30 UTC
-Jack Cantwell