We've confirmed that all systems are back to normal with no customer impact as of 09/05, 04:40 UTC. Our logs show that the incident started on 09/03, 20:30 UTC, and that during the 32 hours and 10 minutes it took to resolve the issue, some customers may have experienced intermittent data latency and incorrect alert activation.
Root Cause: The failure was due to a back end compute cluster that became unhealthy, which caused incoming data to queue up and not be processed.
Incident Timeline: 32 hours and 10 minutes - 09/03, 20:30 UTC through 09/05, 04:40 UTC
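The stated duration can be checked from the two timestamps above. The following is only a minimal sketch for verifying the arithmetic; the variable names are illustrative, and the year 2021 is taken from the dated updates in this post.

```python
from datetime import datetime, timezone

# Incident start and end as reported in this post (year taken from the update headers).
start = datetime(2021, 9, 3, 20, 30, tzinfo=timezone.utc)
end = datetime(2021, 9, 5, 4, 40, tzinfo=timezone.utc)

# Subtracting two aware datetimes yields a timedelta.
duration = end - start

# Break the total number of seconds into whole hours and leftover minutes.
hours, remainder = divmod(int(duration.total_seconds()), 3600)
minutes = remainder // 60

print(f"{hours} hours and {minutes} minutes")
```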
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
Update: Sunday, 05 September 2021 04:04 UTC
Backlogged data continues to be drained. Customers will continue to see intermittent delayed data and incorrect alert activation. We expect that the remaining backlogged data will be completely ingested by 09/05, 06:30 UTC.
Next Update: Before 09/05 07:00 UTC
Update: Sunday, 05 September 2021 01:04 UTC
The Log Analytics ingestion team has now fully recovered the problematic back end cluster and has scaled it out further to drain the queue of backlogged data as quickly as possible. Customers will continue to see intermittent delayed data and incorrect alert activation.
Next Update: Before 09/05 04:30 UTC
Update: Saturday, 04 September 2021 22:16 UTC
The Log Analytics ingestion team continues to mitigate the incident, and the data backlog continues to drain.
Next Update: Before 09/05 01:30 UTC
Update: Saturday, 04 September 2021 18:38 UTC
The root cause has been isolated to the failure of a back end compute cluster, which caused incoming data to queue up and not be processed. To address this issue, we restarted the cluster. Ingestion is now working as expected and the backlog of data is draining. However, because there is a large amount of backlogged data, some customers will continue to experience data latency and incorrect alert activation. We estimate that it will take several hours to ingest all the backlogged data.
Next Update: Before 09/04 22:00 UTC
Initial Update: Saturday, 04 September 2021 18:24 UTC
We are aware of issues within Log Analytics and are actively investigating. Some customers may experience delayed or missed Log Search Alerts and temporary unavailability of data.
Next Update: Before 09/04 19:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience. -Jack Cantwell