Experiencing delayed ingestion for Log Analytics in East US - 03/17 - Resolved

Published 04-07-2021 11:07 AM 471 Views
Final Update: Wednesday, 17 March 2021 01:54 UTC

We've confirmed that all systems are back to normal with no customer impact as of 03/17, 01:45 UTC. Our logs show the incident started on 03/17, 00:42 UTC and that during the one hour that it took to resolve the issue approximately 50% customers in the East US region experienced data ingestion latency and possibly delayed or misfired alerts.
  • Root Cause: The failure was due to a back end system reaching its operational threshold. Once the system was restarted and scaled out, the backlog of ingestion data began to drain.
  • Incident Timeline: 1 hour and 3 minutes - 03/17, 00:42 UTC through 03/17, 01:45 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.

-Jack

Update: Wednesday, 17 March 2021 01:39 UTC

Root cause has been isolated to a back end system reaching its system limits which caused ingestion to back up and result in latency and possible missed or late-firing alerts. To address this issue we scaled out the back end service. The ingestion back up is mostly resolved and ingestion will shortly be back working as expected. Some customers may continue to experience data latency and improperly fired alerts. We estimate less than one hour before all latency in data and alerting is addressed.
  • Next Update: Before 03/17 03:00 UTC
-Jack

Version history
Last update:
‎Mar 16 2021 06:59 PM
Updated by: