Final Update: Wednesday, 17 March 2021 01:54 UTC
We've confirmed that all systems are back to normal with no ongoing customer impact as of 03/17, 01:45 UTC. Our logs show the incident started on 03/17, 00:42 UTC, and that during the 1 hour and 3 minutes it took to resolve the issue, approximately 50% of customers in the East US region experienced data ingestion latency and possibly delayed or misfired alerts.
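For customers who want to gauge whether a specific workspace was affected, a Kusto query along the following lines can estimate end-to-end ingestion latency over the incident window. This is a sketch, not an official diagnostic: Heartbeat is used only as an example table, and the same pattern applies to any table the workspace ingests.

    // Estimate ingestion latency during the incident window (example table: Heartbeat)
    Heartbeat
    | where TimeGenerated between (datetime(2021-03-17 00:42:00) .. datetime(2021-03-17 01:45:00))
    // ingestion_time() is when the record landed in the workspace; TimeGenerated is when it was produced
    | extend IngestionLatency = ingestion_time() - TimeGenerated
    | summarize percentile(IngestionLatency, 95) by bin(TimeGenerated, 5m)

Latency values well above the workspace's normal baseline inside this window would indicate impact from the incident.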
- Root Cause: The failure was due to a back-end system reaching its operational threshold. Once the system was restarted and scaled out, the backlog of ingestion data began to drain.
- Incident Timeline: 03/17, 00:42 UTC through 03/17, 01:45 UTC (1 hour and 3 minutes)
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
-Jack