Final Update: Tuesday, 18 May 2021 20:15 UTC
We've confirmed that all systems are back to normal, with no customer impact, as of 05/18, 16:00 UTC. Our logs show the incident started on 05/18, 12:00 UTC, and that during the 4 hours it took to resolve the issue, customers using Log Analytics in UK South may have experienced intermittent data latency and incorrect alert activation for resources in this region.
- Root Cause: Our investigation found that a backend scale unit became unhealthy due to an ingestion error. This scale unit processes logging data for Log Analytics, and its failure caused alert systems to fail.
- Incident Timeline: 4 Hours - 05/18, 12:00 UTC through 05/18, 16:00 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
- Vincent