We've confirmed that all systems are back to normal, with no customer impact, as of 05/26, 17:03 UTC. Our logs show the incident started on 05/26, 11:14 UTC, and that during the 5 hours and 49 minutes it took to resolve the issue, 100% of customers in the UK South region experienced ingestion latency and incorrect alert activation.
Root Cause: The failure was due to an incorrect configuration of a back-end service that was rolled out in a new deployment.
Incident Timeline: 5 hours, 49 minutes - 05/26, 11:14 UTC through 05/26, 17:03 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
Update: Wednesday, 26 May 2021 15:36 UTC
Root cause has been isolated to a capacity issue, which caused data latency. To address this issue, we increased capacity. Some customers may continue to experience intermittent data latency and incorrect alert activation for resources in the UK South region.
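Customers who want to verify whether their workspace is still affected can estimate ingestion latency by comparing each record's ingestion_time() with its TimeGenerated value, a standard Kusto (KQL) technique. The sketch below runs such a query through the azure-monitor-query Python SDK; the Heartbeat table, the one-hour window, and the "<workspace-id>" placeholder are illustrative assumptions, not part of this incident's tooling.

    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient, LogsQueryStatus

    # KQL: per-record ingestion latency = time the record was indexed
    # (ingestion_time()) minus the time it was generated (TimeGenerated).
    # 'Heartbeat' is just an example table; any table in the workspace works.
    QUERY = """
    Heartbeat
    | where TimeGenerated > ago(1h)
    | extend IngestionLatency = ingestion_time() - TimeGenerated
    | summarize AvgLatency = avg(IngestionLatency), MaxLatency = max(IngestionLatency)
    """

    client = LogsQueryClient(DefaultAzureCredential())

    # "<workspace-id>" is a placeholder for your Log Analytics workspace GUID.
    response = client.query_workspace(
        workspace_id="<workspace-id>",
        query=QUERY,
        timespan=timedelta(hours=1),
    )

    if response.status == LogsQueryStatus.SUCCESS:
        for table in response.tables:
            for row in table.rows:
                print(dict(zip(table.columns, row)))

A sustained average latency measured in minutes rather than seconds would suggest the workspace is still catching up on backlogged data.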