We've confirmed that all systems are back to normal with no customer impact as of 5/03, 17:57 UTC. Our logs show the incident started at 5/03, 17:11 UTC, and that during the 46 minutes it took to resolve, customers experienced intermittent data latency, data gaps, and incorrect alert activation.
Root Cause: We identified that a backend scale unit began processing traffic through an incorrect route after a new configuration was applied as part of a recent deployment. As a result, new requests were not ingested and processed correctly. We removed this scale unit from the traffic route and applied a new configuration to ensure traffic was correctly ingested and processed, which mitigated the issue.
Incident Timeline: 46 minutes - 5/03, 17:11 UTC through 5/03, 17:57 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
Update: Monday, 03 May 2021 17:51 UTC
We continue to investigate issues within Application Insights in the UK South region. The root cause is not fully understood at this time. Some customers continue to experience data latency and potential data gaps in Application Insights data, which could cause delayed or misfired alerts. We are working to establish the start time for the issue; initial findings indicate that the problem began at 5/03, 17:11 UTC. We currently have no estimate for resolution.