Experiencing Latency and Data Loss issue in Azure Portal for Many Data Types - 05/03 - Resolved

Former Employee

May 03, 2021

Final Update: Monday, 03 May 2021 18:23 UTC

We've confirmed that all systems are back to normal with no customer impact as of 5/03, 17:57 UTC. Our logs show that the incident started on 5/03, 17:11 UTC and that, during the 46 minutes it took to resolve the issue, customers experienced intermittent data latency, data gaps, and incorrect alert activation.

Root Cause: We identified that a backend scale unit began processing traffic through an incorrect route after a new configuration was applied as a part of a recent deployment. This caused new requests to not be ingested and processed correctly. We took this scale unit out of the traffic route and applied a new configuration to ensure traffic was correctly ingested and processed, which mitigated the issue.
Incident Timeline: 46 minutes - 5/3, 17:11 UTC through 5/3, 17:57 UTC

We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Ian

Update: Monday, 03 May 2021 17:51 UTC

We continue to investigate issues within Application Insights for South UK. The root cause is not fully understood at this time. Some customers continue to experience data latency and potential data gaps in Application Insights data, which could cause delayed or misfired alerts. We are working to establish the start time for the issue; initial findings indicate that the problem began at 5/03, 17:11 UTC. We currently have no estimate for resolution.

Work Around: none
Next Update: Before 05/03 20:00 UTC

-Ian

Updated May 03, 2021

Version 4.0

Azure-Monitor-Team
