Experiencing Latency and Data Loss issue in Azure Portal for Many Data Types - 05/03 - Resolved

Published 05-03-2021 10:56 AM
Final Update: Monday, 03 May 2021 18:23 UTC

We've confirmed that all systems are back to normal with no customer impact as of 5/03, 17:57 UTC. Our logs show the incident started on 5/03, 17:11 UTC, and that during the 46 minutes it took to resolve the issue, customers experienced intermittent data latency, data gaps, and incorrect alert activation.
  • Root Cause: We identified that a backend scale unit began processing traffic through an incorrect route after a new configuration was applied as part of a recent deployment. This caused new requests to not be ingested and processed correctly. We took this scale unit out of the traffic route and applied a new configuration to ensure traffic was correctly ingested and processed, which mitigated the issue.
  • Incident Timeline: 46 minutes - 5/3, 17:11 UTC through 5/3, 17:57 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Ian

Update: Monday, 03 May 2021 17:51 UTC

We continue to investigate issues within Application Insights for South UK. Root cause is not fully understood at this time. Some customers continue to experience data latency and potential data gaps in Application Insights data, which could cause delayed or misfired alerts. We are working to establish the start time for the issue; initial findings indicate that the problem began at 5/03, 17:11 UTC. We currently have no estimate for resolution.
  • Work Around: none
  • Next Update: Before 05/03 20:00 UTC
-Ian

Last update: May 03 2021 11:29 AM