Final Update: Monday, 03 May 2021 18:23 UTC
We've confirmed that all systems are back to normal with no ongoing customer impact as of 5/3, 17:57 UTC. Our logs show the incident started on 5/3, 17:11 UTC, and that during the 46 minutes it took to resolve the issue, customers experienced intermittent data latency, data gaps, and incorrect alert activation.
- Root Cause: We identified that a backend scale unit began processing traffic through an incorrect route after a new configuration was applied as part of a recent deployment. As a result, new requests were not ingested and processed correctly. We removed this scale unit from the traffic route and applied a new configuration to ensure traffic was correctly ingested and processed, which mitigated the issue.
- Incident Timeline: 46 minutes - 5/3, 17:11 UTC through 5/3, 17:57 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
-Ian