Final Update: Tuesday, 16 February 2021 02:26 UTC

We've confirmed that all systems are back to normal with no customer impact as of 02/16, 02:05 UTC. Our logs show the incident started on 02/16, 00:15 UTC, and that during the 1 hour and 50 minutes it took to resolve the issue, customers experienced intermittent data gaps of up to 10% of data and incorrect alert activation.
  • Root Cause: The failure was due to a specific instance of the service processing backend that became unhealthy.
  • Incident Timeline: 1 Hour & 50 minutes - 02/16, 00:15 UTC through 02/16, 02:05 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
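
For customers who want to verify whether their own telemetry shows a gap during the impact window, counting request telemetry in small time bins and comparing against the surrounding baseline is one way to spot it. Below is a minimal sketch using the azure-monitor-query Python SDK, assuming a workspace-based Application Insights resource; the workspace ID is a placeholder you would replace with your own.

from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

# Placeholder -- replace with your own Log Analytics workspace ID.
WORKSPACE_ID = "<your-workspace-id>"

# Impact window from the final update: 02/16 00:15 UTC through 02:05 UTC.
start = datetime(2021, 2, 16, 0, 15, tzinfo=timezone.utc)
end = datetime(2021, 2, 16, 2, 5, tzinfo=timezone.utc)

client = LogsQueryClient(DefaultAzureCredential())

# Count request telemetry in 5-minute bins; a dip of roughly 10% versus
# neighboring bins is consistent with the data gaps described above.
query = """
AppRequests
| summarize count() by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=(start, end))
for table in response.tables:
    for row in table.rows:
        print(row)

Classic (non-workspace-based) Application Insights resources expose the same data through the requests table instead of AppRequests; the binning approach is the same.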

-Jeff

Update: Tuesday, 16 February 2021 02:12 UTC

Root cause has been isolated to a specific instance of the service processing backend that became unhealthy, which was impacting the ingestion pipeline. To address this issue, we restarted the affected instance and retrieved instance data for analysis.
  • Workaround: None
  • Next Update: Before 02/16 04:30 UTC
-Jeff

Initial Update: Tuesday, 16 February 2021 01:40 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers in Switzerland West may experience intermittent data gaps of up to 10% of data and incorrect alert activation starting at 2021-02-16 00:15 UTC.
  • Next Update: Before 02/16 04:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Jeff