Final Update: Friday, 22 January 2021 00:08 UTC

We've confirmed that all systems are back to normal, with no customer impact, as of 1/21, 23:56 UTC. Our logs show the incident started on 1/21, 21:55 UTC. During the 2 hours and 1 minute it took to resolve the issue, 13% of customers in West US 2 and 28% of customers in West US with workspace-enabled Application Insights resources experienced intermittent data gaps and latent data, as well as possible misfiring of alerts triggered by those gaps or latencies. One way to check for residual ingestion lag is sketched after the list below.
  • Root Cause: A backend cache component reached its capacity threshold. The resource was scaled out to handle the increased load in the region.
  • Incident Timeline: 2 Hours & 1 Minute - 1/21, 21:55 UTC through 1/21, 23:56 UTC
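For customers who want to verify that their telemetry has caught up, the following is a minimal sketch of one way to measure ingestion lag in a workspace-based Application Insights resource, using the azure-monitor-query SDK and the Kusto ingestion_time() function. The workspace ID is a placeholder, and the 6-hour lookback is an assumption chosen to cover the incident window; this is not part of the incident response itself.

```python
# Sketch: measure ingestion lag for workspace-based Application Insights
# telemetry by comparing each event's time to when it was ingested.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

# KQL: compare each request's event time (TimeGenerated) to the time it
# was ingested into the workspace, summarized per hour.
QUERY = """
AppRequests
| extend IngestionLag = ingestion_time() - TimeGenerated
| summarize avg(IngestionLag), max(IngestionLag) by bin(TimeGenerated, 1h)
| order by TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<your-workspace-id>",  # placeholder, not a real ID
    query=QUERY,
    timespan=timedelta(hours=6),  # assumed lookback over the incident window
)

# Print one row per hourly bin: bin start, average lag, maximum lag.
for table in response.tables:
    for row in table.rows:
        print(row)
```

Sustained lag of more than a few minutes, or hourly bins with no rows at all, would suggest the data gaps and latency described above.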
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Jeff

Update: Thursday, 21 January 2021 22:59 UTC

Root cause has been isolated to a backend component scale issue impacting customers with workspace-enabled Application Insights resources in the West US 2 and West US regions. To address this issue, we are investigating scaling options for the backend components.
  • Workaround: None
  • Next Update: Before 01/22 01:00 UTC
-Jeff

Initial Update: Thursday, 21 January 2021 22:33 UTC

We are aware of issues within Application Insights and are actively investigating. Some customers may experience delayed or missed Log Search Alerts, as well as data latency and data loss.
  • Next Update: Before 01/22 01:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Jeff