Over the last couple of days we have received many questions asking for more specifics about why this incident affected more than just the South Central region, which was impacted by the datacenter cooling outage on September 4th. You can read more about this here:
Application Insights resources located in the South Central US data center, plus some resources in the East US data center, were most impacted. These resources could not be managed for the duration of the initial incident.
However, all Application Insights resources across regions experienced some impact during this incident. This was caused by impact to non-regional services such as Azure Active Directory and Azure Resource Manager, and to internal components that provide the data routing capability used by other regional components. The result was global impact to Application Insights: customers were unable to query data, saw significant delays in ingestion, and were unable to update and manage some types of resources, such as Availability Tests. This was not a result of customer data being stored in the South Central data center; customer data stored within Application Insights resides in the geography it is sent to, as described here: https://docs.microsoft.com/en-us/azure/application-insights/app-insights-data-retention-priva...
Recovery from this incident took longer than usual because of continued authentication issues and scaling issues. Application Insights ingestion occurs at the closest ingestion endpoint. This ingestion continued across all regions during the outage, but due to the issues described above, the data could not be routed to its regional storage location. This resulted in a backlog of data which needed to be cleared before new data could be persisted and made available to query. The impact of this latency in data ingestion surfaced in many ways, including gaps in data as seen in the Azure portal, Log Search alerts firing on data that was ingested late, latency in reporting billing data to Azure commerce, and delays in seeing the results of Availability Tests in the Azure portal.
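To illustrate why a backlog delays even newly arriving data, here is a toy sketch of a first-in, first-out queue whose routing step is blocked for a while. It is only an illustration under simplified assumptions and is not the actual Application Insights pipeline; the function name `simulate_backlog` and all of its parameters are hypothetical.

```python
from collections import deque


def simulate_backlog(arrivals_per_tick, routing_capacity_per_tick,
                     outage_ticks, total_ticks):
    """Toy FIFO model (hypothetical, not the real service).

    Items are accepted every tick, but routing to storage is blocked
    for the first `outage_ticks` ticks. Returns a list of
    (arrival_tick, delay_in_ticks) pairs and the number of items
    still queued at the end.
    """
    queue = deque()
    delays = []
    for tick in range(total_ticks):
        # Ingestion keeps accepting data, even during the outage.
        for _ in range(arrivals_per_tick):
            queue.append(tick)
        # Routing resumes only after the outage, oldest items first.
        if tick >= outage_ticks:
            for _ in range(min(routing_capacity_per_tick, len(queue))):
                arrived = queue.popleft()
                delays.append((arrived, tick - arrived))
    return delays, len(queue)


if __name__ == "__main__":
    delays, remaining = simulate_backlog(
        arrivals_per_tick=1, routing_capacity_per_tick=2,
        outage_ticks=3, total_ticks=8)
    for arrived, delay in delays:
        print(f"item from tick {arrived} became queryable after {delay} tick(s)")
    print(f"items still queued: {remaining}")
```

In this simplified model, items that arrive shortly after routing resumes still wait behind the older backlog, and delays only return to zero once the queue has drained.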
For historical reasons, Application Insights status is posted on this Application Insights Service Blog. We are working to retire this blog and to post all new service status on the Azure Service status page in the future. We understand that Application Insights is an important service for many of you, and we apologize again for the impact this incident caused. We are continuing to invest in improvements to the resiliency of the service so that future incidents with regional impact do not affect resources in other data centers.