Experiencing Data Latency for Log Analytics - 06/20 - Resolved
Published Jun 20 2022 01:50 PM
Final Update: Thursday, 23 June 2022 19:27 UTC

We've confirmed that all systems are back to normal with no customer impact as of 06/23, 17:00 UTC. Our logs show that the incident started on 06/20, 16:00 UTC. During the time it took to resolve the issue, most customers in West Europe and some customers in UK South using Azure Log Analytics or Azure Sentinel may have experienced issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel. Due to the impact to Log Analytics, customers using Azure Automation to delete or link Automation accounts to workspaces would have been impacted. Customers using the Azure Automation Update Management solution may have seen their machines in a 'Not Assessed' state and their Automation account not linked to Log Analytics. Customers using the Automation Change Tracking solution may not have seen their data updated in Log Analytics workspaces.
  • Root Cause: A backend service that Azure Log Analytics and Azure Sentinel use became unhealthy due to a large volume of incoming requests. This was caused by the service automatically scaling down, which resulted in a backlog of retries. While the retry backlog was being worked through, a subscription quota was reached, and the process was unable to clear the backlog.
  • Mitigation: After several mitigation workstream attempts, we mitigated the issue by rolling back to the last known healthy state, applying a scale-out operation, and draining the backlogged requests.
  • Incident Timeline: 06/20, 16:00 UTC through 06/23, 17:00 UTC
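The root cause above describes a retry backlog building up after a scale-down. The following is a minimal, illustrative sketch of that dynamic, not a model of the actual service: the function name `simulate_backlog`, the arrival rate, and the capacity schedule are all hypothetical, and the sketch ignores the subscription quota.

```python
def simulate_backlog(arrivals_per_tick, capacity_schedule):
    """Return the backlog size after each tick.

    Hypothetical model: requests that cannot be served in a tick are
    retried in the next tick, so they add to the pending work.
    """
    backlog = 0
    history = []
    for capacity in capacity_schedule:
        pending = backlog + arrivals_per_tick   # new requests plus retries
        served = min(pending, capacity)         # cannot serve more than capacity
        backlog = pending - served              # leftover work is retried later
        history.append(backlog)
    return history


# Illustrative numbers only: healthy capacity, a scale-down, then a scale-out.
schedule = [100] * 3 + [60] * 3 + [100] * 4
print(simulate_backlog(80, schedule))
```

In this toy schedule the backlog grows for as long as capacity stays below the arrival rate, and it only drains once capacity exceeds the arrival rate again, which is why a scale-out is paired with draining the backlog.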
We understand that customers rely on Azure Log Analytics as a critical service, and we sincerely apologize for any impact this incident caused.

-Arish Balasubramani

Update: Thursday, 23 June 2022 13:17 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We have applied a series of mitigation steps and are continuing to monitor service health to validate mitigation in West Europe. Additional engineering teams are being engaged to verify whether additional mitigation workstreams are needed and whether the issue is mitigated for UK South. The next update will be provided in 4 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.

Due to the impact to Log Analytics, customers using Azure Automation to delete or link Automation accounts to workspaces will be impacted. Customers using the Azure Automation Update Management solution may see their machines in a 'Not Assessed' state and their Automation account not linked to Log Analytics. Customers using the Automation Change Tracking solution may not see their data updated in their Log Analytics workspace.

We have entered the final phase of recovery: most data types have been recovered and customers will see signs of recovery. We are continuing to clear a backlog for some data types, such as IIS Logs and Custom Logs. Customers querying these data types may see some latency, and query results may not reflect the latest information.
  • Work Around: None
  • Next Update: Before 06/23 17:30 UTC
-Deepika

Update: Thursday, 23 June 2022 07:12 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We have applied a series of mitigation steps and are continuing to monitor service health to validate mitigation in West Europe. Additional engineering teams are being engaged to verify whether additional mitigation workstreams are needed and whether the issue is mitigated for UK South. The next update will be provided in 6 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.

Due to the impact to Log Analytics, customers using Azure Automation to delete or link Automation accounts to workspaces will be impacted. Customers using the Azure Automation Update Management solution may see their machines in a 'Not Assessed' state and their Automation account not linked to Log Analytics. Customers using the Automation Change Tracking solution may not see their data updated in their Log Analytics workspace.

Our mitigation steps were successfully applied and backend services are returning to a healthy state. We are currently catching up on ingestion backlogs.

We have completed recovery for most data types. We expect to finish recovery for some of the remaining data types by 12:30 UTC on 23 Jun 2022.

  • Work Around: None
  • Next Update: Before 06/23 19:30 UTC
-Anmol

Update: Wednesday, 22 June 2022 22:47 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We have applied a series of mitigation steps and are continuing to monitor service health to validate mitigation in West Europe. Additional engineering teams are being engaged to verify whether additional mitigation workstreams are needed and whether the issue is mitigated for UK South. The next update will be provided in 6 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.

Due to the impact to Log Analytics, customers using Azure Automation to delete or link Automation accounts to workspaces will be impacted. Customers using the Azure Automation Update Management solution may see their machines in a 'Not Assessed' state and their Automation account not linked to Log Analytics. Customers using the Automation Change Tracking solution may not see their data updated in their Log Analytics workspace.

Our mitigation steps were successfully applied and backend services are returning to a healthy state. We are currently catching up on ingestion backlogs.

  • Work Around:
  • Next Update: Before 06/23 05:00 UTC
-Arish Balasubramani

Update: Wednesday, 22 June 2022 11:34 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We have applied a series of mitigation steps and are continuing to monitor service health to validate mitigation in West Europe. Additional engineering teams are being engaged to verify whether additional mitigation workstreams are needed and whether the issue is mitigated for UK South. The next update will be provided in 6 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.

Due to the impact to Log Analytics, customers using Azure Automation to delete or link Automation accounts to workspaces will be impacted. Customers using the Azure Automation Update Management solution may see their machines in a 'Not Assessed' state and their Automation account not linked to Log Analytics. Customers using the Automation Change Tracking solution may not see their data updated in their Log Analytics workspace.
  • Work Around: None
  • Next Update: Before 06/22 18:00 UTC
-Sai Kumar

Update: Wednesday, 22 June 2022 06:29 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We are currently monitoring service health to confirm mitigation as we see signs of recovery for West Europe, and mitigation steps are in progress for UK South. The next update will be provided in 5 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.
  • Work Around: None
  • Next Update: Before 06/22 11:30 UTC
-Deepika

Update: Wednesday, 22 June 2022 02:55 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we have rolled out a configuration fix, paused and restarted the responsible backend services, and scaled out backend services to catch up on latency. We are currently monitoring service health to confirm mitigation as we see signs of recovery for West Europe, and mitigation steps are in progress for UK South. The next update will be provided in 3 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.
  • Work Around: NA
  • Next Update: Before 06/22 06:00 UTC
-Arish Balasubramani

Update: Tuesday, 21 June 2022 21:56 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we are rolling out a fix that helps with the load and have paused a responsible backend service to stop it from creating more load. The next update will be provided in 3 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.
  • Work Around: NA
  • Next Update: Before 06/22 02:00 UTC
-Arish Balasubramani

Update: Tuesday, 21 June 2022 15:10 UTC

Root cause has been isolated to high load on a backend component, which was impacting Log Analytics. To address this issue, we are rolling out a fix that helps with the load. The next update will be provided in 3 hours or as events warrant. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel.
  • Work Around: NA
  • Next Update: Before 06/21 18:30 UTC
-Deepika

Update: Tuesday, 21 June 2022 09:02 UTC

We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel. Our initial findings indicate that the problem began at 06/20 16:00 UTC. We currently have no estimate for resolution. Adding core capacity to West Europe (WEU) to allow the service to scale out did not help.
  • Work Around: NA
  • Next Update: Before 06/21 14:30 UTC
-Deepika

Update: Tuesday, 21 June 2022 05:25 UTC

We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel. Our initial findings indicate that the problem began at 06/20 16:00 UTC. We currently have no estimate for resolution. Adding core capacity to West Europe (WEU) to allow the service to scale out did not help.
  • Work Around: NA
  • Next Update: Before 06/21 08:30 UTC
-Deepika

Update: Monday, 20 June 2022 20:40 UTC

We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Some customers may continue to experience issues with data access, missed or delayed log search alerts, data gaps, data latency, and issues with Azure Sentinel. Our initial findings indicate that the problem began at 06/20 17:15 UTC. We currently have no estimate for resolution. Adding core capacity to West Europe (WEU) to allow the service to scale out did not help.
  • Work Around: NA
  • Next Update: Before 06/20 23:00 UTC
-Arish Balasubramani

Last update: Jun 23 2022 12:38 PM