Experiencing intermittent data gaps issue for Log Analytics - 08/01 - Resolved
Published Jul 31 2019 06:37 PM 1,277 Views
Final Update: Thursday, 01 August 2019 16:36 UTC

We've confirmed that all systems are back to normal as of 08/01, 14:45 UTC. Our logs show the incident started on 07/31, 19:20 UTC and that during the 19 hours & 05 minutes that it took to resolve the issue some customers workspaces would not have received correct updates for solution targeting feature. Service is fully backup and processing the backlog. We estimate next 4-6 hours for backlog to be completely processed and some customers may continue to see residual impact until the backlog is fully caught up.

  • Root Cause: The failure was due to a scheduling service within Log Analytics feature which experienced performance degradation.
  • Lessons Learned: We will be performing full postmortem of this issue and will be taking up steps to alleviate any further reliability issues.
  • Incident Timeline: 19 Hours & 05 minutes - 07/31, 19:20 UTC through 08/01, 14:45 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.

-Naresh

Final Update: Thursday, 01 August 2019 16:28 UTC

We've confirmed that all systems are back to normal as of 08/01, 14:45 UTC. Our logs show the incident started on 07/31, 19:20 UTC and that during the 19 hours & 05 minutes that it took to resolve the issue some customers workspaces would not have received correct updates for solution targeting feature. Service if fully backup and processing the backlog. We estimate next 4-6 hours for backlog to be completely processed and some customers may continue to see residual impact until the backlog is fully caught up.
  • Root Cause: The failure was due to a scheduling service within Log Analytics feature which experienced performance degradation.
  • Lessons Learned: We will be performing full postmortem of this issue and will be taking up steps to alleviate any further reliability issues.
  • Incident Timeline: 19 Hours & 05 minutes - 07/31, 19:20 UTC through 08/01, 14:45 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.

-Naresh

Final Update: Thursday, 01 August 2019 16:10 UTC

We've confirmed that all systems are back to normal as of 08/01, 02:46 UTC. Our logs show the incident started on 07/31, 19:20 UTC and that during the 7 hours & 16 minutes that it took to resolve the issue some customers workspaces would not have received correct updates for solution targeting feature. Service if fully backup and processing the backlog. We estimate next 4-6 hours for backlog to be completely processed and some customers may continue to see residual impact until the backlog is fully caught up.
  • Root Cause: The failure was due to a scheduling service within Log Analytics feature which experienced performance degradation.
  • Lessons Learned: We will be performing full postmortem of this issue and will be taking up steps to alleviate any further reliability issues.
  • Incident Timeline: 7 Hours & 16 minutes - 07/31, 19:20 UTC through 08/01, 02:46 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.

-Naresh

Update: Thursday, 01 August 2019 14:52 UTC

Root cause has been isolated to Solution Targeting feature which was impacting Log Analytics. To address this issue we rolled back to previous version of service. Solution Targeting feature is now working as expected. Some customers may experience intermittent data gaps related to solution specific data and we estimate 2hours before all data gaps is addressed.

  • Work Around: Rollback to previous version of the back-end service.
  • Next Update: Before 08/01 17:00 UTC
-Naresh

Update: Thursday, 01 August 2019 12:48 UTC

We continue to investigate issues within Log Analytics. Customers using Solution Targeting feature within Log Analytics in East US might continue to see intermittent data gaps related to solution specific data. We are working to establish the start time for the issue, initial findings indicate that the problem began at 07/31 ~19:20 UTC. 

  • Work Around: None
  • Next Update: Before 08/01 15:00 UTC
-Naresh

Update: Thursday, 01 August 2019 10:18 UTC

We continue to investigate issues within Log Analytics. Customers using Solution Targeting feature within Log Analytics in East US might continue to see intermittent data gaps related to solution specific data. We are working to establish the start time for the issue, initial findings indicate that the problem began at 07/31 ~19:20 UTC. 

  • Work Around: None
  • Next Update: Before 08/01 12:30 UTC
-Naresh

Update: Thursday, 01 August 2019 07:37 UTC

We continue to investigate issues within Log Analytics. Customers using Solution Targeting feature within Log Analytics in East US might continue to see intermittent data gaps related to solution specific data. Service is back to normal in Central Canada region as of  08/01 04:00 UTC. 
  • Work Around: None
  • Next Update: Before 08/01 10:00 UTC
-Naresh

Update: Thursday, 01 August 2019 07:20 UTC

We continue to investigate issues within Log Analytics. Customers using Solution Targeting feature within Log Analytics in East US and Central Canada regions might continue to see intermittent data gaps related to solution specific data. We are working to establish the start time for the issue, initial findings indicate that the problem began at 07/31 ~19:20 UTC. 

  • Work Around: None
  • Next Update: Before 08/01 09:30 UTC
-Naresh

Update: Thursday, 01 August 2019 04:54 UTC

We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Customers using Solution Targeting feature within Log Analytics in East US and Central Canada regions might continue to see intermittent data gaps related to solution specific data. We are working to establish the start time for the issue, initial findings indicate that the problem began at 07/27 ~06:00 UTC. We currently have no estimate for resolution.

  • Work Around: none
  • Next Update: Before 08/01 07:00 UTC
-Naresh

Update: Thursday, 01 August 2019 03:17 UTC

We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Customers using Solution Targeting feature within Log Analytics in East US and Central Canada regions might continue to see intermittent data gaps related to solution specific data. We currently have no estimate for resolution.

https://docs.microsoft.com/en-us/azure/azure-monitor/insights/solution-targeting 

  • Work Around: none

  • Next Update: Before 08/01 05:30 UTC

-Sindhu   

 


Initial Update: Thursday, 01 August 2019 01:29 UTC

We are aware of issues within Log Analytics and are actively investigating. Customers using Solution Targeting feature within Log Analytics in East US and Central Canada regions might see intermittent data gaps related to solution specific data.

  • Next Update: Before 08/01 03:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Sindhu

Version history
Last update:
‎Aug 01 2019 09:42 AM
Updated by: