We've confirmed that all systems were back to normal with no customer impact as of 04/18 ~00:30 UTC. Our logs show that the incident started on 04/17 at ~12:17 UTC. During the ~12 hours it took to resolve the issue, EUS customers may have experienced issues managing their existing OMS alerts from the OMS and Ibiza portals for their workspaces.
Root Cause: The failure was due to a configuration change in our backend.
Lessons Learned: These will be reviewed and listed as part of our detailed RCA.
Incident Timeline: 12 hours & 13 minutes - 04/17 12:17 UTC through 04/18 00:30 UTC
We understand that customers rely on Azure Log Analytics as a critical service and apologize for any impact this incident caused.
Update: Tuesday, 17 April 2018 23:32 UTC
The root cause has been isolated to a configuration issue in our alerting backend, which was impacting customers' management of existing alerts in the OMS and Ibiza portals. We are mitigating the issue region by region by fixing the backend configuration wherever customers are impacted. EUS customers should now see their existing alerts in the UX again. We currently estimate that the remaining impacted regions will be fixed within ~2 hours.
Work Around: Customers can use Azure Resource Manager. For more information, please see the following:
We continue to investigate issues within Log Analytics. Root cause is not fully understood at this time. Some customers continue to experience issues managing their OMS alerts from the OMS and Ibiza portals. The alert monitoring pipeline continues to work, and users can still create new alerts. Initial findings indicate that the problem began at 04/17 ~12:17 UTC. We currently have no estimate for resolution.
Work Around: Customers can use Azure Resource Manager. For more information, please see the following:
We are aware of issues within Log Analytics and are actively investigating. Some customers may experience issues accessing OMS alerts from the OMS and Ibiza portals. Creation of new alerts is not impacted at this time.
Work Around: None
Next Update: Before 04/17 17:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.