Final Update: Monday, 20 January 2020 04:19 UTC
We've confirmed that all systems are back to normal with no customer impact as of 1/19, 19:00 UTC. Our logs show the incident started on 1/16, 14:00 UTC and that, during the 3 days, 5 hours it took to resolve the issue, a very small percentage of customers experienced alerting failures for the SCOM, ZABBIX, and NAGIOS data types in the following regions: UK South, Australia South East, Japan East, Central India, South East Asia, West Central US, Canada Central, West US 2.
- Root Cause: The failure was due to a code regression in the most recent deployment.
- Incident Timeline: 3 days, 5 hours - 1/16, 14:00 UTC through 1/19, 19:00 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
-Jeff
Update: Monday, 20 January 2020 02:34 UTC
Root cause has been isolated to an issue with the latest deployed update, which was impacting the alert types SCOM, ZABBIX, and NAGIOS. To address this issue, a hotfix is still being deployed. Some customers may experience alerting failures for the mentioned alert types.
- Work Around: None
- Next Update: Before 01/20 15:00 UTC
-Jeff
Update: Sunday, 19 January 2020 14:06 UTC
Root cause has been isolated to an issue with the latest deployed update, which was impacting the alert types SCOM, ZABBIX, and NAGIOS. To address this issue, a hotfix is being deployed. Some customers may experience alerting failures for the mentioned alert types.
- Work Around: None
- Next Update: Before 01/20 02:30 UTC
-Monitors
Initial Update: Sunday, 19 January 2020 10:55 UTC
We are aware of issues within Azure Monitor and are actively investigating. Some customers may experience alerting failures for the alert types SCOM, ZABBIX, and NAGIOS.
- Work Around: None
- Next Update: Before 01/19 15:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Monish