Experiencing Alerting failure for Metric Alerts - 01/19 - Investigating
Final Update: Monday, 20 January 2020 04:19 UTC

We've confirmed that all systems are back to normal with no customer impact as of 1/19, 19:00 UTC. Our logs show that the incident started on 1/16, 14:00 UTC, and that during the 3 days, 5 hours it took to resolve the issue, a very small percentage of customers experienced alerting failures for the SCOM, ZABBIX, and NAGIOS data types in the following regions: UK South, Australia Southeast, Japan East, Central India, Southeast Asia, West Central US, Canada Central, West US 2.
  • Root Cause: The failure was due to a code regression in the most recent deployment.
  • Incident Timeline: 3 days, 5 hours - 1/16, 14:00 UTC through 1/19, 19:00 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.

-Jeff

Update: Monday, 20 January 2020 02:34 UTC

The root cause has been isolated to an issue with the latest deployed update, which was impacting the SCOM, ZABBIX, and NAGIOS alert types. To address this issue, a hotfix is still being deployed. Some customers may experience alerting failures for the mentioned alert types.
  • Work Around: None
  • Next Update: Before 01/20 15:00 UTC
-Jeff

Update: Sunday, 19 January 2020 14:06 UTC

The root cause has been isolated to an issue with the latest deployed update, which was impacting the SCOM, ZABBIX, and NAGIOS alert types. To address this issue, a hotfix is being deployed. Some customers may experience alerting failures for the mentioned alert types.
  • Work Around: None
  • Next Update: Before 01/20 02:30 UTC
-Monitors

Initial Update: Sunday, 19 January 2020 10:55 UTC

We are aware of issues within Azure Monitor and are actively investigating. Some customers may experience alerting failures for the SCOM, ZABBIX, and NAGIOS alert types.
  • Work Around: None
  • Next Update: Before 01/19 15:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Monish