Final Update: Sunday, 13 December 2020 20:02 UTC
We've confirmed that all systems are back to normal with no customer impact as of 12/13, 19:40 UTC. Our logs show the incident started on 11/20, 02:40 UTC and that during the duration of ~23 days that it took to resolve the issue some customers experienced alerting failures. For new alerts that fired post 12/13 19:40 UTC, these are accurately reflecting the state of the resource health. For alerts in Fired/Resolved status between 11/20 02:40 UTC and 12/13 19:40 UTC, please depend on notifications that are setup through Action groups for accurate status of resource health.
-
Root Cause: Root cause has been isolated to data from older version of code which was impacting Metric Alerts in Azure Portal.
-
Incident Timeline: ~23 days - 11/20, 02:40 UTC through 12/13, 19:40 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
-Anupama