We've confirmed that all systems are back to normal with no customer impact as of 06/01, 05:30 UTC. Our logs show the incident started on 05/19, 23:00 UTC and that during the 12 days 6 hours & 30 minutes that it took to resolve the issue. Some of the customers might have experienced viewing two alerts for notification. When a resource is unhealthy, two notifications are being raised and similarly for alert resolution as well.
An additional symptom in some instances is that one or both of the duplicate alerts would not have been resolved.
Root Cause: The failure was due to one of the backend dependency.
Incident Timeline: 12 days 6 Hours & 30 minutes - 05/19, 23:00 UTC through 06/01, 05:30 UTC.
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
Update: Tuesday, 31 May 2022 23:28 UTC
Root cause has been isolated to a bad deployment. To address this issue we are in the process of rolling back. As a customer of Metric Alerts you could be experiencing viewing two alerts for unhealthy resources. When a resource is unhealthy, two notifications are being raised and similarly for alert resolution, two notifications are being sent