Final Update: Thursday, 21 May 2020 01:30 UTC
We've confirmed that all systems are back to normal with no customer impact as of 05/21, 01:00 UTC. Our logs show the incident started on 05/20, 20:15 UTC and that during the 4 hours 45 mins that it took to resolve the issue customers experienced failures when trying to create new rules are modify existing rules in East US and South Central US regions.
-
Root Cause: The failure was due to mismatch in configuration that induced conflicts, in one our dependent services.
- Incident Timeline: 4 Hours & 45 minutes - 05/20, 20:15 UTC through 05/21, 01:00 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
-chandar