Final Update: Tuesday, 19 November 2019 08:06 UTC
We've confirmed that all systems are back to normal with no customer impact as of 11/19, 01:13 UTC. Our logs show the incident started on 11/19, 17:00 UTC and that during the 8 hours and 13 minutes that it took to resolve the issue customers may have experienced higher than expected latency or failures regarding metric alerts during the impact window in East US region.
-
Root Cause: The failure was due to capacity issue with one of our dependent service which created backlog.
-
Incident Timeline: 8 Hours & 13 minutes - 11/19, 17:00 UTC through 11/19, 01:13 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
-Rama