We've confirmed that all systems are back to normal with no customer impact as of 7/18, 00:15 UTC. Our logs show the incident started on 7/17, 20:31 UTC and that during the 3 hours and 44 minutes that it took to resolve the issue customers experienced delayed metric ingestion. Metric Alert rule evaluation would also have been paused to avoid false alerts during this time. .
Root Cause: During the window of impact a sudden spike in requests was observed and an operational threshold was reached causing intermittent metric loss.
Incident Timeline: 3 Hours & 44 minutes - 7/17, 20:31 UTC through 7/18, 00:15 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.
Initial Update: Tuesday, 17 May 2022 23:07 UTC
We are aware of issues within Metric Dashboards
and Metric Alerts and are actively investigating. Some customers may experience
delayed metric ingestion. Metric Alert evaluation may also be paused to avoid
false alerts during this time.
Work Around: none
Next Update: Before 05/18 01:30 UTC
working hard to resolve this issue and apologize for any inconvenience. -Ian