Final Update: Friday, 25 October 2019 16:43 UTC
We've confirmed that all systems are back to normal with no customer impact as of 10/25, 16:42 UTC. Our logs show the incident started on 10/25, 12:45 UTC, and that during the 3 hours and 57 minutes it took to resolve the issue, some customers experienced latency in their alerts in the East US region.
- Root Cause: The failure was due to a spike in alerts that created a backlog; a backend system then experienced a failure during scale-out.
- Incident Timeline: 3 Hours & 57 Minutes - 10/25, 12:45 UTC through 10/25, 16:42 UTC
We understand that customers rely on Log Search Alerts as a critical service and apologize for any impact this incident caused.
-Jack Cantwell