We've confirmed that all systems returned to normal with no ongoing customer impact as of 8/9, 22:54 UTC. Our logs show the incident began on 8/9, 18:59 UTC, and that during the 3 hours and 55 minutes it took to resolve, 1062 subscriptions (~16%) experienced delayed alerts that fired after the issue was mitigated.
Root Cause: A burst of service health activity log events increased alert evaluation latency. Scaling out the service mitigated the issue.
Incident Timeline: 3 hours and 55 minutes - 8/9, 18:59 UTC through 8/9, 22:54 UTC
We understand that customers rely on Log Search Alerts as a critical service and apologize for any impact this incident caused.