Final Update: Friday, 09 August 2019 23:07 UTC

We've confirmed that all systems are back to normal, with no customer impact, as of 8/9, 22:54 UTC. Our logs show the incident started on 8/9, 18:59 UTC, and that during the 3 hours and 55 minutes it took to resolve the issue, 1,062 subscriptions (~16%) experienced delayed alerts that fired after the issue was mitigated.
  • Root Cause: The failure was caused by a burst of service health activity log events, which increased the evaluation latency of those events. Scaling out the service mitigated the issue.
  • Incident Timeline: 3 hours & 55 minutes - 8/9, 18:59 UTC through 8/9, 22:54 UTC
We understand that customers rely on Log Search Alerts as a critical service and apologize for any impact this incident caused.
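For readers who want to verify whether their own alerts were among those delayed, a minimal sketch along the following lines could query the impact window. It assumes the Python azure-monitor-query SDK and that alert history is collected in the workspace's Alert table; the workspace ID and table assumption are illustrative, not confirmed by this post.

```python
# Minimal sketch (illustrative, not from the original post): list alerts that
# fired during or shortly after the impact window to spot delayed firings.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<your-log-analytics-workspace-id>"  # hypothetical placeholder

# Impact window from the post (8/9, 18:59-22:54 UTC), extended an hour to
# catch alerts that fired late, after mitigation.
start = datetime(2019, 8, 9, 18, 59, tzinfo=timezone.utc)
end = datetime(2019, 8, 9, 23, 54, tzinfo=timezone.utc)

# Kusto query over the Alert table (assumes alert history is collected there).
query = """
Alert
| project TimeGenerated, AlertName, AlertSeverity
| order by TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=(start, end))

for table in response.tables:
    for row in table.rows:
        print(row)
```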

-Ian