Experiencing Alerting failure for Metric Alerts - 04/11 - Resolved

Published 04-13-2021 09:51 AM
Final Update: Sunday, 11 April 2021 11:24 UTC

We've confirmed that all systems are back to normal with no customer impact as of 04/08, 13:45 UTC. Our logs show that the incident started on 03/31, 15:45 UTC, and that during the 7 days and 22 hours it took to resolve the issue, some customers may have experienced misfired alerts when using Azure Metric Alert Rules on Log Analytics resources in the West Europe region.
  • Root Cause: We determined that a backend service responsible for processing alerts became unhealthy due to a configuration issue.
  • Incident Timeline: 7 Days & 22 Hours - 03/31, 15:45 UTC through 04/08, 13:45 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.

-Madhav

Last update: Apr 11 2021 04:39 AM