Experiencing Metric Alert failures in Azure Monitor - 07/31 - Resolved
Final Update: Wednesday, 31 July 2019 05:01 UTC

We've confirmed that all systems are back to normal with no customer impact as of 07/31, 03:15 UTC. Our logs show the incident started on 07/31, 01:00 UTC, and that during the 2 hours & 15 minutes it took to resolve the issue, some customers in West Europe may not have received some metric alert notifications.
  • Root Cause: The failure occurred because one of the back-end services became unhealthy, impacting query performance and causing some metrics to be delayed. Because the correct metrics were not flowing in for evaluation, some alerts were not generated.
  • Incident Timeline: 2 hours & 15 minutes - 07/31, 01:00 UTC through 07/31, 03:15 UTC
We understand that customers rely on Azure Monitor as a critical service and apologize for any impact this incident caused.

-Naresh