Experiencing Alerting failure for Metric Alerts - 12/13 - Resolved
Final Update: Sunday, 13 December 2020 20:02 UTC

We've confirmed that all systems are back to normal with no customer impact as of 12/13, 19:40 UTC. Our logs show that the incident started on 11/20, 02:40 UTC, and that during the ~23 days it took to resolve the issue, some customers experienced alerting failures. Alerts that fired after 12/13 19:40 UTC accurately reflect the state of resource health. For alerts in Fired/Resolved status between 11/20 02:40 UTC and 12/13 19:40 UTC, please rely on the notifications set up through Action groups for the accurate status of resource health.
  • Root Cause: The root cause has been isolated to data from an older version of code, which was impacting Metric Alerts in the Azure Portal.
  • Incident Timeline: ~23 days - 11/20, 02:40 UTC through 12/13, 19:40 UTC
We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.

-Anupama

Update: Sunday, 13 December 2020 18:35 UTC

The root cause has been isolated to data from an old version of code, which was impacting Metric Alerts in the Azure Portal. Mitigation is complete on most instances and is in progress for the last few instances.
  • Work Around: None
  • Next Update: Before 12/13 21:00 UTC
-Anupama

Update: Sunday, 13 December 2020 15:06 UTC

The root cause has been isolated to data from an old version of code, which was impacting Metric Alerts in the Azure Portal. To address this issue, we have rolled out a hotfix deployment. Some customers may continue to experience alerting failures.
  • Work Around: None
  • Next Update: Before 12/13 18:30 UTC

We are working hard to resolve this issue and apologize for any inconvenience.

-Sandeep
Initial Update: Sunday, 13 December 2020 11:18 UTC

We are aware of issues with Metric Alerts and are actively investigating. Some customers may see alerts still active in the Azure Portal even though they have received a resolved notification. Our logs show that the incident started on 12/07, ~15:00 UTC.
  • Work Around: None
  • Next Update: Before 12/13 15:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Sandeep