Experiencing Alerting failure issue in Azure Portal for Many Data Types - 03/28 - Resolved

Published Mar 28 2019 10:07 AM
Final Update: Thursday, 28 March 2019 17:05 UTC

We've confirmed that all systems are back to normal with no ongoing customer impact as of 03/28, 16:30 UTC. Our telemetry shows the incident started on 03/28, 13:15 UTC, and that during the 3 hours 15 minutes it took to resolve the issue, all customers using classic alerts under Application Insights would not have experienced alert state changes. This means alerts would not have fired for unhealthy conditions, and alerts already in a healthy or unhealthy state would not have resolved.
  • Root Cause: The failure was due to an incorrect configuration value in one of the dependent services in the alerting pipeline. We are continuing to investigate internally to establish the final root cause of the issue.
  • Incident Timeline: 3 hours & 15 minutes - 03/28, 13:15 UTC through 03/28, 16:30 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Anupama
