Experiencing availability testing failures in Azure Portal - 03/10 - Resolved

Final Update: Tuesday, 07 April 2020 21:53 UTC

We've confirmed that all systems are back to normal with no customer impact as of 3/12, 17:10 UTC. Our logs show the incident started on 3/10, 21:00 UTC. We have revised our earlier conclusion that customers were not impacted: during the 1 day, 20 hours, 10 minutes that it took to resolve the issue, a small percentage of customers experienced intermittently latent, duplicated, or missing availability test results.

  • Root Cause: The failure was due to human error during a manual configuration of the service to facilitate service scale-out.
  • Lessons Learned: In addition to SDK updates, we are implementing several items: ingestion code changes to better handle errors in this situation, updates to the process for this procedure, and new logging for better customer identification and faster detection.
  • Incident Timeline: 1 Day, 20 Hours & 10 Minutes - 3/10, 21:00 UTC through 3/12, 17:10 UTC
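
The duration stated in the timeline can be checked with a short calculation. The following is a minimal sketch in Python, assuming the year 2020 from the update headers; `format_duration` is a hypothetical helper name used only for illustration.

```python
from datetime import datetime, timezone

# Start and end times from the incident timeline.
# Assumption: the year is 2020, as given in the update headers.
start = datetime(2020, 3, 10, 21, 0, tzinfo=timezone.utc)
end = datetime(2020, 3, 12, 17, 10, tzinfo=timezone.utc)


def format_duration(start: datetime, end: datetime) -> str:
    """Return the elapsed time between two datetimes as days, hours, minutes."""
    delta = end - start
    # timedelta stores whole days separately from the remaining seconds.
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days} day(s), {hours} hour(s), {minutes} minute(s)"


duration_text = format_duration(start, end)
print(duration_text)
```

The result agrees with the 1 day, 20 hours, 10 minutes reported above.
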
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.

-Jeff

Final Update: Tuesday, 10 March 2020 22:48 UTC

We've confirmed that there was no customer impact from this incident. Thank you for your patience.

-Jeff

Initial Update: Tuesday, 10 March 2020 21:38 UTC

We are aware of issues within Application Insights availability tests and are actively investigating. Some customers may experience availability test failures.
  • Next Update: Before 03/11 00:00 UTC
We are working hard to resolve this issue and apologize for any inconvenience.
-Jeff