Final Update: Thursday, 28 May 2020 22:08 UTC
We've confirmed that all systems are back to normal with no customer impact as of 5/28, 21:56 UTC. Our logs show that the incident started on 5/28, 19:24 UTC and that, during the roughly 2 1/2 hours it took to resolve the issue, fewer than 1% of customers experienced delayed or missing alerts.
- Root Cause: The failure was due to an outage in a backend data ingestion system and was complicated by a watchdog system that was reporting errors incorrectly.
- Incident Timeline: 2 hours and 32 minutes - 5/28, 19:24 UTC through 5/28, 21:56 UTC
-Jack
Update: Thursday, 28 May 2020 21:53 UTC
Root cause has been isolated to a failure in a backend system caused by a watchdog falsely reporting a problem and blocking legitimate traffic. To address this issue, we are repairing the faulty watchdog. Some customers may continue to experience delayed or missing metric alerts, and we estimate it will take about an hour before all delayed and missing alerts have ceased.
- Work Around: none
- Next Update: Before 05/28 23:00 UTC
Initial Update: Thursday, 28 May 2020 20:50 UTC
We are aware of issues within Metric Alerts and are actively investigating. Some customers may experience alerts that do not fire or that fire late. This problem began at 19:24 UTC on 5/28.
- Work Around: none at this time
- Next Update: Before 05/28 22:00 UTC
-Jack
Updated May 28, 2020
Version 3.0
Azure-Monitor-Team
Azure Monitor Status Archive