Experiencing Alerting failure for Log Analytics Metric Alerts in East US - 11/18 - Resolved

Azure-Monitor-Team · ‎Nov 18 2019

Final Update: Tuesday, 19 November 2019 08:06 UTC

We've confirmed that all systems are back to normal with no customer impact as of 11/19, 01:13 UTC. Our logs show the incident started on 11/19, 17:00 UTC and that during the 8 hours and 13 minutes that it took to resolve the issue customers may have experienced higher than expected latency or failures regarding metric alerts during the impact window in East US region.

Root Cause: The failure was due to capacity issue with one of our dependent service which created backlog.
Incident Timeline: 8 Hours & 13 minutes - 11/19, 17:00 UTC through 11/19, 01:13 UTC

We understand that customers rely on Metric Alerts as a critical service and apologize for any impact this incident caused.

-Rama

Update: Tuesday, 19 November 2019 02:09 UTC

Root cause has been isolated to throttling of the event hub which was causing delayed ingestion. To address this issue the event hub scale limits were increased. The backlog of requests are still processing. The issue should be mitigated within 5 hours.

Work Around: none
Next Update: Before 11/19 07:30 UTC

-Ian Cairns

Initial Update: Monday, 18 November 2019 22:22 UTC

We are aware of issues within Log Analytics Metric Alerts and are actively investigating. Some customers may experience Alerting failure in East US.

Work Around:
Next Update: Before 11/19 02:30 UTC

We are working hard to resolve this issue and apologize for any inconvenience.
-Subhash

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Experiencing Alerting failure for Log Analytics Metric Alerts in East US - 11/18 - Resolved