We've confirmed that all systems are back to normal, with no remaining customer impact, as of 4/17, 20:40 UTC. Our logs show the incident started on 4/17, 19:40 UTC, and that during the 1 hour it took to resolve the issue, approximately 4% of queries failed, primarily in the UK South region.
Root Cause: The failure was due to a backend service entering an unhealthy state; impact was mitigated by pulling the unhealthy instance out of rotation.
Lessons Learned: We are investigating why this cluster did not automatically fail over to a healthy instance of the service.
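For illustration only, the mitigation and the failover behavior under investigation can be sketched as a health-check loop that removes unhealthy instances from rotation and restores them once they recover. Everything in the sketch below is hypothetical (backend names, health endpoints, ports, and intervals are invented); it does not represent Azure's internal tooling.

```python
import time
import requests

# Hypothetical pool of backend query-service instances; all names and
# addresses here are invented for illustration.
BACKENDS = {
    "query-backend-1": "http://10.0.0.1:8080/health",
    "query-backend-2": "http://10.0.0.2:8080/health",
}
in_rotation = set(BACKENDS)

def probe(url: str) -> bool:
    """Return True if the backend answers its health endpoint in time."""
    try:
        return requests.get(url, timeout=2).status_code == 200
    except requests.RequestException:
        return False

while True:
    for name, url in BACKENDS.items():
        healthy = probe(url)
        if not healthy and name in in_rotation:
            # Pull the unhealthy instance out of rotation (the mitigation above).
            in_rotation.discard(name)
            print(f"removed {name}; remaining: {sorted(in_rotation)}")
        elif healthy and name not in in_rotation:
            # Automatic fail-back once the instance reports healthy again --
            # the step that did not happen automatically in this incident.
            in_rotation.add(name)
            print(f"restored {name}")
    time.sleep(10)
```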
Incident Timeline: 1 Hour & 0 minutes - 4/17, 19:40 UTC through 4/17, 20:40 UTC
We understand that customers rely on Azure Monitor as a critical service and apologize for any impact this incident caused.
Initial Update: Wednesday, 17 April 2019 20:25 UTC
We are aware of queries within Log Analytics, Application Insights, and Log Search alerts failing, primarily in the UK South region. This would surface as partial load failures in the portal, delayed Log Search alerts, Log Search alerts that fail to fire, or query failures when hitting the API directly.
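For reference, "hitting the API directly" means a call like the minimal sketch below, assuming the public Log Analytics v1 query REST endpoint; the workspace ID, bearer token, and query are placeholders you would substitute with your own.

```python
import requests

WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"  # placeholder workspace GUID
TOKEN = "<aad-bearer-token>"  # placeholder; obtain via Azure AD for your tenant

# Run a direct query against the Log Analytics query API. During the incident
# window, a call like this would have failed outright for a fraction of requests.
resp = requests.post(
    f"https://api.loganalytics.io/v1/workspaces/{WORKSPACE_ID}/query",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": "Heartbeat | summarize count() by bin(TimeGenerated, 5m)"},
    timeout=30,
)
if resp.ok:
    rows = resp.json()["tables"][0]["rows"]
    print("query succeeded:", len(rows), "rows")
else:
    # This is the failure mode described above for direct API queries.
    print("query failed:", resp.status_code, resp.text[:200])
```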
Workaround: None
Next Update: Before 4/17, 21:30 UTC
We are working hard to resolve this issue and apologize for any inconvenience. -Matthew Cosner