We've confirmed that all systems are back to normal with no customer impact as of 03/25, 18:08 UTC. Our logs show the incident started on 03/25, 13:00 UTC and that, during the roughly 5 hours it took to resolve the issue, a very small subset of customers saw failures while sending telemetry data to our ingestion services.
Root Cause: The failure was due to one of our service roles going into an unhealthy state, which caused it to stop accepting data.
Incident Timeline: 5 hours & 8 minutes - 03/25, 13:00 UTC through 03/25, 18:08 UTC
We understand that customers rely on Application Insights as a critical service and apologize for any impact this incident caused.
Update: Monday, 25 March 2019 19:14 UTC
We are aware of an issue within Application Insights services where customers would see failures while sending telemetry data to our ingestion services. This issue prevents telemetry from being accepted at the client end. Our findings show the root cause is related to some of our service roles going into an unhealthy state, which causes them to stop accepting data. We understand the root cause and are applying mitigation one region at a time, but it may take some time to mitigate the issue in all locations. As a workaround, customers can use the approach suggested below.
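The sketch below is a minimal illustration of this kind of client-side workaround, assuming a generic HTTPS ingestion endpoint: it buffers telemetry locally and retries with exponential backoff until ingestion recovers. The endpoint URL, payload shape, and helper names are illustrative assumptions, not the Application Insights SDK API or the exact steps in the original advisory.

```typescript
// Hypothetical sketch: buffer telemetry locally and retry with exponential
// backoff when the ingestion endpoint rejects or drops requests.
// INGESTION_URL and the TelemetryItem shape are assumptions for illustration,
// not part of the Application Insights SDK.

const INGESTION_URL = "https://example.invalid/v2/track"; // placeholder endpoint

interface TelemetryItem {
  name: string;
  time: string;
  properties: Record<string, string>;
}

const pending: TelemetryItem[] = []; // in-memory buffer for unsent items

async function sendWithRetry(item: TelemetryItem, maxAttempts = 5): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await fetch(INGESTION_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(item),
      });
      if (res.ok) return; // accepted by ingestion
    } catch {
      // network-level failure; fall through to backoff
    }
    // Exponential backoff: wait 1s, 2s, 4s, ... before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
  }
  pending.push(item); // give up for now; keep the item for a later flush
}

// Re-attempt anything buffered during the outage.
async function flushPending(): Promise<void> {
  const items = pending.splice(0, pending.length);
  for (const item of items) {
    await sendWithRetry(item);
  }
}
```

A caller would wrap each send in sendWithRetry(...) and, once the per-region mitigation completes, call flushPending() to re-send anything buffered during the outage.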