We've confirmed that all systems are back to normal with no ongoing customer impact as of 7/20 00:24 UTC. Our logs show the incident started on 7/13 at 07:58 UTC and ended on 7/19 at 21:21 UTC in Japan East, and started on 7/17 at 01:53 UTC and ended on 7/20 at 00:24 UTC in East US. During the time it took to resolve the issue, customers in Japan East and East US experienced both data latency and data loss for Service Map and VM Insights. This could have caused errors when viewing network dependency maps, as well as data latency and loss in the VMComputer, VMProcess, VMBoundPort, and VMConnection tables in their Log Analytics workspaces. The latency and loss could also have caused missed or misfired alerts for customers using the tables listed.
- Root Cause: We have identified a configuration mismatch in a recent deployment that affected Service Map's ability to communicate with its underlying storage accounts. Among other changes, the deployment updated our method of authenticating to the storage accounts on the backend, but a subset of the storage accounts was not included in the update.
- Incident Timeline: 7/13 07:58 UTC - 7/19 21:21 UTC in Japan East; 7/17 01:53 UTC - 7/20 00:24 UTC in East US
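If you would like to check whether your workspace was affected, one approach is to count records per hour in the listed tables over the impact window and look for gaps or unusually low volumes. The sketch below is illustrative only and not an official remediation step; it assumes the azure-monitor-query and azure-identity Python packages, and the workspace ID and year (not stated in the timeline above) are placeholders you would substitute.

```python
# Sketch: count records per hour in the affected tables over the East US impact
# window; hours with zero or unusually low counts may indicate latency or loss.
# Requires azure-identity and azure-monitor-query. Placeholders are marked below.
from datetime import datetime, timezone

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

WORKSPACE_ID = "<your-log-analytics-workspace-id>"  # placeholder
YEAR = 2024  # the year is not stated in the notice above; substitute the actual year

# East US impact window from the timeline; use 7/13 07:58 - 7/19 21:21 UTC for Japan East.
IMPACT_START = datetime(YEAR, 7, 17, 1, 53, tzinfo=timezone.utc)
IMPACT_END = datetime(YEAR, 7, 20, 0, 24, tzinfo=timezone.utc)

# Hourly record counts per affected table across the impact window.
GAP_QUERY = """
union withsource = TableName VMComputer, VMProcess, VMBoundPort, VMConnection
| summarize Records = count() by TableName, bin(TimeGenerated, 1h)
| order by TableName asc, TimeGenerated asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, GAP_QUERY, timespan=(IMPACT_START, IMPACT_END))

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(dict(zip(table.columns, row)))
```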
We understand that customers rely on Service Map and VM Insights as critical services, and we apologize for any impact this incident caused.
-Ian
Service Map in Japan East and East US has experienced extended data latency. This is due to a new deployment with updated endpoints that were not reflected in a downstream portion of the service. To address this issue, we are updating the endpoints, which should bring in all of the latent data. This could have caused errors when viewing network dependency maps, as well as data latency in your VMComputer, VMProcess, VMBoundPort, and VMConnection tables in your Log Analytics workspaces. This latency could also have caused missed or misfired alerts if you are using the tables listed.
- Workaround: None
- Next Update: Before 07/20 05:00 UTC
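To gauge whether latent data is still being backfilled into your workspace, one way is to compare each record's ingestion time against its TimeGenerated value for the affected tables. This is a minimal sketch under the same assumptions as the example above (illustrative only; the workspace ID is a placeholder):

```python
# Sketch: measure current ingestion lag for the affected tables over the last day.
# Requires azure-identity and azure-monitor-query; workspace ID is a placeholder.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

WORKSPACE_ID = "<your-log-analytics-workspace-id>"  # placeholder

# ingestion_time() is when a record actually landed in the workspace; a large gap
# from TimeGenerated suggests latent data that is still being backfilled.
LAG_QUERY = """
union withsource = TableName VMComputer, VMProcess, VMBoundPort, VMConnection
| extend LagMinutes = datetime_diff('minute', ingestion_time(), TimeGenerated)
| summarize AvgLagMinutes = avg(LagMinutes), MaxLagMinutes = max(LagMinutes) by TableName
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, LAG_QUERY, timespan=timedelta(days=1))

if response.status == LogsQueryStatus.SUCCESS:
    for table in response.tables:
        for row in table.rows:
            print(dict(zip(table.columns, row)))
```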
Service Map in Japan East and East US has experienced extended data latency. This is due to a new deployment with updated endpoints that were not reflected in a downstream portion of the service. To address this issue, we are updating the endpoints, which should bring in all of the latent data. This could have caused errors when viewing network dependency maps, as well as data latency in your VMComputer, VMProcess, VMBoundPort, and VMConnection tables in your Log Analytics workspaces. This latency could also have caused missed or misfired alerts if you are using the tables listed.
- Workaround: None
- Next Update: Before 07/20 01:00 UTC
Service Map in Japan East has experienced extended data latency. This is due to a new deployment with updated endpoints that were not reflected in a downstream portion of the service. To address this issue, we are updating the endpoints, which should bring in all of the latent data. This could have caused errors when viewing network dependency maps, as well as data latency in your VMComputer, VMProcess, VMBoundPort, and VMConnection tables in your Log Analytics workspaces. This latency could also have caused missed or misfired alerts if you are using the tables listed.
- Workaround: None
- Next Update: Before 07/19 22:30 UTC