Forum Discussion
Joedurnal
Sep 11, 2024 Copper Contributor
Session hosts across multiple host pools suddenly unavailable
Our AVD environment was running fine this morning and suddenly we noticed that all of our session hosts have become unavailable and are getting the following error message (in addition to slow portal...
MattNowicki
Sep 17, 2024 Copper Contributor
OrionWithrow Is this issue still ongoing? Yesterday was a mess, and today isn't looking much better. We're still seeing loads of intermittent connection issues to the 15 session hosts in our pool. Did the failover occur? Hard to know because this issue never showed up on https://azure.status.microsoft.com.
OrionWithrow
Sep 17, 2024 Brass Contributor
Last update I have:
TRACKING ID: 1LG8-1X0
TYPE: Incident
STATUS: Resolved
COMMUNICATION:
What happened?
Between 18:46 UTC and 20:36 UTC on 16 September 2024, customers using Azure Virtual Desktop, whose end user connections were dependent on a subset of infrastructure in the East US region, had users who experienced issues creating new connections. In addition, Azure Virtual Desktop administrators were unable to perform management operations via the Azure Portal.
What do we know so far?
We have determined that a customer input metadata database, supporting US geography customers and hosted in the East US region, experienced issues which impacted its availability. This issue was mitigated by performing a database failover operation to the Central US Region. This operation required the connection broker services to be restarted, which may have resulted in some existing user connections being impacted.
How did we respond?
• 18:46 UTC – Customer impact began. Service monitoring detected delayed database processing in East US.
• 18:59 UTC – Engineering initiated database failover procedures from East US to Central US. The procedure triggered a controlled restart of associated broker services used to manage connections to Azure Virtual Desktop session hosts.
• 20:00 UTC – Initial communications to Azure Virtual Desktop customers were sent via Service Health in the Azure Portal.
• 20:00 UTC – The database failover procedure was completed.
• 20:36 UTC – Monitoring of telemetry, logs and alerts confirmed that all service connectivity, application discovery, and management issues had been successfully resolved.
- MattNowicki, Sep 17, 2024, Copper Contributor: Our environment is still jacked! What can/should we do? We have intermittent failures all over the place, and hundreds of errors in logs.
- OrionWithrow, Sep 17, 2024, Brass Contributor: "Jacked" in what way? What kind of errors are you seeing?