Anyone experiencing session hosts becoming unavailable at random?


Since the end of last week, one of our session hosts has randomly become unavailable on three occasions. This happened in two separate AVD environments.

  • Users get kicked out of their session and cannot reconnect.
    • The user sessions are still marked as Active/Disconnected according to the Azure portal.
  • We cannot RDP to the session host through the internal network.

After we shut down and restart the session host, everything works fine again.

 

We noticed the following notable things:

  1. There are no event logs generated at all, starting 30-60 min prior to the 'crash'.
  2. Since the 28th of October Event Viewer is getting spammed by the following warning:
    1. Microsoft.RDInfra.RDAgent.Service.AgentUpdateStateImpl
      1. Unexpected last recorded state
  3. The "Remote Desktop Services Infrastructure Agent" was updated on the 25th of October to version 1.0.5555.1008.
  4. The "Remote Desktop Services SxS Network Stack" was updated on the 31st of October to version 1.0.2208.17300.
    1. This is also the first day that we experienced the problem.

 

I have yet to find anything on this problem. Is anyone else experiencing this with their AVD environments?
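
For anyone wanting to check their own hosts for the first symptom listed above (no events logged for 30-60 minutes before the host becomes unavailable), here is a minimal Python sketch. It assumes you have already exported event timestamps from Event Viewer into a list; the function name `find_log_gaps`, the 30-minute threshold, and the sample times are illustrative assumptions, not data from the affected hosts.

```python
from datetime import datetime, timedelta


def find_log_gaps(timestamps, threshold=timedelta(minutes=30)):
    """Return (last_event_before_gap, first_event_after_gap) pairs
    where two consecutive events are further apart than `threshold`."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sample: the host logs normally, goes silent, then logs again after a restart.
events = [
    datetime(2000, 1, 1, 9, 0),
    datetime(2000, 1, 1, 9, 5),
    datetime(2000, 1, 1, 9, 12),
    datetime(2000, 1, 1, 10, 2),  # first event after the restart
    datetime(2000, 1, 1, 10, 3),
]

for start, end in find_log_gaps(events):
    print(f"No events between {start} and {end} ({end - start})")
```

A gap that ends at a restart and has no shutdown event before it would match the pattern described in this post; a gap on its own only shows that nothing was logged.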

92 Replies

@DvFals 

Hi, we have the same issue as well, have you found anything else about this?

Thanks

@Boffen2000 Yes, after changing the autoscale settings, the issue was solved.

@HarmOosthoek 

Hi

OK, which settings did you change, or did you just disable it?

@HarmOosthoek 

I'd appreciate it if you could provide more info on this. I have turned off the scaling plan and it's been 3 days now with no issues. But I want to be able to shut down and start servers to save resources.

Thanks

@Boffen2000 Sure! You need to check the workloads:

Session host virtual machine sizing guidelines for Azure Virtual Desktop and Remote Desktop Services...

 

My users were considered "Heavy", so with this profile you need "8 vCPUs, 32-GB RAM, 32-GB storage". I still kept the machine size at "4 vCPUs, 16-GB RAM, 32-GB storage", but I reduced the number of concurrent users to 2.

Hi,
OK, so fewer users per server solved your problem?
I'm running D8ads_v5 and we have about 10 users per server, so it shouldn't be a problem, and when I monitor the hosts there are plenty of free resources available. Servers have also had problems with fewer users logged in. It feels like there is a bug in the scaling plan, if that could cause the problem.

Yes, for me it did solve the issue. It should be an issue with D8ads_v5 indeed.

Hi, the cost will be too high with so few users per server.
Microsoft documentation recommends 2 (heavy) users per vCPU, which means that I can have 16 users on a D8ads_v5.

I have had the scaling plan disabled now for over a week without any issues. However I can't understand why it creates these problems.
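
The sizing arithmetic above can be sketched in a few lines of Python. Only the "2 heavy users per vCPU" figure and the 8 vCPUs of a D8ads_v5 come from this thread; the function names and the 40-user total are hypothetical and only there to show the calculation.

```python
# Users per vCPU by workload. Only the "heavy" figure is taken from this thread.
USERS_PER_VCPU = {"heavy": 2}


def max_users_per_host(vcpus, workload="heavy"):
    """Upper bound on concurrent users for one session host."""
    return vcpus * USERS_PER_VCPU[workload]


def hosts_needed(total_users, vcpus, workload="heavy"):
    """Number of session hosts needed, rounded up to whole hosts."""
    per_host = max_users_per_host(vcpus, workload)
    return -(-total_users // per_host)  # ceiling division


print(max_users_per_host(8))   # 8 vCPUs (D8ads_v5), heavy users -> 16
print(max_users_per_host(4))   # 4 vCPUs, heavy users -> 8
print(hosts_needed(40, 8))     # hypothetical 40 heavy users -> 3 hosts
```

By this guideline even the 4-vCPU size should carry 8 heavy users, so dropping to 2 concurrent users per host is well below the recommended density.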

@steveturnbull1975 

Hi, did you find a solution to this? We have the same problem and have now tested both Intel and AMD VMs, with and without autoscaling, but the problem remains.
We would be very grateful for more info. We have a case open with Microsoft but are getting nowhere.

@Boffen2000 

Hi - we've been having a similar issue for months and had a case raised with Microsoft.

On their recommendation, we updated FSLogix to version 2.9.8716.30241 (in preview at the time) and haven't had the issue occur for a few weeks now.

Was a bit doubtful when they suggested it as we've updated FSLogix multiple times, but so far so good with this version.

 

What version of FSLogix are you running on your hosts?
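
If you are comparing version numbers across many hosts, note that dotted versions need to be compared numerically, part by part, not as plain strings. A minimal Python sketch, assuming you have already collected the installed version string from each host; the helper names and the sample versions other than 2.9.8716.30241 are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.9.8716.30241' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# The FSLogix version mentioned in this reply.
MIN_FSLOGIX = parse_version("2.9.8716.30241")


def is_at_least(installed, minimum=MIN_FSLOGIX):
    """True if the installed version is the same as or newer than `minimum`."""
    return parse_version(installed) >= minimum


# Hypothetical inventory of hosts and their installed versions.
hosts = {
    "host-a": "2.9.8716.30241",
    "host-b": "2.9.100.1",
}

for name, version in hosts.items():
    status = "ok" if is_at_least(version) else "older than 2.9.8716.30241"
    print(f"{name}: {version} ({status})")
```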

Hi, and thanks for the tip. But we are already running that version. We upgraded a month ago to test.
Where do you have your FSLogix profiles?
We use Azure Files, and our AVD servers are Azure AD joined.

We are facing the same issues in multiple scenarios. Even on a brand new setup with 2x D8s_v5...

Did you get any further with this? We're also running D8s_v5. It's really intermittent though: it'll run fine for a month, then all of a sudden we'll get hosts unavailable over the course of a week.