SOLVED

Recurring session host deployment problems


Summary: I'm having errors during the session host deployment process caused by a failure of the new VM to connect to the Internet and download a .zip file. A Microsoft support engineer is advising me to use a completely different deployment method. Before I start arguing with them, I want to get a reality check from the community.

 

Details: I want to make available a Windows 10 multi-session desktop that has my company's applications installed. It takes quite a bit of time to install and configure these apps, so I don't want to start from scratch with the latest bare-bones Azure gallery image each time there's an application or OS update. Instead, I want to grab an Azure image one time, customize it, and then keep updating that image using the steps outlined in Windows Virtual Desktop (WVD) – Image Management : How to manage and deploy custom images (including.... That article is a year old and there have been some important changes to the UI since then, but basically the idea is that you customize and update a non-domain-joined golden image, and then when you're ready to put the updated image into production you can add one or more VMs to an existing host pool, and they will join the domain as part of the deployment process. Automation is important here because if you're trying to quickly add multiple VMs in response to new load demands, you don't want to have to do a lot of fiddling with each new VM after it's spun up.

 

I've had some success with this procedure, but twice in the last six weeks I've had failed deployments. The VMs are created, but there is an error during the process:

 

'The DSC Extension failed to execute: Error downloading https://wvdportalstorageblob.blob.core.windows.net/galleryartifacts/Configuration_3-10-2021.zip after 29 attempts: Unable to connect to the remote server.'

 

In a nutshell, the newly-created VM has a problem accessing the Internet, can't get this configuration file, and fails to complete the deployment process correctly. It doesn't join the domain and it isn't accessible to remote desktop clients.

 

I created a ticket with Azure support in April. They never found the root cause, and after a couple of weeks I successfully deployed a session host using the same procedure that had been failing. So we chalked it up to some transient back-end failure and closed the ticket. Now the problem is back and I have a second ticket open with a new technician, who also seems unable to explain the root cause. They are telling me I should not add my new VMs to the host pool using the Host Pool > Session Hosts > Add button, but should instead manually add the VMs to the pool after they're spun up, using the process described at https://docs.microsoft.com/en-us/azure/virtual-desktop/create-host-pools-powershell#register-the-vir....

 

Specifically, they said: "The Manual WVD Hostpool registration process is recommended by Microsoft and is applicable for custom / sysprep image scenario. The VM machine contains .dll files and other OS building system files which on multiple syspreps and snapshots might broke it. This has been seen on many previous cases and as per those case analysis I’m sharing the information with you. Microsoft always recommend to create a new vm with the latest snapshot (latest version VM) because of the azure VM URL whitelisting."

 

Can this be true? If so, it basically invalidates Robin Hobo's recommendations for image management and suggests that deploying updated session hosts is going to be a ton of work. I'm not sure why the tech is referencing "multiple syspreps," because I never sysprep a VM more than once. My sysprepped machines are captured to my Shared Image Gallery and deleted as part of that process. It's true that there are multiple snapshots involved, but the snapshots are always pre-sysprep.

 

I don't think the problem has to do with Azure VM URL whitelisting, anyway (Azure Virtual Desktop required URL list - Azure | Microsoft Docs). On the VM whose deployment failed, it isn't just the configuration file that can't be reached: I can't access any web sites using Edge. Name resolution is working, and I can ping other VMs within my vnet, but there's clearly something wrong with Internet connectivity in general.
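
For anyone trying to reproduce this kind of diagnosis on a failed session host, the distinction between "name resolution works" and "outbound connections fail" can be checked with a short Python sketch. This is only an illustration, not part of the deployment tooling: check_endpoint is a hypothetical helper name, and port 443 and the 5-second timeout are assumptions.

```python
import socket


def check_endpoint(host, port=443, timeout=5.0,
                   resolver=socket.gethostbyname,
                   connector=socket.create_connection):
    """Report separately whether DNS resolution and a TCP connection succeed.

    resolver and connector are injectable so the logic can be exercised
    without real network access (assumption made for this sketch).
    """
    result = {"host": host, "dns_ok": False, "tcp_ok": False, "error": None}

    # Step 1: name resolution. A failure here points at DNS, not routing.
    try:
        resolver(host)
        result["dns_ok"] = True
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result["error"] = f"DNS: {exc}"
        return result

    # Step 2: outbound TCP connection. A failure here, with DNS working,
    # matches the symptom described in this post.
    try:
        conn = connector((host, port), timeout)
        conn.close()
        result["tcp_ok"] = True
    except OSError as exc:  # includes timeouts and "unreachable" errors
        result["error"] = f"TCP: {exc}"
    return result
```

Calling check_endpoint("wvdportalstorageblob.blob.core.windows.net") on the affected VM would be expected to show dns_ok as True and tcp_ok as False if the symptom is the same as the one described here.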

 

Anyway, I would love to know your reaction to the comment that deployment failures like this have "been seen on many previous cases." Should I push back on this with MS support or is this in fact a widely known problem? And if it's widely known, what's the fix? Please tell me it's not starting with a fresh Azure gallery image each time and then manually registering each host to the host pool!

 

Thanks for reading all this.

5 Replies

@David Schrag Hi David, we are working on solutions to make an image easier to customize and apply to existing host pools. Meanwhile, both options you listed are supported and should work without issues. There are many possible reasons why the VM can't access the URL you provided, including proxy or firewall settings within your Azure subscription or Windows image. I won't try to troubleshoot this issue over a forum, but please try to rule out as many factors on your end as you can that could affect the network access Windows needs to download the required package.

 

Microsoft support should be able to help you troubleshoot. If you share your CSS ticket number through a private message, I can see whether there's anything I can help with.

 

Thanks for the reply. I am supposed to have another troubleshooting call with support in two days. I will PM you the ticket number.
Update for anyone following along ...
In my environment, I have an Azure load balancer configured with a backend pool. (I configured this so that all my session hosts have the same outbound public IP address.) I have noticed that the VMs getting deployed through the session host deployment wizard have no Internet connectivity whatsoever, unless/until I add them to the backend pool. My ticket at Microsoft is being routed to the Azure networking team so they can help figure out what it is about my network configuration that's blocking Internet access for VMs that are outside the load balancer. (There's no way to add a session host to a load balancer as part of the deployment wizard.)
Stay tuned ....
Another update: The problem seems to stem from my use of availability sets in combination with a load balancer. I think I have this right -- if you add a VM to an existing availability set, and a VM already in that availability set is using a load balancer, then the VM you add must also use the load balancer in order to get Internet connectivity. But as noted before, there's no provision in the session host deployment wizard to specify a load balancer. So when you try to add a new session host and you tell it to join a load-balancer-using availability set but not the load balancer, that new VM has no Internet access, can't download the configuration .zip file, and fails to deploy properly. The Azure networking team is going back to the virtual desktop team to see whether/how you can deploy a session host where both an availability set and a load balancer are in play.
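
The rule described in this update can be written down as a small Python model. It is only a sketch of the behavior as described in this thread, not a query against Azure and not an official statement of Azure networking rules; the VM class, its fields, and has_outbound_internet are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    availability_set: Optional[str] = None
    in_lb_backend_pool: bool = False       # member of the load balancer backend pool
    subnet_has_nat_gateway: bool = False   # subnet uses a NAT gateway for outbound


def has_outbound_internet(vm: VM, all_vms: List[VM]) -> bool:
    """Toy model of the behavior described in this thread (an assumption)."""
    # A NAT gateway on the subnet provides outbound access by itself.
    if vm.subnet_has_nat_gateway:
        return True
    # A VM in the load balancer backend pool goes out through the load balancer.
    if vm.in_lb_backend_pool:
        return True
    # Otherwise, the VM loses outbound access if any peer in the same
    # availability set is already behind the load balancer.
    peers_use_lb = any(
        other.in_lb_backend_pool
        for other in all_vms
        if other is not vm
        and vm.availability_set is not None
        and other.availability_set == vm.availability_set
    )
    return not peers_use_lb


existing = VM("host-1", availability_set="wvd-avset", in_lb_backend_pool=True)
new_host = VM("host-2", availability_set="wvd-avset")
print(has_outbound_internet(new_host, [existing, new_host]))
```

In this model the new session host reports no outbound access until it is either added to the backend pool or placed on a subnet with a NAT gateway, which matches what I observed.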
best response confirmed by David Schrag (Iron Contributor)
Solution
This is now resolved, with the help of MS support. I'll just copy and paste the case summary here, since it captures the issue pretty well.
Symptom: Need help in fixing session host deployment failure when adding a new session host in host pool.

Resolution: We helped you in fixing the session host deployment failure by asking you to use a NAT Gateway to provide a fixed public IP address for your session hosts rather than a load balancer.

A NAT gateway lets you replace your load balancer for this purpose and, at the subnet level, makes all outbound connections use the specified static public IP address(es).

Reference Articles :

Virtual Network NAT: https://docs.microsoft.com/en-us/azure/virtual-network/nat-overview
NAT Resource: https://docs.microsoft.com/en-us/azure/virtual-network/nat-gateway-resource#resource
Implementation: Tutorial: https://docs.microsoft.com/en-us/azure/virtual-network/tutorial-create-nat-gateway-portal
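
The tutorial linked above uses the Azure portal. As a rough alternative, the same three steps (create a static public IP, create the NAT gateway, attach it to the session host subnet) can be sketched with the Azure CLI. The Python below only builds and prints the commands and does not run them. The resource names are placeholders, nat_gateway_commands is a hypothetical helper, and the flags should be checked against the current az documentation before use.

```python
import shlex


def nat_gateway_commands(resource_group, vnet, subnet,
                         public_ip="wvd-nat-pip", nat_gateway="wvd-nat-gw"):
    """Build (but do not run) Azure CLI commands for a subnet-level NAT gateway.

    All names passed in or defaulted here are placeholders (assumptions).
    """
    return [
        # 1. A Standard-SKU static public IP that outbound traffic will use.
        ["az", "network", "public-ip", "create",
         "--resource-group", resource_group, "--name", public_ip,
         "--sku", "Standard", "--allocation-method", "Static"],
        # 2. The NAT gateway itself, bound to that public IP.
        ["az", "network", "nat", "gateway", "create",
         "--resource-group", resource_group, "--name", nat_gateway,
         "--public-ip-addresses", public_ip],
        # 3. Attach the NAT gateway to the subnet that holds the session hosts.
        ["az", "network", "vnet", "subnet", "update",
         "--resource-group", resource_group, "--vnet-name", vnet,
         "--name", subnet, "--nat-gateway", nat_gateway],
    ]


for cmd in nat_gateway_commands("my-rg", "my-vnet", "session-hosts"):
    print(shlex.join(cmd))
```

Because the NAT gateway is attached to the subnet, new session hosts deployed into that subnet through the wizard pick up the fixed outbound IP without any per-VM step.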