SOLVED

Consistent Thin Client Disconnection from WVD Pool

Hello, we have been experiencing random but recurring disconnects from our WVD pool. We have roughly 10 users and have been getting different Event Viewer logs each time they disconnect. Our thin clients run Windows 10 version 1607. When users disconnect, it happens multiple times per day; on some days, however, they do not disconnect at all. Attached are the Event Viewer logs.

148 Replies
Hi all,

Another week has passed. Did anyone experience more disconnect problems last week?

Has anyone migrated to the WVD Spring Update in the meantime, and is it any better?

Greets, Marco

@Marco Brouwer 

 

Users are still reporting disconnections today.

We have 150 users and the disconnects are just getting worse. Did anyone find a solution, or has Microsoft responded? It is very random and seems to happen across all ISPs.

@mikejones1600: this forum is not useful for discussing these kinds of issues. I recommend the following (https://aka.ms/wvvdgetstarted):

- Review troubleshooting guides and match diagnostics errors you see for the disconnects.

- Review networking configurations

- Review Azure Service health for health alerts.

After exploring all options please raise a ticket with support as it might be an issue that needs deeper troubleshooting.

Hello @Eva Seydl,

 

Thank you for your response. Nice that someone from Microsoft is finally responding to this topic.

 

Contrary to what you indicate, this forum is very useful for discussing these kinds of issues. This is the only place where partners can share experiences with each other.

 

Several partners have already created a ticket at Microsoft, but no one at Microsoft takes the initiative to bundle it. Perhaps you can do something about this?

 

Microsoft's support is very, very poor and slow. Together with Microsoft Azure/WVD specialists, we have carried out various tests and produced a lot of log files, which has already cost us 130 hours of support work for Microsoft over the past three months. The cause is also known: it is in your WVD brokers. But each team within Microsoft just points to another.

 

Our ticket (120022923000282) has been escalated by your Dutch colleagues Caroline Wouters and Bianca Ritchi. Maybe you can contact them. Christiaan Brinkhoff is also aware of the disconnect problems.

 

You don't have to ask the partners for additional information. The cause is known. There is a structural problem in your brokers.

 

There was a good way for partners to monitor this problem, WVD Diagnostics, but Microsoft turned it off unexpectedly. This gives the impression that Microsoft does not want partners to see that there are disconnects.

 

"Our service health monitoring indicates that diagnostics queries are having a very high impact on the database (DB) performance, making it likely to affect the operation of other basic service functions, so we need to take steps to mitigate DB impact. The problem is caused by the increasingly large volume of activities executing in the service and frequent queries that scan through relatively large volumes of data. Due to unexpectedly high demand of the service, we have blocked diagnostics queries. We are evaluating our options to enable them in the near future."

 

I look forward to your response.

 

Best regards,
Gerard Mulder

Altios Cloud Experts

The Netherlands

0031624650071

 

 

@Gerard_Mulder 

 

Yes, this thread is useful, because I now know other WVD users are experiencing disconnections too.

 

Just want to log another data point today: 6/18/2020, 9am - 11am PST, East US datacenter. We have WVD users experiencing disconnects and black screens. The problem appears to be gone for now. Creating a ticket is useless; by the time they are able to troubleshoot, the problem is gone. I guess we just have to accept that WVD's current availability isn't very good. If you need better availability, go find another solution.

In March they had already found the possible cause, but unfortunately it is still not resolved.

 

-----------------------------------------------------

From: Alexandru Daniel Stan <Alexandru.Stan@microsoft.com>
Sent: Friday, 13 March 2020 17:08
To: Marco Brouwer | Altios <marco@altios.nl>; imcloudservicedesk@cloud.im
CC: Gerard Mulder | Altios <g.mulder@altios.nl>
Subject: RE: Problem with Microsoft Windows Virtual Desktop connections - intermittent disconnects

 

Hi team,

 

Ahmed is off today, but PG said there were transient errors from Azure REDIS cache components, which caused users to be disconnected.

However, they are working to improve robustness against external dependency failures, so let's monitor closely and see the status. If anything comes up, let us know.

 

 

Thank you,

Kind regards,

 

Alexandru-Daniel Stan

Support Engineer
Windows UEX Premier
Customer Service & Support

Office: +40 (31) 1331448

Alexandru.Stan@microsoft.com

 

 

If you have any feedback about my work, please let either me or my manager Razvan Mitescu know at razvanm@microsoft.com

@Gerard_Mulder 

 

Just to make it clear: there are many, many reasons why you can experience disconnects, freezes and lags. This original thread was created because we clearly identified that the disconnects, freezes and latency were located ONLY in the WVD Gateway component of WVD, i.e. if you RDPed directly to the same VM, you didn't have the issues.

 

Since then, our customers have been complaining far less about these issues, to the point where we can attribute the remaining cases to local internet connection issues and hand them off to the local IT staff responsible for that. Is everyone who's reviving this thread absolutely sure that the issue is located in the WVD Gateway component? Or do you simply need to look into other possible causes?

@Marco Brouwer we have the same problem in a client environment. It is really unworkable. We have had a support case open for a few weeks that is still not resolved; we send in network traces almost every day, but there is no real solution yet.

 

In my lab I was able to reproduce it when using the Azure NAT Gateway on the WVD subnet. Removing the NAT gateway fixed the issue in my lab, but it did not fix the issue for the client.

 

This client environment is running on WVD Classic (Fall release). Next week I am going to migrate it to the WVD Spring Release in the hope that it will be more stable.

 

It is not really a WVD issue in general; we have clients that have no issues at all.

 

The issue here is also a disconnect followed by an auto-reconnect, and each reconnect takes about 10-30 seconds. The customer is using USB-based smartcards for authentication in an application. On every reconnect the application crashes because the smartcard is unreachable for all that time.

 

Region: West Europe.

 

Do you use an Azure NAT Gateway on the WVD subnet?

 

 

 

Hi Stefan,

One of our customers complained yesterday about several disconnects/reconnects. The reporting users were not on the same WVD host; some were working from home, some over 4G, some from the office. The only common component in that situation is the WVD gateway service, which is managed by Azure.

They use WVD classic in combination with a standard load balancer (for the purpose of having a static outgoing public ip), not Azure NAT gateway (load balancer is cheaper).

We have another customer on WVDv2, also with the same load balancer configuration, and no problems have been reported there.

So I guess that if there is a connection between our issues, it has something to do with NATting the traffic on classic WVD, not with the new NAT GW product in particular.

To be honest, I still believe in the (unproven) theory that Azure WVD has 1001 gateway/broker servers, and it just depends on whether your Azure tenant gets assigned to a buggy one, or something like that. Or that some WVD gateway has problems for a certain period of time, or gets overloaded, and customers are, for example, reassigned to a different gateway cluster by hand through some non-instant process.

In the history of this post, multiple people complain about "waves of disconnects" in the same period of time in the same Azure region, but not all of them at once. That would strengthen this theory.
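One way to check the "waves" idea against your own data is to bucket disconnect timestamps into fixed windows and look for windows in which several different users dropped at once. The following is only a rough sketch: the user names and timestamps are made up for illustration, `find_waves` is a hypothetical helper, and the window size is assumed to divide 60 evenly. Real timestamps would have to come from whatever diagnostics export you have.

```python
from collections import defaultdict
from datetime import datetime


def find_waves(events, window_minutes=5, min_users=3):
    """Return (window_start, users) for windows where at least
    min_users distinct users disconnected.

    Assumes window_minutes divides 60 evenly.
    """
    buckets = defaultdict(set)
    for user, ts in events:
        # Floor the timestamp to the start of its window.
        start = ts.replace(
            minute=ts.minute - ts.minute % window_minutes,
            second=0,
            microsecond=0,
        )
        buckets[start].add(user)
    return sorted(
        (start, sorted(users))
        for start, users in buckets.items()
        if len(users) >= min_users
    )


# Made-up sample data: (user, disconnect time). Not real log output.
events = [
    ("user1", datetime(2020, 6, 18, 9, 1)),
    ("user2", datetime(2020, 6, 18, 9, 3)),
    ("user3", datetime(2020, 6, 18, 9, 4)),
    ("user1", datetime(2020, 6, 18, 9, 40)),
    ("user4", datetime(2020, 6, 18, 10, 17)),
]

waves = find_waves(events)
for start, users in waves:
    print(start.isoformat(), users)
```

If users on different hosts and different ISPs keep landing in the same windows, that points at a shared component rather than at individual connections. Isolated single-user drops point the other way.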

But no way Microsoft would ever admit to something like that :). For us, the flow of traffic/WVD sessions within the Azure-managed part of WVD (brokering/gateway) is a big, unknown, foggy cloud. You don't get enough information from the configurable logging to pinpoint or prove that something goes wrong in there, and on which server in their big cluster.

When you contact MS support, your case goes from team to team, and they always point the finger at anything but themselves, although I provide "proof" that the problem is not on our side. If it's not on my side, on whose side is it then?

Also, they permanently disabled reading WVD logging with PowerShell around April due to "performance problems"; MS actually confirmed that to me. Sure, you can now read the same logs with Log Analytics in Azure, but the reason provided for disabling these PowerShell commands remains "performance problems". Hmmm...

Please post your MS case number here, so Microsoft can bundle multiple cases into one big issue they can't ignore.

Greets

@Marco Brouwer thank you for your reply.

 

In my lab, WVDv2 also showed the issue with an Azure NAT Gateway. I have a small-scale lab with a D2s_v3. If I put the Azure NAT Gateway in the network, open the WVD desktop and start YouTube with some nice long fireplace video, it doesn't take more than 5-10 minutes before the first reconnect occurs. I even filmed it with OBS Studio and have already written a draft blog post for my website. Something is really off in WVD with the NAT Gateway, I believe.

 

Before NAT Gateway existed, I sometimes created an Ubuntu-based NAT machine running on a B1s (https://www.microcloud.nl/azure-nat-with-ubuntu-linux/)

 

I also tried that in my LAB and it seems that the reconnects did not occur. We also have one client running in WVD_v2 with an Azure Express Route, and they routed the internet back over the express route via their local ISP, they have no issues with reconnections.

 

Our client with the issues currently runs on WVD Classic, without any NAT/Load balancers or whatever, just the default internet breakout and the issues still occur.

 

I am currently still on holiday leave, next week I will discuss if I can publish the ticket number. We also are going to start creating new host pools in WVDv2 asap.

 

@ everyone.

 

I want to let everybody know, we have resolved the issue. That is, removing the Azure NAT Gateway from our WVD servers removed the reconnection problems.

 

At first we did not think it was resolved because after removing Azure NAT Gateway one of the two customer sites using WVD still had the same issues. Unfortunately due to holidays it took some time to learn that one site was resolved and the other wasn't.

 

After the holidays we confirmed that the issue was also resolved if we connected to WVD from home locations.

 

On the other site, where the reconnect issue was not resolved, we finally discovered that our Fortigate 51E firewall was causing the same symptoms as the Azure NAT gateway, which confused us about the real root cause of the issue.

In the Fortigate firewall we disabled all HTTPS inspection/proxy etc., but that did not resolve the problem.

The support ticket at Fortinet is still open to resolve the issue on that site. We have bypassed the Fortigate for the WVD clients with a DrayTek firewall/NAT device, and none of those clients have WVD reconnection problems anymore.

 

About the Azure NAT Gateway causing WVD reconnection problems:

Microsoft confirmed in the support ticket:

"It appears this is a bug with NAT gateways and WVD. It is currently being worked on for a resolution with no current ETA on a fix."

 

Hope this information helps anyone else.

 

 

@Marco Brouwer We are also facing the same issue; some of the active sessions even got terminated all of a sudden and could not be accessed afterwards. Same region: West Europe.

For the past couple of hours, several customers have reported poor performance and high latency.

 

 

@VanellySha: Please review the diagnostics for the failed connection and match them with the troubleshooting guidance in our documentation: https://docs.microsoft.com/en-us/azure/virtual-desktop/troubleshoot-set-up-overview

 

If you can't resolve the issue, I recommend raising a support ticket.

Yesterday, 9/3/2020, was pretty bad. We have a small shop client that we set up with WVD Spring Release back in May, and they have had instances like this where they all get disconnected at the same time.

 

According to Log Analytics, these are the reasons for the failures:

 

CodeSymbolic                                           UserCount
NL_ERR_TDSKTCONNECTFAILED                              24
ConnectionFailedClientDisconnect                       24
UnexpectedNetworkDisconnect                            17
ConnectionFailedServerDisconnect                       6
CM_ERR_MISSED_HEARTBEAT                                6
SavedCredentialsNotAllowed                             4
NL_ERR_TDTIMEOUT                                       4
SSL_ERR_FRESH_CRED_REQUIRED_BY_SERVER                  4
ConnectionFailedUserHasValidSessionButRdshIsUnhealthy  2
ConnectionFailedRDAgentBrokerConnectionNotFound        1
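For anyone pasting similar results out of the portal, where the code and the count can end up run together on one line, here is a minimal sketch of splitting such rows back into (CodeSymbolic, UserCount) pairs and ranking them. The `parse_row` helper and the regular expression are illustrative assumptions, not part of any WVD tooling. The pattern would mis-split a code that itself ends in a digit, which none of the codes listed here do.

```python
import re

# A code (no whitespace), optional whitespace, then a trailing integer count.
ROW = re.compile(r"^(\S+?)\s*(\d+)$")


def parse_row(line):
    """Split 'CodeSymbolic<count>' or 'CodeSymbolic   <count>' into a pair."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return match.group(1), int(match.group(2))


# Rows exactly as they were pasted in the post above.
rows = [
    "NL_ERR_TDSKTCONNECTFAILED24",
    "ConnectionFailedClientDisconnect24",
    "UnexpectedNetworkDisconnect17",
    "ConnectionFailedServerDisconnect6",
    "CM_ERR_MISSED_HEARTBEAT6",
    "SavedCredentialsNotAllowed4",
    "NL_ERR_TDTIMEOUT4",
    "SSL_ERR_FRESH_CRED_REQUIRED_BY_SERVER4",
    "ConnectionFailedUserHasValidSessionButRdshIsUnhealthy2",
    "ConnectionFailedRDAgentBrokerConnectionNotFound1",
]

counts = [parse_row(row) for row in rows]
# Sum over rows; the same user may appear under more than one code,
# so this is not a count of unique users.
total = sum(n for _, n in counts)

for code, n in sorted(counts, key=lambda pair: -pair[1]):
    print(f"{code:<55}{n:>4}")
print(f"{'Sum over all codes':<55}{total:>4}")
```

Keeping the rows in this form makes it easier to compare one bad day against another and see whether the same codes dominate each time.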

 

These disconnects happened pretty much throughout the day, which makes it extremely frustrating and hard for them to do their work, and frustrating for us trying to provide a root cause analysis. Overall WVD is great, as long as it's working properly and the users are not getting disconnected. I'm hoping someone can offer some insight into what might have been happening yesterday. I can provide more details if someone from Microsoft WVD support would like to review the correlation event IDs from the logs gathered from the Log Analytics workspace.

 

@stefanpeters2020 I'm glad to hear that you figured out the issue, thanks for sharing! We've been seeing the same thing and it also started once we added the NAT Gateway to our WVD servers. I would really appreciate it if you could post an update when you hear more info from Microsoft, especially if they eventually provide an ETA for the fix.

@stefanpeters2020 Stefan, any news from Fortinet about the issue?

Adding my data point. We tested WVD for weeks. Users had random disconnections whether at work or at home, on Wi-Fi or on a wired connection. We had to drop the project because of the unacceptable user experience. Yes, the WVD pool used a NAT gateway too.

@leoatap 

 

Did you test without the NAT gateway? Did that solve the issue?

 

We are not using a NAT gateway, but a standard load balancer with an outgoing NAT rule.

I have not tested disabling this myself, because our customer needs a static outgoing IP.

 

Is anyone else using a load balancer instead of a NAT gateway and experiencing the same issues?