SOLVED

Consistent Thin Client Disconnection from WVD Pool


Hello, we have been experiencing random but recurring disconnects from our WVD pool. We have roughly 10 users and have been getting different Event Viewer logs when they disconnect. Our thin clients run Windows 10 version 1607. On days when users disconnect, it happens multiple times per day; on other days they do not disconnect at all. The Event Viewer logs are attached.

148 Replies

@swalra 

 

Hi, yes, screen performance has seemed a bit slow over the last few hours. When dragging windows across the screen, for example, it stutters. It seems to be connection lag, not a CPU problem or anything like that. So the WVD Gateway again, I guess.

 

Also, no disconnects over here yet today!

 

A response we got from Microsoft today:
"we have some disconnection issues but the Product group didn’t provide any updates regard changes but all we know that they are investigating and should provide a fix."

 

Well, if they want to provide a fix, at least they acknowledge something is broken :).

@swalra 

 

Same here.

 

Although the problem seems widespread in EU West, it isn't the case for all WVD connections. Our own internal WVD works fine, and at one client we're even seeing some users with 4 bars of WVD connectivity and decent (not great) performance while others in the same office are experiencing 1 bar and piss-poor performance. FYI, standard RDP connectivity remains normal at those times. It's 100% an issue with (some of?) their WVD Gateway entry points in EU West.

@XxArkayxX 

 

Has anyone tried creating a second WVD tenant within the same Azure (onmicrosoft.com) tenant? Would that make a difference?

 

I really wonder how MS configures this. It seems some customers are really assigned to certain gateways (clusters).

 

 

@Marco Brouwer 

 

I doubt that would make for a good test case. We see different behaviour within a single tenant, so my guess at the moment is that the load balancing mechanism isn't based on tenancy.

 

They do need some sort of health monitoring mechanism though. It's bad enough that we have to tell our customers it's an Azure problem and out of our hands, but not even having a clear monitoring mechanism other than this forum to determine whether it is Azure-side is worrisome.

 

Apparently they are aware of issues in EU West with Recovery Vault. The problem is just more widespread than they know or are acknowledging. Or their attempts at fixing the issue are causing issues elsewhere in the same datacenter.

 

https://app.azure.com/h/9SX1-198/acdc45

Hi, we see the health issue too in our portal. At least something is happening :)

@Marco Brouwer 

 

How do you see the health issues? 

swalra_1-1584016773700.png

 

 

Hi,

They should be in your screenshot. Log in to Azure portal, go to https://portal.azure.com/#blade/Microsoft_Azure_Health/AzureHealthBrowseBlade/serviceIssues

Seems that you don't have the issue :) Are you using Azure Backup Vault?

@swalra 

 

Your filter is too narrow. On your screen you can click through to the only active Health Issue at the moment: the Recovery Vault one. 

 

I logged a case for the WVD performance degradation as well. I'd suggest we all report them. If enough people complain, they will create a Health issue for it, I guess. Given that the problems have been happening for more than a week now and they're still not "aware" of the issue, it's the only way forward I see.

The Backup health case has been "resolved" but the latency issues with WVD remain.

 

I have an open ticket but am still awaiting feedback. If you run https://azure.microsoft.com/en-us/services/virtual-desktop/assessment/ it sometimes peaks over 100 ms, but I'm not sure if that's a valid test. It would be nice to get some sort of historic graph of that. I assume it just pings an endpoint, so we could do it ourselves, but I haven't found which exact endpoint to ping yet.
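On the historic view: a minimal Python sketch, assuming you already collect timestamped latency samples from some probe of your own. Which endpoint to probe is still the open question above, so the probe itself is left out. The function name `hourly_history` is made up for illustration, and the 100 ms spike threshold is only borrowed from the peaks mentioned above, not an official limit.

```python
from collections import defaultdict
from datetime import datetime


def hourly_history(rows, spike_threshold_ms=100.0):
    """Group (iso_timestamp, latency_ms) pairs per hour.

    Returns one dict per hour, in chronological order, with the sample
    count, average, maximum, and number of samples above the threshold.
    """
    buckets = defaultdict(list)
    for stamp, ms in rows:
        # Truncate each timestamp to the hour it falls in.
        hour = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        buckets[hour].append(ms)

    history = []
    for hour in sorted(buckets):
        values = buckets[hour]
        history.append({
            "hour": hour,
            "samples": len(values),
            "avg_ms": sum(values) / len(values),
            "max_ms": max(values),
            "spikes": sum(1 for v in values if v > spike_threshold_ms),
        })
    return history
```

The per-hour rows can be pasted into a spreadsheet or fed to any plotting tool to get a rough graph over time.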

Hello,

 

This morning we again had performance issues, freezes and disconnects.

Just got a reply from MS:

PG said there were transient errors from Azure REDIS cache components, which caused users to be disconnected.

However they are working to improve robustness on external dependencies failures; so let’s monitor closely and see the status, if anything comes up let us know.

@swalra 

 

Today again performance issues in several environments :(

@oiab_nl 

 

Yep, today very bad performance.....

@swalra 

 

Likewise.

 

Just so we're all on the same page here: is everyone experiencing these issues connecting to infrastructure in the Europe West region?

 

Further breakdown: users usually have 1-2 bars on their WVD connection while experiencing the issue, although that's a poor metric of course. Sometimes the latency is present with 4 bars as well. I'm 100% certain it's the gateway infrastructure though. If I RDP straight to the VMs, I get 4 bars and no latency issues at all.

 

I have escalated the case (through Ingram) to severity A but still not even an acknowledgement from MS. Does anyone else have any open tickets with MS? Mine is: 120031223001948 if you want to link them.

 

@XxArkayxX 

 

I will try to open a ticket and link.

 

My biggest clients are now logging in over a site-to-site VPN, and then it's OK.

Just got complaints from our client as well. They are working with 9 per VM and feedback is slow.
We're thinking about migrating them back to our own datacenter environment, where we can troubleshoot the whole infrastructure.

Just got off the line with an MS engineer. He did little more than run PsPing:

psping -t rdweb.wvd.microsoft.com:443

 

You see a lot of spikes. How bad they are depends on which site you run it from, but all sites experience the spiking. Naturally, you don't see such spikes to other internet infrastructure, which excludes the possibility of a local issue.

 

After seeing that, he is escalating internally once more and will provide feedback.

 

I suggest we all try running these continuously and keep comparing. This is a screenshot of the current "performance", although I do have to mention it's a whole lot better than this morning.

 

WVD.JPG
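
For anyone who wants to keep such a measurement running and compare results across sites, here is a rough Python sketch. It times TCP connects to a host and port, which is similar in spirit to the `psping` test above, and appends timestamped results to a CSV file. The function names, the CSV layout and the 100 ms spike threshold are my own assumptions for illustration, not anything provided by Microsoft.

```python
import csv
import socket
import time
from datetime import datetime, timezone


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Return the TCP connect time in milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers timeouts, DNS failures and refused connections.
        return None


def summarize(samples_ms, spike_threshold_ms=100.0):
    """Summarize samples; None entries count as failed connects."""
    ok = [s for s in samples_ms if s is not None]
    return {
        "count": len(samples_ms),
        "failed": len(samples_ms) - len(ok),
        "min_ms": min(ok) if ok else None,
        "avg_ms": sum(ok) / len(ok) if ok else None,
        "max_ms": max(ok) if ok else None,
        "spikes": sum(1 for s in ok if s > spike_threshold_ms),
    }


def log_samples(host, path, count=60, interval_s=1.0, port=443,
                measure=tcp_connect_ms):
    """Append one timestamped CSV row per sample and return the samples."""
    samples = []
    with open(path, "a", newline="") as fh:
        writer = csv.writer(fh)
        for i in range(count):
            ms = measure(host, port)
            samples.append(ms)
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            writer.writerow([stamp, host, port,
                             "" if ms is None else f"{ms:.1f}"])
            fh.flush()
            if i < count - 1:
                time.sleep(interval_s)
    return samples


# Example (performs real network connections when uncommented):
# samples = log_samples("rdweb.wvd.microsoft.com", "wvd_latency.csv", count=300)
# print(summarize(samples))
```

Running the same script against some unrelated host from the same site gives a baseline, so local network problems can be told apart from spikes that only show up towards the WVD endpoint.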

They are doing DNS load balancing, so your rdweb endpoint isn't the same as mine and our results will differ.
Your resolved IP address is different from mine; however, the spikes are there too.

Is there a way to find out which RDGW the user is using at the time of the issues?
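
This does not answer which gateway a live session is actually using, but as a partial step each site can at least record which addresses its own resolver returns for the rdweb hostname, so results can be compared per IP. A minimal Python sketch under that assumption; `resolve_ipv4` and `shared_addresses` are hypothetical helper names.

```python
import socket


def resolve_ipv4(host, port=443, resolver=socket.getaddrinfo):
    """Return the distinct IPv4 addresses the local resolver gives for host."""
    infos = resolver(host, port, socket.AF_INET, socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def shared_addresses(site_a, site_b):
    """Return the addresses that two sites both resolved, sorted."""
    return sorted(set(site_a) & set(site_b))


# Example (performs a real DNS lookup when uncommented):
# print(resolve_ipv4("rdweb.wvd.microsoft.com"))
```

If two sites resolve no addresses in common, their latency numbers describe different front ends and should not be compared directly.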

Hi,

Our case number at Microsoft is 120022923000282 (via Ingram Micro though). They have just asked us to run another network trace tool on one of the customer's fat clients, but I refused.

I referred to this thread and stated that the issue is NOT in the customer's network, but for sure somewhere in Azure WVD.

When I get more information, I'll post it here. Today I did not see any disconnects, and since I am not on site at the customer with all the Corona madness going on, I can't really check for "slow sessions".