Forum Discussion
Consistent Thin Client Disconnection from WVD Pool
stefanpeters2020 I'm glad to hear that you figured out the issue, thanks for sharing! We've been seeing the same thing and it also started once we added the NAT Gateway to our WVD servers. I would really appreciate it if you could post an update when you hear more info from Microsoft, especially if they eventually provide an ETA for the fix.
Adding my data point. We tested WVD for weeks. Users had random disconnections from work and from home, on wifi and on cable. We had to drop the project because of the unacceptable user experience. Yes, the WVD pool used a NAT gateway too.
- stefanpeters2020 | Oct 03, 2020 | Copper Contributor
Thank you for your reply. We have confirmed that the NAT Gateway causes problems with WVD if the WVD session hosts' internet breakout is routed over the NAT Gateway. I tested this in my lab and the reconnection problem occurred within 5-10 minutes. Proof of that is published on my personal blog page: https://www.microcloud.nl/azure-nat-revisited/
If, however, there is a special configuration that allows the NAT Gateway to be used without the reconnection problems, I would welcome it. At the time of writing the blog, there was no special configuration available from Microsoft, nor was there any information about whether or not the NAT Gateway is compatible with WVD session hosts.
For our customer we also opened a ticket and captured network packets for what felt like forever. After we removed the NAT Gateway, the issue was resolved at every location except one. That location had double trouble because the Fortigate firewall was causing similar problems. We replaced the Fortigate with a Draytek NAT router, and since then none of this customer's sites have had reconnection problems.
Microsoft confirmed in our open ticket that something is going on.
I am happy to discuss this in a Teams call if you like.
Here is a quote from an email I received from Microsoft.
"I have some interesting news from some WVD/RDS collegs with investigate one case similary with ours. Customer have offices in many country from Europe .
They involve Azure Networking Team and some PG expert. After they took many logs from WVD and from Network they discovered the problem it is generated by NAT Gateway network device. They recomanded to customer to disable Nat Gateway . Now they wait today response from customer about the last status ."
- ckuhtz | Oct 02, 2020 | Former Employee
Marco Brouwer Thank you. Can you please open a case with support so we can troubleshoot and properly root cause the issue? Thank you!
- ckuhtz | Oct 02, 2020 | Former Employee
stefanpeters2020 et al.
Hi there, I'm the product manager for NAT gateway, and the statement below is not accurate. There is an active investigation going on as to what is causing the issue. This investigation has not concluded. There is no confirmed, root-caused incompatibility at this time. Please follow up with your support contacts if you have further questions, or if you see this type of issue with this setup, so we can properly diagnose and root-cause it. When you do contact support, it is beneficial to have simultaneous packet captures at source and destination where we can see a repro. I apologize for the trouble you guys are having and appreciate your cooperation as we work through resolution. Thank you!
- Marco Brouwer | Sep 29, 2020 | Brass Contributor
Hi all,
I just did some more digging and testing in our customer's environment.
- Connected some thin clients directly to a 4G LTE router. Same problems. This rules out the customer's normal company network / router / ISP. Over this 4G LTE connection, I can RDP to somewhere else perfectly.
- Using WVDv1 / v2 makes no difference. Tried moving one WVD host to WVDv2, same trouble.
- When a user experiences disconnects, only about half of these are logged by Log Analytics, with error code "-2146233088". The other times are not logged, so it seems.
- User A works on an IGEL thin client (officially supported by MS!) at the office, but on a Windows 10 laptop from home. He reports that from home he experiences no trouble. Maybe the IGEL client for WVD is more "sensitive" to small hiccups and this causes the user to experience a disconnect?
This is what Log Analytics reports:
- Roger1175 | Sep 24, 2020 | Brass Contributor
stefanpeters2020 thanks again for your feedback.
I tried getting around using a NAT by assigning the static PIP directly to the NIC of one of the WVD session hosts. Unfortunately, I'm still seeing no improvement in the disconnects.
At this point, I'm convinced the issue is not with Microsoft but with the stability of our users' Internet connections. Though I do think that the WVD service does a really poor job dealing with a connection that is not 100% stable. For example, when I was testing last night after making the change to no longer use the NAT, I was connected to the WVD host and then tried downloading a large file on my local network. In the 10 minutes that the file was downloading, I was disconnected from the WVD host over 40 times and the service was unusable.
This also lines up with what we are seeing: the disconnects seem to hit random users from day to day, while other users have no issues at all. All of our users are working from home, so they all have different network setups that are not controlled by the business. We are not using a proxy server, so I do not know what the "PROXY_ERR_INVALIDCA" error means, and I am unable to find anything about it from Microsoft.
I will likely put in a ticket with Microsoft about the issue but I'm not sure anything will come of it. Thanks again for all of your help.
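For anyone comparing test runs in this thread, it may help to normalize disconnect counts to a per-hour rate: "over 40 times in 10 minutes" and "4-6 times per hour" are measured over different windows. Below is a minimal Python sketch under stated assumptions. The function name `disconnects_per_hour`, the start time, and the evenly spaced timestamps are all made up for illustration, and 40 is used as a lower bound for "over 40".

```python
from datetime import datetime, timedelta

def disconnects_per_hour(timestamps, window_start, window_end):
    """Return disconnects per hour for events inside [window_start, window_end]."""
    seconds = (window_end - window_start).total_seconds()
    if seconds <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = [t for t in timestamps if window_start <= t <= window_end]
    return len(in_window) * 3600 / seconds

# Illustrative data only: 40 disconnects, one every 15 seconds,
# during a 10-minute download. The start time is an arbitrary placeholder.
start = datetime(2020, 9, 23, 20, 0)
events = [start + timedelta(seconds=15 * i) for i in range(40)]

rate = disconnects_per_hour(events, start, start + timedelta(minutes=10))
print(rate)  # 240.0 disconnects per hour
```

On these assumed numbers, the saturated-link test works out to at least 240 disconnects per hour, far above the 4-6 per hour reported elsewhere in the thread for the NAT Gateway lab test, which suggests the two tests may be exposing different problems.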
- stefanpeters2020 | Sep 22, 2020 | Copper Contributor
Roger1175 can you test the environment without any NAT solution? At our customer site we are currently working without NAT. The Ubuntu NAT did work in my lab; I tested it for over an hour (about 5 weeks ago). At that time the normal Azure NAT Gateway gave me reconnection problems 4-6 times per hour, so testing for over an hour seemed to be enough.
If your environment is stable without any NAT, then you know WVD is working OK. If not, then the client network may not be stable enough. At one client we had double trouble: the Azure NAT Gateway was causing problems, and the Fortigate firewall was causing similar problems as well. We bypassed the Fortigate with a Draytek NAT router, and all WVD clients are stable. We raised a support ticket with Fortigate but they still have not resolved the issue.
Does the client error "PROXY_ERR_INVALIDCA" indicate that you are using a proxy server in your client network? It is best to use cloud services without any proxy server, and never to use protocol inspection such as HTTPS inspection.
If all else fails you can also open a support ticket at Microsoft.
Good luck!
Stefan
- Roger1175 | Sep 22, 2020 | Brass Contributor
stefanpeters2020 thanks again for your suggestion.
I followed your guide to create the Ubuntu NAT server last night and already put it to use for today. So far, I am not seeing much (if any) improvement in the disconnects for users. In the Log Analytics logs I'm mostly seeing one of the following two client errors, which then leads to the following RDGateway error.
Client Errors:
PROXY_ERR_INVALIDCA
CM_ERR_MISSED_HEARTBEAT
RDGateway Error:
ConnectionFailedClientDisconnect
I'm not sure what else to try at this point.
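One way to see whether these codes cluster on particular users is to tally them from an export of the error rows. This is a minimal Python sketch, not a Log Analytics query: the record layout (`user`, `source`, `code`), the sample rows, and the helper names are assumptions for illustration only. Only the three error codes come from the post above.

```python
from collections import Counter

# Hypothetical export of connection-error rows. The field names and the
# user names are illustrative assumptions, not a real Log Analytics schema.
records = [
    {"user": "user1", "source": "Client", "code": "PROXY_ERR_INVALIDCA"},
    {"user": "user1", "source": "Client", "code": "CM_ERR_MISSED_HEARTBEAT"},
    {"user": "user2", "source": "RDGateway", "code": "ConnectionFailedClientDisconnect"},
    {"user": "user1", "source": "RDGateway", "code": "ConnectionFailedClientDisconnect"},
]

def tally_errors(rows):
    """Count how often each (source, code) pair occurs."""
    return Counter((row["source"], row["code"]) for row in rows)

def errors_per_user(rows):
    """Count how many error rows each user produced."""
    return Counter(row["user"] for row in rows)

if __name__ == "__main__":
    for (source, code), count in tally_errors(records).most_common():
        print(f"{source:10} {code:35} {count}")
    for user, count in errors_per_user(records).most_common():
        print(f"{user:10} {count}")
```

If a handful of users account for most rows, that would point toward individual client networks. An even spread would point more toward something shared, such as the outbound path of the session hosts.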
- Roger1175 | Sep 21, 2020 | Brass Contributor
stefanpeters2020 sorry I missed that before, thank you for sharing! I will try implementing this solution and will report back with my findings in a few days.
- stefanpeters2020 | Sep 21, 2020 | Copper Contributor
leoatap NAT gateway is currently not compatible with WVD Pool. This is confirmed by Microsoft support. The load balanced method doesn't work either. Marco Brouwer can tell you about that.
I have written a blog about this issue:
https://www.microcloud.nl/azure-nat-revisited/
Roger1175 you could try this:
This is a method I used before Azure NAT Gateway existed and seems to be working fine if you need a static public IP for your entire pool.
https://www.microcloud.nl/azure-nat-with-ubuntu-linux/
Also I mentioned these blogs before 🙂
- Roger1175 | Sep 21, 2020 | Brass Contributor
Marco Brouwer We recently switched over from using the NAT Gateway to a Standard Load Balancer and we saw no improvement in the disconnects. As much as I would love to, we too require a static outbound IP and are unable to do any testing without it.
This is an incredibly frustrating issue and our users are not very happy when they get disconnected multiple times a day. It would be great to hear something here from Microsoft about this issue.
- Marco Brouwer | Sep 21, 2020 | Brass Contributor
Did you test without NAT gateway? Did this solve the issue?
We are not using a NAT gateway, but a standard load balancer with an outbound NAT rule.
I have not tested disabling this myself because our customer needs a static outgoing IP.
Anyone else using load balancer instead of NAT gateway and experiencing the same issues?