Dec 12 2019 11:59 AM - edited Dec 12 2019 12:06 PM
Hello, we have been experiencing random but persistent disconnects from our WVD pool. We have roughly 10 users and have been getting different Event Viewer logs when they disconnect. We have thin clients on Windows 10 version 1607. When users disconnect, it happens multiple times per day; however, on some days they do not disconnect at all. Attached are the Event Viewer logs.
Sep 21 2020 12:11 PM
@Marco Brouwer We recently switched over from using the NAT Gateway to a Standard Load Balancer and we saw no improvement in the disconnects. As much as I would love to, we too require a static outbound IP and are unable to do any testing without it.
This is an incredibly frustrating issue and our users are not very happy when they get disconnected multiple times a day. It would be great to hear something here from Microsoft about this issue.
Sep 21 2020 12:13 PM - edited Sep 21 2020 12:14 PM
@leoatap NAT gateway is currently not compatible with WVD Pool. This is confirmed by Microsoft support. The load balanced method doesn't work either. Marco Brouwer can tell you about that.
I have written a blog about this issue:
@Roger1175 you could try this:
This is a method I used before Azure NAT Gateway existed and seems to be working fine if you need a static public IP for your entire pool.
Also I mentioned these blogs before :)
Sep 21 2020 12:20 PM
@stefanpeters2020 sorry I missed that before, thank you for sharing! I will try implementing this solution and will report back with my findings in a few days.
Sep 22 2020 12:22 PM
@stefanpeters2020 thanks again for your suggestion.
I followed your guide to create the Ubuntu NAT server last night and already put it to use today. So far, I am not seeing much (if any) improvement in the disconnects for users. I'm mostly seeing one of the two following errors in the Log Analytics logs for the client, which lead to the following RDGateway error.
I'm not sure what else to try at this point.
Sep 22 2020 11:23 PM
@Roger1175 can you test the environment without any NAT solution? At our customer site we are currently working without NAT. The Ubuntu NAT did work in my lab. I tested it for over an hour (about 5 weeks ago). At that time the normal Azure NAT Gateway gave me reconnection problems 4-6 times per hour, so testing for over an hour seemed to be enough.
If your environment is stable without any NAT, then you know WVD itself is working OK. If not, the client network may not be stable enough. At one client we had double trouble: the Azure NAT Gateway was causing problems, but the Fortigate firewall was also causing similar problems. We bypassed the Fortigate with a Draytek NAT router, and all WVD clients are stable. We raised a support ticket with Fortigate, but they still have not resolved the issue.
Does the client error "PROXY_ERR_INVALIDCA" indicate that you are using a proxy server in your client network? It is best to use cloud services without any proxy server, and never to use protocol inspection such as HTTPS inspection.
If all else fails you can also open a support ticket at Microsoft.
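To compare setups (with NAT, without NAT, Ubuntu NAT) on more than a gut feeling, it can help to count disconnects per hour. Below is a minimal Python sketch, assuming you have already collected the disconnect timestamps yourself, for example from Event Viewer or a Log Analytics export. The function name and the sample timestamps are made up for illustration and are not from any of the environments in this thread.

```python
from collections import Counter
from datetime import datetime


def disconnects_per_hour(timestamps):
    """Return a dict mapping each clock hour to the number of disconnects in it."""
    # Truncate every timestamp to the start of its hour, then count per bucket.
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    # Sort by hour so the result reads chronologically.
    return dict(sorted(buckets.items()))


# Hypothetical sample data: four disconnects for one user on one morning.
events = [
    datetime(2020, 9, 22, 9, 5),
    datetime(2020, 9, 22, 9, 17),
    datetime(2020, 9, 22, 9, 48),
    datetime(2020, 9, 22, 10, 2),
]

for hour, count in disconnects_per_hour(events).items():
    print(hour.strftime("%Y-%m-%d %H:00"), count)
```

Running the same count for a test window with and without the NAT in place gives a like-for-like comparison, as long as the windows are similar in length and user activity.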
Sep 24 2020 12:16 PM
@stefanpeters2020 thanks again for your feedback.
I tried getting around using a NAT by assigning the static PIP directly to the NIC of one of the WVD session hosts. Unfortunately, I'm still seeing no improvement in the disconnects.
At this point, I'm convinced the issue is not with Microsoft but with the stability of our users' Internet connections. That said, I do think the WVD service does a really poor job of dealing with a connection that is not 100% stable. For example, when I was testing last night after removing the NAT, I was connected to the WVD host and then tried downloading a large file on my local network. In the 10 minutes that the file was downloading, I was disconnected from the WVD host over 40 times and the service was unusable.
This also lines up with what we are seeing: the disconnects hit random users from day to day while other users have no issues at all. All of our users are working from home, so they all have different network setups that are not controlled by the business. We are not using a proxy server, so I do not know what the "PROXY_ERR_INVALIDCA" error means, and I am unable to find anything about it from Microsoft.
I will likely put in a ticket with Microsoft about the issue but I'm not sure anything will come of it. Thanks again for all of your help.
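For anyone who wants to check a home connection in the same way, here is a rough Python sketch that repeatedly opens a TCP connection to a host and records which attempts fail. It is only an approximation: a TCP connect probe is not an RDP session, so a clean run does not prove WVD will be stable. The function names, the defaults, and the host in the usage comment are all assumptions for illustration.

```python
import socket
import time


def probe(host, port=443, attempts=5, interval=1.0, timeout=3.0,
          connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection `attempts` times.

    Returns a list with the connect time in seconds for each successful
    attempt and None for each failed one. `connect` and `sleep` are
    injectable so the function can be exercised without a network.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            conn = connect((host, port), timeout=timeout)
            conn.close()
            results.append(time.monotonic() - start)
        except OSError:
            # Covers timeouts, refused connections and DNS failures.
            results.append(None)
        if i < attempts - 1:
            sleep(interval)
    return results


def failure_ratio(results):
    """Fraction of attempts that failed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)


# Example call (hypothetical host, not run here):
# results = probe("example.com", attempts=60, interval=5.0)
# print(f"{failure_ratio(results):.0%} of attempts failed")
```

Running it once while the line is idle and once during a large download, like the test described above, would show whether the connection itself drops under load.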
Sep 29 2020 03:49 AM
I just did some more digging and testing in our customer's environment.
- Connected some thin clients directly to a 4G LTE router. Same problems. This rules out the customer's normal company network / router / ISP. Over this 4G LTE connection, I can RDP to somewhere else perfectly.
- Using WVDv1 / v2 makes no difference. Tried moving one WVD host to WVDv2, same trouble.
- When a user experiences disconnects, only about half of them are logged by Log Analytics, with error code "-2146233088". The rest do not seem to be logged at all.
- User A works on an IGEL thin client (officially supported by MS!) at the office, but on a Windows 10 laptop from home. He reports that from home he experiences no trouble. Maybe the IGEL client for WVD is more "sensitive" to small hiccups, and this causes the user to experience a disconnect?
This is what Log Analytics reports:
Oct 02 2020 01:10 PM
@stefanpeters2020 et al.
Hi there, I'm the product manager for NAT gateway, and the statement below is not accurate. There is an active investigation going on as to what is causing the issue. This investigation has not concluded. There is no confirmed, root-caused incompatibility at this time. Please follow up with your support contacts if you have further questions or see this type of issue with this setup, so we can properly diagnose and root-cause it. When you do contact support, it is beneficial to have simultaneous packet captures at source and destination where we can see a repro. I apologize for the trouble you guys are having and appreciate your cooperation as we work through resolution. Thank you!
Oct 02 2020 01:28 PM
@Roger1175 Hi Roger, please do open a case so we can root cause and investigate. This needs to be properly root caused. Thank you!
Oct 02 2020 01:30 PM
@Marco Brouwer Thank you. Can you please open a case with support so we can troubleshoot and properly root cause the issue? Thank you!
Oct 03 2020 05:07 AM
Thank you for your reply. We have confirmed that the NAT Gateway causes problems with WVD if the WVD session hosts' internet breakout is routed over the NAT Gateway. I have tested this in my lab and the reconnection problem occurred within 5-10 minutes. Proof of that is published on my personal blog page: https://www.microcloud.nl/azure-nat-revisited/
If, however, there is a special configuration that allows the NAT Gateway to be used without the reconnection problems, I welcome that. At the time of writing the blog post there was no special configuration available from Microsoft, nor was there any information about whether or not the NAT Gateway is compatible with WVD session hosts.
In our customer situation we also created a ticket and captured network packets for what felt like forever. After we removed the NAT Gateway, the issue was resolved except for 1 location. That location had double trouble because the Fortigate firewall was causing similar problems. We have replaced the Fortigate with a Draytek NAT router. Since then, none of this customer's sites have reconnection problems.
Microsoft confirmed in our open ticket that something is going on.
I am happy to discuss this in a Teams call if you like.
Here is a quote from an email I received from Microsoft.
"I have some interesting news from some WVD/RDS collegs with investigate one case similary with ours. Customer have offices in many country from Europe .
They involve Azure Networking Team and some PG expert. After they took many logs from WVD and from Network they discovered the problem it is generated by NAT Gateway network device. They recomanded to customer to disable Nat Gateway . Now they wait today response from customer about the last status ."
Jan 26 2021 12:58 PM
We've had this issue since the week after we launched our WVD environment. The clients (end users' PCs, thin clients, etc.) momentarily stop sending heartbeats to the RD Gateway.
We've done A LOT of network tracing and discovered that's what was happening.
The workaround is a GPO that changes registry values on your host pool VMs. I've attached it as a PDF.
Basically, you need to add registry keys that increase the heartbeat intervals so that the RD Gateway stays connected even if the client drops some heartbeats. Dropped heartbeats are what cause the RD Gateway to disconnect. Hope this helps!
Jan 27 2021 05:44 PM
Jan 27 2021 05:57 PM
@troyjones thank you for sharing! Unfortunately, this did not seem to make any improvement for us. In fact, one of our macOS users started getting reconnects every minute or so after we implemented this. Deleting the registry keys immediately fixed that issue.
UDP shortpath has been a lifesaver for us in terms of fixing the disconnects primarily. It hasn't completely resolved things but it's significantly more stable now.
Feb 10 2021 04:55 AM
@Roger1175 We got the exact same issue with Mac users when adding these registry keys; the disconnects for them increased massively.
Nov 16 2021 07:02 AM
Nov 17 2021 01:46 AM