Troubleshooting Guide for Virtual Machine Connectivity (RDP/SSH)
Published Oct 09 2023 12:40 PM
Microsoft

Azure Lab Services requires RDP and SSH access to connect to the lab template and student virtual machines (this applies whether using advanced networking or not). A common set of issues and pitfalls affects connectivity for students and teachers to their lab resources. Fundamentally, the service needs an unobstructed network path from the individual's device (Chromebook, iPad, PC, Mac, etc.) to their virtual machine in Azure to complete the connection. Many things along that path can interfere. The troubleshooting steps below cover areas to check throughout the path of connectivity.

Other Troubleshooting Guides 

There is an Azure troubleshooting guide with useful information: Cannot connect with RDP to a Windows VM in Azure - Virtual Machines. However, Azure Lab Services is a managed offering where some of the backing resources for a lab are not directly accessible, so not every step in that guide can be completed.

All Labs Troubleshooting Items 

The troubleshooting items in this section apply to all labs, whether the lab uses advanced networking or standard networking. Additional pitfalls specific to advanced networking are listed in the advanced networking section at the end.

Connection is too slow 

A slow RDP connection is one of the more difficult problems to debug and fix. The first action should be to quantify the RDP connection speed. The post How to ensure the best RDP experience for lab users uses the PsPing utility to measure the response time to the machine, and also covers different methods to improve RDP performance.  If the connection experience seems slow, a first step may be to adjust the client experience settings to reduce the volume of data being transmitted.  The next step is to determine the scope of the problem. Answering the following questions should help:

  • Is it a specific machine? 
  • Is it a specific lab? 
  • Is there a VPN being used? 
  • Is it slow on a specific network? 
  • Is there a firewall on the network? 
  • Is it slow with a specific ISP? 

I’ll walk through each of these questions, with suggestions for ways to improve the experience.

A specific machine is slow: We usually don’t see this type of slowdown on a consistent basis. More commonly, the cause is the lab, the network, or the student's ISP.  In Azure Lab Services, teachers and students can use the redeploy capability to change the backing infrastructure (VM, network, etc.), which could improve the experience.

Specific lab is slow: If all the machines in a lab are running slow, there are a few ways to troubleshoot the connection:

  • Check the region that the lab is created in relative to where your students are. The further away the region, the greater the chance of a slow connection.  You can test this by creating another lab in a closer region and checking whether connections to virtual machines in the new lab are faster than in the current lab.
  • Check the lab machine size against the software used in the class.  If all the machines are “slow”, the machine may be undersized for the software/class type, and the slowdown may not be the connection speed at all.  You can test this by creating a lab with more CPU or RAM than the current lab and verifying performance.

VPN is being used: A good troubleshooting step is to turn off the VPN to see if that improves the connection speed. If the VPN is the cause and it is required, review the VPN settings and configuration to see whether RDP or SSH connections can be “passed through”, and to make sure connections aren’t routed to distant regions or routed incorrectly.

On a specific network/firewall: Any network, from an enterprise-level network to a student's home router/Wi-Fi combination, can impact connectivity to Azure Lab Services.  For example, we’ve seen students’ home routers with built-in firewalls that block or limit RDP/SSH connections.  Check whether a firewall is enabled on the network and whether it is configured to limit RDP/SSH connections.  Specific details are in the section on outbound firewalls below.

On a specific internet service provider: OK, this is a difficult one to test, as most people don’t have two ISPs to connect to. If the slowdown is on a specific network and you’ve exhausted all the other options, you may want to contact your ISP to see if they place any limits on RDP/SSH connections.
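To put numbers on the questions above, the connection time can be sampled from different machines, networks, or labs and compared. The following is a minimal Python sketch, not the method used in the referenced post: it assumes that the time to complete a TCP handshake to the RDP/SSH port is a reasonable proxy for network round-trip time. The function names are illustrative, and the host and port must come from the lab's own connection details.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, attempts=5, timeout=3.0):
    """Return TCP connect times in milliseconds.

    Failed probes (timeout, refusal, DNS failure) are skipped, so the
    list may be shorter than `attempts`.
    """
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            pass  # treat as a lost probe
    return samples


def summarize(samples, attempts):
    """Summarize probes: lost count plus min/avg/max in milliseconds."""
    if not samples:
        return {"lost": attempts, "min_ms": None, "avg_ms": None, "max_ms": None}
    return {
        "lost": attempts - len(samples),
        "min_ms": min(samples),
        "avg_ms": statistics.mean(samples),
        "max_ms": max(samples),
    }
```

Running `summarize(tcp_connect_times(host, port, 5), 5)` once with the VPN on and once with it off, or from the campus network and from a home network, gives comparable figures for narrowing down where the delay is introduced.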

 

Virtual Machine not running 

Commonly when the students get the message, “Remote Desktop can’t connect to the remote computer …  Make sure the remote computer is turned on and connected to the network, and that remote access is enabled”, the virtual machine that the students are trying to connect to hasn’t completely started yet. 

PeterHauge_0-1696879732851.png

 

 

There are a few different techniques and adjustments that may help with this type of problem. The first step is to open the Lab portal (https://labs.azure.com) and check that the virtual machine shows as running.

Not running: 

PeterHauge_1-1696879732854.png

Starting (the virtual machine cannot be connected to yet):

PeterHauge_2-1696879732854.png

Running: 

PeterHauge_3-1696879732855.png

Teacher’s view of all VMs (stopped, starting & running)  

PeterHauge_4-1696879732856.png

The student can start the virtual machine from their lab portal.  It may take between 2 and 5 minutes to get the machine fully running.

Adjusting the lab automatic shutdown settings may improve the student connection experience.  As turning on and off the virtual machine takes time, adjusting the settings may decrease the chances of the student trying to connect while the machine is changing state.  While the automatic shutdown settings are part of a cost savings strategy, they may need to be adjusted to improve the student experience. 

  • Shut down idle virtual machines: If the duration is too short, there may not be enough time between when the student starts the machine and when they connect. The virtual machine may also be shut down while the student is not active (during in-classroom learning, for example).
  • Shut down virtual machines when users disconnect: If the time delay is too small, an accidental disconnect can trigger a shutdown. Students will then need to start the virtual machine again to connect.
  • Shut down virtual machines when users do not connect: If the duration is too short and students do not connect in time, the virtual machine will be shut down.  This can affect students who start the virtual machine themselves as well as labs that use schedules.  Changing the idle setting to a longer duration is an option but has potential cost implications.  If schedules are being used, the virtual machines can be started closer to when the class time starts.

Outbound from the School/University/Enterprise/Home Network (local Firewalls) 

The network can be a point of interference when firewalls, switches, routers, or other network appliances block or limit the RDP/SSH ports (3389/22).  There are a couple of points where this can happen: at the school, or on the local network router at the student’s home.  For example, there have been situations where modern routers, especially Wi-Fi 6 models, block or restrict RDP/SSH connections by default.  We also see campus firewalls restricting outbound RDP/SSH connections to the internet.  If you can’t remove the restrictions in general, you can usually add an exemption for the lab public IP address.  You’ll need to get the IP address for each lab and add those IP addresses to the allow list for the firewall or router.  We also see cases where the operating system firewall (Windows and Linux) restricts outbound RDP/SSH access.  Consult the operating system firewall documentation to enable RDP/SSH connectivity.
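A quick way to see whether something on the path is interfering is to attempt a plain TCP connection to the lab VM's RDP/SSH port and look at how it fails. The sketch below is a rough Python heuristic under stated assumptions: a timeout often points to a device silently dropping packets, while a refusal usually means the host was reached but nothing is listening. Some firewalls actively reject connections, so a refusal does not rule one out. The function name and result labels are illustrative.

```python
import socket


def check_port(host, port, timeout=3.0):
    """Classify an outbound TCP connection attempt.

    Returns one of:
      "open"        - handshake completed; the path and port are reachable
      "refused"     - the connection was actively rejected
      "timeout"     - no answer; often a firewall silently dropping packets
      "unreachable" - DNS failure or another network error
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"
```

Comparing the result from inside the school network with the result from a different network (a phone hotspot, for example) helps separate a local firewall problem from a problem with the virtual machine itself.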

Misconfigurations on the Student Virtual Machine 

When students are administrators on their virtual machine, they can make system changes including the network configuration.  The student may accidentally change the network configuration in the operating system causing connectivity issues (and virtual machine startup issues).  Some examples of these misconfigurations are: 

  • Changing the IP address to a static IP instead of a dynamic IP
  • Disabling DHCP (preventing the machine from automatically getting an IP address)
  • Specifying DNS servers (these should not be specified on the virtual machine.  If custom DNS is needed, please use Advanced Networking and specify custom DNS servers on the virtual network)

Students running as administrators can also update the local user groups and permissions which could inadvertently block the ability to connect to the machine.  

To proactively prevent these types of mistakes, the template can be set up with a script that runs on machine shutdown and resets the network configuration.  Otherwise, students or teachers would need to reimage their virtual machine, which will get it back to a good state.

Username/password provided doesn’t work on Student VMs when logging in 

Students may be unable to connect to their machines with the username and password for the lab, and instead see the message “Your credentials did not work”:

PeterHauge_5-1696879732858.png

 

 

There are a couple of different reasons that this may occur: 

  1. Student using wrong credentials: Please confirm that the student is using the correct username and password for the lab.  If the lab was created with “Use same password for all virtual machines” enabled, then the username and password should be the same for every student.
  2. Student forgot their password:  If they have a custom password and have forgotten it, the student can reset the password on the machine from the lab portal. Alternatively, the student can reset the machine, but any user data will be deleted and cannot be retrieved.
  3. Shared Password + Azure Compute Gallery image:  If students can’t log in using the common lab username and password and the lab was created from an existing custom image, this may be caused by a known limitation.  The workaround is to use the username and password that were set when the image was created, or to reset the password.
  4. Virtual Machine was compromised: There are situations where a student password could be fraudulently changed by a bad actor.  The student can reset their password to regain access to the machine, but here are some suggestions to reduce the likelihood of this happening:
       • Do not use common passwords: uncheck the use same password option when creating the lab.  Having individual passwords limits the impact if one password is compromised.
       • Use strong passwords and keep them secure.
       • Restrict access to the lab, so that only those students that are in the class can access the machines. By default, the lab is restricted.
  5. Remote Desktop Gateway configured: The remote desktop client the students are using may have a Remote Desktop Gateway configured.  If so, they would need to enter their gateway credentials first (to authenticate to the gateway) before connecting to their student VM.  (NOTE: This is not common.)

Confirm no lab deployment issues 

While not directly related to RDP/SSH connection troubleshooting, if the lab has a failure the machine connections may not work properly. The Azure activity log is the most comprehensive list of events and results.  Commonly, the activity log will be filtered on the resource group that the lab is located in. The events may take a few minutes to be available in the log.  These event logs will contain more detailed information that can be used for troubleshooting and should be included if a support ticket needs to be created. 
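When the activity log for the lab's resource group has been exported as JSON, the failed events can be pulled out for a support ticket with a few lines of Python. This is a hedged sketch: the field names (`status.value`, `operationName.value`, `eventTimestamp`, `resourceId`) are assumptions about the export format and should be adjusted to match the actual file. The function name is illustrative.

```python
import json


def failed_events(activity_log_json):
    """Return sorted (timestamp, operation, resource id) tuples for failed events.

    Assumes the input is a JSON array of event objects; the field names
    used here are assumptions about the exported activity log format.
    """
    failures = []
    for event in json.loads(activity_log_json):
        status = (event.get("status") or {}).get("value", "")
        if status.lower() == "failed":
            operation = (event.get("operationName") or {}).get("value", "")
            failures.append(
                (event.get("eventTimestamp", ""), operation, event.get("resourceId", ""))
            )
    return sorted(failures)
```

Sorting by timestamp puts the earliest failure first, which is usually the most useful starting point when several dependent operations fail in sequence.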

Unable to connect an outgoing VPN from a Student VM 

If students are attempting to use a VPN connection initiated from a student VM (to the campus/university network for example) and the VPN fails to connect, this is most likely due to the VPN having issues with the Azure Lab Services network configuration.  Please open an Azure Support Ticket to get help from Microsoft on resolving this. 

 

Advanced Networking Troubleshooting Items 

The list below contains troubleshooting items that apply to advanced networking scenarios only. 

Missing a Network Security Group 

When troubleshooting a lab plan that has advanced networking configured, one of the first checks is to confirm that the lab services network subnet has a network security group associated with it.  The network security group is what allows the RDP/SSH connections through.  Without a network security group, all connections to the virtual machines (template VM and student VMs) are blocked.

Using Azure Virtual Machine RDP Troubleshooting 

There are unique troubleshooting techniques for labs that are configured with advanced networking.  Advanced networking enables additional troubleshooting because you can create an Azure Virtual Machine connected directly to the virtual network that the lab plan is connected to. Using this Azure VM (outside of Azure Lab Services), you can follow the Azure Virtual Machine RDP Troubleshooting guide, including the in-Azure connection troubleshooter, to determine if the network is configured correctly.  If you’re still unable to connect to the virtual machine, please see the following section.

NSG Rules are blocking RDP/SSH connections 

Using the Azure VM that is connected directly to the virtual network (from the previous section), you can diagnose virtual machine network connectivity directly in the Azure Portal.  Security rules that block or limit RDP/SSH connections can be applied at the subnet with a Network Security Group or through Azure Virtual Network Manager.  The easiest way to see the full list of rules is via the Azure Virtual Machine network effective security rules.
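When reading a list of security rules, it helps to remember that network security group rules are evaluated in priority order (lowest number first) and the first matching rule decides the outcome. The Python sketch below illustrates only that ordering for inbound traffic arriving from the internet. It is heavily simplified: it matches on destination port alone, ignores source, destination, and protocol, and models only the final built-in deny rule. The `Rule` type and the rule names in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int   # lower number = evaluated first
    access: str     # "Allow" or "Deny"
    ports: tuple    # (low, high) inclusive destination port range


# Simplified stand-in for the built-in rule that denies all other inbound traffic.
DENY_ALL_INBOUND = Rule("DenyAllInBound", 65500, "Deny", (0, 65535))


def evaluate_inbound(rules, port):
    """Return the first rule, in priority order, that matches the port."""
    for rule in sorted(list(rules) + [DENY_ALL_INBOUND], key=lambda r: r.priority):
        low, high = rule.ports
        if low <= port <= high:
            return rule
    raise AssertionError("unreachable: DenyAllInBound matches every port")
```

For example, with a hypothetical `Rule("BlockLowPorts", 900, "Deny", (0, 1023))` and `Rule("AllowSsh", 1000, "Allow", (22, 22))`, port 22 is decided by the deny rule because it has the lower priority number, even though an allow rule for SSH exists.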

Default User Defined Route (Route table problem) 

Advanced networking allows the network to be customized as needed, including modifying the route table.  A user-defined route table directs traffic to the appropriate destinations.  There is a special route, the “internet route” (0.0.0.0/0), which directs traffic not bound for another local address to the Internet.  Azure Lab Services advanced networking does not support changing the ‘next hop’ for the 0.0.0.0/0 route to anything except the Internet.  Changing it to a specific IP address (for example, directing outbound internet traffic to a firewall or other network appliance) will break connectivity to the lab by introducing an asymmetric routing issue.  When debugging issues, check for a custom route table and make sure that the 0.0.0.0/0 route has the Internet as its next hop.
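The default-route check described above can be expressed as a small Python function over a list of routes. This is a sketch under assumptions: each route is a dict with `addressPrefix` and `nextHopType` keys (plus an optional `name`), and `"Internet"` is the next hop type that the text requires for 0.0.0.0/0. The function name and the sample route names in the test data are hypothetical.

```python
import ipaddress


def default_route_problems(routes):
    """Describe every 0.0.0.0/0 route whose next hop is not the Internet.

    `routes` is a list of dicts with 'addressPrefix' and 'nextHopType'
    keys. Prefixes that are not CIDR blocks are skipped.
    """
    problems = []
    for route in routes:
        try:
            prefix = ipaddress.ip_network(route["addressPrefix"], strict=False)
        except ValueError:
            continue  # not a CIDR prefix; cannot be the default route
        is_default = prefix.version == 4 and prefix.prefixlen == 0
        if is_default and route["nextHopType"] != "Internet":
            name = route.get("name", "<unnamed>")
            problems.append(f"{name}: 0.0.0.0/0 -> {route['nextHopType']}")
    return problems
```

An empty list means no user-defined default route redirects internet-bound traffic. Any entry returned points at a route that would cause the asymmetric routing issue described above.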

 

Contact Microsoft for Help 

If all else fails and none of the troubleshooting items above helped, please open an Azure Support Ticket to get help from Microsoft on resolving the connection issues. 

 
