VMware HCX Troubleshooting with Azure VMware Solution
Published Apr 08 2024 03:34 PM
Microsoft

Overview

VMware HCX is one of the Azure VMware Solution components that generates a large number of service requests from our customers. The Azure VMware Solution product group has compiled the most common troubleshooting considerations you should know about when using VMware HCX with the Azure VMware Solution.

 

Azure VMware Solution is a VMware-validated, first-party Azure service from Microsoft that provides private clouds containing VMware vSphere clusters built from dedicated bare-metal Azure infrastructure. It enables customers to leverage their existing investments in VMware skills and tools, so they can focus on developing and running their VMware-based workloads on Azure.

 

VMware HCX is the mobility and migration software used by the Azure VMware Solution to connect remote VMware vSphere environments to the Azure VMware Solution. These remote VMware vSphere environments can be on-premises, colocation, or cloud-based instances.

 


Figure 1 – Azure VMware Solution with VMware HCX Service Mesh

 

In the next section, I will introduce the architectural components of the Azure VMware Solution.

 

Architectural Components

The diagram below describes the architectural components of the Azure VMware Solution.

 


Figure 2 – Azure VMware Solution Architectural Components

 

Each Azure VMware Solution architectural component has the following function:

 

  • Azure Subscription: Used to provide controlled access, budget and quota management for the Azure VMware Solution.
  • Azure Region: Physical locations around the world where we group data centers into Availability Zones (AZs) and then group AZs into regions.
  • Azure Resource Group: Container used to place Azure services and resources into logical groups.
  • Azure VMware Solution Private Cloud: Uses VMware software, including vCenter Server, NSX software-defined networking, vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
  • Azure VMware Solution Resource Cluster: Uses VMware software, including vSAN software-defined storage, and Azure bare-metal ESXi hosts to provide compute, networking, and storage resources for customer workloads by scaling out the Azure VMware Solution private cloud. Azure NetApp Files, Azure Elastic SAN, and Pure Cloud Block Store are also supported.
  • VMware HCX: Provides mobility, migration, and network extension services.
  • VMware Site Recovery: Provides disaster recovery automation and storage replication services with VMware vSphere Replication. Third-party disaster recovery solutions Zerto DR and JetStream DR are also supported.
  • Dedicated Microsoft Enterprise Edge (D-MSEE): Router that provides connectivity between the Azure cloud and the Azure VMware Solution private cloud instance.
  • Azure Virtual Network (VNet): Private network used to connect Azure services and resources together.
  • Azure Route Server: Enables network appliances to exchange dynamic route information with Azure networks.
  • Azure Virtual Network Gateway: Cross-premises gateway for connecting Azure services and resources to other private networks using IPSec VPN, ExpressRoute, and VNet-to-VNet connections.
  • Azure ExpressRoute: Provides high-speed private connections between Azure data centers and on-premises or colocation infrastructure.
  • Azure Virtual WAN (vWAN): Aggregates networking, security, and routing functions together into a single unified Wide Area Network (WAN).

 

In the next section, I will describe the troubleshooting steps you should follow for VMware HCX when used with the Azure VMware Solution.

 

Troubleshooting Considerations

Before opening a ticket with Microsoft support, please use the following steps as a checklist to ensure you are not impacted by the most common VMware HCX issues.

 

Troubleshooting Step 1: Download the VMware HCX Connector.

 

Once VMware HCX is deployed on the Azure VMware Solution side, the VMware HCX Connector OVA can be downloaded from the VMware HCX UI plugin. Under Administration, select Request Download Link. You can either download the OVA locally or copy a download link for it.

 


Figure 3 – VMware HCX Connector OVA Download

 

Troubleshooting Step 2: Upgrade to HCX Enterprise.

 

Azure VMware Solution comes with an Enterprise license key for VMware HCX. If you have a pre-existing on-premises VMware HCX Connector licensed for VMware HCX Advanced, be sure to upgrade the Connector to the Enterprise edition. To do so, navigate to the VMware HCX Connector at https://<hcx_connector_fqdn>:9443, select Licensing and Activation under the Configuration section, edit the current license, and enter the VMware HCX Enterprise license key obtained from the Azure VMware Solution portal. Verify that the license now shows Enterprise.

 


Figure 4 – VMware HCX Connector License Key

 

Once you have updated the VMware HCX Connector license, be sure to edit the VMware HCX Compute Profile and Service Mesh to include the additional VMware HCX services you would like to take advantage of, such as Replication Assisted vMotion and OS Assisted Migration. OS Assisted Migration is used to migrate and convert Microsoft Hyper-V and Red Hat KVM workloads into Azure VMware Solution.

 


Figure 5 – VMware HCX Connector Compute Profile Service Activation
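
Before editing the Compute Profile, it can help to list which of the services you plan to activate need the Enterprise license. The sketch below is illustrative only: the service names and their edition mapping are our assumption, based on the usual HCX Advanced versus Enterprise feature split, and `services_needing_enterprise` is a hypothetical helper, not an HCX API.

```python
# Hypothetical helper: which planned Compute Profile services need an
# HCX Enterprise license? The edition mapping below is an assumption based
# on the HCX Advanced vs. Enterprise feature split, not an HCX API call.
ENTERPRISE_ONLY = {
    "Replication Assisted vMotion",
    "OS Assisted Migration",
    "Mobility Optimized Networking",
}


def services_needing_enterprise(planned, license_edition):
    """Return the planned services the current license edition cannot activate."""
    if license_edition.lower() == "enterprise":
        return []  # Enterprise covers every service in the mapping above
    return sorted(s for s in planned if s in ENTERPRISE_ONLY)


planned = ["Bulk Migration", "Network Extension", "Replication Assisted vMotion"]
print(services_needing_enterprise(planned, "Advanced"))
# prints ['Replication Assisted vMotion']
```

An empty result after the license upgrade is a quick sign that the Compute Profile can activate everything on your list.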

 

Troubleshooting Step 3: Only use the key from the Azure VMware Solution private cloud you are connecting to.

 

When deploying the VMware HCX Connector on-premises, the activation key should come from the Azure VMware Solution private cloud you are migrating to. In the Azure portal, an activation key can be obtained in the Add-ons section. Request an activation key, give it a friendly name, and use that key to activate the on-premises VMware HCX Connector.

 


Figure 6 – VMware HCX Connector License Key

 

Troubleshooting Step 4: Do not use an IPSec VPN.

 

If possible, avoid using an IPSec VPN connection to Azure VMware Solution for VMware HCX migrations. Migrating with VMware HCX over VPN is known to cause issues and repeated migration failures. Although running VMware HCX over VPN is supported, it is not the recommended way to migrate virtual machines to Azure VMware Solution. One of the biggest caveats is that a separate uplink network profile is needed on-premises: the management network cannot be used as the uplink profile, because the uplink profile MTU must be reduced to 1300 to accommodate the IPSec overhead. Note that VMware HCX already uses IPSec natively as part of the VMware HCX Service Mesh, so running it over an IPSec VPN adds a second layer of encapsulation.
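
One way to confirm that a VPN path really carries 1300-byte uplink packets is a ping with the don't-fragment bit set, sized so that the whole IPv4 packet equals the target MTU. The sketch below only does the arithmetic and assumes IPv4 without header options; the `ping -M do -s` syntax in the comment is for Linux iputils, and `<peer IP>` is a placeholder for the remote uplink address.

```python
# Compute the ICMP payload size for a don't-fragment ping that tests whether
# a path really carries packets of a given MTU. Assumes IPv4 without options
# (20-byte header) and an 8-byte ICMP echo header.
IPV4_HEADER = 20
ICMP_HEADER = 8


def df_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu < IPV4_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an ICMP echo packet")
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1300, 1500, 9000):
    # On a Linux host with iputils, for example: ping -M do -s <payload> <peer IP>
    print(mtu, df_ping_payload(mtu))
```

If a 1272-byte payload fails with don't-fragment set while smaller payloads succeed, the path cannot carry the 1300-byte uplink MTU and the VPN overhead is larger than the profile allows for.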

 

Troubleshooting Step 5: Check MTU size within your Network Profile.

 

Verify the MTU setting of each Network Profile. In VMware HCX, navigate to the Interconnect section, select Network Profiles, and confirm that the correct MTU size is configured for each profile. Check this on both ends of the VMware HCX site pair.

 


Figure 7 – VMware HCX MTU size in Network Profile

 

Use the recommended Network Profile MTU sizes (in bytes) in the table below when connecting to Azure VMware Solution.

 

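| Connectivity Method       | Management | Uplink | Replication  | vMotion      |
|---------------------------|------------|--------|--------------|--------------|
| Azure ExpressRoute        | 1500       | 1500   | 1500 or 9000 | 1500 or 9000 |
| VMware HCX over IPSec VPN | 1500       | 1300   | 1500 or 9000 | 1500 or 9000 |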

Table 1 – VMware HCX Network Profile MTU Sizes
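
To audit both ends of a site pair, Table 1 can be encoded as data and compared against the MTU values you read from each Network Profile. This is a minimal sketch: the configured values are typed in by hand (no HCX API is assumed here), and `check_profiles` is a hypothetical helper.

```python
# Table 1 encoded as data: allowed MTU values (bytes) per connectivity method
# and network profile type. The dict layout and check_profiles helper are a
# hypothetical sketch, not an HCX API.
RECOMMENDED_MTU = {
    "Azure ExpressRoute": {
        "Management": {1500},
        "Uplink": {1500},
        "Replication": {1500, 9000},
        "vMotion": {1500, 9000},
    },
    "VMware HCX over IPSec VPN": {
        "Management": {1500},
        "Uplink": {1300},
        "Replication": {1500, 9000},
        "vMotion": {1500, 9000},
    },
}


def check_profiles(method, configured):
    """Return (profile, configured MTU, allowed MTUs) for every mismatch."""
    allowed = RECOMMENDED_MTU[method]
    problems = []
    for profile, mtu in configured.items():
        if mtu not in allowed[profile]:
            problems.append((profile, mtu, sorted(allowed[profile])))
    return problems


# Example: an on-premises site connected over VPN with its uplink still at 1500.
onprem = {"Management": 1500, "Uplink": 1500, "Replication": 9000, "vMotion": 1500}
print(check_profiles("VMware HCX over IPSec VPN", onprem))
# prints [('Uplink', 1500, [1300])]
```

Run the same check with the values from the Azure VMware Solution side so that both ends of the site pair are covered.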

 

Troubleshooting Step 6: Always keep your VMware HCX versions updated (Connectors, Cloud Manager and Service Meshes).

 

Before you upgrade VMware HCX, check the VMware product interoperability matrix to ensure that the versions of the on-premises VMware software integrated with VMware HCX are supported by the VMware HCX version you are upgrading to.

 

VMware releases VMware HCX updates regularly. The customer is responsible for upgrading and maintaining VMware HCX on both sides of the Service Mesh (on-premises and Azure VMware Solution). When updating VMware HCX, update the VMware HCX Cloud Managers first. We recommend backing up the VMware HCX Connector before updating.

 

You can back up the VMware HCX Connector through the VMware HCX Manager UI at https://<hcx_connector_fqdn>:9443, using the admin password created when the VMware HCX Connector was deployed. Under the Administration section, go to Backup & Restore, where backups can be taken on demand or scheduled.

 

Optionally, you can take a vSphere snapshot of the VMware HCX Connector on-premises as well.

 


Figure 8 – VMware HCX Connector Backup & Restore

 

Updates for the VMware HCX Cloud Manager can be found in the Administration section: select your current version and click the ‘Check for Updates’ button. If a new version is available, you can download it and update to it. Backups of the VMware HCX Cloud Manager are taken automatically each day.

 


Figure 9 – VMware HCX Upgrades

 

Note that VMware HCX Service Meshes are updated independently of the VMware HCX Cloud Managers and Connectors. Once the VMware HCX Cloud Manager and Connector updates are complete, update the Service Meshes next. Upgrade the Managers and Service Meshes in this order and in the same maintenance window to avoid running mixed-mode versions. Running mixed-mode versions of VMware HCX Cloud Managers, Connectors, and Service Meshes in production is highly discouraged: you can lose certain features, and it often causes issues within the environment.

 


Figure 10 – VMware HCX Manager Service Mesh Update

 

If Network Extension appliances are deployed, a temporary loss of connectivity will occur during the Service Mesh update while the appliances are upgraded. For Network Extension appliances in an HA pair, downtime is a few seconds. Network Extension appliances not in an HA pair incur approximately one minute of downtime.
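
The upgrade order for Step 6 (Cloud Manager first, then Connector, then Service Mesh) can be expressed as a small planning check. This is a sketch under simplifying assumptions: each component is treated as having a single version, the version strings are placeholders you would copy from the UI, and `upgrade_plan` and `is_mixed_mode` are hypothetical helpers rather than anything HCX provides.

```python
# Sketch of the upgrade order described in Step 6: Cloud Manager first, then
# Connector, then Service Mesh appliances. Version strings are illustrative
# placeholders, not values read from HCX.
UPGRADE_ORDER = ["Cloud Manager", "Connector", "Service Mesh"]


def parse_version(version):
    """Turn a dotted version string such as '4.8.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def upgrade_plan(current, target):
    """List the components still below target, in the recommended order."""
    return [c for c in UPGRADE_ORDER
            if parse_version(current[c]) < parse_version(target)]


def is_mixed_mode(current):
    """True when the components are not all on the same version."""
    return len({parse_version(v) for v in current.values()}) > 1


current = {"Cloud Manager": "4.9.0", "Connector": "4.8.2", "Service Mesh": "4.8.2"}
print(upgrade_plan(current, "4.9.0"))  # prints ['Connector', 'Service Mesh']
print(is_mixed_mode(current))          # prints True
```

Comparing tuples rather than strings keeps versions such as 4.10.0 and 4.9.0 in the correct order.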

 

Troubleshooting Step 7: On-Premises Network Connectivity and Firewalls.

 

For VMware HCX to be activated and receive updates, your on-premises firewalls must allow outbound traffic on TCP port 443 to the following websites:

 

 

Your on-premises firewalls must also allow outbound traffic to UDP port 4500. In VMware HCX, UDP port 4500 carries the IPSec communication between VMware HCX components across environments, so it is essential for communication and data transfer between the sites. When configuring VMware HCX, ensure that this port is open between the uplink network profile of your on-premises VMware HCX Connector and the uplink network profile of the Azure VMware Solution HCX Cloud Manager.
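
A quick way to check the TCP 443 rule is a plain TCP connection attempt from a host that uses the same firewall path as the VMware HCX Connector. The sketch below uses only Python's standard `socket` module; `tcp_port_open` is a hypothetical helper, and you supply the host names yourself.

```python
# Minimal TCP reachability check for outbound firewall rules.
# The caller supplies the host names; none are hard-coded here.
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False

# UDP 4500 cannot be tested this way: UDP is connectionless, so sending a
# datagram "succeeds" even if a firewall silently drops it. Verify the UDP
# 4500 rule from the firewall side instead, for example with its logs.
```

For example, call `tcp_port_open(host, 443)` for each website your firewall must allow. A False result points at DNS, routing, or the firewall rule, but a host other than the Connector only approximates the Connector's own network path.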

 

Another common issue we see is that the on-premises VMware HCX Connector cannot reach the VMware HCX activation and entitlement website. A simple way to verify that your on-premises environment can reach it is to SSH into the on-premises VMware HCX Connector and run the curl commands below:

 

 

A successful connection to the above website will look like the figure below.

 


Figure 11 – VMware HCX Connector SSH CURL connectivity test

 

Troubleshooting Step 8: Diagnostics page on the Service Mesh.

 

The VMware HCX Service Mesh includes a built-in option to run diagnostics on the Service Mesh appliances. This is an effective way to verify the health of your Service Mesh and pinpoint specific issues with the appliances.

 

In the VMware HCX Connect user interface, under the Interconnect section, select the Service Mesh you want to run the diagnostics on. Under the “More” link, select Run Diagnostics to perform a health check on the appliances.

 


Figure 12 – VMware HCX Service Mesh Run Diagnostics

 

When the diagnostics test completes, any issues are indicated by a red banner under the Service Mesh name. Click the red alert (!) icon to drill down to the specific issues.

 


Figure 13 – VMware HCX Service Mesh Alert

 

Troubleshooting Step 9: If you are having issues with the source-side interface, reboot the VMware HCX Connector.

 

VMware HCX Connectors can develop issues after running for a long time. If the VMware HCX Connector has been running for an extended period without a reboot, we recommend rebooting it.

 

On the Azure VMware Solution side, customers can reboot the VMware HCX Cloud Manager through the Restart-HCXManager Run Command in the Azure portal. A Force (hard) reboot of the VMware HCX Cloud Manager is also offered. Use it with caution, as it does not check for any active migrations or replications.

 


Figure 14 – Azure VMware Solution Run Command Restart-HCXManager

 

Troubleshooting Step 10: Log in to the VMware HCX Cloud Manager directly.

 

You can log in to the VMware HCX Cloud Manager directly. At times, the VMware HCX plugin in your Azure VMware Solution vSphere Client may be unavailable or fail to open. You can find the IP address of the VMware HCX Cloud Manager in the Azure portal on the Azure VMware Solution resource, in the Add-ons section under Migration using VMware HCX. The address is part of the /22 network you provided when deploying Azure VMware Solution, and it always ends with a .9 octet. Access the manager directly at https://<x.x.x.9>:443 or https://hcx.<guid>.<region>.avs.azure.com.

 


Figure 15 – VMware HCX Cloud Manager Login
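
The .9 rule from Step 10 can be applied mechanically with Python's standard `ipaddress` module. This sketch assumes the .9 address is counted from the first address of the /22 deployment block, as the x.x.x.9 pattern implies; `10.10.0.0/22` is an example block, and `hcx_cloud_manager_ip` is a hypothetical helper. The Azure portal remains the authoritative source for the address.

```python
# Derive the expected VMware HCX Cloud Manager address from the /22 block
# supplied when the private cloud was deployed. Assumes the Cloud Manager
# sits at offset .9 from the first address of that block.
import ipaddress


def hcx_cloud_manager_ip(deployment_block):
    """Return the .9 address of the private cloud's /22 deployment block."""
    network = ipaddress.ip_network(deployment_block, strict=True)
    if network.prefixlen != 22:
        raise ValueError("expected the /22 block used to deploy the private cloud")
    return str(network.network_address + 9)


print(hcx_cloud_manager_ip("10.10.0.0/22"))  # prints 10.10.0.9
```

Using `strict=True` rejects a block typed with host bits set (for example `10.10.1.0/22`), which would otherwise silently point at the wrong address.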

 

Troubleshooting Step 11: Network Extensions are for temporary migration phases, not for permanent use.

 

At its core, VMware HCX is a migration tool. Network Extensions in VMware HCX should be a temporary solution, used during the migration process so that VMs can be migrated into Azure VMware Solution with no downtime. It is best practice to remove the network extensions as soon as the migration waves are completed. Leaving network extensions in place for extended periods of time can cause issues and outages in your environment. Use Network Extensions with caution.

 


Figure 16 – VMware HCX Network Extension

 

Troubleshooting Step 12: If you have Mobility Optimized Networking (MON) enabled, ensure you have the router location set to the correct side.

 

When configuring MON, verify where the default gateway resides. The default gateway is always located on the source side of the network extension, which, when connecting to Azure VMware Solution, is usually the on-premises data center.

 


Figure 17 – VMware HCX Mobility Optimized Networking (MON)

 

Troubleshooting Step 13: OS Assisted Migration – Sentinel Gateway Appliances.

 

When using VMware HCX OS Assisted Migration, it is important to maintain and manage the VMware HCX Sentinel Gateway Appliance (SGW) at the source site (on-premises). The Sentinel Gateway Appliance is responsible for establishing a forwarding connection with the VMware HCX Sentinel Data Receiver (SDR) at the destination site. The customer is responsible for managing and maintaining the Sentinel Gateway Appliance's resources, including its CPU and memory configuration.

 

Next Steps

If this has not resolved the VMware HCX issue in your Azure VMware Solution private cloud, please open a Service Request with Microsoft to continue the resolution process.

 

Summary

In this post, we described helpful troubleshooting tips when facing some of the most common VMware HCX service issues our customers have with the Azure VMware Solution.

 

If you are interested in the Azure VMware Solution, please use these resources to learn more about the service:

 

 

Author Bios

Ricky Perez is a Senior Cloud Solution Architect in the international Customer Success Unit (iCSU) at Microsoft. His background is in solution architecture with experience in public cloud and core infrastructure services.

 

Jason Trammell is a Senior Software Engineer in the Azure VMware Solution engineering group at Microsoft.

 

Kenyon Hensler is a Principal Technical Program Manager in the Azure VMware Solution product group at Microsoft. His background is in system engineering with experience across all facets of enterprise networking and compute stacks.

 

René van den Bedem is a Principal Technical Program Manager in the Azure VMware Solution product group at Microsoft. His background is in enterprise architecture with extensive experience across all facets of the enterprise, public cloud & service provider spaces, including digital transformation and the business, enterprise, and technology architecture stacks. René works backwards from the problem to be solved and designs solutions that deliver business value with the minimum of risk. In addition to being the first quadruple VMware Certified Design Expert (VCDX), he is also a Dell Technologies Certified Master Enterprise Architect, a Nutanix Platform Expert (NPX), and a VMware vExpert.

Last updated: Jun 04 2024 09:24 AM