Networking Blog

Migrate an existing cluster to Network ATC

DanCuomo

Microsoft

Jun 09, 2023

Since Azure Stack HCI 21H2, customers have used Network ATC to:

Reduce host networking deployment time, complexity, and errors
Deploy the latest Microsoft validated and supported best practices
Ensure configuration consistency across the cluster
Eliminate configuration drift

Network ATC has led to HUGE reductions in customer support cases, which means increased uptime for your business applications and fewer headaches for you! But what if you already deployed your cluster? How do you take advantage now that you’ve travelled through that trepidatious train of thought against taking on new technology?

With minimal alliteration, this article will show you how to migrate an existing cluster to Network ATC so you can take advantage of all the benefits mentioned above. Once completed, you could easily cookie-cut this configuration across all new deployments using our previous blog; so this would be a one-time migration, and all new clusters will gain the benefits!

Before you begin

Since this is a live cluster with running VMs, we’ll take some precautions to ensure we’re never working on a host with a running VM on it. If you don’t have running workloads on these nodes, you don’t need these instructions. Just add your intent command as if this were a brand-new cluster.

As some background, Network ATC stores information in the cluster database, which is then replicated to other nodes in the cluster. The Network ATC service on each of the other nodes sees the change in the cluster database and implements the new intent. So we set up the cluster to receive a new intent, but we can also control the rollout of that intent by stopping or disabling the Network ATC service on nodes that have virtual machines on them.

Procedure

Step 1: Install the Network ATC feature

First, let’s install Network ATC on EVERY node in the cluster using the following command. This does not require a reboot.

Install-WindowsFeature -Name NetworkATC
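
If PowerShell remoting is enabled between the nodes (an assumption, not something this article requires), the install can be pushed to every node from a single session. This is a minimal sketch, not the only way to do it:

```powershell
# Collect the node names from the cluster this machine belongs to.
$nodes = (Get-ClusterNode).Name

# Install the Network ATC feature on each node; no reboot is expected.
Invoke-Command -ComputerName $nodes -ScriptBlock {
    Install-WindowsFeature -Name NetworkATC
}
```

Each node returns its own result object, so a quick look at the Success column shows whether any node was missed.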

Step 2: Pause one node in the cluster

Pause one node in the cluster and drain its workloads. This node will be migrated to Network ATC first, and we’ll repeat this step later for the other nodes in the cluster. Draining moves all workloads to other nodes in the cluster, leaving this machine available for changes. To do this, you can use the command:

Suspend-ClusterNode -Name <NodeName> -Drain
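
Before touching the host networking, it is worth confirming that the node really is empty. The sketch below waits for the drain to finish and then lists any VM still running on the node. The node name `Node01` is a placeholder, not a name from this article:

```powershell
$node = 'Node01'   # hypothetical node name; replace with your own

# Pause the node, move its workloads elsewhere, and wait for the drain to finish.
Suspend-ClusterNode -Name $node -Drain -Wait

# The node should now report State = Paused and a completed drain.
Get-ClusterNode -Name $node | Select-Object Name, State, DrainStatus

# Any output here means a VM is still running on the node.
Get-VM -ComputerName $node | Where-Object State -eq 'Running'
```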

Step 3: Stop the Network ATC service

For all nodes that are not paused, stop and disable the Network ATC service. As a reminder, this is to prevent Network ATC from implementing the intent while there are running virtual machines. To do this, you can use the commands:

Set-Service  -Name NetworkATC -StartupType Disabled
Stop-Service -Name NetworkATC
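
As a sketch of how this might be done in one pass, the snippet below targets every node that is still in the Up state, which excludes the node paused in step 2. It assumes PowerShell remoting is available between nodes:

```powershell
# Nodes that are still hosting workloads (the paused node reports State = Paused).
$activeNodes = (Get-ClusterNode | Where-Object State -eq 'Up').Name

# Keep Network ATC from acting on these nodes until each one is drained.
Invoke-Command -ComputerName $activeNodes -ScriptBlock {
    Set-Service  -Name NetworkATC -StartupType Disabled
    Stop-Service -Name NetworkATC
}
```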

Step 4: Remove existing configuration

Next, we’ll remove any previous configurations that might interfere with Network ATC’s ability to implement the intent. An example of this might be a Data Center Bridging (NetQos) policy for RDMA traffic. Network ATC will also deploy this, and if it sees a conflicting policy, Network ATC is wise enough not to interfere with it until you make it clear which policies you want to keep. While Network ATC will attempt to “adopt” the existing configuration if the names match (whether it be NetQos or other settings) it’s far simpler to just remove the existing configuration and let Network ATC redeploy.

Network ATC deploys a lot more than these items, but these are the items that need to be resolved before implementing the new intent.

VMSwitch

If you have more than one VMSwitch on this system, ensure you specify the switch attached to the adapters that will be used in this intent.

Get-VMSwitch -Name <VMSwitchName> | Remove-VMSwitch -Force

Data Center Bridging Configuration

Remove the existing DCB Configurations.

Get-NetQosTrafficClass | Remove-NetQosTrafficClass -Confirm:$false
Get-NetQosPolicy       | Remove-NetQosPolicy -Confirm:$false
Get-NetQosFlowControl  | Disable-NetQosFlowControl

LBFO

If you accidentally deployed an LBFO team, we’ll need to remove that as well. As you might have read, LBFO is not supported on Azure Stack HCI at all. Don’t worry, Network ATC will prevent these types of accidental oversights in the future as it will never deploy a solution that we do not support.

Get-NetLBFOTeam | Remove-NetLBFOTeam -Confirm:$true

SCVMM

If the nodes were configured via VMM, these configuration objects may need to be removed from VMM as well.
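
Since this step deletes configuration, you may want a record of what was there before you run the removal commands. The sketch below is one optional way to capture it. The folder path is a made-up example, and the exported files are for reference only (nothing in Network ATC reads them):

```powershell
# Hypothetical backup folder; any local path will do.
$backup = Join-Path -Path 'C:\Temp' -ChildPath "pre-atc-$env:COMPUTERNAME"
New-Item -ItemType Directory -Path $backup -Force | Out-Null

# Snapshot the objects that step 4 removes.
Get-VMSwitch           | Export-Clixml -Path (Join-Path $backup 'VMSwitch.xml')
Get-NetQosTrafficClass | Export-Clixml -Path (Join-Path $backup 'NetQosTrafficClass.xml')
Get-NetQosPolicy       | Export-Clixml -Path (Join-Path $backup 'NetQosPolicy.xml')
Get-NetQosFlowControl  | Export-Clixml -Path (Join-Path $backup 'NetQosFlowControl.xml')
Get-NetLbfoTeam        | Export-Clixml -Path (Join-Path $backup 'NetLbfoTeam.xml')
```

`Import-Clixml` can read any of these files back later if you need to compare the old settings with what Network ATC deployed.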

Step 5: Add the Network ATC intent

It’s now time to add a Network ATC intent. You’ll only need to do this once since Network ATC intents are implemented cluster wide. However, we have taken some precautions to control the speed of the rollout. In step 2, we paused this node so there are no running workloads on it. In step 3, we stopped and disabled the Network ATC service on nodes where there are running workloads.

If you stopped and disabled the Network ATC service, you should start this service on this node only. To do this, run the following command:

Set-Service   -Name NetworkATC -StartupType Automatic
Start-Service -Name NetworkATC

Now, add your Network ATC intent(s). There are some example intents listed on our documentation here.
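
For illustration only, here is what a fully converged intent might look like. The intent name `ConvergedIntent` and the adapter names `pNIC01` and `pNIC02` are placeholders; your intent types and adapters will depend on how the cluster was originally built. On 21H2 you may also need to pass `-ClusterName`.

```powershell
# Hypothetical intent and adapter names; use the physical adapters that were
# bound to the VMSwitch removed in step 4.
Add-NetIntent -Name ConvergedIntent -Management -Compute -Storage -AdapterName 'pNIC01', 'pNIC02'

# List the intents now stored for the cluster.
Get-NetIntent
```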

Step 6: Verify deployment on one node

To verify that the node has successfully deployed the intent submitted in step 5, use the Get-NetIntentStatus command as shown below.

Get-NetIntentStatus -Name <IntentName>

The Get-NetIntentStatus command will show the deployment status of the requested intents. Eventually, there will be one object per intent returned from each node in the cluster. As a simple example, if you had a 3-node cluster with 2 intents, you would see 6 objects returned by this command, each with their own status.

Before moving on from this step, ensure that each intent you added has an entry for the host you’re working on, and the ConfigurationStatus shows Success. If the ConfigurationStatus shows “Failed” you should look to see if the Error message indicates why it failed. We have some quick resolutions listed in our documentation here.
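
Provisioning can take a few minutes, so a small polling loop can save some retyping. This is a rough sketch under a few assumptions: the Host value matches the local computer name, and the 15-minute limit is an arbitrary choice.

```powershell
$node     = $env:COMPUTERNAME            # assumes Host matches the short computer name
$deadline = (Get-Date).AddMinutes(15)    # arbitrary upper bound for the wait

do {
    Start-Sleep -Seconds 15
    $status = @(Get-NetIntentStatus | Where-Object Host -eq $node)
    $status | Format-Table IntentName, Host, ProvisioningStatus, ConfigurationStatus -AutoSize | Out-Host

    # Anything that is neither Success nor Failed is still in progress.
    $pending = @($status | Where-Object ConfigurationStatus -notin 'Success', 'Failed')
} while ((Get-Date) -lt $deadline -and ($status.Count -eq 0 -or $pending.Count -gt 0))

# Show the error text for any intent that failed on this node.
$status | Where-Object ConfigurationStatus -eq 'Failed' | Select-Object IntentName, Error
```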

Step 7: Rename the VMSwitch on other nodes

Now that one node is deployed with Network ATC, we’ll get ready to move on to the next node. To do this, we’ll migrate the VMs off the next node. This requires that the nodes have the same VMSwitch name as the node deployed with Network ATC. This is a non-disruptive change and can be done on all nodes at the same time.

Rename-VMSwitch -Name 'ExistingName' -NewName 'NewATCName'

Why don’t we change the name of the Network ATC VMSwitch instead? Two reasons. The first is that Network ATC ensures that all nodes in the cluster use the same VMSwitch name to support live migration and symmetry. The second is that you really shouldn’t need to worry about the VMSwitch name. This is simply a configuration artifact and just one more thing you’d need to ensure is perfectly deployed. Instead of that, Network ATC implements and controls the names of configuration objects.
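
One way to avoid typing the new name by hand is to read it from the node that Network ATC has already configured. The sketch below assumes that node has a single VMSwitch, that remoting is enabled, and that `Node01` and `ExistingName` are stand-ins for your own values:

```powershell
$migratedNode = 'Node01'         # hypothetical: the node already deployed by Network ATC
$oldName      = 'ExistingName'   # current VMSwitch name on the remaining nodes

# Read the switch name Network ATC chose (assumes one VMSwitch on that node).
$newName = (Get-VMSwitch -ComputerName $migratedNode).Name

# Rename the switch on every other node so the names match.
$others = (Get-ClusterNode).Name | Where-Object { $_ -ne $migratedNode }
Invoke-Command -ComputerName $others -ScriptBlock {
    Rename-VMSwitch -Name $using:oldName -NewName $using:newName
}
```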

Edit: After renaming your switch, you need to disconnect and reconnect your vNICs for the vSwitch name change to take effect. On every node, after you rename the switch, run:

# Assumes the renamed switch is the only VMSwitch on the node; otherwise use Get-VMSwitch -Name 'NewATCName'
$VMSW = Get-VMSwitch
Get-VM | ForEach-Object {
    Get-VMNetworkAdapter -VMName $_.Name | Disconnect-VMNetworkAdapter
    Get-VMNetworkAdapter -VMName $_.Name | Connect-VMNetworkAdapter -SwitchName $VMSW.Name
}
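
A quick check afterwards can confirm that every vNIC reports the new switch name. This is a small sketch run locally on each node:

```powershell
# List each VM network adapter together with the switch it is connected to.
Get-VM | Get-VMNetworkAdapter | Select-Object VMName, Name, SwitchName
```

Any adapter with an empty SwitchName is still disconnected and needs to be connected again.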

Step 8: Resume the cluster node

This node is now ready to re-enter the cluster. Run this command to put it back into service:

Resume-ClusterNode
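
If you would like the workloads that were drained in step 2 to move back on their own, the resume can request that. A short sketch, with `Node01` as a placeholder node name:

```powershell
$node = 'Node01'   # hypothetical node name

# Resume the node and move the previously drained roles back to it.
Resume-ClusterNode -Name $node -Failback Immediate

# The node should report State = Up again.
Get-ClusterNode -Name $node | Select-Object Name, State
```

Use `-Failback NoFailback` instead if you prefer to leave the workloads where they are and rebalance later.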

Step 9: Rinse and Repeat

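Each node will need to go through the procedure outlined above. To complete the migration to Network ATC across the cluster, repeat steps 2 – 4, 6 and 8 on each remaining node (step 1 is already done if you installed the feature on every node). Because step 3 disabled the Network ATC service on these nodes, re-enable and start the service on each node as shown in step 5 once its old configuration has been removed. You do not need to add the intent again.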
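
Once the last node is done, a final pass across the cluster can confirm that nothing was left behind. The sketch below assumes remoting is enabled. It shows the intent status per node and checks that the Network ATC service is running and set to start automatically everywhere:

```powershell
# One row per intent per node; every row should show Success.
Get-NetIntentStatus |
    Sort-Object Host, IntentName |
    Format-Table Host, IntentName, ConfigurationStatus, ProvisioningStatus -AutoSize

# The service should be Running with StartType Automatic on every node.
Invoke-Command -ComputerName (Get-ClusterNode).Name -ScriptBlock {
    Get-Service -Name NetworkATC | Select-Object Name, Status, StartType
}
```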

Summary

Migrating your existing clusters to Network ATC can be a game-changer for your cluster infrastructure and management. By automating and simplifying your network management, Network ATC can help you save time, increase efficiency, improve overall performance and avoid cluster downtime.

If you have any further questions or would like to learn more about Network ATC, please don't hesitate to reach out to us!

Dan "Advanced Technology Coordinator" Cuomo

Updated Aug 24, 2023

Version 5.0

Networking Blog

The Official Blog Site of the Windows Core Networking Team at Microsoft

8 Comments

JimGandy
Copper Contributor
Jul 24, 2023
Hi DanCuomo, Thanks for the quick reply. 🙂
I agree with you. We should open a case since we have successfully replicated the issue in our lab.
DanCuomo
Microsoft
Jul 24, 2023
Hi JimGandy - I did try to repro this behavior in my lab but it worked as I described above. I did not require a disconnect/reconnect. If you have time, I would check with the support teams at MSFT to see if there is something preventing this process from completing the operations on this system. It would be helpful to understand what's occurring!
JimGandy
Copper Contributor
Jul 21, 2023
Great article, Dan! Thank you for sharing. I found your instructions very helpful, but I encountered one additional step to get Live Migration working. After renaming the VMSwitch on all nodes, I had to disconnect and reconnect the VMs network adapters. This allowed their .vmcx files to reflect the new VMSwitch name. Here's the command I used:
#Disconnect and connect VMNetworkAdapters after renaming VMSwitch
$VMSW=Get-VMSwitch
$VMs = get-vm ; $VMs | %{Get-VMNetworkAdapter -VMName $_.name | Disconnect-VMNetworkAdapter ; Get-VMNetworkAdapter -VMName $_.name | Connect-VMNetworkAdapter -SwitchName $VMSW.name}
Karl-WE
MVP
Jul 11, 2023
Thank you DanCuomo. Sometimes it's unfortunate that one cannot skip single steps in the WAC wizard. So I would need to take a different way to try out the new extension.

Not so sure about the DNS issue, as the storage is directly connected, how would DNS work? If I am not mistaken directly connected deployments are supported for 2 nodes at the moment.
DanCuomo
Microsoft
Jul 10, 2023
Karl-WE - No existing plans for IPv6. For testing inside VMs, use this: https://learn.microsoft.com/en-us/azure-stack/hci/manage/manage-network-atc?tabs=21H2#test-network-atc-in-vms

That said your message above appears to be a DNS issue which could also be a WAC query issue. You might try the new management extension for Network ATC available for installation:

https://techcommunity.microsoft.com/t5/networking-blog/network-atc-management-in-windows-admin-center/ba-p/3861305
Karl-WE
MVP
Jul 08, 2023
While network ATC is ace, is there anything planned to support IPv6 ? Imho it improves performance compared to IPv4 and we do not have to deal around with Jumbo Packets.
Karl-WE
MVP
Jul 08, 2023
Dear DanCuomo I am trying to setup a lab, but network ATC get stuck in provisioning state, because (quite logically) I do not have any RDMA capable adapters in my nested Hyper-V setup.
I didn't find any switch to override RDMA checks. WAC 2306 though, will fail with a not so pointing failure. I would not have any clue if not checking the provisioning state of network ATC.
cblackuk1
MVP
Jun 12, 2023
Thanks for sharing DanCuomo 🙂 very informative 🙂

Happy Azure Stacking!!!