
Storage Spaces Direct - Live Migration Uses Wrong Network


Good Morning all,

 

The formatting in this text is the best I can do...even tabs/bullets do not seem to work


I have read a LOT on this topic and here is a summary of the problem I am facing, plus where I am currently.

The Problem – Live Migration is using the wrong network.
When I “Live Migrate” a VM, only the “Host Mgmt” network shows any throughput in Task Manager.
Live Migration does work, but I cannot find a way to force the Live Migration traffic over the 10GbE networks.

 

Main Question:
• How do I configure the 10GbE networks to be used for Live Migrations?
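
From the articles I have read, I assume the relevant knobs are roughly the ones below – this is only a sketch, the subnets are placeholders for my two 10GbE subnets, and I may well be missing something:

# Placeholders: 10.0.1.0/24 and 10.0.2.0/24 stand in for the two 10GbE subnets.

# Allow Live Migration and carry it over SMB, so that it can use
# SMB Multichannel / SMB Direct on the RDMA-capable 10GbE NICs.
Enable-VMMigration
Set-VMHost -VirtualMachineMigrationPerformanceOption SMB -MaximumVirtualMachineMigrations 4

# Restrict the Hyper-V host's migration networks to the 10GbE subnets.
Add-VMMigrationNetwork "10.0.1.0/24"
Add-VMMigrationNetwork "10.0.2.0/24"
Get-VMMigrationNetwork

# Note: in a failover cluster the Live Migration network order set in
# Failover Cluster Manager (Networks -> Live Migration Settings) also applies.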

 

Follow up questions:
• How do I test the 10GbE networks?
• How do I test the Cluster configuration for this problem?
• How do I perform a file copy over a specific interface to prove/disprove the basic 10GbE connectivity?
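
For reference, the sort of checks I had in mind – sketched with a placeholder address, and I am not sure these are the right ones:

# Placeholder: 10.0.1.2 is the partner node's 10GbE address.

# Link state / speed of the local adapters.
Get-NetAdapter | Format-Table Name, Status, LinkSpeed

# Reachability of the partner's 10GbE address, including SMB on port 445.
Test-NetConnection -ComputerName 10.0.1.2 -Port 445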

 

I have seen articles on SMB and SMB Direct – but I am not sure where to go from here, as I have already tried a lot of things to “fix” this.

 

I would be grateful for any help with this frustrating matter.

 

So, how did I get here……

 

The Scenario – 2 x node Storage Spaces Direct cluster:
• Windows Server 2019 on Dell R640 servers with a QLogic QL41162HMRJ card – 2x1GbE & 2x10GbE

 

The hardware configuration includes:
• Appropriate disks for caching and capacity - all working satisfactorily

 

2x1GbE – these are switched and connected to the local network infrastructure. I have created a SET VMSwitch and converged the networks with VMNetAdapters on top - all working satisfactorily with ping and routing etc…
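
For reference, this is roughly the shape of that converged setup in PowerShell – the switch name, vNIC names and VLAN IDs below are illustrative rather than my exact values:

# Illustrative only – switch/adapter names and VLAN IDs are placeholders.
# SET switch across the two 1GbE ports, with converged vNICs on top.
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "1GbE Port 1","1GbE Port 2" `
             -EnableEmbeddedTeaming $true -AllowManagementOS $false

Add-VMNetworkAdapter -ManagementOS -Name "Host Mgmt" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName "ConvergedSwitch"

Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Host Mgmt" -Access -VlanId 10
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Cluster" -Access -VlanId 20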


2x10GbE – switchless – standard cable from server to server with no switch in between
• These will be used for S2D, Live Migration, Storage Migration etc..
• Set to iWARP in BIOS
• RDMA enabled & Mode = iWARP
• Jumbo Frames enabled to 9014 – successfully checked with ping to ensure enabled
• VMQ enabled
• RSSProfile = NUMAScalingStatic
• Network Direct Functionality = Enabled
• Data Centre Bridging not configured – apparently not needed with switchless cluster
• Each interface in its own VLAN:
  • Server1_Port#1<->Server2_Port#1 on VLAN xxx
  • Server1_Port#2<->Server2_Port#2 on VLAN yyy
• Each NIC can ping its corresponding partner successfully
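
A quick sanity check of the settings above can be done from PowerShell – the adapter-name filter is a placeholder for the real interface names:

# Placeholder: the "10GbE*" name filter.

# RDMA enabled on the 10GbE ports?
Get-NetAdapterRdma | Format-Table Name, Enabled

# NetworkDirect-related advanced properties as the driver reports them.
Get-NetAdapterAdvancedProperty -Name "10GbE*" |
    Where-Object DisplayName -like "*NetworkDirect*" |
    Format-Table Name, DisplayName, DisplayValue

# Does SMB itself consider these interfaces RDMA-capable?
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, RssCapable, IpAddresses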

 

The Cluster configuration includes:
• “Test-Cluster” output with 100% Success before creation of Cluster
• Host Mgmt = Role 3 “Cluster and Client”
• Other Converged vNICs = Role 1 “Cluster”
• Storage [10GbE] = Role 1 “Cluster”
• All Resources are “Up” in Cluster GUI – no errors reported
• Live Migration networks selected are ONLY the 10GbE ones – and top of the list too
• “Validate Cluster” output with 100% Success after creation of Cluster
• Cluster Events show “NO Errors” or indeed any records at all
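
If useful, the cluster-side view of this can be read back from PowerShell (read-only, nothing is changed):

# Cluster networks, their roles and current metrics (a lower metric is preferred).
Get-ClusterNetwork | Format-Table Name, Role, AutoMetric, Metric

# Which adapters sit on which cluster network, per node.
Get-ClusterNetworkInterface | Format-Table Node, Name, Network, State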

 

Hyper-V Configuration includes:
  • Live Migration
    • Enabled = True
    • 4 Simultaneous
    • “Use these IP Addresses for Live Migration” is enabled and shows the correct addresses
    • Advanced Features
      • “Use Kerberos” = true
      • “SMB” = true
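
The same settings read back from PowerShell, plus a way to watch RDMA activity during a test migration – the counter names are my assumption and may differ slightly between builds:

# Host-level Live Migration settings as PowerShell reports them.
Get-VMHost | Format-List VirtualMachineMigrationEnabled,
    MaximumVirtualMachineMigrations,
    VirtualMachineMigrationAuthenticationType,
    VirtualMachineMigrationPerformanceOption
Get-VMMigrationNetwork

# During a test Live Migration, watch the RDMA counters on both nodes.
# (RDMA / SMB Direct traffic generally does not show in Task Manager's
# network graphs, so counters are a more reliable indicator.)
Get-Counter -Counter "\RDMA Activity(*)\RDMA Inbound Bytes/sec",
                     "\RDMA Activity(*)\RDMA Outbound Bytes/sec" -Continuous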

 

Thanks for listening.

 

Stephen.

 

4 Replies

@Stephen666 Try checking the Jumbo Frame config.

On my HP servers, sometimes after a restart the adapter switches from Ethernet to InfiniBand, and after reverting it to Ethernet all settings stay the same except the jumbo frame size, which drops back to 1514 instead of 9014. After correcting the jumbo setting it started using the right network.
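
A rough way to check that – the adapter-name filter and partner IP are placeholders:

# Configured jumbo packet size on the 10GbE ports.
Get-NetAdapterAdvancedProperty -Name "10GbE*" -RegistryKeyword "*JumboPacket" |
    Format-Table Name, DisplayName, DisplayValue

# End-to-end check with a don't-fragment ping sized for a 9014-byte frame.
ping 10.0.1.2 -f -l 8972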

 

Hi,

I will check out the server NIC settings, as this is a two-node switchless cluster.

Thanks for the tip though.

Stephen

Hello Stephen,

Take a look at this doc under the prioritization and how to configure it:
https://techcommunity.microsoft.com/t5/failover-clustering/configuring-network-prioritization-on-a-f...
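
Roughly what the prioritization looks like in PowerShell – the network names below are placeholders for your actual cluster network names:

# Current prioritization – a lower Metric is preferred.
Get-ClusterNetwork | Format-Table Name, Role, AutoMetric, Metric

# Manually prefer the 10GbE cluster networks (this also turns AutoMetric off).
(Get-ClusterNetwork -Name "Storage1").Metric = 1000
(Get-ClusterNetwork -Name "Storage2").Metric = 1100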

Also:

Follow up questions:
• How do I test the 10GbE networks? Server-wise, a file copy should do the trick; for Live Migration, live migrate a test VM or run a test migration.
• How do I test the Cluster configuration for this problem? Task Manager seems to be the easiest way, because you would see the load increasing on one Ethernet adapter and not the other.
• How do I perform a file copy over a specific interface to prove/disprove the basic 10GbE connectivity? I would disable the other adapters so the copy has to use the one under test (I could not find anything better either).
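
As an alternative to disabling adapters, an SMB Multichannel constraint can pin SMB traffic to a chosen interface for a given server – the server name, interface alias and paths below are placeholders:

# Pin SMB traffic towards NODE2 to the 10GbE interface.
New-SmbMultichannelConstraint -ServerName "NODE2" -InterfaceAlias "10GbE Port 1"

# Run the file copy, then see which connections SMB Multichannel actually used.
Copy-Item "D:\test\bigfile.vhdx" "\\NODE2\C$\temp\"
Get-SmbMultichannelConnection

# Remove the constraint afterwards.
Get-SmbMultichannelConstraint | Remove-SmbMultichannelConstraint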

Why would you have VMQ turned on for the Cluster NICs?  In my experience, turning that off helped the cluster obey the Cluster Network metric settings.  And it won't be used for any cluster communications anyway; VMQ is for the NICs that the Hyper-V virtual machines can see.
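
For what it's worth, checking and disabling VMQ on those NICs would look roughly like this – the name filter is a placeholder:

# Current VMQ state per adapter.
Get-NetAdapterVmq | Format-Table Name, Enabled

# Disable VMQ on the 10GbE cluster/storage NICs.
Disable-NetAdapterVmq -Name "10GbE*"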