Blog Post

Failover Clustering

Optimizing Hyper-V Live Migrations on a Hyperconverged Infrastructure

John Marlin
Apr 04, 2019

We have been doing some extensive testing on how best to configure Hyper-V live migration to achieve the best performance and the highest level of availability.

Recommendations:

  1. Configure Live Migration to use SMB. This must be performed on all nodes:

     Set-VMHost -VirtualMachineMigrationPerformanceOption SMB

  2. Use RDMA-enabled NICs to offload the CPU and improve network performance (see the verification sketch after this list).

  3. Configure SMB bandwidth limits to ensure live migrations do not saturate the network, throttling to 750 MB/s. This must be performed on all nodes.

          First install the SMB Bandwidth Limit feature:

     Add-WindowsFeature -Name FS-SMBBW

          Then throttle live migration traffic to 750 MB/s:

     Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB

  4. Configure a maximum of 2 simultaneous live migrations (which is the default). This must be performed on all nodes.

          Leave at the default value of 2; no changes are required:

     Set-VMHost -MaximumVirtualMachineMigrations 2
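
As a quick check that the settings above have taken effect on every node, something along the following lines can be run (a sketch rather than official tooling; the node names are placeholders for your own cluster nodes):

     # Placeholder node names, replace with the nodes of your cluster
     $nodes = "Node01", "Node02"

     Invoke-Command -ComputerName $nodes -ScriptBlock {
         # Live migration should report SMB and a maximum of 2 simultaneous migrations
         Get-VMHost | Select-Object VirtualMachineMigrationPerformanceOption, MaximumVirtualMachineMigrations

         # The LiveMigration category should show the 750 MB/s limit
         Get-SmbBandwidthLimit -Category LiveMigration

         # The NICs carrying SMB traffic should report RDMA as enabled
         Get-NetAdapterRdma
     }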

 

Background:

For those who want to understand the ‘why’ behind the above recommendations, read on!

 

Fundamentally, a live migration takes the memory allocated to a virtual machine and copies it over the network from one server to another.  Let’s say you allocated 4 GB of memory to a virtual machine; when you invoke a live migration, that 4 GB of memory is copied over the network between the source and destination servers.  Because the VM is running, that memory is changing while the 4 GB is copied.  Those changes are tracked, and once the initially allocated memory is copied, a second pass occurs and the changed memory is copied.  In the second pass the amount of changed memory is smaller and takes less time to copy, yet memory is still changing while that happens.  So a third pass happens, and so on, with each pass getting faster and the delta of changed memory getting smaller.

Eventually the set of changed memory gets small enough that the VM is paused, the final set of changes is copied over, and the VM is resumed on the new server.  While the VM is paused and the final memory copy occurs, the VM is not available; this is referred to as the blackout window.  This is not unique to Hyper-V; all virtualization platforms have it.  The magic of a live migration is that as long as the blackout window is within the TCP reconnect window, it is completely seamless to the applications.  That’s how a live migration achieves zero downtime from an application perspective, even though there is a very small amount of downtime from an infrastructure perspective.  Don’t get hung up on the blackout window (as long as it is within the TCP reconnect window); it’s all about the app!
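
For reference, a single live migration of a clustered VM can be kicked off from PowerShell; the VM and node names below are placeholders:

     # Placeholder names: live migrate the clustered VM "VM01" to the node "Node02"
     Move-ClusterVirtualMachineRole -Name "VM01" -Node "Node02" -MigrationType Live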

 

Live migration supports three transport options: TCP/IP, Compression, and SMB.  Nearly all hyperconverged infrastructure (HCI) systems have RDMA-enabled network cards, and Server Message Block (SMB) has a feature called SMB Direct which can take advantage of RDMA.  Using SMB as the protocol for the memory copy over the network drastically reduces the CPU overhead of the data copy while delivering the best network performance.  This is important both to avoid consuming CPU cycles from other running virtual machines and to keep the copy windows small, so that the number of passes needed to copy changed memory is minimized.  Another feature of SMB is SMB Multichannel, which streams the live migration across multiple network interfaces to achieve even better performance.
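
To confirm that SMB Direct (RDMA) and SMB Multichannel are actually available on a node, checks along these lines can be used (output will vary with your hardware):

     # SMB client interfaces and whether they report RDMA capability
     Get-SmbClientNetworkInterface | Select-Object FriendlyName, RdmaCapable, LinkSpeed

     # RDMA state at the NIC level
     Get-NetAdapterRdma

     # While SMB traffic such as a live migration is running, the multichannel connections in use
     Get-SmbMultichannelConnection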

 

An HCI system is a distributed system that is heavily dependent on reliable networking, as cluster communication and data replication also occur over the network.  From a network perspective, a live migration is a sudden burst of heavy network traffic.  Using SMB bandwidth limits to achieve network Quality of Service (QoS) keeps this burst traffic from saturating the network and negatively impacting other aspects of the system.  Our testing exercised different bandwidth limits on dual 10 Gbps RDMA-enabled NICs, measured failures under stress conditions, and found that throttling live migration to 750 MB/s achieved the highest level of availability for the system.  On a system with higher bandwidth, you may be able to throttle to a value higher than 750 MB/s.
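
If you want a starting point for faster NICs, one option is to scale the 750 MB/s value in proportion to the node’s aggregate RDMA link speed. This linear scaling is only a rough assumption, not tested guidance, so validate it under your own workload:

     # Sketch: scale the 750 MB/s limit (tested on 2 x 10 Gbps links) linearly to this
     # node's aggregate RDMA link speed. This is an assumption, not tested guidance.
     $testedLimitMB = 750
     $testedGbps    = 20     # the 2 x 10 Gbps configuration used in the testing above

     $rdmaNics  = Get-NetAdapterRdma | Where-Object Enabled
     $totalGbps = (Get-NetAdapter -Name $rdmaNics.Name | Measure-Object -Property Speed -Sum).Sum / 1e9

     $suggestedMB = [math]::Round($testedLimitMB * ($totalGbps / $testedGbps))
     "Suggested live migration limit: $suggestedMB MB/s"
     # To apply it: Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond ([uint64]$suggestedMB * 1MB)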

 

When draining a node in an HCI system, multiple VMs can be live migrated at the same time.  This parallelization can achieve faster overall times when moving large numbers of VMs off a host.  As an example, instead of copying just 4 GB for a single machine, it will copy 4 GB for one VM and 4 GB for another VM.  But there is a sweet spot: a single live migration at a time serializes the work and results in longer overall times, while too many simultaneous live migrations can also end up taking much longer.  Remember that if the network becomes saturated with many large copies, each one takes longer…  which means more memory is changing on each, which means more passes, which results in overall longer times for each.  Two simultaneous live migrations were found to deliver the best balance in combination with a 750 MB/s throttle.
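
Draining a node, which is what kicks off these parallel live migrations, can be done from Failover Cluster Manager or from PowerShell; the node name below is a placeholder:

     # Pause the node and drain its roles; clustered VMs are live migrated to other nodes
     Suspend-ClusterNode -Name "Node01" -Drain -Wait

     # When maintenance is done, resume the node and optionally fail roles back
     Resume-ClusterNode -Name "Node01" -Failback Immediate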

 

Lastly, a live migration will not continue indefinitely making pass after pass to copy changed memory on a very busy VM over a slow interconnect; eventually live migration will give up, freeze the VM, and make a final memory copy.  This can result in a longer blackout window, and if that final copy exceeds the TCP reconnect window it can impact the apps.  This is why ensuring live migrations can complete in a timely manner is important.
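
A rough way to observe the blackout window from a client’s point of view is to ping the VM continuously while it is migrating and watch for dropped replies. This is only a coarse approximation (one-second pings), and the VM address below is a placeholder:

     # Ping the VM once per second during a live migration and log any gaps;
     # the length of a gap approximates the blackout window seen by clients.
     $target = "vm01.contoso.local"   # placeholder address
     while ($true) {
         $ok = Test-Connection -ComputerName $target -Count 1 -Quiet
         "{0:HH:mm:ss.fff}  {1}" -f (Get-Date), $(if ($ok) { "reply" } else { "timeout" })
         Start-Sleep -Seconds 1
     }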

 

In our internal testing we have found that these recommended settings achieve the fastest times to drain multiple VMs off of a server, the smallest blackout windows for application availability, the least impact to production VMs, and the greatest level of availability for the infrastructure.

 

Elden Christensen

Principal Program Manager

High Availability and Storage Team

Updated Sep 05, 2019
Version 10.0
  • It does not supersede it, as this will be rolled into the doc you are mentioning.  We wanted to get this out because it explains a lot of the whys and gives more detailed information.  In the doc it will be much less detailed, as the doc covers a wide range of things whereas this blog covers a very specific topic.

  • cfenton
    Copper Contributor

    Jhebert2 

    "The testing conducted tested different bandwidth limits on a dual 10 Gpbs RDMA enabled NIC and measured failures under stress conditions and found that throttling live migration to 750 MB achieved the highest level of availability to the system.  On a system with higher bandwidth, you may be able to throttle to a value higher than 750 MB"

    Basically all this article is doing is confirming the RDMA/SET guidance of 50% bandwidth reservation on hyperconverged nics...

    750MB*2*8 == 12,000Mbps // (On dual 10Gbps RDMA enabled NICs) ^

     

    So for your 100Gb I would basically just multiply 750MB * 10 if you want to follow this guidance.  But MS guidance is really to use percentage reservations for QoS

  • Thanks, added this to a larger S2D implementation as a safety measure.

    Our setup has dedicated NICs and switches for SMB and Live Migration (ConnectX-4, 2x25GbE and SN2100). I'll run some testing to see how LM performance and blackout windows are affected.

     

    $hostcluster = "CLUXXXXXX"

    $clunodes = Get-ClusterNode -Cluster $hostcluster

    # Install the SMB Bandwidth Limit feature on every node
    foreach ($clunode in $clunodes) {
        Invoke-Command -ComputerName $clunode.Name -ScriptBlock {
            Add-WindowsFeature -Name FS-SMBBW
        }
    }

    # Apply the live migration throttle and migration limit on every node
    foreach ($clunode in $clunodes) {
        Invoke-Command -ComputerName $clunode.Name -ScriptBlock {
            Set-SmbBandwidthLimit -Category LiveMigration -BytesPerSecond 750MB
            Set-VMHost -MaximumVirtualMachineMigrations 2
        }
    }

  • Jhebert2
    Copper Contributor

    We set up a 100Gb RDMA infrastructure and the VM migrations or Virtual Disk Regeneration make the environment very sluggish. Hopefully, this will help. Any recommendations on what the Set-SmbBandwidthLimit should be for Mellanox5 100Gb adapters and switches? Also, can we make it 4 simultaneous migrations instead of 2?  Wasn't New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS supposed to take care of this?