First published on MSDN on Dec 13, 2010
In this blog I will discuss the behavior of the failover policies for a highly available Hyper-V virtual machine. For the most part, cluster failover policies are generic and apply to all workloads on a cluster, whether it is SQL, a File Server, etc. There is a wealth of documentation out there, but I thought I would take a focused look at what influences where VMs go when a node crashes.
Default Failover Policies:
When a node fails, its VMs are spread across the remaining cluster nodes and distributed to the nodes hosting the fewest VMs. For example, let’s say that NodeA crashes while hosting 10 VMs. The Cluster Service takes one of those VMs, looks across the surviving nodes, and finds the node currently hosting the fewest VMs (technically it looks at cluster Group ownership and selects the node with the lowest count). The VM is then started on that node. The next VM is selected, the cluster again looks for the node now hosting the fewest VMs, and the VM is started there. This repeats until all the VMs are placed, so VMs end up distributed across the cluster based on which node is currently hosting the fewest.
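To make the placement rule concrete, here is a minimal Python sketch that models the behavior described above. The `Node` class and `place_failed_vms` function are hypothetical illustrations, not part of any Windows API; the real decision is made inside the Cluster Service.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    group_count: int  # cluster groups (VMs) this node currently owns

def place_failed_vms(failed_vms, surviving_nodes):
    """Distribute the VMs from a crashed node, one at a time, onto whichever
    surviving node currently owns the fewest groups."""
    placements = {}
    for vm in failed_vms:
        # Counts are re-checked before every placement, so the VMs spread out.
        target = min(surviving_nodes, key=lambda n: n.group_count)
        target.group_count += 1          # the VM's group now lives on this node
        placements[vm] = target.name
    return placements

# Example: NodeA crashed while hosting 10 VMs; NodeB and NodeC survive.
nodes = [Node("NodeB", 4), Node("NodeC", 2)]
print(place_failed_vms([f"VM{i}" for i in range(1, 11)], nodes))
```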
To prevent "boot storms," where simultaneously starting a large number of VMs could severely impact the server’s performance or the underlying storage, VM starts are throttled on each node. As a safety precaution during failover or on node boot, no more than 32 VMs can be in the process of starting at any given time on a node; the rest are queued up to start on that node. Once a VM completes starting by getting past POST (technically, once the cluster resource transitions from Online Pending to Online), another VM is started, which slightly staggers all the VM starts. Technically, up to 32 resources are allowed to be in a pending state on any given node at a time; however, if a single cluster group contains more than 16 resources, the number of resources allowed to be pending concurrently is increased to 64.
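The throttle can be sketched in the same illustrative way; the constants below come from the post, while the function names are invented for the example.

```python
# Per-node start throttling as described above (illustrative model only).
DEFAULT_PENDING_LIMIT = 32      # resources allowed in Online Pending per node
LARGE_GROUP_PENDING_LIMIT = 64  # raised limit when a single group is large
LARGE_GROUP_THRESHOLD = 16      # "large" means more than 16 resources in one group

def pending_limit(resources_per_group):
    """resources_per_group: one resource count per cluster group on the node."""
    if any(count > LARGE_GROUP_THRESHOLD for count in resources_per_group):
        return LARGE_GROUP_PENDING_LIMIT
    return DEFAULT_PENDING_LIMIT

def can_start_another(currently_pending, resources_per_group):
    """A queued VM may begin starting only while the node is under its pending limit."""
    return currently_pending < pending_limit(resources_per_group)
```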
For the average person, this is probably all you need to know or care about. However, for the geeks I’ll now continue to some more advanced topics.
Advanced Failover Policies:
In general the default settings are the right choice for most people, and I recommend you stick with those. However, Clustering does offer a wealth of granular options so you can fine tune the default behavior.
- Possible Owners – For a given VM (technically any cluster Resource) you can configure the nodes the VM has the possibility of failing over to. By default it’s all nodes, but if there is a specific node you never want this VM to fail over to, you can remove it from the list of possible owners and prevent it.
- Preferred Owners – For a given VM (technically any cluster Group) you can configure the preferred node order on failover. Let’s say this VM normally runs on NodeA and you always want it to go to NodeC next if NodeC is available; Preferred Owners lets you define that preference as an ordered list: first go to this node, then this other node, then the next one. It’s a priority list, and clustering walks that list when deciding where to place the VM. This overrides the default behavior of selecting the node currently hosting the fewest VMs described above, and gives you explicit control of where VMs go. More information about Preferred and Possible Owners: http://support.microsoft.com/kb/299631 .
- Anti-Affinity – For a given VM (technically any cluster Group) there is a cluster group property called AntiAffinityClassNames that allows you to configure a preference to keep that VM off the same node as other similar VMs. Let’s say, for example, you have two domain controllers running in VMs; it would probably be best to keep them running on different nodes if possible. When determining failover targets, the Cluster Service will deprioritize any node which is hosting a similar VM. If there is no other option (in the goal of keeping VMs available), it will place them on the same host. The sketch after this list shows how these three settings can interact. More information: http://msdn.microsoft.com/en-us/library/aa369651(VS.85).aspx .
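To show how these three settings could fit together, here is a hedged Python sketch. The field names (`possible_owners`, `preferred_owners`, `anti_affinity_classes`, `hosted_classes`) are invented for illustration, and the precedence shown (possible owners first, then anti-affinity filtering, then the preferred-owner walk, then the fewest-groups fallback) is one reasonable reading of the post, not documented Cluster Service behavior.

```python
def choose_failover_node(vm, nodes):
    """vm: dict describing the VM's settings; nodes: list of dicts describing
    the surviving nodes. All field names here are hypothetical."""
    # 1. Possible Owners: nodes not in the list are never considered.
    candidates = [n for n in nodes if n["name"] in vm["possible_owners"]]
    if not candidates:
        return None  # nowhere the VM is allowed to run

    # 2. Anti-affinity: prefer nodes that do not already host a VM with a matching
    #    class name, but fall back to them if they are the only option.
    separated = [n for n in candidates
                 if not (vm["anti_affinity_classes"] & n["hosted_classes"])]
    candidates = separated or candidates

    # 3. Preferred Owners: walk the ordered preference list first.
    for name in vm["preferred_owners"]:
        for n in candidates:
            if n["name"] == name:
                return n["name"]

    # 4. Otherwise fall back to the default "fewest groups" rule.
    return min(candidates, key=lambda n: n["group_count"])["name"]

# Example: NodeC already hosts another domain controller, so it is deprioritized.
vm = {
    "possible_owners": {"NodeB", "NodeC", "NodeD"},
    "preferred_owners": ["NodeC", "NodeB"],          # ordered preference
    "anti_affinity_classes": {"DomainControllers"},
}
nodes = [
    {"name": "NodeB", "group_count": 3, "hosted_classes": set()},
    {"name": "NodeC", "group_count": 1, "hosted_classes": {"DomainControllers"}},
    {"name": "NodeD", "group_count": 5, "hosted_classes": set()},
]
print(choose_failover_node(vm, nodes))  # -> NodeB
```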
Other Influencers of Failover:
There are a few other settings that can influence VM placement and behavior on failover. Let’s discuss some of those:
- Pause Node – At the server level, you can Pause a node. When a node is paused, no VMs (technically no cluster Group) can fail over to that node; it is removed from consideration as a failover target. Pausing a node is handy for maintenance tasks like applying a patch, since it prevents VMs from failing over to the node while you are working on it.
- Disable Failover – For a given VM (technically any cluster Group) you can configure the “Auto Start” setting. If Auto Start is disabled for a VM, the VM will not be started on failover. This can be useful for a low-priority VM that you don’t necessarily want to fail over, but that you still want clustered so that you can, for example, perform live migrations. Both settings are modeled in the sketch after this list.
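As with the settings above, both behaviors can be modeled with a short, purely illustrative sketch; the `paused` and `auto_start` field names are hypothetical.

```python
def failover_targets(vm, nodes):
    """Return the nodes eligible to receive this VM on failover (illustrative model)."""
    if not vm.get("auto_start", True):
        return []  # Auto Start disabled: the VM is not started on failover
    # Paused nodes are removed from consideration entirely.
    return [n for n in nodes if not n.get("paused", False)]

# Example: NodeB is paused for patching, so only NodeC remains a target.
vm = {"auto_start": True}
nodes = [{"name": "NodeB", "paused": True}, {"name": "NodeC", "paused": False}]
print([n["name"] for n in failover_targets(vm, nodes)])  # -> ['NodeC']
```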
Startup Placement Policies:
- Persistent Mode – When the cluster as a whole is shut down and then restarted, clustering will attempt to start each VM back on the last node it was hosted on. This is controlled by the “Persistent Mode” setting, which is enabled by default. The default amount of time the cluster service will wait for the original node to rejoin the cluster is 30 seconds; this is configurable via the cluster common property ClusterGroupWaitDelay. You may choose to disable Persistent Mode for high priority VMs where you do not want to wait for the original node to come back… just start the VM as soon as possible. See this blog for additional information.
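As a rough model of this behavior, here is a sketch assuming only what the post states: the 30-second default of ClusterGroupWaitDelay and the fallback to normal placement. The function and parameter names are illustrative.

```python
import time

CLUSTER_GROUP_WAIT_DELAY = 30  # seconds; default of the ClusterGroupWaitDelay property

def choose_startup_node(vm, last_owner, persistent_mode, node_is_up, place_anywhere):
    """After a full cluster restart, optionally wait for the VM's previous owner
    node to rejoin before falling back to normal placement (illustrative model)."""
    if persistent_mode:
        deadline = time.monotonic() + CLUSTER_GROUP_WAIT_DELAY
        while time.monotonic() < deadline:
            if node_is_up(last_owner):
                return last_owner          # start the VM where it last ran
            time.sleep(1)
    # Persistent Mode disabled, or the wait expired: start as soon as possible.
    return place_anywhere(vm)
```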
Thanks!
Elden Christensen
Principal PM Manager
Clustering & High-Availability
Microsoft