Failover Clustering Networking Basics and Fundamentals
Published Sep 23 2020 06:16 PM

My name is John Marlin and I am with the High Availability and Storage Team.  With newer versions of Windows Server and Azure Stack HCI on the horizon, it’s time to head to the archives and dust off some old information that is in need of updating.

 


 

In this blog, I want to talk about Failover Clustering and Networking. Networking is a fundamental part of Failover Clustering that is sometimes overlooked but can be the difference between success and failure. In this blog, I will touch on all facets: the basics, tweaks, multi-site/stretch, and Storage Spaces Direct.  By no means should this be taken as a “these are the networking requirements” blog.  Treat this as general guidance with some recommendations and things to consider.  Specific requirements for any of our operating systems (new or old) are part of the documentation (https://docs.microsoft.com) for the particular OS.

 

In Failover Clustering, all networking aspects are provided by our Network Fault Tolerant (NetFT) adapter. The NetFT adapter is a virtual adapter that is created when the Cluster is created. There is no configuration necessary as it is self-configuring. When it is created, it builds its MAC Address from a hash of the MAC Address of the first physical network card. It has conflict detection and resolution built in. For the IP Address scheme, it assigns itself an APIPA IPv4 (169.254.*) and a link-local IPv6 (fe80::*) address for communication.

 

Connection-specific DNS Suffix  . :

Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter

Physical Address. . . . . . . . . : 02-B8-FA-7F-A5-F3

DHCP Enabled. . . . . . . . . . . : No

Autoconfiguration Enabled . . . . : Yes

Link-local IPv6 Address . . . . . : fe80::80ac:e638:2e8d:9c09%4(Preferred)

IPv4 Address. . . . . . . . . . . : 169.254.1.143(Preferred)

Subnet Mask . . . . . . . . . . . : 255.255.0.0

Default Gateway . . . . . . . . . :

DHCPv6 IAID . . . . . . . . . . . : 67287290

DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-26-6B-52-A5-00-15-5D-31-8E-86

NetBIOS over Tcpip. . . . . . . . : Enabled
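
The NetFT adapter is hidden from the normal adapter views. If you want to surface it yourself with PowerShell, here is a minimal sketch (the InterfaceDescription string matches the output above, but may vary by OS version):

PS > Get-NetAdapter -IncludeHidden | Where-Object InterfaceDescription -like "*Failover Cluster Virtual Adapter*"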

 

The NetFT adapter carries the communications between all nodes in the cluster on behalf of the Cluster Service. To do this, it discovers multiple communication paths between nodes and determines whether the routes are on the same subnet or cross subnets. It does this by sending “heartbeats” over all cluster-enabled network adapters to all other nodes. Heartbeats serve multiple purposes, answering questions such as:

 

  1. Is this a viable route between the nodes?
  2. Is this route currently up?
  3. Is the node being connected to up?

 

There is more to heartbeats, but I will defer to my other blog, No Such Thing as a Heartbeat Network, for more details.

 

For Cluster communication and heartbeats, there are several considerations that must be taken into account.

 

  1. Traffic uses port 3343. Ensure any firewall rules have this port open for both TCP and UDP (see the sketch after this list).
  2. Most Cluster traffic is lightweight.
  3. Communication is sensitive to latency and packet loss. Latency delays can mean performance issues, including removal of nodes from membership.
  4. Bandwidth is not as important as quality of service.
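
As a quick sanity check on item 1 above, you can list the inbox Cluster firewall rules and test reachability to another node on port 3343. A minimal sketch, assuming the default "Failover Clusters" rule group and a node named NODE2 (adjust both for your environment); note that Test-NetConnection only exercises TCP, so the UDP rule still needs to be verified at the firewall itself:

PS > Get-NetFirewallRule -DisplayGroup "Failover Clusters" | ft DisplayName, Direction, Enabled

PS > Test-NetConnection -ComputerName NODE2 -Port 3343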

 

Cluster communication between nodes is crucial so that all nodes stay in sync. Cluster communication is constantly occurring as changes happen. The NetFT adapter will dynamically switch intra-cluster traffic to another available Cluster network if the one in use goes down or isn’t responding.

 

The communication from the Cluster Service to other nodes through the NetFT adapter looks like this:

 

netft-arch.png

  • Cluster Service plumbs network routes over NIC1, NIC2 on NetFT
  • Cluster Service establishes TCP connection over NetFT adapter using the private NetFT IP address (source port 3343)
  • NetFT wraps the TCP connection inside of a UDP packet (source port 3343)
  • NetFT sends this UDP packet over one of the cluster-enabled physical NICs to the destination node, targeting the destination node’s NetFT adapter
  • Destination node’s NetFT adapter receives the UDP packet and then sends the TCP connection to the destination node’s Cluster Service

 

Heartbeats always traverse all cluster-enabled adapters and networks. However, Cluster communication will only go through one network at a time. The network it uses is determined by the role of the network and its priority (metric).

 

There are three roles a Cluster has for networks.

 

Disabled for Cluster Communications – Role 0 – This is a network that the Cluster will not use for anything.

 

Enabled for Cluster Communication only – Role 1 – Internal Cluster communication and Cluster Shared Volume traffic (more on this later) use this type of network as a priority.

 

Enabled for client and cluster communication – Role 3 – This network is used for all client access and Cluster communications, including items like talking to a domain controller, DNS, or DHCP (if enabled) when Network Names and IP Addresses come online. Cluster communication and Cluster Shared Volume traffic could use this network if all Role 1 networks are down.
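
A network’s role can be viewed and changed with PowerShell. A minimal sketch, using "Cluster Network 1" as an example name and the role values (0, 1, 3) described above:

PS > Get-ClusterNetwork | ft Name, Role

PS > (Get-ClusterNetwork "Cluster Network 1").Role = 1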

 

Based on the roles, the NetFT adapter will create metrics for priority. The metric Failover Cluster uses is not the same as the network card metrics that TCP/IP assigns. Networks are given a “cost” (Metric) to define priority. A lower metric value means a higher priority while a higher metric value means a lower priority.

 

These metrics are automatically configured based on Cluster network role setting.

 

Cluster Network Role of 1 = 40,000 starting value

Cluster Network Role of 3 = 80,000 starting value

 

Things such as link speed, RDMA, and RSS capabilities will reduce the metric value. For example, let’s say I have two networks in my Cluster, one set for Cluster communications only and one for both Cluster and client. I can run the following to see the metrics:

 

PS > Get-ClusterNetwork | ft Name, Metric

 

Name                 Metric
----                 ------
Cluster Network 1     70240
Cluster Network 2     30240

 

The NetFT adapter is also capable of taking advantage of SMB Multichannel and load balancing across the networks. For NetFT to take advantage of it, the metrics need to be less than 16 metric values apart. In the example above, SMB Multichannel would not be used. But if there were additional cards in the machines and it looked like this:

 

PS > Get-ClusterNetwork | ft Name, Metric

 

Name                 Metric
----                 ------
Cluster Network 1     70240
Cluster Network 2     30240
Cluster Network 3     30241
Cluster Network 4     30245
Cluster Network 5     30265

 

In a configuration such as this, SMB Multichannel would be used over Cluster Networks 2, 3, and 4. From a Cluster communication and heartbeat standpoint, Multichannel really isn’t a big deal. However, when a Cluster is using Cluster Shared Volumes or is a Storage Spaces Direct Cluster, storage traffic is going to need higher bandwidth. SMB Multichannel fits nicely here, so additional or higher-speed network cards are certainly a consideration.
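
If you want to verify whether SMB Multichannel is actually in use, the SMB client cmdlets can show the active connections and each interface’s capabilities. A minimal sketch (run on a node while storage traffic is flowing):

PS > Get-SmbMultichannelConnection

PS > Get-SmbClientNetworkInterface | ft FriendlyName, Speed, RssCapable, RdmaCapable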

 

At the beginning of the blog, I mentioned latency and packet loss. If heartbeats cannot get through in a timely fashion, node removals can happen. Heartbeats can be tuned for higher-latency networks. The following are the default settings for tuning Cluster networks.

 

Parameter             Windows 2012 R2   Windows 2016    Windows 2019
SameSubnetDelay       1 second          1 second        1 second
SameSubnetThreshold   5 heartbeats      10 heartbeats   20 heartbeats
CrossSubnetDelay      1 second          1 second        1 second
CrossSubnetThreshold  5 heartbeats      10 heartbeats   20 heartbeats
CrossSiteDelay        N/A               1 second        1 second
CrossSiteThreshold    N/A               20 heartbeats   20 heartbeats
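
These are cluster common properties, so they can be viewed and tuned directly with PowerShell. A minimal sketch; keep in mind that raising a threshold also slows down detection of a genuinely failed node, so change these with care:

PS > Get-Cluster | fl *SubnetDelay, *SubnetThreshold, CrossSite*

PS > (Get-Cluster).SameSubnetThreshold = 20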

 

For more information on these settings, please refer to the Tuning Failover Cluster Network Thresholds blog.

 

Planning networks for Failover Clustering depends on how the Cluster will be used. Let’s take a look at some of the common network traffic a Cluster would have.

 

Network1.png

 

If this were a Hyper-V Cluster running virtual machines and Cluster Shared Volumes, Live Migration is going to occur.  Clients are also connecting to the virtual machines.

 

Cluster communications and heartbeating will always be on the wire.  If you are using Cluster Shared Volumes (CSV), there will also be some redirection traffic.

 

If this were a Cluster that used iSCSI for its storage, you would have that as a network as well.

 

If this were stretched (nodes in multiple sites), you may need an additional network for replication traffic (such as Storage Replica).

 

If this is a Storage Spaces Direct Cluster, additional traffic for the Storage Bus Layer (SBL) needs to be considered.

 

As you can see, there are a lot of different network traffic requirements depending on the type of Cluster and the roles running. Obviously, a dedicated network or network card for each just isn’t always possible.

 

We do have a blog that will help isolate Live Migration traffic or limit the bandwidth it uses. The blog Optimizing Hyper-V Live Migrations on a Hyperconverged Infrastructure goes over some tips for setting this up.

 

The last thing I want to talk about is stretch/multisite Failover Clusters. I have already mentioned the Cluster-specific networking considerations, but now I want to talk about how virtual machines react in this type of environment.

 

Let’s say we have two datacenters and a four-node Failover Cluster with 2 nodes in each datacenter. As with most datacenters, they are in their own subnet and would be similar to this:

 

stretch1.png

 

The first thing to consider is whether you want security between the cluster nodes on the wire. By default, all Cluster communication is signed. That may be fine for some, but others wish to have an extra level of security. We can set the Cluster to encrypt all traffic between the nodes; it is simply a PowerShell command to change it. Once you change it, the Cluster as a whole needs to be restarted.

 

PS > (Get-Cluster).SecurityLevel = 2

 

0 = Clear Text

1 = Signed (default)

2 = Encrypt (slight performance decrease)
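
To verify the current value and then restart the Cluster so the change takes effect, something like this works (a sketch; Stop-Cluster takes the whole cluster and its workloads offline, so plan a maintenance window):

PS > (Get-Cluster).SecurityLevel

PS > Stop-Cluster

PS > Start-Cluster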

 

Here is a virtual machine (VM1) that has an IP Address on the 1.0.0.0/8 network, and clients are connecting to it. If the virtual machine moves over to Site2, which is a different network (172.0.0.0/16), there will not be any connectivity as it stands.

 

To get around this, there are basically a couple of options.

 

To prevent the virtual machine from moving to the other site during a Cluster-initiated move (i.e. drain, node shutdown, etc.), consider using sites. When you create sites, the Cluster gains site awareness. This means that any Cluster-initiated move will always keep resources in the same site. Setting a preferred site will also keep them in the same site. If the virtual machine were ever to move to the second site, it would be due to a user-initiated move (i.e. Move-ClusterGroup, etc.) or a site failure.

 

But you still have the virtual machine’s IP Address issue to deal with. During a migration of the virtual machine, one of the very last steps is to register the name and IP Address with DNS. If you are using a static IP Address for the virtual machine, a script would need to be run manually to change the IP Address to the local site it is on. If you are using DHCP, with DHCP servers in each site, the virtual machine will obtain a new address for the local site and register it. You then have to deal with DNS replication and the TTL of records a client may have cached. Instead of waiting for the timeout periods, a forced replication and TTL clearing on the client side would allow clients to connect again.
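
On the client side, the cached record can be flushed without waiting for the TTL to expire. A minimal sketch (Clear-DnsClientCache is the PowerShell equivalent of ipconfig /flushdns; vm1.contoso.com is a hypothetical name used only for illustration):

PS > Clear-DnsClientCache

PS > Resolve-DnsName vm1.contoso.com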

 

If you do not wish to go that route, a virtual LAN (VLAN) could be set up across the routers/switches to provide a single IP Address scheme. Doing this removes the need to change the IP Address of the virtual machine, as it will always remain the same. However, stretching a VLAN (not a recommendation by Microsoft) is not always easy to do, and the Networking Group within your company may not want to do this for various reasons.

 

Another consideration is implementing a network device that has a third IP Address that clients connect to; it holds the actual IP Address of the virtual machine and routes clients appropriately. For example:

 

stretch2.png

In the above example, we have a network device that presents the IP Address of the virtual machine as 30.0.30.1. It will register this with DNS, and the address will stay the same no matter which site the virtual machine is on. Your Networking Group would need to be involved with this and would need to control it. Whether they are willing to do it, and whether it can even be done within your network, is something to also consider.

 

We talked about virtual machines, but what about other resources, say, a file server?  Unlike virtual machine roles, roles such as a file server have a Network Name and IP Address resource in the Cluster. In Windows 2008 Failover Cluster, we added the concept of “or” dependencies, meaning a resource can depend on this "or" that.

 

Or.png

In the case of the scenario above, your Network Name could be dependent on 1.0.0.50 “or” 172.0.0.50. As long as one of the IP Address resources is online, the name is online and is what is published in DNS.
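
Such an "or" dependency can be set in Failover Cluster Manager or with PowerShell. A minimal sketch, assuming the Network Name resource is called FSNetworkName (the name used in the examples below) and the IP Address resources are named after their addresses:

PS > Set-ClusterResourceDependency -Resource FSNetworkName -Dependency "[IP Address 1.0.0.50] or [IP Address 172.0.0.50]"

To go a step further for the stretch scenario, we have two parameters that can be used.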

 

RegisterAllProvidersIP: (default = 0 for FALSE)    

 

  • Determines whether all IP Addresses for a Network Name will be registered in DNS
  • TRUE (1): IP Addresses can be online or offline and will still be registered
  • Ensure the application is set to try all IP Addresses, so clients can connect more quickly
  • Not supported by all applications; check with the application vendor
  • Supported by SQL Server starting with SQL Server 2012

 

HostRecordTTL: (default = 1200 seconds)

 

  • Controls the time a DNS record lives on the client for a cluster Network Name
  • Shorter TTL: DNS records on clients are updated sooner
  • Disclaimer: This does not speed up DNS replication

 

By manipulating these parameters, clients can get quicker connection times. For example, say I want to register all the IP Addresses with DNS, but I want the TTL to be 5 minutes. I would run these commands:

 

PS > Get-ClusterResource FSNetworkName | Set-ClusterParameter RegisterAllProvidersIP 1

 

PS > Get-ClusterResource FSNetworkName | Set-ClusterParameter HostRecordTTL 300

 

When setting these parameters, recycling (offline/online) the resource is needed for the change to take effect.
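
A sketch of the recycle, using the same FSNetworkName resource (taking the Network Name offline also takes offline anything that depends on it, such as the file server role, so plan accordingly):

PS > Stop-ClusterResource FSNetworkName

PS > Start-ClusterResource FSNetworkName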

 

There is more I could go into on this subject, but I need to sign off for now. I hope this gives you some basics to consider when designing your Clusters while thinking of the networking aspects. Networking designs and considerations must be carefully thought out.

 

Happy Clustering !!

 

John Marlin

Senior Program Manager

High Availability and Storage

Follow me on Twitter: @johnmarlin_msft

22 Comments
Microsoft

Thank you for putting this all together, John!

Copper Contributor

Afternoon John,

 

A very comprehensive guide and some very useful information. I have a few questions on the above; wondering if you can point me in the right direction.

 

  1. Most Cluster traffic is lightweight.
  2. Communication is sensitive to latency and packet loss. Latency delays could mean performance issues, including removal of nodes from membership. Is there a threshold at which you will see the nodes drop? I have had a look on several sites but can't see any figures.
  3. Bandwidth is not as important as quality of service. Is there a minimum bandwidth the clusters need for their traffic?

 

In addition, if you could help with the below I would appreciate it.

 

If I have several nodes connected using the failover cluster and the storage shared using Spaces Direct, which quorum will take precedence: the cluster quorum or the pool quorum?

 

If a user/client moves in range of another clustered node, will it take the services from that node?

 

Any help will be appreciated 

 

 

 

 

Microsoft

@chris8324...

 

For number 2, please see the following article:

For number 3, there are no documented requirements regarding the bandwidth of the cluster networks, but the following article discusses best practices for cluster networks:

Regarding the quorum questions, the following article should help:

I am afraid that I don't understand your last question about client proximity...

 

I hope that this helps!

Copper Contributor

Afternoon Eriq,

 

Some very good links there that have alleviated 99% of the queries that I had. 

 

So just to clarify: if Storage Spaces Direct is configured on a server that is also set up in a failover cluster, both pool and cluster quorum can run simultaneously?

 

Kind Regards 

 

 

Microsoft

Correct, here is what it says in the 3rd link above:

  • Cluster Quorum: This operates at the cluster level (i.e. you can lose nodes and have the cluster stay up)
  • Pool Quorum: This operates on the pool level when Storage Spaces Direct is enabled (i.e. you can lose nodes and drives and have the pool stay up). Storage pools were designed to be used in both clustered and non-clustered scenarios, which is why they have a different quorum mechanism.
Copper Contributor

Thanks for this great article.
We have a Windows Server 2019 Hyper-V 4-node Storage Spaces Direct cluster with 2 dual-port 25Gbps NICs.
Two of the 25G NICs are in a SET team and have 3 vNICs (vCluster, vLiveMigration and vBackup).
The other 2 x 25G NICs are not teamed and have IP addresses configured in different subnets. Our idea here was to dedicate these 2 physical NICs to storage traffic only. This was because we took the online recommendation from Microsoft to use iWARP to save the complexity of setting up QoS. As we are not doing QoS, we thought we would dedicate 2 NICs to storage only. We didn't think the overhead of creating a SET team would have any significant advantages over just using SMB Direct and Multichannel on the physical NICs, so they were not teamed.
BUT, I cannot see these SMB NICs being used even when large file copy traffic to the test VM is generated. I do see extra traffic on the SET team interfaces, but this is not what we intended. Obviously we are doing something wrong. See the cluster network metrics below in case it helps explain anything.
Points to Note:
1. I was using the perfmon RDMA activity counters to monitor storage traffic.
2. Also in perfmon, when selecting 'SMB Direct Connection', it doesn't give the choice of the SMB interfaces; it only shows the virtualised interfaces created on the SET team.
3. The VM is using a 4 x 1Gbps SET team for its traffic.
I feel there is an important concept missing in our understanding of how Hyper-V 2019 selects its storage network. Appreciate any comments. Thanks.

 

[screenshot: cluster network metrics]

 

Copper Contributor

Does anyone know what level of encryption this provides, and are there any statistics on how much this encryption degrades performance?

@John Marlin @Eriq Stern Really helpful summary; thanks for this article. As it mentions DHCP, I have encountered the following obstacles.

Every Hyper-V host has a unique MAC address range used for its virtual switches.

 

Assume we have a cluster of 2 or more Hyper-V hosts. A VM uses a DHCP MAC address reservation to obtain its IP config.
When live migrating the VM within a cluster from node A to node B there is no change.
The VM will keep its MAC address and the DHCP MAC reservation will still work as intended.

When I restart the VM on B, it gets a new MAC address from the MAC pool of B, causing the DHCP MAC reservation to fail.

 

As a workaround I tend to configure the same MAC address pool on all Hyper-V hosts in a cluster. This might solve the problem for the VM, but I wonder how this would work correctly on Layer 2 with or without SET or LBFO. What's your stance on how to deal with it?

Copper Contributor

@Stu17 I know this is really late, but it looks like the issue is that the networks SMB1 and SMB2 have no role and their metric is higher than the SET interfaces'.

You can change the role from Failover Cluster Manager 

[screenshot: Failover Cluster Manager network properties]

 

So since the metric on SMB1 (60400) > Cluster (30001), the lower-metric Cluster network is preferred.

I was here looking to see how to tell traffic to go over certain interfaces, and I think this should resolve my issue. Hope it helps (assuming you haven't fixed it already).

 

 

 

Copper Contributor

Hi, I have a problem when I create a new Cluster Resource, explained below:
When I create a new cluster resource from an availability group in SQL Management tools, and I type from PowerShell:

Get-ClusterResource AGTest_ag-test | Get-ClusterParameter

I see this:

AGTest_ag-test HostRecordTTL          1200                             UInt32
AGTest_ag-test RegisterAllProvidersIP 0                                UInt32

Can I modify the default value of these, or do I have to do it manually each time?

 

Thanks 

 

 

Microsoft

@mgraps97 cluster parameters can be modified with the Set-ClusterParameter cmdlet: Set-ClusterParameter (FailoverClusters) | Microsoft Learn

 

However, I recommend reviewing the SQL documentation for best practices with SQL Availability Groups: Configure availability group listener - SQL Server Always On | Microsoft Learn

 

Happy clustering!

EriqS

Copper Contributor

@Eriq Stern Thanks for the reply.
However, I would like to change the default behavior: I would like RegisterAllProvidersIP to be 0 by default, but instead I find it at 1.
I don't want to have to run this command every time:
Get-ClusterResource "Name" |Set-ClusterParameter -name RegisterAllProvidersIP -value 0

Microsoft

When you run SQL Server setup, it creates the resources and configures those parameters, so it may be something that can be configured during that setup.  This is detailed here:

 

http://aka.ms/RegisterAllProvidersIP

RegisterAllProvidersIP Setting

When you use SQL Server Management Studio, Transact-SQL, or PowerShell to create an availability group listener, the Client Access Point is created in WSFC with the RegisterAllProvidersIP property set to 1 (true). 

Copper Contributor

We haven't changed anything; up until a month ago the parameter was at 0, and now it is configured automatically at 1. Is there any guide to change this setting without reconfiguring the whole SQL installation?

Microsoft

I can confirm that, in my lab, when I create a new Client Access Point directly in Failover Cluster manager, RegisterAllProvidersIP is set to 0 by default.

 

[screenshot: Client Access Point parameters in Failover Cluster Manager]

 

This is the default configuration in Windows, but when SQL setup creates the cluster resource it will reconfigure that parameter based on SQL best practices, by default.

 

I don't know of a way to change the default values of these parameters in Windows, but it may be possible to configure SQL to modify them during initial setup.

 

I hope this helps to clarify.

Microsoft

@mgraps97 The value can be changed with Set-ClusterParameter; please see here: Configure availability group listener - SQL Server Always On | Microsoft Learn

Get-ClusterResource "VCO_Listener" | set-clusterparameter RegisterAllProvidersIP 1

 

The change will be effective once the resource is bounced.

Copper Contributor

@Yusuf Anis So if I use this command, then on restart the parameters will be as I set them?

 

Is VCO_Listener a sample name, or do I have to use it?

 

Thanks

Microsoft

@mgraps97 Yes, this "VCO_Listener" is an example I've used. And do note the resource name of the Listener differs depending upon whether it's created via SQL Server or from PSH/CluAdmin.

Run Get-ClusterResource to get the exact name, then, under a planned downtime window, make the change and bounce the resource, which requires the dependent resources to be bounced as well. In case you don't want to stop the database synchronization, remove the dependency, then after the change add the dependency back (AOAGres - VCO - IP).

RegisterAllProvidersIP is set to 1 for multi-subnet setups and is advised to be used along with HostRecordTTL.

 

Import-Module FailoverClusters

Get-ClusterResource

Get-ClusterResource yourListenerName | Set-ClusterParameter RegisterAllProvidersIP 0

Get-ClusterResource yourListenerName | Set-ClusterParameter HostRecordTTL 300

Stop-ClusterResource yourListenerName

Start-ClusterResource yourListenerName

 

Make sure you cross-verify the dependency chain before and after this activity.
And in case of resource recreation outside of SQL Server, add the port # from SSMS.

 

This public URL can help with guidance.

Configure availability group listener - SQL Server Always On | Microsoft Learn


Copper Contributor

Hi

@Yusuf Anis I've tried to do this:
Get-ClusterResource * | Set-ClusterParameter RegisterAllProvidersIP 0

and restarted the entire server.
I've created a new AG and then this is the situation:
Object          Name                   Value            Type
test_marco_test Name                   TEST             String
test_marco_test DnsName                test             String
test_marco_test Aliases                                 String
test_marco_test RemapPipeNames         1                UInt32
test_marco_test HostRecordTTL          1200             UInt32
test_marco_test RegisterAllProvidersIP 1                UInt32
test_marco_test PublishPTRRecords      0                UInt32
test_marco_test ADAware                1                UInt32
test_marco_test ResourceData           {1, 0, 0, 0...}  ByteArray
test_marco_test StatusNetBIOS          0                UInt32
test_marco_test StatusDNS              0                UInt32
test_marco_test StatusKerberos         0                UInt32

 

The command did not change the default behavior.

Copper Contributor

Thank you for this good article. I have a short question: 

 

I have a cluster network (cluster-only) with metric 1001 and a live migration network (cluster-only) with metric 1000. So the heartbeat and cluster communication will be transferred over the live migration network?

 

Is there a way that I can change that? 

 

Greetz

Ovrld

Microsoft

Please specify the ResourceName instead of "*".

Copper Contributor

@Yusuf Anis 
Yes, I know that I have to specify, but when I create a new AG and then a new default cluster resource, it takes the wrong settings. I want to change the default behavior and not go and edit by hand.
