BLOG: "Only 16 nodes per cluster?! - but VMware..." limitations and rightsizing of failover clusters



Greetings, Windows Server Community members!
Today I am sharing insights with you on an often discussed matter.


Intro

This is an exercise on technical limitations and rightsizing of Hyper-V based clusters.

The article covers rules for general Hyper-V based failover clusters using Windows Server with shared storage (SAN), dHCI and (Azure Stack) HCI, with a special focus on the underlying S2D considerations.


Seriously, I've stopped counting the number of customers telling me that Hyper-V / Storage Spaces Direct / Azure Stack is not scalable. Especially when thinking about Azure Stack HCI, this makes me chuckle.

Inspired by a simple question from a Microsoft techcommunity member, I thought it was about time to share my experience on this "limitation".

Granted, it is themed around S2D and Azure Stack HCI, and I do see differences for the many use cases out there that use shared storage (SAN) or Scale-Out File Server. If you have comments and suggestions, I am all ears. I am writing this article off the top of my head, so bear with me if I have missed aspects or got things wrong; I will certainly look into your comments and corrections.


Thinking about Cluster Size - I am putting all my eggs in one basket 

A great classic song, we will look into this further from an IT perspective. 
As always in IT: IT depends™. 

With S2D, a cluster scales from 1 to 16 nodes (physical, or virtual in a lab) forming one logical cluster.

This applies to S2D (Storage Spaces Direct) on Windows Server with Hyper-V as well as to the more advanced, adaptive cloud (hybrid cloud) focused Azure Stack HCI, which uses the same base technology as Windows Server with some notable extras in terms of deployment and management.

One should consider, though, that the number of nodes in a failover cluster (and with that the number of disks) does not necessarily help protect against physical disk failures.

It depends on how the storage pool deals with fault domains. This is handled automatically and is only occasionally worth revising and adjusting (if you know what you are doing).


Considering fault domains

Independent of the storage point of view, running a large cluster also means it is one large fault domain. Any issue with the failover cluster then affects everything running on it, and there are numerous potential issues: networking, physical problems, “it is always DNS™”, storage issues, configuration issues, changes or drift.


Performance impacts

Running one large cluster also causes higher performance and bandwidth impacts. Not so much when using a shared SAN, one might think, but certainly when using SDS, dHCI or HCI, like Microsoft Storage Spaces Direct. This is especially true for S2D rebuild times in case of disk failures, disk replacements or hardware expansions, especially disk capacity expansions.


Costs

When considering cost, S2D requires equal disks in a pool and mostly identical hardware within a cluster. A larger cluster can be less efficient, and its hardware less tightly targeted and optimized for the use case, especially for general VM or VDI workloads.


Lifespan and oopsies with physical disks

Granted, NVMe drives, when you choose models with an appropriate TBW / DWPD rating, offer a very long lifespan, excellent response times and performance galore, for sequential but especially for random operations and IOPS. Today they are more cost efficient than SSDs.

That said, if you do not follow the advice to patch your OS, FW and drivers regularly, you might hit sudden outages on NVMe drives, SSDs and HDDs due to code issues in the firmware. This has happened several times in the past and only recently affected Samsung NVMe drives as well, but it was spotted before it caused disasters at scale.


Understanding Storage Spaces (Direct) / Storage Pools

In Windows Server S2D (this always includes Azure Stack HCI as well), all physical disks are pooled. In general, there is just one storage pool available for all servers within a cluster. Stretched clusters are an exception, which I do not want to go into in detail here. If you want to learn more about these, I can recommend this epic YT series.

If you face a problem with your pool, you are facing a problem for all nodes. This is common and much like what happens with other third-party SAN / RAID / SDS systems.
No change here, we are all cooking with water.

Here is a general overview of this amazing and "free" technology. It requires Windows Server Datacenter licensing; that is all you need for all the bells and whistles of a highly performant and reliable software-defined storage. It runs best on NVMe-only setups, but allows flexibility based on the use case.

A high level overview for now to explain the relation to the original topic.

Storage Spaces Direct (S2D) was introduced with Windows Server 2016, which uses ReFS 3.1. We will soon be at Windows Server 2025 and ReFS 3.12, which comes with a ton of improvements.

Next to S2D there is Storage Spaces, a similar technology that does not form shared storage across different servers. It is designed for standalone servers, as opposed to server clusters. It is something you should consider, together with ReFS, for unclustered Hyper-V, Scale-Out File Server and backup servers instead of RAID or SAN.


When larger doesn’t mean more secure – Storage Resiliency also affects Cluster resilience

On both you define your storage policies per Volume / Cluster Shared Volume, similar to LUNs.

So, you can decide how much resiliency, deduplication and performance is required based on the workload that is going to be stored on that volume.

Some basic and common policies are Mirror and Nested Mirror. Others exist, depending on the number of disks / hosts, but not all of them are recommended for VM workloads.

When using these resiliency methods, especially Mirror, adding more disks (or hosts) markedly raises the risk of a full data loss on this Volume / CSV in case of unfortunate events, because more disks mean more chances that several of them fail at the same time.
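
To get a feeling for how this risk grows with the number of disks, here is a deliberately simplified Python sketch. It assumes every disk fails independently with the same probability within a given time window, and it counts any two overlapping failures, ignoring fault domains, repair times and slab placement. The function name and the 1% figure are illustrative assumptions, not measured values.

```python
from math import comb


def prob_at_least_k_failures(n_disks: int, p_fail: float, k: int = 2) -> float:
    """Probability that at least k of n_disks fail in the same time window.

    Simplified binomial model: every disk fails independently with
    probability p_fail. Real pools also depend on fault domains and
    repair speed, which this sketch ignores.
    """
    p_fewer_than_k = sum(
        comb(n_disks, i) * p_fail**i * (1 - p_fail) ** (n_disks - i)
        for i in range(k)
    )
    return 1.0 - p_fewer_than_k


if __name__ == "__main__":
    # Hypothetical 1% chance per disk to fail within the window.
    for disks in (8, 16, 64, 128):
        print(disks, round(prob_at_least_k_failures(disks, 0.01), 4))
```

The absolute numbers mean little; the point is the trend as the disk count goes up while the resiliency setting stays the same.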
 
So, choose and plan wisely. I can only recommend doing the RTFM job beforehand, as later changes are possible but require juggling the data and having space left in the pool (physical disks) for such storage (migration) operations.

Sure, there are other methods that scale better, like dual parity.

Be warned that the diagrams are simplified. The data is not distributed equally "per disk" as you would expect in traditional RAID; instead it is split into 256 MB data blocks (slabs), and an algorithm takes care of balanced placement.

It is important to understand this small difference to better predict the outcome of disk or host failures within the cluster. I am not saying the docs are wrong, just that their depiction is simplified.
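
As a rough illustration of the slab idea, this small Python sketch counts how many 256 MB slabs a volume breaks down into. The helper name and the `data_copies` parameter are my own assumptions for illustration; the actual placement algorithm is not modelled here.

```python
SLAB_SIZE_BYTES = 256 * 1024**2  # 256 MB slab size, as mentioned above


def slab_count(volume_bytes: int, data_copies: int = 1) -> int:
    """Number of slabs needed for a volume, including all data copies.

    data_copies is an assumption for illustration, e.g. 2 for a
    two-way mirror or 3 for a three-way mirror.
    """
    # Integer ceiling division: a partially filled slab still counts as one.
    slabs_per_copy = -(-volume_bytes // SLAB_SIZE_BYTES)
    return slabs_per_copy * data_copies


if __name__ == "__main__":
    one_tib = 1024**4
    print(slab_count(one_tib), slab_count(one_tib, data_copies=3))
```

With thousands of slabs per volume spread over the pool, it becomes clear why a failure does not map neatly to "one disk, one copy".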

Read on more here:
S2D - Nested Resiliency 
S2D - Understanding Storage fault tolerance 

Speaking of clusters, the best approach is to start with 2 or 4 nodes. I would avoid an odd number of nodes, like three nodes or a multiple thereof, as they are not very efficient (33%) and expanding on or from them requires changing the storage policy (e.g. three-way mirror).
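
The efficiency figure translates directly into usable capacity. Here is a minimal Python sketch, assuming the commonly documented efficiencies of 50% for a two-way mirror, 33% for a three-way mirror and 25% for a nested two-way mirror. It ignores reserve capacity and cache drives, and the function name is hypothetical.

```python
# Storage efficiency per resiliency type (usable share of raw capacity).
EFFICIENCY = {
    "two-way mirror": 1 / 2,
    "three-way mirror": 1 / 3,
    "nested two-way mirror": 1 / 4,
}


def usable_capacity_tb(raw_tb: float, resiliency: str) -> float:
    """Usable capacity in TB for a given raw capacity and resiliency type."""
    return raw_tb * EFFICIENCY[resiliency]


if __name__ == "__main__":
    raw = 96.0  # hypothetical raw capacity of the pool in TB
    for name in EFFICIENCY:
        print(f"{name}: {usable_capacity_tb(raw, name):.1f} TB usable")
```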

S2D also supports single node clusters, with nested mirror. You heard that right.
Still satisfactory performance for many use cases, when you do not need full hardware stack resiliency, at a very low footprint and cost.


Notable upcoming improvements to Clustering Storage Spaces (Direct) and beyond

I trust that Azure Stack HCI will receive some of the recently announced improvements of Windows Server 2025. Be curious about what is coming next from Microsoft with regard to storage and storage options. Have a look at this later on.
S2D - New Storage Options for Windows Server 2025 (release planned later this year) 


One large vs one or more smaller use case designed, scaled clusters

Again, it is a common question why there is a limit of 16 nodes while e.g. VMware supports larger clusters.

Without further ado, let's talk about how to do it right (imho).

You might seek to create smaller clusters, and Azure Stack HCI by design makes management, RBAC / security and LCM operations across several clusters easier. Having more than one cluster also enables you to leverage different Azure Subscriptions (RBAC, Cost Management / Billing > Controlling).


Sizing considerations – Why you do not need the same amount of hardware you had > smaller clusters

Proper (physical) CPU sizing

Often, when sizing is done, rightsizing the workloads and rightsizing the CPUs used in a node are not considered.

Modern CPUs, compared to e.g. Sandy Bridge, can help to consolidate physical servers at a ratio of 8:1. This way, you can easily save quite some complexity and costs for HW, licensing, cooling etc.

Current pCPUs are far more efficient, so you should not plan for the same number of them on new systems. Calculators from Intel and AMD help you find a right-sized CPU for your Windows Server and show what to expect in terms of reduced hardware, TCO and environmental impact. That is climate action up to par. You can find the calculator from Intel here. The same exists for AMD.

The vCPU to pCPU ratio in today’s environments appears much higher than we are used to from previous clusters.

Yes, that's true. I often hear VMware / Hyper-V customers being happy with a 2:1 vCPU:pCPU ratio across their workloads. It depends on the use case, but often CPU resources are wasted and pCPUs sit idle for their money, even at the end of a usual hardware life cycle.
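
If you want to check your own ratio, a quick calculation like the following Python sketch is enough. It assumes that a pCPU means a physical core and that one node is kept free for failover (N-1); the function name and both assumptions are mine, so adjust them to your own sizing rules.

```python
def vcpu_to_pcpu_ratio(
    total_vcpus: int, nodes: int, cores_per_node: int, spare_nodes: int = 1
) -> float:
    """vCPU:pCPU ratio of a cluster, counting physical cores as pCPUs.

    spare_nodes are excluded from the calculation so the ratio still
    holds when that many nodes are down (assumption: N-1 by default).
    """
    usable_nodes = nodes - spare_nodes
    if usable_nodes <= 0:
        raise ValueError("at least one node must remain usable")
    return total_vcpus / (usable_nodes * cores_per_node)


if __name__ == "__main__":
    # Hypothetical cluster: 4 nodes with 32 cores each, 256 vCPUs assigned.
    print(round(vcpu_to_pcpu_ratio(256, nodes=4, cores_per_node=32), 2))
```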


Plan for rightsizing existing workloads before sizing hardware > Save costs

Please consider:

  • Storage Deduplication, which Windows Server 2025 also includes in a more efficient form as in-line deduplication and compression. Extra points (savings) for keeping all OS on a similar release.
  • Storage Savings and RAM saving, using Windows Server Core for infrastructure VMs.
  • Saving through Dynamic Memory
  • vCPU / RAM reduction per VM
  • Disk space (disks that are too large, fixed disks, or thin provisioned disks that had a relevant amount of data deleted and have not been compacted) etc.

All this can be based on your monitoring metrics from your existing solution, or on Azure Monitor through Azure Arc.

There is an enormous potential for savings and for reducing cluster nodes, RAM and storage when you interpret your metrics before the migration to a more efficient platform.


VM Assessment

As outlined you can rely on your own methods and monitoring for the assessment of your workloads for rightsizing of hardware and VMs.

In addition, you can leverage Azure Migrate to do that for you. It does not matter whether you finally decide to migrate to Azure or Azure Stack HCI; it helps with the general assessment using live data during operation, which lets you draw good conclusions on rightsizing, no matter the target.



Consider growth and migrations

There is always growth of the business or increased requirements to consider.

The Azure Stack HCI Sizing tool helps you here, but watch out: sometimes it leaves a huge gap of free resources. The tool is not logically perfect. It is math.
Also, OS migrations cause temporary growth that can surpass 50% of resources. The good news: this is getting better with in-place upgrades (IPU), starting with Windows Server 2022.
Additionally, services on older VMs are often not well designed, like file server + DHCP + RDS + certificate authority on a domain controller. These scenarios are still around and scream to be resolved, at the cost of more VMs / resources.
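
To put growth and the temporary migration overhead into one number, here is a small Python sketch. It assumes compound yearly growth and treats the migration overhead as a temporary surcharge on top; the function name, the parameters and the example values are illustrative assumptions, not output of the sizing tool.

```python
def required_capacity(
    current: float,
    annual_growth: float,
    years: int,
    migration_overhead: float = 0.5,
) -> float:
    """Peak resource demand (e.g. RAM in GB or storage in TB).

    current            -- what the workloads use today, after rightsizing
    annual_growth      -- expected yearly growth, e.g. 0.1 for 10%
    years              -- planning horizon in years
    migration_overhead -- temporary extra demand during OS migrations,
                          e.g. 0.5 for 50% (assumption, adjust as needed)
    """
    steady_state = current * (1 + annual_growth) ** years
    return steady_state * (1 + migration_overhead)


if __name__ == "__main__":
    # Hypothetical: 100 TB today, 10% growth per year, 3 year horizon.
    print(round(required_capacity(100.0, 0.10, 3), 1))
```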

Have you heard Hyper-V isn't scalable?

Form your own opinion; here are the facts for Windows Server / Azure Stack HCI.
And the limits often grow with every release.
Source: Azure Stack HCI System Requirements 

Azure Stack HCI and S2D

Karl_WesterEbbinghaus_0-1712241469993.png



These limits should not be confused with the capabilities of Hyper-V as a general hypervisor, e.g. when not using S2D but an attached SAN:

General Hyper-V Scalability and Limitations

Karl_WesterEbbinghaus_0-1714060954215.png

 


Conclusion

You see, there is some complexity in the decision and in the “limitation” of 16 nodes per cluster. I personally do not see this as a limitation in Windows Server Hyper-V or Azure Stack HCI given all of those aspects.

Smaller, use case targeted clusters can also ensure flexibility and carry the motivation for (cost) reductions and rightsizing from the start.

No doubt lift and shift is easy, but it is often more expensive than investing some time into assessments, on-premises just as in the cloud.

So why a 16+ node cluster? Hope this helped you to make a decision.
Allow me to cite Kenny Lowe, Regional Director and expert for Azure Stack HCI and other topics: “As ever, just because you can, doesn't mean you should.”


Looking for more?

Here is an amazing 1h video you should consider watching if this article fueled your interest in alternatives to classic clusters: VMware to Hyper-V migration options.


Full agenda, on-demand, of Windows Server Summit 2024
https://techcommunity.microsoft.com/t5/tech-community-live/windows-server-summit-2024/ev-p/4068971

Thank you for reading, learning, sharing. 

Do you find this post helpful? Appreciate your thumbs up, feedback, questions.
Let me know in the comments below.

2 Replies

Dear readers,

 

I actually forgot to mention a very important point, which came to mind only recently when a customer with a 20-node cluster and an attached SAN flagged issues (without a SAN, the 16-node limit applies).

 

 

Processor compatibility mode in Hyper-V based virtualization - a Pyrrhic victory as of today

 

The larger the cluster, especially when not using S2D and Azure Stack HCI, the more likely it is that the cluster gets expanded with hardware of different CPU generations.

 

This is very common, as a CSV on a SAN cannot be presented to different clusters.

 

With different CPU generations in place, customers often enable the CPU compatibility flag in the VMs so they can still leverage seamless Live Migration between hosts / nodes with different CPU generations.

 

Without the compat flag, only Quick Migration would be possible, which causes the VM to restart and renegotiate the available CPU instructions.

 

This is because Intel and AMD offer new instruction sets to gain performance. Some popular ones are AVX2 (256-bit) and AVX-512.

For detailed information on Intel CPUs, head to ark.intel.com and the compare feature, to see the CPU instructions available and if you need this compat flag. 

 

If you think the CPU compat flag is the solution, let me warn you: It's a Pyrrhic victory!

 

Unlike VMware ESXi, all Hyper-V versions before Windows Server 2025 and before Azure Stack HCI 22H2 severely degrade CPU performance with this flag, resulting in lower achievable vCPU to pCPU ratios and higher CPU load.

 

This is because, at the moment, this CPU compat flag limits the CPU to a "Sandy Bridge" like set of CPU instructions.

So enabling it gives you peace of mind but effectively you are downgrading your investment. 

 

The dynamic CPU feature negotiation is available starting with Azure Stack HCI 22H2 and Windows Server 2025. 
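
Conceptually, you can picture such a negotiation as finding the instruction sets that all hosts in the cluster have in common, instead of falling back to a fixed old baseline. The following Python sketch only illustrates that idea with a set intersection; it is not how Hyper-V implements it, and the host names and feature lists are made up.

```python
def common_cpu_features(hosts: dict[str, set[str]]) -> set[str]:
    """Return the CPU features that every host in the cluster supports."""
    if not hosts:
        return set()
    return set.intersection(*hosts.values())


if __name__ == "__main__":
    # Hypothetical hosts of three CPU generations (illustrative features).
    cluster = {
        "node-old": {"sse4.2", "avx"},
        "node-mid": {"sse4.2", "avx", "avx2"},
        "node-new": {"sse4.2", "avx", "avx2", "avx512f"},
    }
    print(sorted(common_cpu_features(cluster)))
```

The sketch also shows why mixing generations costs you: the oldest node dictates what the VMs may use.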

 

Learn more about it in the documentation.

Updated the OP to eliminate the remaining confusion around the sixteen-node limit, outlining that it applies to S2D and Azure Stack HCI. Included another picture and reference. Thanks for your feedback, offline!