
Failover Clustering

Security Settings for Failover Clustering

John Marlin
Microsoft
Jul 13, 2021

 

Security is at the forefront of many administrator's minds and with Failover Clustering, we did some improvements with Windows Server 2019 and Azure Stack HCI with regards to security.

 

Since the beginning of time, Failover Clustering has had a dependency on NTLM authentication. As the versions came and went, a little more of this dependency was removed. Now, with Windows Server 2019 Failover Clustering, we have finally removed all of these dependencies. Instead, Kerberos and certificate-based authentication are used exclusively. No changes are required from the user or from deployment tools to take advantage of this security enhancement. This also allows failover clusters to be deployed in environments where NTLM has been disabled.

This applies to everything from bootstrapping the cluster to starting its resources and drives. During bootstrapping, an Active Directory domain controller is no longer needed. As explained in this blog, a local user account (CLIUSR) is now used for various purposes. Using this account together with certificates, the following steps no longer depend on a domain controller:

  1. The Cluster Service starts and forms the cluster.
  2. The other nodes join the cluster.
  3. Drives (including Cluster Shared Volumes) come online.
  4. Groups and resources come online.

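This is especially beneficial if a domain controller is virtualized and running on the cluster itself, because it prevents the "chicken or the egg" scenario in which the cluster needs the domain controller to start, while the domain controller needs the cluster to start.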
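The end state of the bootstrap sequence described above can be checked on a running cluster with a small script like the one below. It is a sketch, not an official tool: Get-NotUpItem and Test-ClusterBootstrap are hypothetical helper names, and it assumes an elevated PowerShell session on a cluster node with the FailoverClusters module installed. It confirms that the local CLIUSR account exists, then lists any nodes, Cluster Shared Volumes, or groups that are not yet up or online.

```powershell
# Sketch only: Get-NotUpItem and Test-ClusterBootstrap are hypothetical helpers,
# not built-in cmdlets. Assumes an elevated session on a cluster node.

function Get-NotUpItem {
    # Emits the Name of every item whose State does not match the expected value
    # (for example 'Up' for nodes, 'Online' for CSVs and groups).
    param([object[]]$Items, [string]$ExpectedState)
    foreach ($item in $Items) {
        if ("$($item.State)" -ne $ExpectedState) { $item.Name }
    }
}

function Test-ClusterBootstrap {
    Import-Module FailoverClusters
    # CLIUSR is the local account the cluster uses during bootstrapping.
    $cliusr = Get-LocalUser -Name 'CLIUSR' -ErrorAction SilentlyContinue
    [pscustomobject]@{
        CliusrPresent   = [bool]$cliusr
        NodesNotUp      = @(Get-NotUpItem -Items (Get-ClusterNode) -ExpectedState 'Up')
        CsvsNotOnline   = @(Get-NotUpItem -Items (Get-ClusterSharedVolume) -ExpectedState 'Online')
        GroupsNotOnline = @(Get-NotUpItem -Items (Get-ClusterGroup) -ExpectedState 'Online')
    }
}
```

Running Test-ClusterBootstrap after all nodes have restarted should report no names once the cluster has fully started; groups that are intentionally kept offline will still be listed.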

 

Another security concern administrators have is what is sent over the wire. There are two security settings to consider: one for communication between the nodes and one for storage traffic. From a storage perspective, there is Cluster Shared Volume (CSV) traffic for any redirected data and, if you are using Storage Spaces Direct, Storage Bus Layer (SBL) traffic.

Let's first talk about cluster communications. Cluster communications can contain any number of things, and an admin would like to prevent anyone from picking them up on the network. By default, all communication between the nodes is signed, making use of certificates. This may be fine when all the cluster nodes reside in the same rack. However, when nodes are separated into different racks or locations, an admin may want a little more security and choose to use encryption.

This setting is controlled by the cluster property SecurityLevel, which has three levels:

0 = Clear Text

1 = Signed (default)

2 = Encrypted

To change this to encrypted communications, run the following command:

(Get-Cluster).SecurityLevel = 2
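The levels above can be wrapped in a small helper that reports the value before and after the change. This is a sketch that assumes the FailoverClusters module on a cluster node; ConvertTo-SecurityLevelName and Set-ClusterSecurityLevel are hypothetical helper names, not built-in cmdlets. The assignment it performs is the same (Get-Cluster).SecurityLevel change shown above.

```powershell
# Sketch only: ConvertTo-SecurityLevelName and Set-ClusterSecurityLevel are
# hypothetical helpers, not built-in cmdlets. Assumes the FailoverClusters module.

# Level names as listed for the SecurityLevel cluster property.
$SecurityLevelNames = @{ 0 = 'Clear Text'; 1 = 'Signed'; 2 = 'Encrypted' }

function ConvertTo-SecurityLevelName {
    param([Parameter(Mandatory)][int]$Level)
    if (-not $SecurityLevelNames.ContainsKey($Level)) {
        throw "Unsupported SecurityLevel value: $Level"
    }
    $SecurityLevelNames[$Level]
}

function Set-ClusterSecurityLevel {
    param([ValidateRange(0, 2)][int]$Level = 2)
    Import-Module FailoverClusters
    $before = (Get-Cluster).SecurityLevel
    # Same assignment as the one-line command; it applies to the whole cluster.
    (Get-Cluster).SecurityLevel = $Level
    $after = (Get-Cluster).SecurityLevel
    'SecurityLevel: {0} ({1}) -> {2} ({3})' -f $before, (ConvertTo-SecurityLevelName $before), $after, (ConvertTo-SecurityLevelName $after)
}
```

Calling Set-ClusterSecurityLevel -Level 2 makes the change and returns a one-line summary of the old and new values, which is a quick way to confirm the property actually took the new level.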

 

The other communication between the nodes is storage traffic. Cluster Shared Volumes (CSV) send traffic over the wire and, if you are using Storage Spaces Direct, so does the Storage Bus Layer (SBL). By default, all of this traffic is sent in clear text. Admins may decide to secure this data traffic to lock it down and prevent network sniffer traces from picking anything up.
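Because CSV data travels between nodes when I/O is redirected, it can help to see which nodes are currently accessing a volume in redirected mode. The sketch below assumes the FailoverClusters module and its Get-ClusterSharedVolumeState cmdlet; Select-RedirectedCsv and Get-RedirectedCsv are hypothetical helper names, and treating any StateInfo ending in "Redirected" (FileSystemRedirected or BlockRedirected) as network-bound I/O is an assumption made for illustration.

```powershell
# Sketch only: Select-RedirectedCsv and Get-RedirectedCsv are hypothetical helpers.
# Assumes the FailoverClusters module (Get-ClusterSharedVolumeState) on a cluster node.

function Select-RedirectedCsv {
    # Keeps only per-node CSV states whose StateInfo ends in "Redirected"
    # (FileSystemRedirected or BlockRedirected), i.e. I/O sent to another node.
    param([object[]]$States)
    foreach ($s in $States) {
        $mode = "$($s.StateInfo)"
        if ($mode -like '*Redirected') {
            [pscustomobject]@{ Volume = "$($s.Name)"; Node = "$($s.Node)"; Mode = $mode }
        }
    }
}

function Get-RedirectedCsv {
    Import-Module FailoverClusters
    Select-RedirectedCsv -States (Get-ClusterSharedVolumeState)
}
```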

 

 

Storage traffic security is controlled by the cluster property SecurityLevelForStorage, which has three levels:

0 = Clear Text (default)

1 = Both CSV and SBL traffic are signed

2 = Both CSV and SBL traffic are encrypted

To change this to encrypted communications, run the following command:

(Get-Cluster).SecurityLevelForStorage = 2

There is one caveat to SecurityLevel and SecurityLevelForStorage that must be taken into consideration: these forms of communication use SMB, and when SMB traffic is encrypted, RDMA is not used. Therefore, if you enable encryption on RDMA network cards, the traffic does not use RDMA, which can cause a performance impact. Microsoft is aware of this impact and is working on correcting it in a later version. For more information, please refer to the following document.

Reduced networking performance after you enable SMB Encryption or SMB Signing in Windows Server 2016

Reduced performance after SMB Encryption or SMB Signing is enabled - Windows Server | Microsoft Docs
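The caveat above can be checked per node with a sketch like the following. It assumes the FailoverClusters and NetAdapter modules and that the storage setting is exposed as the SecurityLevelForStorage cluster property; Test-RdmaBypassRisk and Get-ClusterSmbSecurityReport are hypothetical helper names. Following the caveat as written, it flags the case where either setting is 2 (Encrypted) while the node has at least one enabled RDMA adapter. Get-NetAdapterRdma only reports the local node's adapters, so it would need to run on each node.

```powershell
# Sketch only: Test-RdmaBypassRisk and Get-ClusterSmbSecurityReport are hypothetical
# helpers. Assumes the FailoverClusters and NetAdapter modules, and that the storage
# setting is exposed as the SecurityLevelForStorage cluster property.

function Test-RdmaBypassRisk {
    # True when either setting is 2 (Encrypted) and the node has at least one enabled
    # RDMA adapter, which is the situation the caveat warns about.
    param([int]$SecurityLevel, [int]$SecurityLevelForStorage, [int]$EnabledRdmaAdapterCount)
    (($SecurityLevel -eq 2) -or ($SecurityLevelForStorage -eq 2)) -and ($EnabledRdmaAdapterCount -gt 0)
}

function Get-ClusterSmbSecurityReport {
    Import-Module FailoverClusters
    $cluster = Get-Cluster
    # Get-NetAdapterRdma only reports adapters on the local node.
    $rdma = @(Get-NetAdapterRdma -ErrorAction SilentlyContinue | Where-Object { $_.Enabled })
    [pscustomobject]@{
        Node                      = $env:COMPUTERNAME
        SecurityLevel             = $cluster.SecurityLevel
        SecurityLevelForStorage   = $cluster.SecurityLevelForStorage
        EnabledRdmaAdapters       = $rdma.Count
        EncryptedTrafficSkipsRdma = Test-RdmaBypassRisk -SecurityLevel $cluster.SecurityLevel -SecurityLevelForStorage $cluster.SecurityLevelForStorage -EnabledRdmaAdapterCount $rdma.Count
    }
}
```

Keeping the decision in a separate function makes it easy to adjust if a later version changes which levels bypass RDMA, without touching the data-gathering code.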

 

Thanks

John Marlin

Senior Program Manager

Twitter: @Johnmarlin_MSFT

Updated Jun 06, 2022
  • John Marlin, thank you for sharing these insights.

    Looking at the projected AD-less Failover Clusters in Windows Server 2025, as presented at Windows Server Summit 2024, we would have no requirement for Domain Controllers. Is this correct?

    Could you explain, perhaps in a follow-up blog, how authentication, signing, and Live Migration will work without Domain Controller Kerberos?

    Since it is currently a security best practice to use a separate domain instance only for the operation and management of core services like Failover Clusters for Hyper-V / S2D / Azure Stack HCI, other questions arise:

    How can other software, like Windows Admin Center, SCVMM, or Backup / BCDR software, safely communicate and authenticate?

    Why is the NetBIOS naming convention for the cluster name (CNO) still a thing?

    Thank you for your reply in advance!

  • JimGandy
    Copper Contributor

    It looks like there is a typo in this blog article. SecurityLevelToStorage does not exist, but SecurityLevelForStorage does.

  • AzureGuineaPig
    Copper Contributor

    Thanks for sharing, John Marlin. I was wondering if there are any published resources available, or if you would consider a write-up of the NTLM dependencies in Failover Clustering in older versions, specifically how NTLMv1/v2 are used. We're undergoing a hardening process in my company to at minimum enforce NTLMv2 and refuse LM & NTLM (lmcompatibilitylevel=5 in the registry), and wanted to understand the impact of doing so on the 2012 R2, 2016, and 2019 versions. In a test scenario on a 2012 R2 cluster where we blocked NTLM outright, this doesn't appear to have affected the cluster resources or failover between nodes at all, even though blocked-NTLM audit events are generated frequently.

  • WhoIsHomer
    Copper Contributor

    Regarding the document you link to about SMB Encryption/Signing impacting performance:

    How would you protect a failover cluster from an SMB relay attack if we can't enable SMB Encryption/Signing without greatly impacting performance?