Exchange Team Blog

4 MIN READ

Placement of the File Share Witness (FSW) on a Geographically Dispersed CCR Cluster

Platinum Contributor

Apr 25, 2007

Update 6/4/2008: Please note that we have posted our new guidance around this subject here: New File Share Witness and Force Quorum Guidance for Exchange 2007 Clusters.

Exchange Server 2007 introduced Continuous Cluster Replication (CCR), which utilizes the log shipping and replay features of Exchange 2007 combined with a Majority Node Set (MNS) cluster. CCR utilizes the Microsoft Cluster service to provide failover to the passive node. An Exchange 2007 CCR cluster can be configured with two nodes and a File Share Witness (FSW). FSW is an option that is recommended so that a third piece of hardware does not need be utilized. FSW takes the place of the voter node that was used prior to update 921181. This hotfix allows the use of FSW which is used as a quorum resource without using a shared disk. The FSW is used by the cluster node for arbitration. At least two of the three (when counting nodes or FSW) must be up and communicating to maintain a majority or any cluster resources and the cluster service will cleanly shutdown.

With a CCR cluster it is possible to place each node in a separate datacenter, however the nodes must be on the same subnet. This is a Windows Clustering limitation that will removed with Longhorn Server. With the nodes being in separate datacenters, where should you place the FSW? It is Microsoft's recommendation to place the FSW in the same datacenter with the preferred owning node of the CCR cluster. This recommendation is based on management and statistics. These statistics show the most common network failure is WAN, not LAN related. This means that the highest risk of communication failure between 'voters' in the CCR implementation is between the Active and Passive nodes when the cluster is geographically disbursed. If we place the FSW in the datacenter with the preferred owning node, a loss of communication between the datacenter's will not cause the cluster service to shutdown. We recommend the FSW to be placed on the Hub Transport Role Server.

Note: We also recommend that you create a CNAME record in DNS for the server hosting the share, instead of the actual server name. When creating the share for the file share witness use the fully-qualified domain name for the CNAME record instead of the server name, as this practice assists with site resilience.

Please see article 281308 for some configuration issues with using the CNAME to access the FSW (share):

We are assuming the following has been configured for the following examples:

Creation of FSW in primary data center on the Hub Transport Server.
CCR Geo-Cluster has been configured.
IP subnets have been properly configured.
FSW share has been previously configured in the Backup Data Center.

So what happens if we lose the primary datacenter? If this were to occur, we could not have an automatic failover to the passive node as we would have lost the active node as well as the FSW, and the cluster would no longer have a majority (counting both Nodes or FSW) so the cluster service on the passive node would shutdown. Manual intervention in this scenario is necessary to force the quorum and to redirect the FSW to a new share in the secondary datacenter:

1.) Update the CNAME to point to a Hub Transport Server in the Backup Data Center.

2.) Force the quorum to start the cluster service on the Secondary Node:

net start clussvc /forcequorum node_list

Please see article 258078 â?? Cluster service startup options for more details

3.) At this point the Cluster service will start and you should be able to bring your cluster resources on-line.

The FSW share at the secondary site should already exist. The recommended location is on the Hub Transport server in the backup datacenter. You can easily script the previous steps if desired, but the Primary Data Center failure will require manual intervention.

For the procedure to create and secure the file share for the FSW please refer to:

http://technet.microsoft.com/en-us/bb124922.aspx

What happens if the Primary Data Center comes back on-line, but the WAN link is down to the Backup Data Center? The following illustrates this scenario:

1.) When the Primary Node comes up in the Primary Data Center, it would not see the Secondary node.

2.) The Primary node would see the local FSW because it would not see the DNS update since the local DNS and the DNS in the backup data center have not synced.

3.) The FSW (in the Primary Data Center) would show that the Primary Node has a current version of the cluster database, then the Primary Node will form a cluster. At this point we have hit the Split Brain Syndrome (a scenario under which a MNS cluster is partitioned because of network communication failure and each partition considers itself as an owner and brings resources online).

To avoid hitting the Split Brain Syndrome, you can delete the files in the FSW in the primary datacenter before starting the Primary Node or don't start the server where the FSW resides. If the WAN link was restored and the Primary Node had not been started, propagate the DNS updates if possible so the Primary Node will use the FSW in the Backup Data Center.

Note: You should ensure that your Servers are not configured to auto-power back on. Most up-to-date server manufacturers have a BIOS configuration to control when the server powers back on. It can be set to remember previous power sate, turn on, or leave off. The servers should be set in a "leave off" configuration, so when power is restored t o the Data Center the server can be brought up in a controlled/managed way.

Using a 3rd Site to achieve Automatic Failover

If possible, you could use a 3rd site to host the FSW:

With the 3rd site, a failure at the Primary Data Center would allow for an automatic failover to the Secondary Node, as cluster keeps a majority of the voters (Secondary Node and FSW).

- Matt Richoux

Updated Jul 01, 2019

Version 2.0

Platinum Contributor

Joined April 19, 2019

View Profile

Exchange Team Blog

You Had Me at EHLO.

24 Comments

Anonymous
Aug 13, 2008
What happens if you have your active and passive ccr in the same location with your HUB server with the FSW installed and you add a second HUB for HA and I rebooted the HUB with the FWS the cluster would go down would it not? If so how would i use both HUBs and FSW to provide HA?
Anonymous
Jul 16, 2007
We've got exactly this set up, however in testing, the /forcequorum switch does not seem to start the cluster service on the secondary node if the hub transport server hosting the FSW is tranferred to the secondary site and the primary exchange node is down. Any ideas?
Anonymous
Jun 26, 2007
Hi Jim,
You can use seperate VLANs on the same network link (as long as your hardware supports it), but it would be a single point of failure if they are coming off the same router/switch.
Anonymous
Jun 26, 2007
The CCR would continue to function on the active node. We still can communicate with the passive node so no failure would occur. If we did lost communication with the passive node at this time, the cluster service on both nodes would stop.
Anonymous
Jun 26, 2007
Hi Matt,

In your last scernario of FSW at third site what could happen to CCR if this third site down?

Thanks, Tony
Anonymous
Jun 11, 2007
Hi Matt:

When planning for the public and private "heartbeat" networks in a geo-dispersed CCR implementation, do we need to have separate network links for the public/private networks (i.e. separate routers, switches, WAN connections), or can we use VLANs over a single wide area link?

The CCR Planning documentation at http://technet.microsoft.com/en-us/library/a26a07fc-f692-4d84-a645-4c7bda090a70.aspx says that "A separate cluster private network must be provided" and that "to avoid single points of failure, use independent VLAN hardware for the different paths between the nodes."

This suggests we should have two separate links, but I wanted to get clarification on this if possible.

Thanks for your help!

Jim
Anonymous
Jun 06, 2007
Hi Sfotovat,
No you can't use SCC with CCR, but you can use SCC with SCR, which will be available with Exchange 2007 SP1. For more information on SCR see the following:

http://msexchangeteam.com/archive/2007/02/23/435699.aspx
Anonymous
Jun 06, 2007
Hi Harry,

No Exchange 2007 Clusters support Active/Active, this includes CCR.
CCR does not support more than one CMS per cluster.
Anonymous
Jun 02, 2007
Since CCR supports multiple Clustered Mailbox Servers, why cannot we configure Active/Active for CCR? Will future release, say, Exchange 2007 SP1 supports Active/Active CCR?

Thanks.
Anonymous
May 30, 2007
If I have two data centers, is it possible to use "Single copy clusters" for redundancy within a data center and "Cluster continuous replication" for across data centers redundancy?

Blog Post

Placement of the File Share Witness (FSW) on a Geographically Dispersed CCR Cluster