Cluster Shared Volume Failure Handling
First published on MSDN on Oct 27, 2014
This is the fourth blog post in a series about Cluster Shared Volumes (CSV). In this post we will explain how CSV handles storage failures and how it hides them from applications. This post builds on the previous blog posts in the series, which we assume the reader is familiar with:

Cluster Shared Volume (CSV) Inside Out
http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx


Which explains CSV components and different CSV IO modes.

Cluster Shared Volume Diagnostics
http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx


Which explains the tools that help you understand why a CSV volume uses a particular IO mode.

Cluster Shared Volume Performance Counters
http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx


Which is a reference and guide to the CSV related performance counters.

Failure Handling


CSV is designed to increase availability by abstracting failures away from applications, making them resilient to failures of the network, storage, and nodes. CSV accomplishes this by virtualizing file opens. When an application opens a file on CSVFS, this open is claimed by CSVFS, which in turn opens another handle on NTFS. When handling failures, CSVFS can re-establish its file open on NTFS while keeping the application's virtual handle on CSVFS valid. To better understand this, let's walk through a hypothetical failure with the help of a diagram that omits many CSV components to keep the picture simple.
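To make the idea of virtualized file opens a bit more concrete, here is a minimal user-mode sketch in Python (the real CSVFS is a kernel-mode file system driver, so this is only a model of the concept): a proxy handle stays valid for the application while the underlying open can be torn down and re-established.

class VirtualizedOpen:
    """Proxy handle that stays valid while the underlying open is recycled."""

    def __init__(self, path):
        self.path = path                     # remembered so the open can be re-created
        self._underlying = open(path, "rb")  # the "real" open on the underlying file system

    def invalidate(self):
        # Volume entered Draining/Paused: close the underlying open,
        # but keep this (virtual) handle valid for the application.
        if self._underlying is not None:
            self._underlying.close()
            self._underlying = None

    def reestablish(self):
        # 'Set Down Level' transition: re-open the file on the
        # (possibly re-mounted) underlying file system.
        self._underlying = open(self.path, "rb")

    def read(self, size):
        if self._underlying is None:
            # A real driver would pend the IO; a user-mode sketch can only signal it.
            raise BlockingIOError("IO pended until the volume is Active again")
        return self._underlying.read(size)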

Let’s assume that we start in a state where the disk is mounted on Node 2, and there are applications running on both nodes using files on this CSV volume.



Let’s take a failure scenario where Node 2 loses connectivity to the Disk.



For instance, this might be caused by the HBA on that node going bad, or by someone unintentionally misconfiguring LUN masking while making another change. In this example there are many different IOs in flight at the moment of failure: for instance, File System Redirected or Block Redirected IO from Node 1, any IO from Node 2, or any metadata IO. Because connectivity to the storage was lost, NTFS will start failing these IOs with a status code indicating that the device object has been removed. Once CSVFS observes a failed IO, it will switch the volume to the Draining state.
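Conceptually, the decision CSVFS makes when it sees a failed IO looks like the sketch below. The specific NTSTATUS values are illustrative examples of "device is gone" codes (STATUS_VOLUME_DISMOUNTED is the one shown in event 5120 later in this post); they are not the exact list the driver checks.

# Illustrative NTSTATUS codes that mean "the underlying device is gone".
STATUS_NO_SUCH_DEVICE        = 0xC000000E
STATUS_DEVICE_DOES_NOT_EXIST = 0xC00000C0
STATUS_VOLUME_DISMOUNTED     = 0xC000026E   # the code shown in event 5120 below

AUTOPAUSE_STATUSES = {
    STATUS_NO_SUCH_DEVICE,
    STATUS_DEVICE_DOES_NOT_EXIST,
    STATUS_VOLUME_DISMOUNTED,
}

def should_autopause(ntstatus):
    # A failed IO with one of these statuses puts the volume into the
    # Draining state so the IO can be pended and retried after recovery.
    return ntstatus in AUTOPAUSE_STATUSES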

When CSVFS switches itself to the Draining state because it has observed a failure from the disk or SMB, we refer to that as a CSVFS "autopause". This indicates that the volume has automatically put itself into a recovery state. By contrast, when a user invokes an action to move a physical disk resource from one cluster node to another, the CSV volume is also put into the Draining state, but because that happens due to an explicit administrative action, the volume is not considered to be autopaused.

In the Draining state the volume pends all new IOs and any failed IOs. The cluster will first put CSVFS for that volume into the 'Draining' state on all the nodes in the cluster. Once the transition to Draining is complete, the cluster will then tell CSVFS on all the nodes to move to the 'Paused' state. During the transition to the Paused state, CSVFS will wait for all ongoing IOs to complete, and once there are no longer any IOs in flight it will close the underlying file opens on NTFS. Meanwhile, the cluster will discover that the path to the disk is gone and will dismount NTFS on Node 2.
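The following sketch models, in simplified Python, the behavior described above: while the volume is not Active, new and failed IOs are queued rather than failed back to the application, and the transition to Paused waits until no IO is in flight. The class and method names are hypothetical; only the state names come from CSV.

import threading
from collections import deque

class CsvVolumeState:
    """Simplified model of pending IOs in Draining and waiting before Paused."""

    def __init__(self):
        self.state = "Active"
        self.pended = deque()              # IOs held while the volume is not Active
        self.inflight = 0
        self._idle = threading.Condition()

    def start_draining(self):
        self.state = "Draining"            # from now on, new and failed IOs are pended

    def submit(self, io):
        if self.state != "Active":
            self.pended.append(io)         # queue the IO instead of failing it
            return "pended"
        with self._idle:
            self.inflight += 1
        return "dispatched"

    def complete(self, io):
        with self._idle:
            self.inflight -= 1
            self._idle.notify_all()

    def enter_paused(self):
        # Wait until nothing is in flight; only then is it safe to close
        # the underlying NTFS opens.
        with self._idle:
            while self.inflight:
                self._idle.wait()
        self.state = "Paused"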



Clustering has a component called the Storage Topology Manager (STM), which has a view of each node's disk connectivity; it will discover that Node 1 can still see the disk, and the cluster will mount NTFS on Node 1.



Once the mount is done, the cluster will tell CSVFS to transition to the 'Set Down Level' state. During that transition CSVFS re-opens its files on NTFS. Once all nodes are in the Set Down Level state, the cluster tells CSVFS on all nodes to go to the 'Active' state. While transitioning to the Active state, CSVFS resumes all pended IOs and stops pending new IOs. From this point on, CSV has fully recovered from the disk failure and is back to a fully operational state.
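Putting the whole recovery sequence together, the cluster-driven ordering can be summarized by the sketch below. The node objects and method names (set_state, mount_ntfs) are hypothetical and simply mirror the state names used above.

def recover_volume(nodes, new_owner):
    # 1. Pend new and failed IOs on every node.
    for node in nodes:
        node.set_state("Draining")
    # 2. Wait for in-flight IOs to complete and close the underlying NTFS opens.
    for node in nodes:
        node.set_state("Paused")
    # 3. The cluster mounts NTFS on a node that can still see the disk.
    new_owner.mount_ntfs()
    # 4. Re-open files on NTFS.
    for node in nodes:
        node.set_state("SetDownLevel")
    # 5. Resume pended IOs and accept new ones.
    for node in nodes:
        node.set_state("Active")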



Applications running on CSVFS perceive this failure only as IOs taking longer than usual; they do not observe the failure itself.

On the nodes where CSVFS observed the failure due to the disk disconnect, and automatically put itself into the 'Draining' state (that is, autopaused) before the cluster told it to do so, you will see System event log message 5120, which looks like this:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Event ID:      5120
Task Category: Cluster Shared Volume
Level:         Error
Description:
Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_VOLUME_DISMOUNTED(C000026E)'. All I/O will temporarily be queued until a path to the volume is reestablished.




If the cluster was not able to recover from the failure and had to take CSVFS down, you will also see System event log message 5142:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Event ID:      5142
Task Category: Cluster Shared Volume
Level:         Error
Description:
Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
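
If you want to check a node for these events, the built-in wevtutil tool can pull them from the System log. The following Python snippet is one way to do that (run it on a cluster node from an elevated prompt; the XPath filter is a standard event-log query):

import subprocess

# XPath filter for the two CSV-related events discussed above.
query = "*[System[(EventID=5120 or EventID=5142)]]"

result = subprocess.run(
    ["wevtutil", "qe", "System", "/q:" + query, "/c:20", "/rd:true", "/f:text"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)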




Summary


CSV is a clustered file system that also helps increase availability by being resilient to underlying failures. In this blog post we went into detail on how CSV abstracts storage failures from applications.

Thanks!
Vladimir Petter
Principal Software Development Engineer
Clustering & High-Availability
Microsoft


To learn more, here are others in the Cluster Shared Volume (CSV) blog series:


Cluster Shared Volume (CSV) Inside Out
http://blogs.msdn.com/b/clustering/archive/2013/12/02/10473247.aspx

Cluster Shared Volume Diagnostics
http://blogs.msdn.com/b/clustering/archive/2014/03/13/10507826.aspx

Cluster Shared Volume Performance Counters
http://blogs.msdn.com/b/clustering/archive/2014/06/05/10531462.aspx

Cluster Shared Volume Failure Handling
http://blogs.msdn.com/b/clustering/archive/2014/10/27/10567706.aspx
