First published on TECHNET on Dec 05, 2013
Hi folks, Ned here again. If Shakespeare had run Windows, Hamlet would be a play about configuring failover clusters for storage. Today I discuss the Scale-Out File Server configuration and why general use file server clustered shares may still have a place in your environment. These two types of Windows file servers have different goals and capabilities that come with trade-offs; it’s critical to understand them when designing a solution for you or your customer. We’ve not always done a great job explaining the differences between these two file servers, and this post aims to clear things up.
It's not enough to speak, but to speak true. So let’s get crackin’.
We released Scale-Out File Server (SOFS) in Windows Server 2012. SOFS adds highly available, active-active file data access for application data workloads to Windows Server clusters through SMB, Continuous Availability (CA), and Cluster Shared Volumes (CSV). CA file shares ensure, amongst other things, that when you connect through SMB 3, the server synchronously writes through to the disk for data integrity in the event of a node failure. In other words, it makes sure that your files are consistent and safe even if the power goes out.
You choose between the two when you configure the File Server role in a Windows Server failover cluster: File Server for general use, or Scale-Out File Server for application data.
SOFS has some other key benefits:
Claus Joergensen has a good blog post on these capabilities, and if you want to set up a test environment, Jose Barreto is the man with the step-by-step plans.
All of this has a single customer in mind: application data accessed via SMB, like Hyper-V virtual machine disks and SQL database files. With your hypervisor running on one cluster and your storage on another cluster, you can manage, and scale, each aspect of the stack as separate high-performance entities, without the need for expensive SAN fabrics. Truly awesome stuff that sounds like a great fit for any high availability scenario.
For those who want to use SOFS for regular user shares, though: proceed with caution.
Inside Microsoft, we define “Information Worker” as the standard business user scenario. In other words, a person sitting at their physical client or virtual desktop session, and connecting to file servers to access unstructured data. This means SMB shares filled with home folders, roaming user profiles, redirected folders, departmental data, and common shared data; decades of documents, spreadsheets, and PDFs.
Ooh, there’s leftover cake in the break room!
The typical file operations from IW users are very different from those of application data workloads like Hyper-V or SQL. IW workloads are metadata heavy (operations like opening files, closing files, creating new files, or renaming existing files). IW operations also involve a great many files, with plenty of copies and deletes, and of course, tons of editing. Even though individual users aren’t doing much, file servers have many users. These operations may involve masses of opens, writes, and closes, often on files without pre-allocated space. This can mean frequent valid data length (VDL) extension, which means many trips to the disk and back, all over SMB.
Right away, you can see that going through a share enabled with CA to provide the data integrity guarantee might have an impact on performance when compared to previous releases of Windows Server, which did not have CA shares and thus did not provide this guarantee. Continuous Availability requires that data be written through to the disk to ensure integrity in the event of a node failure in SOFS, so every write is synchronous and any buffering only helps on subsequent reads, not writes. A user who needs to copy many big files to a file server, such as by adding them to a redirected My Documents folder, can see significantly slower performance on CA shares. A user who spent a week working from home and returns with their offline files cache brimming will see slower uploads to CA shares.
Nothing is broken here; this is just a consequence of how IW workloads operate. A big VHDX or SQL database file also sees slower creation time through a CA share, but it’s largely a cost paid once, because the files have plenty of pre-allocated space to use up, and subsequent IO costs are comparatively much lower. We also optimize SMB for them, such as with SMB Direct’s handling of 8K IOs.
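To get a feel for the write-through cost without building a cluster, here is a minimal local sketch in Python, assuming a Unix-like machine (os.O_SYNC is Unix-only). It is an analogy, not SMB or CA: opening a file with O_SYNC makes each write() return only after the data reaches stable storage, loosely like a CA share, while the default path lets writes land in the OS cache and flushes once at the end. The chunk size, chunk count, and the timed_write name are illustrative choices, not anything from the tests in this post.

```python
import os
import tempfile
import time

CHUNK = b"x" * (64 * 1024)  # 64 KB per write (arbitrary illustrative size)
CHUNKS = 64                 # 64 writes, 4 MB in total


def timed_write(path: str, write_through: bool) -> float:
    """Write CHUNKS chunks to path and return the elapsed seconds.

    write_through=True opens the file with O_SYNC, so each write() returns only
    after the data reaches stable storage, loosely like a CA share.
    write_through=False lets writes land in the OS cache and flushes once at
    the end with fsync(), so both runs are durable when the function returns.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if write_through:
        flags |= os.O_SYNC
    start = time.perf_counter()
    fd = os.open(path, flags, 0o644)
    try:
        for _ in range(CHUNKS):
            os.write(fd, CHUNK)
        if not write_through:
            os.fsync(fd)  # one flush for the whole buffered run
    finally:
        os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        buffered = timed_write(os.path.join(tmp, "buffered.bin"), write_through=False)
        synced = timed_write(os.path.join(tmp, "synced.bin"), write_through=True)
        print(f"buffered + one fsync: {buffered:.3f} s")
        print(f"O_SYNC write-through: {synced:.3f} s")
```

Point TMPDIR at a directory on a real disk before running it; if the temporary directory lives on a RAM-backed file system such as tmpfs, the two timings will look about the same.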
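The "cost paid once" argument is easy to see with a toy model. This sketch is hypothetical: it assumes every write that pushes the end of a file’s valid data forward costs one extra synchronous trip on a write-through share, and that a fully pre-allocated file pays that cost once, at creation. Real NTFS and SMB behavior is more nuanced; the function names and the per-trip latency are made up for illustration.

```python
import math


def sync_extension_trips(file_bytes: int, write_bytes: int, preallocated: bool) -> int:
    """Toy model: synchronous 'extend the file' trips for one sequential writer.

    Hypothetical assumption: each write that moves the end of valid data forward
    costs one synchronous round trip; a file pre-allocated to its full size pays
    that cost once, at creation.
    """
    if file_bytes <= 0 or write_bytes <= 0:
        raise ValueError("sizes must be positive")
    if preallocated:
        return 1
    return math.ceil(file_bytes / write_bytes)


def extra_seconds(trips: int, trip_latency_ms: float) -> float:
    """Added wall-clock time if every trip costs trip_latency_ms (hypothetical)."""
    return trips * trip_latency_ms / 1000.0


if __name__ == "__main__":
    one_gib, write_size = 2**30, 64 * 1024
    grown = sync_extension_trips(one_gib, write_size, preallocated=False)
    reserved = sync_extension_trips(one_gib, write_size, preallocated=True)
    print(grown, reserved)  # 16384 1
    print(f"extra time at an assumed 1 ms per trip: {extra_seconds(grown, 1.0):.1f} s")
```

In this model, a 1 GiB file written sequentially in 64 KB pieces costs 16,384 extension trips versus one for the pre-allocated file, which is the gap between an IW-style writer and a VHDX.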
To demonstrate this, I performed a few small-scale tests in my gross test environment. Don’t worry too much about the raw numbers; just focus on the relative performance differences.
Environment:
Note: to set this up, see Jose’s demo here. My only big change was to use one node instead of three, so I had more resources in my very gross test environment.
Methodologies:
Results:
Test method | CA, avg sec | Non-CA, avg sec | Non-CA speed relative to CA
MS Internal synthetic file creation (1GB) | 59 | 40 | 1.475 X faster
Robocopy.exe (1GB) | 58 | 42 | 1.38 X faster
Copy-Item cmdlet (2GB) | 107 | 73 | 1.465 X faster
Folder Redirection full sync (5GB) | 689 | 545 | 1.26 X faster
Important: again, this could be faster in absolute terms on your systems with similar data, as my test system is very gross. It could also be slower if your server is quite busy, has crusty drivers installed, is on a choked-out network, etc.
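The last column of the results table is just the ratio of the two average times. A few lines of Python reproduce it from the published averages and also express the gap as the share of CA run time saved; the RESULTS dictionary and the function names are illustrative.

```python
# Average seconds from the results table: test -> (CA share, non-CA share)
RESULTS = {
    "MS Internal synthetic file creation (1GB)": (59, 40),
    "Robocopy.exe (1GB)": (58, 42),
    "Copy-Item cmdlet (2GB)": (107, 73),
    "Folder Redirection full sync (5GB)": (689, 545),
}


def speedup(ca_seconds: float, non_ca_seconds: float) -> float:
    """How many times faster the non-CA share finished: CA time / non-CA time."""
    return ca_seconds / non_ca_seconds


def time_saved(ca_seconds: float, non_ca_seconds: float) -> float:
    """Fraction of the CA run time saved by using the non-CA share."""
    return (ca_seconds - non_ca_seconds) / ca_seconds


if __name__ == "__main__":
    for test, (ca, non_ca) in RESULTS.items():
        print(f"{test}: {speedup(ca, non_ca):.3f}x faster, "
              f"{time_saved(ca, non_ca):.0%} less time")
```

Note that 107/73 is about 1.466, so the table’s 1.465 is truncated rather than rounded; the difference doesn’t change the conclusion.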
The good news
MS Office 2013’s big three (Word, Excel, and PowerPoint) performed well with both CA and non-CA shares and showed no notable performance differences in my tests, even when editing and saving individual files that were hundreds of MB in size. This is because later versions of Office operate very asynchronously, using local temporary files rather than forcing the user to wait on remote servers. On a remote 210MB PPTX, the save times on an edited file were nearly identical, so I didn’t bother posting any results.
The not-so-good news
Office’s good performance is less likely in other user applications; MS Office has been at this game for 22 years. One internal test application I used to generate files had non-CA performance similar to the synthetic file creation test above. However, when the same tool ran against a CA share, it was 8.6 times slower, because of how it continuously asked the server to allocate more space for the file and kept paying the synchronous write-through cost. There’s no way to know what the more “write-through inefficient” apps are until you find out in testing.
Important: even general use file server clusters have CA set on their shares by default when the shares are created via the cluster admin tool, Server Manager, or New-SmbShare. Consider removing that setting from clustered shares where you need performance more than write-through data integrity. On non-clustered file servers, you cannot enable CA.
This is conceivably useful even with SOFS and application data workloads: for instance, you could create two shares pointing to the same folder. Hyper-V uses one (with CA) to mount VHDXs remotely, and you use the other (without CA) to copy VHDXs into that folder when configuring new VMs, such as through SCVMM.
Final important note: make sure you install (at a minimum) KB2883200 on your Windows Server 2012 R2 servers and Windows 8.1 clients; it makes copying to shares a little faster. Better yet, stay up to date on your file server by using this list of currently available hotfixes for the File Services technologies in Windows Server 2012 and ...
The performance issues are actually manageable; many users probably won’t notice any write-through impact, depending on their work patterns. The real issue here is that Scale-Out requires CSV, and this paints your environment into a corner, because many IW technologies do not support that file system.
At first, you configure files on a scale-out cluster share and it works fine. Then, a year later, you decide you need more file server capabilities like Work Folders, Dynamic Access Control, File Classification Infrastructure, and FSRM file quotas and screens, and you are blocked.
Let’s go to the big board.
Technology Area | Feature | General Use File Server Cluster | Scale-Out File Server
SMB | SMB Continuous Availability | Yes | Yes
SMB | SMB Multichannel | Yes | Yes
SMB | SMB Direct | Yes | Yes
SMB | SMB Encryption | Yes | Yes
SMB | SMB Transparent Failover | Yes (1) | Yes
File System | NTFS | Yes | N/A
File System | Resilient File System (ReFS) | Yes | N/A
File System | Cluster Shared Volume File System (CSV) | N/A | Yes
File Management | BranchCache | Yes | No (4)
File Management | Data Deduplication (Windows Server 2012) | Yes | No (4)
File Management | Data Deduplication (Windows Server 2012 R2) | Yes | Yes (6)
File Management | DFS Namespace (DFSN) root server | Yes | No (4)
File Management | DFS Namespace (DFSN) folder target server | Yes | Yes
File Management | DFS Replication (DFSR) | Yes | No (4)
File Management | File Server Resource Manager (Screens and Quotas) | Yes | No (4)
File Management | File Classification Infrastructure | Yes | No (4)
File Management | Dynamic Access Control (claim-based access, CAP) | Yes | No (4)
File Management | Folder Redirection | Yes | Yes (2)
File Management | Offline Files (client side caching) | Yes (5) | Yes (5)
File Management | Roaming User Profiles | Yes | Yes (2)
File Management | Home Directories | Yes | Yes (2)
File Management | Work Folders | Yes | No (4)
NFS | NFS Server | Yes | No (4)
Applications | Hyper-V | Yes (3) | Yes
Applications | Microsoft SQL Server | Yes (3) | Yes
1 Only works if CA is enabled on shares.
2 Not recommended on Scale-Out File Servers.
3 Not recommended on general use file servers.
4 Requires NTFS.
5 Offline Files (client-side caching, or CSC) is less compatible with CA shares than the other IW technologies, due to how it decides a share is offline, combined with the behavior of the SMB 3 client. As a result, Offline Files stays online for 3-6 minutes even after the user no longer has access to the share. Ensure that CA is disabled on the share, even if it uses a File Server for General Use role configuration.
6 Data Deduplication is only supported in a scale-out file server deployment for Virtual Desktop Infrastructure (VDI) workloads with separate storage and compute nodes. The storage must be remote.
Ultimately, this means that if you, your boss, or your customer decides “after that recent audit, we need to use DAC+FCI for more manageable security and we definitely need to screen out MP3 files and Grumpy Cat meme pics”, you will be forced to recreate the entire configuration using NTFS and general use file server clusters. This does not sound pleasant, especially when you now have to shift around terabytes of data.
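Checking a design against the comparison table before you commit is cheap. Here is a small Python sketch that encodes the IW-relevant rows of the Scale-Out File Server column and splits a feature wish list into hard blockers (note 4, requires NTFS) and cautions (notes 2 and 6). The dictionary keys are shortened feature names chosen for this sketch, and the data only reflects Windows Server 2012 and 2012 R2 as described in this post.

```python
# Scale-Out File Server column of the comparison table (IW-relevant rows only).
# "no": not supported (note 4, requires NTFS)
# "caution": supported but not recommended (note 2) or restricted (note 6, VDI only)
SOFS_SUPPORT = {
    "BranchCache": "no",
    "Data Deduplication (2012)": "no",
    "Data Deduplication (2012 R2)": "caution",
    "DFSN root server": "no",
    "DFSN folder target server": "yes",
    "DFS Replication": "no",
    "FSRM screens and quotas": "no",
    "File Classification Infrastructure": "no",
    "Dynamic Access Control": "no",
    "Folder Redirection": "caution",
    "Roaming User Profiles": "caution",
    "Home Directories": "caution",
    "Work Folders": "no",
    "NFS Server": "no",
}


def check_sofs_fit(required: list[str]) -> tuple[list[str], list[str]]:
    """Split required features into (blockers, cautions) for a Scale-Out File Server."""
    unknown = [f for f in required if f not in SOFS_SUPPORT]
    if unknown:
        raise KeyError(f"not in the comparison table: {unknown}")
    blockers = [f for f in required if SOFS_SUPPORT[f] == "no"]
    cautions = [f for f in required if SOFS_SUPPORT[f] == "caution"]
    return blockers, cautions


if __name__ == "__main__":
    # The post-audit wish list: DAC + FCI, plus file screens for MP3s and memes
    wish_list = ["Dynamic Access Control", "File Classification Infrastructure",
                 "FSRM screens and quotas", "Folder Redirection"]
    blockers, cautions = check_sofs_fit(wish_list)
    print("Blocked on SOFS:", blockers)
    print("Not recommended on SOFS:", cautions)
```

Any non-empty blocker list points to a general use file server cluster on NTFS, a decision that is far cheaper on day one than after terabytes of data have landed on CSV.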
Moreover, let’s not forget about down-level clients like Windows 7: Continuous Availability requires SMB 3.0 or later, so older clients connecting to CA shares cannot use SOFS features. A Windows 7 or Vista client can connect to a CA share, but you need Windows 8 or later to use the CA feature.
As for XP? It cannot connect to a CA share at all. This doesn’t matter though, because you already got rid of XP. Right?
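The client rules in the last two paragraphs boil down to SMB dialect negotiation: client and server settle on the highest dialect both support, and CA needs SMB 3.0 or later. Here is a minimal Python sketch, assuming a Windows Server 2012 R2 file server (SMB 3.02). The per-client dialects are the standard Windows values; treating every SMB 1-only client like XP is an assumption that extends the post’s statement about XP.

```python
# Highest SMB dialect each client OS can negotiate.
CLIENT_MAX_DIALECT = {
    "Windows XP": (1, 0),
    "Windows Vista": (2, 0, 2),
    "Windows 7": (2, 1),
    "Windows 8": (3, 0),
    "Windows 8.1": (3, 0, 2),
}
# Assumption: the file server runs Windows Server 2012 R2 (SMB 3.02).
SERVER_MAX_DIALECT = (3, 0, 2)


def ca_share_access(client: str, server_max: tuple[int, ...] = SERVER_MAX_DIALECT) -> str:
    """Describe what a client gets from a CA share, following the rules in this post."""
    # Both sides settle on the highest dialect they have in common.
    negotiated = min(CLIENT_MAX_DIALECT[client], server_max)
    if negotiated < (2, 0):
        # The post says XP cannot connect; assumed here for any SMB 1-only client.
        return "cannot connect"
    if negotiated < (3, 0):
        # Vista and Windows 7 connect, but without continuous availability.
        return "connects without CA"
    return "connects with CA"


if __name__ == "__main__":
    for client in CLIENT_MAX_DIALECT:
        print(f"{client}: {ca_share_access(client)}")
```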
Finally, though, is the big question: if you accept the performance overhead, what does continuous availability provided by SOFS buy you with IW workloads?
The answer: little.
Many end-user applications don’t need the guarantees of continuous availability that SQL and Hyper-V demand in their workload. Your IW applications like Office and Windows Explorer are often quite resistant to the short-term server outages during traditional cluster failover. MS Office especially – it has lived for years in a world of unreliable networking; it uses temp files, works offline, and retries constantly without telling the user if there are intermittent problems contacting a file on a share.
The bottom line is that Word and all its friends will be just fine using traditional general use shares on clusters. Before you go down the scale-out route in a particular cluster design, make sure it’s the right approach for the task.
The Bard would really hate spellchecker.
If you caught all the pseudo-Shakespeare references in this article, post the count in the comments and win a fabulous No-Prize!
Update 10/8/2015: It was just a matter of time before someone found a way to break SOFS with metadata-heavy operations; check out https://support.microsoft.com/en-us/kb/3101545
Until next time,