Heya folks, Ned here again. In my continuing series on offbeat SMB settings, today I’m going to explain how to control SMB write-through and data consistency in Windows 10 and Windows Server.
Network protocols like SMB or NFS are actually remote file systems. They allow a client to mount destination storage as if it were their own local disks and read and write files to them. These protocols rely on underlying transports like TCP and then provide a layer on top for your apps to think that the server 5,000 miles away is truly just your F: volume.
Because remote networks, latency, and storage added up to a much slower experience than local I/O for the first few decades of computing, file servers that implemented these protocols were stuffed with buffers and caches to squeeze better performance out of craptastic spinning disks, as well as help the servers deal with lots of clients fighting for resources simultaneously – a problem your laptop doesn’t have.
With the advent of incredibly high throughput storage like SSD, NVMe, and NVDIMM and incredibly low latency networking fabrics like RDMA, these simple SMB servers with simple user workloads morphed into Scale-out File Servers, where applications like SQL and Hyper-V want to use them as part of a Software-defined Storage fabric that required near-perfect workload resiliency and durability. That also meant that we needed to stop using caches and start requiring data to commit to the disk, not memory, for safety.
I know what you’re thinking right now: “Doggone software devs, playing games with my data.” Well, disks have buffers too! SCSI and modern SATA drives implement “Force Unit Access” (FUA), which guarantees when an IO is marked for write-through it will land on true stable storage and not the disk’s own caches, which are legion in the constant battle of IOPS brochures between hardware makers. Basically, if your drive gets told “you better write this IO and don’t reply until it’s really written for realsies,” it will.
We first added FUA support for SCSI in Windows Vista. We later added SATA support, and I am here to tell you, dear reader, that we still see SATA disks out there that answer write-through commands and then don’t actually write through, so I recommend sticking with commercial, name brand disks when not using SAS/SCSI storage!
If you suffer from insomnia, I recommend reading more about WRITE DMA FUA EXT (command 3Dh), WRITE DMA QUEUED FUA EXT (command 3Eh), and WRITE MULTIPLE FUA EXT ...
For organizations using those Scale-out File Servers for software-defined datacenter workloads like SQL, you get write-through for free as soon as you create Windows Server 2012, 2012 R2, 2016, and 2019 failover clusters with the File Server resource configured. The Scale-out File Server (SoFS) cluster role enables the “Continuous Availability” flag on every share you create, guaranteeing write-through as part of a larger set of durability and reliability guarantees for your application data workload. When combined with features like Transparent Failover and Persistent Handles, a dead cluster node will not lead to a crashed workload – IOs are persisted and handed over to another node, all while getting FUA.
We also enable the CA share flag on regular file server cluster nodes, but admins often disable it for performance reasons, the same way they might avoid SoFS for compatibility reasons. Remember when I wrote the Shakespearean prose to scale out or not to scale out? CA is not designed for copying files but for handling IOs on a file that is opened and then modified forever because it’s a virtual machine or database.
That’s all fine for a specific workload type, but what if you want to force write-through from a client and not care what your Windows Server OS version and configuration are? Starting in Windows 10 1809 and Windows Server 2019, I’ve got an answer for you:
NET USE /WRITETHROUGH
When you map a UNC path (with or without a letter) to a remote Windows Server using whatever flavor of SMB and provide the new flag, you will send along the write-through command for any files you create or modify over that session. Now an admin can specify for users’ logon scripts or their own mapped drives that any IO happening on there will ignore those caches and guarantee writes for maximum durability when you don’t trust the reliability of your servers. And you’ll certainly find out how fast your drives really are!
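If you build mappings from a logon script, the command line might be assembled along these lines. This is only a rough sketch: the server name `\\fs01\projects`, the drive letter `Z:`, and both helper function names are made up for illustration; the only piece taken from this post is the `/WRITETHROUGH` flag itself.

```python
import subprocess
import sys


def build_net_use_command(unc_path, drive_letter=None, write_through=True):
    """Return the argument list for a `net use` mapping.

    drive_letter is optional because a UNC path can be mapped without a letter.
    """
    command = ["net", "use"]
    if drive_letter:
        command.append(drive_letter)
    command.append(unc_path)
    if write_through:
        # Flag from the post; needs Windows 10 1809 / Windows Server 2019 or later.
        command.append("/WRITETHROUGH")
    return command


def map_share(unc_path, drive_letter=None, write_through=True):
    """Run the mapping. `net use` only exists on Windows, so refuse elsewhere."""
    if sys.platform != "win32":
        raise RuntimeError("net use is only available on Windows")
    subprocess.run(
        build_net_use_command(unc_path, drive_letter, write_through),
        check=True,
    )


# Hypothetical server and share names, for illustration only.
print(" ".join(build_net_use_command(r"\\fs01\projects", "Z:")))
```

Keeping the command builder separate from the function that runs it means you can check the exact command line on any machine before it ever touches a user's session.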
Let’s see it in action. First, I map a drive normally and copy a single 10GB file then 3,100 little files that added up to 10GB. I use robocopy for all my tests because it has exact copy times and lets me add efficiency like multi-threaded copies; stay away from File Explorer for any testing. Hell, stay away from it for copies the rest of the time too!
As you can see, my single big file took 48 seconds, and my many small files took a bit longer. A large batch, multi-directory small file copy to a server has tons of overhead that a single large file written sequentially does not, so just to get that time close I still needed to add multi-threading (another reason to use Robocopy instead of File Explorer every time!).
Now let’s try that again with write-through enabled:
We took a hit in time. Still, perhaps worth it to guarantee that some 125GB file copy wasn’t corrupted by a server crash in the last couple seconds of IO, the way Murphy always guarantees.
The difference in SMB is quite simple: a single flag will now be enabled on every Create File request operation. This tells all subsequent writes to the file to require the storage to support and use FUA.
Nothing else needs to be done and the rest of the SMB conversation will look normal.
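If you want to spot that flag in a packet capture, the check is a single bit test. A minimal sketch, assuming the flag in question is `FILE_WRITE_THROUGH` (0x00000002) in the `CreateOptions` field of the SMB2 CREATE request as defined in [MS-SMB2]; the sample values below are hypothetical, not taken from a real trace.

```python
# CreateOptions bits from the SMB2 CREATE request ([MS-SMB2] section 2.2.13).
FILE_WRITE_THROUGH = 0x00000002
FILE_SEQUENTIAL_ONLY = 0x00000004
FILE_NON_DIRECTORY_FILE = 0x00000040


def requests_write_through(create_options):
    """Return True if the CreateOptions value has the write-through bit set."""
    return bool(create_options & FILE_WRITE_THROUGH)


# Hypothetical CreateOptions values, as they might be read from a capture.
normal_open = FILE_NON_DIRECTORY_FILE | FILE_SEQUENTIAL_ONLY
write_through_open = normal_open | FILE_WRITE_THROUGH

print(hex(normal_open), requests_write_through(normal_open))
print(hex(write_through_open), requests_write_through(write_through_open))
```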
As you saw, there is a performance hit to requiring write-through, and it can vary a little or a lot. Your mileage will vary here – I am using some pretty quick SSD storage and low latency 10Gb networking without congestion; you might not be so fortunate. You can use the Robocopy /J option on very large files - tens of GB or larger - to offset this hit a bit, if you’re feeling fancy.
Test test test!
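When you do test, turning copy times into throughput and a percentage makes the comparison easier to reason about. A quick sketch: the 48-second baseline for the 10GB file comes from the run above (treated here as 10 GiB, which is an assumption), while the 60-second write-through time is a purely hypothetical placeholder for your own Robocopy result.

```python
GIB = 1024 ** 3


def throughput_mib_per_s(size_bytes, seconds):
    """Average throughput of a copy in MiB per second."""
    return size_bytes / (1024 ** 2) / seconds


def slowdown_percent(baseline_seconds, write_through_seconds):
    """How much longer the write-through copy took, as a percentage of baseline."""
    return (write_through_seconds - baseline_seconds) / baseline_seconds * 100


# Baseline from the post: one 10GB file in 48 seconds (assumed to mean 10 GiB).
baseline = throughput_mib_per_s(10 * GIB, 48)
print(f"normal copy: {baseline:.1f} MiB/s")

# Hypothetical write-through timing; substitute your own measured result.
hypothetical_write_through_seconds = 60
print(f"slowdown: {slowdown_percent(48, hypothetical_write_through_seconds):.0f}%")
```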
The odds of you needing write through for a normal user doing normal user things are pretty low; their files are small, apps like Office often keep local copies, and their window of some IO living in a server buffer just as it replied back to the client but then crashed before committing to disk is really quite small. The overhead for them is pretty light too, however; unless they are copying very large files all the time, they typically won’t see a huge downside to you mapping their drives with write through.
Until next time,
Ned "write on!" Pyle
An interesting question, Arian. I don't know how much further development GPP does anymore, but I'll find out the owners and ask if there is a path to this...
Ned, it seems you would need more than just the /writethrough and -UseWriteThrough switches to assure writing to the platter on the other end of an ordinary SMB share. From my understanding, all these switches do is skip any caching in RAM of the file data on the client computer. File writes are sent immediately out over the wire. But then that's it. If the SMB share is hosted on a Windows 95 computer, or on a NAS device with its own caching, or a virtual machine sharing an iSCSI volume saved as a file on a NAS volume (meaning two network hops), you can't assume the write-through request will apply all the way down to the physical platter, right?
@Jeffrey Foxx 100% correct. We are insisting that the physical storage layers write through but we cannot validate into that layer; we only trust that it did as it was told and the reply that it did was truthful.
Ned, you wrote that Microsoft "first added FUA support for SCSI in Windows Vista". What exactly does that mean? FUA was around in the 1990s, and the documentation going back to Windows 95 and Windows NT 3.1 talks about opening files with flags such as FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING, and the fact that this causes FUA to be sent on each write to the drive.
@Jeffrey Foxx I cannot recall where I got that, but it was from a conversation with the local storage team here at the time. It might have been some peculiarity where it worked properly, reliably, generically, etc.
So, funny story...
I had a project at work once to create a resilient file storage system for hosting the most sensitive production data and services (we hosted VMs in there too).
I was pressured to use Linux, as our SysAdmin was heavily biased against Microsoft, despite my attempts to dissuade him (licensing and lack of beta testing broke that camel's back). So, I attempted to immerse myself in Linux despite my anxieties about the vastly unfamiliar territory of the prompt syntax (completely self-taught IT stuff), and was curious to understand 'clustering' concepts. It was suggested that we try either Gluster or CEPH; we brainstormed and settled on CEPH.
For those unfamiliar, CEPH is basically Storage Spaces Direct (SDS?), or distributed block storage across multiple hosts with heavy resiliency.
CEPH, by design I believe, uses this exact write through method; they call it sync write in Linux land, I think?
Anywho, my real passion is actually hardware, and I'm a speed demon. So, I know the type of drives you're looking for in order to get the best performance using this write method.
What you need are SSDs with a dedicated power failure protection as the firmware ensures buffer/cache flushes during write operations. These typically are in the M.2 22110 format, and have several block capacitors to act as an uninterruptible power source to flush the buffer in the event of power failure. It makes best effort to get the buffer data into the flash media gracefully, typically with very high success rate.
These drives tend to have write acceleration in the SSDs design in order to handle this write method, common among enterprise SSDs (but not all!). You can't adjust the write cache settings for the drives in the device manager, typically, as (I believe) they handle write operation in a protected path to ensure data resiliency. They provide very low write latency, and are beasts with software defined block storage.
I actually invested in a 960GB Samsung PM963 M.2 SSD after working on that storage project (2018) and discovering these niche use case drives, and they're really engaging at just about any workload you throw at it, higher the queue depth the better! 8)
Edit for the numerous typos