Chelsio RDMA and Storage Replica Perf on Windows Server 2019 are 💯
Published Apr 10 2019 07:53 AM

First published on TECHNET on Dec 13, 2018
Heya folks, Ned here again. Some recent Windows Server 2019 news you may have missed: Storage Replica performance was greatly increased over our original numbers. I chatted about this at earlier Ignite sessions, but when we finally got to Orlando, I was too busy talking about the new Storage Migration Service.

To make up for this, the great folks at Chelsio decided to set up servers and their insane 100Gb T62100-CR iWARP RDMA network adapters, then test the same replication on the same hardware with both Windows Server 2016 and Windows Server 2019; apples and apples, baby. If you’ve been in a coma since 2012, Windows Server uses RDMA for CPU-offloaded, high-performance SMB Direct data transfer over SMB3. iWARP brings the additional advantage of metro-area ranges while still using TCP for simplified configuration.

The TL;DR is: Chelsio iWARP 100Gb - with SMB 3.1.1 and SMB Direct providing the transport - is so low latency and so high bandwidth for Storage Replica that you can stop worrying about your storage outrunning it. 😂 No matter how much NVMe SSD we threw at the workload, the storage ran out of IO before the Chelsio network did. It’s such an incredible flip from most of my networking life. We live in magical networking times.

In these tests we used a pair of SuperMicro servers, one with five striped Intel NVMe SSDs, one with five striped Micron NVMe SSDs. Each had 24 Xeon cores at 3 GHz and 128 GB of memory. They were installed with both Windows Server 2016 RTM and Windows Server 2019 build 17744. A single 1TB volume was formatted on the source storage. Each server got a single-port 100Gb T62100-CR iWARP RDMA network adapter and the latest Chelsio Unified Wire drivers.

Let’s see some numbers and charts!

Initial Block Copy

We started with initial block copy, where Storage Replica must copy every single disk block from a source partition to a destination partition. Even though the Chelsio iWARP adapter is pushing a sustained 94Gb per second – as fast as this storage can send and receive – CPU overhead is only 5% thanks to offloading. And even 5 RAID-0 NVMe SSDs at 100% read on the source and 100% write on the destination couldn’t completely fill that single 100Gb pipe. With SMB Multichannel and another RDMA port turned on – this adapter has two – the link would have been even less utilized.

That entire 1TB volume replicated in 95 seconds.
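As a back-of-the-envelope sanity check (my arithmetic, not part of the original test run): 1TB at a sustained 94Gb/s works out to roughly 85 seconds of pure wire time, so a 95-second wall clock leaves only about 12% for everything else in the stack.

```python
# Back-of-the-envelope check: how long should 1 TB take at 94 Gb/s?
volume_bytes = 1 * 10**12          # 1 TB (decimal)
link_bits_per_sec = 94 * 10**9     # 94 Gb/s sustained, per the test

wire_seconds = volume_bytes * 8 / link_bits_per_sec
print(f"{wire_seconds:.1f} s of pure transfer time")   # ~85.1 s

# Measured wall clock was 95 s, so everything besides raw
# transfer (protocol, storage, replication) cost only ~12%.
overhead = 95 / wire_seconds - 1
print(f"~{overhead:.0%} overhead")
```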

People talk about the coming 5G speed revolution and I can’t help but laugh my butt off, tbh. 😁

Continuous Replication

There shouldn’t be much initial sync performance difference between Windows Server 2016 and 2019, because the logs are not used in that phase of replication. They only kick in once block copy is done and you are performing writes on the source. So for this phase, two sets of tests were run on the exact same hardware and drivers: several runs with Windows Server 2016’s v1 log, and several with Windows Server 2019’s tuned-up v1.1 log.

To perform the test we used Diskspd, a free IO workload generation tool we provide for testing and validation. This is the tool used to ensure that the Microsoft Windows Server Software-Defined HCI clusters sold by Dell, HPE, DataOn, Fujitsu, Supermicro, NEC, Lenovo, QCT, and others meet the logo standards for performance and reliability, via a stress-test suite we call “VM Fleet.”
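Diskspd is driven entirely by command-line switches. The post doesn’t list the exact parameters used in these runs, so the values below are illustrative assumptions – but a 100%-write, random 8K IO test against the replicated volume could be assembled along these lines (the target path is hypothetical):

```python
# Sketch of a Diskspd invocation for a 100%-write, random 8K IO test.
# Switch values and the target path are illustrative, not from the post.
def diskspd_cmd(block="8K", seconds=60, threads=8, outstanding=32,
                write_pct=100, target=r"E:\sr-test.dat"):
    return [
        "diskspd.exe",
        f"-b{block}",        # IO block size
        f"-d{seconds}",      # test duration in seconds
        f"-t{threads}",      # threads per target
        f"-o{outstanding}",  # outstanding IOs per thread
        f"-w{write_pct}",    # percentage of writes (100 = all writes)
        "-r",                # random IO
        "-Sh",               # disable software and hardware caching
        "-L",                # capture latency statistics
        target,
    ]

print(" ".join(diskspd_cmd()))
```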

OK, enough Storage Spaces Direct shilling for Cosmos, let’s see how the perf changed between Storage Replica in Windows Server 2016 (aka RS1) and Windows Server 2019 (aka RS5).

The lower orange line shows Windows Server 2016 performance as we hit the replicated volume on the source with 4K, 8K, then 16K IO writes. The upper green line shows Windows Server 2019 improving on that by roughly 2-3X in MB per second (that’s a big B for bytes, not bits), depending on IO size; you can see we tuned as carefully as possible for the common 8K IO size. And because we’re using extra-wide, low-latency, high-throughput, low-CPU-impact Chelsio NICs, the network won’t be your bottleneck: all that bandwidth is dedicated to the actual workload you’re running, not just to a special “replication network” like the ones so common in the old world of regular low-bandwidth 1 and 10 Gb TCP dumb adapters.
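Since the chart mixes IO sizes and MB-per-second throughput, here’s the conversion it implies (a generic helper of mine, not Microsoft tooling, and the 100K IOPS figure is purely illustrative): MB/s is just IOPS times block size, and multiplying by 8 gets you back to the little-b megabits networking folks quote.

```python
# Convert IOPS at a given block size into MB/s (bytes) and Mb/s (bits).
def throughput(iops, block_kib):
    mb_per_s = iops * block_kib * 1024 / 10**6   # big B: megabytes/sec
    return mb_per_s, mb_per_s * 8                # little b: megabits/sec

for size in (4, 8, 16):                          # the IO sizes in the chart
    mb, mbit = throughput(100_000, size)         # 100K IOPS, illustrative
    print(f"{size}K x 100K IOPS = {mb:,.1f} MB/s = {mbit:,.1f} Mb/s")
```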

The Big Sum Up

Storage Replica with Chelsio T6 provides datacenters with high-performance data replication across local and remote locations, with the simplicity of plain TCP (no specialized lossless Ethernet configuration required), and ensures that your most critical workloads are protected with synchronous replication. Chelsio makes a cost-effective and secure disaster recovery solution that should appeal to any-sized datacenter or org.

The bottom line: we’ve entered a new age for moving all that data around, and its name is iWARP. Get on the rocket, IT pros.

Until next time,

- Ned “RDMA good, old networking bad. Me simple man” Pyle
