Storage Spaces Direct throughput with iWARP
Published Apr 10 2019 04:28 AM
First published on TECHNET on Mar 13, 2017
Hello, Claus here again. It has been a while since my last post, and a few things have changed. Windows Server has moved into the Windows and Devices Group, and we have moved to a new building with a better café but a worse view 😊. On a personal note, I can be seen waddling through the hallways, as I have had foot surgery.

At Microsoft Ignite 2016 I did a demo at the 28-minute mark of the Meet Windows Server 2016 and System Center 2016 session. I showed how Storage Spaces Direct can deliver massive amounts of IOPS to many virtual machines with various storage QoS settings. I encourage you to watch it if you haven’t already, or go watch it again 😊. In the demo, we used a 16-node cluster connected over iWARP using the 40GbE Chelsio iWARP T580CR adapters, showing 6M+ read IOPS. Since then, Chelsio has released their 100GbE T6 adapter, and we wanted to take a peek at what kind of network throughput would be possible with this new adapter.

We used the following hardware configuration:

  • 4 nodes of Dell R730xd

    • 2x E5-2660v3 2.6GHz 10c/20t

    • 256GiB DDR4 2133MHz (16x 16GiB DIMMs)

    • 2x Chelsio T6 100Gb NIC (PCIe 3.0 x16), one port connected on each, QSFP28 passive copper cabling

    • Performance Power Plan

    • Storage:

      • 4x 3.2TB NVMe Samsung PM1725 (PCIe 3.0 x8)

      • 4x SSD + 12x HDD (not in use: all load from Samsung PM1725)

    • Windows Server 2016 + Storage Spaces Direct

      • Cache: Samsung PM1725

      • Capacity: SSD + HDD (not in use: all load from cache)

      • 4x 2TB 3-way mirrored virtual disks, one per cluster node

      • 20 Azure A1-sized VMs (1 VCPU, 1.75GiB RAM) per node

      • OS High Performance Power Plan

    • Load:

      • DISKSPD workload generator

      • VM Fleet workload orchestrator

      • 80 virtual machines with 16GiB file in VHDX

      • 512KiB 100% random read at a queue depth of 3 per VM
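
For readers who want to reproduce a similar load by hand, the per-VM profile listed above maps onto a handful of DISKSPD switches. The sketch below only builds the argument list; it is not the command line VM Fleet itself issues. The target path, the single thread, the 60-second duration, and the `-Sh` and `-L` switches are assumptions added for illustration, and `diskspd_args` is a hypothetical helper name.

```python
def diskspd_args(target, block_kib=512, queue_depth=3, threads=1,
                 write_pct=0, duration_s=60):
    """Build a DISKSPD argument list for a random-read run on one target file."""
    return [
        "diskspd.exe",
        f"-b{block_kib}K",   # block size: 512KiB, as in the load description
        "-r",                # random I/O
        f"-w{write_pct}",    # 0% writes, i.e. 100% reads
        f"-o{queue_depth}",  # outstanding I/Os per thread: queue depth of 3
        f"-t{threads}",      # threads per target (assumption: one thread per VM)
        f"-d{duration_s}",   # run time in seconds (assumption)
        "-Sh",               # bypass software cache and hardware write cache (assumption)
        "-L",                # collect latency statistics (assumption)
        target,
    ]


# Hypothetical target path inside the guest; replace with the real test file.
args = diskspd_args(r"C:\run\testfile.dat")
print(" ".join(args))
```

With one thread and `-o3`, each VM keeps three I/Os in flight, which matches the queue depth stated in the list.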

We did not configure DCB (PFC) in our deployment, since it is not required in iWARP configurations.

Below is a screenshot from the VMFleet Watch-Cluster window, which reports IOPS, bandwidth and latency.

As you can see, the aggregate bandwidth exceeded 83GB/s, which is very impressive. Each VM realized more than 1GB/s of throughput, and the average read latency was below 1.5ms.
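
The figures above can be cross-checked with some simple arithmetic. The sketch below divides the aggregate bandwidth across the 80 VMs and 4 nodes, compares the per-node share with the raw line rate of two 100Gb ports, and uses Little's law (latency = outstanding I/Os / I/O rate) to estimate latency from the queue depth of 3. It assumes the reported "GB" means GiB, which the post does not state, and the function and key names are made up for illustration.

```python
def throughput_summary(aggregate_gb_s=83.0, vms=80, nodes=4,
                       ports_per_node=2, port_gbit_s=100,
                       io_kib=512, queue_depth=3,
                       bytes_per_gb=2**30):  # assumption: "GB" reported as GiB
    """Derive per-VM, per-node and latency estimates from the aggregate figure."""
    aggregate_bytes_s = aggregate_gb_s * bytes_per_gb
    per_node_bytes_s = aggregate_bytes_s / nodes
    # Raw line rate of the connected ports, ignoring protocol overhead.
    node_line_rate_bytes_s = ports_per_node * port_gbit_s * 1e9 / 8
    total_iops = aggregate_bytes_s / (io_kib * 1024)
    iops_per_vm = total_iops / vms
    return {
        "per_vm_gb_s": aggregate_gb_s / vms,
        "per_node_gb_s": aggregate_gb_s / nodes,
        # Upper bound only: reads served from local drives never cross the network.
        "node_share_of_line_rate": per_node_bytes_s / node_line_rate_bytes_s,
        "total_iops": total_iops,
        # Little's law: mean latency = I/Os in flight / completion rate.
        "est_latency_ms": queue_depth / iops_per_vm * 1000,
    }


summary = throughput_summary()
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions each VM gets a little over 1GB/s and the estimated latency comes out at roughly 1.4ms, in line with the figures quoted above. Treating "GB" as 10^9 bytes instead (`bytes_per_gb=10**9`) pushes the estimate to about 1.5ms, so the check is only as good as the unit assumption.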

Let me know what you think.

Until next time
