
Server 2019: Poor write performance on S2D two node cluster


Hello!

 

I am trying to set up S2D on a two-node cluster for hyper-converged infrastructure. Unfortunately, write performance on the S2D storage is significantly lower than that of the slowest physical drive participating in the cluster.

 

What could cause this?

How can I get better results?

 

My test environment

OS: Windows Server 2019 Datacenter Build 17723.rs5_release.180720-1452

Both nodes are connected directly using one 10 Gbps link for S2D

Each node has a 1 Gbps link for management

S2D two node cluster configured with Cache disabled

 

Node 1

System: Supermicro X9SRH-7F/7TF

CPU: Intel Xeon E5-2620 2.00 GHz (6CPUs)

RAM: 32 GB DDR3

Network: Intel X540-AT2 10 Gbps copper

System drive: Samsung SSD 840 PRO 512 GB

Storage drives: Samsung SSD 850 PRO 512 GB, Samsung SSD 840 PRO 512 GB

 

Node 2

System: Intel S2600WTT

CPU: Genuine Intel CPU 2.30 GHz (ES) (28 CPUs)

RAM: 64 GB DDR4

Network: Intel X540-AT2 10 Gbps copper

System drive: INTEL SSDSC2BB240G7 240 GB

Storage drives: Samsung SSD 850 PRO 512 GB, Samsung SSD 840 PRO 512 GB

 

Before enabling S2D, I turned off the write cache on each SSD individually and tested write performance by copying a 30 GB VHD file. Results were around 130 - 160 MB/s for the Samsung SSD 840 PRO drives and around 60 - 70 MB/s for the Samsung SSD 850 PRO drives.

 

After enabling S2D, write performance drops to 40 - 44 MB/s (see attachment).


Thanks for evaluating S2D Clusters on Server 2019.

This configuration does not meet the fundamental requirements of S2D, as:

  • SSDs used are non-PLP (no power-loss protection), and
  • Nodes are heterogeneous.

Please see this blog post for more details: https://blogs.technet.microsoft.com/filecab/2016/11/18/dont-do-it-consumer-ssd

Please also refer to this article on evaluating storage performance:

https://blogs.technet.microsoft.com/josebda/2014/08/18/using-file-copy-to-measure-storage-performanc...

 

~Girdhar Beriwal

best response confirmed by Denis Dyagilev (Brass Contributor)
Solution

Hi,

Your nodes don’t comply with the S2D requirements. Additionally, I would not recommend measuring performance with a Windows file copy; the reasons are explained here:

https://blogs.technet.microsoft.com/josebda/2014/08/18/using-file-copy-to-measure-storage-performanc...

It is better to use DiskSpd from Microsoft.

 

Regarding the storage solution, you could look at virtual SAN vendors. I have had good experience using StarWind vSAN for a two-server cluster. The performance is better and there are no problems with configuration. You can find a guide here:

https://www.starwindsoftware.com/resource-library/starwind-virtual-san-hyperconverged-2-node-scenari...

Hello!

Thank you for your comment! I understand that my lab setup does not meet these requirements, but I still believe that the fundamental things should work with such a setup too. The main point for me was to check whether this technology works before investing in new and quite expensive parts.

Anyway, I have now rebuilt my setup using two Intel S2600WTF/Y boxes and Intel CPUs. Initially each of them had two 512 GB SSD drives for S2D. I configured S2D with automatic settings successfully. After running some performance tests I got much better results than earlier; really acceptable results, in fact (even slightly more than 200 MB/s write speed).

Next I moved some VMs to the S2D volume and enabled High Availability for them. I ran some crash tests as well and they succeeded; everything worked great.

BUT then I faced new problems. I wanted to add four new 1 TB SSD drives to each node and extend my pool. I reset all these drives and connected them to the servers.
1) The first strange thing was that they were automatically added to my S2D pool even though I had previously disabled auto-pooling (Get-StorageSubSystem Clu* | Set-StorageHealthSetting -Name "System.Storage.PhysicalDisk.AutoPool.Enabled" -Value False).
2) Second, and most important: my SSD tier statistics show only 670 GB of available space, but I connected 8 x 1 TB SSD drives, and with mirrored storage it should be able to allocate around 4 TB! I ran Optimize-StoragePool and this did not help.
3) I connected another SSD drive for other purposes and it was again pooled automatically. I tried to remove it from the S2D pool, but this was also unsuccessful. The disk got stuck in the Primordial pool. These are the things I did to try to get the disk out of the pool:
$pool = Get-StoragePool S2D*
$disk = Get-PhysicalDisk -SerialNumber "XXXXXXXXXXXXXXXXXXX"

$disk | Set-PhysicalDisk -Usage Retired

$vdisk=Get-VirtualDisk

Repair-VirtualDisk $vdisk.FriendlyName

Get-StorageJob

Get-StoragePool S2D* | Remove-PhysicalDisk -PhysicalDisks $disk

Set-ClusterS2DDisk -CanBeClaimed $true -PhysicalDiskGuid $disk.UniqueId

$disk | Reset-PhysicalDisk
Thanks! I tried DiskSpd and it works well; it seems to simulate a workload quite close to the real world.
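
For anyone repeating the measurement, a DiskSpd run along these lines is one way to get a number that is more comparable than a file copy. This is only a sketch: the test file name and all sizing values (file size, block size, queue depth, thread count, duration) are assumptions and should be adjusted to the workload being modelled. The volume path is the one shown later in this thread.

```powershell
# Sketch only: 100% write test with 64 KiB blocks against a file on the S2D volume.
# The file name and every sizing value below are assumptions, not values from this thread.
$target = "C:\ClusterStorage\ssd-volume-1\diskspd-test.dat"

# -c10G  create a 10 GiB test file
# -b64K  64 KiB block size
# -d60   run for 60 seconds
# -t4    4 threads per target
# -o8    8 outstanding I/Os per thread
# -w100  100% writes
# -Sh    disable software caching and hardware write caching
# -L     collect latency statistics
.\diskspd.exe -c10G -b64K -d60 -t4 -o8 -w100 -Sh -L $target

# Remove the test file afterwards so it does not consume volume capacity.
Remove-Item $target
```

Running the same command against a standalone drive and against the S2D volume gives two results measured the same way.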

Can you please send the output of the following cmdlets:

1. Get-StoragePool

2. Get-PhysicalDisk

 

~Girdhar

Hi, Girdhar!

 

I physically removed from the server the disk that accidentally went into S2D and then got stuck in the Primordial pool. Then I cleared it on another PC and created a new partition. Then I put it back in the cluster server and finally had the option to use it without pooling.

But I plan to replace the 512 GB S2D SSD drives with larger ones, so I still need to find out how to correctly remove a disk from the pool.

PS C:\Windows\system32> Get-StoragePool

FriendlyName        OperationalStatus HealthStatus IsPrimordial IsReadOnly Size     AllocatedSize
------------        ----------------- ------------ ------------ ---------- ----     -------------
Primordial          OK                Healthy      True         False      72.9 TB  9.31 TB
S2D on hc-cluster-1 OK                Healthy      False        False      9.31 TB  1.84 TB
Primordial          OK                Healthy      True         False      11.53 TB 9.31 TB

 

 

PS C:\Windows\system32> Get-PhysicalDisk

DeviceId FriendlyName               SerialNumber       MediaType CanPool OperationalStatus HealthStatus Usage       Size
-------- ------------               ------------       --------- ------- ----------------- ------------ -----       ----
22       ATA INTEL SSDSC2BB24       PHDV7171021B240AGN SSD       False   OK                Healthy      Auto-Select 223.57 GB
1004     Samsung SSD 850 PRO 512GB  S250NSAG432476E    SSD       False   OK                Healthy      Auto-Select 476.94 GB
1003     Samsung SSD 840 PRO Series S1AXNSAD800683Y    SSD       False   OK                Healthy      Auto-Select 476.94 GB
1010     Samsung SSD 850 PRO 1TB    S252NWAG304907F    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2016     Samsung SSD 850 PRO 512GB  S250NSAG432479X    SSD       False   OK                Healthy      Auto-Select 476.94 GB
1009     Samsung SSD 850 PRO 1TB    S252NEAG301324Y    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2015     Samsung SSD 840 PRO Series S1AXNSAF111936H    SSD       False   OK                Healthy      Auto-Select 476.94 GB
1008     Samsung SSD 850 PRO 1TB    S252NWAG304891D    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2000     ATA Samsung SSD 850        S1SRNWAF913328T    SSD       False   OK                Healthy      Auto-Select 953.87 GB
1007     Samsung SSD 850 PRO 1TB    S252NWAG403194P    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2019     ATA Samsung SSD 850        S1SRNWAF914370B    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2020     ATA Samsung SSD 850        S2BBNEAG113774L    SSD       False   OK                Healthy      Auto-Select 953.87 GB
2021     ATA Samsung SSD 850        S2BBNEAG113775K    SSD       False   OK                Healthy      Auto-Select 953.87 GB

 

 

 

Hi Uedgars,

 

Actually, I was looking for the physical disk and storage pool output from when you were in the bad state, to understand why the System.Storage.PhysicalDisk.AutoPool.Enabled property value was not honored. Let me know if you still face issues with it in the future.
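
If the auto-pool behaviour shows up again, a quick way to capture the relevant state is to read the setting back and snapshot the disks before and after connecting a new drive. This is a minimal sketch using the setting name quoted earlier in the thread; it only reads state and changes nothing.

```powershell
# Read back the auto-pool health setting on the clustered storage subsystem.
Get-StorageSubSystem Clu* |
    Get-StorageHealthSetting -Name "System.Storage.PhysicalDisk.AutoPool.Enabled"

# Snapshot the disks so the output can be compared before and after adding a drive.
Get-PhysicalDisk |
    Sort-Object SerialNumber |
    Select-Object FriendlyName, SerialNumber, CanPool, CannotPoolReason, Usage
```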

 

Now that you have fixed it, the better way to remove the disk from the pool is Remove-PhysicalDisk, as you tried earlier, though you have to specify the full FriendlyName of the S2D storage pool rather than S2D*. Once this succeeds, you will see the disk's CanPool value become True.
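
The removal sequence could be sketched as follows. The pool name is taken from the Get-StoragePool output above; the serial number is a placeholder, and the 30-second polling interval is an arbitrary assumption. Treat it as an outline to adapt, not a verified procedure.

```powershell
# Pool name as shown in the Get-StoragePool output in this thread.
$poolName = "S2D on hc-cluster-1"
# Placeholder serial number: replace with the drive to be removed.
$disk = Get-PhysicalDisk -SerialNumber "XXXXXXXXXXXXXXXXXXX"

# Stop new allocations on the drive, then rebuild its data on the remaining drives.
$disk | Set-PhysicalDisk -Usage Retired
Get-VirtualDisk | Repair-VirtualDisk

# Wait until no storage jobs are still running before removing the drive.
while (Get-StorageJob | Where-Object JobState -eq 'Running') {
    Start-Sleep -Seconds 30
}

Remove-PhysicalDisk -StoragePoolFriendlyName $poolName -PhysicalDisks $disk

# After a successful removal the drive should report CanPool = True.
Get-PhysicalDisk -SerialNumber $disk.SerialNumber |
    Select-Object FriendlyName, SerialNumber, CanPool, CannotPoolReason
```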

 

Let me know if this doesn't work for you.

Also, if you feel the system is not behaving as you expect, please follow this link and share the zip file with us. We try to collect the needed information so that there is no back and forth 🙂

 

Thanks

Girdhar

Hello!

 

Remove-PhysicalDisk worked for me even with an asterisk (SSD*), and the SSDs were moved from the S2D pool to the Primordial pool, but the problem is with the next step. I also want to get the disks out of the Primordial pool so I can see them in Disk Management and use them as standalone disks in Windows. As I understand it, I need to use Set-ClusterS2DDisk -CanBeClaimed $false for this, but I got an error. The command and error message are below.

In short: I had 4 x 512 GB SSDs (2 in each of my two servers). Then I added 8 more SSD drives of 1 TB each (4 in each server). Then I had the problem that I was unable to use all of this space (as described earlier). As I had no success extending my volume, I decided to remove the 512 GB SSDs from the pool and see what happens. I ran Set-PhysicalDisk -Usage Retired, Repair-VirtualDisk and Remove-PhysicalDisk. So far everything worked well. Finally I wanted to get these 512 GB disks out of the Primordial pool using Set-ClusterS2DDisk -CanBeClaimed $false, but it was unsuccessful; I got an error.

! Interestingly, after removing the 512 GB SSDs, the maximum size allowed for my storage tier changed and is now around 2.8 TB. As I already have a 930 GB volume that I want to extend, this means the tier allows around 3.7 TB. This sounds much better, and I believe it is the maximum for 8 x 1 TB drives in a mirror. But it is still strange that with 4 x 512 GB + 8 x 1 TB my tier's maximum size was only around 1.5 TB.
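
As a rough cross-check of the 3.7 TB figure, the two-way mirror arithmetic can be written out as below. It is a simplified estimate that ignores pool metadata and any reserve capacity; the drive size is the one reported by Get-PhysicalDisk in this thread, and the tier name is the one shown by Get-StorageTier.

```powershell
# Rough two-way-mirror estimate for 8 drives of 953.87 GB each.
$driveCount  = 8
$driveSizeGB = 953.87
$dataCopies  = 2   # a two-way mirror stores two copies of all data

$usableTB = ($driveCount * $driveSizeGB) / $dataCopies / 1024
"Estimated usable capacity: {0:N2} TB" -f $usableTB

# Ask the pool what size range the existing tier actually supports.
Get-StorageTierSupportedSize -FriendlyName "ssd-volume-1-MirrorOnSSD"
```

The estimate comes to a little over 3.7 TB, which is in line with 2.8 TB of remaining tier headroom plus the existing 930 GB volume.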

 

$disk=get-physicaldisk -FriendlyName "*Samsung SSD*" | ? {$_.size -eq "512110190592" -and $_.deviceid -ne 0}
Set-ClusterS2DDisk -CanBeClaimed $false -PhysicalDisk $disk
Set-ClusterS2DDisk : Failed to set cache mode on disks connected to node 'h11'. Run cluster validation, including the Storage Spaces Direct tests, to verify the configuration
At line:2 char:1
+ Set-ClusterS2DDisk -CanBeClaimed $false -PhysicalDisk $disk
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Set-ClusterStorageSpacesDirectDisk], CimException
+ FullyQualifiedErrorId : HRESULT 0x8007139f,Microsoft.Management.Infrastructure.CimCmdlets.InvokeCimMethodCommand,Set-ClusterStorageSpacesDirectDisk
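
The error text itself suggests running cluster validation with the Storage Spaces Direct tests. A minimal sketch of that step is shown below; it assumes the command is run on one of the cluster nodes, and the list of test categories is the set commonly used for S2D validation rather than something taken from this thread.

```powershell
# Validate the local cluster, limited to the categories relevant to Storage Spaces Direct.
# The broader "Storage" category is deliberately left out here.
Test-Cluster -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"
```

Test-Cluster writes an HTML validation report that can be reviewed for warnings about the drives or cache configuration.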

 

The physical disks now look like this:

PS C:\Windows\system32> get-physicaldisk -FriendlyName "*Samsung SSD*" | ? {$_.size -eq "512110190592" -and $_.deviceid -ne 0}

DeviceId FriendlyName               SerialNumber    MediaType CanPool OperationalStatus HealthStatus Usage       Size
-------- ------------               ------------    --------- ------- ----------------- ------------ -----       ----
1004     Samsung SSD 850 PRO 512GB  S250NSAG432476E SSD       True    OK                Healthy      Auto-Select 476.94 GB
1003     Samsung SSD 840 PRO Series S1AXNSAD800683Y SSD       True    OK                Healthy      Auto-Select 476.94 GB
2016     Samsung SSD 850 PRO 512GB  S250NSAG432479X SSD       True    OK                Healthy      Auto-Select 476.94 GB
2015     Samsung SSD 840 PRO Series S1AXNSAF111936H SSD       True    OK                Healthy      Auto-Select 476.94 GB

 

PS C:\Windows\system32> get-physicaldisk -FriendlyName "*Samsung SSD*" | ? {$_.size -eq "512110190592" -and $_.deviceid -ne 0} | Get-StoragePool

FriendlyName OperationalStatus HealthStatus IsPrimordial IsReadOnly Size     AllocatedSize
------------ ----------------- ------------ ------------ ---------- ----     -------------
Primordial   OK                Healthy      True         False      11.53 TB 7.45 TB
Primordial   OK                Healthy      True         False      11.53 TB 7.45 TB
Primordial   OK                Healthy      True         False      11.53 TB 7.45 TB
Primordial   OK                Healthy      True         False      11.53 TB 7.45 TB

And actually there is one more important question before I start to extend my storage.

At the beginning I had 4 x 512 GB SSD drives. I enabled S2D without cache and it automatically created the pool and tiers. When I started to add drives and faced the problems posted here, I found that tiers have a column count parameter. I figured out that for the default tier template this parameter is set to auto, but for my tiered volume the column count is 2. Now that I have 8 SSD drives (4 in each server), it would be better for performance and wear leveling to set the column count to 4 so that all 4 drives in each server form one stripe. Is that even possible? I was unable to find detailed specs about S2D operation at this level, and unfortunately some forums say that it is not possible to change the column count after the volume is created. Is that true? And if so, are there any technical recommendations on how to choose this value?

Well, as you figured out, updating the column count after volume creation is not possible.

Re-creating the volume should pick up the correct column count.
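
To see which column count a tier actually uses, before or after re-creating a volume, something like the following read-only query should be enough. The virtual disk name "ssd-volume-1" is an assumption inferred from the tier name shown in this thread.

```powershell
# Column count and data copies for every tier, including the templates created by S2D.
Get-StorageTier |
    Select-Object FriendlyName, ResiliencySettingName, NumberOfColumns, NumberOfDataCopies

# Tiers that belong to one specific volume (virtual disk name is an assumption).
Get-VirtualDisk -FriendlyName "ssd-volume-1" |
    Get-StorageTier |
    Select-Object FriendlyName, NumberOfColumns, Size
```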

Can you try Set-ClusterS2DDisk -CanBeClaimed:$false -PhysicalDisk $disk ?

Note the colon.

 

On the error: Set-ClusterS2DDisk : Failed to set cache mode on disks connected to node 'h11'

Have you created cache tiers manually?

 

Also, after running Set-ClusterS2DDisk, check the Get-Disk output to see the available disks.

I tried running the command with the colon, but it still returns the same error.

 

PS C:\Windows\system32> $disk=get-physicaldisk -FriendlyName "*Samsung SSD*" | ? {$_.size -eq "512110190592" -and $_.deviceid -ne 0}
Set-ClusterS2DDisk -CanBeClaimed:$false -PhysicalDisk $disk
Set-ClusterS2DDisk : Failed to set cache mode on disks connected to node 'h11'. Run cluster validation, including the Storage Spaces Direct tests, to verify the configuration
At line:2 char:1
+ Set-ClusterS2DDisk -CanBeClaimed:$false -PhysicalDisk $disk
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [Set-ClusterStorageSpacesDirectDisk], CimException
+ FullyQualifiedErrorId : HRESULT 0x8007139f,Microsoft.Management.Infrastructure.CimCmdlets.InvokeCimMethodCommand,Set-ClusterStorageSpacesDirectDisk

 

I enabled S2D with the cache disabled (Enable-ClusterS2D -CacheState Disabled).

I did not even know about cache tiers. How can I check their status? If I use Get-StorageTier I only see my storage tiers:

PS C:\Windows\system32> get-storagetier

FriendlyName             TierClass MediaType ResiliencySettingName FaultDomainRedundancy Size   FootprintOnPool StorageEfficiency
------------             --------- --------- --------------------- --------------------- ----   --------------- -----------------
Capacity                 Unknown   SSD       Mirror                1                     0 B    0 B
MirrorOnSSD              Unknown   SSD       Mirror                1                     0 B    0 B
ssd-volume-1-MirrorOnSSD Capacity  SSD       Mirror                1                     930 GB 1.82 TB         50.00%
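
Get-StorageTier lists capacity tiers only, so it does not show the cache state. One possible way to check the cache configuration is sketched below; it relies on the cluster-level S2D settings and on the Usage value of each drive, and only reads state.

```powershell
# Cluster-wide S2D settings, including CacheState and the cache modes.
Get-ClusterStorageSpacesDirect

# Drives bound as cache devices report Usage = Journal; capacity drives report Auto-Select.
Get-PhysicalDisk |
    Select-Object FriendlyName, SerialNumber, MediaType, Usage
```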

 

Oh, I remembered that I have turned deduplication on. Might that be interfering with something?

 

PS C:\Windows\system32> Get-DedupStatus | fl *


ObjectId : \\?\Volume{079e5b9b-7f17-4bea-bbd6-6de7bed066fd}\
Capacity : 998512787456
FreeSpace : 503428747264
InPolicyFilesCount : 13
InPolicyFilesSize : 868141497962
LastGarbageCollectionResult : 0
LastGarbageCollectionResultMessage : The operation completed successfully.
LastGarbageCollectionTime : 1/26/2019 4:39:45 AM
LastOptimizationResult : 0
LastOptimizationResultMessage : The operation completed successfully.
LastOptimizationTime : 1/28/2019 10:41:21 AM
LastScrubbingResult : 0
LastScrubbingResultMessage : The operation completed successfully.
LastScrubbingTime : 1/26/2019 4:40:48 AM
OptimizedFilesCount : 13
OptimizedFilesSavingsRate : 50
OptimizedFilesSize : 868141497962
SavedSpace : 440480093290
SavingsRate : 47
UnoptimizedSize : 935564133482
UsedSpace : 495084040192
Volume : C:\ClusterStorage\ssd-volume-1
VolumeId : \\?\Volume{079e5b9b-7f17-4bea-bbd6-6de7bed066fd}\
PSComputerName :
CimClass : ROOT/Microsoft/Windows/Deduplication:MSFT_DedupVolumeStatus
CimInstanceProperties : {Capacity, FreeSpace, InPolicyFilesCount, InPolicyFilesSize...}
CimSystemProperties : Microsoft.Management.Infrastructure.CimSystemProperties

So, if I have a column count of 2, it means S2D takes two drives per server for a stripe, and only when these drives are full does it start writing to the remaining pair of drives. Right?

Then what happens if I leave the column count at 2 and create a second tiered volume, also with a column count of 2? Does S2D recognize the less loaded drives and distribute this volume across the empty ones?

Or, from a performance perspective, is it better to set a column count of 4 for my setup? (I understand that if it is set to 4, I can extend my tiered volume only if I add an appropriate number of drives.)
