S2D Performance Troubleshooting


Good day all!

 

I'm working on trying to troubleshoot some S2D performance issues (or so I assume they are issues). I'm seeing very low write speeds.

 

My Setup:

4x Nodes with each node having

- 4x 2TB Samsung 850 Pro

- 8x 4TB HGST 7.2k RPM rust

- 64GB Memory

- E5-2670v2 processors

- LSI/Avago/Broadcom/Whoever they are today 9300-8i HBA running latest firmware

- 2x Mellanox ConnectX-3 flashed with 2.4.5030 firmware (read the latest was buggy)

 

My initial configuration was a 100% 3-way mirror of HDD with SSD as cache. I saw poor performance, so I yanked all the HDDs and created a 3-way mirror out of just the SSDs. Performance wasn't much better.

 

Read performance isn't horrible; however, when transferring a VHDX to S2D, I was only getting about 75-100MB/s. I realize that a file transfer is not an optimal test, but at those speeds it will literally take ages for me to live migrate VMs to this storage. I could also tell that the VMs already running on S2D were impacted by the import of other VMs, so I'm fairly certain something is wrong.

 

All servers are reporting RDMA = True on all NICs.

 

If I do a live migration from source storage to a different destination, transfer rates are 600-700MB/s, so I know the source is not the bottleneck.

 

Any ideas as to what I may be doing wrong?
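
To put the two transfer rates from the post side by side, here is a rough sketch of the arithmetic. The 500 GB VHDX size is a hypothetical figure chosen only for illustration (the post does not give one), and the calculation assumes a constant rate and decimal units (1 GB = 1000 MB).

```python
def transfer_hours(size_gb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_gb gigabytes at a steady rate_mb_s megabytes per second."""
    seconds = size_gb * 1000 / rate_mb_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600


# Hypothetical VHDX size, not taken from the post.
VHDX_GB = 500

# Rates quoted in the post: 75-100 MB/s into S2D, 600-700 MB/s to the other destination.
rates = [
    ("into S2D (low)", 75),
    ("into S2D (high)", 100),
    ("other destination (low)", 600),
    ("other destination (high)", 700),
]

for label, rate in rates:
    print(f"{label:>26}: {transfer_hours(VHDX_GB, rate):.2f} h")
```

At these rates the same copy takes six to nine times longer into S2D than to the other destination, whatever the actual VHDX size is.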

1 Reply

@Dave Smith 

 

I'm with you on trying to understand what's normal for performance, and how to test or fine-tune it.

 

For example, my old 2012 SATA HDD-only arrays on hardware RAID 6 are still kicking the pants off my four-node 2019 S2D cluster with NVMe + SSD + HDD (journal, performance tier as 3-way mirror, capacity tier as dual parity, which is what S2D does when you have four nodes; set up with ReFS, if that matters). Each node has 2 x 10G NICs for cluster-only traffic and 1 x 2G NIC team for ClusterAndClient (I'm thinking of adding a 1G NIC for ClusterAndClient and making the NIC team non-cluster traffic). I even set my 10G NICs to allow 4k jumbo packets, and my network latency is definitely under 5 ms (and all are RDMA). Yes, yes, all very vague, but conceptually it's a proper S2D setup with no Test-Cluster problems.

 

I'm unsure whether I'm getting a lot of cache misses: I see maybe one to twenty every couple of seconds, but note that I'm dumping everything on it for a sustained period (robocopy seeding to a dozen CSVs/virtual disks for weeks). I'm also unsure whether a right-sized cache is supposed to show no misses at all. I only have a pair of NVMe drives in each node; maybe I should up that to four? (Overkill won't be bad once my migration is done, and normal user usage would have been fine with just a pair on each node.) See my ClusterNode.SblCache.Iops.Read.Miss below: 80/s as a total makes sense, given that I see maybe 20 on any one node via perfmon. But why doesn't Get-ClusterPerf show a stat for cache writes, which is what I think my problem might be? I'm guessing that the cache size dirty vs. total not being the same means I'm not using all of my cache (bad word, "dirty"; it should be "currentlyBeingUsed")?

 

My performance layer is 10% of the capacity layer. I could add four more SSDs to each node to get that up to 12%, so I'm unsure whether I should add more.

 

I'm not running Storage Replica (if I had an equal cluster, it would make a fabulous just-in-case server), but I am running dedup (CPU and memory usage are always low). I'm not running any VMs right now.

 

 

PS > get-clusterPerf

 

Series                                  Time                 Value   Unit
------                                  ----                 -----   ----
ClusterNode.Cpu.Usage                   12/21/2020 11:31:19  10.15   %
ClusterNode.Cpu.Usage.Host              12/21/2020 11:31:19  10.15   %
ClusterNode.CsvCache.Iops.Read.Hit      12/21/2020 11:31:20  0       /s
ClusterNode.CsvCache.Iops.Read.HitRate  12/21/2020 11:31:20  100     %
ClusterNode.CsvCache.Iops.Read.Miss     12/21/2020 11:31:20  0       /s
ClusterNode.Memory.Available            12/21/2020 11:31:19  1.17    TB
ClusterNode.Memory.Total                12/21/2020 11:31:19  1.5     TB
ClusterNode.Memory.Usage                12/21/2020 11:31:19  340.48  GB
ClusterNode.Memory.Usage.Host           12/21/2020 11:31:19  340.48  GB
ClusterNode.SblCache.Iops.Read.Hit      12/21/2020 11:31:20  943     /s
ClusterNode.SblCache.Iops.Read.HitRate  12/21/2020 11:31:20  92.14   %
ClusterNode.SblCache.Iops.Read.Miss     12/21/2020 11:31:20  80      /s
PhysicalDisk.Cache.Size.Dirty           12/21/2020 11:31:16  3.75    TB
PhysicalDisk.Cache.Size.Total           12/21/2020 11:31:16  22.9    TB
PhysicalDisk.Capacity.Size.Total        12/21/2020 11:31:25  1.38    PB
PhysicalDisk.Capacity.Size.Used         12/21/2020 11:31:25  730.61  TB
Volume.IOPS.Read                        12/21/2020 11:31:25  9       /s
Volume.IOPS.Total                       12/21/2020 11:31:25  397     /s
Volume.IOPS.Write                       12/21/2020 11:31:25  388     /s
Volume.Latency.Average                  12/21/2020 11:31:25  16.09   ms
Volume.Latency.Read                     12/21/2020 11:31:25  3.06    ms
Volume.Latency.Write                    12/21/2020 11:31:25  16.38   ms
Volume.Size.Available                   12/21/2020 11:31:25  47.82   TB
Volume.Size.Total                       12/21/2020 11:31:25  298.26  TB
Volume.Throughput.Read                  12/21/2020 11:31:25  727.8   KB/S
Volume.Throughput.Total                 12/21/2020 11:31:25  19.45   MB/S
Volume.Throughput.Write                 12/21/2020 11:31:25  18.74   MB/S

 

 

Not an answer, just joining the conversation (nudging your years-old stale question).
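
A few derived figures can be pulled out of the Get-ClusterPerf output above. The sketch below copies seven rows from it and computes the SBL cache read hit rate, the dirty share of the cache, the average write size, and the average number of writes in flight (Little's law: IOPS times latency). Two assumptions are made here and are not from the thread: "MB" is treated as 1000 KB, and the sampled values are taken as steady-state averages. The function name `parse_cluster_perf` is just an illustrative helper, not part of any Microsoft tooling.

```python
# Rows copied from the Get-ClusterPerf output in the reply.
SAMPLE = """\
ClusterNode.SblCache.Iops.Read.Hit 12/21/2020 11:31:20 943 /s
ClusterNode.SblCache.Iops.Read.Miss 12/21/2020 11:31:20 80 /s
PhysicalDisk.Cache.Size.Dirty 12/21/2020 11:31:16 3.75 TB
PhysicalDisk.Cache.Size.Total 12/21/2020 11:31:16 22.9 TB
Volume.IOPS.Write 12/21/2020 11:31:25 388 /s
Volume.Latency.Write 12/21/2020 11:31:25 16.38 ms
Volume.Throughput.Write 12/21/2020 11:31:25 18.74 MB/S
"""


def parse_cluster_perf(text):
    """Map series name -> (value, unit) for rows shaped 'Series Date Time Value Unit'."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # header, separator, and blank lines do not have five fields
        series, _date, _time, value, unit = parts
        try:
            rows[series] = (float(value), unit)
        except ValueError:
            continue  # value column was not numeric
    return rows


perf = parse_cluster_perf(SAMPLE)

hits = perf["ClusterNode.SblCache.Iops.Read.Hit"][0]
misses = perf["ClusterNode.SblCache.Iops.Read.Miss"][0]
dirty_tb = perf["PhysicalDisk.Cache.Size.Dirty"][0]
total_tb = perf["PhysicalDisk.Cache.Size.Total"][0]
write_iops = perf["Volume.IOPS.Write"][0]
write_latency_s = perf["Volume.Latency.Write"][0] / 1000  # ms -> s
write_mb_s = perf["Volume.Throughput.Write"][0]

hit_rate_pct = 100 * hits / (hits + misses)
dirty_pct = 100 * dirty_tb / total_tb
avg_write_kb = write_mb_s * 1000 / write_iops  # assumes 1 MB = 1000 KB
writes_in_flight = write_iops * write_latency_s  # Little's law

print(f"SBL cache read hit rate: {hit_rate_pct:.2f} %")
print(f"Dirty share of cache:    {dirty_pct:.2f} %")
print(f"Average write size:      {avg_write_kb:.1f} KB")
print(f"Writes in flight (avg):  {writes_in_flight:.2f}")
```

The hit rate recomputed from hits and misses lands close to the reported 92.14 %. The write figures work out to roughly 48 KB per write with about six writes outstanding on average, which suggests (under the assumptions above) that at this moment the workload is issuing few concurrent, fairly small writes, so throughput is bounded by latency rather than by raw bandwidth.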