Storage Replica Async Performance


Should Storage Replica in async replication mode impact write performance? All of my testing has shown that it does (I am using dedicated data and log NVMe drives). In sync mode, write performance is expected to suffer, but in async mode writes should ACK immediately once written to the local server’s disk. I opened a Microsoft support case; they did not find anything wrong with my config, but they can’t confirm or deny whether this is expected behavior.

4 Replies
Yes, data writes still have to go through the local log. If the local log volume is slow, or even just no faster than the data volume, perf will suffer even without the trip to the secondary server and the network. We are working on a much-improved log with much better perf to minimize this; it is coming in a release after RS3 and is our top priority.

I'm testing SR Async on Windows Server 2019 1809 and there is still a significant write penalty when Storage Replica is enabled. I'm using a pair of NVMe drives (simple volume) for my log that can sustain >3.5 GB/s of 4K writes. My data volume gets ~1.6 GB/s for 4K sequential writes and 174 MB/s for random writes before SR is enabled, but only 700 MB/s sequential and 29 MB/s random after SR is enabled.

Are there any details on how SR log performance was improved vs. Server 2016?
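To put the figures quoted above in relative terms, here is a minimal Python sketch of the arithmetic. The helper name `write_penalty` is hypothetical, and treating ~1.6 GB/s as 1600 MB/s (decimal units) is an assumption, so read the results as approximate.

```python
def write_penalty(before_mb_s: float, after_mb_s: float) -> tuple[float, float]:
    """Return (percent drop, slowdown factor) for a before/after throughput pair."""
    if before_mb_s <= 0 or after_mb_s <= 0:
        raise ValueError("throughput values must be positive")
    percent_drop = (before_mb_s - after_mb_s) / before_mb_s * 100
    slowdown = before_mb_s / after_mb_s
    return percent_drop, slowdown


# Figures quoted in the post; ~1.6 GB/s is assumed to mean 1600 MB/s.
results = {
    "sequential": write_penalty(1600, 700),
    "random": write_penalty(174, 29),
}

for workload, (drop, factor) in results.items():
    print(f"{workload}: {drop:.0f}% lower, {factor:.1f}x slower")
```

On these numbers, sequential writes lose a bit more than half their throughput, while random writes slow down by roughly a factor of six, so the penalty falls much harder on random I/O.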

Hi, I am doing a POC using SR (sync mode) for a new SQL stretch cluster. With SR enabled, the IOPS from diskspd runs drop from 1064 I/O per second to just 125. Is such a large drop in performance normal? I have tried both sync and async modes, and both give similar diskspd results.

I'm using disks from an all-flash SAN via FC, so shouldn't it be going as fast as the SSD storage can go?

If you enable consistency groups, it may decrease replication and write I/O performance.

The best practice in this case is to keep as few disks as possible in the same replication group. A ratio of 2:1 or 3:1 (data disks : SR log disk) is a good target.

Ref.: https://docs.microsoft.com/en-us/powershell/module/storagereplica/new-srgroup?view=win10-ps