Storage Replica - Anyone using it successfully?

I read about Storage Replica about a year ago and finally got around to setting up a demo in my environment. From my experience so far, this seems like it's still in preview. Am I doing something wrong?

I've created a four-node stretch cluster using iSCSI storage between my on-prem datacenter and my Azure subscription. Getting the data disk sizes to match up was a pain in the rear, because it isn't clearly explained which metric is actually used to define the size of the disk.

I finally got past that, added a File Server role to the cluster, and set up replication, only to find that I can't move ownership of the role from the on-prem site to the remote site using Failover Cluster Manager. So I figured I'd try Windows Admin Center, since it includes a section for Storage Replica. That does not successfully "reverse replication" either. It just changes the replication from ONPREM01 -> AZURE01 to ONPREM01 -> ONPREM02. And as you can imagine, automatic failover doesn't work either.

Has anyone had success with the feature? Are you using it in production? This seems like a great hardware-agnostic, native data replication solution, but so far my experience has been dismal, and I can barely find any documentation on it outside of Microsoft Learn.

1 Reply

I recently put it into production to replace DFS-R replication on older file servers. I just posted this in a separate thread, but there is minimal documentation available from Microsoft aside from the initial configuration. Importantly, I've found that any reboot or networking hiccup kicks the replication into ReplicationSuspended status, which requires a "Get-SRGroup | Sync-SRGroup" command to restart the sync. To get it running reliably, I've found I need a startup script that runs that command after each reboot. I also monitor the event log as described here: https://muzahid.medium.com/windows-storage-replica-monitor-using-powershell-or-any-windows-event-mon... and script "Get-SRGroup | Sync-SRGroup" into the response to Event ID 5014 as well. The Microsoft documentation for Storage Replica should give a heads-up that this is required, so users don't find out much later that their replication has been sitting in ReplicationSuspended status for a long time. It should also explain what that status means and what to do about it - descriptions of the possible statuses are also missing from the current documentation. Other than the poor documentation and the need for frequent manual restarts, it works well for my application.
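
For anyone who wants a starting point, a minimal sketch of such a resume script might look like the following. It only wraps the same "Get-SRGroup | Sync-SRGroup" command with basic logging. It assumes the StorageReplica cmdlets are available on the server and that the script runs elevated (for example from a scheduled task); the log file name and location are hypothetical placeholders, not anything Storage Replica itself defines.

```powershell
# Sketch of a resume script for a startup task or an Event ID 5014 trigger.
# Assumption: the StorageReplica module is installed and the task runs elevated.
# Assumption: the log file name below is a placeholder chosen for illustration.
$logFile = Join-Path $env:ProgramData 'sr-resume.log'

try {
    # Same command as in the post: resume replication for every local replication group.
    Get-SRGroup -ErrorAction Stop | Sync-SRGroup -ErrorAction Stop
    Add-Content -Path $logFile -Value "$(Get-Date -Format s) Sync-SRGroup issued"
}
catch {
    # Record the failure so a suspended replication does not go unnoticed.
    Add-Content -Path $logFile -Value "$(Get-Date -Format s) Sync-SRGroup failed: $($_.Exception.Message)"
}
```

The same script can be attached to both triggers (system startup and Event ID 5014), so there is only one place to maintain the resume logic.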

I'm using it in conjunction with DFS-N. Reversing direction (changing the active server) in the event of a failure is a manual process, using the script provided here: https://www.reddit.com/r/storage/comments/pxdiu3/server_2019_storage_replica_dfs_namespace_and/
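
For reference, the built-in cmdlet for swapping replication direction is Set-SRPartnership. The sketch below is not the linked script, and it does not touch the DFS-N side at all; it only illustrates the Storage Replica step. The server names (FS01, FS02) and replication group names (RG01, RG02) are hypothetical and need to be replaced with your own.

```powershell
# Hypothetical names: FS01/RG01 is the current source, FS02/RG02 the current destination.
# This makes FS02 the new source and FS01 the new destination.
Set-SRPartnership -NewSourceComputerName 'FS02' -SourceRGName 'RG02' `
                  -DestinationComputerName 'FS01' -DestinationRGName 'RG01'

# Confirm which way replication now flows.
Get-SRPartnership |
    Select-Object SourceComputerName, SourceRGName, DestinationComputerName, DestinationRGName
```

After the switch, any DFS-N folder targets still have to be pointed at the new active server separately.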