First published on TECHNET on Aug 21, 2013
Hi folks, Ned here again. By now, you know that DFS Replication has some major new features in Windows Server 2012 R2. Today I talk about one of the most radical: DFSR database cloning.
Prepare for a long post, this has a walkthrough…
The old ways are not always the best
DFSR – or any proper file replication technology - spends a great deal of time validating that servers have the same knowledge. This is critical to safe and reliable replication; if a server doesn’t know everything about a file, it can’t tell its partner about that file. In DFSR, we often refer to this “initial build” and “initial sync” processing as “initial replication”. DFSR needs to grovel files and folders, record their information in a database on that volume, exchange that information between nodes, stage files and create hashes, then transmit that data over the network. Even if you preseed the files on each server before configuring replication, the metadata transmissions are still necessary. Each server performs this initial build process locally, and then the non-authoritative server checks his work against an authoritative copy and reconciles the differences.
This process is necessarily very expensive. Heaps of local IO, oodles of network conversation, tons of serialized exchanges based on directory structures. As you add bigger and more complex datasets, initial replication gets slower. A replicated folder that contains tens of millions of preseeded files can take weeks to synchronize the databases, even with preseeding stopping the need to send the actual files.
Furthermore, there are times when you need to recreate replication of previously synchronized data, such as when:
1. Upgrading operating systems
2. Replacing computer hardware
3. Recovering from a disaster
4. Redesigning the replication topology
Any one of these requires re-running initial replication on at least one node. This has been the way of DFSR since Microsoft introduced it in Windows Server 2003 R2.
Cutting out the middle man
DFSR database cloning is an optional alternative to so-called classic initial replication. By providing each downstream server with an exported copy of the upstream server’s database and preseeded files, DFSR reduces or eliminates the need for over-the-wire metadata exchange. DFSR database cloning also provides multiple file validation levels to ensure reconciliation of files added, modified, or deleted after the database export but before the database import. After file validation, initial sync is now instantaneous if there are no differences. If there are differences, DFSR only has to synchronize the delta of changes as part of a shortened initial sync process.
We are talking about fundamental, state of the art performance improvements here, folks. To steal from my previous post, let’s compare a test run with ~10 terabytes of data in a single volume comprising 14,000,000 preseeded files:
| “Classic” initial sync | Time to convergence |
| Preseeded | ~24 days |

Now, with DB cloning:

| Validation Level | Time to export | Time to import | Improvement % |
| 2 – Full | 9 days, 0 hours | 5 days, 10 hours | 40% |
| 1 – Basic | 2 hours, 48 minutes | 9 hours, 37 minutes | 98% |
| 0 – None | 1 hour, 13 minutes | 6 hours, 8 minutes | 99% |
I think we can actually do better than this – we found out recently that we’re having some CPU underperformance in our test hardware. I may be able to re-post even better numbers someday.
For instance, here I created exactly one million files and cloned that volume, using VMs running on a 3 year old test server.
Let’s examine the mainline case of creating a new replication topology using DB cloning:
1. You create a replication group and a replicated folder, then add a server as a member of that topology (but no partners, yet). This will be the “upstream” (source) server.
2. You let initial build complete
3. You export the cloned database from the upstream server
4. You preseed the files to the downstream (destination) server and copy in the exported clone DB files
5. You import the cloned database on the downstream server
6. You add the downstream server to the replication group and RF membership, just like classic DFSR
7. You let the initial sync validation complete
If you did everything right, step 7 is done instantly, and the server is now replicating normally for all further file additions, modifications, and deletions. It’s straightforward stuff, with only a handful of steps.
Walkthrough
Let’s get some hands-on with DB cloning. Below is a walkthrough using the new DFSR Windows PowerShell module and the mainstream “setting up a new replication topology” scenario.
Requirements and sample setup
Active Directory domain with at least one domain controller (does not need to run Windows Server 2012 R2)
AD schema updated to at least Windows Server 2008 R2 (there are no forest or domain functional level requirements)
Two file servers running Windows Server 2012 R2 and joined to the domain (Windows Server 2012 and earlier file servers cannot participate in cloning scenarios, but do support replication with Windows Server 2012 R2)
You can use virtualized DFSR servers or physical ones; it makes no difference. This walkthrough uses the following domain environment as an example:
One domain controller
Two member servers, named SRV01 and SRV02
Configure DFSR
To configure the DFSR role on SRV01 and SRV02 using Windows PowerShell, run the following command on each server:
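A minimal sketch of that command, assuming the role service name FS-DFS-Replication (check it with Get-WindowsFeature if in doubt):

```powershell
# Install the DFS Replication role service plus its management tools
# (the DFSR Windows PowerShell module and the DFS Management console).
Install-WindowsFeature -Name FS-DFS-Replication -IncludeManagementTools
```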
Alternatively, to configure the DFSR role using Server Manager:
1. Start Server Manager.
2. Click Manage, and then click Add Roles and Features.
3. Proceed to the Server Roles page, then select DFS Replication, leave the default option to install the Remote Server Administration Tools selected, and continue to the end.
Configure volumes
On SRV01 and SRV02, configure an F:, G:, and H: drive with NTFS. Each drive should be at least 2GB in size. If your test servers do not already have these drives configured or don’t have additional disks, you can shrink the existing C: volume with Resize-Partition, DiskMgmt.Msc, or Diskpart.exe, and then format the new volumes. Multiple drives allow you to test cloning multiple times without starting over too often – remember, DFSR databases are per-volume, and therefore cloning is as well.
For example, using Windows PowerShell with a virtual machine that has one 40GB disk and C: volume:
# Assumes disk 0 has unallocated space left after shrinking C:. Format-Volume defaults to NTFS.
New-Partition -DiskNumber 0 -Size 2GB -DriveLetter f | Format-Volume
New-Partition -DiskNumber 0 -Size 2GB -DriveLetter g | Format-Volume
New-Partition -DiskNumber 0 -Size 2GB -DriveLetter h | Format-Volume
Clone a DFSR database
1. On the upstream server SRV01 only, create H:\RF01 and create or copy in some test files (such as by copying the 2,000 largest immediate file contents of the C:\Windows\SysWow64 folder).
Important: Windows Server 2012 R2 Preview contains a bug that restricts cloning to under 3,100 files and folders – if you add more files, cloning export never completes. Ehhh, sorry: we fixed this issue before Preview shipped but even then it was too late due to the build’s age. Do not attempt to clone more than 3,100 files with Basic validation while using the Preview version of Windows Server 2012 R2. If you want to use more files, use -Validation None. The RTM version of DFSR DB cloning will not have this limitation.
Use the New-DfsReplicationGroup, New-DfsReplicatedFolder, Add-DfsrMember, and Set-DfsrMembership cmdlets to create a replicated folder and membership for SRV01 only, using only the H:\RF01 directory as the replicated folder. You must specify PrimaryMember as $True for this membership, so that the server performs initial build with no need for partners. You can run these commands on any server.
Note: Do not add SRV02 as a member nor create a connection between the servers in this new RG. We don’t want that server starting classic replication.
Note the sample output below and how I used the built-in -Verbose parameter to see more AD polling details:
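As a rough sketch of step 1, the four cmdlets might be combined like this. The group name RG01 and folder name RF01 follow the names used elsewhere in this walkthrough; the -Force switch and the final Update-DfsrConfigurationFromAD call are my assumptions about how you would want to run it, not a transcript of the original sample.

```powershell
# Create the replication group and the replicated folder.
New-DfsReplicationGroup -GroupName "RG01"
New-DfsReplicatedFolder -GroupName "RG01" -FolderName "RF01"

# Add only the upstream server; SRV02 is deliberately left out for now.
Add-DfsrMember -GroupName "RG01" -ComputerName "SRV01"

# Mark SRV01 as primary so that it runs initial build without a partner.
# -Force suppresses the confirmation prompt (assumption: you want no prompt).
Set-DfsrMembership -GroupName "RG01" -FolderName "RF01" -ComputerName "SRV01" -ContentPath "H:\RF01" -PrimaryMember $true -Force

# Ask SRV01 to poll Active Directory now instead of waiting for its next polling cycle.
Update-DfsrConfigurationFromAD -ComputerName "SRV01" -Verbose
```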
2. Wait for a DFS Replication Event 4112 in the DFS Replication Event Log, which indicates that the replicated folder initialized successfully as primary.
Note below in the sample output how I have a 6020 event; in a cloning scenario, it is expected and supported, despite what the event text implies.
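One way to look for that event from Windows PowerShell is sketched below. It assumes you run it locally on SRV01; note that Get-WinEvent returns an error rather than empty output if no matching event exists yet.

```powershell
# Show the most recent "initialized as primary" events (ID 4112)
# from the DFS Replication event log on this server.
Get-WinEvent -FilterHashtable @{ LogName = 'DFS Replication'; Id = 4112 } -MaxEvents 5 |
    Format-List TimeCreated, Id, Message
```

The same pattern works for the other event IDs in this walkthrough by changing the Id value.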
3. Export the cloned database and volume config XML for the H: drive. Export requires that the output folder for the database and configuration XML file already exists. It also requires that no replicated folders on that volume be in an initial build or initial sync phase of processing.
Sample:
New-Item -Path "H:\Dfsrclone" -Type Directory
Export-DfsrClone -Volume H: -Path "H:\Dfsrclone"
Note the use of the -Validation parameter in the sample output below. Cloning provides three levels of file validation during the export and import processing. These ensure that if you are allowing users to alter data on the upstream server while cloning is occurring, files are later reconciled on the downstream.
None - No validation of files on source or destination server. Fastest and most optimistic. Requires that you preseed data perfectly. Any modification of data during the clone processing on the servers will not be detected or replicated until it is later modified after cloning.
Basic - (Default behavior, Microsoft recommended). Each file’s existing database record is updated with a hash of the ACL, the file size, and the last modified time. Good mix of fidelity and performance. This is the recommended validation level, and the maximum one you should use if you are replicating more than 10TB of data. Yes, we are going to support much more than 10TB and 11M files in WS2012 R2 as long as you use cloning; we’ll give you an official number at RTM.
Full - Same hashing mechanism used by DFSR during normal operations. Hash stored in database record for each file. Slowest but highest fidelity. If you exceed 10TB, we do not recommend using this value due to the comparatively poor performance.
We recommend that you do not allow users to add, modify, or delete files on the source server as this makes cloning less effective, but we realize you live in the real world. Hence, the validation code.
Important: You should not let users modify or access files on the downstream (destination) server until cloning completes end-to-end and replication is working. This is no different from our normal “classic” initial sync replication best practice for the past 8 years of DFSR, as there is a high likelihood that users will lose their changes through conflict resolution or movement to the preexisting files store. If users need to access these files and make changes, only allow them to access the original source server from which you exported.
Note the hint outputs above. The export cmdlet shows a suggested copy command for the database export folder. It also suggests preseeding hints for any replicated folders on that volume that will clone. All you have to do is fill in your destination server name and RF path.
4. Wait for a DFS Replication Event 2402 in the DFS Replication Event Log, which indicates that the export completed successfully. As you can see from the sample outputs, there are four event IDs of note when exporting: 2406, 2410 (there may be many of these, they are progress indicators), 2402, and finally 2002 (which brings the volume back online for normal replication).
As you can see from my example, I cloned more than 3,100 files. I told you we fixed it already!
5. Preseed the file and folder data from the source computer to the new downstream computer that will clone the DFS Replication database.
Important: There should be no existing replicated folder content (folders, files, or database) on the downstream server's volume that will perform cloning – let the preseeding fill it all in in this mainstream scenario. Microsoft recommends that you do not create network shares to the data until completion of cloning and do not allow users to add, modify, or change files on the downstream server until post-initial replication is operational.
Important: Do not use the robocopy /MIR option on the root of a volume, do not manually create the replicated folder on the downstream server, and do not re-run robocopy over files that you already copied previously (i.e. if you have to start over, delete the destination folder and file structure and really start over). Let robocopy create all folders and copy all contents to the downstream server, via the /e /b /copyall options, every time you run it. Otherwise, you are very likely to end up with hash mismatches.
Robocopy can be a bit… finicky.
6. Copy the contents of the exported folder, both the database and xml, to the downstream server and save them in a temporary folder on the volume that contains the populated file data.
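Steps 5 and 6 might look something like the following when run on SRV02 from an elevated session. This is a sketch under assumptions: it pulls from SRV01 over the H$ administrative share, and the retry, exclusion, and logging switches (/R, /W, /XD, /TEE, /LOG+) are my choices rather than requirements.

```powershell
# Step 5: preseed RF01. Robocopy creates H:\RF01 itself; do not create it by hand.
# /E copies all subfolders (including empty ones), /B uses backup mode,
# /COPYALL copies data, attributes, timestamps, ACLs, owner, and auditing info.
# /XD DfsrPrivate skips DFSR's private working folder.
robocopy.exe '\\SRV01\H$\RF01' 'H:\RF01' /E /B /COPYALL /R:6 /W:5 /XD DfsrPrivate /TEE /LOG+:preseed.log

# Step 6: copy the exported database and XML into a temporary folder
# on the same volume that now holds the preseeded data.
robocopy.exe '\\SRV01\H$\Dfsrclone' 'H:\Dfsrclone' /B
```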
7. On the downstream server SRV02, ensure that you correctly performed preseeding by using the Get-DfsrFileHash cmdlet to spot-check folders and files, and then compare to the upstream copies.
This sample shows hashes for all the files beginning with “pri”:
PS C:\> Get-DfsrFileHash "\\SRV01\H$\RF01\pri*"
PS C:\> Get-DfsrFileHash "\\SRV02\H$\RF01\pri*"
Sample output showing an easy “eyeball comparison”
I recommend you run this on multiple small file subsets and at a few subfolder levels. There are many other examples of using the new Get-DfsrFileHash cmdlet here on TechNet already, including using the compare-object cmdlet to get fancy-schmancy.
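For the fancy-schmancy route, a comparison along these lines should work. Treat it as a sketch: it assumes the objects returned by Get-DfsrFileHash expose a FileHash property, and that both wildcards match at least one file (Compare-Object fails on empty input).

```powershell
# Hash the same subset of files on both servers.
$upstream   = Get-DfsrFileHash -Path '\\SRV01\H$\RF01\pri*'
$downstream = Get-DfsrFileHash -Path '\\SRV02\H$\RF01\pri*'

# Compare on the hash value only, because the paths differ by server name.
# No output means every hash in this subset matched.
Compare-Object -ReferenceObject $upstream -DifferenceObject $downstream -Property FileHash
```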
8. Ensure that the System Volume Information\DFSR folder does not exist on the H: drive of the downstream SRV02 server.
Important: Naturally, this server must not already be participating in replication on that volume and if it is, you cannot clone into it.
Sample (note: you may need to stop the DFSR service, run this, and then start the DFSR service):
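A minimal sketch of that cleanup, run on SRV02 from an elevated session. RD is an alias for Remove-Item in Windows PowerShell; stopping and starting the service around it is optional and only shown because the note above mentions it.

```powershell
# Stop DFSR so that nothing holds files open under the folder.
Stop-Service -Name DFSR

# Delete any old DFSR database and staging data on the H: volume.
Remove-Item -Path 'H:\System Volume Information\DFSR' -Recurse -Force

# Start DFSR again; the import in the next step needs the service running.
Start-Service -Name DFSR
```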
Note: When re-using existing files that were previously replicated, you are likely to run into some benign errors when running this command due to the MAX_PATH limitations of RD, where some of the Staging folder contents will be too long to delete. You can ignore those warnings, or if you want to clean out the folder completely, you can use this workaround:
C. Delete the now empty “system volume information\DFSR” folder after the robocopy command completes.
9. Import the cloned database on SRV02. For example:
Import-DfsrClone -Volume H: -Path "H:\Dfsrclone"
10. Wait for a DFSR Informational Event 2404 in the DFS Replication Event Log, which indicates that the import completed successfully. As you can see from the sample outputs, there are four event IDs of note when importing: 2412, 2416 (there may be many of these, they are progress indicators), 2418, and finally 2404.
11. Add the downstream SRV02 server as a member of the replication group using Add-DfsrMember, set its membership using Set-DfsrMembership for the -ContentPath matching H:\rf01, and create bi-directional replication connections between the upstream and downstream servers using Add-DfsrConnection.
Note in the sample output how I use Get-DfsrMember in a pipeline to force AD polling operations on all members in the RG01 replication group, instead of having to run for each server. Imagine how much easier this will make administering environments with dozens or hundreds of DFSR nodes.
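Step 11 could be sketched as follows, reusing the RG01 and RF01 names from earlier in the walkthrough. The -Force switch is my assumption for skipping the confirmation prompt.

```powershell
# Add the downstream server and point its membership at the preseeded folder.
Add-DfsrMember -GroupName "RG01" -ComputerName "SRV02"
Set-DfsrMembership -GroupName "RG01" -FolderName "RF01" -ComputerName "SRV02" -ContentPath "H:\RF01" -Force

# Add-DfsrConnection creates connections in both directions by default.
Add-DfsrConnection -GroupName "RG01" -SourceComputerName "SRV01" -DestinationComputerName "SRV02"

# Force AD polling on every member of the group in one pipeline.
Get-DfsrMember -GroupName "RG01" | Update-DfsrConfigurationFromAD
```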
12. Wait for the DFSR informational event 4104, which indicates that the server is now normally replicating files. Unlike your previous experience, there will not be a preceding 4102 event when enabling replication of a cloned volume. If there are any changed files on the upstream server since you performed export cloning, those files will replicate inbound to the downstream server authoritatively and you will see 4412 conflict events. If you allowed users to modify data on the downstream server – and again, you shouldn’t – while cloning operations were ongoing, those files will conflict (and lose) or move to the preexisting folder, and any files the user had deleted will replicate back in again from the upstream. This is identical to classic initial sync behavior.
Cheat sheet
Now that you have tried out the controlled scenario once, here is a cut down “quick steps” version you can use for further testing with those F: and G: drives on your own; once you use those up, you will need to remove the server from replication for those volumes in order to try some more experimentation with things like a third server or cloning from an existing replicated folder.
In this case, I am using the F: drive with its RF02 replicated folder in the RG02 replication group. Keep in mind – you don’t have to keep creating new RGs and we support cloning multiple custom writable RFs on a volume. These are just simplified walkthroughs, after all.
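For the upstream SRV01 side, the quick steps might look like this sketch. The F:\RF02 content path and F:\Dfsrclone export folder are assumed names that mirror the H: walkthrough, and F:\RF02 must already contain your test files. Run the commands one stage at a time rather than as a single script, because each stage waits on an event.

```powershell
# Create the topology with SRV01 as the only (primary) member.
New-DfsReplicationGroup -GroupName "RG02"
New-DfsReplicatedFolder -GroupName "RG02" -FolderName "RF02"
Add-DfsrMember -GroupName "RG02" -ComputerName "SRV01"
Set-DfsrMembership -GroupName "RG02" -FolderName "RF02" -ComputerName "SRV01" -ContentPath "F:\RF02" -PrimaryMember $true -Force
Update-DfsrConfigurationFromAD -ComputerName "SRV01"

# Wait for event 4112 on SRV01, then export the clone (run on SRV01).
New-Item -Path "F:\Dfsrclone" -Type Directory
Export-DfsrClone -Volume F: -Path "F:\Dfsrclone"

# Wait for event 2402 before copying anything to SRV02.
```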
On the downstream SRV02 server (note: you may need to stop the DFSR service to perform the first step; be sure to start it up again so that you can run the import)
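A sketch of those downstream quick steps, run on SRV02 from an elevated session. The paths F:\RF02 and F:\Dfsrclone and the F$ administrative share are assumptions that mirror the H: walkthrough; again, run one stage at a time and wait for the event named in each comment.

```powershell
# Remove any stale DFSR database on F:. An error here simply means
# the folder does not exist, or that you need to stop the DFSR service first.
Remove-Item -Path 'F:\System Volume Information\DFSR' -Recurse -Force

# Preseed the files, then copy the exported database and XML from SRV01.
robocopy.exe '\\SRV01\F$\RF02' 'F:\RF02' /E /B /COPYALL /R:6 /W:5 /XD DfsrPrivate
robocopy.exe '\\SRV01\F$\Dfsrclone' 'F:\Dfsrclone' /B

# Import the clone, then wait for event 2404.
Import-DfsrClone -Volume F: -Path "F:\Dfsrclone"

# Join SRV02 to the topology, then wait for event 4104.
Add-DfsrMember -GroupName "RG02" -ComputerName "SRV02"
Set-DfsrMembership -GroupName "RG02" -FolderName "RF02" -ComputerName "SRV02" -ContentPath "F:\RF02" -Force
Add-DfsrConnection -GroupName "RG02" -SourceComputerName "SRV01" -DestinationComputerName "SRV02"
Get-DfsrMember -GroupName "RG02" | Update-DfsrConfigurationFromAD
```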
While the future TechNet content on DB cloning contains a complete troubleshooting section, here are some common issues seen by first-time users of this new feature:
Symptom
Export-DfsrClone does not show RootFolderPath or PreseedingHint output for SYSVOL or read-only replicated folders. After running Import-DfsrClone, SYSVOL and read-only replicated folders are not imported.
Cause
DFSR cloning does not support SYSVOL or read-only replicated folders in Windows Server 2012 R2. Those folders are skipped by cloning. This behavior is by design.
Resolution
Configure replication of read-only replicated folders using classic initial sync. Configure SYSVOL by promoting domain controllers normally.
Symptom
Export-DfsrClone does not show RootFolderPath and PreseedingHint output for one or more custom replicated folders. After running Import-DfsrClone, not all custom replicated folders are imported.
Cause
DFSR cloning does not support replicated folders that are currently in initial sync or initial building. Those replicated folders are skipped by cloning.
Resolution
Ensure that all replicated folders on a volume are in a normal, non-initial building, non-initial synchronizing state. Any replicated folders that did not get DFSR event 4112 (primary server) after initial build started, or event 4104 (non-primary server) after initial sync completed, are not capable of cloning yet. If your event logs have wrapped, you can use WMI to determine if a replicated folder is ready to clone:
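A sketch of such a WMI query, assuming the long-standing root\MicrosoftDFS namespace and its DfsrReplicatedFolderInfo class; SRV01 stands in for whichever server you want to check.

```powershell
# State values: 0 = Uninitialized, 1 = Initialized, 2 = Initial Sync,
# 3 = Auto Recovery, 4 = Normal, 5 = In Error.
# A replicated folder is ready to clone when State is 4.
Get-WmiObject -Namespace 'root\MicrosoftDFS' -Class DfsrReplicatedFolderInfo -ComputerName 'SRV01' |
    Format-Table ReplicationGroupName, ReplicatedFolderName, State -AutoSize
```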
Symptom
Import-DfsrClone fails with errors: “Import-DfsrClone : Could not import the database clone for the volume h: to "H:\dfsrclone". Confirm that you are running in an elevated Windows PowerShell session, the DFSR service is running, and that you are a member of the local Administrators group. Error code: 0x80131500. Details: The WS-Management service cannot process the request. The WMI service or the WMI provider returned an unknown error: HRESULT 0x80041001”
Cause
You did not preseed the replicated folders onto the destination volume with the same name and relative path.
Resolution
Ensure that you preseed the source replicated folders onto the destination volume using the same folder names and relative paths (i.e. if the source replicated folder was on “d:\dfsr\rf01”, the destination volume must contain “<volume>:\dfsr\rf01”).
Symptom
DFSR event 2418 shows a significant mismatch count. Cloning takes as long as classic non-preseeded initial sync.
Cause
Files were not preseeded onto the destination server correctly or at all.
Resolution
Validate your preseeding technique and results. Reattempt the export and import process.
Symptom
Export-DfsrClone never completes or returns any output when using -Validation Basic or not specifying -Validation.
Cause
Code defect in Windows Server 2012 R2 Preview build only, when cloning more than 3100 files on a volume.
Resolution
This is a known issue in the Preview build. This was resolved in later builds. As a workaround, limit the number of files replicated with basic validation to under 3,100 per volume. If you wish to see the cloning performance with a larger dataset, use 3,100 much larger sample files (such as ISO, CAB, MSI, VHD, or VHDX files). Alternatively, use validation level none (0) instead of basic.
Where you can learn more
Update May 2014: See it all in video! TechEd North America 2014 with live demos and walkthroughs:
We have a comprehensive cloning walkthrough available on TechNet and a preseeding article on the way, as well as updates to the DFSR FAQ. These include steps on cloning an existing replica, dealing with hub servers that have many unique replicated folders from branch offices, using cloning to recover a corrupted database, and replacing or upgrading servers. Not to mention the new supported DFSR size limits!
Update August 2015: We have found that cloning doesn't work with Windows Failover Clusters (there is no 4104 event, and initial preseeded sync occurs) - booooooo! But we have fixed the bug and it will release in a coming update - yaaaaaaaay!