Hi folks, Ned Pyle here. As promised when I left AskDS and MS Support for greener pastures, I’m still in the blogging game – I told you I’d be back! Let’s start things off talking about improvements in Windows Server 2012 and DFS Replication (DFSR).
Windows Server 2012 DFSR focuses on reliability and supportability changes based on direct field and MS Support feedback. This release doesn’t contain many new features but is much easier to troubleshoot and is more resilient to environmental issues. In the end, that makes your life easier. And every IT department could use some easier…
If this is your daily routine, we can help
I can only assume you already know DFSR from all of my old write-ups, so let’s dive into the details.
DFSR uses a per-volume ESE (aka “Jet”) database to track all file changes in replicated folders on their individual volumes. DFSR contains code to attempt graceful and dirty recovery of the database after an unexpected shutdown. Mallikarjun Chadalapaka has a great write-up on dirty shutdown recovery here.
On detecting a dirty shutdown, DFSR begins a recovery process. This starts with logging event 2212:
Event ID=2212
Severity=Warning
The DFS Replication service has detected an unexpected shutdown on
volume %2. This can occur if the service terminated abnormally (due to
a power loss, for example) or an error occurred on the volume. The
service has automatically initiated a recovery process. The service
will rebuild the database if it determines it cannot reliably
recover. No user action is required.
Additional Information:
Volume: %2
GUID: %1
If the recovery is successful, DFSR logs event 2214:
Event ID=2214
Severity=Informational
The DFS Replication service successfully recovered from an unexpected
shutdown on volume %2.This can occur if the service terminated
abnormally (due to a power loss, for example) or an error occurred on
the volume. No user action is required.
Additional Information:
Volume: %2
GUID: %1
If the recovery is unsuccessful, DFSR logs event 2216:
Event ID=2216
Severity=Error
The DFS Replication service failed to recover from an unexpected
shutdown on volume %2. This can occur if the service terminated
abnormally (due to a power loss, for example) or an error occurred on
the volume. Recovery will be attempted periodically in %3 seconds. No
user action is required.
Additional Information:
Error: %4 (%5)
Volume: %2
Guid: %1
DFSR didn’t log how a recovery was progressing, though. This makes troubleshooting tricky and we found that sometimes customers would think the recovery had hung or halted, and they’d start trying to fix things (perhaps making things worse).
Two new event log messages now appear that describe where the internal repair process stands. You now know that DFSR has moved past the detection phase and into the consistency checking and rebuilding phase.
Event ID=2218
Severity=Informational
Message=
The DFS Replication service is in the second step of replication database
consistency checks after an unexpected shutdown. The database will be
rebuilt if it cannot be recovered. No user action is required.
Additional Information:
Volume: %2
GUID: %1
Event ID=2220
Severity=Informational
Message=
The DFS Replication service is in the third step of replication database
consistency checks after an unexpected shutdown. Database recovery is
currently in progress. No user action is required.
Additional Information:
Volume: %2
GUID: %1
Just be patient – it will complete. If in doubt, contact Microsoft Support – don’t try to get out and push.
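If you want to keep an eye on where a recovery stands without squinting at Event Viewer, the event IDs above map neatly onto phases. Here is a minimal Python sketch of that mapping. The event IDs and their meanings come straight from the messages quoted above; the phase labels and function names are my own invention for illustration, and how you collect the event IDs for a given volume is left to you.

```python
# Event IDs and meanings are taken from the event text quoted in this post.
# The phase labels are hypothetical names chosen for this sketch.
RECOVERY_EVENTS = {
    2212: "detected",   # unexpected shutdown detected, recovery started
    2218: "step 2",     # second step of database consistency checks
    2220: "step 3",     # third step, database recovery in progress
    2214: "recovered",  # recovery completed successfully
    2216: "failed",     # recovery failed, service will retry later
}


def recovery_state(event_ids):
    """Return the phase implied by the most recent recovery event.

    event_ids: event IDs for ONE volume, in chronological order.
    IDs that are not part of the recovery sequence are ignored.
    """
    state = "no recovery seen"
    for event_id in event_ids:
        if event_id in RECOVERY_EVENTS:
            state = RECOVERY_EVENTS[event_id]
    return state


def is_in_progress(event_ids):
    """True while recovery has started but not yet succeeded or failed."""
    return recovery_state(event_ids) in {"detected", "step 2", "step 3"}
```

If `is_in_progress` returns True, the advice above applies: be patient and leave the service alone.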
DFSR contains registry overrides to control behaviors like the number of files to replicate simultaneously, stage simultaneously, etc.
The default settings in Windows Server 2008 R2 were a bit too conservative. After release, we tested tweaked registry settings that resulted in roughly double the performance of default settings:
These more aggressive settings are now the default in Windows Server 2012 (if not overridden in the registry by you):
The allowed ranges are unchanged except for UpdateWorkerThreadCount (see below).
UpdateWorkerThreadCount controls the number of simultaneously inbound-replicating files to a DFSR server.
The maximum configurable value in Windows Server 2008 R2 is 64. If you set UpdateWorkerThreadCount to that maximum, it is possible to see intermittent DFSR service deadlocks. This manifests as a hung service, which for customers is nearly impossible to troubleshoot (you need a debugger and private symbols). Because the issue may not happen for days or weeks, there is no easy way to correlate cause and effect.
The maximum value is now 63. Voila!
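If you script your registry overrides, a quick sanity check on the value can save you from the deadlock scenario. The Python sketch below only encodes the two maximums stated above (64 on Windows Server 2008 R2, 63 on Windows Server 2012). The version labels and the function name are hypothetical, and it deliberately does not check a minimum, since I haven't given you one here.

```python
# Maximums as stated in the text. The dictionary keys are just labels
# used by this sketch, not anything DFSR itself understands.
MAX_UPDATE_WORKER_THREADS = {
    "2008R2": 64,
    "2012": 63,
}


def check_update_worker_thread_count(value, os_version):
    """Return a list of warnings for a proposed UpdateWorkerThreadCount."""
    warnings = []
    maximum = MAX_UPDATE_WORKER_THREADS[os_version]
    if value > maximum:
        warnings.append(
            f"{value} exceeds the maximum of {maximum} on {os_version}"
        )
    elif os_version == "2008R2" and value == 64:
        # Allowed by the older OS, but this is the value tied to hangs.
        warnings.append(
            "64 is allowed on 2008 R2 but can cause intermittent deadlocks"
        )
    return warnings
```

An empty list means the value is within the documented maximum for that release.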
Administrators use the DFS Management snap-in (Dfsmgmt.msc) for all graphical configuration of DFSR.
DFS Management was introduced in Windows Server 2003 Service Pack 1, long before read-only domain controllers (RODCs). It expected all domain controllers to be writable when creating a replication group or any other AD objects. When DFS Management tries to write to an RODC, it fails with an access denied error. This issue has existed since Windows Server 2008, but since RODC usage was lower and RODCs tend to exist mainly in branch offices, we never saw it until much later. Now that RODCs are everywhere, well…
DFS Management now requests only writable domain controllers when making DC queries.
DFS Management contains a topology checking routine to alert administrators when they have created an incomplete (aka "disconnected") DFS replication topology. A disconnected topology prevents eventual replication of data, leading to divergence, user confusion, and potential data loss.
A bridged topology of A <-> B <-> C is not flagged as disconnected when B is a read-only replicated folder. Because there is no outbound replication on a read-only member, any files created on A or C will not replicate further than B, so users on A and C will potentially see different versions of files, or no files at all.
The topology checker code now understands the bridged read-only replicated folder scenario and appropriately warns you when detected.
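To make the bridged read-only problem concrete, here is a small Python sketch of the idea. This is not the DFS Management code, just one way to model it: treat connections as directed sender-to-receiver pairs, treat a read-only member as something that receives but never forwards, and ask whether a change made on each writable member can reach everyone. The function names and data shapes are assumptions made for this example.

```python
from collections import deque


def reachable(origin, edges, read_only):
    """Members that a change made on `origin` can eventually reach.

    edges: set of (sender, receiver) connection pairs.
    read_only: members that accept inbound changes but never send any.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if node in read_only:
            continue  # a read-only member does not replicate outbound
        for sender, receiver in edges:
            if sender == node and receiver not in seen:
                seen.add(receiver)
                queue.append(receiver)
    return seen


def is_disconnected(members, edges, read_only=frozenset()):
    """True if some writable member's changes cannot reach all members."""
    members = set(members)
    for origin in members - set(read_only):
        if reachable(origin, edges, read_only) != members:
            return True
    return False


# The A <-> B <-> C bridge from the text.
BRIDGE = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")}
```

With all three members read-write, `is_disconnected("ABC", BRIDGE)` is False. Mark B as read-only with `is_disconnected("ABC", BRIDGE, {"B"})` and it becomes True, because changes from A stop at B and never reach C.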
DFSR uses a series of conflict resolution algorithms to detect file collisions and appropriately handle a winning and losing file. DFSR notes these in a per-collision 4412 informational event log entry.
The 4412 event did not contain quite enough information to easily troubleshoot unexpected collisions. For example:
Message=
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
Additional Information:
Original File Path: D:\Windows\SYSVOL\domain\Policies\{E75E8CC5-27B3-483F-AA79-FFF726236A0A}\Adm
New Name in Conflict Folder: Adm-{EE271589-88F7-4E8C-A057-013CF75B352B}-v294528
Replicated Folder Root: D:\Windows\SYSVOL\domain
File ID: {3351DB9B-9DAF-4273-90C1-FC347266BBD2}-v29180999
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: 29578A90-233A-48B7-B8C3-1BB0A05873EC
Replication Group Name: Domain System Volume
Replication Group ID: 70AC3FC4-60FC-4D15-964D-AE0F96098E60
Member ID: C6D34675-591E-4FC9-B88E-06AFC659CAED
The 4412 event message now contains an additional field of Partner Member ID that lists the winning server's identity.
Message=
The DFS Replication service detected that a file was changed on multiple servers. A conflict resolution algorithm was used to determine the winning file. The losing file was moved to the Conflict and Deleted folder.
Additional Information:
Original File Path: D:\Windows\SYSVOL\domain\Policies\{E75E8CC5-27B3-483F-AA79-FFF726236A0A}\Adm
New Name in Conflict Folder: Adm-{EE271589-88F7-4E8C-A057-013CF75B352B}-v294528
Replicated Folder Root: D:\Windows\SYSVOL\domain
File ID: {3351DB9B-9DAF-4273-90C1-FC347266BBD2}-v29180999
Replicated Folder Name: SYSVOL Share
Replicated Folder ID: 29578A90-233A-48B7-B8C3-1BB0A05873EC
Replication Group Name: Domain System Volume
Replication Group ID: 70AC3FC4-60FC-4D15-964D-AE0F96098E60
Member ID: C6D34675-591E-4FC9-B88E-06AFC659CAED
Partner Member ID: 2716E4E2-ED01-4285-9137-FACB4EE84C4A
You can use DFSRDIAG GUID2NAME to translate that partner GUID into a human-friendly name. For example:
Aha! FSF-02 won.
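If you are wading through a pile of 4412 events, you can pull the fields out with a few lines of script instead of reading each one. The Python sketch below assumes you already have the event message as text in the "Key: value" layout shown above; the function name is hypothetical, and the sample is a trimmed copy of the event quoted earlier.

```python
def parse_additional_information(message):
    """Split the 'Key: value' lines of an event message into a dict.

    Lines without a ': ' separator (the description sentences and the
    'Additional Information:' header) are skipped.
    """
    fields = {}
    for line in message.splitlines():
        key, sep, value = line.partition(": ")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Trimmed copy of the Windows Server 2012 event shown above.
SAMPLE = r"""Additional Information:
Original File Path: D:\Windows\SYSVOL\domain\Policies\{E75E8CC5-27B3-483F-AA79-FFF726236A0A}\Adm
Member ID: C6D34675-591E-4FC9-B88E-06AFC659CAED
Partner Member ID: 2716E4E2-ED01-4285-9137-FACB4EE84C4A"""

fields = parse_additional_information(SAMPLE)
# The key is missing on events logged by older operating systems,
# so use .get() rather than indexing directly.
winner_guid = fields.get("Partner Member ID")
```

The resulting `winner_guid` is the value you would then hand to DFSRDIAG GUID2NAME.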
There is no Windows Server 2012 Enterprise Edition; instead, you can purchase Windows Server 2012 Standard or Windows Server 2012 Datacenter, which is no longer an OEM-only SKU and exists to provide unlimited virtualization licenses.
Previously, DFSR cross-file Remote Differential Compression (RDC) support was tied to the server edition being Enterprise or Datacenter. DFSR cluster support was tied to Enterprise or Datacenter editions as well, through internal checks. Implicitly, DFSR cluster support required Enterprise or higher because the Failover Cluster features only existed on those editions.
All edition checks are removed and Windows Server 2012 has full DFSR capabilities even in Windows Server 2012 Standard.
Read-only (RO) replicated folders are always non-authoritative and do not allow local changes by use of an IO-blocking filter driver named dfsrro.sys. You are encouraged to pre-seed data before initial sync, meaning that data can already exist when DFSR is configured on two or more servers.
Windows Server 2008 R2 SP1 introduced a regression (that we recently fixed) where initial sync from read-write (RW) to RO does not overwrite file differences on the RO. This leads to data inconsistencies in the replication groups, as these differing files will never be right on RO servers unless they are later modified again on the RW. Which rather defeats the purpose of pre-seeding.
This is fixed. :)
DFSR uses TCP/IP and RPC to replicate files, and we finally fixed an old scenario where domain controllers differed in port usage from member servers.
In Windows Server 2008 and Windows Server 2008 R2, a domain controller replicating SYSVOL and/or custom replicated folders with DFSR used TCP port 5722. This was due to a bug I discussed back on AskDS.
This is also fixed. Now DCs will operate consistently like member servers, listening on a dynamic port in the 49152 – 65535 range unless you choose to hard code a port. If you have gotten used to 5722 and reaaaaally like using hard-coded ports, you can return to the old behavior with this command:
Dfsrdiag.exe staticrpc /port:5722
I doubt the person who takes over your job someday will thank you for it though…
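If you are auditing firewall rules after upgrading your DCs, it can help to label which kind of port a given DFSR listener is using. This Python sketch only encodes the two facts above (the old DC port of 5722 and the 49152 to 65535 dynamic range); the names are made up for illustration.

```python
# 49152-65535 inclusive, per the text. range() excludes its end value.
DYNAMIC_RPC_PORTS = range(49152, 65536)
# Port used by domain controllers on Windows Server 2008 and 2008 R2.
LEGACY_DC_PORT = 5722


def describe_dfsr_port(port):
    """Label a DFSR RPC port for a firewall review."""
    if port == LEGACY_DC_PORT:
        return "legacy DC port"
    if port in DYNAMIC_RPC_PORTS:
        return "dynamic range"
    return "other static port"
```

Anything that comes back as "other static port" was hard coded by somebody, so go find out who and why.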
When using DFSRMIG.EXE to migrate your SYSVOL from using FRS to DFSR, event log entries tell you how things are proceeding and if there are any problems you need to investigate before moving to the next phase.
In Windows Server 2008 R2, a timing issue could give you an expected warning 6804 with the rather scary message:
The DFS Replication service has detected that no connections are configured for replication group Domain System Volume. No data is being replicated for this replication group.
Once AD replication and the migration caught up, we should have logged a 6806 event saying everything was fine. But we forgot to. Errp.
Now we log that missing 6806 event letting you know that all is well and migration is working.
Replicated folders are the base of replication and the top level of a content set in DFSR database terms.
In Windows Server 2008 R2, removing a replicated folder stopped replication of all other RFs until the removal completed.
Now you can remove a replicated folder (thereby causing DFSR to update its DFSR database and stop replicating that content set) and not see other replicated folders pause replication. This keeps a hub server working efficiently when you decide to decommission a branch node. Faster also implicitly means increased reliability, as we are not spending large amounts of time with replication halted.
Windows Server 2008 R2 SP1 introduced a little-known hotfix to update the Dfsmgmt.msc wizards for new replication groups and new replication wizards. This provides further guidance around configuring the staging folder quota to prevent performance bottlenecks.
This capability is now native to Windows Server 2012.
We modified the DFSR allowed reparse point replication rules to support replicating the new IO_REPARSE_TAG_DEDUP tag. This type of reparse point tag is part of the new file deduplication system. This isn’t truly reparse point replication; the file is “rehydrated” and replicated as a normal file, then put back into its dedup’ed state on the downstream. Slick.
We modified File Classification Infrastructure (FCI) to prevent re-writing unchanged data to the alternate data stream on files during classification passes. This previously caused replication storms in Windows Server 2008 R2. Note: you should still only configure FCI on one server (usually the hub), not multiple servers.
Changes made to APIs used to access new NTFS data structures for auditing and conditional ACE security required updates to DFSR in Windows Server 2012. Because Windows Server 2008 R2 and older operating systems do not implement these APIs though (and therefore cannot use or display these ACLs) they did not require changes. Therefore, there is no back port required to configure replication between a Windows Server 2008 R2 and Windows Server 2012 replicated folder.
But!
Microsoft strongly discourages mixed Windows Server 2012 and legacy operating system DFSR.
There are significant NTFS security data differences between Windows Server 2012 and earlier operating systems, often to facilitate Dynamic Access Control features. Moreover, any claims-based access configuration will not work consistently in a design that allows users to connect to Windows Server 2008 R2 and Windows Server 2012 versions of a replicated file; one server might grant more or less access than the other.
For example, if someone modifies the security of a file on a Win2008 R2 server, DFSR packages that up with the file (this is called “marshalling”) and sends it along as-is. When a user attempts to access the file on the Win2012 server, the claims-based security elements no longer exist, and the user is denied access. More troubling, if you are letting users access the data from multiple DFSN-provided shares, they will be calling you with the infamous “it sometimes works and sometimes fails” symptom that drives IT pros batty.
However!
Central Access Policies modify individual files and folders to contain a special SID in the tail of the SACL structure when adding the CAP rules the first time. This means that first applying a CAP triggers replication of all folders and files replicated under the auspices of the CAP structure, just like it would with any other security change to the classic DACL.
Subsequent changes to the rules of an already-added CAP do not alter the files, however – this is the beauty of Central Access Policy. This means that once replication completes, you can change the security on files without triggering further replication. This is a seriously cool feature if you are a DFSR administrator, and it means once you deploy CAP, further security changes to an existing policy are completely non-intrusive to replication!
Ideally, configure CAP and File Classification Infrastructure on the file structure before configuring DFSR; that way you only pay the replication price once during DFSR initial sync. And to reiterate, use Windows Server 2012 on all nodes before deploying DAC. If you need help migrating existing DFSR environments, I recommend this series. It goes without saying that when using Windows Server 2012, CAP/DAC will only be effective if you apply the CAP to all nodes being replicated; otherwise you end up with differing security per node.
DFSR does not support ReFS volumes, as this new file system removes many critical data types used or supported by DFSR, such as streams, sparse files*, compressed files, 8.3 names, extended attributes, etc.
* Update Jan 9, 2013 - it turns out (despite what you will read on most of the Internet, including the Build 8 blog) that we added sparse file support to ReFS right at the tail end of development. So it's there.
DFSR does not allow you to replicate ReFS volumes. The service checks that you are using NTFS and, if you are not, fails gracefully.
Dfsmgmt.msc prevents an administrator from accidentally configuring a ReFS volume. Even if you pre-create the folder and use DFSRADMIN to bypass the check, DFSR prevents replication with event 6404 ("The local path is not the fully qualified path name of an existing, accessible local folder."). The debug log will show error 9225 ("volume was not found").
No ReFS allowed!
Just like Windows Server 2008 R2, DFSR in Windows Server 2012 does not support Cluster Shared Volumes (CSV).
Just like Windows Server 2008 R2, DFSR in Windows Server 2012 includes the database autorecovery change:
Just like Windows Server 2008 R2, DFSR in Windows Server 2012 includes the latest reliability changes for handling complex nested file and folder creation and deletion on partner nodes:
Windows Server 2012 changes the only disparate file conflict resolution algorithm previously used, from first creator wins to last creator wins, in order to be more consistent. For more information about this topic, see this article.
Windows Server 2012 now correctly allows very large (many many GB) files to complete computation of RDC signatures before the RPC server connection times out. In prior OSes the file would never replicate due to timing constraints. This mainly happened with files that were hundreds of GB.
But!
64GB files are still the supported maximum. So this is us being nice and helping you in a scenario that is technically still unsupported.
As a final note: I didn’t include all the fixes released as updates to Windows Server 2008 R2 that are also part of Windows Server 2012, just the more interesting ones. So as a rule of thumb, if you got a hotfix for Win2008 R2 before Win2012 RTM’ed, the latter has the update built-in.
And that’s it. Nice, eh?