Are you seeing a lot of event ID’s 4202 and 4204? If so, your staging area is not sized correctly. The downside to an improperly sized staging area is that replication performance will be negatively affected as the service has to spend time cleaning up the staging area instead of replicating files.
DFSR servers are more efficient with a full staging area for at least these two reasons:
- It is much more efficient to stage a file once and send it to all downstream partners than to stage a file, replicate the file, purge for each downstream partner.
- If at least one member is running Enterprise Edition the servers can take advantage of Cross File RDC
An improperly sized staging area can also cause a replication “loop” to occur. This condition happens when a file get replicated to the downstream server and is present in the staging however the file is purges by the staging area cleanup process before the file can be installed into the Replicated Folder. The purged file will be replicated again to the server that just purged it from its staging as it never got to install the file. This process will keep repeating until the file gets installed.
Don’t ignore staging area events.
See this blog post for the method to use to determine your minimum staging area size
See the section “Increase Staging Quota” here
See “Remote Differential Compression details” here for information on Cross File RDC
Pre-seeding is the act of copying the data that will be replicated to a new replica member before they are added to the Replicated Folder with the goal of reducing the amount of time it takes to complete initial replication. Most failed pre-seeding cases I have worked on failed in 3 ways.
- ACL mismatch between source and target.
- Changes were made to the files after they were copied to the new member
- No testing was done to verify the pre-seeding process they were using worked as expected.
In short the files must be copied in a certain way, you cannot change the files after they are staged and you must test your process.
Click here to read Mr. Pyle’s blog on how to properly pre-seed your DFSR servers
Besides the fact that having high backlogs for extended periods of time means your data is out of sync, it can lead to unwanted conflict resolution where a file with older content wins in a conflict resolution scenario. The most common way I have seen this condition hit is when rolling out new RF’s . Instead of doing a phased rollout some admins will add 20 new RF’s from 20 different branch offices at once overloading their hub server. Stagger your rollouts so that initial replication is finished in a reasonable amount of time.
Believe it or not some admins run DFSR without backing up the replicated content offline. DFSR was not designed as a backup solution. One of DFSR’s design goals is to be part of an enterprise backup strategy in that it gets your geographically distributed data to a centralized site for backup, restore and archival. Multiple members do offer protection from server failure; however, this does not protect you from accidental deletions. To be fully protected you must backup your data.
In an attempt to prevent unwanted updates from occurring on servers where they know the data will never be changed (or they don’t want changes made there) many customers have configured one-way replication by removing outbound connections from replica members. One-way replication is not supported on any version of DFSR until Windows Server 2008 R2. On Windows 2008 R2 one-way replication is supported provided you configure Read-Only replicated folders. Using Read –Only DFSR members allows you to accomplish the goal of one-way replication, which is preventing unwanted changes to replicated content. If you must use one-way replication with DFSR then you must use Windows 2008 R2 and mark the members as read-only where changes to content should not occur.
Another common problem that occurs is when an Admin discovers that one way replication is not supported they go about fixing it the wrong way. Simply enabling two-way replication again can have undesirable results. See the blog post below on how to recover from one-way replication.
Click here to learn about fixing one-way replication.
I have seen many deployments with just one hub server. When that hub server fails the entire deployment is at risk. If you’re using Windows Server 2003 or 2008 you should have at least 2 hub servers so that if one fails the other can handle the load while the other is repaired with little to no impact to the end users. Starting with Windows Server 2008 R2 you can deploy DFSR on a Windows Failover Cluster which gives you high availability with half of the storage requirement.
Other times admins will have too many branch office servers replicating with a single hub server. This can lead to delays in replication. Knowing how many branch office servers a single hub server can service is a matter of monitoring your backlogs. There is no magic formula as each environment is unique and there are many variables.
Read the “Topology Tuning” section here for ideas on deploying hub servers.
Click here to learn how to setup DFSR a Windows Server 2008 Fail-Over Cluster.
DFSR maintain one Jet database per volume. As a result placing all of your RFs on the same volume puts them all in the same Jet Database. If that Jet database has a problem that requires repair or recovery of the database all of the RF’s on that drive are affected. It is better to spread the RFs out using as many drives as possible to provide maximum uptime for the data.
I have seen more than one DFSR iSCSI deployment where the cheapest hardware available was used. Usually if you are using DFSR it is for some mission critical purpose such as data redundancy, backup consolidation, pushing out applications and OS upgrades on a schedule. Depending on low-end hardware with little to no vendor support is not a good plan. If the data is important to your business then the hardware that runs the OS and replication engine is important to your business.
DFSR is actively maintained by Microsoft and has updates released for it as needed. You should update DFSR when a new release is available during your normal patch cycle. Make sure your servers are up to date per the KB articles listed below.
You will notice that updates are listed for NTFS.SYS and other files besides DFSR.EXE/DFSRS.EXE. For replication always make sure DFSR and NTFS are at least at the highest version listed. The other patches listed mostly deal with UI issues that you will at minimum want installed on the systems you perform DFSR configuration tasks on.
Proactively patching the DFSR servers is advisable even if everything is running normally as it will prevent you from getting effected by a known issue.
DFSR will only work as well as the network you provide for it. Running drivers that are 5 years old is not smart. Yes, I have talked with more than a few customers with NIC drivers that old who’s DFSR replication issue was resolved by updating the NIC driver.
Despite the fact that the data DFSR is moving around is usually mission critical many Admins have no idea what DFSR is doing until they discover a problem. Savvy admins have created their own backlog scripts to monitor their servers but a lot of customers just “hoped for the best”. The DFSR Management Pack has been out for almost a year now (and other versions for much longer). Deploy it and use it so you can detect problems and respond before they become a nightmare. If you can’t use the DFSR Ops Management pack at a minimum write a script to track your backlogs on a daily basis so that you know that DFSR is replicating files or not.
Click here for information on the Ops Manager DFSR Management Pack.
Updated Jan 19, 2011:
Warren “wide net” Williams
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.