‘This member is waiting for initial replication for replicated folder FOO and is not currently participating in replication. This delay can occur because the member is waiting for the DFS Replication service to retrieve replication settings from Active Directory. After the member detects that it is part of replication group, the member will begin initial replication.’
So now we know the importance of AD to DFSR. When we get the ‘waiting to retrieve’ warning, what can we do to figure out what’s going on? Let’s go through this step by step.
There are a few ways we can determine which DC is being polled.
If we open the Event Viewer snap-in on the server, navigate to the DFSR event log, and then search for the most recent 1206 event, we will see which DC the server is talking to:
The downside to this is the event log may have already wrapped when you look here. If you restart the DFSR service the new event will be added, but there’s the risk that it might start talking to another DC which isn’t having a problem. So your problem is (kind of) fixed, but you still don’t have a root cause.
A more reliable way to determine the DC is by using the WMIC command. Open a CMD prompt as an administrator on the DFSR server and run:
WMIC /namespace:\\root\microsoftdfs path DfsrReplicationGroupConfig get LastChangeSource
This will return the DC you are talking to:
Finally, you can examine the DFSR debug logs. Go to %systemroot%\debug and open the DFSR<somenumber>.log file. Look for:
20080521 10:57:47.110 2972 CFAD 7256 Config::AdConfig::Connect Binding to dcAddr:\\10.10.0.10 dcDnsName:\\2003DC10.contoso.com
20080521 10:57:47.110 2972 CFAD 6586 Config::AdConfig::BindToAd Trying to connect. hostName:2003DC10.contoso.com
20080521 10:57:47.130 2972 CFAD 6605 Config::AdConfig::BindToAd Bound . hostName:2003DC10.contoso.com
20080521 10:57:47.130 2972 CFAD 6641 Config::AdConfig::BindToDc Try to bind. hostName:\\2003DC10.contoso.com domainName:<null>
20080521 10:57:47.150 2972 CFAD 6651 Config::AdConfig::BindToDc Bound . hostName:\\2003DC10.contoso.com domainName:<null>
There’s our domain controller. If it’s not in the latest log, you will need to unzip the older DFSR debug logs using a tool that understands the GZ format (such as 7-Zip, WinZip, WinRAR, etc.), and then you can find the entry. As you can tell, using the debug logs is probably the least friendly way to find the DC, but you’ll see later that these logs are critical for troubleshooting.
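If you have many debug logs to search, a short script can do the looking for you. The following is only a sketch, not a supported tool: it assumes the binding line always has the ‘Connect Binding to dcAddr:... dcDnsName:...’ shape shown above, and that the logs can be read as UTF-8 text (undecodable bytes are replaced). The function names and the regular expression are made up for this illustration.

```python
import gzip
import re
from pathlib import Path

# Matches the "Connect Binding to ..." line from the sample above and captures
# the timestamp, the DC address, and the DC DNS name.
# Assumption: the line format is exactly as in the sample log excerpt.
BIND_RE = re.compile(
    r"^\s*(?P<stamp>\d{8} \d{2}:\d{2}:\d{2}\.\d{3}).*"
    r"Config::AdConfig::Connect Binding to "
    r"dcAddr:\\\\(?P<addr>\S+)\s+dcDnsName:\\\\(?P<name>\S+)"
)


def find_dc_bindings(lines):
    """Return (timestamp, address, dns_name) for every binding line found."""
    hits = []
    for line in lines:
        match = BIND_RE.search(line)
        if match:
            hits.append(
                (match.group("stamp"), match.group("addr"), match.group("name"))
            )
    return hits


def read_log(path):
    """Yield lines from a plain or gzip-compressed debug log.

    Assumption: the file is readable as UTF-8; bad bytes are replaced.
    """
    opener = gzip.open if str(path).lower().endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


# Two lines copied from the excerpt above, used as a self-check.
SAMPLE = [
    r"20080521 10:57:47.110 2972 CFAD 7256 Config::AdConfig::Connect Binding to dcAddr:\\10.10.0.10 dcDnsName:\\2003DC10.contoso.com",
    r"20080521 10:57:47.110 2972 CFAD 6586 Config::AdConfig::BindToAd Trying to connect. hostName:2003DC10.contoso.com",
]

if __name__ == "__main__":
    # Hypothetical usage: pass a folder of copied debug logs to scan.
    import sys

    for log_path in sorted(Path(sys.argv[1]).glob("*.log*")):
        for stamp, addr, name in find_dc_bindings(read_log(log_path)):
            print(log_path.name, stamp, addr, name)
```

The last binding line in the newest log is the DC the service most recently connected to; the same caveat applies as with the event log, in that a service restart may pick a different DC.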
Active Directory replication is based on the theory of ‘multi-master loose consistency with convergence’. Intra-site replication on Win2003 DCs takes only about fifteen seconds, but the default inter-site replication interval is 180 minutes (three hours).
So if I have a chain of sites connected like below and my original DFSR configuration was set on a DC in Site B, those three DCs would have the change available for their DFSR customers within 30 seconds or so. But the DC in Site F might not see it for nine hours.
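The nine-hour figure is just the default interval multiplied by the number of inter-site links the change has to cross. As a rough sketch of that arithmetic, assuming three inter-site hops between Site B and Site F (an assumption inferred from the nine-hour figure; your own topology decides the real number) and ignoring schedules and the time replication itself takes:

```python
# Default inter-site replication interval, in minutes (from the text above).
DEFAULT_INTERSITE_INTERVAL_MIN = 180


def worst_case_delay_minutes(intersite_hops, interval_min=DEFAULT_INTERSITE_INTERVAL_MIN):
    """Upper-bound estimate: each inter-site hop may wait one full interval.

    Hypothetical helper for illustration only; it ignores site link
    schedules, change notification, and actual transfer time.
    """
    if intersite_hops < 0:
        raise ValueError("intersite_hops must not be negative")
    return intersite_hops * interval_min


# Assumed: three inter-site hops between Site B and Site F.
delay = worst_case_delay_minutes(3)
print(f"{delay} minutes = {delay / 60:g} hours")
```

Plugging in your own hop count and site link interval gives a quick feel for how long you may have to wait before a distant DFSR member even sees its configuration.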
You have a few options here:
A. Wait patiently and enjoy some light reading.
B. Force replication using REPADMIN (with /replicate /force) or REPLMON (with synchronize directory partition).
C. Change your replication schedule and topology.
So this isn’t always an ‘issue’ per se; it may just be a fact of how your environment is configured.
It’s possible that your DC is not actually replicating at all. Historically, the most likely reason for this is network misconfiguration: DNS resolution failures, firewall blocks, etc. To step through the most common areas, check out:
A. Fixing Replication DNS Lookup Problems (Event IDs 1925, 2087, 2088)
B. Fixing Replication Connectivity Problems (Event ID 1925)
C. Troubleshooting RPC Endpoint Mapper errors using the Windows Server 2003 Support Tools
Whatever you suspect, it’s always advisable to start with a REPADMIN.EXE /SHOWREPS on the DC just to get a line on what replication health looks like.
AD replication might be in a state where, if it just knew where its partners were, it could replicate fine. It’s common to find a variety of site topology missteps in environments, so make sure you run down:
Fixing Replication Topology Problems
Your event logs and DSSITE.MSC will tell the tale…
Now we have moved past trivial configuration issues onto much more insidious ground. Lingering objects are typically objects that exist in the read-only GC partition of a domain controller but no longer exist in the read-write source domain partition. This can happen when an administrator brings a DC back online after it has been shut off for months; source objects that were deleted and tombstoned are no longer available and the old DC can’t be told about the deletions anymore, so it still has ‘reanimated’ versions.
If ‘strict replication consistency’ is in place, that server is not allowed to replicate anymore until it’s fixed (this is a good thing, and if you don’t think so, I will be happy to tell you stories about lingering object cleanup on a forest with 20 domains and 3000 DCs, all infected with LOs). So you will want to follow:
A. Outdated Active Directory objects generate event ID 1988 in Windows Server 2003
B. Lingering objects may remain after you bring an out-of-date global catalog server back online o...
Honestly, if you get this one, you should call us. Event ID 2042 (“It has been too long since this machine replicated”) is tricky to fix without having a frank discussion about the ramifications for your environment. While we do have some techniques, the cure is usually worse than the disease. In most circumstances, the best answer is to forcibly demote the DC because you (of course!) have several other DCs that can handle the load in the meantime.