ADs not Syncing

Brief description of the issue:

Primary DC1 (unhealthy) is not syncing to Secondary DC2 (healthy)

 

Long description:

We have two virtual DCs in our domain: the main one, DC1, runs Windows Server 2008 R2, and the secondary, DC2, runs Windows Server 2012.

 

I suspect that our DC1 is not syncing to our DC2 due to the reasons below:

1) When DC2 is down or offline, no one can authenticate.

2) Netlogon is paused and Windows Time service stopped.

3) The SYSVOL folders on DC1 and DC2 are not in sync in either direction (e.g. a file or folder I create on one side never appears on the other).

 

Additional info:

1) DC1 has 2 NICs and 2 IPs on 2 different VLANs because we use it as a file server as well. I'm not sure whether this affects how AD works, but I did set the NIC priority manually.

2) DC1 is a file server, and all home profiles and GPOs point back to it.

 

My questions are:

1) How to confirm which DC is healthy and up-to-date? I assume my suspicions are correct but I'm not an expert.

2) How to search for clues to determine root cause and the next course of action? 

 

Possible solutions I can think of and as per research:

1) Demote DC1 and do a metadata cleanup, then add a new DC3 running Windows Server 2016 and make it primary. The problem is that all home profiles and GPOs link back to DC1. (I did try this before, but DC2 wouldn't sync to the new DC3.)

2) Create a new forest on a new DC and migrate the existing domain to it. I'm not sure whether I can reuse the same domain name, or how the existing computers would work without being rejoined to the new domain.

 

Hoping someone can point me in the right direction.

13 Replies

Multi-homing a domain controller always causes no end of grief for Active Directory domain DNS. So that's the first issue to clear up; then run ipconfig /flushdns and ipconfig /registerdns, and restart the netlogon service.

 

Depending on how long this situation has lasted, the 2012 DC may also have tombstoned, in which case demoting it, rebooting, and promoting it again may be the simpler solution once the above is corrected.

 

As for the stopped services, I'd check the system event log for details.

@Dave Patrick, I appreciate your help on this and the lightning-fast response.

I will test your advice in an isolated environment and let you know the outcome.

 

More info:

I also see the following in event viewer.

Event 2103:
The Active Directory Domain Services database has been restored using an unsupported restoration procedure.
Active Directory Domain Services will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.
User Action
See previous event logs for details.

 

I suspect this is due to the VM being rolled back to its previous state via snapshot.

 

In my test environment, I tried deleting the registry value "DSA not writable" from HKLM\System\CurrentControlSet\Services\NTDS\Parameters. It did resolve the NetLogon and Windows Time service issues, but I'm not sure it's the best thing to do.

 

My ultimate goal is to eventually remove DC1 from the domain and add in a new primary DC running 2016 or 2019.

@Dave Patrick 

After trying your recommendation in the test environment, it seems DC1 can no longer reach DC2, and I'm getting the error below:

Log Name: Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 20/1/2021 5:38:01 PM
Event ID: 1308
Task Category: Knowledge Consistency Checker
Level: Warning
Keywords: Classic
User: ANONYMOUS LOGON
Computer: DC1.domain.name
Description:
The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate with the following directory service has consistently failed.

Attempts:
4
Directory service:
CN=NTDS Settings,CN=DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=domain,DC=name,DC=net
Period of time (minutes):
88989

The Connection object for this directory service will be ignored, and a new temporary connection will be established to ensure that replication continues. Once replication with this directory service resumes, the temporary connection will be removed.

Additional Data
Error value:
5 Access is denied.
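
For a rough sense of whether the tombstone concern raised earlier in the thread applies, the "Period of time (minutes)" value in the event above can be converted to days and compared with the forest's tombstone lifetime. The sketch below is only arithmetic; the 60-day and 180-day figures are the commonly cited defaults (older forests versus forests created on Windows Server 2003 SP1 or later) and are assumptions here, since the real value is whatever the forest's tombstoneLifetime attribute says. The function names are hypothetical.

```python
MINUTES_PER_DAY = 24 * 60


def days_without_replication(minutes: int) -> float:
    """Convert the event's 'Period of time (minutes)' value to days."""
    return minutes / MINUTES_PER_DAY


def exceeds_tombstone_lifetime(minutes: int, tombstone_lifetime_days: int) -> bool:
    """True if the replication gap is longer than the given tombstone lifetime."""
    return days_without_replication(minutes) > tombstone_lifetime_days


# Value reported in Event 1308 above.
period_minutes = 88989
gap_days = days_without_replication(period_minutes)
print(f"Replication gap: {gap_days:.1f} days")

# Assumed defaults; check the forest's actual tombstoneLifetime before relying on these.
for assumed_lifetime in (60, 180):
    verdict = exceeds_tombstone_lifetime(period_minutes, assumed_lifetime)
    print(f"Exceeds a {assumed_lifetime}-day tombstone lifetime: {verdict}")
```

If the gap turns out to be longer than the actual tombstone lifetime, that supports the demote-and-repromote route over trying to force replication.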

 

Here's what I did to remove multi-homing:

1) Removed other and unused NICs in the system via device manager

2) Ran ipconfig /flushdns, ipconfig /registerdns  and restarted NetLogon services

 

 

Question:

1) Would it be easier to set up a new DC, promote it to be the main one, and then remove DC1?

2) Or do I have to fix the issue with DC1 first before I can do anything else?

Please run the following (run ipconfig on each DC, saving the output to the matching file name):

Dcdiag /v /c /d /e /s:%computername% >c:\dcdiag.log
repadmin /showrepl >C:\repl.txt
ipconfig /all > C:\dc1.txt
ipconfig /all > C:\dc2.txt


Then put the `unzipped` text files up on OneDrive and share a link.
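
If it helps to keep the files from the two DCs apart, the same commands can be wrapped in a small script that names each output file after the machine it ran on. This is only a sketch: it assumes Python is available on the DC, the function names and file-naming scheme are made up for illustration, and the commands themselves are exactly the ones listed above. It has to be run locally on each DC, since ipconfig and repadmin /showrepl report on the machine they run on.

```python
import os
import platform
import subprocess


def diagnostic_commands(host: str) -> dict:
    """Map an output file name to the command that produces it (hypothetical naming)."""
    return {
        f"dcdiag_{host}.log": ["dcdiag", "/v", "/c", "/d", "/e", f"/s:{host}"],
        f"repl_{host}.txt": ["repadmin", "/showrepl"],
        f"ipconfig_{host}.txt": ["ipconfig", "/all"],
    }


def collect(host: str, out_dir: str = ".") -> None:
    """Run each command and save its standard output to the matching file."""
    for filename, command in diagnostic_commands(host).items():
        result = subprocess.run(command, capture_output=True, text=True)
        with open(os.path.join(out_dir, filename), "w") as handle:
            handle.write(result.stdout)


# Only attempt to run the tools on Windows, where they exist.
if __name__ == "__main__" and platform.system() == "Windows":
    collect(platform.node())
```

The resulting text files can then be uploaded unedited.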

Glad to hear the problem is sorted.

 

 

 

@Dave Patrick 

Actually, we still haven't solved anything yet.

What I meant in my PM was that I got the test environment working properly (meaning I no longer get that access denied error) after I recreated it. But we have yet to fix the ADs not syncing. Kindly check the logs you requested and advise what I need to check on my end.

Please don't edit the files.

 

 

@Dave Patrick 

Files updated. 

Please check again

PBA-DC-01 is multi-homed. Multi-homing will always cause no end of grief for Active Directory DNS. After correcting this, run ipconfig /flushdns and ipconfig /registerdns, then restart the netlogon service. If problems persist, put up a new set of files to look at.
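
One quick way to confirm whether a DC still looks multi-homed is to count the IPv4 addresses in its saved ipconfig /all output. The sketch below assumes English-language ipconfig output with "IPv4 Address" labels; the sample text and addresses are invented for illustration and are not taken from the uploaded files, and the function names are hypothetical.

```python
import re

# Matches lines such as "   IPv4 Address. . . . . . . . . . . : 10.0.0.10(Preferred)"
IPV4_LINE = re.compile(r"IPv4 Address[ .]*:\s*([0-9.]+)")


def ipv4_addresses(ipconfig_text: str) -> list:
    """Return every IPv4 address listed in ipconfig /all output."""
    return IPV4_LINE.findall(ipconfig_text)


def is_multihomed(ipconfig_text: str) -> bool:
    """Treat more than one IPv4 address as multi-homed."""
    return len(ipv4_addresses(ipconfig_text)) > 1


# Hypothetical sample, shaped like ipconfig /all output from a two-NIC server.
SAMPLE = """
Ethernet adapter Local Area Connection:
   IPv4 Address. . . . . . . . . . . : 10.0.0.10(Preferred)
Ethernet adapter Local Area Connection 2:
   IPv4 Address. . . . . . . . . . . : 10.0.1.10(Preferred)
"""

print(ipv4_addresses(SAMPLE))
print("Multi-homed:", is_multihomed(SAMPLE))
```

In practice the text would come from the saved file, for example by reading C:\dc1.txt and passing its contents to is_multihomed.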

 

 

 

@Dave Patrick 

Done. Files have been updated.

Please note that the files were taken from the test environment which is not connected to the internet.

PBA-DC-02 is missing a default gateway

NETLOGON Service is paused on [PBA-DC-01] this one is problematic. I'd check the event log for reasons why.

 IsmServ Service is stopped on [PBA-DC-02] (Intersite Messaging) this one is problematic. I'd check the event log for reasons why.

Windows Time Service [PBA-DC-02] should be auto start

 

 

@Dave Patrick 

Please see my responses below; each one follows the item it answers.

 

PBA-DC-02 is missing a default gateway

I don't think this matters in the test environment, but I have added the gateway anyway.

 

NETLOGON Service is paused on [PBA-DC-01] this one is problematic. I'd check the event log for reasons why.  

I see the following in Event Viewer:

Log Name: Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 25/1/2021 10:53:45 AM
Event ID: 2103
Task Category: Service Control
Level: Error
Keywords: Classic
User: ANONYMOUS LOGON
Computer: PBA-DC-01.windows.pbagroup.net
Description:
The Active Directory Domain Services database has been restored using an unsupported restoration procedure.

Active Directory Domain Services will be unable to log on users while this condition persists. As a result, the Net Logon service has paused.

User Action
See previous event logs for details.

I think the above is due to the VM being rolled back using a snapshot.

 

I also see warning events 2886 and 1539, but I don't think these are critical.

 

 IsmServ Service is stopped on [PBA-DC-02] (Intersite Messaging) this one is problematic. I'd check the event log for reasons why.

Service seems to be running when I checked.

 

Windows Time Service [PBA-DC-02] should be auto start

I have set the service startup type to automatic. I don't recall setting it to manual previously, though.

 

Please advise on how to proceed in fixing the other errors/issues.

Thanks in advance!

I'd probably abandon that one and seize the roles on another, healthy DC:

Transfer or seize FSMO roles - Windows Server | Microsoft Docs

 

Then perform cleanup to remove the failed one.

Clean up AD DS server metadata | Microsoft Docs
Step-By-Step: Manually Removing A Domain Controller Server (microsoft.com)

 

Then stand up a new one as a replacement.

 

(please don't forget to mark helpful replies)