AD Connect Start-ConnectivityValidation - GetDomain failing error while adding directories


We have some 40 countries, i.e. 40 local forests, in our environment, separated by firewalls. We are trying to onboard all our local forests onto AD Connect and decommission MIM. The issue is that the Start-ConnectivityValidation command of the ADConnectivityTool PS module fails with the error “GetDomain failed. The specified domain does not exist or cannot be contacted”. The AD Connect servers are in a different forest from the country forests. Here are the configuration details:

ADC Architecture: Multi-forest, single tenant. 

Country Forests Network Architecture: All forests have a DMZ, that contains an additional DC with which AD Connect has connectivity. Local forest network doesn't have direct network connectivity with ADC forest. 

Firewall Settings: ADC Staging & Prod servers IP ranges are allowed in country forest's firewall. ADC forest firewall allows all traffic to & from all forest networks.  

Ports: 53, 88, 389 & 3268 are open for both TCP & UDP protocols.

DNS Request Routing: AD Connect uses Conditional Forwarders, MIM uses Hosts file or Fwd Zones.

SRV Records: Configured for both LDAP & Kerberos on the country forest's local DNS for the DMZ ADC.
Test-NetConnectivity: Successful for above mentioned ports.

NSLookup/Ping: Successfully resolves the DCs, DMZ ADC also listed in the output.

Confirm-DnsConnectivity: Successful

Connectivity Validation: 

Start-ConnectivityValidation -Forest "contoso.com" -AutoCreateConnectorAccount $False -Username "contoso.com\username" fails with the above mentioned error. 

 

We even tried the NetBIOS name format, but still no success. MS Premier Support's Directory Services, Network (DNS) and Identity engineers have all tried but couldn't resolve this issue.

 

Any help will be highly appreciated.

 

 

17 Replies

@Ajay_Joshi 

 

Hi, Ajay.

 

A few random thoughts come to mind.

 

Firstly, if you want an authoritative answer on this, you're going to need to run a network capture on the AAD Connect host you're seeing the error on. But I'll put my first impressions here as well.

 

Domain controller selection

Next, AAD Connect and MIM have different starting points for the management agent configuration:

 

  • AAD Connect limits you to completing everything it thinks it needs in the setup wizard;
  • The Active Directory connector in MIM gives you complete control as-you-go.

 

Where this may be important is that the AAD Connect wizard does not let you specify a particular domain controller to connect to, whereas the MIM management agent does. This wouldn't matter if AAD Connect could talk to any remote domain controller; however, given you're running a DMZ configuration, it very much matters. More on how this relates to your DNS request routing below.

 

As a side note, you do get full control over the management agent within AAD Connect (noting the "preferred domain controllers" option in the picture below) but only after you've finished the setup, leading to a chicken-and-egg scenario:

 

LainRobertson_0-1679181980402.png

Figure 1: Preferred domain controller option - same for MIM and AAD Connect.

 

There are a number of ways you can detect a domain controller programmatically, including letting the underlying Windows system do it for you. I haven't had the need to crack open the MIM management agent and AAD Connect setup wizard to compare them, so I'll simply point out that you could easily get different results from each, even if they were running on the exact same host.

 

I'd recommend running the more granular commandlets in the ADConnectivityTool module (almost all of them are useful). I can't help but think that, while your DNS queries are indeed being directed to the remote DMZ domain controllers, the AAD Connect wizard is receiving an unordered list of domain controllers in the DNS response, meaning it then tries to establish subsequent connections to any random domain controller in that remote forest.

 

 

While you've run Confirm-DnsConnectivity, that's not nearly enough. A better indicator using this PowerShell module would be from Confirm-ForestExists and Confirm-TargetsAreReachable.

 

LainRobertson_2-1679185463594.png

Figure 2: Using the AAD Connect diagnostic commandlets to see which domain controllers are returned and if they're reachable.

 

In contrast, perhaps a preferred domain controller has been entered into the MIM management agent (as above) though that isn't the only option, which brings us to your point on DNS.

 

DNS request routing != domain controller selection

As noted above, having DNS queries go to a specific domain controller doesn't guarantee that the same domain controller will be returned first in the DNS response.

 

Other factors such as your Active Directory topology (i.e. the defined sites and subnets) and whatever entries reside in your hosts file (in the case of your MIM host) will play a significant role in selection.

 

My educated guess is that the wizard does rely on the underlying Windows system to perform the check though, as the relevant .NET class throws that same error word-for-word:

 

LainRobertson_1-1679184614142.png

Figure 3: Reproducing the error outside of the AAD Connect wizard.

 

Chances are that because AAD Connect is using a conditional forwarder and likely has no subnet assignment to the DMZ site (if one even exists) within each remote forest, it's sending a DNS query to the DMZ domain controller and coming back with a different domain controller to talk to. Again, this would show up very quickly in a network capture but you may also have some luck with the diagnostic commandlets referenced above.

 

Of course, it pays to remember that the error message covers two separate causes:

 

  • Can't find the forest, which implies a name resolution issue;
  • Can't contact the forest, which implies a firewall issue, which itself may be because the "wrong" domain controller was returned in the DNS response.

 

If it's the first cause, the above .NET-based check will fail very quickly. If it's the second cause, it will take longer (~ 9 seconds in my testing). That's a very unscientific test but a good indicator and quick to run.

 

Here's the command in text form for ease of reference:

 

[System.DirectoryServices.ActiveDirectory.Forest]::GetForest([System.DirectoryServices.ActiveDirectory.DirectoryContext]::new([System.DirectoryServices.ActiveDirectory.DirectoryContextType]::Forest, "forestFqdn.somedomain.com"))
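To make the fast-fail versus slow-fail distinction easier to see, the same call can be wrapped in a stopwatch. This is only a rough sketch: the function name Measure-ForestLookup and its -Lookup parameter are made up for illustration, and the forest name is the same placeholder as above.

```powershell
# Hypothetical helper: times a directory lookup and reports whether it
# succeeded, how long it took, and the error text if it failed.
function Measure-ForestLookup {
    param(
        [string] $ForestFqdn = 'forestFqdn.somedomain.com',  # placeholder name
        # The lookup to time; defaults to the GetForest() call shown above.
        [scriptblock] $Lookup = {
            param($Name)
            $type    = [System.DirectoryServices.ActiveDirectory.DirectoryContextType]::Forest
            $context = [System.DirectoryServices.ActiveDirectory.DirectoryContext]::new($type, $Name)
            [System.DirectoryServices.ActiveDirectory.Forest]::GetForest($context)
        }
    )

    $timer     = [System.Diagnostics.Stopwatch]::StartNew()
    $errorText = $null
    try {
        $null = & $Lookup $ForestFqdn
    }
    catch {
        $errorText = $_.Exception.Message
    }
    $timer.Stop()

    [pscustomobject]@{
        Forest    = $ForestFqdn
        Succeeded = ($null -eq $errorText)
        Seconds   = [math]::Round($timer.Elapsed.TotalSeconds, 1)
        Error     = $errorText
    }
}
```

Running Measure-ForestLookup -ForestFqdn 'forestFqdn.somedomain.com' on the AAD Connect host returns one object. A failure within a second or so points towards name resolution, while a failure after several seconds points towards an unreachable domain controller. The thresholds are as unscientific as my original test.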

 

 

As I mentioned in the first section, I anticipate you're running into the second issue of not being able to reach the returned domain controller.

 

ICMP

This section is speculation on my part, as - again - I haven't done a network capture of my own to confirm any of this.

 

You've already listed the ports from the AAD Connect troubleshooting guide:

 

 

However, from the diagnostic commandlets documentation, we can see there's an ICMP test in the Confirm-TargetsAreReachable commandlet, implying that the wizard may also check for the ICMP protocol where the MIM Active Directory agent does not.

 

It might be worth allowing ICMP through from the AAD Connect hosts to the remote DMZs.

 

Other protocols

Other protocols I've seen used by other facets of the AAD Connect tool are:

 

  • RPC: TCP 135
  • WinRM: TCP 5985 (or TCP 5986 for WinRM/TLS)

 

I'd also strongly recommend enabling the following for better-secured AAD Connect-to-domain controller communications, although there is a bit of post configuration work to be done to enable this:

 

  • LDAP/TLS: TCP 636, TCP 3269

 


 

These won't assist with resolving your error, but I figured since I've written this much, I'd include it anyway.

 

Summary

The AAD Connect wizard and the MIM (and AAD Connect) Active Directory management agent are different beasts.

 

What has worked for MIM in the past may not be enough for the AAD Connect wizard, and a network capture will provide the most authoritative answer to your question.

 

It's common for hosts files to provide only a very small subset of the responses that a conditional forwarder will provide. Be sure to compare the differences and not simply assume they provide the same results to the DNS client.

 

Cheers,

Lain

@LainRobertson Thanks so much for your time on this post and the detailed analysis. I tested further as you suggested and got very strange results. It seems ICMP is blocked for the very DMZ DC which is supposed to have more connectivity with the ADC forest than the local network of the country forest. We have actually onboarded all countries on ADC successfully except the last 3.

Ajay_Joshi_0-1679245261192.png

Sorry, I need to hide the IPs as they belong to our production environment. Is ICMP mandatory for ADC?

The command below took well over 9 seconds (around 30, actually) to fail with the exact same error. However, my observation until now was that GetForest succeeds and GetDomain fails.

[System.DirectoryServices.ActiveDirectory.Forest]::GetForest([System.DirectoryServices.ActiveDirectory.DirectoryContext]::new([System.DirectoryServices.ActiveDirectory.DirectoryContextType]::Forest, "forestFqdn.somedomain.com"))

 

Also worth mentioning is that the local networks of the countries are not directly pingable from the ADC forest, so we rely on the DMZ DC for that. We ensure that all the DCs' IPs are listed in the NSLookup output, and that they in turn resolve to the machine FQDNs, by setting the server to the DMZ DC's IP.

 

We also had some countries where there are syncing users in both the parent and child domains, so we ensure that at least one child DC and one root DC are included in the DMZ. The local network, again, will not have direct connectivity.

 

Last but not least, all the ports mentioned are open except 5986. I need to check whether WinRM/TLS is required and will get it opened if so. Here is the exact error for your reference:

PS C:\Program Files\Microsoft Azure Active Directory Connect\Tools> Start-ConnectivityValidation -Forest xxxxx -AutoCreateConnectorAccount:$false -UserName "xxxxx\yyyyyy"
Please provide the credentials of the account you entered on AADConnect Wizard.
DOMAIN\Username: xxxxx\yyyyyy (previously obtained)


Diagnosis is starting...

Attempting to obtain a domainFQDN
Attempting to retrieve DomainFQDN object...
Exception calling "GetDomain" with "1" argument(s): "The specified domain does not exist or cannot be contacted."


There has been a problem while validating connectivity between AADConnect and the Active Directory.
An attempt to diagnose the problem will be performed by running a set of network connectivity tests

Press ENTER to continue:


Starting NetworkConnectivityDiagnosisTools


Verifying that 'xxxxxxx' exists

xxxxxxxxx exists

Verifying if the provided credentials are correct

Attempting to obtain a domainFQDN
Attempting to retrieve DomainFQDN object...
There was an error during the validation of the credentials you have entered. Details: Exception calling "GetDomain" with "1" argument(s): "The specified domain does not exist or cannot be contacted."

@Ajay_Joshi 

 

The only way I could provide authoritative answers is to install a new instance of AAD Connect and track what it's doing with a network capture utility - which is a bit more than I can easily accommodate in my small business environment.

 

It's important to keep in mind that the requirements of different components can be different, meaning something like the setup wizard could easily be different to the synchronisation service, hence why I'd have to trace the whole process from end-to-end.

 

So, that means I'm still making assumptions based on the error plus what's documented.

 

So, jumping to your first screenshot, what I would infer from that is that while Confirm-TargetsAreReachable tests using ICMP, the warning message in yellow indicates that it's simply a convenience test and not required for the operation of the wizard or synchronisation service. Of course, that assumption could also be wrong but since they've gone to the trouble of putting in such a message, that's my starting assumption.

 

As an aside, the synchronisation service within AAD Connect is just MIM with some customised agents. It doesn't need ICMP - I know that for sure - but I can't be so sure for the setup wizard, which of course is where you're currently stuck.

 

But for now I'd set the ICMP failure aside.

 

The more interesting part of that screenshot is that five domain controllers were detected for that forest.

 

Would I be correct in assuming that all five do not reside within that forest's DMZ? I'm guessing one or maybe two reside in the DMZ with the remaining three or four being on their internal network.

 

The fact that AAD Connect is seeing five domain controllers means that it is likely trying to contact one of the domain controllers on the internal network rather than the one in the DMZ.

 

If any of the IP addresses you've blacked out relate to internally-located domain controllers then I'm certain this will be the cause of the AAD Connect wizard's error.

 

There are multiple ways in which you can deal with this scenario - more than I have the time to comprehensively cover here at the moment.

 

My initial preference, if I were faced with this situation, would be to get the remote forest administrators to create an Active Directory site for their DMZ, then create subnets matching the DMZ and your AAD Connect subnets and attach them to the DMZ site. That way, for any traffic coming from either subnet range, domain controller referrals (which are separate from the DNS resolution architecture) will prefer the domain controllers in the DMZ.

 

With respect to being able to obtain the forest object but not the domain, that could be for a couple of reasons. The ones that come to mind immediately are:

 

  • Differing results for the forest DNS query versus the domain DNS query, since there are separate zones for ForestDnsZones and DomainDnsZones which will reflect the architecture and FSMO role allocation (DNS round-robin can play a role in this scenario too, if it's enabled);
  • No sites matching the client IP address (which is that of your AAD Connect host) leading to a random domain controller being provided as the referral - which you can track via the netlogon.log file under C:\Windows\debug\netlogon.log on the remote forest's domain controller in their DMZ.

 

Ultimately though, however you choose to do it, you need to ensure that the centralised hosts running AAD Connect reliably locate the DMZ domain controllers and not those located internally in the remote forests, which requires more than just DNS sending queries to the DMZ domain controllers.

 

I may come back and re-read this and add more again tonight, but this is probably enough for now until I know whether those five domain controllers are positioned in the DMZ, or spread between the DMZ and internal networks.

 

Cheers,

Lain

@Ajay_Joshi 

 

Here's another AAD Connect-specific article that outlines in greater detail the port requirements:

 

 

It's considerably more useful than the previous troubleshooting article and provides a per component breakdown (including the wizard.)

 

As I said above, I feel your issue is with domain controller selection but this is still a very useful reference.

 

Cheers,

Lain

As part of troubleshooting, please validate the following as well.

@ajay

Verify that the domain exists and can be contacted. You can do this by checking the network connectivity between AADConnect and the Active Directory domain controller.

Check if the DNS settings are configured correctly on the server running AADConnect. Ensure that the DNS server specified on the server is the same as the one used by the Active Directory domain.

If the issue persists, try restarting the Azure AD Connect Health Sync Monitor service.

@LainRobertson 

There are 4 DCs in the forest and 1 of them is in the DMZ; the conditional forwarder in the ADC forest points to it.

 

Although I'm yet to check the Netlogon logs in the target forest, I noticed something interesting with the network connectivity tests. While the Confirm-TargetsAreReachable & Confirm-DNSConnectivity results are all green & successful, Confirm-NetworkConnectivity provides something to think about.

 

If I run this and DC1 doesn't have port 53 open, it gives an error and doesn't even check the next DC.
Confirm-NetworkConnectivity -DCs DC1.forest.com,DC2.forest.com,DC3.forest.com
TCP connection to DC1.forest.com on port 53 failed.
But if we list DC3 first, which has connectivity, then it succeeds for DC3, moves on to the next DC, and stops checking further as soon as that one fails.
Confirm-NetworkConnectivity -DCs DC3.forest.com,DC2.forest.com,DC1.forest.com

TCP connection to DC3.forest.com on port 53 succeeded.
TCP connection to DC3.forest.com on port 88 succeeded.
TCP connection to DC3.forest.com on port 389 succeeded.
TCP connection to DC2.forest.com on port 53 failed.
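Since Confirm-NetworkConnectivity stops at the first failure, a small loop that tests every DC and port pair independently gives a fuller picture. This is a sketch only: Test-DcPortMatrix and its -PortTest parameter are names invented here, the default check uses the built-in Test-NetConnection cmdlet, and it covers TCP only.

```powershell
# Hypothetical helper: tests every DC/port pair independently, so that one
# unreachable DC does not hide the results for the others.
function Test-DcPortMatrix {
    param(
        [Parameter(Mandatory)] [string[]] $DCs,
        [int[]] $Ports = @(53, 88, 389, 3268),
        # Injected so the check can be swapped out; by default it uses the
        # built-in Test-NetConnection cmdlet, which tests TCP only.
        [scriptblock] $PortTest = {
            param($Name, $Port)
            Test-NetConnection -ComputerName $Name -Port $Port `
                -InformationLevel Quiet -WarningAction SilentlyContinue
        }
    )

    foreach ($dc in $DCs) {
        foreach ($port in $Ports) {
            [pscustomobject]@{
                DC        = $dc
                Port      = $port
                Reachable = [bool](& $PortTest $dc $port)
            }
        }
    }
}
```

For example, Test-DcPortMatrix -DCs 'DC1.forest.com','DC2.forest.com','DC3.forest.com' | Format-Table lists the result for all three DCs regardless of the order in which they are given.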

Could this be the issue? Probably the lookup list has the internal DC as the first entry which doesn't have connectivity. Although an assumption, I'm more suspicious after seeing this result:-

PS C:\Program Files\Microsoft Azure Active Directory Connect\Tools> Confirm-FunctionalLevel -Forest forest.com
Verifying that the AD forest functional level is >= Windows2003Forest
Obtaining ForestFQDN
Attempting to retrieve ForestFQDN...
Exception calling "GetForest" with "1" argument(s): "The specified forest does not exist or cannot be contacted."

CurrentForestLevel is
The binary operator GreaterThanOrEqual is not defined for the types 'System.DirectoryServices.ActiveDirectory.ForestMode' and
'System.DirectoryServices.ActiveDirectory.ForestMode'.
At C:\Program Files\Microsoft Azure Active Directory Connect\Tools\ADConnectivityTool.psm1:1127 char:8
+ If($CurrentForestLevel -lt $MinAdForestVersion)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], InvalidOperationException
+ FullyQualifiedErrorId : System.InvalidOperationException

The Active Directory forest functional level is correct
True

 

It fails in the beginning, probably with the internal DC, and then succeeds with the DMZ DC. This output is actually similar to that of the forests which were successfully onboarded to ADC, except that the initial error statement there is "The user name or password is incorrect." instead of "The specified forest does not exist or cannot be contacted". The rest of the output is the same, including the "The binary operator GreaterThanOrEqual is not defined for the types" error.

@Ajay_Joshi,

 

Run netdom query fsmo and check the PDC for that forest.

If the PDC is DC1.forest.com, then you need to fix the DNS on that DC first.

 

Run

nltest /dsgetdc:<DomainName>

and check which DC is returned.

If it returns DC1, force a rediscovery and check whether DC3 is returned:

nltest /dsgetdc:<DomainName> /force

@Ajay_Joshi 

 

I just want to reiterate what I said in my first post:

 

Controlling where DNS goes (via your conditional forwarder) does not control where subsequent traffic for other protocols - such as LDAP - goes.

 

TCP/UDP port 53 is DNS and this is not your issue, meaning you can ignore this DNS-specific error.

 

As you said, you are using a conditional forwarder for the remote forest, and the server entered into the conditional forwarder is only pointing to the IP address of the DMZ host. This means that DNS can only go to that one specific domain controller, making DNS connectivity tests to the other domain controllers irrelevant.

 

What we are focusing on are the non-DNS conversations between your AAD Connect host and the remote forest, as these are not controlled by the conditional forwarder. These are the conversations that are likely trying to use the internally-located domain controllers, since all five are returned to the AAD Connect host.

 

As I said earlier, there are a number of different strategies for solving this issue; however, the most future-proof is the one I mentioned in my earlier reply about using Active Directory sites and subnets to solve the domain controller selection issue.

 

Quoting from the following article, this is what I'm talking about:

 

After the client locates a domain controller, it establishes communication by using LDAP to gain access to Active Directory. As part of that negotiation, the domain controller identifies which site the client is in based on the IP subnet of that client.

 

If the client is communicating with a domain controller that isn't in the closest (most optimal) site, the domain controller returns the name of the client's site. If the client has already tried to find domain controllers in that site, the client uses the domain controller that isn't optimal. For example, the client sends a DNS Lookup query to DNS to find domain controllers in the client's subnet.

 

Otherwise, the client does a site-specific DNS lookup again with the new optimal site name. The domain controller uses some of the directory service information for identifying sites and subnets.
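To see what the generic and site-specific lookups described in that quote return, the DC locator SRV records can be queried directly from the AAD Connect host. A minimal sketch follows; the two function names are invented for illustration, and the domain and site names used with them are placeholders.

```powershell
# Hypothetical helper: builds the DC locator SRV record name for a domain,
# either domain-wide or restricted to one Active Directory site.
function Get-DcLocatorSrvName {
    param(
        [Parameter(Mandatory)] [string] $Domain,
        [string] $Site
    )
    if ($Site) { '_ldap._tcp.{0}._sites.dc._msdcs.{1}' -f $Site, $Domain }
    else       { '_ldap._tcp.dc._msdcs.{0}' -f $Domain }
}

# Hypothetical helper: resolves that SRV name and lists the DCs returned.
# Resolve-DnsName ships with the DnsClient module on Windows.
function Resolve-DcLocatorSrv {
    param(
        [Parameter(Mandatory)] [string] $Domain,
        [string] $Site
    )
    $name = Get-DcLocatorSrvName -Domain $Domain -Site $Site
    Resolve-DnsName -Name $name -Type SRV -ErrorAction SilentlyContinue |
        Where-Object { $_.Type -eq 'SRV' } |
        Select-Object Name, NameTarget, Priority, Weight, Port
}
```

Comparing Resolve-DcLocatorSrv -Domain 'forest.com' with Resolve-DcLocatorSrv -Domain 'forest.com' -Site 'DMZ' shows the difference: the first typically lists every domain controller in the domain, while the second lists only those registered for the named site.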

 

 

While this represents the best solution (in my opinion), there are other DNS-based solutions such as:

 

  • Using DNS policy (if the remote domain controllers are Windows Server 2016 or later);
  • Manipulating the DC locator DNS records (you need to know what you're doing with this one or it negatively impacts everyone);
  • Using a split-brain DNS approach (more control but at the cost of extra and ongoing manual administration);
  • Using a hosts file (I'm really not a fan of this one but it is an option.)

 

Of these lesser-preferred options, I'd recommend option 1. Here's a link to DNS policies:

 

 

The problem you're now trying to solve is not "where is DNS going to?" but "where is everything after DNS going to?"

 

Here is another article that lists the various DNS queries your AAD Connect host (which in this context is the DNS client) can ask of the remote DMZ domain controller, though don't invest too much time reading this one as it's purely educational (for now):

 

 

Your focus for now is getting a random selection from a pool of five domain controllers down to consistently selecting the DMZ domain controller for non-DNS traffic. Using the aforementioned sites and subnets approach, or one of the lesser-preferred DNS-related configurations should be your priority.

 

Cheers,

Lain

@LainRobertson @Supernova Thanks for providing inputs, I've tested a lot of things.

 

Firstly, the Netlogon logs have this error:-

03/22 12:14:38 [7968] Forest.com: NO_CLIENT_SITE: ADConnectServerName ADConnectIP 

Strangely, MIM server too has the same error entry, but as mentioned earlier, might not matter in its case. 
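For anyone wanting to pull these entries out of a large netlogon.log, a small filter along these lines might help. It is a sketch that assumes the client name and IP address follow "NO_CLIENT_SITE:" as in the entry above; Get-NoClientSiteEntry is a made-up name.

```powershell
# Hypothetical helper: extracts the client name and address from
# NO_CLIENT_SITE entries in netlogon.log and ignores every other line.
function Get-NoClientSiteEntry {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Line
    )
    process {
        if ($Line -match 'NO_CLIENT_SITE:\s+(?<Client>\S+)\s+(?<Address>\S+)') {
            [pscustomobject]@{
                Client  = $Matches.Client
                Address = $Matches.Address
            }
        }
    }
}
```

On the DMZ domain controller, Get-Content C:\Windows\debug\netlogon.log | Get-NoClientSiteEntry | Sort-Object Client, Address -Unique then lists each client that was not matched to any site.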

 

Nltest /DsgetDC:forest.com indeed returned the primary DC to which AD Connect doesn't have connectivity. This is the issue I think.
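To compare this result across hosts or over time, the nltest output can be reduced to the fields that matter here. The sketch below assumes nltest's usual "DC:", "Dc Site Name:" and "Our Site Name:" labels; ConvertFrom-NltestDsGetDc is an invented name.

```powershell
# Hypothetical helper: picks the returned DC and the site names out of the
# text printed by "nltest /dsgetdc:<domain>".
function ConvertFrom-NltestDsGetDc {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string[]] $Lines
    )

    $info = [ordered]@{ DC = $null; DcSite = $null; ClientSite = $null }
    switch -Regex ($Lines) {
        '^\s*DC:\s*\\\\(\S+)'        { $info.DC = $Matches[1] }
        '^\s*Dc Site Name:\s*(\S+)'  { $info.DcSite = $Matches[1] }
        '^\s*Our Site Name:\s*(\S+)' { $info.ClientSite = $Matches[1] }
    }
    [pscustomobject]$info
}
```

Usage would be ConvertFrom-NltestDsGetDc -Lines (nltest /dsgetdc:forest.com) on the AD Connect server. If ClientSite comes back empty, that would be consistent with the NO_CLIENT_SITE entries in netlogon.log.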

 

Since the primary DC is in the default site, I changed the SRV records on it to point to the DMZ DC. I need to wait for replication.

 

Meanwhile, I'm also trying to understand the split-brain DNS approach and DNS policy. I'm unable to find a method that can be used to force the DMZ DC to return its own name to DNS queries rather than the PDC's.

 

 

@Ajay_Joshi Try the steps below first:
Run nltest /dsgetdc:DC3.domain.com /force

Verify whether it is actually pointing to DC3: nltest /dsgetdc:domain.com

This will temporarily resolve the issue so that you can run the necessary tests.

 

For a permanent solution, once the tests are successful, you can try the steps below to force all queries to the DMZ DC (DC3).

To force the DMZ DC to return its own name to DNS queries rather than the PDC's, you can try the following steps:

  1. Log in to the DMZ DC using an account with administrative privileges.
  2. Open the DNS Manager console.
  3. Right-click on the server name and select Properties.
  4. In the Server Properties window, click on the Forwarders tab.
  5. If any forwarders are configured, clear the list by selecting each entry and clicking Remove.
  6. Click the Root Hints tab and select the option "Do not use recursion for this domain".
  7. Click OK to save the changes.
  8. Restart the DNS Server service on the DMZ DC.

These steps will ensure that the DMZ DC returns its own name to DNS queries rather than forwarding them to the PDC.

@Ajay_Joshi 

 

This is what you want to focus on (at least, if you want to achieve the best outcome), and what I have come back to in the previous two posts now:

 

LainRobertson_0-1679524942541.png

 

The following is not sound advice and the final paragraph is factually incorrect:

 

LainRobertson_1-1679525074505.png

 

The error from the AAD Connect installation wizard doesn't strictly relate to speaking directly with a FSMO role holder, either.

 

As I outlined earlier, this error can be resolved by doing the following in the remote forest:

 

  1. Create a new site for the DMZ;
  2. Create a new subnet matching the IP range of the DMZ and attach it to the site;
  3. Create a new subnet matching the IP range of your central subnet hosting your AAD Connect hosts and attach that to the DMZ site.

 

You'll now find that the netlogon.log entries stop and the DMZ domain controller is reliably returned as the appropriate domain controller.
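For reference, the three steps could be scripted with the ActiveDirectory module roughly as follows. Treat this as a sketch: every name and IP range is a placeholder rather than a value from this thread, and it needs to be run by an administrator of the remote forest.

```powershell
Import-Module ActiveDirectory

# All names and ranges below are placeholders, not values from this thread.
$server     = 'dmzdc.forest.com'   # a DC in the remote forest to write to
$siteName   = 'DMZ'
$dmzSubnet  = '10.10.0.0/24'       # IP range of the DMZ
$aadcSubnet = '10.20.0.0/24'       # IP range of the central AAD Connect hosts

# 1. Create the DMZ site (skip this if the site already exists).
New-ADReplicationSite -Name $siteName -Server $server

# 2. Create the DMZ subnet and attach it to the site.
New-ADReplicationSubnet -Name $dmzSubnet -Site $siteName -Server $server

# 3. Create the AAD Connect subnet and attach it to the same site.
New-ADReplicationSubnet -Name $aadcSubnet -Site $siteName -Server $server

# Review which subnets now map to which site.
Get-ADReplicationSubnet -Filter * -Server $server | Select-Object Name, Site
```

The sketch only covers the site and subnet objects. The DMZ domain controller itself also has to be placed in that site for the referral to prefer it.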

 

Separately, don't edit domain controller-owned DNS SRV records directly. This can break your environment if you're not completely aware of the effects of doing this. You'll likely also lose any changes you made within 24 hours when the NETLOGON service from the owning host refreshes the record (unless automatic registration has been disabled, which also should not be done unless there's a compelling reason.)

 

Cheers,

Lain

@LainRobertson Thanks yet again, Lain, for the detailed action plan. Since the site for the DMZ already existed and the subnet with its IP range was already attached to it, I completed the last important step by creating new subnets for the IP ranges of the ADC staging & prod servers and attaching them to the DMZ site. However, this still hasn't resolved the issue and the errors persist at both ends.

 

I've captured fresh network traces on both sides but can't upload them here, as .cab and .etl files are not allowed. I'm having serious trouble reading them in Netmon, so I'll ask MS to analyze them.

@LainRobertson MS is suggesting a split-brain DNS policy, which our management does not agree to as the 40 other countries didn't need it. Is it OK to allow network connectivity to the internal DC network in addition to the DMZ DC? Would that be a good approach, considering that AD Connect does contact all DCs once while setting up the directory? Does this happen in other companies too? It seems to me like it defeats the purpose of the DMZ altogether.

@Ajay_Joshi 

 

The short answer is I don't know, as I'm not privy to how the AAD Connect installation process works to that kind of depth.

 

You may not have a choice except to allow the AAD Connect host to have full internal access to all 40 forest/domains. The references on the AAD Connect requirements pages are vague and don't make it clear if what you're trying to do with a DMZ approach will or will not work.

 

I just finished some testing against the .NET "System.DirectoryServices.ActiveDirectory.Domain" class (which I expect is used in the installation wizard) in conjunction with some high-level Wireshark tracing, and the takeaway is that the .NET class produces numerous domain controller references with no way to limit or order them.

 

LainRobertson_0-1682415278541.png

 

The AAD Connect wizard could be utilising any of the properties - from the multi-valued "DomainControllers" to the single-valued xxxRoleOwner (and the Forest class is more of the same.)
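The properties in question can be inspected from the AAD Connect host with something like the sketch below. The domain name is a placeholder, and nothing here confirms which of these properties the wizard actually reads.

```powershell
$domainFqdn = 'forest.com'   # placeholder domain name

$type    = [System.DirectoryServices.ActiveDirectory.DirectoryContextType]::Domain
$context = [System.DirectoryServices.ActiveDirectory.DirectoryContext]::new($type, $domainFqdn)
$domain  = [System.DirectoryServices.ActiveDirectory.Domain]::GetDomain($context)

# Multi-valued: every domain controller the class reports for the domain,
# in whatever order it returns them.
$domain.DomainControllers |
    Select-Object Name, SiteName, IPAddress |
    Format-Table -AutoSize

# Single-valued: the domain-level role owners.
[pscustomobject]@{
    PdcRoleOwner            = $domain.PdcRoleOwner.Name
    RidRoleOwner            = $domain.RidRoleOwner.Name
    InfrastructureRoleOwner = $domain.InfrastructureRoleOwner.Name
} | Format-List
```

Any name in that output that sits on the internal network rather than in the DMZ is a domain controller the wizard could end up trying to contact.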

 

If my assumption is correct then DNS is not going to help you solve this problem at all.

 

Again, the MIM Active Directory agent does not work this way which is why it can be directed to use a specific domain controller, and because the AAD Connect sync engine is indeed just MIM, it could work under this DMZ model. But if you can't get past the AAD Connect installation wizard for the above reasons then it counts for nothing.

 

You might be tempted to "hack" around the problem by temporarily granting the AAD Connect host access to all 40 remote locations' domain controllers but, of course, you'll only potentially run into the same issue when you have to re-run the wizard, making it not worth it.

 

DNS isn't going to solve your requirements on its own. Either give your central AAD Connect host access to all remote domain controllers or stick with MIM. Or petition Microsoft via a design change request to alter the AAD Connect wizard to allow for the specification of a specific domain controller - if you're an influential-enough client to take that path.

 

Cheers,

Lain

@LainRobertson Thank you so much for testing this out to the depth of the .NET class and ascertaining that DNS manipulation won't necessarily resolve our issue. These are indeed great details, and I did finally manage to get hold of someone who knows multi-forest identity sync very well.

 

Just one correction: fortunately, we are dealing with this issue on just 1 forest right now, and only 2 more remain, which don't show this same error. So, we need to allow the AD Connect subnets access to the internal network for just 1 country/directory/forest. I believe all the previously onboarded 40-odd directories probably have this allowed; it increasingly feels like a prerequisite now. I have already asked the country team to arrange this configuration. I will post an update when it finally works.

 

The other 2 remaining forests show the GetForest "not found" error and not the GetDomain error. That one is a typical error caused by a lack of network connectivity or by the Yellow DC not being able to resolve all DCs to their FQDNs. They will get resolved, as we have faced them before.

 

@LainRobertson Thank you so much again for pointing us in the right direction. We have finally been able to resolve this issue. In the end, it was not a DNS or network issue. Here are the steps that resolved it:

1) We narrowed down the error in PowerShell by running just the Confirm-ValidDomains or Get-ForestFQDN commands while simultaneously running a live network capture in Netmon. This showed a Netlogon error; however, the service was always running on the Yellow DC, and restarting it didn't change anything.
Netlogon error.png

2) Finally, the DS engineers from the MS team asked us to collect the Netlogon and DCDiag logs. These clearly showed that the Yellow DC wasn't advertising itself, so we were asked to enable the SysvolReady flag in the Registry Editor by setting its value to 1. After a reboot, the AD Connect validation tests were successful.
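For anyone hitting the same symptom, the checks below are a sketch of how the advertising state could be inspected on the affected domain controller before changing anything. The registry path is the standard Netlogon parameters key; the rest is an assumption about what is useful to look at, not the procedure MS followed.

```powershell
# Run on the domain controller in question (the "Yellow DC" in this thread).

# SysvolReady should be 1 on a DC that is ready to advertise itself.
$key = 'HKLM:\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters'
(Get-ItemProperty -Path $key -Name SysvolReady).SysvolReady

# A DC that is advertising normally shares both SYSVOL and NETLOGON.
Get-SmbShare -Name SYSVOL, NETLOGON -ErrorAction SilentlyContinue |
    Select-Object Name, Path

# dcdiag's advertising test reports whether the DC is advertising itself.
dcdiag /test:advertising

# The change made in this thread, left commented out on purpose:
# Set-ItemProperty -Path $key -Name SysvolReady -Value 1
```

If SysvolReady is 0 or the two shares are missing, that matches what we saw. It is still worth finding out why SYSVOL was not ready before flipping the flag.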