This is the second article in a series:
Stop Worrying and Love the Outage, Vol I: Group Policy and Sharing Violations
Stop Worrying and Love the Outage, Vol II: DCs, custom ports, and Firewalls/ACLs
Stop Worrying and Love the Outage, Vol III: Cached Logons
Hello, it's Chris Cartwright from the Directory Services support team again. This is the second entry in a series where I try to provide the IT community with some tools and verbiage that will hopefully save you and your business many hours, dollars and frustrations. Here we’re going to focus on some major direct and/or indirect changes to Active Directory that tend to be pushed onto AD administrators. I want to arm you with the knowledge required for those conversations, and hopefully some successful deflection. After all, isn’t it better to learn the hard lessons from others, if you can?
Periodically, we get cases for replication failures, many of them involving lingering objects. Almost without fail, the cause is one of the following:
Network communications issues almost always come down to a “blinky box” in the middle that is doing something it shouldn’t be, whether due to defective hardware/software, misconfiguration or the ever-present misguided configuration. Today, we’re going to focus on the third, a misguided configuration. That is to say, the things that your compliancy section has said must be done, that have little to no security benefit, but can easily result in a multi-hour, multi-day, or even (yes) multi-week long outage. To be fair, the portents of an outage should be readily apparent with any monitoring in place. However, sometimes reporting agents are not installed, fail to function properly, or are misconfigured, or the events themselves are missed (alert fatigue). So, one of the things to do when compliancy starts talking about locking down DC communications is to ask them...
The primary effect of doing any of this is alert fatigue for replication errors, which is a path to an outage later. Additionally, if you have “Bridge all site links” enabled, you are giving the KCC the wrong information with which to build the replication topology, because it will assume that DCs in any two sites can reach each other.
Every AD administrator should be familiar with the firewall ports and how RPC works. By default, DCs register RPC services on ports in the dynamic range (49152-65535 by default on Windows Server 2008 and later) and listen there for traffic. The RPC Endpoint Mapper, which itself listens on TCP 135, keeps track of these ports and tells incoming requests where to go. One thing the RPC Endpoint Mapper will not do is keep track of firewall or ACL changes that were made.
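Because the Endpoint Mapper has no idea what a firewall in the middle is dropping, it can help to test reachability from the machine that actually has to replicate. Below is a minimal sketch in Python, not a replacement for proper replication monitoring. It only attempts plain TCP connections; the function names `tcp_port_open` and `unreachable_ports` are made up for this illustration, and the host and port list are whatever you choose to pass in.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A successful connect only proves that something is listening and that
    nothing in the path dropped the handshake. It says nothing about whether
    the service behind the port is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable_ports(host, ports, timeout=2.0):
    """Return the subset of ports that could not be reached on host."""
    return [p for p in ports if not tcp_port_open(host, p, timeout)]
```

For example, `unreachable_ports("dc01.contoso.example", [135, 389, 445])` (a hypothetical DC name) would return the ports that did not answer. Note that a reachable TCP 135 does not mean the dynamic port the Endpoint Mapper hands back is reachable too, which is exactly the failure this article is about.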
Again, what is the security benefit here? It is one thing to control DC communications outbound from the perimeter. It is another thing to suggest that X port is more secure than Y port, especially when we’re talking about ports upon which DCs are listening. If your compliancy team is worried about rogue applications listening on DCs, you have bigger problems, like rogue applications existing on your DCs, presumably put there by rogue agents who now have control over your entire Active Directory.
The primary effect of “locking down” a DC in this way is not to improve security, but to mandate the creation or modification of some document with fine print, “Don’t forget to add these registry keys to avoid an outage”, that will inevitably be lost during turnover. Furthermore, going too far can lead to port exhaustion, another type of outage.
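To put a number on "going too far", here is a rough sketch in Python that counts how many ports a proposed restriction actually leaves available. It assumes the range is written as a list of single ports or inclusive `start-end` strings such as `5000-5100`. The helper names and the `minimum_ports` threshold of 255 are illustrative assumptions for this example, not a Microsoft recommendation; the right floor depends on what runs on the server.

```python
def count_ports(entries):
    """Count distinct TCP ports in entries like "5000-5100" or "6000"."""
    ports = set()
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        start_text, sep, end_text = entry.partition("-")
        start = int(start_text)
        # A bare number is treated as a one-port range.
        end = int(end_text) if sep else start
        if not (0 < start <= end <= 65535):
            raise ValueError(f"invalid port range: {entry!r}")
        ports.update(range(start, end + 1))
    return len(ports)


def has_headroom(entries, minimum_ports=255):
    """Return True if the range offers at least minimum_ports ports.

    minimum_ports is an assumed threshold for illustration only.
    """
    return count_ports(entries) >= minimum_ports
```

For comparison, the Windows default dynamic range of 49152-65535 gives `count_ports(["49152-65535"])` a value of 16384, while a "locked down" `5000-5100` leaves only 101 ports to be shared by every RPC service on the DC.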
It is true that once you’ve found the network issue preventing replication, the fix is usually easy. However, if the “easy” fix is to rehost all your global catalog partitions with tens of thousands of objects on 90+ DCs, requiring manual administrative intervention and a specific sequence of commands because your environment is filled with lingering objects, you’re going to be busy for a while.
As the venerable Aaron Margosis said, “…if you stick with the Windows defaults wherever possible or industry-standard configurations such as the Microsoft Windows security guidance or the USGCB, and use proven enterprise management technologies instead of creating and maintaining your own, you will increase flexibility, reduce costs, and be better able to focus on your organization’s real mission.”
Security is critical in this day and age, but so is understanding the implications and reasons beyond some check box on an audit document. Monitoring is also critical, but of little use if polluted with noise. Remember who will be mainlining caffeine all night to get operations back online when the lingering objects start rolling in, because it will not be the people that click the “scan” button once a month…
Creating a Site Link Bridge Design | Microsoft Learn
You Are Not Smarter Than The KCC | Microsoft Learn
Configure firewall for AD domain and trusts - Windows Server | Microsoft Learn
RPC over IT/Pro - Microsoft Community Hub
Remote Procedure Call (RPC) dynamic port work with firewalls - Windows Server | Microsoft Learn
Restrict Active Directory RPC traffic to a specific port - Windows Server | Microsoft Learn
10 Immutable Laws of Security | Microsoft Learn
Sticking with Well-Known and Proven Solutions | Microsoft Learn
Chris “Was it really worth it” Cartwright