My name is Ron Arestia, and I am a Security Researcher with Microsoft’s Detection and Response Team (DART). We respond to customer cybersecurity incidents to assist with containment and recovery from threat actors.
We are actively honing our processes to help customers protect their environments before the worst-case scenario becomes a reality. In this article, I’m going to go over four critical focus areas for hardening your AD Certificate Services environment based on several years and dozens of customer engagements focused on assessing and remediating configurations:
- Offline root is non-negotiable
- Audit certificate services (the right way)
- Understand and apply role separation
- Clean up your templates and their permissions
Let’s get started!
Offline root certificate authority is non-negotiable
Across dozens if not hundreds of customer environments, I’m consistently surprised by how many agencies are running a root certificate authority (CA) fully connected to their network and, in most cases, connected to the Internet. The most common public key infrastructure (PKI) for enterprises is a simple two-tier hierarchy with an offline root CA and one or two issuing CAs. The Internet Engineering Task Force (IETF) has several requests for comment (RFCs) covering PKI, and a number of public entities including the US Federal Government’s NIST (SP 1800-16D), UK’s NCSC, Entrust, and Microsoft recommend that the root CA in any PKI be completely disconnected from any networks. I’ve worked with some high security agencies with policies where if a root CA is ever connected to a network, it is considered compromised (assume breach in extremis) and must be destroyed and rebuilt. Why is this such a difficult concept to implement?
The most common refrain I get from customers is: “We inherited this infrastructure. The people who built it are no longer with the company.” Fair enough. Turnover is a thing, and historically PKI systems are built up and left alone with exceptionally long CA certificate validity periods (longest I’ve seen was 100 years). Unfortunately, a functional PKI needs regular care and feeding at the same level expected of, say, your Active Directory domain controllers. Leaving the root CA on a network does provide some benefits: it can be patched regularly, monitored centrally, easily accessed. What if I told you that none of those things are necessary?
First, a truly offline root CA can be stood up from a clean source, and it never needs to be patched. This makes any security engineer or leader’s eye twitch, because patching is far and away the most important method to keep attackers at bay or at least make them work for access. But the server is not connected to a network. What does it matter if there’s a CVE with a 10.0 severity score published for the operating system if the machine is in a locked cabinet in a secure datacenter with no connection to any network? Can it be exploited? A threat actor would have to directly access your datacenter, open the locked cabinet, attach a crash cart, and be able to log in or exploit the vulnerability locally to make use of it. That level of effort is absurdly high for even the most determined state actors.
Second, why do you need to monitor a system that’s not on the wire? The most important logical piece of information for any CA is its private key. That private key resides on the hard drive of the system if you’re not leveraging a hardware security module (HSM). If that hard drive is in a server in the datacenter, even if the server itself suffers a failure of some type, the private key material is recoverable. Further, you should make a copy of the relevant certificates, their private keys, Active Directory Certificate Services (ADCS) registry and configuration, and the database and back it up to a USB drive regularly or at least every time you boot the system to do a signing operation. That USB drive should be placed in a fire safe or a safe deposit box as a precaution. If the physical hardware suffers the worst electronic fate imaginable, you have a way to recover that private key. Monitoring isn’t really necessary for your offline root CA except maybe to go into the datacenter on occasion and make sure it’s still in its rack.
Finally, the root CA should be deliberately difficult to access. If you, as the trusted administrator, have to take pains to get to the physical system, any persistent threat actor is going to find it impossible short of some plot that you would only see in the most outrageous Hollywood heist script. Signing operations from the root CA should be rare. The most common, regular operation is to sign the root CA Certificate Revocation List (CRL). A common PKI configuration would have a root CA CRL lifetime somewhere between 12 and 18 months. This provides a couple of advantages: 1) it forces you to actually acknowledge your root CA’s existence, which usually begets proper documentation to ensure engineers know how and when to access it; and 2) it ensures that you’re keeping the CRL fresh. Nothing instills more confidence in a PKI than a CRL that was signed in 1998, am I right?
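To make the "keep the CRL fresh" point concrete, here is a minimal sketch of a reminder check. It assumes you have already read the CRL's next-update date by some other means; the function name and the 60-day warning window are hypothetical choices for illustration, not values from any Microsoft guidance.

```python
from datetime import date


def crl_renewal_status(next_update: date, today: date, warn_days: int = 60) -> str:
    """Classify a CRL by how close it is to its next-update date.

    warn_days is an illustrative threshold, not a recommended value.
    """
    days_left = (next_update - today).days
    if days_left < 0:
        return "expired"      # clients may already be failing revocation checks
    if days_left <= warn_days:
        return "renew soon"   # time to schedule the offline signing session
    return "ok"


if __name__ == "__main__":
    # Hypothetical dates: a root CRL that is due in about a month.
    print(crl_renewal_status(date(2025, 7, 1), date(2025, 6, 1)))
```

A check like this is only useful if someone owns the result, so it pairs naturally with the documentation and calendar reminders described above.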
Also remember this often-misunderstood concept with CA operations: a certificate authority can only issue certificates whose validity period ends on or before the expiration of its own certificate. As an example, say you have a root CA with a 10-year validity period and an issuer with a 5-year validity period. If you wait until year 8 to reissue the root CA’s certificate but sign a new certificate for the issuer at year 7, guess how long the issuer’s certificate is valid? Hint: it’s not 5 years. It’s only 3, because the root CA’s own certificate expires at year 10. This usually becomes problematic when issuing CAs are neglected and some web developer who relies on certificates with 2-year validity periods suddenly receives certificates only valid for 18 months or less. This is an indication that either someone messed with the templates or the issuing CA’s validity period is coming to a close.
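The arithmetic in that example can be sketched in a few lines. This is an illustration of the capping rule only, with hypothetical dates and function names; it does not call into ADCS or parse real certificates.

```python
from datetime import date


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, falling back to Feb 28 for a Feb 29 start."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, month=2, day=28)


def effective_not_after(signed_on: date, requested_years: int, ca_not_after: date) -> date:
    """Return the expiry actually granted: the requested expiry, capped at the CA's own."""
    return min(add_years(signed_on, requested_years), ca_not_after)


if __name__ == "__main__":
    root_expires = add_years(date(2020, 1, 1), 10)   # hypothetical 10-year root
    issuer_signed = date(2027, 1, 1)                 # issuer renewed at year 7
    granted = effective_not_after(issuer_signed, 5, root_expires)
    print(granted.year - issuer_signed.year)         # years actually granted
```

With these sample dates the issuer asks for 5 years but is capped at the root's expiry, which matches the 3-year result described above.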
This last part is relevant to keep the conversation going about the critical nature of the root CA and that it should be treated with the same level of respect as any of your domain controllers, and arguably more scrutinized. It is the root of trust for every cryptographic operation in your environment. Protect those keys like you protect your house keys. Talk to your manager and have them put it in the budget for an annual after-hours pizza delivery for you and the team to perform a “key signing party.” (It’s really a thing.) These events foster trust among the team, ensure you have witnesses to the operations performed, and allow you to cross-train your long-time and new team members to understand what to do for regular CA operations. An offline root CA should not be something to fear for any reason. Along with an HSM, the private key is most secure on an offline root CA.
If you are not leveraging an offline CA, best practice is to plan to move that system offline. A proper two-tier hierarchy (offline root and enterprise issuer(s)) should mean that the root is just sitting there consuming resources. Shutting it off should not impact the functionality of your infrastructure barring extremely tight CRL publishing intervals. Those intervals can be easily extended to allow for the root to be shut down without impact. Remember, however, that many agencies would view an online root as a critical risk requiring a wholesale rebuild of the PKI (think: Zero Trust).
If you’re in a situation where you’re unsure how to move forward to protect your root CA, consider engaging your Microsoft Customer Success team or a Microsoft engineer to discuss next steps. We have several engagements to educate you, assess, and even migrate your solution to a modern PKI following industry best practices.
Audit Certificate Services (the right way)
Let’s say, however, you’ve got the offline root CA, your issuer is humming along, you’re all set, right? What are you capturing for the operations of your CA? Auditing in Windows is another surprisingly misunderstood concept for many customers, and while this is not intended as a primer on doing proper auditing (this is covered elsewhere), audit configuration for ADCS requires a couple of steps to get going. Refer to the Events to Monitor docs on all events generated for ADCS.
First, familiarize yourself with the ADCS server management console (certsrv.msc). This is your one-stop-GUI-shop for managing your CA. There’s an auditing tab where you can configure all of the events captured by the event logging subsystem.
Note the options available to you. Microsoft best practice is to check them all.
This corresponds to a decimal value of 127 on the AuditFilter registry value, which can be helpful to streamline installations using scripts instead of manually going through the wizard.
PS C:\Windows\system32> certutil -getreg ca\auditfilter
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\CertSvc\Configuration\Issuing CA 1\AuditFilter:
AuditFilter REG_DWORD = 7f (127)
CertUtil: -getreg command completed successfully.
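The value 127 is a bitmask with one bit per checkbox on the auditing tab. The sketch below decodes a value so a script can report which categories are missing. The bit-to-checkbox mapping reflects my understanding of the AuditFilter documentation and should be verified against your own CA; the function names are hypothetical.

```python
# Assumed mapping of AuditFilter bits to the auditing tab checkboxes.
AUDIT_FILTER_BITS = {
    0x01: "Start and stop Active Directory Certificate Services",
    0x02: "Back up and restore the CA database",
    0x04: "Issue and manage certificate requests",
    0x08: "Revoke certificates and publish CRLs",
    0x10: "Change CA security settings",
    0x20: "Store and retrieve archived keys",
    0x40: "Change CA configuration",
}


def enabled_audit_categories(value: int) -> list[str]:
    """List the audit categories whose bit is set in an AuditFilter value."""
    return [name for bit, name in AUDIT_FILTER_BITS.items() if value & bit]


def missing_audit_categories(value: int) -> list[str]:
    """List the audit categories whose bit is NOT set in an AuditFilter value."""
    return [name for bit, name in AUDIT_FILTER_BITS.items() if not value & bit]


if __name__ == "__main__":
    for name in missing_audit_categories(0x7B):  # sample value with one box unchecked
        print("Not audited:", name)
```

Summing all seven bits gives 0x7f, which is the 127 shown in the certutil output above.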
If you have all of these boxes checked and/or the certutil attribute value is 127, congratulations! You’re halfway there. Next, we need to either change policy values through GPO or directly on the local machine using secpol.msc.
Note on policy subcategories: Years ago, Microsoft added the ability to configure granular auditing with a new setting “Audit: Force audit policy subcategory settings” which then allows you to get deeper audit results from the operating system than with the traditional event logs. This is required to get your ADCS logs into the proper event logs in Windows.
Once properly configured in both the ADCS server properties and security policy, relevant audit events will start to flow into the Security event log, alongside the operational events that ADCS writes to the Application event log with a CertificationAuthority source.
These events can be parsed locally or, ideally, offloaded to either Windows Event Forwarding collectors or a SIEM solution. SIEM solutions such as Sentinel are preferred to allow you to create workbooks and playbooks around events to identify anomalous behavior or just keep track of your CA health. Have an in-depth discussion with your SOC, security team, and/or your auditors to understand what and how you should monitor your PKI, and ultimately decide who is responsible for the events coming out of ADCS. Don’t let these events fall by the wayside. They are the most important bits of information flowing from your ADCS architecture, giving you insights into operations and overall health.
A note on exit modules: Exit modules still present an exceptional opportunity to offload events and data from your PKI if desired, but for event management specifically, Windows event logs have proven to be a better option to standardize without worrying about SMTP functionality. It’s long been conventional wisdom that customers should configure SMTP exit modules to offload CA events, but this practice has fallen out of favor with the transition to Internet-based SMTP providers such as Exchange Online that curtail unauthenticated email. If you’re still using SMTP relays on-premises for Exchange, it’s possible to configure and make use of SMTP exit modules for this process, but offloading Windows event logs provides the same level of detail and is highly extensible compared to SMTP.
Understand and Apply Role Separation
The Common Criteria for Information Technology Security Evaluation (Common Criteria) has long established the critical need for separation of duties in every IT organization. Role-based access controls were first discussed and developed in the RBAC 987 publication as a way to reduce overprovisioning of administrative rights and to protect access to critical IT systems through the layering of rights based around roles. This speaks directly to the modern concept of “least privilege,” and it’s vital to protect your CA infrastructure with the same fervor as your directory services environment. I would argue the CA infrastructure is at least as important as your AD infrastructure, if not more so. Threat actors are going to go out of their way to try to access and disrupt operations of both your CA and AD infrastructures, and leaving the welcome mat out to your CA infrastructure makes it trivially easy for a determined threat actor to compromise your AD environment in short order.
The AD Certificate Services server properties lay out four very simple rights based around roles every agency should use to provide proper role based access control: Read, Manage CA, Issue and Manage Certificates, and Request Certificates.
Read permissions, in the context of the CA itself, allow for visibility of the CA infrastructure configuration and reading of the CA database. This isn’t used by default but could give auditors, for example, the ability to review the CA configuration or the CA database itself.
Issue and Manage Certificates permissions should be limited to those who are designated as active administrators of the CA infrastructure outside of the operational configuration in the server properties. This role allows the holder to issue or deny certificates pending issuance on the CA and perform database operations. If you have certificates configured to require manager approval (more on this in the next section), then you need to have individuals, through a role group, assigned to review and issue requests when they’re sent to the CA to sign.
Do NOT overlook the power of this group! A threat actor who compromises a user with this role could assign certificates for their own uses undetected, and, since many templates are misconfigured due to misunderstanding or ignorance, it’s possible for someone to create very powerful certificates whether by accident or with malicious intent. At best, overlooking this role makes for extra administrative oversight and troubleshooting. At worst, your environment could be at risk.
Manage CA permissions allow the assigned role holders to configure anything on the CA directly. To put it simply, anything in the server properties dialog is fair game for someone with this right. It should be very limited, and you absolutely should configure a custom AD group to manage this right. Do NOT rely on Enterprise Admins or Domain Admins. If members of the Enterprise or Domain Admin groups are compromised, the entire PKI should be considered breached requiring a rebuild of the issuers, at a minimum, or the entire PKI if the root is online and domain joined. In cybersecurity incidents, some customers believe because a threat actor was not seen in their PKI they can safely continue to use it. If Enterprise or Domain Admins were compromised, as they often are, the PKI should be considered compromised as well.
Finally, Request Certificates permissions, as the name suggests, allow a user to actually request a certificate be issued to them with the security ACL providing additional constraints on a per-template basis. This right is granted to Authenticated Users by default and should not be changed. This is necessary for normal functionality of the PKI, and while you could assign other domain groups (e.g., Domain Users), Authenticated Users includes computer objects and is considered an acceptable risk for most PKI implementations.
While the definitions of these roles are fairly straightforward, the devil is in the details, as they say. Are your Domain Admins configured to manage the CA as well as allowed to issue and request certificates? What happens if a Domain Admin user’s account is compromised? That would allow the threat actor full control over your CA. Your PKI is not a downstream impacted system; it is directly in line of the compromise and should be rebuilt.
Role Separation
Active Directory Certificate Services has an often-missed registry configuration available to help prevent role overlap that could put the CA at risk. It’s very important to note that once you configure this registry entry, you are setting the CA to require that each user have only one assigned role. This means even if you’re a Domain Admin, if you’re configured across both Manage CA and Issue and Manage Certificates, for example, you’ll be denied access to both. For this reason, you should proceed with caution with this configuration. I could tell you many stories about customers calling in a panic because they can’t administer their CA, and I’m not embarrassed to admit this happened to me as well prior to coming on board here at Microsoft. Thankfully you can revert the registry setting with local administrator access to the server itself, but that’s not the point. This should be a deliberate and understood change.
You can configure this setting quickly from the command line using the certutil.exe command:
certutil -setreg ca\RoleSeparationEnabled 1
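Before enabling the setting, it can help to check whether any account would hold more than one role and be locked out. The following is a minimal sketch that assumes you have already exported the effective members of each role (with nested groups expanded) by some other means; the account names and the function name are hypothetical.

```python
def find_role_overlap(role_members: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return accounts that hold more than one CA role, with the roles they hold."""
    roles_by_account: dict[str, list[str]] = {}
    for role, members in role_members.items():
        for account in members:
            roles_by_account.setdefault(account, []).append(role)
    # Only accounts with two or more roles would be affected by role separation.
    return {
        account: sorted(roles)
        for account, roles in roles_by_account.items()
        if len(roles) > 1
    }


if __name__ == "__main__":
    # Hypothetical membership export; replace with your own data.
    sample = {
        "Manage CA": {"adm-alice", "adm-bob"},
        "Issue and Manage Certificates": {"adm-bob", "adm-carol"},
    }
    for account, roles in find_role_overlap(sample).items():
        print(f"{account} holds multiple roles: {', '.join(roles)}")
```

An empty result suggests the change is less likely to lock anyone out, though it is only as accurate as the membership data fed into it.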
A common refrain from customers on this is, “how do we manage our CA if we have to use different accounts for each feature?” In a small IT shop, yes, this presents some challenges. Since Microsoft’s Securing Privileged Access methodology has prescribed the use of purpose-built administrative accounts, this shouldn’t be too difficult to work around, but with some customers, we’ve had to recommend multiple accounts to perform specific tasks on the CA. This is by design. Don’t make it too difficult on yourself to manage your CA, but also understand that threat actors are going to discover and use every workaround you do. Putting additional accounts in front of critical functions can serve to slow down even the most persistent threat actor.
In the end, this setting is usually bypassed with some sort of compensating control or seen as an unnecessary step. If you have regulatory or compliance audits, ask your auditors or your IT Security administrators if this setting is necessary. Knowing it’s there can help you bypass hours of discussions and meetings about how to do the same thing with complex group configurations. At a minimum, you should always use dedicated administrative accounts for managing the PKI. End user accounts can be assigned to specific delegated functions, but best practice is that those accounts also be dedicated administrative accounts to prevent cascading failures in the event of a cybersecurity incident.
Clean up your templates and their permissions
With our ADCS Assessments, the vast majority of the time going over the findings is spent reviewing templates and their permissions. Entire treatises have been authored on this part alone. In the interest of brevity, we are going to look at a few of the most common misconfigurations to avoid. It’s important to understand that misconfigurations of the ACLs on templates represent one of the biggest threats to a PKI, as an attacker can leverage any template available to them regardless of the account they use, even if that account belongs to someone in a non-IT department. If a threat actor obtains a certificate from a highly privileged template such as Code Signing, they can perform trusted operations through very simple processes.
Do not use the default templates
The default templates in AD are a sort of time machine. They represent templates configured from the very beginning of the product and some more modern implementations for better practice. They can be easily identified in the Certificate Templates view with a Schema Version of 1.
Ultimately, they should never be used by themselves. They are often lacking critical controls, and the older templates aren’t going to provide adequate cryptographic protections for modern computing. Best practice is to right-click on a relevant default template, duplicate it, and configure the new template to the standards you require.
(Mis)configuring Authenticated Users
Every single template will have Authenticated Users configured with Read permissions. Do not change this! This ACE allows all authenticated users (and computers) in the domain to list the template for use. NEVER configure this ACE alone. Again: do not change the Authenticated Users ACE on ANY template!
Further, if Authenticated Users is left alone with just Read permissions, it’s usually not necessary to configure Read permissions on any other ACEs.
Default Administrative Rights
There are three ACEs in every ACL that should either be removed or changed: Domain Admins, Enterprise Admins, and an ACE created for you or the user who created the template. Remember in any conversation around Active Directory or ACLs in general, individual user attribution should be avoided at all costs. AD is a group-based identity solution, and ADCS is no different. Remove yourself or the user who created the template, as those rights should be given through membership in an appropriate group. It also helps prevent ugly unresolved SIDs later in the life of the CA.
Domain Admins and Enterprise Admins are configured by default for every template, as they are assigned rights to the container in AD where all of the templates reside. Best practice is to strip Domain Admins and Enterprise Admins of these rights during installation of ADCS and replace them with a dedicated CA Admins or similar group in AD. This goes a long way toward hardening your ADCS environment.
You should also have a dedicated group for template management (e.g., CA Template Managers) with Write permissions configured to allow for management of all of the templates in your PKI, and this group should be limited to only this purpose to maintain control of your templates.
Autoenroll: the quickest way to bring your PKI to its knees
I can’t express enough how important it is to NOT use Autoenrollment on widely scoped templates. For the majority of enterprises, there are very few real-world use cases where autoenrollment is useful. The most notable exception would be 802.1X certificates where a machine must have a certificate to get on your network. In this case, certificate mapping is one-to-one. Certificates should be issued purposefully and with limited scope. For example, do you really want every service account in your AD environment to grab a user certificate for Email Encryption? Or worse: Encrypting File System (EFS)? Do your Domain Admins really need any certificates outside of maybe smart cards? Are you certain all Domain Computers need a Server Authentication EKU certificate? Are you allowing user endpoints to host services that require it?
I’ve seen autoenrollment configured on over a dozen templates in one customer environment with an ADCS database over 20 GB in size. When we reviewed the templates, every single one was automatically issued with no oversight, and many users had dozens of certificates assigned to them with no apparent purpose. This also presents a major problem for Domain Controllers when more than one Server Authentication EKU certificate is present, making LDAP/S configurations tricky to troubleshoot.
Encrypting File System: a menace in disguise
Encrypting File System (EFS) has been around for decades. It allows users to encrypt individual files and folders to which they have access on their local machine and on network shares.
If necessary, this can provide security in a robust file infrastructure. Unfortunately, EFS is enabled by default in an Active Directory domain, and if left enabled, this will allow anyone with an EFS certificate to encrypt any data to which they have access.
Are you confident your file system ACLs are proper? If someone is terminated from your organization, and they used a certificate on their local machine to encrypt important files, do you have a backup of that EFS certificate private key to decrypt that data? If that machine is scavenged and destroyed, chances are that data is also lost. I’ve unfortunately seen this scenario across a half dozen customers, and Microsoft Support has no way to decrypt that data. It’s functionally gone unless you have backups from before the encryption was enabled.
First, have a very frank discussion with your IT Security team to determine if this functionality is even necessary. If not, turn it off via Group Policy. Then, scour your templates for any existence of the Encrypting File System EKU. If you’re not using EFS, and you have no intention to support EFS, then you should never issue certificates for this purpose. This would not prevent a tech-savvy user from generating a self-signed certificate for the same purpose, but any systems where group policy is properly applied would prevent access to the dialog in the file properties.
EFS is made redundant by disk encryption technologies such as BitLocker. Unless you have a very specific use case requiring it, we do not recommend using EFS. If you do need to use EFS, make sure you have Key Recovery Agents (KRAs) configured for your organization and the templates used to issue the EFS certificates are configured to archive the private key.
Understand Domain Controller Certificate Options
When you spin up your very first enterprise CA, the Domain Controller certificate template is automatically configured for issuance. To avoid this, use the CAPolicy.inf configuration to keep the default templates from being published on a new CA:
[certsrv_server]
LoadDefaultTemplates=0
When Active Directory detects that a new issuing CA is available, domain controllers will automatically request a relevant Domain Controller certificate to standardize their cryptographic operations to the new PKI. The primary purpose is to protect LDAP ports with a proper certificate, but this also serves client and server authentication purposes for smart cards, if used. It’s better to create a duplicate of the relevant template, boost the key length to 4096, issue for authentication-specific activities, and create a completely separate certificate to secure LDAP/S using the NTDS store.
Finally, to ensure consistency with Kerberos authentication, issue certificates from a template with the KDC Authentication EKU. Managing this centrally with ADCS is a great way to standardize your cryptographic health.
Never allow “Enrollee Supplied Subject” without manager approval
Saving the best for last… if you have templates where the subject is supplied in the request, save yourself the security headache and require manager approval on the template.
Why? How many times have you mistyped a domain name? If a certificate is issued with an errant letter, when the app dev folks start trying to get to the website, they’re going to be hit with certificate mismatch errors. Manager approval on a pending request allows you to check the vital aspects of the certificate to ensure everything is in order before issuance. It’s much easier to catch this early than to spend hours troubleshooting only to have to reissue another certificate. This also leads to database bloat, because it’s seldom that an admin will revoke the mis-issued certificate.
If you do NOT enable manager approval, anyone with the rights to request the certificate can have a certificate issued for anything they desire, and since these templates often also allow for the private key to be exported, you could wind up with rogue servers or websites being set up with no administrative oversight. This could happen by accident or, if a malicious user or threat actor gets access to your environment, they could generate certificates for their own purposes. I’ve seen instances where a threat actor created a Code Signing certificate from a misconfigured template, signed their malware, distributed it, and no one in the organization was any the wiser, because that code was trusted by every system in the organization.
While this adds some administrative overhead, this is seldom met with disapproval by security admins and engineers, because it ensures there are checks in place to prevent accidents.
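As a rough illustration of how this check could be automated, the sketch below flags templates that let the enrollee supply the subject without holding the request for approval. It assumes you have exported each template's msPKI-Certificate-Name-Flag and msPKI-Enrollment-Flag values from AD by some other means. The dictionary keys and template names are hypothetical, and the two flag constants reflect my reading of the certificate template attribute documentation, so verify them before relying on the result.

```python
# Assumed bit values from the certificate template attribute documentation.
CT_FLAG_ENROLLEE_SUPPLIES_SUBJECT = 0x00000001  # bit in msPKI-Certificate-Name-Flag
CT_FLAG_PEND_ALL_REQUESTS = 0x00000002          # bit in msPKI-Enrollment-Flag (manager approval)


def templates_missing_approval(templates: list[dict]) -> list[str]:
    """Return names of templates with enrollee-supplied subject and no manager approval."""
    flagged = []
    for template in templates:
        supplies_subject = template["name_flag"] & CT_FLAG_ENROLLEE_SUPPLIES_SUBJECT
        needs_approval = template["enrollment_flag"] & CT_FLAG_PEND_ALL_REQUESTS
        if supplies_subject and not needs_approval:
            flagged.append(template["name"])
    return flagged


if __name__ == "__main__":
    # Hypothetical export; the key names are illustrative only.
    sample = [
        {"name": "Contoso Web Server", "name_flag": 0x1, "enrollment_flag": 0x0},
        {"name": "Contoso Web Server (Approved)", "name_flag": 0x1, "enrollment_flag": 0x2},
        {"name": "Contoso Workstation", "name_flag": 0x0, "enrollment_flag": 0x20},
    ]
    for name in templates_missing_approval(sample):
        print("Review template:", name)
```

A flagged template is a starting point for review rather than proof of exposure, since the enrollment permissions on the template decide who can actually request from it.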
Conclusion
More than ever in our industry, security hardening and resiliency are critical to maintain the operational capabilities of our core infrastructure. If you have any doubts about the security of your core infrastructure, talk to your leadership about engaging Microsoft to help you harden it. We see far too many unfortunate situations arise where a threat actor has laid waste to on-premises Active Directory due to misconfigurations and security gaps. I hope this helped you all to start thinking about how to best accomplish a defense-in-depth strategy to vex even the most persistent threats. If you need any help with your PKI, please reach out to one of our many professionals. We love helping our customers get this right!
Dos and Don’ts
- Do leverage an offline root and HSMs, if financially feasible. The root CA private key is the most important logical data in any PKI!
- Do enable granular auditing of ADCS and offload logs to a SIEM to get a baseline of proper functionality for your PKI to detect when something is wrong or a security issue is occurring.
- Do leverage the ADCS agent for Microsoft Defender for Identity to monitor your PKI.
- Do leverage Microsoft Local Administrator Password Solution (LAPS) or similar solution to protect the local administrator account on your PKI infrastructure.
- Don’t use autoenrollment unless you’re certain you need it. It is one of the most common causes for issues in an ADCS PKI.
- Don’t rely on default permissions for your PKI. Replace Enterprise and Domain Admins with dedicated PKI Admins and Certificate Template Managers.
- Don’t leave Encrypting File System (EFS) enabled in your domain unless you’re using it.
- Do configure Key Recovery Agents and properly constrain the EFS template to users who need to use it if you are going to use EFS.
- Don’t use default templates. Duplicate templates and tailor them for the need and your enterprise.
- Don’t allow enrollee-supplied subjects without requiring manager approval for the certificate request.
- Do leverage Microsoft as a partner to harden and optimize your ADCS implementation.
Appendix
https://www.commoncriteriaportal.org/index.cfm
https://www.commoncriteriaportal.org/files/ppfiles/RBAC_987.pdf
https://learn.microsoft.com/en-us/windows-server/identity/ad-cs/certificate-template-concepts
https://learn.microsoft.com/en-us/windows/win32/seccrypto/exit-modules