Revisiting Enterprise Policy as Code v10
Published May 01 2024 08:47 AM 1,783 Views

As EPAC has reached version 10, it is time to revisit Enterprise Policy as Code (EPAC for short) to give you an update from the original post (https://techcommunity.microsoft.com/t5/core-infrastructure-and-security/azure-enterprise-policy-as-c...) published on September 12th, 2022.

The maintainers of the OSS project EPAC work daily with Microsoft’s customers implementing Azure governance and security in general and more specifically Policy implementation via EPAC. EPAC was born out of the need to manage Policy at scale, while dramatically reducing the cost of implementation with traditional Infrastructure as Code (IaC) tools, such as ARM, Bicep, and Terraform. Those tools are great for IaC in general; however, their lack the knowledge of dependencies between definitions, assignments, exemptions, and role assignments and the simplifications to Policy Assignments and Policy Exemptions. EPAC understands the dependencies and will sequence the deployment correctly.

EPAC consists of PowerShell scripts and a starter kit:

  • Deployment scripts to create deployment plans, deploy the created Policy plan, and deploy the created role assignment plans. They can be executed manually (not recommended) or any CI/CD tool capable of running PowerShell core.
  • Scripts for operational tasks related to Policy, for example: creating remediation tasks at scale, extracting documentation, etc. Note: I’m not covering them in this article.
  • Hydration scripts, for the initial setup of EPAC. This is a work-in-progress. One of the scripts can extract exiting Policy resources from Azure tenants in EPAC format to enable a smooth transition to EPAC.
  • Starter kit contains sample pipelines/workflows for Azure DevOps and GitHub.

For the details, please follow these links:

Blog Posts:

Getting started

Decide on your approach!

EPAC is extremely flexible as you can implement any Policy development workflow, branching strategy, CI/CD tool, organizational structure for single and multi-tenant scenarios. The key decisions are:

  • Consume EPAC as a PowerShell module, or by forking the GitHub repo.
  • Implement GitHub flow (simple) or Release flow (allows for staged deployment of changes) as your CI/CD and branching approach.
  • One centralized team (recommended) or multiple teams (by function, and/or hierarchical) managing Policy.
  • Handling existing Policy implementation by
    • Exporting them into the EPAC repository and subsuming all existing Policies into EPAC (recommended)
    • Enabling co-existence with the desired state strategy set to owned only. Owned only should be used for a short transitional period (weeks); keeping it longer leads to increasing difficulty managing your Policy deployments.

Implementing EPAC

EPAC deployment scripts

EPAC contains three scripts to deploy Policy. They are individual scripts to enable approval gates and implement the least privilege principle for the service principals executing the job/stages in CI/CD.

Heinrich_Gantenbein_0-1714577638491.png

 

EPAC environments

As with any other software development, Policy development requires a development and testing area for just Policy. This can be one or more EPAC environments.

Simple flow using GitHub Flow

In the simplest case you’ll deploy the developed Policy resources to your tenant root or pseudo root beneath tenant root for each tenant. The downside of this approach is that any mistakes in Policy development immediately impact deployments to production, breaking your solutions CI/CD and in rare cases could even break running systems. The obvious advantage is its simplicity. You would name such an environment with the generic word tenant, prod, or something descriptive of the tenants. If you have multiple tenants, your CI/CD will run multiple deployments (one per tenant).

Heinrich_Gantenbein_1-1714577696860.png

 

Release flow

If you have differentiated your Azure tenant or tenants into nonprod and prod environments, using Release flow (https://devblogs.microsoft.com/devops/release-flow-how-we-do-branching-on-the-vsts-team/) makes more sense. Steps:

  1. Develop Policy in a feature branch.
  2. Pull request into main, deploys Policy to nonprod after a successful PR merge.
  3. Let it “soak” in for a few days and observe if it causes any issues for your solutions.
  4. Create a releases branch deploys the changes to prod.

If you need to deploy prod Exemptions during the “soak” period, you need a way to fast-track those exemptions without deploying the Policy changes being “soaked”. This is done by creating a releases-prod-exemptions-fast-track branch which plans the deployment with ‘Build-DeploymentPlans ‑BuildExemptionsOnly’ and Deploy the Policies with Deploy-PolicyPlans. No role changes will occur in this pipeline.

Heinrich_Gantenbein_2-1714577696871.png

 

 

Global settings file

The global-settings file ‘global-settings.jsonc’ in the ‘Definitions’ folder for release flow would look like this.

 

 

{
    "$schema": "https://raw.githubusercontent.com/Azure/enterprise-azure-policy-as-code/main/Schemas/global-settings-schema.json",
    "pacOwnerId": "11111111-2222-3333-4444-555555555555",
    "pacEnvironments": [
        {
            "pacSelector": "epac-dev",
            "cloud": "AzureCloud",
            "tenantId": "77777777-8888-9999-1111-222222222222",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/mg-epac-dev",
            "desiredState": {
                "strategy": "full",
                "keepDfcSecurityAssignments": false
            }
        },
        {
            "pacSelector": "nonprod",
            "cloud": "AzureCloud",
            "tenantId": "77777777-8888-9999-1111-222222222222",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/mg-nonprod",
            "desiredState": {
                "strategy": "full",
                "keepDfcSecurityAssignments": false
            }
        },
        {
            "pacSelector": "prod",
            "cloud": "AzureCloud",
            "tenantId": "77777777-8888-9999-1111-222222222222",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/mg-enterprise",
            "managedIdentityLocation": "eastus2",
            "desiredState": {
                "strategy": "full",
                "keepDfcSecurityAssignments": false
            },
            "globalNotScopes": [
                "/providers/Microsoft.Management/managementGroups/mg-nonprod",
                "/providers/Microsoft.Management/managementGroups/mg-epac-dev"
            ]
        }
    ]
}

 

 

Policy Assignment and effect parameters

Using JSON for parameters works great for smaller Initiatives and single Policy Assignments. However, when assigning the big security and compliance-oriented Initiatives, such as ‘Microsoft cloud security benchmark’, ‘NIST 800-53’, and ‘CIS’ (often multiple of them), defining ‘effect parameters via JSON is cumbersome and time consuming. You will need to define hundreds or even thousands of parameters. I had a customer which had ~5000 lines of JSON just for the effect parameters. This makes the JSON file hard to maintain and completely unreadable.

EPAC solves this problem by reading them from a spreadsheet (CSV file). The spreadsheet only defines the Policy name and effect, while EPAC will figure out the parameter names and settings for all the assignments driven by this spreadsheet. If the Initiative does not parameterize the effect, EPAC will automatically generate ‘overrides’ to implement. Lastly, if the effect is Deny, EPAC will only set the Policy to deny in one of the Initiatives and set the effect to Audit for the remaining Initiatives; this prevents the already difficult to read error messages blocked by a Deny from getting more complex.

Efficient Exemption definitions

Normally when creating an Exemption for a Policy if that Policy is included in multiple Initiatives assigned (a frequent occurrence with built-in security and regulatory compliance Initiatives), you must define one exemption per Policy, per Assignment, and per Scope and find (tedious) the policyDefinitionreferenceId in the Initiative definition. For an average exemption, this can be tens or even hundreds of entries in the definition files.

Staring with v10.0.0, this can be simplified to one entry, defining instead of a policyAssignmentId and policyDefinitionReferenceId, the Policy definition Id or Name. EPAC will find all the assignments which include that definition either directly assigned, or due to being included in an assigned Initiative and create one exemption per relevant Assignment. EPAC will generate unique names and augment the displayName and description for the exemptions.

Staring in v10.1.0, instead of specifying one scope per entry, you can define a scopes array. EPAC will generate a set of exemptions for each scope while augmenting the displayName and description with the last part of the scope (or a string override in the definition). Assuming five Assignments containing the Policy definition with the specified Id would generate ten Exemptions. If you specified 16 scopes, that number will be an impressive 80 Exemptions.

 

 

  {
      "exemptions": [
          {
              "name": "short-name",
              "displayName": "Descriptive name displayed on portal",
              "description": "More details",
              "exemptionCategory": "Waiver",
              "scopes": [
                  "humanReadableName:/subscriptions/11111111-2222-3333-4444-555555555555",
                  "/subscriptions/11111111-2222-3333-4444-555555555556/resourceGroups/resourceGroupName1",
              ],
              "policyDefinitionId": "/providers/microsoft.authorization/policyDefinitions/00000000-0000-0000-0000-000000000000",
          }
      ]
  }

 

 

What we learned

Security and regulatory compliance Initiatives

Limit the number of assigned Initiatives to a handful or less. Always assign ‘Microsoft cloud security benchmark’; Defender for Cloud relies on the input generated by the included Policies.

Management Groups and Policy Resources

Custom Policy/Initiative Definitions and Policy Assignments need to be deployed at a scope. They should always be deployed at the top Management Group (MG) in each tenant. That MG should be the single MG (no siblings) underneath the “Tenant root group” as recommended by Microsoft (see https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/design-areas) or at the actual “Tenant root group” if you are not following Microsoft’s recommendation verbatim. Keep the management group names and display names the same readable name to keep Policy and RBAC elements readable. Do not use GUIDs or other obfuscated names for management groups.

Policy Assignments

Policies are inert elements in Azure until you create a Policy Assignment at a scope. Each assignment should:

  • Define semi-readable short name (limited to 24 characters by Azure)
  • Define a readable displayName (visible in Portal).
  • May have metadata, such as a work item id.

Assignments containing Policies with Modify or DeployIfNotExists Policies require a Managed Identity (MI). The MI must be granted Azure roles, as specified in the details section of the Policy rule. EPAC calculates these. I prefer System-assigned Managed Identity SPN (service principal names) since they cannot be used outside a single assignment, eliminating the minimal (Azure provides controls for the usage) threat of malicious usage. However, to reduce the number of role assignments, user-assigned MI can be used.

Custom Definitions

First question the need for any custom Policy/Initiative definition requested. While the built-in Policies are not perfect, the choices made are often made due to constraints and conflicts between settings and include tradeoffs in risk versus usability. If you still think you need custom definitions, sleep on it, and revisit the topic one more time.

If you have multiple tenants, the same definition should be propagated to every tenant (DRY principle) by EPAC. Do not use a separate repo which would cause copy/paste issue (WET anti-pattern).

Policy Exemptions

Even with the best intentions some Policies may get in the way. If there is a business reason within acceptable risk parameters, you can grant an Exemption.

Exemptions come in two flavors (without any technical meaning):

  • Mitigated – Most often used for permanent exemptions. An example is allowing public IP addresses for a storage account which is used as an upload folder AND mitigations, such as Virus scans and deleting processed data.
  • Waiver – Most often used for temporary exemptions to allow a solution team to fix their non-compliant deployment. Generally granted until Monday after the ETA (estimated time of arrival) for the fix.

Exemptions allow metadata. Add a link in metadata to the work item (e.g., Azure DevOps work item, GitHub issue, Jira ticket, etc.) to keep a record of why the exemption was granted and who granted it.

If you exempt an entire subscription with a Mitigated, it is likely that you should have used notScope (called Excluded Scope in Azure Portal) in the Assignment instead.

Warning: When you delete a Policy Assignment with Exemptions, then the Exemptions are not deleted and become orphaned.

Operating Azure Policy

Operational tasks (e.g., Remediation tasks, generating documentation) must be scripted. Do not use CI/CD tools to execute operational tasks since CI/CD is intended to deploy resources, not to operate those resources.

Keeping track of built-in Policy changes

I frequently consult AzAdvertizer (https://www.azadvertizer.net/). In addition, I keep track of changes by cloning and following Microsoft’s official Azure Policy repo on GitHub (https://github.com/Azure/azure-policy/tree/master/built-in-policies). When I receive an email about a merged PR (pull request), I’ll fetch the latest version from GitHub into my clone. This allows me to use Visual Studio Code on my local clone instead of using Azure Portal or GitHub web interface.

That’s it for this round

Remember to thoroughly test the code and policies in a safe environment before deploying to production. If there are any issues with the code, please raise a GitHub Issue.

Until next time.

Disclaimer

The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.

Version history
Last update:
‎May 01 2024 08:47 AM
Updated by: