Azure Policy Remediation with Deployment Scripts
Published Aug 21 2020 12:00 AM
Microsoft

How many times have you wanted to remediate a non-compliant object using Azure Policy, only to find that you can’t because the policy language or the type of object can’t be manipulated in that way? Or maybe you’ve had to write a policy with an audit effect instead of being able to create a deployment to remediate the issue. Deployment Scripts are currently in preview and allow you to execute PowerShell or CLI scripts using Azure Container Instances as part of an Azure Resource Manager template. To put it simply: you can now run a script as part of a template deployment.

 

So my thought was: if I can deploy a template using an Azure Policy DeployIfNotExists effect, why can’t I deploy a Deployment Script object which then runs the code to remediate my non-compliant Azure resource?

 

Well, as it turns out, you can! This enables several interesting use cases which are not possible with the default policy language, such as:

  • Deleting orphaned objects.
  • Changing the license type to Hybrid Benefit for existing Azure machines.
  • Detailed tag application – running a script to build a tag value based on many other resources or conditions.
  • Performing data plane operations on objects like Azure Key Vault and Azure Storage.

The rest of this post takes you through how I set this functionality up to ensure that all Windows virtual machines are running with Azure Hybrid Benefit enabled.

 

Azure Policy

I won’t go into the details of creating basic Azure Policy rules; however, you want to ensure that the effect for your policy is DeployIfNotExists. In my case I started off with a very simple rule that filters out resources I’m not interested in: the rule matches resources whose type is virtual machine and whose image publisher is MicrosoftWindowsServer.

 

 

{
    "if": {
        "allOf": [
            {
                "field": "type",
                "equals": "Microsoft.Compute/virtualMachines"
            },
            {
                "field": "Microsoft.Compute/virtualMachines/storageProfile.imageReference.publisher",
                "equals": "MicrosoftWindowsServer"
            }
        ]
    }
}

 

 

 

For the policy effect I specify DeployIfNotExists, then retrieve the same object and apply some more checks to it. This time, as part of the existence condition, I check whether the license type field is set to Windows_Server.

 

 

"existenceCondition": {
    "allOf": [
        {
            "field": "Microsoft.Compute/virtualMachines/licenseType",
            "exists": true
        },
        {
            "field": "Microsoft.Compute/virtualMachines/licenseType",
            "equals": "Windows_Server"
        },
        {
            "field": "Microsoft.Compute/virtualMachines/licenseType",
            "notEquals": "[parameters('StorageAccountId')]"
        }
    ]
}

 

 

 

In the JSON above there is a check against the StorageAccountId parameter, which has nothing to do with the object we’re running the policy against, but I need to include it because I’ve used it as a parameter for my policy. A license type will never equal a storage account ID, so the notEquals check always returns true; it doesn’t affect the evaluation and by itself won’t trigger the deployment. (If you try to add a policy without consuming all the parameters in the policy rules you will get an error.)
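To preview which machines the rule and existence condition above would flag, the same two checks can be mirrored locally against VM objects. This is only a rough sketch: Find-HybridBenefitCandidate is a hypothetical helper name, it assumes the input objects have the shape returned by Get-AzVM, and it approximates the policy logic rather than reproducing the policy engine.

```powershell
# Hypothetical helper: emits VMs that match the policy "if" block but fail
# the existenceCondition (license type is not Windows_Server).
function Find-HybridBenefitCandidate {
    [CmdletBinding()]
    param(
        # VM objects, for example the output of Get-AzVM
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$VM
    )
    process {
        # Mirrors the "if" block: only Windows Server marketplace images are in scope
        $inScope = $VM.StorageProfile.ImageReference.Publisher -eq 'MicrosoftWindowsServer'
        # Mirrors the existenceCondition: compliant only when licenseType is Windows_Server
        $compliant = $VM.LicenseType -eq 'Windows_Server'
        if ($inScope -and -not $compliant) { $VM }
    }
}

# Usage (requires the Az.Compute module and a signed-in session):
# Get-AzVM | Find-HybridBenefitCandidate | Select-Object Name, ResourceGroupName, LicenseType
```

The StorageAccountId clause is left out here because it never changes the outcome.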

 

The rest of a DeployIfNotExists policy contains the object I want to deploy, and it does get a bit complicated. If I were to deploy just the deployment script object, it would land in the same resource group as the resource being remediated, which isn’t desirable as it would leave a mess of orphaned objects. The deployment script also requires a storage account to work, and I don’t want my subscription littered with random storage accounts. To get around this I create a subscription-level deployment, which deploys a deployment resource, which contains a nested deployment to deploy the deployment script. Confused? Here it is in a diagram and you can follow the previous links or look at the policy itself.

 

policy01.png

The best part is that we don’t have to manage the Azure Container Instance, as the deployment script object does that for us.
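For readers who find the nesting easier to follow in code, here is a minimal sketch that builds the three levels as PowerShell hashtables and renders them as JSON. It reflects my reading of the description above, not the actual policy: the resource names ('remediate-license', 'remediation-rg-deployment'), the API versions, and the location are illustrative assumptions, and most required properties (parameters, identity, script settings) are left out for brevity.

```powershell
# Level 3: the deployment script resource itself (heavily abbreviated)
$deploymentScript = [ordered]@{
    type       = 'Microsoft.Resources/deploymentScripts'
    apiVersion = '2019-10-01-preview'   # assumed preview API version
    name       = 'remediate-license'    # placeholder name
    kind       = 'AzurePowerShell'
}

# Level 2: a deployment resource that targets the dedicated resource group
$resourceGroupDeployment = [ordered]@{
    type          = 'Microsoft.Resources/deployments'
    apiVersion    = '2019-10-01'
    name          = 'remediation-rg-deployment'   # placeholder name
    resourceGroup = 'ACI'                          # where script objects are collected
    properties    = [ordered]@{
        mode     = 'Incremental'
        template = [ordered]@{
            '$schema'      = 'https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#'
            contentVersion = '1.0.0.0'
            resources      = @($deploymentScript)
        }
    }
}

# Level 1: the "details" section of the DeployIfNotExists effect,
# deployed at subscription scope
$policyDetails = [ordered]@{
    deploymentScope = 'subscription'
    deployment      = [ordered]@{
        location   = 'australiaeast'
        properties = [ordered]@{
            mode     = 'incremental'
            template = [ordered]@{
                '$schema'      = 'https://schema.management.azure.com/schemas/2018-05-01/subscriptionDeploymentTemplate.json#'
                contentVersion = '1.0.0.0'
                resources      = @($resourceGroupDeployment)
            }
        }
    }
}

# -Depth must be raised, otherwise the inner levels are flattened to strings
$json = $policyDetails | ConvertTo-Json -Depth 20
$json
```

Reading the output from the outside in gives the same chain: subscription-level deployment, then a resource group deployment, then the deployment script.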

 

What I do have to worry about is the identity the script runs under. It uses a user-assigned managed identity which must have permission to manage the resources; in this case I need to give it Reader and Virtual Machine Contributor rights on the subscription so it can change the license type and update the virtual machine.
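As a rough sketch of how those two role assignments could be made with the Az modules, assuming the user-assigned identity already exists: Grant-ScriptRunnerAccess is a hypothetical helper name and the parameter values in the usage line are placeholders.

```powershell
# Hypothetical helper: grants the managed identity the two roles named above
# at subscription scope.
function Grant-ScriptRunnerAccess {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,  # resource group holding the identity
        [Parameter(Mandatory)][string]$IdentityName,       # e.g. 'scriptRunner'
        [Parameter(Mandatory)][string]$SubscriptionId
    )
    # Look up the identity to obtain its service principal object ID
    $identity = Get-AzUserAssignedIdentity -ResourceGroupName $ResourceGroupName -Name $IdentityName
    $scope = "/subscriptions/$SubscriptionId"
    foreach ($role in 'Reader', 'Virtual Machine Contributor') {
        New-AzRoleAssignment -ObjectId $identity.PrincipalId -RoleDefinitionName $role -Scope $scope
    }
}

# Usage (placeholder values):
# Grant-ScriptRunnerAccess -ResourceGroupName 'ACI' -IdentityName 'scriptRunner' -SubscriptionId (Get-AzContext).Subscription.Id
```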

 

The PowerShell script which runs is very simple:

 

 

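# Parameters are supplied through the deployment script's "arguments" property
Param($ResourceGroupName, $VMName)
# Fetch the VM, switch it to Azure Hybrid Benefit and push the change
$vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName
$vm.LicenseType = "Windows_Server"
Update-AzVM -VM $vm -ResourceGroupName $ResourceGroupName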

 

 

 

The container instance comes with the Az modules already installed, or if you prefer to use the Azure CLI you can specify that in the deployment script object in the template. The script can either be provided inline or linked from an external URL. If you are linking externally and don’t want the script to be in a public location, you might have to provide a SAS URL. I also specify the arguments to pass to the script as a concatenated string. The documentation on deployment scripts provides more information on these arguments, but you can incorporate parameters from the policy, which means the inputs can come from the non-compliant objects. You can also choose to use an existing storage account, or let the deployment script create one for you.

 

 

"forceUpdateTag": "[utcNow()]",
"azPowerShellVersion": "4.1",
"storageAccountSettings": {
    "storageAccountName": "[parameters('StorageAccountName')]",
    "storageAccountKey": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('StorageAccountName')), '2019-06-01').keys[0].value]"
},
"arguments": "[concat('-ResourceGroupName ',parameters('VMResourceGroup'),' -VMName ',parameters('VMName'))]",
"retentionInterval": "P1D",
"cleanupPreference": "OnSuccess",
"primaryScriptUri": "https://raw.githubusercontent.com/anwather/My-Scripts/master/license.ps1"

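If the script is kept in a private blob container rather than a public location, a read-only SAS URL for it could be generated along these lines. This is a sketch under assumptions: New-ScriptSasUri is a hypothetical helper, and the container and blob names in the usage line are placeholders that are not part of the original solution.

```powershell
# Hypothetical helper: returns a full blob URL with a read-only SAS token
# that could be used as the primaryScriptUri value.
function New-ScriptSasUri {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$StorageAccountName,
        [Parameter(Mandatory)][string]$ContainerName,
        [Parameter(Mandatory)][string]$BlobName,
        [int]$ValidHours = 24
    )
    $account = Get-AzStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName
    $sasParams = @{
        Context    = $account.Context
        Container  = $ContainerName
        Blob       = $BlobName
        Permission = 'r'                             # read-only
        ExpiryTime = (Get-Date).AddHours($ValidHours)
        FullUri    = $true                           # return the blob URL plus token
    }
    New-AzStorageBlobSASToken @sasParams
}

# Usage (placeholder names):
# New-ScriptSasUri -ResourceGroupName 'ACI' -StorageAccountName 'deploymentscript474694' -ContainerName 'scripts' -BlobName 'license.ps1'
```

Keep in mind that remediation tasks may run long after the policy is assigned, so the expiry would need to outlive every run that references the URL.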
 

 

 

I’ve linked the policy rule here for you to review. Be careful to observe the flow of the parameters: they are provided in the policy assignment and then passed down through each deployment object as a value. The ‘StorageAccountName’ parameter is a good example of this.

 

Deploying the Solution

Scripts and policies are located in my GitHub repository – you can clone or download them.

The steps to deploy the required resources and policy are as follows:

  1. Ensure that you have the latest version of the Az PowerShell modules available.
  2. Connect to Azure using Connect-AzAccount
  3. Modify the deploy.ps1 script and change the values where indicated.

 

 

$resourceGroupName = "ACI" # <- Replace with your value
$location = "australiaeast" # <- This must be a location that can host Azure Container Instances
$storageAccountName = "deploymentscript474694" # <- Unique storage account name
$userManagedIdentity = "scriptRunner" # <- Change this if you don’t like the name

 

 

  4. Run the deploy.ps1 script. The output should look similar to the following.

policy02.png

 

The script will create a resource group, storage account and deploy the policy definition.

 

Create a Policy Assignment

In the Azure portal Policy section, we can now create the assignment and deploy the policy. Click on “Assign Policy”.

 

pol1.png

 

Select the scope you want to assign the policy to and ensure that the correct policy definition is selected.

pol2.png

 

Click Next and fill in the parameters; the values for these are output by the deploy.ps1 script.

pol3.png

Click Next. You can leave the options on this screen as they are and simply click Review and Create. On the final screen, just click Create.

 

The policy will be assigned, and a new managed identity will also be created which allows us to remediate any non-compliant resources.

pol4.png

 

Testing It Out

To test the policy and remediation task I have built a new Windows Server making sure that I haven’t selected to use Azure Hybrid Benefit.

pp1.png

Once the policy evaluation cycle is complete (use Start-AzPolicyComplianceScan to trigger) I can see that my new resource is now showing as non-compliant.

pp2.png
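The same check can be done from PowerShell instead of the portal. The sketch below triggers an on-demand scan for one resource group and then lists the non-compliant policy states; Get-NonCompliantResource is a hypothetical helper name and the resource group in the usage line is a placeholder.

```powershell
# Hypothetical helper: runs an on-demand compliance scan for a resource group
# and returns the resources currently reported as non-compliant.
function Get-NonCompliantResource {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName
    )
    # Blocks until the evaluation finishes, which can take several minutes
    Start-AzPolicyComplianceScan -ResourceGroupName $ResourceGroupName | Out-Null
    Get-AzPolicyState -ResourceGroupName $ResourceGroupName -Filter "ComplianceState eq 'NonCompliant'" |
        Select-Object ResourceId, PolicyAssignmentName, ComplianceState
}

# Usage (placeholder resource group):
# Get-NonCompliantResource -ResourceGroupName 'my-vm-rg'
```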

 

I can go in now and create a remediation task for this machine by clicking on Create Remediation Task. The task will launch and begin the deployment of my Deployment Script object.

pp3.png
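A remediation task can also be started from PowerShell rather than the portal. This is a minimal sketch, assuming you already have the resource ID of the policy assignment; Start-LicenseRemediation and the default task name are hypothetical.

```powershell
# Hypothetical helper: creates a remediation task for the given policy assignment.
function Start-LicenseRemediation {
    param(
        # Full resource ID of the policy assignment
        [Parameter(Mandatory)][string]$PolicyAssignmentId,
        [string]$RemediationName = 'remediate-hybrid-benefit'
    )
    # Returns the remediation object; the deployments themselves run asynchronously
    Start-AzPolicyRemediation -Name $RemediationName -PolicyAssignmentId $PolicyAssignmentId
}

# Usage (placeholder assignment ID), then check progress later:
# Start-LicenseRemediation -PolicyAssignmentId '/subscriptions/<subscription-id>/providers/Microsoft.Authorization/policyAssignments/<assignment-name>'
# Get-AzPolicyRemediation -Name 'remediate-hybrid-benefit'
```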

 

I can check the resource group I specified for the deployment script objects (ACI) and see the new object in there.

pp4.png

 

Selecting this resource will show the details about the container instance that was launched (it’s been deleted already since the container has run) and the logs. You can also see that during the script deployment it has been able to bring the parameters I specified in the template into the script.

pp5.png

 

And finally, I can check the virtual machine itself and see that Azure Hybrid Benefit has been applied successfully.

pp6.png
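The same verification can be scripted by reading the license type back from the VM. A small sketch, where Test-HybridBenefit is a hypothetical helper name and the values in the usage line are placeholders:

```powershell
# Hypothetical helper: returns $true when the VM's license type is Windows_Server.
function Test-HybridBenefit {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$VMName
    )
    $vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName
    $vm.LicenseType -eq 'Windows_Server'
}

# Usage (placeholder names):
# Test-HybridBenefit -ResourceGroupName 'my-vm-rg' -VMName 'my-vm'
```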

 

When I look at the resource in the Azure Policy blade, it is now showing as compliant.

pp7.png

 

So there you have it: what started as a theory for remediating objects has been proven to work nicely, and now I have the task of looking over all my other policies and seeing what I can remediate using this method.

 

Known Issues:

  • In the example given – Azure Spot instances can’t be remediated using this process
  • My testing cases are small and in no way should reflect your own testing.
  • This is hosted on GitHub – if there are issues or you make changes please submit a PR for review.

 

Disclaimer:

The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.

Last update: Aug 20 2020 01:41 AM