Deep Dive: Secure Orchestration of Confidential Containers on Azure Kubernetes Service
Published May 17, 2024

Introduction

Building on our previous blog post about Confidential Containers on Azure Kubernetes Service (AKS) powered by Azure Linux, this blog post dives into the design and implementation of the stack’s security policy. The security policy feature is a critical building block for the trustworthy orchestration of confidential Kubernetes workloads on IaaS platforms. The feature protects the interface between the cloud provider’s stack and the user's trusted computing base (TCB). The user's confidential workloads run inside the TCB within virtual machines (VMs) whose memory is encrypted by a hardware-based Trusted Execution Environment (TEE), such as AMD Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP). Trust in the security policy and its enforcement can be established via remote attestation. We will explore how this trust is established and how end users can generate and apply security policies using our new genpolicy tool.

 

Protecting the Trust Boundary Interface

 

Figure 1: Simplified overview of the components of the Confidential Containers computing stack.

 

One of the main components of the Kata Containers system architecture is the Kata Agent, which we will refer to as the Agent. When using Kata Containers to implement Confidential Containers, the Agent executes inside the hardware-based TEE and is therefore part of the TCB. As shown in Figure 1, the Agent provides a set of ttRPC APIs allowing the system components outside of the TEE, i.e., the Kata Shim, to create and manage Kubernetes pods inside confidential VMs (CVMs), transparently to the Kubernetes stack. From a confidentiality standpoint, the Shim-to-Agent communication represents a control channel crossing the TCB boundary, which is why the Agent must protect itself from potentially malicious Agent API calls.

To systematically secure this control channel, we designed and implemented a security policy feature for the Kata Containers project, known as the Kata “Agent Policy” feature. It allows the owner of a confidential pod deployment to specify a security policy document before running the pod. This document dictates which API calls are allowed and which are blocked for the pod.

The policy document can be added as an encoded string annotation to Kubernetes pod manifests, allowing it to naturally travel through kubelet and containerd to the Kata Shim, which we will refer to as the Shim. The Shim then provides the policy document to the Agent during early CVM initialization. Since the policy document travels through components that are not part of the TCB before reaching the Agent, it is not inherently trustworthy at CVM initialization. We can establish its trustworthiness through remote attestation, which is explained in a later section.

 

Structure of the Security Policy Document

The security policy document is composed using the Rego policy language and describes all of the Agent’s ttRPC API calls, along with their parameters, that are expected for creating and managing the confidential pod. This section takes a closer look at the three high-level sections of the policy document: rules, default values, and data.

 

Rules

The rules section is a static part of the policy document, independent of the individual pod deployment. Rules express the semantics for validating API calls and, in particular, implement input parameter validation for parameterized calls. A simple example is the rule for the unparameterized WriteStreamRequest call, which allows the call only if the policy document’s default value for the call is set to true:

 

WriteStreamRequest {
    policy_data.request_defaults.WriteStreamRequest == true
}

 

Let’s now look at a rule for the parameterized CreateContainerRequest call, which implements input parameter validation:

 

CreateContainerRequest {
    i_oci := input.OCI
    i_storages := input.storages
    ...

    some p_container in policy_data.containers

    p_pidns := p_container.sandbox_pidns
    i_pidns := input.sandbox_pidns
    p_pidns == i_pidns

    p_oci := p_container.OCI

    p_oci.Version == i_oci.Version
    p_oci.Root.Readonly == i_oci.Root.Readonly
    ...

    p_storages := p_container.storages
    ...
}

 

This rule validates all input parameters by comparing them with the expected values from the document’s data section, and rejects the call when a change to fields such as the command line, storage mounts, execution security context, or environment variables is detected. In the code snippet, variables prefixed with “i_” hold the input parameters, whereas variables prefixed with “p_” hold the expected values from the policy document’s data section.

 

Default Values

Default values for API calls determine the behavior when no rule for a given call evaluates to true:

 

default CreateContainerRequest := false

 

The default value of false means that any CreateContainerRequest call will be rejected unless a policy rule explicitly allows it.

 

default GuestDetailsRequest := true

 

The default value of true means that calls from outside of the TEE to the GuestDetailsRequest API are always allowed. One would set this default value to true when the data returned by this API is not considered sensitive to the confidentiality of the workloads.

 

Data

The data section contains expected values that are derived from a Kubernetes pod manifest and that are compared, during policy rule evaluation, with the actual values from the input parameters of a ttRPC API request. The data section therefore depends directly on the individual pod deployment and its containers. Based on the result of this comparison, a rule either allows or denies the call by returning true or false.

Coming back to the above rule for CreateContainerRequest, all the characteristics of a container are specified in a fine-grained way in the policy document’s data section: image integrity information, command line, storage volumes and mounts, the execution security context, environment variables, and other fields from the Open Container Initiative (OCI) container runtime configuration. An example of the command-line section is the following:

 

policy_data := {
    "containers": [
        {
            "OCI": {
                ...
                "Args": [
                    "/bin/sh"
                ],
                ...
            },
            ...
        },
        ...

 

Any diverging command line observed in the CreateContainerRequest for the given container will be rejected by the policy. Another example is the validation of the storages input field of the CreateContainerRequest:

 

policy_data := {
    "containers": [
        {
            "OCI": {
                ...
            },
            "storages": [
                {
                    "driver": "blk",
                    "driver_options": [],
                    "source": "",
                    "fstype": "tar",
                    "options": [
                        "$(hash0)"
                    ],
                    "mount_point": "$(layer0)",
                    "fs_group": null
                },
                ...

 

This example shows how the security policy constrains the way block devices can be mapped from the host into the CVM. Here, a block device with a tar filesystem is expected to be mounted at a specific mount point inside the CVM.

 

Policy Enforcement in the Kata Agent

The Agent is responsible for enforcing the security policy by evaluating the policy for each Agent ttRPC API call. We implemented the enforcement using the Open Policy Agent (OPA), a graduated project of the Cloud Native Computing Foundation (CNCF). Before carrying out the actions corresponding to an API call, the Agent queries OPA through the OPA REST API to check whether the policy rules and data allow or block the call. The Agent provides the policy document and all input data from the API request parameters to OPA as a JSON representation. OPA then uses the rules to check whether the inputs are consistent with the policy data, trying to find at least one rule with the same name as the ttRPC API call that evaluates to true given the call’s input parameters.

For example, when the Agent receives a CreateContainerRequest call, all rules defined in the policy under the name CreateContainerRequest are evaluated. If at least one of them returns true, OPA returns a true result to the Agent, and the Agent creates the container as requested by the Shim. On the other hand, if the API inputs are not allowed by the document’s rules, or if no matching rule exists, OPA returns the default value for that API, or false when no default value is supplied. When false is returned, the Agent rejects the API call with a “blocked by policy” error message.

We achieved this behavior by adding a gate to the Agent’s RPC interface implementation for each call, invoking the is_allowed() function early in every call handler:

 

async fn exec_process(…) -> ttrpc::Result<Empty> {
    ...
    is_allowed(&req).await?;
    ...
}

 

The function enforces the logic described above and can be found in the Agent’s policy implementation.
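To make this concrete, here is a minimal sketch of what such a gate could look like. It is illustrative only: the upstream Agent derives the rule name from the request type and talks to a locally running OPA, but the agent_policy package path, the reqwest-based HTTP client (with its json feature), and the error handling below are simplifying assumptions, not the actual implementation.

// Illustrative sketch of a policy gate, not the upstream Kata Agent code.
// Assumes OPA listens on its default local port and that the policy was
// loaded under the (assumed) package name "agent_policy".
async fn is_allowed<T: serde::Serialize>(api: &str, req: &T) -> Result<(), String> {
    // OPA's REST API evaluates the document at /v1/data/<package>/<rule>
    // when POSTed a JSON body of the form {"input": ...}.
    let url = format!("http://localhost:8181/v1/data/agent_policy/{api}");
    let body = serde_json::json!({ "input": req });

    let resp: serde_json::Value = reqwest::Client::new()
        .post(&url)
        .json(&body)
        .send()
        .await
        .map_err(|e| e.to_string())?
        .json()
        .await
        .map_err(|e| e.to_string())?;

    // OPA returns {"result": true} when at least one rule named <api>
    // evaluates to true, or when the policy's default value is true.
    match resp.get("result").and_then(|r| r.as_bool()) {
        Some(true) => Ok(()),
        _ => Err(format!("{api} is blocked by policy")),
    }
}

A request is therefore carried out only if the policy query succeeds; any other outcome, including a missing rule, falls through to the default value and ultimately to rejection.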

An important policy enforcement aspect of the CreateContainerRequest call is the Agent’s protection of the integrity of block devices, as described in the previous section’s example for the storages input field of the CreateContainerRequest and replicated below.

 

policy_data := {
    "containers": [
        {
            "OCI": {
                ...
            },
            "storages": [
                {
                    "driver": "blk",
                    "driver_options": [],
                    "source": "",
                    "fstype": "tar",
                    "options": [
                        "$(hash0)"
                    ],
                    "mount_point": "$(layer0)",
                    "fs_group": null
                },
                ...

 

As each container image layer is exposed as a read-only virtio block device to the CVM, the Agent protects the integrity of these block devices using the dm-verity feature of the CVM’s Linux kernel, with the policy enforcing the root value of the dm-verity hash tree. The policy document’s data section contains the expected root hash for each container image layer (hash0 in the above example). These root values are verified at runtime by the Agent, which calls OPA to compare the received input values with the expected values using the rule semantics defined by the policy document.
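For intuition about the mechanism, here is a sketch of how a single layer could be opened under dm-verity protection if done with the veritysetup CLI. The device path, target name, and hash-tree offset are assumptions made for this example; the Agent programs the kernel device-mapper directly rather than invoking veritysetup.

use std::process::Command;

// Sketch only: open a read-only layer block device under dm-verity,
// using the root hash taken from the policy's data section (the value
// substituted for "$(hash0)"). Paths and the hash-tree offset are
// assumptions for illustration.
fn open_verity_device(data_dev: &str, name: &str, root_hash: &str) -> std::io::Result<()> {
    let status = Command::new("veritysetup")
        .args(["open", data_dev, name, data_dev, root_hash])
        // Assumed layout: the hash tree is appended to the data device.
        .arg("--hash-offset=4096")
        .status()?;

    if !status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            "veritysetup rejected the device",
        ));
    }
    // The verified device appears at /dev/mapper/<name>; reading any
    // block whose hash does not match the tree fails at runtime.
    Ok(())
}

With this, not only the security policy enforcement but also the integrity of the container image layers can be verified through remote attestation, as described next.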

 

Security Policy and Remote Attestation

Before handling sensitive information, confidential workloads should perform remote attestation to prove to any relying party that the control plane has orchestrated exactly the desired workload, with the user’s desired policy, on exactly the expected versions of the TEE and of the CVM’s software stack.

Figure 2 depicts the confidential container creation flow, from a user deploying a pod manifest to the workload running in the CVM. The pod manifest, depicted in orange, reaches the Shim, which in turn brings up the CVM with the help of the virtual machine monitor (VMM) and hypervisor (HV). The Shim uses the CreateVM call that the VMM exposes through its API.

 

Figure 2: Simplified CVM and container creation flow of Confidential Container implementations.

 

Before triggering this call, the Shim computes the SHA256 hash of the user-provided policy document, which the VMM uses to set a field measured by the TEE. In the case of AMD SEV-SNP, the VMM sets the HOST_DATA field to this hash value, which the AMD SEV-SNP TEE includes in the attestation evidence. This creates a strong binding between the contents of the policy and the CVM. The TEE field cannot be modified later by software executed inside or outside of the CVM; however, it remains readable within the TEE after launch.

As the Shim launches the CVM and the CVM OS boots, the Agent starts up using an initial security policy that is included in the CVM’s root file system. This initial policy only allows the Shim to set a new policy document through the SetPolicyRequest ttRPC call once the Agent’s ttRPC interface becomes available. Upon receiving the policy from the Shim, the Agent verifies that the hash of the policy matches the value in the immutable TEE field, and rejects the incoming policy on a mismatch. If the hash matches, the Agent enforces the new policy and listens for further ttRPC calls. After the Agent receives and validates the Shim’s CreateContainerRequest call, it creates the workload container pertaining to the user’s pod manifest.
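The hash check itself is simple. The sketch below shows the idea; read_host_data is a hypothetical helper standing in for reading the 32-byte HOST_DATA field from the SEV-SNP attestation report, not an actual upstream API.

use sha2::{Digest, Sha256};

// Sketch of the Agent-side check when SetPolicyRequest arrives.
// `read_host_data` is a hypothetical helper returning the 32-byte
// HOST_DATA field from the SEV-SNP attestation report.
fn verify_policy(policy: &str, read_host_data: impl Fn() -> [u8; 32]) -> Result<(), String> {
    // The Shim set HOST_DATA to the SHA256 hash of the policy before
    // CVM launch, so the digest computed inside the TEE must match.
    let digest: [u8; 32] = Sha256::digest(policy.as_bytes()).into();

    if digest == read_host_data() {
        Ok(()) // this policy is the one bound to the CVM at launch
    } else {
        Err("policy hash does not match the TEE's HOST_DATA field".into())
    }
}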

The remote attestation procedure can be implemented in different ways. One option is to implement it in a container running inside the CVM that obtains the signed attestation evidence from the AMD SEV-SNP TEE. With the policy hash being part of the measured TEE field described above, the attestation service can verify the integrity of the security policy by comparing the value of this field with the expected hash of the pod policy preconfigured by the user.

Microsoft Azure Attestation (MAA) provides an end-to-end attestation solution for workloads in Azure. We have added support for Confidential Containers on AKS to MAA by utilizing the open-source confidential sidecar container as the attestation client. MAA then only needs to be seeded with the relevant policy measurements for confidential pods to enable remote attestation.
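A relying party, or a script seeding an attestation service, can precompute the expected measurement directly from the pod manifest. Below is a minimal sketch, assuming the annotation key shown in the next section and the serde_yaml, base64, sha2, and hex crates; it takes the digest over the decoded policy text, matching the Shim behavior described above.

use base64::Engine;
use sha2::{Digest, Sha256};

// Sketch: derive the expected HOST_DATA value for a pod manifest by
// hashing the base64-decoded policy annotation. Crate choices and
// error handling are illustrative.
fn expected_policy_hash(pod_yaml: &str) -> Result<String, Box<dyn std::error::Error>> {
    let manifest: serde_yaml::Value = serde_yaml::from_str(pod_yaml)?;
    let annotation = manifest["metadata"]["annotations"]
        ["io.katacontainers.config.agent.policy"]
        .as_str()
        .ok_or("policy annotation not found")?;

    // The measured value is the hash of the decoded policy document.
    let policy = base64::engine::general_purpose::STANDARD.decode(annotation)?;
    Ok(hex::encode(Sha256::digest(&policy)))
}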

 

Policy Document Creation using the genpolicy Tool

To simplify creating the policy document for container workloads, we built the genpolicy tool, which automates the generation of the security policy document with its rules, default values, and data derived from the user’s individual Kubernetes pod manifest. The genpolicy tool encodes the security policy document in base64 format and adds it to the Kubernetes pod manifest as an annotation. An example is a pod manifest for Confidential Containers on AKS, where the runtimeClassName field indicates that the pod is to be run as a confidential container:

 

apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.katacontainers.config.agent.policy: cGFja2FnZSBhZ2VudF<...>
spec:
  runtimeClassName: kata-cc-isolation
...

 

The annotation value can be decoded using “base64 -d”, revealing the set of default values, rules, and data, for example:

 

...
# default values for API calls
default CopyFileRequest := false
...
default ExecProcessRequest := false
...

# rules for API calls
CreateContainerRequest { ... }
...
CreateSandboxRequest { ... }
...
WriteStreamRequest { ... }
...

# data, for instance listing the pod’s containers and fields
policy_data := {
  "containers": [
    {
      "OCI": {
        "Version": "1.1.0-rc.1",
  ...
}

 

To generate the policy, run the following command:

 

genpolicy -y <path/to/pod.yaml>

 

This embeds the policy into the pod YAML file. The pod manifest can then be deployed as usual onto a cluster supporting confidential containers, for instance using:

 

kubectl apply -f <path/to/pod.yaml>

 

If a policy violation is detected, the Agent refuses to execute the relevant ttRPC call. For example, with default ExecProcessRequest := false, an attempt to exec into a container of the pod is rejected. A blocked CreateContainerRequest surfaces as the following failure when using kubectl describe pod:

 

Error: failed to create containerd task: failed to create shim task: “CreateContainerRequest is blocked by policy”

 

Users should review the auto-generated policy document, verify that it fits their desired confidentiality goals, and modify it as needed. To change the behavior of the tool, the user can specify further parameters:

 

genpolicy -p <path/to/rules.rego> -j <path/to/genpolicy-settings.json> -y <path/to/pod.yaml>

 

Using these parameters, the policy’s default values, rules, and data can be modified by supplying custom rules.rego and settings JSON files. More details and examples are provided in the upstream Kata Agent policy documentation.

To simplify genpolicy usage in Azure, the Azure CLI ‘confcom’ extension wraps the latest releases of the genpolicy tool, enabling end users to generate pod security policies via the Azure CLI, which is as simple as calling:

 

az confcom katapolicygen -y <path/to/pod.yaml>

 

An end-to-end example, from cluster deployment to running a confidential container with an attached security policy, can be found in our confidential container deployment documentation.

 

Conclusion

We have walked through the security policy of our Confidential Containers on AKS offering: from the syntax of the policy file, to its enforcement with OPA, to establishing trust through remote attestation, to automatically generating and embedding the policy using our genpolicy tool. The Azure Linux team collaborated with the Confidential Containers and Kata Containers communities on the design and implementation of Confidential Containers, as part of Microsoft’s commitment to open source. We contributed the policy implementation upstream: the Agent code responsible for enforcing the security policy, the Shim and Agent code for setting the policy and reading its measured hash value with different VMMs and HVs for AMD SEV-SNP and Intel TDX, and the genpolicy tool for creating the security policy document. Along with this, a how-to for the policy feature and a README for the genpolicy tool are available upstream. We will continue to contribute to and expand the security policy implementation upstream with the Kata Containers and Confidential Containers communities, so join us there to build this feature with us.
