Confidential VMs on Azure
Published May 31, 2023

Microsoft’s virtualization stack, which powers the Microsoft Azure Cloud, is made up of the Microsoft hypervisor and the Azure Host OS. Security is foundational for the Azure Cloud, and Microsoft is committed to the highest levels of trust, transparency, and regulatory compliance (see Azure Security and Hypervisor Security to learn more). Confidential VMs are a new security offering that allows customers to protect their most sensitive data in use and during computation in the Azure Cloud.

 

In this blog we’ll describe the Confidential VM model and share how Microsoft built the Confidential VM capabilities by leveraging confidential hardware platforms (we refer to the hardware platform as the combination of the hardware and architecture specific firmware/software supplied by the hardware vendor). We will give an overview of our goals and our design approach and then explain how we took steps to enable confidential VMs to protect their memory, as well as to provide them secure devices like a vTPM, to protect their execution state and their firmware, and lastly to allow them to verify their environment through remote attestation.

What is a Confidential VM?

The Confidential Computing Consortium defines confidential computing as “the protection of data in use by performing computation in a hardware-based, attested Trusted Execution Environment (TEE)”, with three primary attributes for what constitutes a TEE: data integrity, data confidentiality, and code integrity1. A Confidential VM is a VM executed inside a TEE, “whereby code and data within the entire VM image is protected from the hypervisor and the host OS”1. As crazy as this sounds – a VM that runs protected from the underlying software that makes its very existence possible – a growing community is coming together to build the technologies to make this possible.

 

For confidentiality, a Confidential VM requires CPU state protection and private memory to hold contents that cannot be seen in clear text by the virtualization stack. To achieve this, the hardware platform protects the VM’s CPU state and encrypts its private memory with a key unique to that VM. The platform further ensures that the VM’s encryption key remains a secret by storing it in a special register which is inaccessible to the virtualization stack. Finally, the platform ensures that VM private memory is never in clear text outside of the CPU complex, preventing certain physical attacks such as memory bus snooping, cold-boot attacks, etc. 

 

For integrity, a Confidential VM requires protection to ensure its memory can only be modified by that VM. To achieve this, the hardware platform both protects the contents of the VM’s memory against software-based integrity attacks and verifies address translation. The latter ensures that the address space layout (memory view) of the VM can only be changed with the cooperation and agreement of the VM.

 


An overview of our approach to Confidential VMs

As a type 1 hypervisor, Microsoft’s hypervisor runs directly on the hardware, and all operating systems, including the host OS, run on top of it. The hypervisor virtualizes resources for guests and controls the capabilities that manage memory translations. The host OS provides the functionality to run and manage guest VMs, including memory management (i.e., providing guest VMs with interfaces for accessing memory) and device virtualization (i.e., providing virtual devices to guest VMs).

 


 

Because they take care of virtualizing and assigning resources to guest VMs, virtualization stacks have until recently assumed full access to guest VM state. To support running Confidential VMs, we evolved Microsoft's virtualization stack to break those assumptions. We set a boundary to protect the guest VM from our virtualization stack, and we leverage different hardware platforms to enforce this boundary and help guarantee the Confidential VM’s attributes. Azure Confidential VMs today leverage the following hardware platforms: AMD SEV-SNP (generally available) and Intel TDX (in preview).

 

We wanted to enable customers to lift and shift workloads with little or no effort, so one of our design goals was to support running mostly unmodified guest operating systems inside Confidential VMs. Operating systems were not designed with confidentiality in mind, so we created the guest paravisor to bridge between the classic OS architecture and the needs of confidential computing. The guest paravisor implements the TEE enlightenments on behalf of the guest OS so the guest OS can run mostly unmodified, even across hardware platforms; this can be viewed as the “TEE shim”1. You can think of the guest paravisor as a firmware layer that acts like a nested hypervisor. A guest OS that is fully enlightened (modified) to run as a Confidential guest can run without a guest paravisor (this is also supported in Azure today but is out of scope for this blog).

 

Virtualization-based security (VBS) lets a VM have multiple layers of software running at different privilege levels. We decided to extend VBS to run the guest paravisor at a higher privilege level than the guest OS in a hardware platform-agnostic manner, using the hardware platform's confidential guest privilege levels. Because of this privilege separation, the guest OS and UEFI don’t have access to secrets held in guest paravisor memory. Our model allows the paravisor to provide a Confidential VM with privilege-separated features, including Secure Boot and its own dedicated vTPM 2.0. We’ll cover the benefits of these features in the attestation section.

 

On the guest OS side, we evolved the device drivers (and other components) to enable both Windows and Linux guest operating systems to run inside Confidential VMs on Azure. For the Linux guest OS, we collaborated with the Linux kernel community and with Linux distros such as Ubuntu and SUSE Linux. A Confidential VM turns the threat model for a VM upside down: a core pillar of any virtualization stack is protecting the host from the guest, but with Confidential VMs the focus is on protecting the guest from the host as well. This means that guest-host interfaces must also evolve to break previous assumptions. We therefore analyzed and updated these interfaces, including the guest device drivers themselves, to take a defensive posture that examines and validates data items coming from the host (e.g., ring buffer messages). This was one of the design principles for our paravisor: it allowed us to move logic from the host into the guest, simplifying the guest-host interfaces so we can better protect the guest. We continue to evolve these interfaces to continuously improve the defensive model of Confidential VMs.
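To make this defensive posture concrete, the sketch below shows the kind of validation a guest driver can apply to a host-supplied ring-buffer message: copy it out of shared memory first, then check every host-controlled field against the private copy. The structures, limits, and names here are hypothetical illustrations, not the actual VMbus definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format for a host-supplied ring-buffer message. */
struct ring_msg {
    uint32_t type;      /* message type, claimed by the host */
    uint32_t len;       /* payload length, claimed by the host */
};

#define MSG_TYPE_MAX 16
#define MSG_LEN_MAX  4096

/* Copy the message out of shared memory first, then validate the private
 * copy. Validating data in place would let the host mutate fields after
 * they were checked (a time-of-check/time-of-use race). */
static bool read_and_validate_msg(const volatile uint8_t *shared, size_t avail,
                                  struct ring_msg *hdr,
                                  uint8_t *payload, size_t payload_cap)
{
    if (avail < sizeof(*hdr))
        return false;
    memcpy(hdr, (const void *)shared, sizeof(*hdr));

    /* Reject out-of-range type and length before touching the payload. */
    if (hdr->type >= MSG_TYPE_MAX || hdr->len > MSG_LEN_MAX)
        return false;
    if (hdr->len > avail - sizeof(*hdr) || hdr->len > payload_cap)
        return false;

    memcpy(payload, (const void *)(shared + sizeof(*hdr)), hdr->len);
    return true;
}
```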


Memory Protections

Two types of memory exist for Confidential VMs: private memory, where computation happens by default, and shared memory, used to communicate with the virtualization stack for any purpose, e.g., device I/O. Any page of memory can be either private or shared, and we call this its page visibility. A Confidential VM should be configured appropriately to make sure that any data placed in shared memory is protected (via TLS, BitLocker, dm-crypt, etc.).
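For example, the Linux kernel already exposes x86 helpers for this kind of conversion, set_memory_decrypted() and set_memory_encrypted(). The sketch below shows how a Linux guest component might obtain a shared page; it is a simplified illustration, not Azure's actual implementation.

```c
#include <linux/gfp.h>
#include <linux/mm.h>
#include <asm/set_memory.h>

/* Allocate one page and convert it from private to shared so the host
 * can see its contents. */
static void *alloc_shared_page(void)
{
    struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);

    if (!page)
        return NULL;

    /* Ask the platform to make this page visible to the host. On
     * failure the page's state is indeterminate, so leak it rather
     * than return it to the allocator. */
    if (set_memory_decrypted((unsigned long)page_address(page), 1))
        return NULL;

    return page_address(page);
}

/* Convert a shared page back to private before freeing it, so the host
 * never retains access to reused memory. */
static void free_shared_page(void *addr)
{
    if (set_memory_encrypted((unsigned long)addr, 1))
        return; /* conversion failed: leak rather than reuse */
    __free_page(virt_to_page(addr));
}
```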

 

All accesses use private pages by default, so when a guest wants to use a shared page, it must do so explicitly by managing page visibility. We evolved the necessary components (including the virtual device drivers in the guest that communicate with the virtual devices on the host via VMbus channels) to use our mechanism for managing page visibility. This mechanism essentially has the guest maintain a pool of shared memory. When a guest wants to send data to the host, for example to send a packet via networking, it allocates a bounce buffer from that pool and then copies the data from its private memory into the bounce buffer. The I/O operation is initiated against the bounce buffer, and the virtual device on the host can read the data.
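The sketch below illustrates that send path. The pool and channel functions (shared_pool_alloc(), vmbus_send(), etc.) are hypothetical stand-ins for illustration, not real APIs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helpers: the pool hands out memory that is already
 * shared with the host; vmbus_send() initiates I/O against it. */
void *shared_pool_alloc(size_t len);
void  shared_pool_free(void *buf, size_t len);
int   vmbus_send(const void *buf, size_t len);

int send_packet(const void *private_data, size_t len)
{
    /* 1. Grab a bounce buffer from the pre-shared pool. */
    void *bounce = shared_pool_alloc(len);
    int ret;

    if (!bounce)
        return -1;

    /* 2. Copy from private (encrypted) memory into the shared buffer;
     *    only this copy is ever visible to the host. */
    memcpy(bounce, private_data, len);

    /* 3. Initiate the I/O operation against the bounce buffer. */
    ret = vmbus_send(bounce, len);

    shared_pool_free(bounce, len);
    return ret;
}
```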

 

The size of the pool of shared memory is not fixed; our memory conversion model is dynamic so that the guest can adapt to the needs of the workload. The memory used by the guest can be converted between private and shared as needed to support all the I/O flows. Guest operating systems can choose to make use of our dynamic memory conversion model. The confidential computing platform prevents the hypervisor, or anything other than the code inside the Confidential VM, from making pages shared. Converting memory is therefore always initiated by the guest, and it kicks off both guest actions and host actions so that our virtualization stack can coordinate with the hardware platform to grant or deny the host access to the page.

 


Protecting Emulated Devices

For compatibility, the Azure Host OS provides emulated devices that closely mirror hardware, and our hypervisor reads and modifies a VM’s CPU state to emulate the exact behavior of each device. This allows a guest to use an unmodified device driver to interact with its emulated device. Since a Confidential VM’s CPU state is protected from our virtualization stack, the host OS can no longer emulate devices for it. As part of our goal to enable customers to run a mostly unmodified guest OS inside a Confidential VM, we didn’t want to eliminate emulated devices. Therefore, we evolved our virtualization stack to support device emulation inside the guest instead of the host OS, moving emulated devices into the guest paravisor.
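To illustrate the inversion, here is a hedged sketch of an I/O intercept being handled inside the guest paravisor rather than on the host. All structures and functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical intercept record delivered to the paravisor when the
 * guest OS touches a port belonging to an emulated device. */
struct io_intercept {
    uint16_t port;
    bool     is_write;
    uint32_t value;     /* value being written, for writes */
};

/* Hypothetical per-device emulation routines. */
uint32_t serial_emulate_read(uint16_t port);
void     serial_emulate_write(uint16_t port, uint32_t value);

/* Runs inside the Confidential VM at the paravisor privilege level.
 * The guest's CPU state never leaves the TEE: the paravisor inspects
 * the access and completes it itself instead of exiting to the host. */
uint32_t handle_io_intercept(const struct io_intercept *io)
{
    if (io->is_write) {
        serial_emulate_write(io->port, io->value);
        return 0;
    }
    return serial_emulate_read(io->port);
}
```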

 


Protecting Runtime State

We believe a Confidential VM requires assurances about the integrity of its execution state, so we took steps to provide this protection to Confidential VM workloads. To support this, the hardware platform provides mechanisms to protect a Confidential VM from unexpected interrupts or exceptions during its execution. Normally, our hypervisor emulates the interrupt controller (APIC) to generate emulated interrupts for guests. To help a guest OS running inside a Confidential VM handle interrupts defensively, the guest paravisor performs interrupt proxying, validating an interrupt coming from the hypervisor before re-injecting it into the guest OS. Additionally, with Confidential VMs there is a new exception type that must be handled by the guest VM instead of the virtualization stack. This exception is generated only by the hardware and is hardware platform specific (e.g., #VC on AMD SEV-SNP and #VE on Intel TDX). The paravisor can handle this exception on behalf of the guest OS for compatibility purposes as well.
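As a sketch of interrupt proxying, the paravisor can check each hypervisor-injected vector against what the guest OS has actually registered before passing it on. The policy table and re-injection helper below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy table: which vectors the guest OS has registered
 * and therefore expects to receive. */
static bool vector_expected[256];

/* Hypothetical paravisor primitive for injecting into the guest OS. */
void reinject_to_guest_os(uint8_t vector);

/* Called in the paravisor when the (untrusted) hypervisor injects an
 * interrupt. Unexpected vectors are dropped: an unsolicited injection
 * could otherwise steer the guest into attacker-chosen handlers. */
void on_host_interrupt(uint8_t vector)
{
    if (!vector_expected[vector])
        return;

    reinject_to_guest_os(vector);
}
```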


Protecting Firmware

A guest VM normally relies on the host OS to store and provide its firmware state and attributes (UEFI variables). For Confidential VMs, we evolved the guest firmware to get trusted UEFI attributes from a new VM Guest State file instead of from the host. This file is packaged as a VHD and can be encrypted before VM deployment, giving a Confidential VM access to persistent storage that is inaccessible to the host. The host only ever interacts with the encrypted VHD (the end-to-end process for its encryption is out of scope for this blog).

 

In the same way, a guest VM normally relies on the host OS to implement authenticated UEFI variables, which ensure that secure UEFI variables are isolated from the guest OS and cannot be modified without detection. To provide a Confidential VM authenticated UEFI variables that are secure from the host as well as the guest OS, the guest paravisor (running in the guest VM but isolated from the guest OS) manages authenticated UEFI variables. When a Confidential VM uses UEFI runtime services to write a variable, the guest paravisor processes the authenticated variable write and persists that data in the VM Guest State file. Our design allows a Confidential VM to persistently store and access VM guest state and guest secrets (e.g., UEFI state and vTPM state) that are secure from the host as well as the guest OS.
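A minimal sketch of that SetVariable path follows; verify_auth_descriptor() and vmgs_persist_variable() are hypothetical names, not a real codebase's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: checks the authentication descriptor (signature,
 * timestamp) that accompanies an authenticated variable update. */
bool verify_auth_descriptor(const char *name, const void *data, size_t len);

/* Hypothetical: persists the variable into the encrypted VM Guest
 * State file, which the host stores but cannot read. */
int vmgs_persist_variable(const char *name, const void *data, size_t len);

int set_authenticated_variable(const char *name, const void *data, size_t len)
{
    /* 1. Reject updates that fail authentication. The guest OS cannot
     *    bypass this check because the paravisor owns the variable
     *    store and runs at a higher privilege level. */
    if (!verify_auth_descriptor(name, data, len))
        return -1;

    /* 2. Persist to the VM Guest State file, out of the host's reach. */
    return vmgs_persist_variable(name, data, len);
}
```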

 


Remote Attestation

According to the industry definition of confidential computing, any TEE deployment “should provide a mechanism to allow validation of an assertion that it is running in a TEE instance” through “the validation of a hardware signed attestation report of the measurements of the TCB”1. An attestation report is composed of hardware and software claims about a system, and these claims can be validated by any attestation service. Attestation for a Confidential VM “conceptually authenticates the VM and/or the virtual firmware used to launch the VM”1.

 

A Confidential VM cannot solely rely on the host to guarantee that it was launched on a confidential computing capable platform and with the right virtual firmware, so a remote attestation service must be responsible for attesting to its launch. Therefore, a Confidential VM on Azure always validates that it was launched with secure, unmodified firmware and on a confidential computing platform via remote attestation with a hardware root of trust. In addition to this, it validates its guest boot configuration thanks to Secure Boot and vTPM capabilities.

 

Once the partition for a Confidential VM on Azure is created and the VM is started, the hardware seals the partition from modification by the host, and the hardware platform provides a measurement of the launch context of the guest. The guest paravisor boots and performs attestation on behalf of the guest OS, requesting an attestation report from the hardware platform and sending this report to an attestation service. Any failure in this attestation verification process results in the VM not booting.
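The flow looks roughly like the sketch below. All function names are hypothetical stand-ins for the hardware interface and the attestation service call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: fills buf with a hardware-signed report over the launch
 * measurement, binding in the caller's nonce; returns the report size. */
size_t hw_get_attestation_report(uint8_t *buf, size_t bufsz,
                                 const uint8_t nonce[32]);

/* Hypothetical: sends the report to the remote attestation service,
 * which checks the hardware signature and the measured launch context. */
bool attestation_service_verify(const uint8_t *report, size_t len);

/* Hypothetical RNG helper. */
void fill_random(uint8_t *buf, size_t len);

bool attest_launch(void)
{
    uint8_t nonce[32];
    uint8_t report[4096];
    size_t  len;

    /* A fresh nonce prevents a stale report from being replayed. */
    fill_random(nonce, sizeof(nonce));

    len = hw_get_attestation_report(report, sizeof(report), nonce);
    if (len == 0)
        return false;

    /* If verification fails, boot does not proceed. */
    return attestation_service_verify(report, len);
}
```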

 

After this, the guest paravisor transfers control to UEFI. During this phase, Secure Boot verifies the startup components, checking their signatures before they are loaded. In addition, thanks to Measured Boot, as these startup components are loaded, UEFI accumulates their measurements into the vTPM. If the OS disk is encrypted, the vTPM will only release the key to decrypt it if the VM’s firmware code and configuration, original boot sequence, and boot components are unaltered.
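The accumulation step is the standard TPM extend operation: each new measurement is folded into the previous PCR value, so a PCR ends up recording the entire ordered boot sequence. A minimal sketch, assuming any external SHA-256 implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_LEN 32

/* Assumed external SHA-256; any correct implementation works here. */
void sha256(const uint8_t *data, size_t len, uint8_t out[SHA256_LEN]);

/* TPM PCR extend: PCR_new = SHA-256(PCR_old || measurement). Because
 * the old value is folded into every update, no component can be
 * "un-measured" after the fact; any altered boot component yields a
 * different final PCR value. */
void pcr_extend(uint8_t pcr[SHA256_LEN],
                const uint8_t measurement[SHA256_LEN])
{
    uint8_t buf[2 * SHA256_LEN];

    memcpy(buf, pcr, SHA256_LEN);
    memcpy(buf + SHA256_LEN, measurement, SHA256_LEN);
    sha256(buf, sizeof(buf), pcr);
}
```

Key release is then conditioned on the final PCR values matching the sealed policy, which is why an altered boot component blocks decryption of the OS disk.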

 


Call to Action

In this blog we described how, in collaboration with our industry partners, we evolved our entire virtualization stack to empower customers to lift and shift their most sensitive Windows and Linux workloads into Confidential VMs. We also gave a deep technical overview of how we protect our customers’ workloads in these Confidential VMs. This is an innovative new area, and we want to share the journey with our customers who are also moving into this new Confidential Computing world. As you use Confidential VMs on Azure, we would love to hear about your experiences or any other feedback, especially as you think of other scenarios (in the enterprise or cloud).

 

- the Core OS Platform team.

 

References:

1. Common Terminology for Confidential Computing, Confidential Computing Consortium, December 2022.
