
Windows OS Platform Blog

OpenHCL: the new, open source paravisor

Caroline Perezvargas
Oct 17, 2024

Intro

 

From the beginning of the cloud computing era, virtualization technology has enabled compute workloads to run as virtual machines (VMs) in a server environment. As hardware has evolved, and new functionality has become available, the software stack has kept VMs running seamlessly, thanks to sophisticated advances in the hypervisor and virtualization software.

 

Confidential computing is now a powerful technology for significantly improving the security of VMs running in the cloud. However, the trust boundary of a confidential VM imposes a barrier that prevents the hypervisor from offering the rich virtualization services that VMs normally expect. Customers who want the benefits of confidential VMs have been forced to update their VMs' operating systems to newer versions, which must be continually revised as confidential VM technology continues its rapid evolution.

 

Microsoft has embraced a different approach that offers much more flexibility to customers through the use of a “paravisor”. A paravisor executes within the confidential trust boundary and provides the virtualization and device services needed by a general-purpose operating system (OS), enabling existing VM workloads to execute securely without requiring continual servicing of the OS to take advantage of innovative advances in confidential computing technology. As confidential computing becomes available on more hardware platforms and evolves, the software stack can keep VMs running seamlessly thanks to the paravisor, in much the same way other advances in virtualization software have enabled VMs to run seamlessly on ever-evolving hardware.

 

Microsoft developed the first paravisor in the industry, and for years, we have been enhancing the paravisor offered to Azure customers. This effort now culminates in the release of a new, open source paravisor, called OpenHCL. We plan to develop OpenHCL in the open here: microsoft/openvmm: Home of OpenVMM and OpenHCL (github.com).

 

OpenHCL capabilities

 

A paravisor is essentially an execution environment that runs within the guest VM, at a higher privilege level than the guest OS, and provides various services to the guest. A paravisor can run in both confidential and non-confidential environments. When running in a confidential environment, these privilege levels must be enforced by the confidential computing hardware platform.

 

We use virtual secure mode (VSM) to run a paravisor on Microsoft’s virtualization stack. When running in a confidential context, our architecture allows VSM to be appropriately enforced in a hardware platform-agnostic manner.
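
To make the privilege model concrete, here is a minimal, hypothetical Rust sketch of VSM's virtual trust levels (VTLs); the types are illustrative, not OpenHCL's actual code. The guest OS runs at a low VTL while the paravisor runs at a higher one (VTL2 in OpenHCL's case), and VSM ensures that only a more-privileged VTL can service a less-privileged one:

```rust
// Illustrative sketch of the VSM privilege model; names are hypothetical.

/// Virtual trust levels (VTLs) as defined by virtual secure mode (VSM).
/// The guest OS runs at VTL0; a paravisor such as OpenHCL runs at a
/// higher VTL so that it can intercept and service guest operations.
#[derive(Copy, Clone, PartialEq, PartialOrd, Debug)]
enum Vtl {
    Vtl0 = 0, // guest OS
    Vtl1 = 1, // available for OS-internal security features
    Vtl2 = 2, // paravisor
}

/// A request intercepted from the guest (register state etc. elided).
struct GuestRequest {
    origin: Vtl,
}

fn handle_intercept(paravisor_vtl: Vtl, req: &GuestRequest) {
    // VSM enforces that only a more-privileged VTL services a
    // less-privileged one; on confidential hardware this enforcement
    // must come from the platform itself (e.g., TDX partitioning or
    // SEV-SNP VMPLs), since the hypervisor is untrusted.
    assert!(req.origin < paravisor_vtl, "requests must come from a lower VTL");
    // ... emulate the device access or hypercall, then resume the guest.
}

fn main() {
    let req = GuestRequest { origin: Vtl::Vtl0 };
    handle_intercept(Vtl::Vtl2, &req);
    println!("serviced a request from {:?}", req.origin);
}
```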

 

Today, OpenHCL can run on both x86-64 and ARM64 platforms, and it has support for Intel TDX and AMD SEV-SNP confidential computing platforms. OpenHCL runs in the L1 VMM of a TDX confidential VM and in VMPL0 of an SEV-SNP confidential VM. See the OpenHCL user guide for step-by-step instructions to use it. OpenHCL offers a rich set of powerful services to confidential and non-confidential VMs alike:

  • Device emulation via standard device interfaces, offering a set of emulated devices such as a vTPM and a serial port.
  • Device translation via standard device interfaces, such as NVMe to para-virtualized SCSI, allowing hardware devices to be assigned directly to VMs (accelerated IO) without requiring guest OS changes, so VMs can take advantage of the performance of cutting-edge devices (see the sketch after this list).
  • Diagnostics support, particularly useful for debugging confidential VMs, where traditional debugging methods are difficult to use.
  • For confidential VMs specifically, support for guests that are not fully enlightened, such as Windows and older versions of Linux, to run on confidential computing platforms via standard architectural interfaces.
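
As an illustration of the device translation idea, here is a hedged Rust sketch (all names hypothetical, not OpenHCL's code) of how a guest-facing para-virtualized SCSI read could be translated into an NVMe command for an assigned device:

```rust
// Illustrative device translation: the paravisor exposes a para-virtualized
// SCSI disk to the guest and services each request by driving an assigned
// NVMe device. All types and names here are hypothetical.

/// A simplified read request as seen on the guest-facing SCSI interface.
struct ScsiRead {
    lba: u64,    // logical block address
    blocks: u32, // number of blocks to read
}

/// A simplified NVMe read command issued to the assigned hardware device.
struct NvmeRead {
    slba: u64, // starting LBA
    nlb: u16,  // number of logical blocks, 0-based per the NVMe spec
}

/// Translate a guest SCSI read into the equivalent NVMe command.
fn translate_read(req: &ScsiRead) -> NvmeRead {
    NvmeRead {
        slba: req.lba,
        // NVMe encodes "number of logical blocks" as a 0-based field.
        nlb: (req.blocks - 1) as u16,
    }
}

fn main() {
    let guest_req = ScsiRead { lba: 4096, blocks: 8 };
    let cmd = translate_read(&guest_req);
    println!("NVMe read: slba={} nlb={}", cmd.slba, cmd.nlb);
}
```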

For confidential VMs, OpenHCL provides great value to guests that are not fully enlightened by enabling them to run at all, but it can also provide substantial value to fully enlightened guests by offering any or all of its other services as different scenarios require.

 

OpenHCL is used in Azure in new Azure Boost SKUs, and it will be used in future Azure confidential VM SKUs. In the past month alone, over 1.5 million VMs were running with OpenHCL in Azure[1].

 

OpenHCL architecture

 

OpenHCL is composed of several open-source components, the most important being OpenVMM, a modular, cross-platform virtual machine monitor (VMM) written in Rust. This VMM runs several user-mode processes to power OpenHCL. Running a VMM inside OpenHCL allows us to support guests with assigned devices and provide device translation support. Additionally, it allows us to share architecture between confidential and non-confidential VMs: we run the same VMM in the same environment for both kinds of guests, and the VMM provides the same services tailored to their requirements. This avoids fragmenting the virtualization solution between confidential and non-confidential VMs, moving toward closing the feature gaps of confidential VMs.
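
As a rough illustration of this modularity (the trait and device below are hypothetical, not OpenVMM's actual API), a Rust VMM can put every emulated or translated device behind one common interface, so the same core serves confidential and non-confidential guests alike:

```rust
// Hypothetical sketch of a modular VMM device interface. OpenVMM's real
// abstractions differ, but the shape is similar: each emulated or
// translated device implements a common trait registered with the core.

trait MmioDevice {
    /// Handle a guest read from this device's MMIO region.
    fn read(&mut self, offset: u64, data: &mut [u8]);
    /// Handle a guest write to this device's MMIO region.
    fn write(&mut self, offset: u64, data: &[u8]);
}

/// A trivial serial-port-like device, registered like any other device.
struct SerialStub {
    last_byte: u8,
}

impl MmioDevice for SerialStub {
    fn read(&mut self, _offset: u64, data: &mut [u8]) {
        data.fill(self.last_byte);
    }
    fn write(&mut self, _offset: u64, data: &[u8]) {
        if let Some(&b) = data.first() {
            self.last_byte = b;
        }
    }
}

fn main() {
    // The VMM core only sees the trait object, never the concrete device.
    let mut dev: Box<dyn MmioDevice> = Box::new(SerialStub { last_byte: 0 });
    dev.write(0, &[b'A']);
    let mut buf = [0u8; 1];
    dev.read(0, &mut buf);
    assert_eq!(buf[0], b'A');
}
```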

 

The other components of OpenHCL are a boot loader and a small, customized Linux kernel built to support the VMM, with a minimal Kconfig to reduce binary size and runtime RAM usage. Running a kernel to support our environment allows the VMM code to be mostly standard Rust, making it much more powerful by enabling the VMM to use the broadly supported, stable Rust toolchains and crate ecosystem.

 

The two approaches to running confidential VMs

 

There are two approaches to running a guest OS inside a confidential VM: either the guest must be fully enlightened (modified to understand and manage all aspects of running as a confidential VM), or it can rely on a paravisor to implement the confidential computing enlightenments on its behalf. When a guest runs with a paravisor, it doesn’t seem like a confidential guest precisely because it doesn’t need to act like a confidential guest.

 

In Azure, we support all IaaS confidential VMs via a paravisor today. The paravisor enabled Azure to support the widest variety of guests, including Windows versions released almost a decade ago[2] and Linux versions using kernels as old as the 5.19 kernel[3] (and versions using even older kernels that had a small set of patches backported, such as some Ubuntu and RHEL distro versions). This provides customers with an easier lift as well as the flexibility to gain future confidential computing advances without needing to upgrade their workloads. Customers’ legacy solutions are safe with Azure because of the approach we embraced.

 

Why is Windows not fully enlightened to run as a confidential guest? I.e., why does Windows rely on a paravisor?

 

When we developed the first confidential VM in Azure on the confidential computing hardware platforms available at the time, it was not possible to fully enlighten Windows guests for those platforms because Windows required APIC (interrupt controller) emulation to be done in a paravisor. APIC emulation, traditionally done by the hypervisor, must be done by another entity for confidential VMs, where the hypervisor is outside the trust boundary. It can be done by the paravisor, or by the hardware platform if it supports APIC virtualization, which early platforms, such as 3rd Gen AMD EPYC™ processors, didn’t.

 

On those hardware platforms, APIC emulation had to be done in a paravisor for Windows guests but not necessarily for Linux guests. The architecture of Windows relies directly on the APIC for interrupt management. Some aspects of Windows interrupt management don't flow through the kernel and are inlined in drivers, so Windows drivers rely on the interrupt management behavior offered by the APIC. The architecture of Linux, on the other hand, doesn’t rely directly on the APIC for interrupt management. Linux offers kernel service routines for handling interrupt state, so Linux drivers rely on these routines.
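
To illustrate what this means in practice, here is a deliberately simplified sketch of APIC emulation inside a paravisor; the register offsets are standard xAPIC values, but the surrounding types are invented for this example and are not OpenHCL's implementation:

```rust
// Simplified illustration of APIC emulation in a paravisor: with the
// hypervisor outside the trust boundary, guest accesses to APIC registers
// are intercepted and emulated inside the trust boundary instead.

const APIC_OFFSET_ID: u64 = 0x20; // xAPIC local APIC ID register
const APIC_OFFSET_EOI: u64 = 0xB0; // xAPIC end-of-interrupt register

struct VirtualApic {
    id: u32,
    in_service: Vec<u8>, // interrupt vectors currently being serviced
}

impl VirtualApic {
    fn mmio_read(&self, offset: u64) -> u32 {
        match offset {
            APIC_OFFSET_ID => self.id << 24, // xAPIC keeps the ID in bits 31:24
            _ => 0, // other registers elided
        }
    }

    fn mmio_write(&mut self, offset: u64, _value: u32) {
        match offset {
            // An EOI write retires the in-service vector: exactly the kind
            // of APIC behavior that inlined Windows driver code depends on.
            APIC_OFFSET_EOI => {
                self.in_service.pop();
            }
            _ => {} // other registers elided
        }
    }
}

fn main() {
    let mut apic = VirtualApic { id: 1, in_service: vec![0x30] };
    assert_eq!(apic.mmio_read(APIC_OFFSET_ID), 1 << 24);
    apic.mmio_write(APIC_OFFSET_EOI, 0); // guest signals end of interrupt
    assert!(apic.in_service.is_empty());
}
```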

 

In addition to that, Windows relies on the presence of a TPM for security features, and one cannot implement a vTPM for a confidential VM with enlightenments alone. We chose to implement a vTPM in a paravisor. Given all the functionality we have built into the paravisor, our plan is not to fully enlighten Windows but to continue supporting Windows guests via a paravisor in Azure. For future versions of Linux, we’re evaluating both approaches (fully enlightened and relying on a paravisor), and we will aim to do what is best for customers.
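
As a rough sketch of the guest-facing side of such a vTPM (the parsing below is illustrative only, not OpenHCL's implementation), the guest submits standard TPM 2.0 command buffers, and the paravisor services them inside the trust boundary:

```rust
// Minimal illustration of a paravisor-hosted vTPM's guest-facing side.
// The guest submits standard TPM 2.0 command buffers (big-endian header:
// tag, size, command code); the paravisor parses and services them inside
// the confidential trust boundary. Parsing here is illustrative only.

const TPM_CC_GET_RANDOM: u32 = 0x0000_017B; // TPM2_GetRandom command code

fn handle_tpm_command(cmd: &[u8]) -> Option<Vec<u8>> {
    if cmd.len() < 10 {
        return None; // shorter than a TPM 2.0 command header
    }
    let command_code = u32::from_be_bytes(cmd[6..10].try_into().ok()?);
    match command_code {
        TPM_CC_GET_RANDOM => {
            // A real vTPM would draw from a DRBG whose state is protected
            // by the confidential hardware platform, not return zeros.
            Some(vec![0u8; 16]) // placeholder response payload
        }
        _ => None, // all other commands elided in this sketch
    }
}

fn main() {
    // A 12-byte TPM2_GetRandom command: tag 0x8001, size 12, cc 0x17B,
    // followed by a 16-bit bytesRequested parameter of 16.
    let cmd = [0x80, 0x01, 0, 0, 0, 12, 0, 0, 0x01, 0x7B, 0, 16];
    assert!(handle_tpm_command(&cmd).is_some());
}
```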

 

OpenHCL and COCONUT-SVSM

 

An SVSM (Secure VM Service Module) like COCONUT-SVSM plays a very valuable role in confidential computing. It can store secrets and provide virtualization services that improve the usability of fully enlightened guests. OpenHCL solves a different problem than COCONUT-SVSM: COCONUT-SVSM aims to provide services to confidential VMs with fully enlightened guests using new interfaces, whereas OpenHCL aims to provide services to confidential VMs using existing standard architectural interfaces.

 

COCONUT-SVSM provides device emulation, but OpenHCL uniquely provides this via existing standard interfaces. When running with an SVSM (like COCONUT-SVSM), the guest must establish a specific relationship with the SVSM by discovering its presence and then interact with the SVSM using a custom calling convention. Essentially, a guest needs to be specifically modified to be able to take advantage of SVSM services, including devices. With OpenHCL, devices are easier to consume because existing device interfaces just work, and the guest does not need any custom calling contract modifications to consume them. OpenHCL enables devices to be discovered over standard enumeration mechanisms, like PCI virtualization or existing vTPM device contracts.
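
To make the contrast concrete, here is an illustrative Rust sketch of the kind of standard enumeration an unmodified guest already performs (a legacy PCI config-space probe); with OpenHCL, such standard accesses are simply intercepted and serviced, with no SVSM-style calling convention required:

```rust
// Illustrative sketch: an unmodified guest discovers devices through
// standard mechanisms. Below, a legacy PCI config-space address is built
// exactly as on bare metal; no paravisor-specific contract is involved.

/// Build the legacy PCI config-address value for bus/device/function/offset.
fn pci_config_address(bus: u8, dev: u8, func: u8, offset: u8) -> u32 {
    0x8000_0000                  // enable bit
        | (bus as u32) << 16     // bits 23:16 - bus number
        | (dev as u32) << 11     // bits 15:11 - device number
        | (func as u32) << 8     // bits 10:8  - function number
        | (offset as u32 & 0xFC) // bits 7:2   - dword-aligned register
}

fn main() {
    // A guest probing bus 0, device 3, function 0 for the vendor ID would
    // write this value to port 0xCF8 and read the ID from port 0xCFC,
    // just as it would in a non-confidential VM or on bare metal.
    println!("config address: {:#010x}", pci_config_address(0, 3, 0, 0));
}
```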

 

COCONUT-SVSM could potentially be leveraged by OpenHCL in the future. The VMM component of OpenHCL is Rust-based, which brings strong memory-safety properties, and evolving its kernel component to also be Rust-based would improve the memory safety of OpenHCL. During the development of OpenHCL, we chose the Linux kernel because it was a familiar OS platform for contributors and provided the capabilities needed. Now that the Rust-based COCONUT-SVSM exists, we are interested in moving to it in the future and building OpenHCL support for it if it gains the features that OpenHCL needs.

 

Open for collaboration

 

In this blog we described the value of OpenHCL for the future of computing. We still have much more planned for OpenHCL, and as we develop new functionality in the open, we would love to collaborate with you. You can learn more about this project at https://openvmm.dev. Please reach out to us if you have ideas you’d like to add to the OpenHCL roadmap or any other feedback. You can open a GitHub issue, reach out to us on Zulip, and even contribute to this project! We track the roadmap of OpenHCL in the open; below are some of its future milestones.

 

OpenHCL support for Intel TDX (Trust Domain Extensions) in Azure

Intel and Microsoft collaborated on and co-developed the TDX partitioning architecture so that it could be leveraged by a paravisor. The first-ever TDX module with TD partitioning was an amazing co-engineering project between Intel and Microsoft. Intel released TD partitioning as part of the TDX module that accompanied the general availability of 5th Generation Xeon, and it has also been backported to 4th Generation Xeon. Using this TDX module, Azure launched the first generation of Azure TDX confidential VMs with the first paravisor, becoming the first cloud service provider to offer TDX in public preview as well as the first to offer Windows guest support for TDX. Intel has been contributing to OpenHCL for the past 6+ months, and we’re close to feature completeness in OpenHCL for the next generation of Azure TDX confidential VMs!

 

OpenHCL support for Arm CCA (Confidential Compute Architecture)

We started engaging with Arm almost two years ago to make sure the Arm Confidential Compute Architecture (CCA) is well equipped to support paravisor stacks like OpenHCL. CCA comprises a collection of open-source software, firmware, specifications, and hardware support to bring confidential computing to the Arm architecture. CCA provides protected environments called Realms, which can be used to host confidential VMs. Our collaboration led to the creation of the Planes feature, which enables multiple levels of privilege to coexist inside a Realm. Planes make it possible to host a paravisor and a guest VM in the same Realm, with the paravisor providing security and compatibility services to the guest. We are excited to collaborate further, and in the open, with Arm to build OpenHCL support for Arm CCA.

 

OpenHCL support for AMD SEV-SNP (Secure Encrypted Virtualization-Secure Nested Paging) in Azure

We used AMD’s VM Permission Levels (VMPLs) to build the first paravisor for confidential VMs in Azure. We have been engaging with AMD to ensure OpenHCL and AMD’s platform work together to provide great performance and security for customers in the future. We will continue to develop support in OpenHCL to reach feature completeness for future generations of Azure SNP confidential VMs.

 

OpenHCL support for KVM as host

Today, OpenHCL runs only on the Microsoft hypervisor. We look forward to developing OpenHCL support for KVM as the host, in collaboration with other cloud service providers and the Linux and KVM communities, to enable others to leverage OpenHCL in their virtualization stacks.

 

And more to come

We also began engaging with Red Hat recently to discuss the value of OpenHCL and how it has the potential to open the door to significant cross-OS interoperability in the confidential virtualization world. We are excited to collaborate with Red Hat to build an open and collaborative confidential computing ecosystem.

- the Core OS Platform team.

 

[1] This number is from when this blog was published (October 2024); it keeps growing every month.

[2] Specifically, older versions include Windows Client 10 (released almost a decade ago) and Windows Server 2019.

[3] Specifically, older versions include Linux versions using the kernel 5.19 (for SNP) and 6.6 (for both SNP and TDX).

Updated Nov 22, 2024