Apps on Azure Blog

Preview support for Kata VM Isolated Containers on AKS for Pod Sandboxing

MichaelWithrow
Feb 24, 2023

Azure Kubernetes Service (AKS) now supports Pod Sandboxing in preview in all Azure regions, on a subset of Azure VM sizes that support nested virtualization.

 

Pod Sandboxing complements other security measures and data protection controls, providing the isolation needed within a single VM node to protect containers that process sensitive information on AKS. It also avoids the “operations tax” of running dedicated VM nodes per workload. Pod Sandboxing on AKS integrates seamlessly with the existing AKS feature set, so existing workloads can be transitioned with a simple change to the pod specification while continuing to use existing deployment and monitoring practices. While other isolation methods require you to recompile your code or impose other compatibility constraints, Pod Sandboxing in AKS can run any container unmodified inside a secure VM boundary.

 

Microsoft's Approach to Pod Sandboxing

 

On AKS, we are adding pod sandboxing functionality using Kata Containers to provide hypervisor-based isolation per pod. Kata Containers is a popular open-source project from the OpenInfra Foundation, with a highly active contributor community and production workloads running in many organizations. Its ability to seamlessly extend Kubernetes capabilities and provide high workload conformance with a negligible performance penalty made it a suitable choice for AKS.

 

Kata Containers on AKS is built on top of a security-hardened Azure hypervisor using the Mariner Linux AKS Container Host (MACH). Per-pod isolation is achieved with a nested, lightweight Kata VM that carves out resources from the parent VM node. In this model, each Kata pod gets its own kernel inside its own nested Kata guest VM. This lets users pack many Kata pods onto a single parent VM while continuing to run regular containers on that same node, providing a strong isolation boundary within a shared AKS cluster.

 

Historically, AKS customers have depended on spinning up separate clusters or node pools to strongly isolate workloads of different teams. While multi-cluster and multi-node-pool architectures may satisfy the isolation requirements of many organizations, there are cases where a single cluster with shared VM node pools is more efficient: for example, running untrusted and trusted pods on the same node, or co-locating DaemonSets and privileged containers on the same node for faster local communication and functional grouping. In scenarios like these, where a container's code cannot be trusted or your workloads require isolation from the parent node's kernel, kubelet, and system pods, Pod Sandboxing on AKS can meet your requirements.

 

Although Pod Sandboxing brings the capability to host mixed or tenant-level workloads within a single AKS cluster, there is more to security than just the pod sandbox. We recommend following established best practices, particularly when you need control-plane and cluster-level isolation: Azure Kubernetes Service (AKS) considerations for multitenancy - Azure Architecture Center | Microsoft Learn.

 

The Tech Stack Powering Pod Sandboxing 

As part of our core operational principles at Microsoft, we have embraced a culture of contributing to and working with the OSS community, with a particular focus on the OpenInfra Foundation, the CNCF, and other cloud-native projects. In line with this, we have adopted two large open-source projects, Kata Containers and Cloud Hypervisor, and have committed to building services on top of them. These projects will continue to power the Pod Sandboxing feature on AKS, and Microsoft will continue to shape future releases from our customer learnings through project contributions and by open-sourcing components designed to improve container security and isolation.

 

The technology 

The technical stack that enables pod sandboxing on AKS, and the basic scaffolding for adding confidentiality to the AKS container offering, is based on the following key components:

  • Mariner AKS Container Host (aka Linux Container Host on AKS)
  • Microsoft Hypervisor with Linux Root Partition
  • Open-source Cloud Hypervisor as the Virtual Machine Monitor (VMM) running within the Mariner Container Host
  • Integration with the Kata Containers runtime

 

 

Mariner AKS Container Host - Microsoft announced the preview of Mariner as the Linux container host on AKS in the fall of 2022. Mariner is Microsoft's internal Linux distribution, optimized to run on Azure, that provides operational consistency with a smaller, leaner, and security-hardened image.

 

Microsoft Hypervisor - The Microsoft hypervisor is a mature virtualization platform that is battle-tested on Azure and in on-premises deployments, and it is the cornerstone of some of the key virtualization features in Windows.

 

Linux Root Partition - The Linux kernel has been adapted so that Linux can run as the root partition for the Microsoft hypervisor. The root partition runs the management software for the stack and controls the hypervisor.

 

Cloud Hypervisor VMM - The Virtual Machine Monitor (VMM) is the user-facing, user-space software used to create and manage the lifetime of virtual machines. Microsoft will continue to play an active role in stewarding the project, supporting and contributing to community efforts.

 

Cloud Hypervisor brings fast boot times for utility-VM use cases (such as Kata Containers) with container-friendly optimizations, enables runtime-configurable VM resources, and implements a minimal set of drivers to enable container workloads on Azure.

 

Kata Container Integration - Kata Containers is a widely used and adopted open-source project. The project is building a secure container runtime with lightweight virtual machines (VMs) that perform like containers and can be operated and managed like containers, while providing stronger workload isolation by leveraging hardware virtualization technology. (Reference picture from katacontainers.io)

 

 

How it works 

The workflow to deploy Pod Sandboxing using Kata guest VMs is similar to the regular containerd workflow for deploying containers, with the following differences:

  • To deploy containers in the sandboxed environment, the pod specification YAML file must specify the Kata runtime class name. 
  • The Kata runtime class name is pre-configured on AKS to activate the Kata Shim (containerd-shim-kata-v2) instead of the regular containerd-shim. 
  • The Kata Shim executes Cloud Hypervisor and instructs it to create a lightweight Kata VM with the Kata Agent running inside it. 
  • The Kata Shim delegates the creation and management of containers to the Kata Agent. The Kata Agent creates and executes these containers inside the guest VM. 
  • When AKS deletes the Sandboxed Pod, the Kata Shim shuts down the guest VM, releasing the resources associated with it back to the container host.  
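As a sketch, a sandboxed pod differs from a regular pod only in its `runtimeClassName`. The class name `kata-mshv-vm-isolation`, the image, and the pod name below are assumptions based on the AKS preview documentation at the time of writing; verify the registered classes on your cluster with `kubectl get runtimeclass`.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload   # hypothetical name for illustration
spec:
  # Selects the Kata shim (containerd-shim-kata-v2) instead of the default
  # runc handler, so the containers run inside a nested Kata guest VM.
  runtimeClassName: kata-mshv-vm-isolation
  containers:
  - name: app
    image: nginx   # any unmodified container image works inside the sandbox
    resources:
      limits:
        cpu: "1"
        memory: 256Mi
```

Applying this manifest on a node pool with the Kata workload runtime enabled is all that is needed; no changes to the container image itself are required.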

 Detailed instructions on AKS deployment can be found here. 
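For orientation, the provisioning steps look roughly like the following Azure CLI sketch. The flag values, preview feature name, cluster and resource group names are assumptions drawn from the preview documentation at the time of writing and may have changed; always check the current AKS docs before running anything.

```shell
# Sketch only: preview registration and flag names may differ today.
az extension add --name aks-preview

# Add a node pool whose nodes run the Mariner container host with the
# Kata (hypervisor-isolated) workload runtime. The VM size must support
# nested virtualization.
az aks nodepool add \
  --cluster-name myAKSCluster \
  --resource-group myResourceGroup \
  --name kata \
  --os-sku mariner \
  --workload-runtime KataMshvVmIsolation \
  --node-vm-size Standard_D4s_v3
```

Pods scheduled onto this node pool can then opt into sandboxing via their runtime class, while ordinary pods on the same nodes keep running with the default runtime.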

 

Looking ahead 

We are excited to learn how this feature can enable industries like Finance, Health, and SaaS/ISV Partners that are looking to optimize operations while still achieving high security with Kata VM Isolated Containers on AKS. We look forward to continuing to innovate in the open-source space of container isolation and confidential computing within Kata and make AKS the platform of choice to run your most sensitive workloads at scale. 

Updated Feb 23, 2023
Version 1.0
  • HowardvanRooijen Getting closer to multitenancy, but this is really about kernel isolation to prevent attacks from a shared-kernel perspective. There are other things in play down the road which will make multi-tenancy more of a reality. Stay tuned.

  • JSteskal
    Why wasn't this built on the One OS Kernel? Also, when will this support Windows Server? It seems Windows Containers are not making that much headway at Microsoft.

  • This is very interesting indeed! Is this the first true "Hard Multitenancy" implementation on Azure? If so, it will be a game changer for hosting SaaS platforms.

  • Thanks for the clarification. That's excellent news. Multi-tenancy is a HUGE challenge for customers. Anything that makes it easier and less expensive to implement will be a boon.

  • ohault
    OK, great, but before Pod Sandboxing reaches RTM, the Microsoft Hypervisor with Linux Root Partition / Azure-tuned Dom0 Linux kernel still needs to be fully documented for auditing. Currently, it is still too obscure; meanwhile some personal efforts exist, like Hyper-V on Linux (yes, this way around) - scholz.ruhr