Introduction to Virtual Nodes on Azure Container Instances (ACI)
Azure Container Instances (ACI) is a fully managed, serverless container platform that lets you run containers on demand without provisioning infrastructure.
Virtual Nodes on ACI allows you to run Kubernetes pods managed by an AKS cluster in a serverless way on ACI instead of traditional VM‑backed node pools. From a developer’s perspective, Virtual Nodes look just like regular Kubernetes nodes, but under the hood the pods are executed on ACI’s serverless infrastructure, enabling fast scale‑out without waiting for new VMs to be provisioned.
This makes Virtual Nodes ideal for bursty, unpredictable, or short‑lived workloads where speed and cost efficiency matter more than long‑running capacity planning.
The newer Virtual Nodes v2 implementation modernises this capability by removing many of the limitations of the original AKS managed add‑on and delivering a more Kubernetes‑native, flexible, and scalable experience when bursting workloads from AKS to ACI.
In this article I will demonstrate how you can migrate an existing AKS cluster from the Virtual Nodes managed add-on (legacy) to the new generation of Virtual Nodes on ACI, which is deployed and managed via Helm.
More information about Virtual Nodes on Azure Container Instances can be found here, and the GitHub repo is available here.
Advanced documentation for Virtual Nodes on ACI is also available here, and includes topics such as node customisation, release notes and a troubleshooting guide.
Please note that all code samples within this guide are examples only, and are provided without warranty/support.
Background
Virtual Nodes on ACI has been rebuilt from the ground up, and includes several fixes and enhancements, for example:
Added support/features
- VNet peering, outbound traffic to the internet with network security groups
- Init containers
- Host aliases
- Arguments for exec in ACI
- Persistent Volumes and Persistent Volume Claims
- Container hooks
- Confidential containers (see supported regions list here)
- ACI standby pools
Planned future enhancements
- Support for ACR image pull via Service Principal (SPN)
- Kubernetes network policies
- Support for IPv6
- Windows containers
- Port Forwarding
Note: The new generation of the add-on is managed via Helm rather than as an AKS managed add-on.
Requirements & limitations
- Each Virtual Nodes on ACI deployment requires 3 vCPUs and 12 GiB of memory on one of the AKS cluster's VMs
- Each Virtual Nodes on ACI deployment supports up to 200 pods
- DaemonSets are not supported
- Virtual Nodes on ACI requires AKS clusters with Azure CNI networking (Kubenet is not supported)
- Virtual Nodes on ACI is incompatible with API server authorized IP ranges for AKS (because of the subnet delegation to ACI)
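Because Kubenet clusters are not supported, it can be worth confirming the network plugin on an existing cluster before planning a migration. A minimal check, assuming the `$rg` and `$clusterName` variables used later in this walkthrough:

```shell
# Query the network plugin configured on the AKS cluster.
# Virtual Nodes on ACI requires "azure" (Azure CNI); "kubenet" is not supported.
az aks show --resource-group $rg --name $clusterName \
  --query "networkProfile.networkPlugin" -o tsv
```

If this returns `kubenet`, the cluster will need to be rebuilt (or a new Azure CNI cluster created) before Virtual Nodes on ACI can be used.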
Deploying the Virtual Nodes managed add-on (legacy)
For the sake of completeness, I will first guide you through the traditional steps of deploying the Virtual Nodes managed add-on for AKS.
For this walkthrough, I'm using Bash via Windows Subsystem for Linux (WSL), along with the Azure CLI.
Prerequisites
- A recent version of the Azure CLI
- An Azure subscription with sufficient ACI quota for your selected region
Deployment steps
These steps are adapted from the official documentation here.
- Set up environment variables:

  ```bash
  location=northeurope
  rg=rg-virtualnode-demo
  vnetName=vnet-virtualnode-demo
  clusterName=aks-virtualnode-demo
  aksSubnetName=subnet-aks
  vnSubnetName=subnet-vn
  ```

- Create a resource group for the cluster and VNet:

  ```bash
  az group create --name $rg --location $location
  ```

- Create the Virtual Network (VNet) and the AKS/ACI subnets:

  ```bash
  az network vnet create \
    --resource-group $rg --name $vnetName \
    --address-prefixes 10.0.0.0/8 \
    --subnet-name $aksSubnetName \
    --subnet-prefix 10.240.0.0/16

  az network vnet subnet create \
    --resource-group $rg \
    --vnet-name $vnetName \
    --name $vnSubnetName \
    --address-prefixes 10.241.0.0/16 \
    --delegations Microsoft.ContainerInstance/containerGroups
  ```

- Retrieve the resource IDs for the AKS and ACI subnets:

  ```bash
  subnetId=$(az network vnet subnet show --resource-group $rg --vnet-name $vnetName --name $aksSubnetName --query id -o tsv)
  vnSubnetId=$(az network vnet subnet show --resource-group $rg --vnet-name $vnetName --name $vnSubnetName --query id -o tsv)
  ```

- Create a small AKS cluster with 2 nodes:

  ```bash
  az aks create --resource-group $rg --name $clusterName \
    --node-count 2 --node-osdisk-size 30 --node-vm-size Standard_B4ms \
    --network-plugin azure --vnet-subnet-id $subnetId \
    --generate-ssh-keys
  ```

- Enable the Virtual Nodes managed add-on (legacy):

  ```bash
  az aks enable-addons --resource-group $rg --name $clusterName --addons virtual-node --subnet-name $vnSubnetName
  ```

- Retrieve the Managed Identity (MSI) used by Virtual Nodes and assign it the Network Contributor role for the ACI subnet:

  ```bash
  vnIdentityId=$(az aks show \
    --resource-group $rg \
    --name $clusterName \
    --query "addonProfiles.aciConnectorLinux.identity.resourceId" \
    -o tsv)

  vnIdentityObjectId=$(az identity show --ids $vnIdentityId --query principalId -o tsv)

  az role assignment create \
    --assignee-object-id "$vnIdentityObjectId" \
    --assignee-principal-type ServicePrincipal \
    --role "Network Contributor" \
    --scope "$vnSubnetId"
  ```

- Download the cluster's kubeconfig file:

  ```bash
  az aks get-credentials --resource-group $rg --name $clusterName
  ```

- Confirm the Virtual Nodes node (virtual-node-aci-linux) shows within the cluster and is in a Ready state:

  ```
  $ kubectl get node
  NAME                                STATUS   ROLES    AGE     VERSION
  aks-nodepool1-35702456-vmss000000   Ready    <none>   46m     v1.33.6
  aks-nodepool1-35702456-vmss000001   Ready    <none>   46m     v1.33.6
  virtual-node-aci-linux              Ready    agent    3m28s   v1.25.0-vk-azure-aci-1.6.2
  ```
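Before migrating, you can optionally confirm the legacy virtual node schedules pods correctly. The manifest below is a minimal sketch using the legacy nodeSelector and tolerations covered later in this article; the pod name and image are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-vn-test        # illustrative name
spec:
  containers:
  - name: hello
    image: mcr.microsoft.com/azure-cli   # placeholder image
    command: ["/bin/bash", "-c", "echo 'Hello from ACI'; sleep 3600"]
  # Legacy managed add-on scheduling constraints
  nodeSelector:
    kubernetes.io/role: agent
    kubernetes.io/os: linux
    type: virtual-kubelet
  tolerations:
  - key: virtual-kubelet.io/provider
    operator: Exists
  - key: azure.com/aci
    effect: NoSchedule
```

Apply it with `kubectl apply -f`, then check that the NODE column of `kubectl get pod -o wide` shows `virtual-node-aci-linux`.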
Migrating to the next generation of Virtual Nodes on Azure Container Instances via Helm chart
I will now explain how to migrate from the Virtual Nodes managed add-on (legacy) to the new generation of Virtual Nodes on ACI.
For this walkthrough, I'm using Bash via Windows Subsystem for Linux (WSL), along with the Azure CLI.
Direct migration is not supported; the steps below therefore show an example of removing the Virtual Nodes managed add-on and its resources, and then installing the Virtual Nodes on ACI Helm chart.
In this walkthrough I will explain how to delete and re-create the Virtual Nodes subnet, however if you need to preserve the VNet and/or use a custom subnet name, refer to the Helm customisation steps here.
Prerequisites
- A recent version of the Azure CLI
- An Azure subscription with sufficient ACI quota for your selected region
- Helm
Deployment steps
- Initialise environment variables:

  ```bash
  location=northeurope
  rg=rg-virtualnode-demo
  vnetName=vnet-virtualnode-demo
  clusterName=aks-virtualnode-demo
  aksSubnetName=subnet-aks
  vnSubnetName=subnet-vn
  ```

- Scale down any running Virtual Nodes workloads (example below):

  ```bash
  kubectl delete deploy <deploymentName> -n <namespace>
  ```

- Disable the Virtual Nodes managed add-on (legacy):

  ```bash
  az aks disable-addons --resource-group $rg --name $clusterName --addons virtual-node
  ```

- Export a backup of the original subnet configuration:

  ```bash
  az network vnet subnet show --resource-group $rg --vnet-name $vnetName --name $vnSubnetName > subnetConfigOriginal.json
  ```

- Delete the original subnet (subnets cannot be renamed and therefore must be re-created):

  ```bash
  az network vnet subnet delete -g $rg -n $vnSubnetName --vnet-name $vnetName
  ```

- Create the new Virtual Nodes on ACI subnet (replicate the configuration of the original subnet, but with the specific name value of cg):

  ```bash
  vnSubnetId=$(az network vnet subnet create \
    --resource-group $rg \
    --vnet-name $vnetName \
    --name cg \
    --address-prefixes 10.241.0.0/16 \
    --delegations Microsoft.ContainerInstance/containerGroups \
    --query id -o tsv)
  ```

- Assign the cluster's kubelet identity (the -agentpool MSI) Contributor access to the infrastructure resource group, and Network Contributor access to the ACI subnet:

  ```bash
  nodeRg=$(az aks show --resource-group $rg --name $clusterName --query nodeResourceGroup -o tsv)
  nodeRgId=$(az group show -n $nodeRg --query id -o tsv)

  agentPoolIdentityId=$(az aks show --resource-group $rg --name $clusterName --query "identityProfile.kubeletidentity.resourceId" -o tsv)
  agentPoolIdentityObjectId=$(az identity show --ids $agentPoolIdentityId --query principalId -o tsv)

  az role assignment create \
    --assignee-object-id "$agentPoolIdentityObjectId" \
    --assignee-principal-type ServicePrincipal \
    --role "Contributor" \
    --scope "$nodeRgId"

  az role assignment create \
    --assignee-object-id "$agentPoolIdentityObjectId" \
    --assignee-principal-type ServicePrincipal \
    --role "Network Contributor" \
    --scope "$vnSubnetId"
  ```

- Download the cluster's kubeconfig file:

  ```bash
  az aks get-credentials -n $clusterName -g $rg
  ```

- Clone the virtualnodesOnAzureContainerInstances GitHub repo:

  ```bash
  git clone https://github.com/microsoft/virtualnodesOnAzureContainerInstances.git
  ```

- Install the Virtual Nodes on ACI Helm chart:

  ```bash
  helm install <yourReleaseName> <GitRepoRoot>/Helm/virtualnode
  ```

- Confirm the Virtual Nodes node (virtualnode-n) shows within the cluster and is in a Ready state:

  ```
  $ kubectl get node
  NAME                                STATUS   ROLES    AGE     VERSION
  aks-nodepool1-35702456-vmss000000   Ready    <none>   4h13m   v1.33.6
  aks-nodepool1-35702456-vmss000001   Ready    <none>   4h13m   v1.33.6
  virtualnode-0                       Ready    <none>   162m    v1.33.7
  ```

- Delete the previous Virtual Nodes node from the cluster:

  ```bash
  kubectl delete node virtual-node-aci-linux
  ```

- Test and confirm pod scheduling on the Virtual Node:

  ```yaml
  apiVersion: v1
  kind: Pod
  metadata:
    name: demo-pod
  spec:
    containers:
    - command:
      - /bin/bash
      - -c
      - 'counter=1; while true; do echo "Hello, World! Counter: $counter"; counter=$((counter+1)); sleep 1; done'
      image: mcr.microsoft.com/azure-cli
      name: hello-world-counter
      resources:
        limits:
          cpu: 2250m
          memory: 2256Mi
        requests:
          cpu: 100m
          memory: 128Mi
    nodeSelector:
      virtualization: virtualnode2
    tolerations:
    - effect: NoSchedule
      key: virtual-kubelet.io/provider
      operator: Exists
  ```

  If the pod successfully starts on the Virtual Node, you should see output similar to the below:

  ```
  $ kubectl get pod -o wide demo-pod
  NAME       READY   STATUS    RESTARTS   AGE   IP           NODE                   NOMINATED NODE   READINESS GATES
  demo-pod   1/1     Running   0          95s   10.241.0.4   vnode2-virtualnode-0   <none>           <none>
  ```
Modify your deployments to run on Virtual Nodes on ACI
For Virtual Nodes managed add-on (legacy), the following nodeSelector and tolerations are used to run pods on Virtual Nodes:
```yaml
nodeSelector:
  kubernetes.io/role: agent
  kubernetes.io/os: linux
  type: virtual-kubelet
tolerations:
- key: virtual-kubelet.io/provider
  operator: Exists
- key: azure.com/aci
  effect: NoSchedule
```
For Virtual Nodes on ACI, the nodeSelector/tolerations are slightly different:
```yaml
nodeSelector:
  virtualization: virtualnode2
tolerations:
- effect: NoSchedule
  key: virtual-kubelet.io/provider
  operator: Exists
```
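Putting this together, a minimal sketch of a Deployment targeting Virtual Nodes on ACI might look like the following (the deployment name, labels, and image are illustrative; the scheduling constraints follow those shown above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: burst-app            # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: burst-app
  template:
    metadata:
      labels:
        app: burst-app
    spec:
      containers:
      - name: app
        image: mcr.microsoft.com/azure-cli   # placeholder image
        command: ["/bin/bash", "-c", "sleep infinity"]
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
      # Schedule onto Virtual Nodes on ACI rather than VM-backed node pools
      nodeSelector:
        virtualization: virtualnode2
      tolerations:
      - effect: NoSchedule
        key: virtual-kubelet.io/provider
        operator: Exists
```

Because each Virtual Nodes on ACI deployment supports up to 200 pods, replica counts beyond that will require additional virtual node deployments.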
Troubleshooting
- Check the virtual-node-admission-controller and virtualnode-n pods are running within the vn2 namespace:

  ```
  $ kubectl get pod -n vn2
  NAME                                                  READY   STATUS    RESTARTS        AGE
  virtual-node-admission-controller-54cb7568f5-b7hnr    1/1     Running   1 (5h21m ago)   5h21m
  virtualnode-0                                         6/6     Running   6 (4h48m ago)   4h51m
  ```

  If these pods are in a Pending state, your node pool(s) may not have enough resources available to schedule them (use `kubectl describe pod` to validate).
- If the virtualnode-n pod is crashing, check the logs of the proxycri container to see whether there are any Managed Identity permissions issues (the cluster's -agentpool MSI needs to have Contributor access on the infrastructure resource group):

  ```
  kubectl logs -n vn2 virtualnode-0 -c proxycri
  ```
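To check whether the required role assignments are in place, one approach is to look up the kubelet (agentpool) identity and list its assignments, reusing the `$rg` and `$clusterName` variables from the walkthrough:

```shell
# Retrieve the kubelet (agentpool) identity's object ID...
agentPoolIdentityId=$(az aks show --resource-group $rg --name $clusterName \
  --query "identityProfile.kubeletidentity.resourceId" -o tsv)
agentPoolIdentityObjectId=$(az identity show --ids $agentPoolIdentityId \
  --query principalId -o tsv)

# ...then list its role assignments; look for Contributor on the
# infrastructure resource group and Network Contributor on the ACI subnet.
az role assignment list --assignee "$agentPoolIdentityObjectId" --all -o table
```

Note that newly created role assignments can take a few minutes to propagate, so a crashing pod immediately after assignment may simply need time and a restart.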
Further troubleshooting guidance is available within the official documentation.
Support
If you have issues deploying or using Virtual Nodes on ACI, raise a GitHub issue here.