What is ACI/Virtual Nodes?
Azure Container Instances (ACI) is a fully managed, serverless container platform that lets you run containers on demand without provisioning or managing infrastructure.
Virtual Nodes on ACI allows you to run Kubernetes pods managed by an AKS cluster serverlessly on ACI, instead of on traditional VM‑backed node pools.
From a developer’s perspective, Virtual Nodes look just like regular Kubernetes nodes, but under the hood the pods are executed on ACI’s serverless infrastructure, enabling fast scale‑out without waiting for new VMs to be provisioned.
This makes Virtual Nodes ideal for bursty, unpredictable, or short‑lived workloads where speed and cost efficiency matter more than long‑running capacity planning.
Introducing the next generation of Virtual Nodes on ACI
The newer Virtual Nodes v2 implementation modernises this capability by removing many of the limitations of the original AKS managed add‑on and delivering a more Kubernetes‑native, flexible, and scalable experience when bursting workloads from AKS to ACI.
In this article I will demonstrate how to migrate an existing AKS cluster from the legacy Virtual Nodes managed add-on to the new generation of Virtual Nodes on ACI, which is deployed and managed via Helm.
More information about Virtual Nodes on Azure Container Instances can be found here, and the GitHub repo is available here.
Advanced documentation for Virtual Nodes on ACI is also available here, and includes topics such as node customisation, release notes and a troubleshooting guide.
Please note that all code samples within this guide are examples only, and are provided without warranty/support.
Background
Virtual Nodes on ACI has been rebuilt from the ground up, and includes several fixes and enhancements, for instance:
Added support/features
- VNet peering, outbound traffic to the internet with network security groups
- Init containers
- Host aliases
- Arguments for exec in ACI
- Persistent Volumes and Persistent Volume Claims
- Container hooks
- Confidential containers (see supported regions list here)
- ACI standby pools
- Support for image pulling via Private Link and Managed Identity (MSI)
Planned future enhancements
- Kubernetes network policies
- Support for IPv6
- Windows containers
- Port Forwarding
Note: The new generation of the add-on is managed via Helm rather than as an AKS managed add-on.
Requirements & limitations
- Each Virtual Nodes on ACI deployment requires 3 vCPUs and 12 GiB of memory on one of the AKS cluster’s VMs
- Each Virtual Nodes node supports up to 200 pods
- DaemonSets are not supported
- Virtual Nodes on ACI requires AKS clusters with Azure CNI networking (Kubenet is not supported, nor is overlay networking)
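As a quick sanity check before starting, you can confirm the cluster's network plugin and plugin mode (an illustrative command; `$rg` and `$clusterName` match the environment variables used later in this walkthrough):

```bash
# "azure" is required; "kubenet" clusters and "overlay" plugin mode are not supported
az aks show \
  --resource-group "$rg" \
  --name "$clusterName" \
  --query "{plugin:networkProfile.networkPlugin, mode:networkProfile.networkPluginMode}" \
  -o table
```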
Migrating to the next generation of Virtual Nodes on Azure Container Instances via Helm chart
For this walkthrough, I'm using Bash via Windows Subsystem for Linux (WSL), along with the Azure CLI.
Direct migration is not supported, and therefore the steps below show an example of installing the Virtual Nodes on ACI Helm chart and then removing the Virtual Nodes managed add-on and its resources.
In this walkthrough I will explain how to delete and re-create the Virtual Nodes subnet; however, if you need to preserve the VNet and/or use a custom subnet name, refer to the Helm customisation steps here. Be sure to use a new subnet CIDR within the VNet address space which doesn't overlap with other subnets, nor with the AKS CIDRs for nodes/pods and ClusterIP services.
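To help pick a non-overlapping CIDR, you can list the prefixes already in use within the VNet (an illustrative one-liner; variable names follow the walkthrough below):

```bash
# List existing subnet names and address prefixes in the VNet
az network vnet subnet list \
  --resource-group "$rg" \
  --vnet-name "$vnetName" \
  --query "[].{name:name, prefix:addressPrefix}" \
  -o table
```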
To minimise disruption, we'll first install the Virtual Nodes on ACI Helm chart, before then removing the legacy managed add-on and its resources.
Prerequisites
- A recent version of the Azure CLI
- An Azure subscription with sufficient ACI quota for your selected region
- Helm
Deployment steps
- Initialise environment variables:

```bash
location=northeurope
rg=rg-virtualnode-demo
vnetName=vnet-virtualnode-demo
clusterName=aks-virtualnode-demo
aksSubnetName=subnet-aks
vnSubnetName=subnet-vn
```

- Create the new Virtual Nodes on ACI subnet with the specific name value of cg (a custom subnet can be used by following the steps here):

```bash
vnSubnetId=$(az network vnet subnet create \
  --resource-group $rg \
  --vnet-name $vnetName \
  --name cg \
  --address-prefixes <your subnet CIDR> \
  --delegations Microsoft.ContainerInstance/containerGroups \
  --query id -o tsv)
```

- Assign the cluster's -kubelet identity Contributor access to the infrastructure resource group, and Network Contributor access to the ACI subnet:

```bash
nodeRg=$(az aks show --resource-group $rg --name $clusterName --query nodeResourceGroup -o tsv)
nodeRgId=$(az group show -n $nodeRg --query id -o tsv)
agentPoolIdentityId=$(az aks show --resource-group $rg --name $clusterName --query "identityProfile.kubeletidentity.resourceId" -o tsv)
agentPoolIdentityObjectId=$(az identity show --ids $agentPoolIdentityId --query principalId -o tsv)

az role assignment create \
  --assignee-object-id "$agentPoolIdentityObjectId" \
  --assignee-principal-type ServicePrincipal \
  --role "Contributor" \
  --scope "$nodeRgId"

az role assignment create \
  --assignee-object-id "$agentPoolIdentityObjectId" \
  --assignee-principal-type ServicePrincipal \
  --role "Network Contributor" \
  --scope "$vnSubnetId"
```

- Download the cluster's kubeconfig file:

```bash
az aks get-credentials -n $clusterName -g $rg
```

- Clone the virtualnodesOnAzureContainerInstances GitHub repo:

```bash
git clone https://github.com/microsoft/virtualnodesOnAzureContainerInstances.git
```

- Install the Virtual Nodes on ACI Helm chart:

```bash
helm install <yourReleaseName> <GitRepoRoot>/Helm/virtualnode
```

- Confirm the Virtual Nodes node shows within the cluster and is in a Ready state (virtualnode-n):

```bash
$ kubectl get node
NAME                                STATUS   ROLES    AGE     VERSION
aks-nodepool1-35702456-vmss000000   Ready    <none>   4h13m   v1.33.6
aks-nodepool1-35702456-vmss000001   Ready    <none>   4h13m   v1.33.6
virtualnode-0                       Ready    <none>   162m    v1.33.7
```

- Scale down any running Virtual Nodes workloads (example below):

```bash
kubectl scale deploy <deploymentName> -n <namespace> --replicas=0
```

- Drain and cordon the legacy Virtual Nodes node:

```bash
kubectl drain virtual-node-aci-linux
```

- Disable the Virtual Nodes managed add-on (legacy):

```bash
az aks disable-addons --resource-group $rg --name $clusterName --addons virtual-node
```

- Export a backup of the original subnet configuration:

```bash
az network vnet subnet show --resource-group $rg --vnet-name $vnetName --name $vnSubnetName > subnetConfigOriginal.json
```

- Delete the original subnet (subnets cannot be renamed and therefore must be re-created):

```bash
az network vnet subnet delete -g $rg -n $vnSubnetName --vnet-name $vnetName
```

- Delete the previous (legacy) Virtual Nodes node from the cluster:

```bash
kubectl delete node virtual-node-aci-linux
```

- Test and confirm pod scheduling on Virtual Node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - command:
    - /bin/bash
    - -c
    - 'counter=1; while true; do echo "Hello, World! Counter: $counter"; counter=$((counter+1)); sleep 1; done'
    image: mcr.microsoft.com/azure-cli
    name: hello-world-counter
    resources:
      limits:
        cpu: 2250m
        memory: 2256Mi
      requests:
        cpu: 100m
        memory: 128Mi
  nodeSelector:
    virtualization: virtualnode2
  tolerations:
  - effect: NoSchedule
    key: virtual-kubelet.io/provider
    operator: Exists
```

If the pod successfully starts on the Virtual Node, you should see output similar to the below:

```bash
$ kubectl get pod -o wide demo-pod
NAME       READY   STATUS    RESTARTS   AGE   IP           NODE                   NOMINATED NODE   READINESS GATES
demo-pod   1/1     Running   0          95s   10.241.0.4   vnode2-virtualnode-0   <none>           <none>
```

- Modify the nodeSelector and tolerations properties of your Virtual Nodes workloads to match the requirements of Virtual Nodes on ACI (see details below)
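Before updating each workload, it can help to enumerate anything still scheduled on the legacy node (an illustrative command, assuming the default legacy node name virtual-node-aci-linux):

```bash
# List pods in all namespaces still running on the legacy Virtual Nodes node
kubectl get pods --all-namespaces -o wide \
  --field-selector spec.nodeName=virtual-node-aci-linux
```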
Modify your deployments to run on Virtual Nodes on ACI
For Virtual Nodes managed add-on (legacy), the following nodeSelector and tolerations are used to run pods on Virtual Nodes:
nodeSelector:
kubernetes.io/role: agent
kubernetes.io/os: linux
type: virtual-kubelet
tolerations:
- key: virtual-kubelet.io/provider
operator: Exists
- key: azure.com/aci
effect: NoSchedule
For Virtual Nodes on ACI, the nodeSelector/tolerations are slightly different:
nodeSelector:
virtualization: virtualnode2
tolerations:
- effect: NoSchedule
key: virtual-kubelet.io/provider
operator: Exists
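Putting this together, a workload manifest ends up looking something like the following (an illustrative fragment with a hypothetical my-app Deployment; substitute your own names and images):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app   # hypothetical name; use your own workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: mcr.microsoft.com/azure-cli   # placeholder image
      # Virtual Nodes on ACI scheduling requirements:
      nodeSelector:
        virtualization: virtualnode2
      tolerations:
      - effect: NoSchedule
        key: virtual-kubelet.io/provider
        operator: Exists
```

Remember to remove the legacy nodeSelector entries (kubernetes.io/role, type: virtual-kubelet) rather than only adding the new key, otherwise pods may fail to schedule.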
Troubleshooting
- Check the virtual-node-admission-controller and virtualnode-n pods are running within the vn2 namespace:

```bash
$ kubectl get pod -n vn2
NAME                                                 READY   STATUS    RESTARTS        AGE
virtual-node-admission-controller-54cb7568f5-b7hnr   1/1     Running   1 (5h21m ago)   5h21m
virtualnode-0                                        6/6     Running   6 (4h48m ago)   4h51m
```

If these pods are in a Pending state, your node pool(s) may not have enough resources available to schedule them (use kubectl describe pod to validate).
- If the virtualnode-n pod is crashing, check the logs of the proxycri container to see whether there are any Managed Identity permission issues (the cluster's -agentpool MSI needs to have Contributor access on the infrastructure resource group):

```bash
kubectl logs -n vn2 virtualnode-0 -c proxycri
```
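If permissions look suspect, you can also list the role assignments currently held by the kubelet (agent pool) managed identity (an illustrative check, reusing the variables from the deployment steps):

```bash
# Show all role assignments for the kubelet managed identity
az role assignment list \
  --assignee "$agentPoolIdentityObjectId" \
  --all \
  -o table
```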
Further troubleshooting guidance is available within the official documentation.
Support
If you have issues deploying or using Virtual Nodes on ACI, raise an issue on the GitHub repo here.