Microsoft recently announced the general availability of OSSKU Migration in Azure Kubernetes Service (AKS). This new feature enables users to take an existing AKS node pool and update its OSSKU for an in-place move between Ubuntu and Azure Linux. Previously, OSSKU was immutable, so users had to create new node pools and explicitly drain their workloads into them, which was labor-intensive and required additional VM quota.
In this blog post we will dive into how to use this feature, the tech stack that supports it, and some considerations to make sure the upgrade goes smoothly.
Using OSSKU Migration
OSSKU Migration is supported by az-cli, ARM/Bicep templates, and Terraform. All three options put the affected node pools into the upgrading state, which takes several minutes to resolve. During this time, your cluster scales up according to your max surge setting, and then your pods are drained and scheduled onto other VMs in your node pool or cluster.
- If you are using az-cli, your version must be 2.61.0 or higher. To trigger a migration, run the following command against your node pool:
az aks nodepool update --resource-group myResourceGroup --cluster-name myAKSCluster --name myNodePool --os-sku AzureLinux
- If you are using ARM/Bicep templates, you must update your apiVersion to 2023-07-01 or newer. Then update the osSKU field in your agentPoolProfile section to “AzureLinux” and redeploy your template.
- If you are using Terraform, your azurerm provider version must be v3.111.0 or higher. Then update the os_sku field of your node pools to “AzureLinux” and apply your Terraform plan.
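For the az-cli path, the version requirement and the migration command can be wrapped in a small script. The following is a minimal sketch, not an official tool: the function names version_ge and migrate_nodepool are made up for illustration, and the sketch assumes you are already logged in with az login. It checks the installed az-cli version, triggers the migration, and then prints the node pool's resulting OS SKU.

```shell
# Return success (0) when dotted version $1 is greater than or equal to $2.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0   # missing components count as 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Hypothetical helper: migrate one node pool, then print its resulting OS SKU.
# Arguments: resource group, cluster name, node pool name, target OS SKU.
migrate_nodepool() {
  rg="$1"; cluster="$2"; pool="$3"; target="$4"

  # OSSKU Migration needs az-cli 2.61.0 or higher.
  cli_version="$(az version --query '"azure-cli"' --output tsv)"
  if ! version_ge "$cli_version" "2.61.0"; then
    echo "az-cli $cli_version is older than 2.61.0; upgrade it first" >&2
    return 1
  fi

  az aks nodepool update \
    --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --os-sku "$target" || return 1

  # Read back the OS SKU that the node pool now reports.
  az aks nodepool show \
    --resource-group "$rg" --cluster-name "$cluster" \
    --name "$pool" --query osSku --output tsv
}
```

After sourcing the script, a call such as `migrate_nodepool myResourceGroup myAKSCluster myNodePool AzureLinux` would run the same command shown above, with the version check in front of it.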
How it Works
When you send a request to AKS (1) and it notices that your node pool’s OSSKU value has changed, it performs some additional validation to make sure that the change is allowed:
- OSSKU Migration cannot change node pool names.
- Only Ubuntu and AzureLinux are supported OSSKU targets.
- Ubuntu node pools with UseGPUDedicatedVHD enabled cannot change OSSKU.
- Ubuntu node pools with CVM 20.04 enabled cannot change OSSKU.
- AzureLinux node pools with Kata enabled cannot change OSSKU.
- Windows node pools cannot change OSSKU.
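The rules above can be modeled as a local pre-check. This is only an illustrative sketch: AKS performs the real validation on its side, and the function can_change_ossku and its arguments are hypothetical names chosen for this example. The node pool name rule is left out because the name identifies the pool being updated.

```shell
# Hypothetical pre-check that mirrors the rules listed above.
# AKS performs the real validation server-side; this is only a local model.
# Arguments: os_type current_sku target_sku gpu_dedicated_vhd cvm_2004 kata
# (the last three are "true" or "false").
can_change_ossku() {
  os_type="$1"; current="$2"; target="$3"
  gpu_vhd="$4"; cvm="$5"; kata="$6"

  if [ "$os_type" = "Windows" ]; then
    echo "Windows node pools cannot change OSSKU"; return 1
  fi
  case "$target" in
    Ubuntu|AzureLinux) ;;
    *) echo "unsupported target OSSKU: $target"; return 1 ;;
  esac
  if [ "$current" = "Ubuntu" ] && [ "$gpu_vhd" = "true" ]; then
    echo "Ubuntu node pools with UseGPUDedicatedVHD cannot change OSSKU"; return 1
  fi
  if [ "$current" = "Ubuntu" ] && [ "$cvm" = "true" ]; then
    echo "Ubuntu node pools with CVM 20.04 cannot change OSSKU"; return 1
  fi
  if [ "$current" = "AzureLinux" ] && [ "$kata" = "true" ]; then
    echo "AzureLinux node pools with Kata cannot change OSSKU"; return 1
  fi
  echo "ok"
}
```

For example, `can_change_ossku Linux Ubuntu AzureLinux false false false` passes every check, while the same call with Windows as the first argument is rejected.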
If all these conditions pass, then AKS puts the node pool into the upgrading state and picks the latest available image for your new chosen OSSKU. It will then follow the exact same flow as a node image upgrade, scaling the node pool up based on your max surge value (2), then replacing the image on your existing VMs one by one until each node is on the latest image for your new chosen OSSKU (3). Once all your VMs are upgraded to the new image, AKS then removes the surge nodes and signals back to the caller that the upgrade is complete (4).
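To estimate how many extra nodes the surge step adds, it helps to work through the arithmetic. The sketch below assumes the documented AKS behavior that max surge is either an absolute node count or a percentage of the current node count, with fractional nodes rounded up. The function name surge_nodes is hypothetical.

```shell
# Print the number of surge nodes for a node pool.
# Arguments: current node count, max surge value (for example "1" or "33%").
surge_nodes() {
  nodes="$1"; max_surge="$2"
  case "$max_surge" in
    *%)
      pct="${max_surge%?}"                      # drop the trailing percent sign
      echo $(( (nodes * pct + 99) / 100 ))      # integer ceiling of nodes * pct / 100
      ;;
    *)
      echo "$max_surge"                         # absolute node count, used as-is
      ;;
  esac
}
```

For a 10-node pool with a max surge of 33%, `surge_nodes 10 "33%"` gives 4, so the pool would briefly run 14 nodes and needs VM quota for the 4 extra ones.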
Things to Consider
Before running an OSSKU Migration in your production clusters, there are two very important things to check:
- Deploy a node pool of your new target OSSKU into both development and production environments to confirm that everything works as expected on your new OSSKU before performing the migration on the rest of your node pools.
- Ensure that the Pod Disruption Budgets (PDBs) for your workloads allow enough disruptions for AKS to move pods between VMs during the upgrade. This is necessary for OSSKU Migration, and for any AKS node image upgrade, to safely move workloads around your cluster while nodes are restarting. For information on troubleshooting PDB failures during upgrade see this documentation.
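One way to spot PDBs that are likely to block a drain is to look for budgets that currently allow zero disruptions. The following is a rough sketch that assumes kubectl is already pointed at the cluster; the function names list_pdbs and blocking_pdbs are hypothetical. It reads the disruptionsAllowed value from each PDB's status and prints the ones that report 0.

```shell
# List every PDB as three columns: namespace, name, allowed disruptions.
list_pdbs() {
  kubectl get poddisruptionbudgets --all-namespaces --no-headers \
    --output custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed'
}

# Read the three-column form on stdin and print "namespace/name"
# for each PDB that currently allows zero disruptions.
blocking_pdbs() {
  awk '$3 == 0 { print $1 "/" $2 }'
}
```

Running `list_pdbs | blocking_pdbs` before the migration gives a short list of budgets to review. An empty result does not guarantee a smooth drain, since the allowed disruptions can change as pods move.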
Conclusion
Throughout public preview, multiple teams within Microsoft have utilized OSSKU Migration to seamlessly move their workloads over to the Azure Linux OSSKU without large surge capacity and without the need for manual intervention within their clusters. We’re looking forward to more users experiencing how easy it is now to update the OSSKU on an existing AKS node pool.