The Azure Linux container host is Microsoft’s Linux distribution, tailor-made for cloud-native environments and highly optimized for Azure Kubernetes Service (AKS). Microsoft Threat Protection (MTP) Kubernetes Compute Platform, the single largest application on AKS, recently migrated to Azure Linux nodes on AKS and saw numerous advantages. This blog covers those advantages as well as how we migrated, by utilizing the combined functionality of Cluster Autoscaler, Node Problem Detector, and Draino.
“Our transition to Azure Linux was initially prompted by the goal of minimizing the system's vulnerability to threats. However, we quickly observed enhancements in performance upon initial boot-up, which, in turn, enabled us to decrease our 'warm standby' reserve capacity thanks to how quickly new infrastructure could be brought online. The reduced attack surface is important to us, as our platform provides compute infrastructure for many products in the Microsoft Security portfolio.” - Joshua Johnson, Group Engineering Manager, Microsoft Security
Azure Linux advantages:
A reduced attack surface, with fewer packages to patch and fewer vulnerabilities to track.
Faster boot and provisioning times, which let us shrink our 'warm standby' reserve capacity.
A Linux OS built by Microsoft and optimized specifically for AKS.
Compatibility with our existing workloads and essential AKS features.
In essence, migrating our nodes to Azure Linux helped us leverage a more secure, efficient, and AKS optimized Linux OS for our clusters, all without sacrificing essential features or compatibility.
The traditional path to migrate node pools today starts with creating new node pools, cordoning and draining the existing node pools, and then deleting them. This method of migration can be very manual, time consuming, and can require tedious coordination for services that are resource constrained. By utilizing the combined functionality of Cluster Autoscaler, Node Problem Detector, and Draino, we were able to gradually migrate workloads to the Azure Linux node pool as we scaled down our existing node pools.
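For reference, a condensed sketch of that traditional path looks something like the commands below (the pool names, VM size, and cluster details are placeholders, and the standard AKS agentpool node label is used to select the old pool's nodes):
# 1. Create the replacement Azure Linux node pool
az aks nodepool add --resource-group my-resource-group --cluster-name my-aks-cluster --name newpool --os-sku AzureLinux --node-count 3 --node-vm-size Standard_D2ads_v5
# 2. Cordon and drain every node in the old pool
kubectl cordon -l agentpool=oldpool
kubectl drain -l agentpool=oldpool --ignore-daemonsets --delete-emptydir-data
# 3. Delete the old node pool once its workloads have moved
az aks nodepool delete --resource-group my-resource-group --cluster-name my-aks-cluster --name oldpool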
Using Cluster Autoscaler to migrate from one node pool to another has a few limitations:
While user node pools can be scaled down to zero, system node pools must always have at least one node. It is possible, though, to convert a system node pool into a user node pool and then scale it to zero (see the sketch after this list).
Cluster Autoscaler will not automatically remove the existing node pool from the cluster. Nevertheless, node pools that were scaled down to zero do not consume quota or generate costs.
During the migration process, the cluster will briefly have a higher node count than before as new nodes start and work is drained from the existing node pool to the new one.
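For example, a system node pool can be moved out of the way with commands along these lines (a sketch: syspool1 is a placeholder pool name, the $RESOURCEGROUP and $CLUSTER variables are defined in the prerequisites section below, and the cluster must already contain another system node pool, such as its Azure Linux replacement, before the conversion):
# Convert the existing system node pool into a user node pool
az aks nodepool update --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name syspool1 --mode User
# Once it is in User mode, the Cluster Autoscaler can take it to zero; to scale it down manually instead,
# disable the autoscaler on the pool and run:
# az aks nodepool scale --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name syspool1 --node-count 0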
Tainting Existing Node Pools: The existing node pools are marked with a specific taint. This taint signals that these node pools should no longer schedule new workloads and prepares them for migration.
Setting Node Condition: The Node Problem Detector (NPD) is configured to watch for the specific taint applied in the previous step. Upon detecting the taint, NPD sets a permanent condition on the affected nodes, indicating that they are ready for the migration process.
Node Drainage: Draino monitors for the condition set by NPD and responds by draining the tainted nodes. Draining ensures that all pods currently running on the old nodes are evicted and rescheduled elsewhere in the cluster.
Node Deletion and Replacement: Once a node has been fully drained, the Cluster Autoscaler marks it for deletion. To maintain optimal capacity across the cluster, the Cluster Autoscaler then provisions new nodes within the newly created Azure Linux node pools.
Workload Redistribution: As new Azure Linux nodes become available, the workloads previously running on the existing nodes are automatically shifted to the Azure Linux nodes. The Cluster Autoscaler manages the distribution of these workloads to maintain optimal performance and resource utilization.
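To see this pipeline at work on a single node, the taint and the resulting condition can be inspected directly with kubectl (a sketch: the node name is a placeholder, and the RemovedNodePool taint and NodeTaintMonitor condition are configured later in this guide):
# Confirm the taint is present on a node from the old pool
kubectl get node aks-usrpool1-12345678-vmss000000 -o jsonpath='{.spec.taints}'
# Confirm NPD has set the condition that Draino watches
kubectl get node aks-usrpool1-12345678-vmss000000 -o jsonpath='{.status.conditions[?(@.type=="NodeTaintMonitor")]}'
# Once Draino reacts, the node is cordoned (SchedulingDisabled) before being drained
kubectl get node aks-usrpool1-12345678-vmss000000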
Below is a list of tools and prerequisites necessary for completing the examples in this guide. Ensure these are installed and configured before proceeding:
Azure CLI (az), logged in with access to the target subscription
kubectl, configured for the target cluster
Helm
yq
Docker, used to build and push the custom NPD and Draino images
Git
A container registry (for example, Azure Container Registry) to host the custom images
These variables are referenced consistently throughout the examples in this guide:
# The subscription ID to which the cluster belongs
export SUBSCRIPTION=12345678-1234-1234-1234-123456789012
# The resource group name that contains the cluster
export RESOURCEGROUP=my-resource-group
# Name of the cluster
export CLUSTER=my-aks-cluster
# Container registry to store the docker images used/built in the examples
export REGISTRY="my-container-registry.azurecr.io/my-repo"
Ensure the environment variables defined earlier are set before executing these commands:
# Log in to your Azure account
az login
# Select the specified subscription using the environment variable previously defined
az account set --subscription $SUBSCRIPTION
Before migration, review the node pools within the cluster. Use the following command to list them in a table format for easier selection:
# List the node pools in a table format
az aks nodepool list --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --output table
Here’s what you can expect to see after running the command above:
Name      OsType    KubernetesVersion    VmSize             Count    MaxPods    ProvisioningState    Mode
--------  --------  -------------------  -----------------  -------  ---------  -------------------  ------
usrpool1  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            User
usrpool2  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            User
syspool1  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            System
In this instance, we’ll copy a node pool configuration, updating only its name and osSku to create a fresh node pool:
# Existing node pool name, used as the base to be cloned (select one from the previous step's results and update NP_NAME)
NP_NAME=usrpool1
NP_FILE="nodepools/${NP_NAME}.yaml"
# New Azure Linux node pool to be created: reusing the same name, just appending "m" (for Azure Linux) as suffix
NP_AL_NAME="${NP_NAME}m"
# Folder to store the downloaded node pool configurations
mkdir -p nodepools
# Retrieves the current configuration of the specified node pool and saves it as a YAML file
az aks nodepool show \
--resource-group $RESOURCEGROUP \
--cluster-name $CLUSTER \
--nodepool-name $NP_NAME -oyaml \
| yq > "$NP_FILE"
# Creates a new node pool reusing the settings of the existing one, updating only the `name` and `osSku`
az aks nodepool add \
--resource-group $(yq e '.resourceGroup' $NP_FILE) \
--cluster-name $CLUSTER \
--name $NP_AL_NAME \
--node-count $(yq e '.count' $NP_FILE) \
--node-vm-size $(yq e '.vmSize' $NP_FILE) \
--os-sku AzureLinux \
--os-type $(yq e '.osType' $NP_FILE) \
--kubernetes-version $(yq e '.currentOrchestratorVersion' $NP_FILE) \
--node-osdisk-size $(yq e '.osDiskSizeGb' $NP_FILE) \
--node-osdisk-type $(yq e '.osDiskType' $NP_FILE) \
--max-pods $(yq e '.maxPods' $NP_FILE) \
--enable-cluster-autoscaler \
--min-count $(yq e '.minCount' $NP_FILE) \
--max-count $(yq e '.maxCount' $NP_FILE) \
--mode $(yq e '.mode' $NP_FILE)
# In case you need to delete the newly created nodepool
# az aks nodepool delete --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name $NP_AL_NAME
This section outlines the steps to set up NPD for our goal. NPD will monitor for the RemovedNodePool taint and, once it is detected, assign a condition to the affected nodes. Draino will then use this condition to identify and process the nodes for safe removal, ensuring node pool maintenance aligns with cluster health and reliability requirements.
Save the contents below into a file named ‘Dockerfile’ to build a custom NPD image that includes ‘curl’ and ‘jq’, which the custom plug-in script depends on:
FROM registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.14
RUN apt-get update && \
apt-get install -y curl jq && \
apt-get clean
The Helm values below are specific to our example; they add the custom NPD plug-in monitor called ‘NodeTaintMonitor’. Save them to a file named ‘npd-values.yaml’, which is referenced by the Helm install command later:
settings:
  custom_monitor_definitions:
    check-node-taints.json: |
      {
        "plugin": "custom",
        "pluginConfig": {
          "invoke_interval": "1m",
          "max_output_length": 80,
          "concurrency": 1,
          "enable_message_change_based_condition_update": true
        },
        "source": "node-taint-plugin-monitor",
        "conditions": [
          {
            "type": "NodeTaintMonitor",
            "reason": "RemovedNodePoolTaintNotApplied",
            "message": "Node is not tainted with RemovedNodePool"
          }
        ],
        "rules": [
          {
            "type": "permanent",
            "condition": "NodeTaintMonitor",
            "reason": "RemovedNodePoolTaintApplied",
            "path": "/custom-config/check-node-taints.sh",
            "args": [
              "RemovedNodePool"
            ]
          }
        ]
      }
    check-node-taints.sh: |-
      #!/bin/bash
      OK=0
      NOOK=1
      UNKNOWN=2
      node_name=$NODENAME
      api_server="https://kubernetes.default.svc.cluster.local"
      token="/var/run/secrets/kubernetes.io/serviceaccount/token"
      cacert="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
      taint_key=$1
      isTaint=$(curl -sSk \
        -H "Authorization: Bearer $(cat $token)" \
        --cacert $cacert $api_server/api/v1/nodes/$node_name \
        | jq -r 'if .spec.taints != null then .spec.taints[] | select(.key == "'"$taint_key"'") | .key else empty end') \
        || { echo "Error occurred while retrieving taints"; exit "${UNKNOWN}"; }
      if [[ -z "$isTaint" ]]; then
        echo "Node does not have $taint_key"
        exit "${OK}"
      else
        echo "Node has $taint_key"
        exit "${NOOK}"
      fi
  custom_plugin_monitors:
    - /custom-config/check-node-taints.json
env:
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
To install the NPD Helm Chart, we will follow the instructions provided by the project on GitHub.
# Sets the custom NPD image to be used (update the vars below to match your image registry)
export NPD_IMAGE_REPO="$REGISTRY/npd"
export NPD_IMAGE_TAG="v0.8.14-curl"
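# Builds and publishes the custom NPD image from the Dockerfile created earlier
# (assumes that Dockerfile is in the current directory and Docker is logged in to your registry)
docker build . -t "${NPD_IMAGE_REPO}:${NPD_IMAGE_TAG}"
docker push "${NPD_IMAGE_REPO}:${NPD_IMAGE_TAG}"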
# Sets the RBAC permissions NPD's custom plug-in needs to read node objects
kubectl create clusterrole node-list-role --verb=get,list --resource=nodes
kubectl create clusterrolebinding npd-binding --clusterrole=node-list-role --serviceaccount=npd:npd-node-problem-detector
# Add the Delivery Hero helm repository
helm repo add deliveryhero https://charts.deliveryhero.io/
# Installs the NPD Helm Chart in the `npd` namespace
helm upgrade --install npd deliveryhero/node-problem-detector \
--namespace npd --create-namespace \
--set image.repository="$NPD_IMAGE_REPO" --set image.tag="$NPD_IMAGE_TAG" \
-f npd-values.yaml
# In case you need to uninstall the Helm Chart
# helm uninstall --namespace npd npd
Draino is no longer being actively maintained. To ensure security and stability, it’s recommended to seek an updated version that includes all the necessary security patches.
# Defines the Draino Git Repo to be used
export DRAINO_GIT_REPO=https://github.com/planetlabs/draino.git
# Removes any previous copies and creates folders to store the Draino repo and its Helm Chart
rm -rf repos/draino charts/draino
mkdir -p repos charts
git clone $DRAINO_GIT_REPO repos/draino
# Copies the Helm Chart from the Git repository
cp -r repos/draino/helm/draino/ charts/
# Defines the Draino image to be used (update the vars below to match your image registry)
export DRAINO_IMAGE_REPO="${REGISTRY}/draino"
export DRAINO_IMAGE_TAG="v0.1.0_patched"
# Builds and publishes the Draino image using the environment variables defined above
docker build repos/draino/ -t "${DRAINO_IMAGE_REPO}:${DRAINO_IMAGE_TAG}"
docker push "${DRAINO_IMAGE_REPO}:${DRAINO_IMAGE_TAG}"
# Creates the RBAC permissions Draino requires (coordination leases for leader election)
kubectl create clusterrole draino-role --verb=get,list,create,update,delete --resource=leases.coordination.k8s.io
kubectl create clusterrolebinding draino-binding --clusterrole=draino-role --serviceaccount=draino:draino
# Deploys the Draino Helm Chart with our specific settings
helm upgrade --install draino charts/draino \
--namespace draino --create-namespace \
--set image.repository="$DRAINO_IMAGE_REPO" --set image.tag="$DRAINO_IMAGE_TAG" \
-f draino-values.yaml
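The command above reads settings from a draino-values.yaml file that is not reproduced here; its exact schema is defined by charts/draino/values.yaml in the repository cloned earlier. As a hypothetical sketch only (the conditions key is an assumption to verify against that file), the essential setting is pointing Draino at the NodeTaintMonitor condition set by NPD:
# draino-values.yaml (hypothetical sketch; confirm the key names against charts/draino/values.yaml)
# Tell Draino which node condition should trigger a drain
conditions:
  - NodeTaintMonitor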
NEW_TAINT="RemovedNodePool=true:NoSchedule"
# Get the existing taints for the node pool and join them into a comma-separated string (empty if the pool has no taints)
TAINTS=$(az aks nodepool show --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --query nodeTaints -o tsv | paste -sd, -)
# Add the new taint if it doesn't already exist
if ! grep -q "$NEW_TAINT" <<< "$TAINTS"; then
TAINTS="${TAINTS:+$TAINTS,}$NEW_TAINT"
fi
# Update the nodepool with the new taints.
az aks nodepool update --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --node-taints "$TAINTS"
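With the taint in place, the migration can be followed end to end as NPD flags the old nodes, Draino drains them, and the Cluster Autoscaler replaces them with Azure Linux nodes (a sketch using the standard AKS agentpool node label):
# Watch the old pool's nodes get cordoned, drained, and removed
kubectl get nodes -l agentpool=$NP_NAME --watch
# Confirm workloads are landing on the new Azure Linux pool
kubectl get nodes -l agentpool=$NP_AL_NAME
# Check the shrinking node count of the old pool
az aks nodepool show --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --query count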