The Azure Linux container host is Microsoft’s Linux distribution, tailor-made for cloud-native environments and highly optimized for Azure Kubernetes Service (AKS). Microsoft Threat Protection (MTP) Kubernetes Compute Platform, the single largest application on AKS, recently migrated to Azure Linux nodes on AKS and saw numerous advantages. This blog covers those advantages as well as how we migrated, by utilizing the combined functionality of Cluster Autoscaler, Node Problem Detector, and Draino.
“Our transition to Azure Linux was initially prompted by the goal of minimizing the system's vulnerability to threats. However, we quickly observed enhancements in performance upon initial boot-up, which, in turn, enabled us to decrease our 'warm standby' reserve capacity thanks to how quickly new infrastructure could be brought online. The reduced attack surface is important to us, as our platform provides compute infrastructure for many products in the Microsoft Security portfolio.” - Joshua Johnson, Group Engineering Manager, Microsoft Security
Azure Linux advantages:
A reduced attack surface, with fewer packages to patch and fewer vulnerabilities to track.
Faster boot and provisioning times, which let us shrink our 'warm standby' reserve capacity.
A Linux OS built by Microsoft and optimized specifically for AKS.
Compatibility with our existing workloads and essential AKS features.
In essence, migrating our nodes to Azure Linux helped us leverage a more secure, efficient, and AKS optimized Linux OS for our clusters, all without sacrificing essential features or compatibility.
The traditional path to migrate node pools today starts with creating new node pools, cordoning and draining the existing node pools, and then deleting them. This method of migration can be very manual, time consuming, and can require tedious coordination for services that are resource constrained. By utilizing the combined functionality of Cluster Autoscaler, Node Problem Detector, and Draino, we were able to gradually migrate workloads to the Azure Linux node pool as we scaled down our existing node pools.
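For reference, a condensed sketch of that traditional path looks something like the commands below (the pool names, VM size, and cluster details are placeholders, and the standard AKS agentpool node label is used to select the old pool's nodes):
# 1. Create the replacement Azure Linux node pool
az aks nodepool add --resource-group my-resource-group --cluster-name my-aks-cluster --name newpool --os-sku AzureLinux --node-count 3 --node-vm-size Standard_D2ads_v5
# 2. Cordon and drain every node in the old pool
kubectl cordon -l agentpool=oldpool
kubectl drain -l agentpool=oldpool --ignore-daemonsets --delete-emptydir-data
# 3. Delete the old node pool once its workloads have moved
az aks nodepool delete --resource-group my-resource-group --cluster-name my-aks-cluster --name oldpool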
Using Cluster Autoscaler to migrate from one node pool to another has a few limitations:
While user node pools can be scaled down to zero, system node pools must always have at least one node. It is possible, though, to convert a system node pool into a user node pool and then scale it to zero (see the sketch after this list).
Cluster Autoscaler will not automatically remove the existing node pool from the cluster. Nevertheless, node pools that were scaled down to zero do not consume quota or generate costs.
During the migration process, the cluster will briefly have a higher node count than before as new nodes start and work is drained from the existing node pool to the new one.
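For example, a system node pool can be moved out of the way with commands along these lines (a sketch: syspool1 is a placeholder pool name, the $RESOURCEGROUP and $CLUSTER variables are defined in the prerequisites section below, and the cluster must already contain another system node pool, such as its Azure Linux replacement, before the conversion):
# Convert the existing system node pool into a user node pool
az aks nodepool update --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name syspool1 --mode User
# Once it is in User mode, the Cluster Autoscaler can take it to zero; to scale it down manually instead,
# disable the autoscaler on the pool and run:
# az aks nodepool scale --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name syspool1 --node-count 0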
Tainting Existing Node Pools: The existing node pools are marked with a specific taint. This taint signals that these node pools should no longer schedule new workloads and prepares them for migration.
Setting Node Condition: The Node Problem Detector (NPD) is configured to watch for the specific taint applied in the previous step. Upon detecting the taint, NPD sets a permanent condition on the affected nodes, indicating that they are ready for the migration process.
Node Drainage: Draino monitors for the condition set by NPD and responds by draining the tainted nodes. Draining ensures that all pods currently running on the old nodes are evicted and rescheduled elsewhere in the cluster.
Node Deletion and Replacement: Once a node has been fully drained, the Cluster Autoscaler marks it for deletion. To maintain optimal capacity across the cluster, the Cluster Autoscaler then provisions new nodes within the newly created Azure Linux node pools.
Workload Redistribution: As new Azure Linux nodes become available, the workloads previously running on the existing nodes are automatically shifted to the Azure Linux nodes. The Cluster Autoscaler manages the distribution of these workloads to maintain optimal performance and resource utilization.
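To see this pipeline at work on a single node, the taint and the resulting condition can be inspected directly with kubectl (a sketch: the node name is a placeholder, and the RemovedNodePool taint and NodeTaintMonitor condition are configured later in this guide):
# Confirm the taint is present on a node from the old pool
kubectl get node aks-usrpool1-12345678-vmss000000 -o jsonpath='{.spec.taints}'
# Confirm NPD has set the condition that Draino watches
kubectl get node aks-usrpool1-12345678-vmss000000 -o jsonpath='{.status.conditions[?(@.type=="NodeTaintMonitor")]}'
# Once Draino reacts, the node is cordoned (SchedulingDisabled) before being drained
kubectl get node aks-usrpool1-12345678-vmss000000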
Below is a list of tools and prerequisites necessary for completing the examples in this guide. Ensure these are installed and configured before proceeding:
Azure CLI (az), logged in with access to the target subscription
kubectl, configured for the target cluster
Helm
yq
Docker, used to build and push the custom NPD and Draino images
Git
A container registry (for example, Azure Container Registry) to host the custom images
These variables are referenced consistently throughout the examples in this guide:
# The subscription ID to which the cluster belongs
export SUBSCRIPTION=12345678-1234-1234-1234-123456789012
# The resource group name that contains the cluster
export RESOURCEGROUP=my-resource-group
# Name of the cluster
export CLUSTER=my-aks-cluster
# Container registry to store the docker images used/built in the examples
export REGISTRY="my-container-registry.azurecr.io/my-repo"
Ensure the environment variables defined earlier are set before executing these commands:
# Log in to your Azure account
az login
# Select the specified subscription using the environment variable previously defined
az account set --subscription $SUBSCRIPTION
Before migration, review the node pools within the cluster. Use the following command to list them in a table format for easier selection:
# List the node pools in a table format
az aks nodepool list --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --output table
Here’s what you can expect to see after running the command above:
Name      OsType    KubernetesVersion    VmSize             Count    MaxPods    ProvisioningState    Mode
--------  --------  -------------------  -----------------  -------  ---------  -------------------  ------
usrpool1  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            User
usrpool2  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            User
syspool1  Linux     1.26.6               Standard_D2ads_v5  1        50         Succeeded            System
In this instance, we’ll copy a node pool configuration, updating only its name and osSku to create a fresh node pool:
# Existing node pool name, used as the base to be cloned (select one from the previous step's results and update NP_NAME)
NP_NAME=usrpool1
NP_FILE="nodepools/${NP_NAME}.yaml"
# New Azure Linux node pool to be created: reusing the same name, just appending "m" (for Azure Linux) as suffix
NP_AL_NAME="${NP_NAME}m"
# Folder to store the downloaded node pool configurations
mkdir -p nodepools
# Retrieves the current configuration of the specified node pool and saves it as a YAML file
az aks nodepool show \
--resource-group $RESOURCEGROUP \
--cluster-name $CLUSTER \
--nodepool-name $NP_NAME -oyaml \
| yq > "$NP_FILE"
# Creates a new node pool reusing the settings of the existing one, updating only the `name` and `osSku`
az aks nodepool add \
--resource-group $(yq e '.resourceGroup' $NP_FILE) \
--cluster-name $CLUSTER \
--name $NP_AL_NAME \
--node-count $(yq e '.count' $NP_FILE) \
--node-vm-size $(yq e '.vmSize' $NP_FILE) \
--os-sku AzureLinux \
--os-type $(yq e '.osType' $NP_FILE) \
--kubernetes-version $(yq e '.currentOrchestratorVersion' $NP_FILE) \
--node-osdisk-size $(yq e '.osDiskSizeGb' $NP_FILE) \
--node-osdisk-type $(yq e '.osDiskType' $NP_FILE) \
--max-pods $(yq e '.maxPods' $NP_FILE) \
--enable-cluster-autoscaler \
--min-count $(yq e '.minCount' $NP_FILE) \
--max-count $(yq e '.maxCount' $NP_FILE) \
--mode $(yq e '.mode' $NP_FILE)
# In case you need to delete the newly created nodepool
# az aks nodepool delete --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --name $NP_AL_NAME
This section outlines the steps to set up NPD for our goal. NPD will monitor for the RemovedNodePool taint and, once it is detected, assign a condition to the affected nodes. Draino will then use this condition to identify and process the nodes for safe removal, ensuring node pool maintenance aligns with cluster health and reliability requirements.
Save the contents below into a file named ‘Dockerfile’ to build a custom NPD image that includes ‘curl’ and ‘jq’, which the custom plug-in script depends on:
FROM registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.14
RUN apt-get update && \
apt-get install -y curl jq && \
apt-get clean
The Helm values below are specific to our example; they add the custom NPD plug-in monitor called ‘NodeTaintMonitor’. Save them to a file named ‘npd-values.yaml’, which is referenced by the Helm install command later:
settings:
  custom_monitor_definitions:
    check-node-taints.json: |
      {
        "plugin": "custom",
        "pluginConfig": {
          "invoke_interval": "1m",
          "max_output_length": 80,
          "concurrency": 1,
          "enable_message_change_based_condition_update": true
        },
        "source": "node-taint-plugin-monitor",
        "conditions": [
          {
            "type": "NodeTaintMonitor",
            "reason": "RemovedNodePoolTaintNotApplied",
            "message": "Node is not tainted with RemovedNodePool"
          }
        ],
        "rules": [
          {
            "type": "permanent",
            "condition": "NodeTaintMonitor",
            "reason": "RemovedNodePoolTaintApplied",
            "path": "/custom-config/check-node-taints.sh",
            "args": [
              "RemovedNodePool"
            ]
          }
        ]
      }
    check-node-taints.sh: |-
      #!/bin/bash
      OK=0
      NOOK=1
      UNKNOWN=2
      node_name=$NODENAME
      api_server="https://kubernetes.default.svc.cluster.local"
      token="/var/run/secrets/kubernetes.io/serviceaccount/token"
      cacert="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
      taint_key=$1
      isTaint=$(curl -sSk \
        -H "Authorization: Bearer $(cat $token)" \
        --cacert $cacert $api_server/api/v1/nodes/$node_name \
        | jq -r 'if .spec.taints != null then .spec.taints[] | select(.key == "'"$taint_key"'") | .key else empty end') \
        || { echo "Error occurred while retrieving taints"; exit "${UNKNOWN}"; }
      if [[ -z "$isTaint" ]]; then
        echo "Node does not have $taint_key"
        exit "${OK}"
      else
        echo "Node has $taint_key"
        exit "${NOOK}"
      fi
  custom_plugin_monitors:
    - /custom-config/check-node-taints.json
env:
  - name: NODENAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
To install the NPD Helm Chart, we will follow the instructions provided by the project on GitHub.
# Sets the custom NPD image to be used (update the vars below to match your image registry)
export NPD_IMAGE_REPO="$REGISTRY/npd"
export NPD_IMAGE_TAG="v0.8.14-curl"
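# Builds and publishes the custom NPD image from the Dockerfile created earlier
# (assumes that Dockerfile is in the current directory and Docker is logged in to your registry)
docker build . -t "${NPD_IMAGE_REPO}:${NPD_IMAGE_TAG}"
docker push "${NPD_IMAGE_REPO}:${NPD_IMAGE_TAG}"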
# Sets the RBAC permissions NPD's custom plug-in needs to read node objects
kubectl create clusterrole node-list-role --verb=get,list --resource=nodes
kubectl create clusterrolebinding npd-binding --clusterrole=node-list-role --serviceaccount=npd:npd-node-problem-detector
# Add the Delivery Hero helm repository
helm repo add deliveryhero https://charts.deliveryhero.io/
# Installs the NPD Helm Chart in the `npd` namespace
helm upgrade --install npd deliveryhero/node-problem-detector \
--namespace npd --create-namespace \
--set image.repository="$NPD_IMAGE_REPO" --set image.tag="$NPD_IMAGE_TAG" \
-f npd-values.yaml
# In case you need to uninstall the Helm Chart
# helm uninstall --namespace npd npd
Draino is no longer being actively maintained. To ensure security and stability, it’s recommended to seek an updated version that includes all the necessary security patches.
# Defines the Draino Git Repo to be used
export DRAINO_GIT_REPO=https://github.com/planetlabs/draino.git
# Removes any previous copies and creates folders to store the Draino repo and its Helm Chart
rm -rf repos/draino charts/draino
mkdir -p repos charts
git clone $DRAINO_GIT_REPO repos/draino
# Copies the Helm Chart from the Git repository
cp -r repos/draino/helm/draino/ charts/
# Defines the Draino image to be used (update the vars below to match your image registry)
export DRAINO_IMAGE_REPO="${REGISTRY}/draino"
export DRAINO_IMAGE_TAG="v0.1.0_patched"
# Builds and publishes the Draino image using the environment variables defined above
docker build repos/draino/ -t "${DRAINO_IMAGE_REPO}:${DRAINO_IMAGE_TAG}"
docker push "${DRAINO_IMAGE_REPO}:${DRAINO_IMAGE_TAG}"
# Creates the RBAC permissions Draino requires (coordination leases for leader election)
kubectl create clusterrole draino-role --verb=get,list,create,update,delete --resource=leases.coordination.k8s.io
kubectl create clusterrolebinding draino-binding --clusterrole=draino-role --serviceaccount=draino:draino
# Deploys the Draino Helm Chart with our specific settings
helm upgrade --install draino charts/draino \
--namespace draino --create-namespace \
--set image.repository="$DRAINO_IMAGE_REPO" --set image.tag="$DRAINO_IMAGE_TAG" \
-f draino-values.yaml
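The command above reads settings from a draino-values.yaml file that is not reproduced here; its exact schema is defined by charts/draino/values.yaml in the repository cloned earlier. As a hypothetical sketch only (the conditions key is an assumption to verify against that file), the essential setting is pointing Draino at the NodeTaintMonitor condition set by NPD:
# draino-values.yaml (hypothetical sketch; confirm the key names against charts/draino/values.yaml)
# Tell Draino which node condition should trigger a drain
conditions:
  - NodeTaintMonitor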
NEW_TAINT="RemovedNodePool=true:NoSchedule"
# Get the existing taints for the node pool and join them into a comma-separated string (empty if the pool has no taints)
TAINTS=$(az aks nodepool show --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --query nodeTaints -o tsv | paste -sd, -)
# Add the new taint if it doesn't already exist
if ! grep -q "$NEW_TAINT" <<< "$TAINTS"; then
TAINTS="${TAINTS:+$TAINTS,}$NEW_TAINT"
fi
# Update the nodepool with the new taints.
az aks nodepool update --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --node-taints "$TAINTS"
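With the taint in place, the migration can be followed end to end as NPD flags the old nodes, Draino drains them, and the Cluster Autoscaler replaces them with Azure Linux nodes (a sketch using the standard AKS agentpool node label):
# Watch the old pool's nodes get cordoned, drained, and removed
kubectl get nodes -l agentpool=$NP_NAME --watch
# Confirm workloads are landing on the new Azure Linux pool
kubectl get nodes -l agentpool=$NP_AL_NAME
# Check the shrinking node count of the old pool
az aks nodepool show --resource-group $RESOURCEGROUP --cluster-name $CLUSTER --nodepool-name $NP_NAME --query count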