One of the benefits of adopting a system like Kubernetes is support for burstable and scalable workloads. Horizontal application scaling involves adding or removing instances of an application to match demand. The Kubernetes Horizontal Pod Autoscaler automates pod scaling based on demand. This is cool, but it can lead to unpredictable load on the cluster, which may put the cluster into an overcommitted state. Fortunately, if the goal is to squeeze every bit of CPU and memory out of a cluster, overcommitment may be not only OK but desirable.
The following image represents a three-node cluster that runs three applications. Pink is the most critical. Red is burstable and durable, which means that if we need to stop a few instances of red, things will be OK. Blue is non-critical. The image also depicts a cluster that is fully maxed out: there are no more resources available for additional workload.
Imagine now that a scale-out operation is needed on the pink application. This puts the cluster in an overcommitted state with critical workload waiting to be scheduled. How can Kubernetes facilitate this critical request in an overcommitted state? One option is to use Pod Priority and Preemption, which allows a priority weight to be added to a scheduling request. In the event of overcommitment, priority is evaluated, and lower-priority workload is evicted (preemption) to make room for the higher-priority workload.
In this article, we will walk through an end-to-end demonstration of using Pod Priority and Preemption to ensure critical workload has priority access to cluster resources.
In order to complete this tutorial, you need a Kubernetes cluster that consists of three nodes. I've included steps for deploying an appropriately sized Azure Kubernetes cluster. If you need an Azure Subscription or would like to read up on additional operational practices for Azure Kubernetes Service, see the following links.
First things first, ensure you have an appropriately sized Kubernetes cluster for this tutorial (three nodes).
Create a resource group.
az group create --name AKSOperationsDemos --location eastus
Create the cluster. Note, the Azure CLI defaults are suitable for this demo.
az aks create --resource-group AKSOperationsDemos --name AKSOperationsDemos --kubernetes-version 1.11.3
Connect to the cluster as cluster admin.
az aks get-credentials --resource-group AKSOperationsDemos --name AKSOperationsDemos --admin
Create an instance of a Pod Priority Class with a weight of 1000000. This can be used to ensure that high-priority workload is given priority access to cluster resources.
To do so, create a file named pc.yml and copy in the following YAML.
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
Create the priority class.
kubectl create -f pc.yml
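To make the role of the priority class concrete, here is a minimal Python sketch of how a pod's priorityClassName is turned into an integer priority. It is a toy model, not the Kubernetes implementation: the PRIORITY_CLASSES dictionary and the resolve_priority function are hypothetical names invented for illustration. The behavior it models is that a pod naming a class receives that class's value, and a pod naming no class receives the value of the class marked globalDefault, or 0 if no such class exists.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the PriorityClass objects in a cluster.
# Each entry maps a class name to (value, globalDefault).
PRIORITY_CLASSES = {
    "high-priority": (1000000, False),
}


def resolve_priority(priority_class_name: Optional[str]) -> int:
    """Return the integer priority a pod would end up with in this toy model."""
    if priority_class_name is not None:
        if priority_class_name not in PRIORITY_CLASSES:
            # A pod that names a class that does not exist is rejected.
            raise ValueError(f"unknown priority class: {priority_class_name}")
        return PRIORITY_CLASSES[priority_class_name][0]
    # No class named: fall back to the class marked globalDefault, if any.
    for value, is_default in PRIORITY_CLASSES.values():
        if is_default:
            return value
    # No global default exists, so the pod gets priority 0.
    return 0


print(resolve_priority("high-priority"))  # 1000000
print(resolve_priority(None))             # 0
```

Because high-priority is created with globalDefault set to false, every pod that does not name it keeps a priority of 0 in this model, which is the situation for the workloads started in the next steps.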
Run some workload to consume all CPU cores in the cluster. In the following example, a deployment consisting of three replicas is started with a CPU request of one core each. This will effectively consume the available CPU resources of the cluster.
Create a file named slam-cpu.yml and copy in the following YAML.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: consume-cpu
spec:
  replicas: 3
  selector:
    matchLabels:
      app: consume-cpu
  template:
    metadata:
      labels:
        app: consume-cpu
    spec:
      containers:
      - name: nepetersv1
        image: neilpeterson/aks-helloworld:v1
        resources:
          requests:
            cpu: 1
            memory: 128Mi
          limits:
            cpu: 1
            memory: 128Mi
Run the deployment.
kubectl create -f slam-cpu.yml
Now start another pod without specifying a priority class.
Create a file named pod-no-priority.yml and copy in the following YAML.
apiVersion: v1
kind: Pod
metadata:
  name: pod-no-priority
spec:
  containers:
  - name: pod-no-priority
    image: neilpeterson/aks-helloworld:v1
    resources:
      requests:
        cpu: 1
        memory: 128Mi
      limits:
        cpu: 1
        memory: 256Mi
Run the pod.
kubectl create -f pod-no-priority.yml
At this point, the new pod cannot be scheduled due to a lack of CPU resources. To see this, list the pods on the cluster and note that pod-no-priority is in a Pending state.
kubectl get pods
consume-cpu-6c8d576684-gf5sk 0/1 ContainerCreating 0 52s
consume-cpu-6c8d576684-mtvmn 0/1 ContainerCreating 0 52s
consume-cpu-6c8d576684-pnkff 0/1 ContainerCreating 0 52s
pod-no-priority 0/1 Pending 0 10s
Return a list of events for the pod to see the actual issue.
kubectl describe pod pod-no-priority
Parsing the output, you should see that the pod cannot be scheduled due to insufficient CPU.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1s (x18 over 53s) default-scheduler 0/3 nodes are available: 3 Insufficient cpu.
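The check behind this event can be sketched in a few lines of Python. The scheduler compares the pod's CPU request against the CPU that is still unrequested on each node (requests, not live usage). The function name unschedulable_reason and the figure of 900 millicores free per node are assumptions made up for illustration; the real numbers depend on the node size and on the system pods running in your cluster.

```python
from typing import List, Optional


def unschedulable_reason(nodes_free_millicpu: List[int],
                         request_millicpu: int) -> Optional[str]:
    """Return a FailedScheduling-style message, or None if some node fits."""
    # Nodes whose unrequested CPU is smaller than the pod's request.
    short = [free for free in nodes_free_millicpu if free < request_millicpu]
    if len(short) < len(nodes_free_millicpu):
        return None  # at least one node has room, so the pod can be scheduled
    total = len(nodes_free_millicpu)
    return f"0/{total} nodes are available: {len(short)} Insufficient cpu."


# Assumed state: each of the three nodes has 900m of CPU left unrequested,
# and pod-no-priority requests a full core (1000m).
print(unschedulable_reason([900, 900, 900], 1000))
# 0/3 nodes are available: 3 Insufficient cpu.
```

If any one node had a full core left, the function would return None and the pod would be placed there instead of staying Pending.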
Finally, run another pod, but this time assign the high-priority class to the pod.
Create a file named pod-priority.yml and copy in the following YAML. Take note that the pod spec includes the priority class created in a previous step.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-priority
spec:
  containers:
  - name: pod-with-priority
    image: neilpeterson/nepetersv1
    resources:
      requests:
        cpu: 1
        memory: 128Mi
      limits:
        cpu: 1
        memory: 256Mi
  priorityClassName: high-priority
Run the pod.
kubectl create -f pod-priority.yml
Now return a list of pods. If you are quick, you may be able to catch one of the lower-priority pods being terminated.
kubectl get pods
NAME READY STATUS RESTARTS AGE
consume-cpu-6c8d576684-gf5sk 1/1 Running 0 7m
consume-cpu-6c8d576684-mtvmn 1/1 Running 0 7m
consume-cpu-6c8d576684-p7tqx 0/1 Pending 0 3s
consume-cpu-6c8d576684-pnkff 1/1 Terminating 0 7m
pod-no-priority 0/1 Pending 0 6m
pod-with-priority 0/1 Pending 0 3s
Once the lower priority pod has been terminated, the pod with priority is started in its place.
kubectl get pods
NAME READY STATUS RESTARTS AGE
consume-cpu-6c8d576684-gf5sk 1/1 Running 0 8m
consume-cpu-6c8d576684-mtvmn 1/1 Running 0 8m
consume-cpu-6c8d576684-p7tqx 0/1 Pending 0 1m
pod-no-priority 0/1 Pending 0 8m
pod-with-priority 1/1 Running 0 1m
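The outcome above can be reproduced with a small Python model of preemption. This is a simplified sketch, not the actual scheduler: the Pod and Node classes, the schedule function, the node names, and the assumption of 1000 millicores of schedulable CPU per node are all hypothetical. The real scheduler also weighs which node is the best place to preempt and waits for victims to terminate gracefully, both of which are ignored here. The one rule the model keeps is that only pods with a strictly lower priority can be evicted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Pod:
    name: str
    priority: int
    cpu: int  # CPU request in millicores


@dataclass
class Node:
    name: str
    allocatable: int  # schedulable CPU in millicores (assumed value)
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> int:
        return self.allocatable - sum(p.cpu for p in self.pods)


def schedule(pod: Pod, nodes: List[Node]) -> Optional[Tuple[str, List[str]]]:
    """Place pod, preempting lower-priority pods if needed.

    Returns (node name, names of evicted pods), or None if the pod stays Pending.
    """
    # 1. Normal scheduling: use the first node with enough free CPU.
    for node in nodes:
        if node.free() >= pod.cpu:
            node.pods.append(pod)
            return node.name, []
    # 2. Preemption: find a node where evicting strictly lower-priority
    #    pods (lowest priority first) frees enough CPU.
    for node in nodes:
        candidates = sorted((p for p in node.pods if p.priority < pod.priority),
                            key=lambda p: p.priority)
        victims, freed = [], node.free()
        for victim in candidates:
            if freed >= pod.cpu:
                break
            victims.append(victim)
            freed += victim.cpu
        if freed >= pod.cpu:
            node.pods = [p for p in node.pods if p not in victims]
            node.pods.append(pod)
            return node.name, [v.name for v in victims]
    return None


# Three full nodes, each running one consume-cpu replica at priority 0.
nodes = [Node(f"node-{i}", 1000, [Pod(f"consume-cpu-{i}", 0, 1000)])
         for i in range(3)]

print(schedule(Pod("pod-no-priority", 0, 1000), nodes))
# None
print(schedule(Pod("pod-with-priority", 1000000, 1000), nodes))
# ('node-0', ['consume-cpu-0'])
print(schedule(Pod("consume-cpu-replacement", 0, 1000), nodes))
# None
```

The three calls mirror the listing: pod-no-priority has nothing it is allowed to evict and stays Pending, pod-with-priority displaces one consume-cpu replica, and the replacement replica created by the deployment stays Pending because it has the same priority of 0 as everything else that is left.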
Very cool indeed. Feel free to contact me on Twitter (@nepeters) or comment below for discussion on the topic.