One of the benefits of adopting a system like Kubernetes is support for burstable and scalable workload. Horizontal application scaling involves adding or removing instances of an application to match demand. The Kubernetes Horizontal Pod Autoscaler automates pod scaling based on demand. This is cool, but it can lead to unpredictable load on the cluster, which may put the cluster into an overcommitted state. Fortunately, when the goal is to squeeze every bit of CPU and memory from a cluster, overcommitment may be not only OK but desirable.
The following image represents a three-node cluster that runs three applications. Pink is the most critical. Red is burstable and durable, which means that if we need to stop a few instances of red, things will be OK. Blue is non-critical. I have also tried to depict in this image a cluster that is in a fully maxed-out state: there are no more resources available for additional workload.
Imagine now that a scale-out operation is needed on the pink application. This puts the cluster in an overcommitted state, with critical workload waiting to be scheduled. How can Kubernetes facilitate this critical request in an overcommitted state? One option is to use Pod Priority and Preemption, which allows a priority weight to be attached to a scheduling request. In the event of overcommitment, priority is evaluated, and lower-priority pods are evicted (preemption) to make room for the higher-priority workload.
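To make the idea concrete, here is a minimal sketch of how a priority weight is attached to a pod. The names `critical-priority` and `pink-app`, the priority value, and the `nginx` image are assumptions chosen for illustration; they are not taken from the tutorial's own manifests.

```yaml
# A PriorityClass maps a name to an integer weight; higher values win.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-priority        # hypothetical name
value: 1000000                   # illustrative weight
globalDefault: false             # only pods that reference this class get the weight
description: "Priority class for critical workload (illustrative)."
---
# A pod opts in by referencing the class by name.
apiVersion: v1
kind: Pod
metadata:
  name: pink-app                 # hypothetical name
spec:
  priorityClassName: critical-priority
  containers:
  - name: pink-app
    image: nginx                 # placeholder image
    resources:
      requests:
        cpu: "1"                 # the scheduler needs one free core to place this pod
```

Pods that do not set `priorityClassName` default to a priority of zero unless a PriorityClass in the cluster is marked `globalDefault: true`, so they are the first candidates for preemption.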
Pod Priority and Preemption tutorial
In this article, we will walk through an end-to-end demonstration of using Pod Priority and Preemption to ensure critical workload gets priority access to cluster resources.
To complete this tutorial, you need a Kubernetes cluster that consists of three nodes. I've included steps for deploying an appropriately sized Azure Kubernetes cluster. If you need an Azure Subscription or would like to read up on additional operational practices for Azure Kubernetes Service, see the following links.
Run a workload that consumes all CPU cores in the cluster. In the following example, a deployment consisting of three replicas is started, each with a CPU request of one core. This effectively consumes the available CPU resources of the cluster.
Create a file named slam-cpu.yml and copy in the following YAML.
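As a rough sketch of what such a manifest could look like, the following deployment matches the description above: three replicas, each requesting one CPU core. The deployment name, labels, and `nginx` image are assumptions for illustration, and whether three one-core requests actually exhaust the cluster depends on the allocatable CPU of your nodes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slam-cpu                 # assumed name, taken from the file name
spec:
  replicas: 3                    # three replicas, as described above
  selector:
    matchLabels:
      app: slam-cpu
  template:
    metadata:
      labels:
        app: slam-cpu
    spec:
      containers:
      - name: slam-cpu
        image: nginx             # placeholder image; any long-running container works
        resources:
          requests:
            cpu: "1"             # each replica reserves one full core
```

Because the scheduler places pods based on requests rather than actual usage, the container does not need to burn CPU for the cores to count as consumed.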
At this point, the new pod cannot be scheduled because the cluster has no free CPU. To confirm this, list the pods on the cluster and note that pod-no-priority is in a Pending state.