Application scalability is critical for business success. Companies spend millions on ideation, software development, testing, and deployment to deliver value to their customers. But customers do not use an application at a constant rate: we might expect spikes during holidays, on weekends, in the mornings, and so on. If the app is not ready to scale in these cases, much of that investment may be lost.
In this workshop we will look at the different options for scaling applications in Kubernetes. We will explore three of them: Pod scalability, Node scalability, and the Virtual Node, with a complete demo for each.
This demo is also available as a video.
Setting up the environment
We will need a Kubernetes cluster; here we are using Azure Kubernetes Service (AKS), a managed Kubernetes offering.
$ # Create an AKS cluster and a resource group
$ $aksRg="aks-demo"
$ $aksName="aks-demo"
$ # Create a Resource Group
$ az group create -n $aksRg -l westeurope
$ # Create an AKS cluster with 2 nodes
$ az aks create -g $aksRg `
    -n $aksName `
    --node-count 2
$ # Connect to the AKS cluster
$ az aks get-credentials -g $aksRg -n $aksName
Then we need to deploy a sample PHP application into Kubernetes. This app performs some heavy calculations. The following YAML file creates a Deployment that runs a single Pod and exposes it using a Service object.
# deploy-svc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
Let’s apply the Deployment and Service to Kubernetes and check the deployed Pod.
$ kubectl apply -f deploy-svc.yaml
deployment.apps/php-apache created
service/php-apache created
$
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-vlwjp   1/1     Running   0          15s
Now we have a single Pod deployed. Suppose lots of load/requests hit this single Pod and we need to scale out. Kubernetes has built-in support for scalability among its core features, with two options. The first is manually setting a hard number of replicas for the Pods, either in the YAML file or from the command line. The second uses the HorizontalPodAutoscaler (HPA). Next, we’ll explore both options.
The Deployment we created earlier has replicas set to 1. We can change that using the kubectl scale command, as in the following:
$ # Note 1 single Pod is deployed as per Deployment/Replicas
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-vlwjp   1/1     Running   0          13m
$
$ # Manually scale Pods
$ kubectl scale --replicas=2 deployment/php-apache
deployment.apps/php-apache scaled
$
$ # Note 2 Pods are now deployed as per Deployment/Replicas
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-ggc77   1/1     Running   0          55s
php-apache-79544c9bd9-vlwjp   1/1     Running   0          14m
Manual scaling is fine under two conditions: we know ahead of time when the load will go up or down, and handling it by hand is acceptable. But in real life, a spike can arrive at any moment, so we should automate how the system reacts.
Scalability is one of the great features of Kubernetes. It is achieved by scaling out or scaling in, that is, increasing or decreasing the number of instances of a Pod; Kubernetes manages the load balancing between these Pods. This scaling can be automated using the HorizontalPodAutoscaler (HPA). The HPA watches CPU and memory utilization metrics and decides whether to scale out or in. The metrics are exposed by the Metrics Server (https://github.com/kubernetes/metrics).
Let’s analyse the following example. This HPA watches the average CPU utilization of the Pods of the stated Deployment, aiming to keep it around (not exactly at) 50%. When the average rises above 50%, the HPA increases the Deployment’s replica count by one. If the average is still above 50%, the HPA increments the replicas again, repeating the process until the average reaches 50% or the maximum number of allowed replicas (maxReplicas) is hit.
Scale in is triggered when the average CPU utilization falls below 50%. The HPA then decreases the number of replicas until it reaches the target utilization or the minimum number of allowed replicas (minReplicas).
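The scale out/in logic described above can be approximated with simple arithmetic. Here is a minimal sketch of the HPA's core formula, desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization); it ignores stabilization windows and tolerance, and the variable values are assumptions for illustration only:

```shell
# Hedged sketch of the HPA scaling formula (not the real controller code):
# desired = ceil(current_replicas * current_utilization / target_utilization)
current_replicas=3
current_utilization=100   # assumed average CPU utilization, in percent
target_utilization=50     # the HPA target used in this demo
# Integer ceiling division in POSIX shell
desired=$(( (current_replicas * current_utilization + target_utilization - 1) / target_utilization ))
echo "With 3 replicas at 100% average CPU, the HPA would ask for $desired replicas."
```

The real controller also applies a tolerance (roughly 10% by default) before acting, so small fluctuations around the target do not trigger scaling.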
The HPA overrides the number of replicas stated in the Deployment configuration (replicas: 1) to satisfy minReplicas: 3.
# hpa.yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  minReplicas: 3
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
This HPA is based on average CPU utilization, but we can also use memory utilization; both metrics are built in. Beyond that, we can extend the available metrics by implementing the external or custom metrics server API. There are implementations that pull metrics from Prometheus (https://github.com/DirectXMan12/k8s-prometheus-adapter/blob/master/docs/walkthrough.md) or from Azure Application Insights and Azure Monitor (https://github.com/Azure/azure-k8s-metrics-adapter). This enables scenarios like scaling based on queue length, HTTP requests per second, and so on.
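As a sketch of using the built-in memory metric, the metrics list of the HPA above could be extended as follows (a fragment under the same autoscaling/v2beta2 API; the 70% memory target is an assumed value for illustration, not a recommendation):

```yaml
# Fragment of an HPA spec (autoscaling/v2beta2) scaling on CPU and memory.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 70   # assumed value for illustration
```

With multiple metrics, the HPA computes a desired replica count for each metric and uses the largest of them.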
Let’s now deploy the HPA and check the created Pods.
$ # Create the HorizontalPodAutoscaler (HPA)
$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/php-apache created
$
$ # Note 3 Pods are now deployed as per HPA minReplicas
$ kubectl get pods
NAME                          READY   STATUS    RESTARTS   AGE
php-apache-79544c9bd9-ggc77   1/1     Running   0          4m33s
php-apache-79544c9bd9-vlwjp   1/1     Running   0          18m
php-apache-79544c9bd9-zmffh   1/1     Running   0          63s
$
$ # Check the current status of autoscaler
$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0%/50%    3         10        3          2m39s
To test the HPA, we will increase the CPU utilization of the Pods. To do that, we will create two Pods that send an infinite stream of HTTP GET requests to the application Pods using the script 'while true; do wget -q -O- http://php-apache; done'. The file has the following content:
# load-generator-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
spec:
  selector:
    matchLabels:
      run: load-generator
  replicas: 2
  template:
    metadata:
      labels:
        run: load-generator
    spec:
      containers:
      - name: load-generator
        image: busybox
        args: [/bin/sh, -c, 'while true; do wget -q -O- http://php-apache; done']
Let’s deploy the testing Deployment into Kubernetes. After a few seconds, we check the created Pods and the HPA status. Note that we now have 10 instances of the application Pod, created by the HPA.
$ kubectl apply -f load-generator-deploy.yaml
deployment.apps/load-generator configured
$ # A few seconds later
$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
load-generator-6d74bb99d5-6njgd   1/1     Running   0          9m27s
load-generator-6d74bb99d5-qn8pt   1/1     Running   0          9m27s
php-apache-79544c9bd9-2clfz       1/1     Running   0          20s
php-apache-79544c9bd9-925qp       1/1     Running   0          2m17s
php-apache-79544c9bd9-fl9hp       1/1     Running   0          5s
php-apache-79544c9bd9-hsn25       1/1     Running   0          5s
php-apache-79544c9bd9-kzscp       1/1     Running   0          5s
php-apache-79544c9bd9-lmxv7       1/1     Running   0          2m7s
php-apache-79544c9bd9-pwj5d       1/1     Running   0          20s
php-apache-79544c9bd9-r5487       1/1     Running   0          20s
php-apache-79544c9bd9-x59wz       1/1     Running   0          2m7s
php-apache-79544c9bd9-x9ptv       1/1     Running   0          5s
$ kubectl get hpa
NAME         REFERENCE               TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   58%/50%   3         10        10         17m
That was how to scale an application at the Pod level. Next, we’ll demo scalability at the VM or Node level.
Scaling out the number of Pods is useful and efficient, but it is limited by the capacity available in the cluster. To solve this problem, AKS can scale the number of VM/Node instances out and in. This adds capacity to the cluster, allowing for even more Pod scalability.
As with Pod scalability, AKS can scale either manually or automatically. Let’s explore these options in the following sections.
We created the cluster with only 2 nodes, but we can increase or decrease that count at any time. In this demo we’ll set the number of instances to 3; a third node should be attached to the cluster within a few minutes. This can be done using the Azure portal:
And here is how to do that using the command line:
$ az aks scale `
    --resource-group $aksRg `
    --name $aksName `
    --node-count 3
$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-agentpool-51725664-vmss000000   Ready    agent   32h     v1.17.7
aks-agentpool-51725664-vmss000001   Ready    agent   32h     v1.17.7
aks-agentpool-51725664-vmss00000d   Ready    agent   8m42s   v1.17.7
Note that here we are using the Azure CLI instead of kubectl. That is because cluster scaling is implemented by the cloud provider, not by Kubernetes itself.
Manual scaling is fine for some cases. But in real life we need to be proactive, which is why we can automate this task. The next section shows how.
AKS has a built-in component, the cluster autoscaler, that watches the Scheduler for Pods that could not be scheduled, either due to cluster capacity or due to the maximum number of allowed Pods per Node. It then calls Azure Resource Manager (ARM) to provision and attach a new VM/Node to the cluster. The same process runs in a loop until the maximum number of allowed instances is reached.
When the load goes down and the number of Pod instances decreases, the VMs are removed progressively over a few minutes.
To demo how this works, we’ll increase the load on the application Pods by raising the replicas of the load generator Deployment to 100, and we’ll raise the maxReplicas of the HPA to 1000. Let’s edit the values in the YAML configuration files and deploy the changes.
$ kubectl apply -f load-generator-deploy.yaml
deployment.apps/load-generator configured
$ kubectl apply -f hpa.yaml
horizontalpodautoscaler.autoscaling/php-apache configured
$ kubectl top nodes
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-agentpool-51725664-vmss000000   1769m        93%    1786Mi          39%
aks-agentpool-51725664-vmss000001   1945m        102%   1780Mi          39%
aks-agentpool-51725664-vmss00000d   2010m        105%   1400Mi          30%
$ kubectl get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   116%/50%   3         1000      50         86m
$ kubectl get pods
NAME                              READY   STATUS    RESTARTS   AGE
load-generator-6d74bb99d5-2gjtn   1/1     Running   0          3m8s
<OTHER_PODS_REMOVED_FOR_BREVITY>
php-apache-79544c9bd9-xdlg7       1/1     Running   0          101s
php-apache-79544c9bd9-zj66j       0/1     Pending   0          101s
<OTHER_PODS_REMOVED_FOR_BREVITY>
Many Pods should be created to handle all this load, but that was stopped by the cluster capacity. Note how the Nodes’ CPU utilization is nearly 100%, and we still have Pods in the Pending state. So, let’s leverage AKS auto scaling.
We can do that using the Azure portal as follows:
We can also configure scaling using the command line. In the following example we enable the cluster autoscaler for AKS and set the minimum and maximum node counts.
$ az aks nodepool update `
    --resource-group $aksRg `
    --cluster-name $aksName `
    --name agentpool `
    --enable-cluster-autoscaler `
    --min-count 3 `
    --max-count 10
$ # After a few (5) minutes
$ kubectl get nodes
NAME                                STATUS   ROLES   AGE     VERSION
aks-agentpool-51725664-vmss000000   Ready    agent   34h     v1.17.7
aks-agentpool-51725664-vmss000001   Ready    agent   34h     v1.17.7
aks-agentpool-51725664-vmss00000d   Ready    agent   125m    v1.17.7
aks-agentpool-51725664-vmss00000e   Ready    agent   11m     v1.17.7
aks-agentpool-51725664-vmss00000f   Ready    agent   10m     v1.17.7
aks-agentpool-51725664-vmss00000g   Ready    agent   11m     v1.17.7
aks-agentpool-51725664-vmss00000h   Ready    agent   10m     v1.17.7
aks-agentpool-51725664-vmss00000i   Ready    agent   6m17s   v1.17.7
aks-agentpool-51725664-vmss00000j   Ready    agent   6m32s   v1.17.7
aks-agentpool-51725664-vmss00000k   Ready    agent   6m18s   v1.17.7
$ kubectl get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   102%/50%   3         1000      144        3h1m
Note that we are now running 10 Nodes in the AKS cluster, because we have lots of Pods to schedule and we set the maximum node count to 10.
The HPA shows that the average CPU utilization is still above the target; it needs to create even more Pods and Nodes. We could set --max-count to a higher number like 100. In some extreme scenarios even this might not be enough; a simple solution then would be scaling up the VMs in the Node Pool to a larger VM size.
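As a sketch, raising the autoscaler's upper bound on the existing node pool could look like the following, reusing the variables defined earlier; the limit of 100 is purely illustrative, and the node pool name agentpool matches the one used above:

```shell
# Raise the cluster autoscaler's node limit on the existing node pool.
# Assumes the pool already has the autoscaler enabled (see the previous step).
az aks nodepool update `
    --resource-group $aksRg `
    --cluster-name $aksName `
    --name agentpool `
    --update-cluster-autoscaler `
    --min-count 3 `
    --max-count 100
```

Note that --update-cluster-autoscaler is used here rather than --enable-cluster-autoscaler, since the autoscaler is already enabled and we only want to change its bounds.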
The cluster autoscaler is a great feature for managing scalability. However, Nodes typically take 2 to 5 minutes to become ready before the pending Pods can be scheduled. In some scenarios, those few minutes feel like an eternity, and we need a faster solution. This is where the Virtual Node comes into play: it can schedule these Pods in just a few seconds.
The Virtual Node uses Azure Container Instances (ACI), the Azure offering for serverless containers. The promise of ACI is that it can run a high number of containers in just a few seconds, without worrying about the infrastructure behind them. The Virtual Node extends AKS capacity with ACI.
The integration can be set up using the Azure portal as follows:
We can also set it up using the command line, as shown in this guide: https://docs.microsoft.com/en-us/azure/aks/virtual-nodes-cli.
After creating the cluster with 3 VMs and the Virtual Node enabled, we can see a 4th Node named virtual-node-aci-linux. As the name states, it is a virtual node, not a VM: it is a connection to ACI, attached to the cluster as a Node with ‘unlimited’ capacity.
$ kubectl get nodes
NAME                                STATUS   ROLES   AGE   VERSION
aks-agentpool-10295500-vmss000000   Ready    agent   79m   v1.17.7
aks-agentpool-10295500-vmss000001   Ready    agent   80m   v1.17.7
aks-agentpool-10295500-vmss000002   Ready    agent   79m   v1.17.7
virtual-node-aci-linux              Ready    agent   43m   v1.14.3-vk-azure-aci-v1.2
$ kubectl top nodes
NAME                                CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
aks-agentpool-10295500-vmss000000   134m         7%          982Mi           21%
aks-agentpool-10295500-vmss000001   83m          4%          1115Mi          24%
aks-agentpool-10295500-vmss000002   52m          2%          913Mi           20%
virtual-node-aci-linux              <unknown>    <unknown>   <unknown>       <unknown>
We can schedule an application on ACI by adding a nodeSelector and tolerations. Here is an example file:
# virtual-node.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aci-helloworld
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aci-helloworld
  template:
    metadata:
      labels:
        app: aci-helloworld
    spec:
      containers:
      - name: aci-helloworld
        image: microsoft/aci-helloworld
        ports:
        - containerPort: 80
      nodeSelector:
        kubernetes.io/role: agent
        beta.kubernetes.io/os: linux
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
A full tutorial on how to work with Virtual Node is available in the following link:
AKS brings many options for application scalability, and they work together: the HPA scales Pods, the cluster autoscaler scales Nodes, and the Virtual Node adds near-instant serverless capacity.
The sample scripts are not supported under any Microsoft standard support program or service. The sample scripts are provided AS IS without warranty of any kind. Microsoft further disclaims all implied warranties including, without limitation, any implied warranties of merchantability or of fitness for a particular purpose. The entire risk arising out of the use or performance of the sample scripts and documentation remains with you. In no event shall Microsoft, its authors, or anyone else involved in the creation, production, or delivery of the scripts be liable for any damages whatsoever (including, without limitation, damages for loss of business profits, business interruption, loss of business information, or other pecuniary loss) arising out of the use of or inability to use the sample scripts or documentation, even if Microsoft has been advised of the possibility of such damages.