Azure Infrastructure Blog

Autoscaling AKS Microservices with Knative: HTTP Workload Optimization in Action

dhaneshuk
May 20, 2025

🌟In the world of cloud-native applications, efficiently scaling microservices is essential for maintaining performance and optimizing costs. Autoscaling—the ability to dynamically adjust the number of running instances based on demand—plays a vital role in achieving this. In this blog post, we’ll explore how Knative and Azure Kubernetes Service (AKS) can be used together to autoscale microservices based on HTTP workloads.

🚀 Why Autoscaling Matters

Autoscaling ensures that your application can handle varying levels of traffic without manual intervention. It helps with:

  • Cost Optimization: Scale down during periods of low demand to save on infrastructure costs.
  • Performance: Scale up during peak traffic to maintain responsiveness.
  • Reliability: Automatically recover from failures by provisioning new instances as needed.

🧱 Understanding the Building Blocks

🔹 AKS (Azure Kubernetes Service)

AKS is a managed Kubernetes service that simplifies the deployment, management, and scaling of containerized applications. It supports features like node pools, the cluster autoscaler, and integration with Azure Monitor.

🔹 Knative Serving

Knative Serving provides autoscaling capabilities for HTTP-based workloads. It supports:

  • Scale-to-zero when there’s no traffic
  • Rapid scale-up in response to incoming HTTP requests

Knative supports two types of autoscalers:

  • KPA (Knative Pod Autoscaler) – The default, based on concurrency or requests per second (RPS)
  • HPA (Horizontal Pod Autoscaler) – Optional, based on CPU/memory metrics
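To use the HPA class for a particular service, you select it per revision with annotations. Below is a minimal sketch (the service name is illustrative, and the HPA-class autoscaler requires Knative's optional HPA extension to be installed):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cpu-scaled-demo   # illustrative name
spec:
  template:
    metadata:
      annotations:
        # Opt out of the default KPA and use the HPA-class autoscaler
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        # Scale on CPU utilization (percentage of the container's CPU request)
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "70"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go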

🔹 KEDA (Optional)

KEDA (Kubernetes-based Event Driven Autoscaling) is useful for event-driven workloads (e.g., Azure Service Bus, Kafka). It can complement Knative for hybrid autoscaling scenarios.
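For example, a KEDA ScaledObject can scale a queue consumer on backlog depth while Knative scales the HTTP front end. A minimal sketch, assuming a hypothetical orders-processor Deployment, an Azure Service Bus queue named orders, and a TriggerAuthentication called servicebus-auth holding the connection string:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-processor-scaler
spec:
  scaleTargetRef:
    name: orders-processor        # hypothetical Deployment to scale
  minReplicaCount: 0              # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders         # hypothetical queue name
        messageCount: "20"        # add a replica per ~20 queued messages
      authenticationRef:
        name: servicebus-auth     # hypothetical TriggerAuthentication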

 

⚖️ Pod vs. Node Autoscaling

🧩 Pod Autoscaling (Knative)

Knative handles pod-level autoscaling based on HTTP traffic. It dynamically adjusts the number of pods depending on:

  • Request concurrency (e.g., a target of 100 concurrent requests per pod)
  • Requests per second (RPS)
  • CPU usage (if HPA mode is enabled)

Knative can even scale to zero when there’s no traffic, making it ideal for bursty or event-driven workloads.
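For example, to scale on requests per second instead of the default concurrency metric, you set two annotations on the revision template (the target value here is illustrative):

spec:
  template:
    metadata:
      annotations:
        # Switch the KPA from concurrency to requests-per-second
        autoscaling.knative.dev/metric: "rps"
        # Target 150 requests per second per pod
        autoscaling.knative.dev/target: "150"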

🧱 Node Autoscaling (AKS)

AKS uses the Cluster Autoscaler for node-level autoscaling. It ensures that the underlying infrastructure can support the number of pods requested by Knative.

  • If there's insufficient capacity for new pods, AKS adds nodes.
  • If nodes are underutilized, AKS removes them to reduce cost.

Together, these layers ensure both application responsiveness and infrastructure efficiency.
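If you later need to change the cluster autoscaler's bounds on an existing node pool, you can update them in place. A sketch using the names from Step 1 (the bounds are illustrative):

az aks nodepool update --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --update-cluster-autoscaler --min-count 1 --max-count 8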

🛠️ Real Example: Deployment and Testing

🏗️ Step 1: Create an AKS Cluster

az group create -l centralus -n myResourceGroup

az aks create --resource-group myResourceGroup --name myAKSCluster --enable-managed-identity --enable-aad --location centralus --node-count 2 --enable-addons monitoring --generate-ssh-keys

 

Add a new node pool and enable the cluster autoscaler:

az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --node-vm-size Standard_D4s_v3 --enable-cluster-autoscaler --min-count 1 --max-count 5

 

🔧 Step 2: Connect to AKS

az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster> --overwrite-existing

✅ Verify:

kubectl get nodes

 

📦 Step 3: Install Knative Serving Core

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-crds.yaml

kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-core.yaml

✅ Verify:

kubectl get pods -n knative-serving

Ensure all pods are in Running or Completed state.
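Representative output (pod name suffixes and ages will vary):

NAME                          READY   STATUS    RESTARTS   AGE
activator-5c6b8f7d9c-x2m4p    1/1     Running   0          2m
autoscaler-7b4d9f6c5d-q8r7t   1/1     Running   0          2m
controller-6d5f8c9b7e-k3n9w   1/1     Running   0          2m
webhook-8f7c6d5b4a-j5p2z      1/1     Running   0          2m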

🌐 Step 4: Install Istio (Ingress for Knative)

Install the Istio components:

kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/istio.yaml

Install the Knative-Istio integration:

kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/net-istio.yaml

 

✅ Verify Istio installation:

kubectl get pods -n istio-system

Expected pods:

  • istiod
  • istio-ingressgateway

✅ Check Istio IngressGateway Service (with public IP):

kubectl get svc istio-ingressgateway -n istio-system

Ensure the EXTERNAL-IP field shows a public IP rather than <pending>.
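Representative output (your IP will differ; 72.152.40.239 is the example IP used later in this post):

NAME                   TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                                      AGE
istio-ingressgateway   LoadBalancer   10.0.123.45   72.152.40.239   15021:31234/TCP,80:30080/TCP,443:30443/TCP   5m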

✅ Verify Knative uses Istio:

 

# Linux / Git Bash
kubectl get configmap config-network -n knative-serving -o yaml | grep ingress-class

# Windows
kubectl get configmap config-network -n knative-serving -o yaml | findstr ingress-class

 

Expected output: the ingress-class value should be istio.ingress.networking.knative.dev, confirming that Knative is configured to use Istio as its ingress.

🚀 Step 5: Deploy a Sample Microservice with Autoscaling

Create autoscale-service.yaml:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"     # target of 50 concurrent requests per pod
        autoscaling.knative.dev/minScale: "1"    # keep at least one pod (no scale-to-zero)
        autoscaling.knative.dev/maxScale: "10"   # cap at ten pods
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Knative Autoscaler"

 

Apply it:

kubectl apply -f autoscale-service.yaml

✅ Verify:

kubectl get ksvc autoscale-demo

 

Ensure READY is True and URL is populated.

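Sample output (illustrative; the revision names and URL depend on your setup and configured domain):

NAME             URL                                          LATESTCREATED          LATESTREADY            READY   REASON
autoscale-demo   http://autoscale-demo.default.example.com    autoscale-demo-00001   autoscale-demo-00001   True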

📌 Optional: Configure a Custom Domain (For Simpler Access)

If you want to access Knative services like http://autoscale-demo.myapps.com directly:

  1. Point a domain/subdomain to the Istio EXTERNAL-IP.
  2. Update the config-domain ConfigMap:
kubectl edit configmap config-domain -n knative-serving

Add your domain under the data section:

data:
  myapps.com: ""

 

Existing services pick up the new domain automatically, and the service will then be available at:

http://autoscale-demo.default.myapps.com
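Once the DNS record resolves to the Istio external IP, you can reach the service directly, with no Host header required:

curl http://autoscale-demo.default.myapps.com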

 

🌍 Step 6: Load Test the Service

We'll load test by sending 500 concurrent requests for 60 seconds to the autoscale-demo service. You should observe Knative scaling up pods in response to the load.

Using Internal Cluster URL

kubectl run -i --tty --rm loadgen --image=williamyeh/hey --restart=Never -- -z 60s -c 500 http://autoscale-demo.default.svc.cluster.local

This command runs a temporary pod inside the cluster and generates the load from there.

 

Using the External Endpoint

Find Istio’s external IP:

kubectl get svc istio-ingressgateway -n istio-system

Install hey (https://github.com/rakyll/hey) to generate HTTP traffic, then run the load test against the external IP, passing the Knative hostname as a Host header:

hey -z 60s -c 500 -host autoscale-demo.default.example.com http://72.152.40.239

Replace 72.152.40.239 with the EXTERNAL-IP from above, and replace autoscale-demo.default.example.com with the host from your Knative URL if you have configured a custom domain.

💡 autoscale-demo.default.example.com is the default hostname Knative assigns when the domain is left at the built-in example.com placeholder (no custom domain configured).

 

🧰 Optional (if you want to avoid Host headers): Use Magic DNS (like nip.io)

Since your EXTERNAL-IP in this example is 72.152.40.239, you can use nip.io like this:

Edit the Knative config map:

kubectl edit configmap config-domain -n knative-serving

Replace the existing domain entry under data with:

data:
  nip.io: ""

Then your Knative URL will become:

http://autoscale-demo.default.72.152.40.239.nip.io

So you can directly run:

curl http://autoscale-demo.default.72.152.40.239.nip.io

# Or

hey -z 60s -c 500 http://autoscale-demo.default.72.152.40.239.nip.io

💡No Host header needed.

📈 Step 7: Monitor Autoscaling

kubectl get pods -l serving.knative.dev/service=autoscale-demo -w

💡Observe how the number of pods increases during load and scales down when idle.

Before starting the load test, only the single minScale pod is running. While the load test runs, the pod count climbs as the autoscaler reacts to the concurrency target. A few minutes after the load stops, Knative scales back down to the minimum.
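Representative watch output during the load spike (pod names, counts, and timings are illustrative; READY 2/2 reflects the user container plus Knative's queue-proxy sidecar):

NAME                                              READY   STATUS    RESTARTS   AGE
autoscale-demo-00001-deployment-7d9f8b65c-abcde   2/2     Running   0          12m
autoscale-demo-00001-deployment-7d9f8b65c-fghij   2/2     Running   0          18s
autoscale-demo-00001-deployment-7d9f8b65c-klmno   0/2     Pending   0          3s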

 

🧼 Step 8: (Optional) Clean Up

kubectl delete ksvc autoscale-demo
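If you created the cluster just for this walkthrough, you can remove everything at once by deleting the resource group (be sure nothing else lives in it first):

az group delete --name myResourceGroup --yes --no-wait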

🧾 Conclusion

Combining Knative with AKS provides a robust, cloud-native platform for running microservices that scale dynamically based on real-time HTTP traffic. Whether you're building a FinOps-aligned landing zone or simply optimizing for elasticity, this setup offers:

  • Elasticity: Scale from zero to high traffic effortlessly.
  • Efficiency: Pay only for what you consume.
  • Simplicity: Use YAML and native Kubernetes constructs to manage autoscaling.