🌟In the world of cloud-native applications, efficiently scaling microservices is essential for maintaining performance and optimizing costs. Autoscaling—the ability to dynamically adjust the number of running instances based on demand—plays a vital role in achieving this. In this blog post, we’ll explore how Knative and Azure Kubernetes Service (AKS) can be used together to autoscale microservices based on HTTP workloads.
🚀 Why Autoscaling Matters
Autoscaling ensures that your application can handle varying levels of traffic without manual intervention. It helps with:
- Cost Optimization: Scale down during periods of low demand to save on infrastructure costs.
- Performance: Scale up during peak traffic to maintain responsiveness.
- Reliability: Automatically recover from failures by provisioning new instances as needed.
🧱 Understanding the Building Blocks
🔹 AKS (Azure Kubernetes Service)
AKS is a managed Kubernetes service that simplifies the deployment, management, and scaling of containerized applications. It supports features like node pools, the cluster autoscaler, and integration with Azure Monitor.
🔹 Knative Serving
Knative Serving provides autoscaling capabilities for HTTP-based workloads. It supports:
- Scale-to-zero when there’s no traffic
- Rapid scale-up in response to incoming HTTP requests
Knative supports two types of autoscalers:
- KPA (Knative Pod Autoscaler) – The default, based on concurrency or requests per second (RPS)
- HPA (Horizontal Pod Autoscaler) – Optional, based on CPU/memory metrics (see the sketch below)
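For illustration, here is a minimal sketch of a Service opting into the HPA class instead of the default KPA. The service name cpu-scaled-demo is a hypothetical placeholder; the annotation names come from Knative's autoscaling docs:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cpu-scaled-demo                # hypothetical name for illustration
spec:
  template:
    metadata:
      annotations:
        # Use the Kubernetes HPA instead of the default KPA
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
        autoscaling.knative.dev/target: "80"   # target ~80% CPU utilization
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go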
🔹 KEDA (Optional)
KEDA (Kubernetes-based Event Driven Autoscaling) is useful for event-driven workloads (e.g., Azure Service Bus, Kafka). It can complement Knative for hybrid autoscaling scenarios.
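For a flavor of what that looks like, here is a minimal KEDA ScaledObject sketch for an Azure Service Bus queue. All names (the Deployment, queue, and TriggerAuthentication) are hypothetical placeholders:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: orders-scaler
spec:
  scaleTargetRef:
    name: orders-processor        # hypothetical Deployment to scale
  minReplicaCount: 0              # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: orders         # hypothetical queue name
        messageCount: "50"        # target messages per replica
      authenticationRef:
        name: servicebus-auth     # hypothetical TriggerAuthentication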
⚖️ Pod vs. Node Autoscaling
🧩 Pod Autoscaling (Knative)
Knative handles pod-level autoscaling based on HTTP traffic. It dynamically adjusts the number of pods depending on:
- Request concurrency (e.g., 100 concurrent requests per pod)
- Requests per second (RPS)
- CPU usage (if HPA mode is enabled)
Knative can even scale to zero when there’s no traffic, making it ideal for bursty or event-driven workloads.
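Scale-to-zero behavior is controlled cluster-wide in the config-autoscaler ConfigMap. As a sketch (key names are from Knative's autoscaling configuration; the values here are arbitrary), you could lengthen the grace period before idle pods are removed:

kubectl patch configmap config-autoscaler -n knative-serving --type merge -p '{"data":{"enable-scale-to-zero":"true","scale-to-zero-grace-period":"60s"}}'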
🧱 Node Autoscaling (AKS)
AKS uses the Cluster Autoscaler for node-level autoscaling. It ensures that the underlying infrastructure can support the number of pods requested by Knative.
- If there's insufficient capacity for new pods, AKS adds nodes.
- If nodes are underutilized, AKS removes them to reduce cost.
Together, these layers ensure both application responsiveness and infrastructure efficiency.
🛠️ Real Example: Deployment and Testing
☁️ Step 1: Create an AKS Cluster
az group create -l centralus -n myResourceGroup
az aks create --resource-group myResourceGroup --name myAKSCluster --enable-managed-identity --enable-aad --location centralus --node-count 2 --enable-addons monitoring --generate-ssh-keys
Add a new node pool and enable the cluster autoscaler:
az aks nodepool add --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --node-vm-size Standard_D4s_v3 --enable-cluster-autoscaler --min-count 1 --max-count 5
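To confirm autoscaling is active on the new pool, you can query it afterwards (a sketch; the enableAutoScaling query path assumes the current az CLI output shape):

az aks nodepool show --resource-group myResourceGroup --cluster-name myAKSCluster --name newpool --query "enableAutoScaling"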
🔧 Step 2: Connect to AKS
az aks get-credentials --resource-group <your-resource-group> --name <your-aks-cluster> --overwrite-existing
✅ Verify:
kubectl get nodes
📦 Step 3: Install Knative Serving Core
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.18.0/serving-core.yaml
✅ Verify:
kubectl get pods -n knative-serving
Ensure all pods are in Running or Completed state.
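Rather than polling manually, you can block until the control-plane deployments report ready (a standard kubectl pattern, not Knative-specific):

kubectl wait --for=condition=Available deployment --all -n knative-serving --timeout=300s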
🌐 Step 4: Install Istio (Ingress for Knative)
Install Istio components
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/istio.yaml
Install Knative-Istio integration
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.18.0/net-istio.yaml
✅ Verify Istio installation:
kubectl get pods -n istio-system
Expected pods:
- istiod
- istio-ingressgateway
✅ Check Istio IngressGateway Service (with public IP):
kubectl get svc istio-ingressgateway -n istio-system
Ensure the EXTERNAL-IP field shows a public IP (not <pending>).
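If the IP stays at <pending>, describing the service usually surfaces the provisioning error from Azure's load-balancer controller:

kubectl describe svc istio-ingressgateway -n istio-system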
✅ Verify Knative uses Istio
# Linux / Git Bash
kubectl get configmap config-network -n knative-serving -o yaml | grep ingress-class
# Windows
kubectl get configmap config-network -n knative-serving -o yaml | findstr ingress-class
Expected output:
ingress-class: istio.ingress.networking.knative.dev
🚀 Step 5: Deploy a Sample Microservice with Autoscaling
Create autoscale-service.yaml:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: autoscale-demo
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Knative Autoscaler"
Apply it:
kubectl apply -f autoscale-service.yaml
✅ Verify:
kubectl get ksvc autoscale-demo
Ensure READY is True and URL is populated.
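Before load testing, you can smoke-test the service through the Istio gateway by passing the Knative host header (replace <EXTERNAL-IP> with the gateway IP from Step 4; the response should be the sample app's hello message):

curl -H "Host: autoscale-demo.default.example.com" http://<EXTERNAL-IP>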
📌 Optional: Configure a Custom Domain (For Simpler Access)
If you want to access Knative services like http://autoscale-demo.myapps.com directly:
- Point a domain/subdomain to the Istio EXTERNAL-IP.
- Update the config-domain ConfigMap:
kubectl edit configmap config-domain -n knative-serving
Add:
data:
  myapps.com: ""
Deploy the service again. It will now be available under:
http://autoscale-demo.default.myapps.com
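With DNS in place, the service should answer directly, with no Host header required:

curl http://autoscale-demo.default.myapps.com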
🌍 Step 6: Load Test the Service
We'll load test by sending 500 concurrent requests for 60 seconds to the autoscale-demo service. You should observe Knative scaling up pods in response to the load.
Using Internal Cluster URL
kubectl run -i --tty --rm loadgen --image=williamyeh/hey --restart=Never -- -z 60s -c 500 http://autoscale-demo.default.svc.cluster.local
This command runs a temporary pod inside the cluster and generates load against the service's cluster-local URL.
Using the External Endpoint
Find Istio’s external IP:
kubectl get svc istio-ingressgateway -n istio-system
Install https://github.com/rakyll/hey to generate HTTP traffic, then run the load test using the host header and external IP:
hey -z 60s -c 500 -host autoscale-demo.default.example.com http://<EXTERNAL-IP>
Replace <EXTERNAL-IP> with the IP from above, and replace autoscale-demo.default.example.com with the host from your Knative URL if you have configured a custom domain.
💡 autoscale-demo.default.example.com is the default domain assigned by Knative if you're using the built-in example.com magic DNS (no custom domain).
🧰 Optional (if you want to avoid Host headers): Use Magic DNS (like nip.io)
Since your EXTERNAL-IP in this example is 72.152.40.239, you can use nip.io like this:
Edit the Knative config map:
kubectl edit configmap config-domain -n knative-serving
And add your gateway IP's nip.io domain under data:
data:
  72.152.40.239.nip.io: ""
Then your Knative URL will become:
http://autoscale-demo.default.72.152.40.239.nip.io
So you can directly run:
curl http://autoscale-demo.default.72.152.40.239.nip.io
# Or
hey -z 60s -c 500 http://autoscale-demo.default.72.152.40.239.nip.io
💡 No Host header needed.
📈 Step 7: Monitor Autoscaling
kubectl get pods -l serving.knative.dev/service=autoscale-demo -w
💡 Observe how the number of pods increases during load and scales down when idle.
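To see the node-level half of the story (the AKS cluster autoscaler from earlier), you can watch the nodes in a second terminal; under sustained load beyond current capacity, you may see new nodes join:

kubectl get nodes -w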
Watch the pod list before, during, and a few minutes after the load test: you should see a single pod initially (minScale: 1), additional pods starting while the test runs, and the count dropping back down once the load stops.
🧼 Step 8: (Optional) Clean Up
kubectl delete ksvc autoscale-demo
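If you created the cluster just for this walkthrough, you can also remove the entire resource group (this deletes the AKS cluster and everything in it; the group name assumes Step 1):

az group delete --name myResourceGroup --yes --no-wait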
🧾 Conclusion
Combining Knative with AKS provides a robust, cloud-native platform for running microservices that scale dynamically based on real-time HTTP traffic. Whether you're building a FinOps-aligned landing zone or simply optimizing for elasticity, this setup offers:
- Elasticity: Scale from zero to high traffic effortlessly.
- Efficiency: Pay only for what you consume.
- Simplicity: Use YAML and native Kubernetes constructs to manage autoscaling.