Istio‑Based Weighted Traffic Management
Prerequisites:
- AKS containers (internal pods, no Envoy sidecar)
- Istio installed and configured on the AKS cluster (Ingress Gateway–based traffic management)
Section 1 — Overview & Why Istio Is Required
In modern distributed systems, traffic control is no longer limited to simple round-robin distribution. Enterprises increasingly require fine-grained traffic shaping, such as routing a defined percentage of requests to specific backends.
This article demonstrates how to implement weighted load balancing from AKS workloads to backend AKS internal services using Istio—addressing a key limitation in native Azure load balancing services.
Both setups achieve the same goal: split incoming HTTP traffic 80:20 between two pools, with cookie-based session affinity so that a returning user always hits the same pool.
| Component | AKS Internal Pods Setup |
|---|---|
| Backend | Kubernetes Pods (Windows containers) |
| Service Discovery | Kubernetes Service (automatic) |
| Traffic Split | Istio VirtualService weights |
| Pool Selection | DestinationRule subsets by label |
| Ingress | Istio Gateway |
| Sidecar | No (Windows pods can't run Envoy) |
✅ Both architectures use the exact same Istio routing logic for every backend they route traffic to.
✅ The only change is service discovery, not traffic behavior.
1.1. Scenario
Customer Requirement:
- Route traffic from Client → AKS → Azure backend workloads
- Implement weighted traffic distribution:
- 80% → Backend workload1
- 20% → Backend workload2
- Ensure a fully private architecture (no public endpoints)
- Enable real-time traffic control without relying on DNS propagation delays
This scenario is explained in Use Case 2.2 — Istio Routing to Azure VM‑Based Workloads.
Additional scenarios where Istio weighted traffic management is useful:
- Canary or blue‑green deployments – Gradually shift a controlled percentage of production traffic to a new application version to validate behavior before full rollout.
- A/B testing and gradual feature rollout – Expose different user cohorts to alternate application variants to measure performance, reliability, or feature impact without affecting all users.
- Unequal backend capacity across versions – Route traffic proportionally based on backend capacity or performance characteristics, ensuring heavier loads land on more capable pools.
These patterns are further covered in Use Case 2.1 — Istio with AKS‑Only Workloads.
1.2. ⚠️ Limitations of Native Azure Load Balancing
Azure Standard Load Balancer
- Layer 4 (TCP/UDP)
- Uses 5-tuple hash
- ❌ No weighted routing
- ❌ No HTTP awareness
- ❌ No session-level control
Result: Only equal or deterministic distribution is possible
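To see why a 5-tuple hash rules out weighting, consider a minimal sketch of the idea (this is an illustration of hash-based flow pinning, not Azure's actual algorithm; the tuples and `cksum`-based hash are hypothetical stand-ins):

```shell
#!/bin/sh
# Conceptual sketch: a 5-tuple hash maps each connection tuple to a backend.
# The same flow always lands on the same backend, and there is no place in
# the calculation where a "weight" could influence the outcome.
backend_for() {
  # $1 = "srcIP srcPort dstIP dstPort protocol" (hypothetical flow tuple)
  h=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo "backend$(( h % 2 ))"   # two equal backends; distribution is fixed by the hash
}
b1=$(backend_for "10.0.0.5 51000 10.0.1.4 80 TCP")
b2=$(backend_for "10.0.0.5 51000 10.0.1.4 80 TCP")   # identical tuple
b3=$(backend_for "10.0.0.5 51001 10.0.1.4 80 TCP")   # different source port
echo "$b1 $b2 $b3"
```

The first two calls always print the same backend because the tuple is identical; only changing the tuple (here, the source port) can change the result. Shifting 80% of traffic to one backend is simply not expressible in this model.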
Azure Traffic Manager (why not used?)
Azure Traffic Manager supports weighted routing but:
- Works at DNS level
- Suffers from DNS caching delays
- Not suitable for:
- Real-time traffic shaping
- Private/internal routing scenarios
1.3. Why Istio
Istio enables Layer‑7 traffic control using Envoy proxies, allowing:
- Request‑level routing
- Header and cookie inspection
- Weighted traffic distribution
- Deterministic session affinity
Istio shifts traffic control from infrastructure to intent.
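Per request, the weighted distribution reduces to a proportional draw over the configured weights. The sketch below illustrates that intent with a deterministic counter standing in for Envoy's random selection (pool names and the 80/20 split mirror the scenario above; this is not Envoy's actual implementation):

```shell
#!/bin/sh
# Conceptual sketch of per-request weighted pool selection (80/20).
# A deterministic counter replaces the random draw so the split is exact.
REQUESTS=10000
pool1=0
pool2=0
i=1
while [ "$i" -le "$REQUESTS" ]; do
  bucket=$(( i % 100 ))           # stand-in for a uniform draw in 0..99
  if [ "$bucket" -lt 80 ]; then
    pool1=$(( pool1 + 1 ))        # 80% of requests
  else
    pool2=$(( pool2 + 1 ))        # 20% of requests
  fi
  i=$(( i + 1 ))
done
echo "pool1=$pool1 pool2=$pool2"
```

With 10,000 iterations this prints pool1=8000 pool2=2000, matching the 80:20 intent exactly; the real ingress gateway converges on this ratio statistically rather than deterministically.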
Section 2 — Architecture & Traffic Flow (Istio Traffic‑Management Use Cases)
This section presents two common deployment use cases that apply the same Istio traffic‑management pattern—weighted routing with cookie‑based session affinity—but differ in where backend workloads run.
In both cases:
- Istio acts as a central Layer‑7 control plane
- All routing decisions are enforced at the Istio Ingress Gateway
- Backend location (VM or pod) does not affect traffic behavior
AKS is used as the traffic‑management control plane, while Istio ingress Envoy acts as the single Layer‑7 enforcement point. Backend workloads—whether pods or virtual machines—are abstracted behind the same routing model.
Use Case 2.1 — Istio with AKS‑Only Workloads
This model is used when all application workloads run inside AKS, typically as Windows or Linux containers.
Architecture Overview
Flow Diagram 1.1
Traffic Characteristics
- First request → weighted routing + cookie issued
- Subsequent requests → cookie match overrides weights
- Scaling → handled automatically via HPA
- Routing decisions → always at the ingress gateway
⚠️ Important — Windows workload constraint
Windows containers cannot run the Envoy sidecar used by Istio.
As a result, pod‑level mesh participation is not available for these workloads. In environments where mesh‑wide PeerAuthentication is set to STRICT, traffic from the Istio ingress gateway to Windows pods will fail unless the DestinationRule explicitly disables TLS:

trafficPolicy:
  tls:
    mode: DISABLE

This setting is not optional for Windows workloads in a STRICT mTLS environment.
✅ Routing still works correctly because the ingress gateway Envoy remains the single enforcement point.
Use Case 2.2 — Istio Routing to Azure VM‑Based Workloads
Scenario Overview
This use case applies when AKS hosts the ingress and traffic‑management layer, while backend workloads continue to run on Azure Virtual Machines (for example, existing IIS servers).
This pattern is common in:
- Legacy application environments
- VM‑to‑container modernization journeys
- Mixed‑capacity backend pools
- Risk‑controlled traffic cutovers
AKS is used only as the control plane, not as the workload host.
✅ Backend location: Azure Virtual Machines
✅ Routing layer: Istio Ingress Gateway in AKS
✅ Traffic control: Centralized and L7‑aware
High‑Level Architecture
ℹ️ VM workloads do not run Envoy sidecars.
All routing decisions are enforced at the ingress gateway.
How Traffic Is Routed
- Incoming traffic enters via the Istio Gateway.
- Cookie‑based rules are evaluated first.
- New sessions are assigned using weighted routing.
- DestinationRule subsets resolve to VM endpoints.
- ServiceEntry exposes VMs to Istio as routable services.
✅ User‑level stickiness
✅ Real‑time weight changes
✅ No Azure Load Balancer weighting involved
Service Discovery for VMs
VM backends are registered using ServiceEntry:
- Represents external services inside Istio
- Associates endpoints with pool labels
- Enables subset‑based routing identical to Kubernetes pods
ℹ️ From Istio’s perspective, VM endpoints are first‑class upstreams.
Platform Considerations
⚠️ Azure Standard Load Balancer is not used for weighted routing
⚠️ DNS‑based routing (Traffic Manager) is intentionally avoided
✅ Istio performs all traffic decisions at Layer 7
✅ Backend infrastructure remains unchanged
Key Takeaway
✅ This use case allows organizations to:
- Retain existing VM‑based applications
- Introduce modern traffic‑management capabilities
- Safely control and shift production traffic
- Centralize routing logic without re‑architecting workloads
Istio becomes the unifying traffic‑management layer, independent of backend location.
Why Istio Instead of NGINX
NGINX is capable but:
| Capability | NGINX | Istio |
|---|---|---|
| Weighted routing | ✅ | ✅ |
| Kubernetes-native | ⚠️ | ✅ |
| Observability | Limited | Rich (Kiali, Prometheus) |
| Traffic policies | Manual | Declarative |
| Service mesh | ❌ | ✅ |
Result: Istio provides enterprise-grade traffic control + observability
Conclusion
While Azure provides scalable Layer 4 load balancing, it lacks application-aware weighted routing. Istio bridges this gap by enabling fine-grained, real-time traffic control, even for external VM-based backends.
This approach allows organizations to:
- Safely migrate workloads
- Optimize backend utilization
- Implement advanced deployment strategies
Section 3 — Implementation Details
This section walks through the production‑ready Istio configuration used to implement weighted traffic routing with cookie‑based session affinity.
Although both use cases share the same routing logic, the implementation differs based on where backend workloads run. For clarity, the configuration is presented separately.
3.1 Implementation — AKS‑Only Workloads
This configuration applies when all backend workloads run inside AKS, and Istio is used to control ingress traffic.
Namespace
A dedicated namespace is created to host the application workloads.
apiVersion: v1
kind: Namespace
metadata:
  name: nginx-app
  labels:
    purpose: nginx-workloads
✅ Keeps application resources isolated
✅ Allows independent policy and scaling control
Deployments (Backend Pools)
Two Deployments represent independent backend pools. Each pool can scale independently and is identified using labels.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app-pool1
  namespace: nginx-app
  labels:
    app: nginx-app
    pool: pool1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-app
      pool: pool1
A similar Deployment exists for pool2 with pool: pool2.
✅ Pool identity is defined via labels
✅ Routing is handled by Istio, not Kubernetes
Service
A single Kubernetes Service selects all pods across both pools.
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
  namespace: nginx-app
spec:
  selector:
    app: nginx-app
  ports:
  - name: http
    port: 80
    targetPort: 80
ℹ️ Kubernetes does not perform traffic splitting.
All pool selection is deferred to Istio.
Destination Rule (Subset Definition)
Istio subsets map traffic to specific pod pools using labels.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: nginx-app-dr
  namespace: istio-ingress
spec:
  host: nginx-app.nginx-app.svc.cluster.local
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: DO_NOT_UPGRADE
    tls:
      mode: DISABLE
  subsets:
  - name: pool1
    labels:
      pool: pool1
  - name: pool2
    labels:
      pool: pool2
⚠️ Important — Windows workload constraint
Windows pods cannot run the Envoy sidecar.
In environments where mesh‑wide Peer Authentication is set to STRICT, traffic will fail unless TLS is explicitly disabled.
✅ Ingress routing continues to work correctly because the ingress gateway Envoy remains the single enforcement point.
Gateway & Virtual Service (Routing Logic)
The gateway exposes the application, while the VirtualService enforces cookie‑based stickiness and weighted routing.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: nginx-app-gateway
  namespace: istio-ingress
spec:
  selector:
    istio: ingress
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: nginx-app-ingress
  namespace: istio-ingress
spec:
  gateways:
  - nginx-app-gateway
  hosts:
  - "*"
  http:
  - match:
    - headers:
        cookie:   # Istio header-match keys must be lowercase
          regex: '(^|; )nginx-app-subset=pool1($|;)'
    route:
    - destination:
        host: nginx-app.nginx-app.svc.cluster.local
        subset: pool1
  - match:
    - headers:
        cookie:
          regex: '(^|; )nginx-app-subset=pool2($|;)'
    route:
    - destination:
        host: nginx-app.nginx-app.svc.cluster.local
        subset: pool2
  - route:
    - destination:
        host: nginx-app.nginx-app.svc.cluster.local
        subset: pool1
      weight: 80
      headers:
        response:
          add:
            Set-Cookie: "nginx-app-subset=pool1; Max-Age=2592000"
    - destination:
        host: nginx-app.nginx-app.svc.cluster.local
        subset: pool2
      weight: 20
      headers:
        response:
          add:
            Set-Cookie: "nginx-app-subset=pool2; Max-Age=2592000"
✅ New users are routed using weighted logic
✅ Returning users are pinned using cookies
✅ Session stability is preserved during scaling events
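Because the cookie-match rules are plain RE2-style regular expressions, they can be sanity-checked locally with `grep -E` before deploying. The sample cookie values below are hypothetical; the regex is the pool1 expression from the VirtualService:

```shell
#!/bin/sh
# Sanity-check the cookie-match regex against representative Cookie header values.
REGEX='(^|; )nginx-app-subset=pool1($|;)'

match() { printf '%s' "$1" | grep -Eq "$REGEX" && echo yes || echo no; }

a=$(match "nginx-app-subset=pool1")            # cookie as issued by Istio
b=$(match "foo=bar; nginx-app-subset=pool1")   # alongside other cookies
c=$(match "nginx-app-subset=pool2")            # other pool: must not match
d=$(match "nginx-app-subset=pool10")           # prefix collision: must not match
echo "$a $b $c $d"
```

This prints `yes yes no no`: the leading `(^|; )` and trailing `($|;)` guards keep the match anchored to a whole cookie pair, so `pool10` or a pool2 cookie can never be mistaken for pool1.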
Horizontal Pod Autoscaler
Each pool scales independently based on its traffic share.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-app-pool1-hpa
  namespace: nginx-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app-pool1
  minReplicas: 2
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
A similar HPA exists for pool2 with a lower maximum.
3.2 Implementation — Azure VM‑Based Workloads
In this model, AKS hosts the traffic‑management layer, while backend workloads run on Azure Virtual Machines.
Service Entry (Register External VMs)
VMs are introduced into Istio’s service registry using ServiceEntry.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: windows-vm-service
  namespace: istio-system
spec:
  hosts:
  - windows-vm-service.internal
  location: MESH_EXTERNAL
  resolution: STATIC
  ports:
  - number: 80
    name: http
    protocol: HTTP
  endpoints:
  - address: 10.0.1.4
    labels:
      version: v1
  - address: 10.0.1.5
    labels:
      version: v2
✅ VMs are treated as first‑class upstreams
✅ No changes required on the VM workloads
Destination Rule (VM Subsets)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: windows-vm-destination
  namespace: istio-system
spec:
  host: windows-vm-service.internal
  subsets:
  - name: vm1
    labels:
      version: v1
  - name: vm2
    labels:
      version: v2
Virtual Service & Gateway
The same routing pattern used for AKS workloads applies here.
http:
- route:
  - destination:
      host: windows-vm-service.internal
      subset: vm1
    weight: 60
  - destination:
      host: windows-vm-service.internal
      subset: vm2
    weight: 40
✅ Istio performs all traffic decisions at Layer 7
✅ Azure Load Balancer is not used for weighting
✅ DNS‑based routing is avoided
✅ Key Implementation Takeaway
Although the mechanics of service discovery differ, the routing model remains identical:
- VirtualService defines intent
- DestinationRule maps pools
- Ingress Envoy enforces behavior
AKS acts as the traffic‑management control plane, regardless of where backend workloads run.
🚀 Outcome:
Istio applies weighted load balancing to each new request.
🧪 Script Used — Weighted Routing Validation
The following script generates 100 independent HTTP requests with cookies and keep‑alive explicitly disabled to simulate new-user traffic on each request.
#!/bin/bash
URL="http://98.70.237.77/"
REQUESTS=100
pool1=0
pool2=0
unknown=0

echo "Running $REQUESTS independent HTTP requests..."
echo "---------------------------------------------"

for i in $(seq 1 $REQUESTS); do
  response=$(curl -s \
    --no-keepalive \
    -H "Cache-Control: no-cache" \
    -H "Pragma: no-cache" \
    -H "Cookie:" \
    "$URL")
  if echo "$response" | grep -iq "pool1"; then
    ((pool1++))
  elif echo "$response" | grep -iq "pool2"; then
    ((pool2++))
  else
    ((unknown++))
  fi
done

echo ""
echo "Results:"
echo "--------"
echo "pool1   : $pool1"
echo "pool2   : $pool2"
echo "unknown : $unknown"
echo ""
echo "Distribution:"
echo "pool1 = $(awk "BEGIN {printf \"%.2f\", ($pool1/$REQUESTS)*100}")%"
echo "pool2 = $(awk "BEGIN {printf \"%.2f\", ($pool2/$REQUESTS)*100}")%"
By parsing the HTML response to identify pool1 or pool2, it validates that Istio’s ingress‑level weighted routing is applied correctly before session stickiness takes effect.
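The classification step of the script can also be exercised offline, with hypothetical HTML fragments standing in for real backend responses (the page markup here is invented for illustration; only the `grep -i` logic mirrors the script):

```shell
#!/bin/sh
# Exercise the script's response-classification logic without a live gateway.
classify() {
  if printf '%s' "$1" | grep -iq "pool1"; then echo pool1
  elif printf '%s' "$1" | grep -iq "pool2"; then echo pool2
  else echo unknown
  fi
}

r1=$(classify "<h1>Served by POOL1</h1>")    # case-insensitive match
r2=$(classify "<h1>Served by pool2</h1>")
r3=$(classify "<h1>maintenance page</h1>")   # neither marker present
echo "$r1 $r2 $r3"
```

This prints `pool1 pool2 unknown`, confirming the counters only increment when a pool marker actually appears in the response body.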
✅ Result
The observed 83:17 distribution confirms that Istio correctly enforces the configured 80:20 traffic weights for new sessions, validating the weighted routing behavior used for canary and A/B testing.