Building Shared AKS Clusters: A Hands-On Guide with Labs and Best Practices
1. Overview of Shared AKS Architecture
1.1 Goals
- Accelerate application delivery by providing a hardened shared Kubernetes platform per environment (prod / test / dev).
- Enable safe multi-tenancy using namespaces, RBAC, NetworkPolicies, resource quotas, and pod security standards.
- Enforce consistency (standards, guardrails) while allowing team autonomy for app lifecycle.
- Optimize cost through shared cluster capacity, rightsizing, and autoscaling.
1.2 Pattern Summary
- One AKS cluster per environment (e.g., aks-shared-prod, aks-shared-test, aks-shared-dev).
- Multiple business units / product teams share the environment cluster via isolated namespaces.
- Platform services (ingress, cert management, monitoring, cost, backup) run in a dedicated platform-<env> namespace.
- Each team gets namespaces per environment: payments-prod, orders-prod, etc.
1.3 High-Level Architecture Diagram
1.4 Multi-Tenancy Mechanisms
| Mechanism | Purpose | Enforcement Layer |
|---|---|---|
| Namespaces | Logical isolation per team/app | Kubernetes API |
| RBAC | Access control (who can do what) | Azure AD + K8s RBAC |
| NetworkPolicy | East-west traffic control | CNI (Azure CNI) |
| ResourceQuota & LimitRange | Prevent noisy neighbors | K8s admission |
| Pod Security Standards | Baseline/Restricted enforcement | Pod Security Admission |
| Images from ACR only | Trusted supply chain | Admission / Policy |
1.5 Why Not One Cluster For All Environments?
| Aspect | Single Cluster (All Envs) | Per-Environment Clusters |
|---|---|---|
| Blast Radius | High | Contained per env |
| Change Windows | Complex coordination | Independent |
| Compliance | Hard to segregate | Easier mapping |
| Observability Noise | Mixed signals | Clean per env |
| Scaling Decisions | Conflicting | Environment-specific |
Per-environment clusters simplify lifecycle, versioning, and SLA management at the slight cost of control-plane duplication.
1.6 Network Isolation & Azure CNI
- Azure CNI assigns IPs from a VNet subnet directly to pods (no overlay), enabling IP-level visibility.
- Use separate subnets per node pool (system vs workload vs batch) for clearer network policy scoping.
- Leverage network policies (Calico or Azure native) to restrict cross-namespace traffic.
- The private cluster option makes the API server reachable only via a private endpoint inside the VNet.
1.7 Tenancy Diagram (Namespaces → Apps per BU)
2. Key Components
2.1 Autoscaling Architecture
- Cluster Autoscaler: adjusts node count (workload & batch pools) based on pending pods.
- Horizontal Pod Autoscaler (HPA): scales replicas based on CPU, memory, or custom metrics.
- Vertical Pod Autoscaler (VPA): recommends or applies resource request updates.
- KEDA: event-driven autoscaling (queue length, Azure Service Bus, Kafka, etc.).
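A minimal VPA sketch in recommendation-only mode (assumes the VPA components are installed on the cluster; the target name matches the sample app deployed later in the lab):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
  namespace: payments-prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"   # surface recommendations only; "Auto" would apply them
```

Inspect the resulting recommendations with `kubectl describe vpa payments-api-vpa -n payments-prod`.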
2.2 Optional Service Mesh (Istio or Ambient Mesh)
Use a mesh only when you:
- Need mTLS between services.
- Require fine-grained traffic shifting (canary, A/B, fault injection).
- Require zero-trust identity propagation.

Otherwise, keep complexity low and rely on ingress plus network policies.
2.3 Ingress & API Exposure
- NGINX or Azure Application Gateway Ingress Controller.
- TLS via cert-manager + Azure Key Vault (CSI driver for secrets if needed).
- Central routing and WAF (if AGIC used).
2.4 Secrets & Configuration
- Prefer Azure Key Vault: reference secrets in pods via CSI driver or sync controller.
- Use Kubernetes sealed-secrets only for GitOps edge cases.
- External Secrets Operator can streamline mapping.
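As an illustration of that mapping, a hedged External Secrets Operator sketch that projects a Key Vault secret into a Kubernetes Secret (the ClusterSecretStore name azure-kv and the secret names are assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments-prod
spec:
  refreshInterval: 1h            # re-sync from Key Vault hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-kv               # assumed store configured against the shared vault
  target:
    name: payments-db            # Kubernetes Secret created and kept in sync
  data:
    - secretKey: password
      remoteRef:
        key: db-password         # secret name in Key Vault
```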
2.5 Storage
- Azure Disk: DB/data workloads needing single-node high IOPS.
- Azure Files: shared RW across replicas.
- Azure NetApp Files: high throughput/low latency enterprise workloads.
- Blob Storage: backup target + object data.
2.6 Backup & Disaster Recovery
- Velero backs up cluster metadata + PV snapshots (when using supported providers).
- Off-cluster backups stored in Blob with lifecycle management.
- DR strategy: recreate cluster via IaC + restore Velero backups + bootstrap GitOps.
2.7 Observability Stack
- Prometheus (metrics) + exporters (node, kube-state, custom).
- Grafana dashboards per team and shared platform board.
- Azure Monitor Container Insights for baseline + log retention.
- Tracing: OpenTelemetry Collector + Jaeger or Azure Monitor tracing backend.
3. CI/CD Strategy
3.1 Principles
- Everything declarative (Helm charts / Kustomize) stored in Git.
- One pipeline per application: build once, then promote the same image digest through environment stages.
- GitOps for platform and cross-cutting components (ingress, monitoring, backup) via Flux or Argo CD.
- Separation of duties: App teams manage their namespace manifests; platform team manages cluster addons.
3.2 Recommended Flow
- Developer merges to main → CI builds & scans image → pushes to ACR with immutable tag & digest.
- CI updates Helm values-prod.yaml (or image tag file) in Git (infra repo) via PR.
- GitOps controller detects change → deploys to namespace.
- Post-deploy tests & smoke checks run.
- Promotion to higher environment uses same image digest (no rebuild).
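For the GitOps step in this flow, a hedged Flux Kustomization sketch that reconciles a team overlay from the infra repo (the GitRepository name infra-repo is an assumption; the path matches the table in 3.8):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-prod
  namespace: flux-system
spec:
  interval: 5m                       # reconcile and correct drift every 5 minutes
  path: ./apps/payments/overlays/prod
  prune: true                        # remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: infra-repo                 # assumed Flux source for the infra repo
  targetNamespace: payments-prod
```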
3.3 Pipeline Diagram
3.4 Tools
| Concern | Tool | Notes |
|---|---|---|
| Build | Azure DevOps / GitHub Actions | Container build, unit tests |
| Scan (Image) | Trivy / Microsoft Defender | Fail pipeline on critical vulnerabilities |
| Sign | Cosign | Image provenance signature |
| Deploy | Helm + Flux | Reconciled from Git |
| Secrets | Key Vault CSI / External Secrets | No secrets in Git |
3.5 Sample GitHub Actions Snippet (Build & Push)
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login ACR
        uses: azure/docker-login@v1
        with:
          login-server: ${{ env.ACR_NAME }}.azurecr.io
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - name: Build
        run: docker build -t ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }} .
      - name: Scan
        uses: aquasecurity/trivy-action@v0.13.0
        with:
          image-ref: ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }}
          severity: HIGH,CRITICAL
          exit-code: "1"   # fail the job on HIGH/CRITICAL findings
      - name: Push
        run: docker push ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }}
```
3.6 Helm Deployment Command (Manual)
```bash
helm upgrade --install payments charts/payments \
  --namespace payments-prod \
  --set image.repository=myacr.azurecr.io/payments \
  --set image.tag=sha256:<digest>
```
3.7 GitOps Advantages
- Drift detection & self-healing.
- Auditability (all changes in PR history).
- Immutable artifacts (image digest pinned).
3.8 Namespace Alignment
| Namespace | Repo Path | Release Strategy |
|---|---|---|
| payments-prod | apps/payments/overlays/prod | Auto after PR merge |
| orders-prod | apps/orders/overlays/prod | Manual approval |
| inventory-prod | apps/inventory/overlays/prod | Auto |
| platform-prod | platform/addons | Platform team only |
4. Backup Strategy (Deep Dive)
4.1 Objectives
- Recover from accidental deletion, corruption, cluster loss.
- Meet RPO/RTO defined per application tier (e.g., Tier-1: RPO 15m, RTO 1h).
4.2 Velero Architecture
4.3 Backup Scope
| Item | Method | Notes |
|---|---|---|
| Namespace manifests | Velero backup | Included automatically |
| Persistent Volumes (Azure Disk) | CSI snapshots | Fast point-in-time |
| Azure Files | File-level backup (optional) | Consider rsync / custom |
| ACR images | Not needed (immutable, stored in registry) | Use retention policies |
| Secrets | Included; consider encryption | Key Vault references not stored |
4.4 Backup Command Examples
```bash
# Create a daily schedule
velero schedule create daily-prod --schedule "0 2 * * *" \
  --include-namespaces payments-prod,orders-prod,inventory-prod

# On-demand backup
velero backup create payments-manual-$(date +%Y%m%d) --include-namespaces payments-prod

# Restore from a named backup
velero restore create --from-backup payments-manual-20250101
```
4.5 Blob Storage Configuration
- Enable blob versioning and soft delete on the backup container.
- Configure lifecycle management: move backups older than 90 days to Cool or Archive tiers.
- Private endpoint for storage account if cluster private.
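A sketch of that storage hardening with the Azure CLI (uses the $RG/$BLOBSA variables defined in the lab; retention periods are examples, and policy.json stands for a standard management-policy document with a tier-to-archive rule for blobs older than 90 days):

```bash
# Enable blob versioning and soft delete on the backup storage account
az storage account blob-service-properties update \
  -g $RG --account-name $BLOBSA \
  --enable-versioning true \
  --enable-delete-retention true --delete-retention-days 14

# Apply the lifecycle policy that tiers old backups to Cool/Archive
az storage account management-policy create \
  -g $RG --account-name $BLOBSA --policy @policy.json
```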
4.6 DR Runbook (Summary)
- Recreate cluster via Bicep/Terraform.
- Install platform addons (GitOps bootstrap).
- Install Velero + connect to backup bucket.
- Restore critical namespaces (payments → orders → inventory).
- Run validation scripts & synthetic tests.
4.7 Testing Backups
- Monthly restore into ephemeral test cluster.
- Validate app startup & data integrity checksums.
4.8 KPIs
| KPI | Target |
|---|---|
| Backup success rate | > 99% |
| Restore drill frequency | Monthly |
| DR RTO (Tier-1) | <= 60m |
| DR RPO (Tier-1) | <= 15m |
5. Operational Insights
5.1 Resource Optimization
- Use VPA recommendations to refine requests bi-weekly.
- KEDA for spiky workloads (workers / consumers) to avoid over-provisioning.
- Batch node pool with Spot instances for cost-efficient asynchronous jobs.
5.2 Quotas & LimitRanges
| Namespace | CPU Quota (cores) | Memory Quota | Notes |
|---|---|---|---|
| payments-prod | 30 | 60Gi | High-traffic workload |
| orders-prod | 40 | 80Gi | Larger processing window |
| inventory-prod | 20 | 40Gi | Moderate update frequency |
5.3 Noisy Neighbor Mitigation
- Enforce per-deployment resource limits.
- Use priority classes (system > platform > business apps > batch).
- Alert on sustained throttling or eviction events.
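A minimal sketch of the priority tiers above as PriorityClass objects (names and values are illustrative; higher values preempt lower ones):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: platform-critical
value: 900000                 # below built-in system-* classes
description: Platform addons (ingress, monitoring, backup).
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-standard
value: 500000
globalDefault: true           # default for team workloads
description: Business application workloads.
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 100000
description: Preemptible batch jobs on the spot pool.
```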
5.4 Operational Dashboard Metrics
| Category | Metric | Source |
|---|---|---|
| Capacity | Node utilization % | Prometheus node exporter |
| Efficiency | Requested vs actual usage | kube-state-metrics |
| Reliability | Pod restarts, crash loops | kube events |
| Performance | API latency P95 | Ingress metrics |
| Scaling | HPA decision latency | Prometheus adapter |
5.5 SLO Examples
| Service | SLO | Measurement |
|---|---|---|
| Payments API | 99.9% availability | Successful request % over 5m windows |
| Orders API | P95 < 400ms | Ingress / app metrics |
| Inventory Sync | Completion < 10m | Job duration metrics |
5.6 Incident Playbook (Abbreviated)
- Detect (alert fires) → classify severity.
- Gather: kubectl describe, logs, metrics timeline.
- Mitigate: rollback image / scale resources / isolate via NetworkPolicy.
- Communicate: status page / stakeholder channel.
- Postmortem within 48h → action items tracked.
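The gather step above typically looks like this (pod names are placeholders):

```bash
# Recent warning events in the affected namespace
kubectl get events -n payments-prod --sort-by=.lastTimestamp | tail -20

# Pod state, restart counts, and scheduling details
kubectl describe pod <pod-name> -n payments-prod

# Logs from the previous (crashed) container instance
kubectl logs <pod-name> -n payments-prod --previous
```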
6. Cost & Billing Strategy
6.1 Tagging & Labeling
- Azure resources: Environment=Prod, BusinessUnit=Payments, CostCenter=1234.
- Kubernetes: namespace labels bu=payments, env=prod consumed by cost tools (Kubecost / Azure Cost Management).
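For example (resource names are illustrative):

```bash
# Tag the Azure resource group so tags flow into Cost Management views
az group update -n rg-shared-aks-prod \
  --set tags.Environment=Prod tags.BusinessUnit=Payments tags.CostCenter=1234

# Label the namespace so cost tools can group spend by BU and environment
kubectl label namespace payments-prod bu=payments env=prod --overwrite
```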
6.2 Cost Visibility
6.3 Optimization Levers
| Lever | Description | Example |
|---|---|---|
| Rightsizing | Adjust requests via VPA | Reduce CPU from 500m to 200m |
| Spot | Use for batch/non-critical | BatchPool spot nodes |
| Autoscaling | Scale down at night | WorkloadPool min nodes reduced |
| Image Slimming | Smaller images → faster deploy | Multi-stage Docker builds |
| Storage Tiering | Archive old backups | Blob lifecycle rules |
6.4 Chargeback/Showback
- Monthly export per namespace (CPU-hours, memory GB-hours, storage GB, network egress).
- Map to internal rate card (e.g., $ per vCPU-hour).
- Provide dashboard + monthly PDF summary.
6.5 KPIs
| KPI | Target |
|---|---|
| Unallocated capacity | < 20% |
| Spot utilization | > 30% of batch workloads |
| Orphan resources cleanup time | < 7 days |
7. Security & Compliance
7.1 Layered Model
| Layer | Control | Tool |
|---|---|---|
| Identity | Azure AD RBAC → K8s RBAC | Azure AD groups |
| Workload Policies | Pod Security Standards (PSS) | Built-in Pod Security Admission |
| Network | Namespace isolation | NetworkPolicy (Calico/Azure) |
| Supply Chain | Image provenance/signature | Cosign + ACR Content Trust |
| Secrets | External vault storage | Azure Key Vault |
| Runtime | Behavioral detection | Defender for Containers |
7.2 RBAC Pattern
- ClusterRoles for common verbs (view, deploy, ops).
- Bind roles to Azure AD groups via AKS AAD integration (group object ID mapping).
- Example groups: aks-platform-admins, aks-payments-devs, aks-readonly.
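A sketch of the binding for one of those groups (the group object ID is a placeholder; with AKS AAD integration, the subject name is the Azure AD group's object ID):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-devs-edit
  namespace: payments-prod
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "<aks-payments-devs-object-id>"   # Azure AD group object ID
roleRef:
  kind: ClusterRole
  name: edit              # built-in aggregate role; substitute a custom deploy role
  apiGroup: rbac.authorization.k8s.io
```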
7.3 NetworkPolicy Example
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-payments
  namespace: payments-prod
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform-prod
      ports:
        - protocol: TCP
          port: 8080
```
7.4 Pod Security Standards
- Set namespace labels: pod-security.kubernetes.io/enforce=restricted for prod.
- Use baseline for dev/test to allow debugging tools.
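For example (the dev namespace name is illustrative):

```bash
# Enforce restricted in prod; warn/audit surface violations in kubectl output
kubectl label namespace payments-prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted --overwrite

# Looser baseline profile for dev/test
kubectl label namespace payments-dev \
  pod-security.kubernetes.io/enforce=baseline --overwrite
```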
7.5 Image Policy
- Admission controller verifies images originate from approved ACR & are signed.
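One way to approximate the registry restriction is the K8sAllowedRepos constraint from the OPA Gatekeeper policy library, sketched below; it limits image registries but does not verify signatures, which needs a dedicated admission integration such as Ratify with Cosign:

```yaml
# Assumes Gatekeeper and the library K8sAllowedRepos template are installed.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: approved-acr-only
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["payments-prod", "orders-prod", "inventory-prod"]
  parameters:
    repos:
      - "myacr.azurecr.io/"   # only images from the shared ACR are admitted
```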
7.6 Secret Management Pattern
```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: payments-kv
  namespace: payments-prod
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: <client-id>
    keyvaultName: my-shared-kv
    tenantId: <tenant-id>
    objects: |
      array:
        - |
          objectName: db-password
          objectType: secret
```
7.7 Compliance Mapping
| Requirement | Control | Evidence |
|---|---|---|
| Least Privilege | RBAC roles | GitOps repo + audit logs |
| Data Protection | Encrypted disks | Azure policy compliance report |
| Audit | Central log retention | Azure Monitor workspace |
| Vulnerability Mgmt | Image scanning | Pipeline reports |
8. Monitoring & Observability
8.1 Pillars
| Pillar | Tool | Output |
|---|---|---|
| Metrics | Prometheus | Time-series dashboards |
| Logs | Azure Monitor / Loki (optional) | Query & retention |
| Traces | OpenTelemetry Collector | Distributed latency maps |
| Events | Kubernetes API / Alertmanager | Incident triggers |
8.2 Observability Diagram
8.3 Example PrometheusRule
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-alerts
  namespace: platform-prod
spec:
  groups:
    - name: payments-availability
      rules:
        - alert: PaymentsHighErrorRate
          expr: rate(http_requests_total{namespace="payments-prod",status=~"5.."}[5m]) > 5
          for: 2m
          labels:
            severity: page
          annotations:
            summary: High 5xx error rate in payments API
            description: 5xx rate above 5 requests/second for 2 minutes.
```
8.4 Alerting Strategy
- Page only on user-impacting issues (availability or latency SLO breaches).
- Ticket for capacity trend warnings.
- Daily digest for low-priority issues.
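A hedged Alertmanager routing sketch implementing this split (receiver names and severity labels are assumptions):

```yaml
route:
  receiver: ticket                  # default: file a ticket
  group_by: ['alertname', 'namespace']
  routes:
    - matchers:
        - severity="page"           # user-impacting SLO breaches page on-call
      receiver: pagerduty
    - matchers:
        - severity="info"           # rolled into the daily digest
      receiver: daily-digest
receivers:                          # receiver integrations omitted for brevity
  - name: pagerduty
  - name: ticket
  - name: daily-digest
```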
8.5 Dashboards
- Platform Overview (cluster health, node capacity).
- Namespace Cost & Efficiency.
- Application Performance (latency, errors, throughput).
8.6 Log Retention
- 30 days hot, 180 days archive (Blob / ADLS).
- PII scrubbing before long-term archival.
8.7 Tracing Adoption Steps
- Inject OTel SDK into services.
- Export spans to collector via OTLP.
- Add trace ID to logs for correlation.
- Establish latency budgets per critical path.
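A minimal OpenTelemetry Collector configuration sketch for the pipeline above (the Jaeger service endpoint is an assumption):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}                      # batch spans before export
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector.platform-prod.svc:4317   # assumed Jaeger OTLP endpoint
    tls:
      insecure: true             # in-cluster traffic; tighten for production
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```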
8.8 KPIs
| KPI | Target |
|---|---|
| Alert false positives | < 10% |
| Missing metrics coverage | < 5% of services |
| Trace sampled rate (critical paths) | > 90% |
9. Hands-On Lab: Build & Operate Shared AKS
Estimated time: ~120 minutes. Run the commands from Azure Cloud Shell or a local workstation logged in to Azure (az login). Replace variable values as needed.
9.1 Prerequisites
```bash
export LOCATION=eastus
export RG=rg-shared-aks-prod
export AKS=aks-shared-prod
export ACR=acrsharedprod$RANDOM
export KV=kv-shared-prod-$RANDOM
export BLOBSA=stsharedprod$RANDOM
```
9.2 Create Resource Group & Shared Services
```bash
az group create -n $RG -l $LOCATION
az acr create -n $ACR -g $RG --sku Premium --location $LOCATION
az keyvault create -n $KV -g $RG -l $LOCATION --enabled-for-deployment true
az storage account create -n $BLOBSA -g $RG -l $LOCATION --sku Standard_LRS --kind StorageV2
export BLOBKEY=$(az storage account keys list -g $RG -n $BLOBSA --query [0].value -o tsv)
az acr login -n $ACR
```
9.3 Provision AKS Cluster (Per-Environment)
```bash
az aks create -g $RG -n $AKS \
  --enable-managed-identity \
  --node-count 3 \
  --node-vm-size Standard_D4s_v5 \
  --network-plugin azure \
  --enable-addons monitoring \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys

az aks nodepool add -g $RG --cluster-name $AKS -n workloadpool \
  --node-count 3 --enable-cluster-autoscaler --min-count 3 --max-count 10 \
  --node-vm-size Standard_D8s_v5

az aks nodepool add -g $RG --cluster-name $AKS -n batchpool \
  --enable-cluster-autoscaler --min-count 0 --max-count 5 \
  --node-vm-size Standard_D4s_v5 --priority Spot

az aks get-credentials -g $RG -n $AKS
```
9.4 Create Namespaces & Quotas
```bash
for ns in platform-prod payments-prod orders-prod inventory-prod; do kubectl create namespace $ns; done

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: rq-payments
  namespace: payments-prod
spec:
  hard:
    requests.cpu: "30"
    requests.memory: 60Gi
    limits.cpu: "40"
    limits.memory: 80Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: lr-payments
  namespace: payments-prod
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "250m"
        memory: "256Mi"
EOF
```
9.5 Deploy Ingress & cert-manager (Platform Namespace)
```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm upgrade --install ingress ingress-nginx/ingress-nginx \
  --namespace platform-prod \
  --set controller.replicaCount=2 \
  --set controller.resources.requests.cpu=200m \
  --set controller.resources.requests.memory=256Mi

# CRDs are cluster-scoped; no namespace flag needed
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace platform-prod \
  --version v1.14.3
```
9.6 Sample App (Payments API) with HPA
```bash
kubectl apply -n payments-prod -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: api
          image: nginx:1.25
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
EOF
```
9.7 KEDA Installation & Event-Driven Worker
```bash
kubectl apply -f https://github.com/kedacore/keda/releases/latest/download/keda.yml

kubectl apply -n payments-prod -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payments-worker
  template:
    metadata:
      labels:
        app: payments-worker
    spec:
      containers:
        - name: worker
          image: busybox
          args: ["/bin/sh", "-c", "while true; do echo processing; sleep 30; done"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker-so
spec:
  scaleTargetRef:
    name: payments-worker
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: payments-queue
        messageCount: "5"
        connectionFromEnv: SERVICEBUS_CONNECTION_STRING
EOF
```
9.8 Monitoring Stack (Prometheus & Grafana via Helm)
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install kube-prom prometheus-community/kube-prometheus-stack \
  --namespace platform-prod \
  --set grafana.enabled=true \
  --set prometheus.prometheusSpec.retention=15d
```
9.9 Velero Backup Setup
```bash
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.8.0 \
  --bucket backups \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=$RG,storageAccount=$BLOBSA \
  --use-restic

velero schedule create daily --schedule "0 1 * * *" \
  --include-namespaces payments-prod,orders-prod,inventory-prod
```
9.10 Cost Visibility (Optional Kubecost)
```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/

helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace platform-prod \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://kube-prom-prometheus.platform-prod.svc
```
9.11 Basic Load Test (Simulate Traffic)
```bash
kubectl run loader -n payments-prod --image=busybox --restart=Never -- \
  /bin/sh -c 'for i in $(seq 1 1000); do wget -q -O- http://payments-api.payments-prod.svc.cluster.local; done'
```
9.12 Validate Autoscaling
```bash
kubectl get hpa -n payments-prod
kubectl describe hpa payments-api-hpa -n payments-prod
kubectl get pods -n payments-prod -w
```
9.13 Cleanup
```bash
az group delete -n $RG --yes --no-wait
```
9.14 Lab Outcomes
- Provisioned shared prod cluster with node pools.
- Established namespaces & quotas.
- Deployed sample app + HPA + KEDA worker.
- Installed ingress, monitoring, cost, backup tooling.
- Validated scaling & backup schedule.
9.15 Next Steps
- Add GitOps bootstrap (Flux) repo sync.
- Implement NetworkPolicies per namespace.
- Integrate image signing (Cosign) & policy enforcement.