Building Shared AKS Clusters: A Hands-On Guide with Labs and Best Practices
1. Overview of Shared AKS Architecture
1.1 Goals
- Accelerate application delivery by providing a hardened shared Kubernetes platform per environment (prod / test / dev).
- Enable safe multi-tenancy using namespaces, RBAC, NetworkPolicies, resource quotas, and pod security standards.
- Enforce consistency (standards, guardrails) while allowing team autonomy for app lifecycle.
- Optimize cost through shared cluster capacity, rightsizing, and autoscaling.
1.2 Pattern Summary
- One AKS cluster per environment (e.g., aks-shared-prod, aks-shared-test, aks-shared-dev).
- Multiple business units / product teams share the environment cluster via isolated namespaces.
- Platform services (ingress, cert management, monitoring, cost, backup) run in a dedicated platform-<env> namespace.
- Each team gets namespaces per environment: payments-prod, orders-prod, etc.
1.3 High-Level Architecture Diagram
1.4 Multi-Tenancy Mechanisms
| Mechanism | Purpose | Enforcement Layer |
|---|---|---|
| Namespaces | Logical isolation per team/app | Kubernetes API |
| RBAC | Access control (who can do what) | Azure AD + K8s RBAC |
| NetworkPolicy | East-west traffic control | CNI (Azure CNI) |
| ResourceQuota & LimitRange | Prevent noisy neighbors | K8s admission |
| Pod Security Standards | Baseline/Restricted enforcement | Pod Security Admission |
| Images from ACR only | Trusted supply chain | Admission / Policy |
1.5 Why Not One Cluster For All Environments?
| Aspect | Single Cluster (All Envs) | Per-Environment Clusters |
|---|---|---|
| Blast Radius | High | Contained per env |
| Change Windows | Complex coordination | Independent |
| Compliance | Hard to segregate | Easier mapping |
| Observability Noise | Mixed signals | Clean per env |
| Scaling Decisions | Conflicting | Environment-specific |
Per-environment clusters simplify lifecycle, versioning, and SLA management at the slight cost of control-plane duplication.
1.6 Network Isolation & Azure CNI
- Azure CNI assigns IPs from a VNet subnet directly to pods (no overlay), enabling IP-level visibility.
- Use separate subnets per node pool (system vs workload vs batch) for clearer network policy scoping.
- Leverage network policies (Calico or Azure native) to restrict cross-namespace traffic.
- The private cluster option makes the API server reachable only via a private endpoint inside the VNet.
1.7 Tenancy Diagram (Namespaces → Apps per BU)
2. Key Components
2.1 Autoscaling Architecture
- Cluster Autoscaler: adjusts node count (workload & batch pools) based on pending pods.
- Horizontal Pod Autoscaler (HPA): scales replicas based on CPU, memory, or custom metrics.
- Vertical Pod Autoscaler (VPA): recommends or applies resource request updates.
- KEDA: event-driven autoscaling (queue length, Azure Service Bus, Kafka, etc.).
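A minimal VPA sketch in recommendation-only mode (assumes the VPA components are installed on the cluster; the target name matches the sample app deployed later in the lab):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api-vpa
  namespace: payments-prod
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"   # surface recommendations only; "Auto" would apply them
```

Inspect the resulting recommendations with `kubectl describe vpa payments-api-vpa -n payments-prod`.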
2.2 Optional Service Mesh (Istio or Ambient Mesh)
Use a mesh only when you:
- Need mTLS between services.
- Require fine-grained traffic shifting (canary, A/B, fault injection).
- Require zero-trust identity propagation.

Otherwise, keep complexity low and rely on ingress plus network policies.
2.3 Ingress & API Exposure
- NGINX or Azure Application Gateway Ingress Controller.
- TLS via cert-manager + Azure Key Vault (CSI driver for secrets if needed).
- Central routing and WAF (if AGIC used).
2.4 Secrets & Configuration
- Prefer Azure Key Vault: reference secrets in pods via CSI driver or sync controller.
- Use Kubernetes sealed-secrets only for GitOps edge cases.
- External Secrets Operator can streamline mapping.
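As an illustration of that mapping, a hedged External Secrets Operator sketch that projects a Key Vault secret into a Kubernetes Secret (the ClusterSecretStore name azure-kv and the secret names are assumptions):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-db
  namespace: payments-prod
spec:
  refreshInterval: 1h            # re-sync from Key Vault hourly
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-kv               # assumed store configured against the shared vault
  target:
    name: payments-db            # Kubernetes Secret created and kept in sync
  data:
    - secretKey: password
      remoteRef:
        key: db-password         # secret name in Key Vault
```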
2.5 Storage
- Azure Disk: DB/data workloads needing single-node high IOPS.
- Azure Files: shared RW across replicas.
- Azure NetApp Files: high throughput/low latency enterprise workloads.
- Blob Storage: backup target + object data.
2.6 Backup & Disaster Recovery
- Velero backs up cluster metadata + PV snapshots (when using supported providers).
- Off-cluster backups stored in Blob with lifecycle management.
- DR strategy: recreate cluster via IaC + restore Velero backups + bootstrap GitOps.
2.7 Observability Stack
- Prometheus (metrics) + exporters (node, kube-state, custom).
- Grafana dashboards per team and shared platform board.
- Azure Monitor Container Insights for baseline + log retention.
- Tracing: OpenTelemetry Collector + Jaeger or Azure Monitor tracing backend.
3. CI/CD Strategy
3.1 Principles
- Everything declarative (Helm charts / Kustomize) stored in Git.
- One pipeline per application: build once, then promote the same image digest through environment stages.
- GitOps for platform and cross-cutting components (ingress, monitoring, backup) via Flux or Argo CD.
- Separation of duties: App teams manage their namespace manifests; platform team manages cluster addons.
3.2 Recommended Flow
- Developer merges to main → CI builds & scans image → pushes to ACR with immutable tag & digest.
- CI updates Helm values-prod.yaml (or image tag file) in Git (infra repo) via PR.
- GitOps controller detects change → deploys to namespace.
- Post-deploy tests & smoke checks run.
- Promotion to higher environment uses same image digest (no rebuild).
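For the GitOps step in this flow, a hedged Flux Kustomization sketch that reconciles a team overlay from the infra repo (the GitRepository name infra-repo is an assumption; the path matches the table in 3.8):

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payments-prod
  namespace: flux-system
spec:
  interval: 5m                       # reconcile and correct drift every 5 minutes
  path: ./apps/payments/overlays/prod
  prune: true                        # remove resources deleted from Git
  sourceRef:
    kind: GitRepository
    name: infra-repo                 # assumed Flux source for the infra repo
  targetNamespace: payments-prod
```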
3.3 Pipeline Diagram
3.4 Tools
| Concern | Tool | Notes |
|---|---|---|
| Build | Azure DevOps / GitHub Actions | Container build, unit tests |
| Scan (Image) | Trivy / Microsoft Defender | Fail pipeline on critical vulnerabilities |
| Sign | Cosign | Image provenance signature |
| Deploy | Helm + Flux | Reconciled from Git |
| Secrets | Key Vault CSI / External Secrets | No secrets in Git |
3.5 Sample GitHub Actions Snippet (Build & Push)
```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login ACR
        uses: azure/docker-login@v1
        with:
          login-server: ${{ env.ACR_NAME }}.azurecr.io
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - name: Build
        run: docker build -t ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }} .
      - name: Scan
        uses: aquasecurity/trivy-action@v0.13.0
        with:
          image-ref: ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }}
          severity: HIGH,CRITICAL
          exit-code: "1"   # fail the job on HIGH/CRITICAL findings
      - name: Push
        run: docker push ${{ env.ACR_NAME }}.azurecr.io/payments:${{ github.sha }}
```
3.6 Helm Deployment Command (Manual)
```bash
helm upgrade --install payments charts/payments \
  --namespace payments-prod \
  --set image.repository=myacr.azurecr.io/payments \
  --set image.tag=sha256:<digest>
```
3.7 GitOps Advantages
- Drift detection & self-healing.
- Auditability (all changes in PR history).
- Immutable artifacts (image digest pinned).
3.8 Namespace Alignment
| Namespace | Repo Path | Release Strategy |
|---|---|---|
| payments-prod | apps/payments/overlays/prod | Auto after PR merge |
| orders-prod | apps/orders/overlays/prod | Manual approval |
| inventory-prod | apps/inventory/overlays/prod | Auto |
| platform-prod | platform/addons | Platform team only |
4. Backup Strategy (Deep Dive)
4.1 Objectives
- Recover from accidental deletion, corruption, cluster loss.
- Meet RPO/RTO defined per application tier (e.g., Tier-1: RPO 15m, RTO 1h).
4.2 Velero Architecture
4.3 Backup Scope
| Item | Method | Notes |
|---|---|---|
| Namespace manifests | Velero backup | Included automatically |
| Persistent Volumes (Azure Disk) | CSI snapshots | Fast point-in-time |
| Azure Files | File-level backup (optional) | Consider rsync / custom |
| ACR images | Not needed (immutable, stored in registry) | Use retention policies |
| Secrets | Included; consider encryption | Key Vault references not stored |
4.4 Backup Command Examples
```bash
# Create a daily schedule
velero schedule create daily-prod --schedule "0 2 * * *" \
  --include-namespaces payments-prod,orders-prod,inventory-prod

# On-demand backup
velero backup create payments-manual-$(date +%Y%m%d) --include-namespaces payments-prod

# Restore from a named backup
velero restore create --from-backup payments-manual-20250101
```
4.5 Blob Storage Configuration
- Enable blob versioning and soft delete on the backup container.
- Configure lifecycle management: move backups older than 90 days to Cool or Archive tiers.
- Private endpoint for storage account if cluster private.
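A sketch of that storage hardening with the Azure CLI (uses the $RG/$BLOBSA variables defined in the lab; retention periods are examples, and policy.json stands for a standard management-policy document with a tier-to-archive rule for blobs older than 90 days):

```bash
# Enable blob versioning and soft delete on the backup storage account
az storage account blob-service-properties update \
  -g $RG --account-name $BLOBSA \
  --enable-versioning true \
  --enable-delete-retention true --delete-retention-days 14

# Apply the lifecycle policy that tiers old backups to Cool/Archive
az storage account management-policy create \
  -g $RG --account-name $BLOBSA --policy @policy.json
```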
4.6 DR Runbook (Summary)
- Recreate cluster via Bicep/Terraform.
- Install platform addons (GitOps bootstrap).
- Install Velero + connect to backup bucket.
- Restore critical namespaces (payments → orders → inventory).
- Run validation scripts & synthetic tests.
4.7 Testing Backups
- Monthly restore into ephemeral test cluster.
- Validate app startup & data integrity checksums.
4.8 KPIs
| KPI | Target |
|---|---|
| Backup success rate | > 99% |
| Restore drill frequency | Monthly |
| DR RTO (Tier-1) | <= 60m |
| DR RPO (Tier-1) | <= 15m |
5. Operational Insights
5.1 Resource Optimization
- Use VPA recommendations to refine requests bi-weekly.
- KEDA for spiky workloads (workers / consumers) to avoid over-provisioning.
- Batch node pool with Spot instances for cost-efficient asynchronous jobs.
5.2 Quotas & LimitRanges
| Namespace | CPU Quota (cores) | Memory Quota | Notes |
|---|---|---|---|
| payments-prod | 30 | 60Gi | High-traffic workload |
| orders-prod | 40 | 80Gi | Larger processing window |
| inventory-prod | 20 | 40Gi | Moderate update frequency |
5.3 Noisy Neighbor Mitigation
- Enforce per-deployment resource limits.
- Use priority classes (system > platform > business apps > batch).
- Alert on sustained throttling or eviction events.
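A minimal sketch of the priority tiers above as PriorityClass objects (names and values are illustrative; higher values preempt lower ones):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: platform-critical
value: 900000                 # below built-in system-* classes
description: Platform addons (ingress, monitoring, backup).
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-standard
value: 500000
globalDefault: true           # default for team workloads
description: Business application workloads.
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low
value: 100000
description: Preemptible batch jobs on the spot pool.
```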
5.4 Operational Dashboard Metrics
| Category | Metric | Source |
|---|---|---|
| Capacity | Node utilization % | Prometheus node exporter |
| Efficiency | Requested vs actual usage | kube-state-metrics |
| Reliability | Pod restarts, crash loops | kube events |
| Performance | API latency P95 | Ingress metrics |
| Scaling | HPA decision latency | Prometheus adapter |
5.5 SLO Examples
| Service | SLO | Measurement |
|---|---|---|
| Payments API | 99.9% availability | Successful request % over 5m windows |
| Orders API | P95 < 400ms | Ingress / app metrics |
| Inventory Sync | Completion < 10m | Job duration metrics |
5.6 Incident Playbook (Abbreviated)
- Detect (alert fires) → classify severity.
- Gather: kubectl describe, logs, metrics timeline.
- Mitigate: rollback image / scale resources / isolate via NetworkPolicy.
- Communicate: status page / stakeholder channel.
- Postmortem within 48h → action items tracked.
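The gather step above typically looks like this (pod names are placeholders):

```bash
# Recent warning events in the affected namespace
kubectl get events -n payments-prod --sort-by=.lastTimestamp | tail -20

# Pod state, restart counts, and scheduling details
kubectl describe pod <pod-name> -n payments-prod

# Logs from the previous (crashed) container instance
kubectl logs <pod-name> -n payments-prod --previous
```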
6. Cost & Billing Strategy
6.1 Tagging & Labeling
- Azure resources: Environment=Prod, BusinessUnit=Payments, CostCenter=1234.
- Kubernetes: namespace labels bu=payments, env=prod consumed by cost tools (Kubecost / Azure Cost Management).
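For example (resource names are illustrative):

```bash
# Tag the Azure resource group so tags flow into Cost Management views
az group update -n rg-shared-aks-prod \
  --set tags.Environment=Prod tags.BusinessUnit=Payments tags.CostCenter=1234

# Label the namespace so cost tools can group spend by BU and environment
kubectl label namespace payments-prod bu=payments env=prod --overwrite
```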
6.2 Cost Visibility
6.3 Optimization Levers
| Lever | Description | Example |
|---|---|---|
| Rightsizing | Adjust requests via VPA | Reduce CPU from 500m to 200m |
| Spot | Use for batch/non-critical | BatchPool spot nodes |
| Autoscaling | Scale down at night | WorkloadPool min nodes reduced |
| Image Slimming | Smaller images → faster deploy | Multi-stage Docker builds |
| Storage Tiering | Archive old backups | Blob lifecycle rules |
6.4 Chargeback/Showback
- Monthly export per namespace (CPU-hours, memory GB-hours, storage GB, network egress).
- Map to internal rate card (e.g., $ per vCPU-hour).
- Provide dashboard + monthly PDF summary.
6.5 KPIs
| KPI | Target |
|---|---|
| Unallocated capacity | < 20% |
| Spot utilization | > 30% of batch workloads |
| Orphan resources cleanup time | < 7 days |
7. Security & Compliance
7.1 Layered Model
| Layer | Control | Tool |
|---|---|---|
| Identity | Azure AD RBAC → K8s RBAC | Azure AD groups |
| Workload Policies | Pod Security Standards (PSS) | Built-in Pod Security Admission |
| Network | Namespace isolation | NetworkPolicy (Calico/Azure) |
| Supply Chain | Image provenance/signature | Cosign + ACR Content Trust |
| Secrets | External vault storage | Azure Key Vault |
| Runtime | Behavioral detection | Defender for Containers |
7.2 RBAC Pattern
- ClusterRoles for common verbs (view, deploy, ops).
- Bind roles to Azure AD groups via AKS AAD integration (group object ID mapping).
- Example groups: aks-platform-admins, aks-payments-devs, aks-readonly.
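A sketch of the binding for one of those groups (the group object ID is a placeholder; with AKS AAD integration, the subject name is the Azure AD group's object ID):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-devs-edit
  namespace: payments-prod
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: "<aks-payments-devs-object-id>"   # Azure AD group object ID
roleRef:
  kind: ClusterRole
  name: edit              # built-in aggregate role; substitute a custom deploy role
  apiGroup: rbac.authorization.k8s.io
```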
7.3 NetworkPolicy Example
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-to-payments
  namespace: payments-prod
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform-prod
      ports:
        - protocol: TCP
          port: 8080
```
7.4 Pod Security Standards
- Set namespace labels: pod-security.kubernetes.io/enforce=restricted for prod.
- Use baseline for dev/test to allow debugging tools.
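For example (the dev namespace name is illustrative):

```bash
# Enforce restricted in prod; warn/audit surface violations in kubectl output
kubectl label namespace payments-prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/audit=restricted --overwrite

# Looser baseline profile for dev/test
kubectl label namespace payments-dev \
  pod-security.kubernetes.io/enforce=baseline --overwrite
```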
7.5 Image Policy
- Admission controller verifies images originate from approved ACR & are signed.
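One way to approximate the registry restriction is the K8sAllowedRepos constraint from the OPA Gatekeeper policy library, sketched below; it limits image registries but does not verify signatures, which needs a dedicated admission integration such as Ratify with Cosign:

```yaml
# Assumes Gatekeeper and the library K8sAllowedRepos template are installed.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: approved-acr-only
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["payments-prod", "orders-prod", "inventory-prod"]
  parameters:
    repos:
      - "myacr.azurecr.io/"   # only images from the shared ACR are admitted
```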
7.6 Secret Management Pattern
```yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: payments-kv
  namespace: payments-prod
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: <client-id>
    keyvaultName: my-shared-kv
    tenantId: <tenant-id>
    objects: |
      array:
        - |
          objectName: db-password
          objectType: secret
```
7.7 Compliance Mapping
| Requirement | Control | Evidence |
|---|---|---|
| Least Privilege | RBAC roles | GitOps repo + audit logs |
| Data Protection | Encrypted disks | Azure policy compliance report |
| Audit | Central log retention | Azure Monitor workspace |
| Vulnerability Mgmt | Image scanning | Pipeline reports |
8. Monitoring & Observability
8.1 Pillars
| Pillar | Tool | Output |
|---|---|---|
| Metrics | Prometheus | Time-series dashboards |
| Logs | Azure Monitor / Loki (optional) | Query & retention |
| Traces | OpenTelemetry Collector | Distributed latency maps |
| Events | Kubernetes API / Alertmanager | Incident triggers |
8.2 Observability Diagram
8.3 Example PrometheusRule
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: payments-alerts
  namespace: platform-prod
spec:
  groups:
    - name: payments-availability
      rules:
        - alert: PaymentsHighErrorRate
          expr: rate(http_requests_total{namespace="payments-prod",status=~"5.."}[5m]) > 5
          for: 2m
          labels:
            severity: page
          annotations:
            summary: High 5xx error rate in payments API
            description: 5xx rate above 5 requests/second for 2 minutes.
```
8.4 Alerting Strategy
- Page only on user-impacting issues (availability or latency SLO breaches).
- Ticket for capacity trend warnings.
- Daily digest for low-priority issues.
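A hedged Alertmanager routing sketch implementing this split (receiver names and severity labels are assumptions):

```yaml
route:
  receiver: ticket                  # default: file a ticket
  group_by: ['alertname', 'namespace']
  routes:
    - matchers:
        - severity="page"           # user-impacting SLO breaches page on-call
      receiver: pagerduty
    - matchers:
        - severity="info"           # rolled into the daily digest
      receiver: daily-digest
receivers:                          # receiver integrations omitted for brevity
  - name: pagerduty
  - name: ticket
  - name: daily-digest
```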
8.5 Dashboards
- Platform Overview (cluster health, node capacity).
- Namespace Cost & Efficiency.
- Application Performance (latency, errors, throughput).
8.6 Log Retention
- 30 days hot, 180 days archive (Blob / ADLS).
- PII scrubbing before long-term archival.
8.7 Tracing Adoption Steps
- Inject OTel SDK into services.
- Export spans to collector via OTLP.
- Add trace ID to logs for correlation.
- Establish latency budgets per critical path.
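A minimal OpenTelemetry Collector configuration sketch for the pipeline above (the Jaeger service endpoint is an assumption):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}                      # batch spans before export
exporters:
  otlp/jaeger:
    endpoint: jaeger-collector.platform-prod.svc:4317   # assumed Jaeger OTLP endpoint
    tls:
      insecure: true             # in-cluster traffic; tighten for production
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
```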
8.8 KPIs
| KPI | Target |
|---|---|
| Alert false positives | < 10% |
| Missing metrics coverage | < 5% of services |
| Trace sampled rate (critical paths) | > 90% |
9. Hands-On Lab: Build & Operate Shared AKS
Estimated time: ~120 minutes. Run the commands from Azure Cloud Shell or a local workstation logged in to Azure (az login). Replace variable values as needed.
9.1 Prerequisites
```bash
export LOCATION=eastus
export RG=rg-shared-aks-prod
export AKS=aks-shared-prod
export ACR=acrsharedprod$RANDOM
export KV=kv-shared-prod-$RANDOM
export BLOBSA=stsharedprod$RANDOM
```
9.2 Create Resource Group & Shared Services
```bash
az group create -n $RG -l $LOCATION
az acr create -n $ACR -g $RG --sku Premium --location $LOCATION
az keyvault create -n $KV -g $RG -l $LOCATION --enabled-for-deployment true
az storage account create -n $BLOBSA -g $RG -l $LOCATION --sku Standard_LRS --kind StorageV2
export BLOBKEY=$(az storage account keys list -g $RG -n $BLOBSA --query [0].value -o tsv)
az acr login -n $ACR
```
9.3 Provision AKS Cluster (Per-Environment)
```bash
az aks create -g $RG -n $AKS \
  --enable-managed-identity \
  --node-count 3 \
  --node-vm-size Standard_D4s_v5 \
  --network-plugin azure \
  --enable-addons monitoring \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys

az aks nodepool add -g $RG --cluster-name $AKS -n workloadpool \
  --node-count 3 --enable-cluster-autoscaler --min-count 3 --max-count 10 \
  --node-vm-size Standard_D8s_v5

az aks nodepool add -g $RG --cluster-name $AKS -n batchpool \
  --enable-cluster-autoscaler --min-count 0 --max-count 5 \
  --node-vm-size Standard_D4s_v5 --priority Spot

az aks get-credentials -g $RG -n $AKS
```
9.4 Create Namespaces & Quotas
```bash
for ns in platform-prod payments-prod orders-prod inventory-prod; do kubectl create namespace $ns; done

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: rq-payments
  namespace: payments-prod
spec:
  hard:
    requests.cpu: "30"
    requests.memory: 60Gi
    limits.cpu: "40"
    limits.memory: 80Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: lr-payments
  namespace: payments-prod
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "250m"
        memory: "256Mi"
EOF
```
9.5 Deploy Ingress & cert-manager (Platform Namespace)
```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update

helm upgrade --install ingress ingress-nginx/ingress-nginx \
  --namespace platform-prod \
  --set controller.replicaCount=2 \
  --set controller.resources.requests.cpu=200m \
  --set controller.resources.requests.memory=256Mi

# CRDs are cluster-scoped; no namespace flag needed
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.3/cert-manager.crds.yaml

helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace platform-prod \
  --version v1.14.3
```
9.6 Sample App (Payments API) with HPA
```bash
kubectl apply -n payments-prod -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: api
          image: nginx:1.25
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: payments-api
spec:
  selector:
    app: payments-api
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payments-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
EOF
```
9.7 KEDA Installation & Event-Driven Worker
```bash
kubectl apply -f https://github.com/kedacore/keda/releases/latest/download/keda.yml

kubectl apply -n payments-prod -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payments-worker
  template:
    metadata:
      labels:
        app: payments-worker
    spec:
      containers:
        - name: worker
          image: busybox
          args: ["/bin/sh", "-c", "while true; do echo processing; sleep 30; done"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: payments-worker-so
spec:
  scaleTargetRef:
    name: payments-worker
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: azure-servicebus
      metadata:
        queueName: payments-queue
        messageCount: "5"
        connectionFromEnv: SERVICEBUS_CONNECTION_STRING
EOF
```
9.8 Monitoring Stack (Prometheus & Grafana via Helm)
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm upgrade --install kube-prom prometheus-community/kube-prometheus-stack \
  --namespace platform-prod \
  --set grafana.enabled=true \
  --set prometheus.prometheusSpec.retention=15d
```
9.9 Velero Backup Setup
```bash
velero install \
  --provider azure \
  --plugins velero/velero-plugin-for-microsoft-azure:v1.8.0 \
  --bucket backups \
  --secret-file ./credentials-velero \
  --backup-location-config resourceGroup=$RG,storageAccount=$BLOBSA \
  --use-restic

velero schedule create daily --schedule "0 1 * * *" \
  --include-namespaces payments-prod,orders-prod,inventory-prod
```
9.10 Cost Visibility (Optional Kubecost)
```bash
helm repo add kubecost https://kubecost.github.io/cost-analyzer/

helm upgrade --install kubecost kubecost/cost-analyzer \
  --namespace platform-prod \
  --set global.prometheus.enabled=false \
  --set global.prometheus.fqdn=http://kube-prom-prometheus.platform-prod.svc
```
9.11 Basic Load Test (Simulate Traffic)
```bash
kubectl run loader -n payments-prod --image=busybox --restart=Never -- \
  /bin/sh -c 'for i in $(seq 1 1000); do wget -q -O- http://payments-api.payments-prod.svc.cluster.local; done'
```
9.12 Validate Autoscaling
```bash
kubectl get hpa -n payments-prod
kubectl describe hpa payments-api-hpa -n payments-prod
kubectl get pods -n payments-prod -w
```
9.13 Cleanup
```bash
az group delete -n $RG --yes --no-wait
```
9.14 Lab Outcomes
- Provisioned shared prod cluster with node pools.
- Established namespaces & quotas.
- Deployed sample app + HPA + KEDA worker.
- Installed ingress, monitoring, cost, backup tooling.
- Validated scaling & backup schedule.
9.15 Next Steps
- Add GitOps bootstrap (Flux) repo sync.
- Implement NetworkPolicies per namespace.
- Integrate image signing (Cosign) & policy enforcement.