# Container Networking
## Introducing WireGuard In-Transit Encryption for AKS (Generally Available)
**Update (Generally Available):** WireGuard in-transit encryption for Azure Kubernetes Service (AKS) is now generally available for clusters using Azure CNI powered by Cilium and Advanced Container Networking Services. The feature is production-ready and no longer requires preview enrollment. The core behavior, scope, and configuration model remain unchanged from the public preview.

As organizations continue to scale containerized workloads in Azure Kubernetes Service (AKS), securing network traffic between applications and services is critical, especially in regulated or security-sensitive environments. WireGuard in-transit encryption is now generally available in AKS, delivering transparent, node-level encryption for inter-node pod traffic as part of Advanced Container Networking Services, powered by Azure CNI built on Cilium.

### What is WireGuard?

WireGuard is a modern, high-performance VPN protocol known for its simplicity and robust cryptography. Integrated into the Cilium data plane and managed as part of AKS networking, WireGuard offers an efficient way to encrypt traffic transparently within your cluster. With this new feature, WireGuard is natively supported as part of Azure CNI powered by Cilium with Advanced Container Networking Services, with no need for third-party encryption tools or custom key management systems.

### What Gets Encrypted?

The WireGuard integration in AKS focuses on the most critical traffic path.

✅ Encrypted:

- **Inter-node pod traffic:** Network communication between pods running on different nodes in the AKS cluster. This traffic traverses the underlying network infrastructure and is encrypted using WireGuard to ensure confidentiality and integrity.

❌ Not encrypted:

- **Same-node pod traffic:** Communication between pods running on the same node. Since this traffic does not leave the node, it bypasses WireGuard and remains unencrypted.
- **Node-generated traffic:** Traffic initiated by the node itself, which is currently not routed through WireGuard and thus not encrypted.

This scope is designed to strike the right balance between strong protection and performance by securing the most critical traffic: data that leaves the host and traverses the network.

### Key Benefits

- **Simple Configuration:** Enable WireGuard with just a few flags during AKS cluster creation or update.
- **Automatic Key Management:** Each node generates and exchanges WireGuard keys automatically; no manual configuration is needed.
- **Transparent to Applications:** No application-level changes are required. Encryption happens at the network layer.
- **Cloud-Native Integration:** Fully managed as part of Advanced Container Networking Services and Cilium, offering a seamless and reliable experience.

### Architecture: How It Works

When WireGuard is enabled:

1. Each node generates a unique public/private key pair.
2. The public keys are securely shared between nodes via the CiliumNode custom resource.
3. A dedicated network interface (cilium_wg0) is created and managed by the Cilium agent running on each node.
4. Peers are dynamically updated, and keys are rotated automatically every 120 seconds to minimize risk.

This mechanism ensures that only validated nodes can participate in encrypted communication.
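For orientation, here is a minimal sketch of what enablement and verification can look like from the command line. The `--acns-transit-encryption-type` flag name is our assumption based on the ACNS documentation pattern; treat the how-to guide referenced below as the authoritative source.

```bash
# Enable Advanced Container Networking Services with WireGuard in-transit
# encryption on an existing Azure CNI powered by Cilium cluster
# (flag names assumed; confirm against the how-to guide).
az aks update \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --enable-acns \
  --acns-transit-encryption-type WireGuard

# Verify from a Cilium agent pod that WireGuard encryption is active and
# the cilium_wg0 interface exists.
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i encryption
kubectl -n kube-system exec ds/cilium -- ip link show cilium_wg0
```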
### WireGuard and VNet Encryption

AKS now offers two powerful in-transit encryption options:

| Feature | WireGuard Encryption | VNet Encryption |
| --- | --- | --- |
| Scope | Pod-to-pod inter-node traffic | All traffic in the VNet |
| VM Support | Works on all VM SKUs | Requires hardware support (e.g., Gen2 VMs) |
| Deployment Flexibility | Cloud-agnostic, hybrid ready | Azure-only |
| Performance | Software-based, moderate CPU usage | Hardware-accelerated, low overhead |

Choose WireGuard if you want encryption flexibility across clouds or have VM SKUs that don't support VNet encryption. Choose VNet encryption for full-network coverage and ultra-low CPU overhead.

### Conclusion and Next Steps

With WireGuard now generally available in AKS, customers can secure inter-node pod traffic using a lightweight, cloud-native encryption mechanism that requires no application changes and minimal operational overhead.

Ready to get started? Check out our how-to guide for step-by-step instructions on enabling WireGuard in your cluster and securing your container networking with ease. Explore more about Advanced Container Networking Services:

- Container Network Observability
- L7 network policies
- FQDN-based Policy

## Introducing the Container Network Insights Agent for AKS: Now in Public Preview
We are thrilled to announce the public preview of the Container Network Insights Agent: agentic AI network troubleshooting for your workloads running in Azure Kubernetes Service (AKS).

### The Challenge

AKS networking is layered by design: Azure CNI, eBPF, Cilium, CoreDNS, NetworkPolicy, CiliumNetworkPolicy, Hubble. Each layer contributes capabilities, and some of these can fail silently in ways the surrounding layers cannot observe.

When something breaks, the evidence usually exists. Operators already have the tools: Azure Monitor for metrics, Container Insights for cluster health, Prometheus and Grafana for dashboarding, Cilium and Hubble for pod network observation, and kubectl for direct inspection. However, correlating different signals and identifying the root cause takes time.

Imagine this scenario: an application performance alert fires. The on-call engineer checks dashboards, reviews events, and inspects pod health. Each tool shows its own slice. But the root cause usually lives in the relationship between signals, not in any single tool. So the real work begins: manually cross-referencing Hubble flows, NetworkPolicy specs, DNS state, node-level stats, and verdicts. Each check is a separate query, a separate context switch, a separate mental model of how the layers interact.

This process is manual, slow, demands domain knowledge, and does not scale. Mean time to resolution (MTTR) stays high not because engineers lack skill, but because the investigation surface is wide and the interactions between the layers are complex.

### The solution: Container Network Insights Agent

The Container Network Insights Agent is agentic AI that simplifies and speeds up AKS network troubleshooting. Rather than replacing your existing observability tools, the agent correlates signals on demand to help you quickly identify and resolve network issues. You describe a problem in natural language, and the agent runs a structured investigation across layers. It delivers a diagnosis with the evidence, the root cause, and the exact commands to fix it.

The Container Network Insights Agent gets its visibility through two data sources:

- **AKS MCP server:** The agent integrates with the AKS MCP (Model Context Protocol) server, a standardized and secure interface to kubectl, Cilium, and Hubble. Every diagnostic command runs through the same tools operators already use, via a well-defined protocol that enforces security boundaries. No ad-hoc scripts, no custom API integrations.
- **Linux networking plugin:** For diagnostics that require visibility below the Kubernetes API layer, the agent collects kernel-level telemetry directly from cluster nodes. This includes NIC ring buffer stats, kernel packet counters, SoftIRQ distribution, and socket buffer utilization. This is how it pinpoints packet drops and network saturation that surface-level metrics cannot explain.
When you describe a symptom, the Container Network Insights Agent:

- Classifies the issue and plans an investigation tailored to the symptom pattern
- Gathers evidence through the AKS MCP server and its Linux networking plugin across DNS, service routing, network policies, Cilium, and node-level statistics
- Reasons across layers to identify how a failure in one component manifests in another
- Delivers a structured report with pass/fail evidence, root cause analysis, and specific remediation guidance

The agent is scoped to AKS networking: DNS failures, packet drops, connectivity issues, policy conflicts, and Cilium dataplane health. It does not modify workloads or change configurations. All remediation guidance is advisory: the agent tells you what to run, and you decide whether to apply it.

### What makes the Container Network Insights Agent different

**Deep telemetry, not just surface metrics.** Most observability tools operate at the Kubernetes API level. The agent goes deeper, collecting kernel-level network statistics, BPF program drop counters, and interface-level diagnostics that pinpoint exactly where packets are being lost and why. This is the difference between knowing something is wrong and knowing precisely what is causing it.

**Cross-layer reasoning.** Networking incidents rarely have single-layer explanations. The agent correlates evidence from DNS, service routing, network policy, Cilium, and node-level statistics together, surfacing causal relationships that span layers. For example: node-level RX drops caused by a Cilium policy denial triggered by a label mismatch after a routine Helm deployment, even though the pods themselves appear healthy.

**Structured and auditable.** Every conclusion traces to a specific check, its output, and its pass/fail status. If all checks pass, the agent reports no issue. It does not invent problems. Investigations are deterministic and reproducible. Results can be reviewed, shared, and rerun.

**Guidance, not just findings.** The agent explains what the evidence means, identifies the root cause, and provides specific remediation commands. The analysis is done; the operator reviews and decides.

### Where the Container Network Insights Agent fits

The agent is not another monitoring tool. It does not collect continuous metrics or replace dashboards. Your existing observability stack, including Azure Monitor, Prometheus, Grafana, Container Insights, and your log pipelines, keeps doing what it does. The agent complements those tools by adding an intelligence layer that turns fragmented signals into actionable diagnosis. Your alerting detects the problem; this agent helps you understand it.

### Safe by Design

The Container Network Insights Agent is built for production clusters.

- **Read-only access:** Minimal RBAC scoped to pods, services, endpoints, nodes, namespaces, network policies, and Cilium resources. The agent deploys a temporary debug DaemonSet only for packet-drop diagnostics that require host-level stats.
- **Advisory remediation only:** The agent tells you what to run. It never executes changes.
- **Evidence-backed conclusions:** Every root cause traces to a specific failed check. No speculation.
- **Scoped and enforced:** The agent handles AKS networking questions only. It does not respond to off-topic requests. Prompt injection defenses are built in.
- **Credentials stay in the cluster:** The agent authenticates via managed identity with workload identity federation. No secrets, no static credentials. Only a session ID cookie reaches the browser.

### Get Started

The Container Network Insights Agent is available in public preview in **Central US, East US, East US 2, UK South, and West US 2**. The agent deploys as an AKS cluster extension and uses your own Azure OpenAI resource, giving you control over model configuration and data residency. Full capabilities require Cilium and Advanced Container Networking Services. DNS and packet drop diagnostics work on all supported AKS clusters.

To try it:

- Review the Container Network Insights Agent overview on Microsoft Learn: https://learn.microsoft.com/en-us/azure/aks/container-network-insights-agent-overview
- Follow the quickstart to deploy the agent and run your first diagnostic
- Share feedback via the Azure feedback channel or the thumbs-up and thumbs-down feedback controls on each response

Your feedback shapes the roadmap. If the agent gets something wrong or misses a scenario you encounter, we want to hear about it.
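For orientation, a rough sketch of the deployment shape using the standard cluster-extension CLI. The extension type below is a placeholder, not the real name; take the actual value, plus the required Azure OpenAI configuration settings, from the quickstart linked above.

```bash
# Deploy the agent as an AKS cluster extension.
# <extension-type> is a placeholder; see the quickstart for the real value
# and for the Azure OpenAI settings the extension expects.
az k8s-extension create \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --cluster-type managedClusters \
  --name container-network-insights-agent \
  --extension-type <extension-type>
```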
## Announcing public preview: Cilium mTLS encryption for Azure Kubernetes Service
We are thrilled to announce the public preview of Cilium mTLS encryption in Azure Kubernetes Service (AKS), delivered as part of Advanced Container Networking Services and powered by the Azure CNI dataplane built on Cilium. This capability is the result of a close engineering collaboration between Microsoft and Isovalent (now part of Cisco). It brings transparent, workload-level mutual TLS (mTLS) to AKS without sidecars, without application changes, and without introducing a separate service mesh stack.

This public preview represents a major step forward in delivering secure, high-performance, and operationally simple networking for AKS customers. In this post, we'll walk through how Cilium mTLS works, when to use it, and how to get started.

### Why Cilium mTLS encryption matters

Traditionally, teams looking to encrypt traffic in transit in Kubernetes have had two primary options:

- **Node-level encryption** (for example, WireGuard or virtual network encryption), which secures traffic in transit but lacks workload identity and authentication.
- **Service meshes,** which provide strong identity and mTLS guarantees but introduce operational complexity.

This trade-off has become increasingly problematic: many teams want workload-level encryption and authentication, but without the cost, overhead, and architectural impact of deploying and operating a full service mesh.

Cilium mTLS closes this gap directly in the dataplane. It delivers transparent, inline mTLS encryption and authentication for pod-to-pod TCP traffic, enforced below the application layer and implemented natively in the Azure CNI dataplane built on Cilium. Customers gain workload-level security without introducing a separate service mesh, resulting in a simpler architecture with lower operational overhead.

To see how this works under the hood, the next section breaks down the Cilium mTLS architecture and follows a pod-to-pod TCP flow from interception to authentication and encryption.

### Architecture and design: How Cilium mTLS works

Cilium mTLS achieves workload-level authentication and encryption by combining three key components, each responsible for a specific part of the authentication and encryption lifecycle.

**Cilium agent: transparent traffic interception and wiring.** The Cilium agent, which already exists on any cluster running Azure CNI powered by Cilium, is responsible for making mTLS invisible to applications. When a namespace is labeled with `io.cilium/mtls-enabled=true`, the Cilium agent enrolls all pods in that namespace. It enters each pod's network namespace and installs iptables rules that redirect outbound traffic to ztunnel on port 15001. It is also responsible for passing workload metadata (such as pod IP and namespace context) to ztunnel.

**Ztunnel: node-level mTLS enforcement.** Ztunnel is an open-source, lightweight, node-level Layer 4 proxy originally created by Istio. Ztunnel runs as a DaemonSet. On the source node, it looks up the destination workload via XDS (streamed from the Cilium agent) and establishes mutually authenticated TLS 1.3 sessions between source and destination nodes. Connections are held inline until authentication is complete, ensuring that traffic is never sent in plaintext. The destination ztunnel decrypts the traffic and delivers it into the target pod, bypassing the interception rules via an in-pod mark. The application sees a normal plaintext connection; it is completely unaware that encryption happened.
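To make the enrollment flow concrete, here is a minimal sketch of the kubectl side, assuming mTLS is already enabled on the cluster. The namespace name is illustrative, and the ztunnel DaemonSet name and namespace may differ in your cluster.

```bash
# Enroll every pod in the namespace: the Cilium agent watches for this
# label and installs the port-15001 redirect rules described above.
kubectl label namespace demo io.cilium/mtls-enabled=true

# Confirm the node-level proxy is running (DaemonSet name assumed).
kubectl -n kube-system get daemonset ztunnel
```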
**SPIRE: workload identity and trust.** SPIRE (SPIFFE Runtime Environment) provides the identity foundation for Cilium mTLS. SPIRE acts as the cluster Certificate Authority, issuing short-lived X.509 certificates (SVIDs) that are automatically rotated and validated. This is a key design principle of Cilium mTLS: trust is based on workload identity, not network topology. Each workload receives a cryptographic identity derived from:

- Kubernetes namespace
- Kubernetes ServiceAccount

These identities are issued and rotated automatically by SPIRE and validated on both sides of every connection. As a result:

- Identity remains stable across pod restarts and rescheduling
- Authentication is decoupled from IP addresses
- Trust decisions align naturally with Kubernetes RBAC and namespace boundaries

This enables a zero-trust networking model that fits cleanly into existing AKS security practices.

### End-to-end workflow example

To see how these components work together, consider a simple pod-to-pod connection:

1. A pod initiates a TCP connection to another pod.
2. Traffic is intercepted inside the pod network namespace and redirected to the local ztunnel instance.
3. Ztunnel retrieves the workload identity using certificates issued by SPIRE.
4. Ztunnel establishes a mutually authenticated TLS session with the destination node's ztunnel.
5. Traffic is encrypted and sent between pods.
6. The destination ztunnel decrypts the traffic and delivers it to the target pod.

Every packet from an enrolled pod is encrypted. There is no plaintext window and no dropped first packets. The connection is held inline by ztunnel until the mTLS tunnel is established; then traffic flows bidirectionally through an HBONE (HTTP/2 CONNECT) tunnel.

### Workload enrollment and scope

Cilium mTLS in AKS is opt-in and scoped at the namespace level. Platform teams enable mTLS by applying a single label to a namespace. From that point on:

- All pods in that namespace participate in mTLS
- Authentication and encryption are mandatory between enrolled workloads
- Non-enrolled namespaces continue to operate unchanged

Encryption is applied only when both pods are enrolled. Traffic between enrolled and non-enrolled workloads continues in plaintext without causing connectivity issues or hard failures. This model enables gradual rollout, staged migrations, and low-risk adoption across environments.

### Getting started in AKS

Cilium mTLS encryption is available in public preview for AKS clusters that use:

- Azure CNI powered by Cilium
- Advanced Container Networking Services

You can enable mTLS when creating a new cluster, or on an existing cluster by updating the Advanced Container Networking Services configuration. Once enabled, enrolling workloads is as simple as labeling a namespace.

👉 Learn more:

- Concepts: how Cilium mTLS works, architecture, and trust boundaries
- How-to guide: step-by-step instructions to enable and verify mTLS in AKS

### Looking ahead

This public preview represents an important step forward in simplifying network security for AKS and reflects a deep collaboration between Microsoft and Isovalent to bring open, standards-based innovation into production-ready cloud platforms. We're continuing to work closely with the community to improve the feature and move it toward general availability. If you're looking for workload-level encryption without the overhead of a traditional service mesh, we invite you to try Cilium mTLS in AKS and share your experience.

## Introducing eBPF Host Routing: High performance AI networking with Azure CNI powered by Cilium
AI-driven applications demand low-latency workloads for optimal user experience. To meet this need, services are moving to containerized environments, with Kubernetes as the standard. Kubernetes networking relies on the Container Network Interface (CNI) for pod connectivity and routing, and traditional CNI implementations use iptables for packet processing, adding latency and reducing throughput. Azure CNI powered by Cilium natively integrates the Azure Kubernetes Service (AKS) data plane with Azure CNI networking modes for superior performance, hardware offload support, and enterprise-grade reliability. Azure CNI powered by Cilium delivers up to 30% higher throughput in both benchmark and real-world customer tests compared to a bring-your-own Cilium setup on AKS.

**The next leap forward:** AKS data plane performance can now be optimized even further with eBPF host routing, an open-source Cilium CNI capability that accelerates packet forwarding by executing routing logic directly in eBPF. As shown in the figure below, this architecture eliminates reliance on iptables and connection tracking (conntrack) within the host network namespace, significantly improving packet processing efficiency, reducing CPU overhead, and optimizing performance for modern workloads.

*Figure: Comparison of host routing using the Linux kernel stack vs. eBPF*

Azure CNI powered by Cilium is battle-tested for mission-critical workloads, backed by Microsoft support, and enriched with Advanced Container Networking Services features for security, observability, and accelerated performance. eBPF host routing is now included as part of the Advanced Container Networking Services suite, delivering network performance acceleration.

In this blog, we highlight the performance benefits of eBPF host routing, explain how to enable it in an AKS cluster, and provide a deep dive into its implementation on Azure. We start by examining AKS cluster performance before and after enabling eBPF host routing.

### Performance comparison

Our comparative benchmarks measure the difference made by enabling eBPF host routing on Azure CNI powered by Cilium. For these measurements, we use AKS clusters on Kubernetes version 1.33, with 16-core host nodes running Ubuntu 24.04. We are interested in throughput and latency numbers for pod-to-pod traffic in these clusters.

For throughput measurements, we deploy netperf client and server pods and measure TCP_STREAM throughput at varying message sizes in tests running 20 seconds each. The wide range of message sizes is meant to capture the variety of workloads running on AKS clusters, ranging from AI training and inference to messaging systems and media streaming. For latency, we run TCP_RR tests, measuring latency at various percentiles as well as transaction rates.

The first comparison is between pods on the same node: eBPF-based routing results in a dramatic improvement in throughput (~30%). This is because, on the same node, throughput is not constrained by factors such as VM NIC limits and is almost entirely determined by host routing performance.

For pod-to-pod throughput across different nodes in the cluster, eBPF host routing also delivers better throughput, and the difference is more pronounced at smaller message sizes (up to 3x). This is because, with smaller messages, the per-message overhead incurred in the host network stack has a bigger impact on performance. Next, we compare latency for pod-to-pod traffic.
We limit this benchmark to intra-node traffic, because cross-node traffic latency is determined by factors other than the routing latency incurred in the hosts. eBPF host routing results in reduced latency compared to the non-accelerated configuration at all measured percentiles.

We have also measured the transaction rate between client and server pods, with and without eBPF host routing. This benchmark is an alternative measurement of latency, because a transaction is essentially a small TCP request/response pair. We observe that eBPF host routing improves transactions per second by around 27% compared to legacy host routing.

| Azure CNI configuration | Transactions/second (same node) |
| --- | --- |
| eBPF host routing | 20396.9 |
| Traditional host routing | 16003.7 |

### Enabling eBPF host routing through Advanced Container Networking Services

eBPF host routing is disabled by default in Advanced Container Networking Services, because bypassing iptables in the host network namespace can ignore custom user rules and host-level security policies. This may lead to visible failures such as dropped traffic or broken network policies, as well as silent issues like unintended access or missed audit logs. To mitigate these risks, eBPF host routing is offered as an opt-in feature, enabled through Advanced Container Networking Services on Azure CNI powered by Cilium.

**The Advanced Container Networking Services advantage: built-in safeguards.** Enabling eBPF host routing in ACNS enhances the open-source offering with strong built-in safeguards. Before activation, ACNS validates existing iptables rules in the host network namespace and blocks enablement if user-defined rules are detected. Once enabled, kernel-level protections prevent new iptables rules and generate Kubernetes events for visibility. These measures allow customers to benefit from eBPF's performance gains while maintaining security and reliability. Thanks to these additional safeguards, eBPF host routing in Advanced Container Networking Services is a safer and more robust option for customers who want the best possible networking performance on their Kubernetes infrastructure.

### How to enable eBPF Host Routing with ACNS

Visit the documentation on how to enable eBPF host routing for new and existing Azure CNI powered by Cilium clusters. Verify the network profile with the new performance `accelerationMode` field set to `BpfVeth`:

```json
"networkProfile": {
  "advancedNetworking": {
    "enabled": true,
    "performance": {
      "accelerationMode": "BpfVeth"
    },
    ...
```

For more information on Advanced Container Networking Services and ACNS performance, please visit https://aka.ms/acnsperformance.

### Resources

- For more about Advanced Container Networking Services, see Container Network Security with Advanced Container Networking Services (ACNS) - Azure Kubernetes Service | Microsoft Learn.
- For more about Azure CNI powered by Cilium, see Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn.
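As a quick sanity check after enablement, the acceleration mode can be read back from the cluster's network profile; a small sketch using the field path shown above (resource names are placeholders):

```bash
# Inspect the acceleration mode on the cluster's network profile.
# Expected output when eBPF host routing is enabled: BpfVeth
az aks show \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --query "networkProfile.advancedNetworking.performance.accelerationMode" \
  -o tsv
```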
## Simplify container network metrics filtering in Azure Container Networking Services for AKS

We're excited to announce that container network metrics filtering in Advanced Container Networking Services for Azure Kubernetes Service (AKS) is now in public preview! This capability transforms how you manage network observability in Kubernetes clusters by giving you control over which metrics matter most.

### Why excessive metrics are a problem (and how we're fixing it)

In today's large-scale, microservices-driven environments, teams often face metrics bloat, collecting far more data than they need. The result?

- **High storage and ingestion costs:** Paying for data you'll never use.
- **Cluttered dashboards:** Hunting for critical latency spikes in a sea of irrelevant pod restarts.
- **Operational overhead:** Slower queries, higher maintenance, and fatigue.

Our new filtering capability solves this by letting you define precise filters at the pod level using standard Kubernetes custom resources. You collect only what matters, before it ever reaches your monitoring stack.

### Key Benefits: Signal Over Noise

| Benefit | Your Gain |
| --- | --- |
| Fine-grained control | Filter by namespace or pod label. Target critical services and ignore noise. |
| Cost optimization | Reduce ingestion costs for Prometheus, Grafana, and other tools. |
| Improved observability | Cleaner dashboards and faster troubleshooting with relevant metrics only. |
| Dynamic and zero-downtime | Apply or update filters without restarting Cilium agents or Prometheus. |

### How it works: Filtering at the source

Unlike traditional sampling or post-processing, filtering happens at the Cilium agent level, inside the kernel's data plane. You define filters using the ContainerNetworkMetric custom resource to include or exclude metrics such as:

- DNS lookups
- TCP connection metrics
- Flow metrics
- Drop (error) metrics

This reduces data volume before metrics leave the host, ensuring your observability tools receive only curated, high-value data.

### Example: Filtering flow metrics to reduce noise

Here's a sample ContainerNetworkMetric CRD that includes only dropped flows from traffic/http and excludes flows from traffic/fortio pods:

```yaml
apiVersion: acn.azure.com/v1alpha1
kind: ContainerNetworkMetric
metadata:
  name: container-network-metric
spec:
  filters:
    - metric: flow
      includeFilters:
        # Include only DROPPED flows from the traffic namespace
        verdict:
          - "dropped"
        from:
          namespacedPod:
            - "traffic/http"
      excludeFilters:
        # Exclude traffic/fortio flows to reduce noise
        from:
          namespacedPod:
            - "traffic/fortio"
```

*Figures: metric volume before filtering vs. after applying filters (screenshots omitted).*

### Getting started today

Ready to simplify your network observability?

1. **Enable Advanced Container Networking Services:** Make sure Advanced Container Networking Services is enabled on your AKS cluster.
2. **Define your filter:** Apply the ContainerNetworkMetric CRD with your include/exclude rules.
3. **Validate:** Check your settings via the ConfigMap and Cilium agent logs.
4. **See the impact:** Watch ingestion costs drop and dashboards become clearer!

👉 Learn more in the Metrics Filtering Guide. Try the public preview today and take control of your container network metrics.
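Once a filter like the example above is saved to a file, applying and inspecting it is ordinary kubectl work; a small sketch (the lowercase resource names are an assumption about how the CRD is registered):

```bash
# Apply the metrics filter and confirm the API server accepted it.
kubectl apply -f container-network-metric.yaml
kubectl get containernetworkmetrics
kubectl describe containernetworkmetric container-network-metric
```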
## Layer 7 Network Policies for AKS: Now Generally Available for Production Security and Observability!

We are thrilled to announce that Layer 7 (L7) Network Policies for Azure Kubernetes Service (AKS), powered by Cilium and Advanced Container Networking Services (ACNS), has reached General Availability (GA)! The journey from public preview to GA signifies a critical step: L7 Network Policies are now fully supported, highly optimized, and ready for your most demanding, mission-critical production workloads.

### A Practical Example: Securing a Multi-Tier Retail Application

Let's walk through a common production scenario. Imagine a standard retail application running on AKS with three core microservices:

- **frontend-app:** Handles user traffic and displays product information.
- **inventory-api:** A backend service that provides product stock levels. It should be read-only for the frontend.
- **payment-gateway:** A highly sensitive service that processes transactions. It should only accept POST requests from the frontend to a specific endpoint.

**The Security Challenge:** A traditional L4 policy would allow the frontend-app to talk to the inventory-api on its port, but it couldn't prevent a compromised frontend pod from trying to exploit a potential vulnerability by sending a DELETE or POST request to modify inventory data.

**The L7 Policy Solution:** With GA L7 policies, you can enforce the Principle of Least Privilege at the application layer. Here's how you would protect the inventory-api:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: protect-inventory-api
spec:
  endpointSelector:
    matchLabels:
      app: inventory-api
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend-app
      toPorts:
        - ports:
            - port: "8080" # The application port
              protocol: TCP
          rules:
            http:
              - method: "GET" # ONLY allow the GET method
                path: "/api/inventory/.*" # For paths under /api/inventory/
```

**The Outcome:**

- **Allowed:** A legitimate request from the frontend-app (GET /api/inventory/item123) is seamlessly forwarded.
- **Blocked:** If the frontend-app is compromised, any malicious request (like DELETE /api/inventory/item123) originating from it is blocked at the network layer.

This Zero Trust approach protects the inventory-api service from the threat, regardless of the security state of the source service. The same principle can be applied to protect the payment-gateway, ensuring it only accepts POST requests to the /process-payment endpoint, and nothing else; a sketch of such a policy follows.
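Here is a minimal sketch of what that payment-gateway policy could look like, following the same pattern as the inventory-api example. The port and labels are illustrative assumptions rather than values from a real deployment.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: protect-payment-gateway # hypothetical policy name
spec:
  endpointSelector:
    matchLabels:
      app: payment-gateway
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend-app
      toPorts:
        - ports:
            - port: "8080" # assumed application port
              protocol: TCP
          rules:
            http:
              - method: "POST" # ONLY allow the POST method
                path: "/process-payment" # to the payment endpoint
```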
### Beyond L7: Supporting Zero Trust with Enhanced Security

In addition to L7 application-level policies, Zero Trust is supported through Layer 3/4 network security and advanced egress controls like Fully Qualified Domain Name (FQDN) filtering. This comprehensive approach allows administrators to:

- **Restrict Outbound Connections (L3/L4 & FQDN):** Implement strict egress control by ensuring that workloads can only communicate with approved external services. FQDN filtering is crucial here, allowing pods to connect exclusively to trusted external domains (e.g., www.trusted-partner.com), significantly reducing the risk of data exfiltration and maintaining compliance. To learn more, visit the FQDN Filtering Overview.
- **Enforce Uniform Policy Across the Cluster (CCNP):** Extend protections beyond individual namespaces. By defining security measures as a Cilium Clusterwide Network Policy (CCNP), now generally available, administrators can ensure uniform policy enforcement across multiple namespaces or the entire Kubernetes cluster, simplifying management and strengthening the overall security posture of all workloads. To learn more, see the CCNP documentation.

#### CCNP Example: L4 Egress Policy with FQDN Filtering

This policy ensures that all pods across the cluster (CiliumClusterwideNetworkPolicy) are only allowed to establish outbound connections to the domain *.example.com on the standard web ports (80 and 443).

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-egress-to-example-com
spec:
  endpointSelector: {} # Applies to all pods in the cluster
  egress:
    - toFQDNs:
        - matchPattern: "*.example.com" # Allows access to any subdomain of example.com
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
            - port: "80"
              protocol: TCP
```

### Operational Excellence: Observability You Can Trust

A secure system must be observable. With GA, the integrated visibility of your L7 traffic is production-ready. In our example above, the blocked DELETE request isn't silent. It is immediately visible in your Azure Managed Grafana dashboards as a "Dropped" flow, attributed directly to the protect-inventory-api policy. This makes security incidents auditable and easy to diagnose, enabling operations teams to detect misconfigurations or threats in real time. Below is a sample dashboard layout (screenshot omitted).
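For command-line visibility into the same verdicts, a minimal sketch using the Hubble CLI (assuming Hubble is enabled and the CLI can reach the Hubble relay; the pod name is illustrative):

```bash
# Watch recent dropped flows for the inventory API; DROPPED entries
# correspond to requests denied by the L7 policy.
hubble observe --verdict DROPPED --pod inventory-api --last 20

# Emit full flow records as JSON to inspect the denied HTTP method/path.
hubble observe --verdict DROPPED --protocol http --last 20 -o json
```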
### Next Steps: Upgrade and Secure Your Production!

We encourage you to enable L7 Network Policies on your AKS clusters and level up your network security controls for containerized workloads. We value your feedback as we continue to develop and improve this feature. Please refer to the Layer 7 Policy Overview for more information and visit How to Apply L7 Policy for an example scenario.

## Azure CNI Overlay for Application Gateway for Containers and Application Gateway Ingress Controller

### What are Azure CNI Overlay and Application Gateway?

Azure CNI Overlay uses logical network spaces for pod IP address management (IPAM), providing enhanced IP scalability with reduced management responsibilities. Application Gateway for Containers is the latest and most recommended container L7 load-balancing solution. It introduces a new scalable control plane and data plane to address the performance demands of modern workloads being deployed to AKS clusters on Azure. The Azure network control plane configures routing between Application Gateway and overlay pods.

### Why is the feature needed?

As businesses increasingly use containerized solutions, managing container networks at scale has become a priority. Within container network management, IP address exhaustion, scalability, and application load-balancing performance are highly requested and discussed in many forums.

Azure CNI Overlay is the default container networking IPAM mode on AKS. In the overlay design, AKS nodes use IPs from the Azure virtual network (VNet) IP address range, and pods are addressed from an overlay IP address range. The overlay pods can communicate with each other directly via a different routing domain. Overlay IP addresses can be reused across multiple clusters in the same VNet, providing a solution to IP exhaustion and increasing IP scale to over 1M.

Azure CNI Overlay support for Application Gateway for Containers provides customers with a more performant, reliable, and scalable container networking solution. Meanwhile, Azure CNI Overlay support for AGIC provides customers with full feature parity if they choose to upgrade AKS clusters from kubenet to Azure CNI Overlay.

### Key Benefits

- **High scale with Azure CNI Overlay combined with a high-performance ingress solution:** Azure CNI Overlay provides direct pod-to-pod routing with high IP scale using Azure-native routing with no encapsulation overhead. IPs can be reused across clusters in the same VNet, allowing customers to conserve IP addresses. Application Gateway for Containers is the latest and most recommended container L7 load-balancing solution. Installing Application Gateway for Containers on AKS clusters with Azure CNI Overlay gives customers the best combination of IP scalability and ingress capability on Azure.
- **Feature parity between kubenet and Azure CNI Overlay:** With the retirement announcement of kubenet, we expect customers to upgrade their AKS container networking solution from kubenet to Azure CNI Overlay soon. This feature allows customers to maintain business continuity during the transition.

### Learn More

- Read more about Azure CNI Overlay and Application Gateway for Containers.
- Learn how to upgrade AKS clusters' IPAM to Azure CNI Overlay.
- Learn more about Azure Kubernetes Service and Application Gateway.

## Provide a Flat Network Scaling Solution to AKS - Azure CNI Pod Subnet - Static Block Allocation
We are excited to announce the general availability of Azure CNI Pod Subnet - Static Block Allocation: a networking solution that transforms how you scale Azure Kubernetes Service (AKS) clusters! This long-awaited feature is now here, providing enterprise-grade flat networking for clusters at unprecedented capacity.

### What is Azure CNI Pod Subnet - Static Block Allocation?

Azure CNI Pod Subnet - Static Block Allocation revolutionizes AKS networking by expanding cluster capacity from 65K to 1M pods: a game-changing 15x increase that eliminates traditional scaling barriers. Instead of assigning a batch of random individual IP addresses to each node, this approach assigns dedicated Azure subnet CIDR ranges directly to nodes. Every pod scheduled on a node receives its IP address from that node's pre-allocated CIDR block, raising the IP limit and simplifying massive deployments.

The result is unmatched flexibility: separate node and pod subnets, granular control over NAT and NSG policies, isolated workloads at the pod level, and VNet-native pod networking that maintains peak performance. It also works seamlessly with Azure CNI powered by Cilium to provide advanced networking capabilities and comprehensive network policy enforcement.

### Why is Azure CNI Pod Subnet - Static Block Allocation needed?

Kubernetes network solutions are challenging to plan due to rapidly evolving business needs. AKS users often face difficulties balancing simplicity, security, and scalability, while environmental changes further increase management costs. Many AKS users need a flat network architecture, pods with direct inbound connectivity, and Azure-native solution integrations, but traditional flat networks couldn't scale beyond 65K pods. Until the launch of Static Block Allocation, customers had to choose overlay networks to achieve massive scale, sacrificing the benefits of flat networking.

Azure CNI Pod Subnet - Static Block Allocation enables VNet-routed IP addresses that scale to over 1M pods, providing the simplicity and low latency of a flat network. Each node receives pre-allocated CIDR blocks, and all pods on that node obtain IP addresses from these ranges. This approach delivers the massive scale previously only available with overlay networks (up to 1M pods) while maintaining all the benefits of a flat network architecture. It also works seamlessly alongside Azure CNI Pod Subnet - Dynamic IP Allocation: simply deploy it on new node pools with dedicated subnets. AKS users can scale up AKS network solutions with minimal effort while maintaining enterprise-grade reliability and security.

### Key Benefits That Matter to You

- **Massive Scale Increase:** Break through the 65K pod limitation and scale up to 1M pods per cluster. This isn't just a number: it's about giving you the freedom to build and scale without hitting unexpected networking limits.
- **High Performance:** Pods are routed on the VNet, which benefits ingress/egress by eliminating unnecessary network hops and reducing latency for VNet-native pod networking.
- **Efficient IP Management:** AKS users can now allocate CIDR blocks to nodes, raising the IP scalability limit for large-scale deployments.
- **Unmatched Flexibility:**
  - Work seamlessly with existing clusters using Azure CNI Pod Subnet - Dynamic IP Allocation
  - Share pod subnets across multiple node pools or even different clusters
  - Scale your node and pod networks independently
- **Granular Control and Security:** Since pods get their own dedicated subnet, AKS users can:
  - Apply different network security policies to pods vs. nodes
  - Configure customized NAT and NSG policies
  - Implement isolated workloads at the pod level

### Learn more about Azure CNI Pod Subnet - Static Block Allocation

- Read more in Azure CNI Pod Subnet - Static Block Allocation and try it out in your environment today.
- Learn more about the solution limitations.
- Learn more about Azure Kubernetes Service.
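Hands-on, a minimal sketch of creating a node pool in this mode. The flag names follow the AKS pod-subnet documentation but should be confirmed there; the resource names and subnet IDs are placeholders.

```bash
# Add a node pool whose nodes receive pre-allocated CIDR blocks from a
# dedicated pod subnet (Static Block allocation mode; flags assumed).
az aks nodepool add \
  --resource-group myResourceGroup \
  --cluster-name myAKSCluster \
  --name staticpool \
  --vnet-subnet-id "$NODE_SUBNET_ID" \
  --pod-subnet-id "$POD_SUBNET_ID" \
  --pod-ip-allocation-mode StaticBlock
```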
## Accelerate designing, troubleshooting & securing your network with Gen-AI powered tools, now GA

We are thrilled to announce the general availability of Azure Networking skills in Copilot, an extension of Copilot in Azure and Security Copilot designed to enhance the cloud networking experience. Azure Networking Copilot is set to transform how organizations design, operate, and optimize their Azure networks by providing contextualized responses tailored to networking-specific scenarios and using your network topology.

## Introducing Container Network Logs with Advanced Container Networking Services for AKS
### Overview of container network logs

Container network logs offer a comprehensive way to monitor network traffic in AKS clusters. Two modes of support, stored logs and on-demand logs, provide debugging flexibility with cost optimization. On-demand mode provides a snapshot of logs with queries and visualization through the Hubble CLI and UI for specific scenarios, and does not use log storage to persist the logs. Stored-logs mode, when enabled, continuously collects and persists logs based on user-defined filters. Logs can be stored either in Azure Log Analytics (managed) or locally (unmanaged):

- **Managed storage:** Logs are forwarded to Azure Log Analytics for secure, scalable, and compliant storage. This enables advanced analytics, anomaly detection, and historical trend analysis. Both basic and analytics table plans are supported for storage.
- **Unmanaged storage:** Logs are stored locally on the host nodes under /var/log/acns/hubble. These logs are rotated automatically at 50 MB to manage storage efficiently and can be exported to external logging systems or collectors for further analysis.

### Use cases

- **Connectivity monitoring:** Identify and visualize how Kubernetes workloads communicate within the cluster and with external endpoints, helping to resolve application connectivity issues efficiently.
- **Troubleshooting network errors:** Gain deep, granular visibility into dropped packets, misconfigurations, or errors, with details on where and why errors are occurring (TCP/UDP, DNS, HTTP) for faster root cause analysis.
- **Security policy enforcement:** Detect and analyze suspicious traffic patterns to strengthen cluster security and ensure regulatory compliance.

### How it works

Container network logs use eBPF technology with Cilium to capture network flows from AKS nodes. Log collection is disabled by default. Users can enable log collection by defining custom resources (CRs) that specify the types of traffic to monitor, such as namespaces, pods, services, or protocols. The Cilium agent collects and processes this traffic, storing logs in JSON format. These logs can either be retained locally or integrated with Azure Monitor for long-term storage, advanced analytics, and visualization with Azure Managed Grafana.

*Fig 1: Container network logs overview*

If using managed storage, users enable Azure Monitor log collection using the Azure CLI or ARM templates. Here's a quick example of enabling container network logs on Azure Monitor using the CLI:

```bash
az aks enable-addons -a monitoring --enable-high-log-scale-mode -g $RESOURCE_GROUP -n $CLUSTER_NAME

az aks update --enable-acns \
  --enable-retina-flow-logs \
  -g $RESOURCE_GROUP \
  -n $CLUSTER_NAME
```

### Key benefits

- **Faster issue resolution:** Detailed logs enable quick identification of connectivity and performance issues.
- **Operational efficiency:** Advanced filtering reduces data management overhead.
- **Enhanced application reliability:** Proactive monitoring ensures smoother operations.
- **Cost optimization:** Customized logging scopes minimize storage and data ingestion costs.
- **Streamlined compliance:** Comprehensive logs support audits and security requirements.

### Observing logs in Azure managed Grafana dashboards

Users can visualize container network logs in Azure Managed Grafana dashboards, which simplify monitoring and analysis:

- **Flow logs dashboard:** View internal communication between Kubernetes workloads. This dashboard highlights metrics such as total requests, dropped packets, and error rates.
- **Error logs dashboard:** Zoom in on only the logs that show errors, for faster log parsing.
- **Service dependency graph:** Visualize relationships between services, detect bottlenecks, and optimize network flows.

These dashboards provide filtering options to isolate specific logs, such as DNS errors or traffic patterns, enabling efficient root cause analysis. Summary statistics and top-level metrics further enhance understanding of cluster health and activity.

*Fig 2: Azure managed Grafana dashboard for container network logs*

### Conclusion

Container network logs for AKS offer a powerful, cost-optimized way to monitor and analyze network activity, enhance troubleshooting and security, and ensure compliance. To get started, enable Advanced Container Networking Services in your AKS cluster and configure custom resources for logging. Visualize your logs in Grafana dashboards and Azure Log Analytics to unlock actionable insights. Learn more here.
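If you use the unmanaged (local) storage mode described above, one quick way to look at the raw logs is an ephemeral node debug pod; this is a sketch, and the node name and debug image are placeholders:

```bash
# Open a debug shell on the node and chroot into the host filesystem.
kubectl debug node/<node-name> -it --image=busybox -- chroot /host sh

# Inside the shell: list the locally stored flow logs
# (rotated at 50 MB, as noted above).
ls -lh /var/log/acns/hubble
```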