eBPF
eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina
Kubernetes simplifies container orchestration but introduces observability challenges due to dynamic pod lifecycles and complex inter-service communication. eBPF technology addresses these issues by providing deep system insights and efficient monitoring. The open-source Retina project leverages eBPF for comprehensive, cloud-agnostic network observability across AKS, GKE, and EKS, enhancing troubleshooting and optimization through real-world demo scenarios.
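For readers who want to try Retina outside of Azure, the open-source project can be installed on any conformant cluster with Helm. This is a sketch only; the OCI chart location below is an assumption, so check the Retina repository (github.com/microsoft/retina) for the current install instructions.

```bash
# Sketch: install the open-source Retina agent with Helm.
# The chart path is an assumption; consult the Retina repository for details.
helm upgrade --install retina \
  oci://ghcr.io/microsoft/retina/charts/retina \
  --namespace kube-system
```

Because the same eBPF-based collection runs on any of the managed Kubernetes offerings mentioned above, the install step looks the same on AKS, GKE, and EKS.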
Use cases of Advanced Network Observability for your Azure Kubernetes Service clusters

This blog explores the use cases of Advanced Network Observability for Azure Kubernetes Service (AKS) clusters. It introduces the Advanced Network Observability feature, which brings Hubble's control plane to both Cilium and non-Cilium Linux data planes. This feature provides deep insights into containerized workloads, enabling precise detection and root-cause analysis of network-related issues in Kubernetes clusters. The post also includes customer scenarios that demonstrate the benefits of Advanced Network Observability, such as DNS metrics, network policy drops at the pod level, and traffic imbalance for pods within a workload.
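As a rough sketch of how those scenarios are typically explored once the feature is turned on (the enablement flag matches the Advanced Container Networking Services CLI usage shown later in this collection; the namespace is a placeholder):

```bash
# Enable Advanced Container Networking Services, which brings the Hubble
# control plane described above, on an existing cluster.
az aks update --enable-acns -g $RESOURCE_GROUP -n $CLUSTER_NAME

# With the Hubble CLI connected to the cluster, inspect DNS traffic and
# pod-level policy drops for a workload namespace (placeholder: demo).
hubble observe --namespace demo --protocol dns
hubble observe --namespace demo --verdict DROPPED
```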
Introducing Container Network Logs with Advanced Container Networking Services for AKS

Overview of container network logs
Container network logs offer a comprehensive way to monitor network traffic in AKS clusters. Two modes, stored logs and on-demand logs, provide debugging flexibility with cost optimization. The on-demand mode provides a snapshot of logs, with querying and visualization through the Hubble CLI and UI for specific scenarios, and does not persist logs to storage. The stored-logs mode, when enabled, continuously collects and persists logs based on user-defined filters. Logs can be stored either in Azure Log Analytics (managed) or locally (unmanaged).

- Managed storage: Logs are forwarded to Azure Log Analytics for secure, scalable, and compliant storage. This enables advanced analytics, anomaly detection, and historical trend analysis. Both basic and analytics table plans are supported.
- Unmanaged storage: Logs are stored locally on the host nodes under /var/log/acns/hubble and are rotated automatically at 50 MB to manage storage efficiently. They can be exported to external logging systems or collectors for further analysis.

Use cases
- Connectivity monitoring: Identify and visualize how Kubernetes workloads communicate within the cluster and with external endpoints, helping to resolve application connectivity issues efficiently.
- Troubleshooting network errors: Gain granular visibility into dropped packets, misconfigurations, or errors, with details on where and why errors occur (TCP/UDP, DNS, HTTP) for faster root-cause analysis.
- Security policy enforcement: Detect and analyze suspicious traffic patterns to strengthen cluster security and ensure regulatory compliance.

How it works
Container network logs use eBPF technology with Cilium to capture network flows from AKS nodes. Log collection is disabled by default. Users enable it by defining custom resources (CRs) that specify the types of traffic to monitor, such as namespaces, pods, services, or protocols. The Cilium agent collects and processes this traffic, storing logs in JSON format. These logs can either be retained locally or integrated with Azure Monitor for long-term storage, advanced analytics, and visualization with Azure Managed Grafana.

Fig 1: Container network logs overview

If using managed storage, users enable Azure Monitor log collection using the Azure CLI or ARM templates. Here is a quick example of enabling container network logs on Azure Monitor using the CLI:

```bash
az aks enable-addons -a monitoring --enable-high-log-scale-mode -g $RESOURCE_GROUP -n $CLUSTER_NAME
az aks update --enable-acns \
  --enable-retina-flow-logs \
  -g $RESOURCE_GROUP \
  -n $CLUSTER_NAME
```

Key benefits
- Faster issue resolution: Detailed logs enable quick identification of connectivity and performance issues.
- Operational efficiency: Advanced filtering reduces data management overhead.
- Enhanced application reliability: Proactive monitoring ensures smoother operations.
- Cost optimization: Customized logging scopes minimize storage and data ingestion costs.
- Streamlined compliance: Comprehensive logs support audits and security requirements.

Observing logs in Azure Managed Grafana dashboards
Users can visualize container network logs in Azure Managed Grafana dashboards, which simplify monitoring and analysis:
- Flow logs dashboard: View internal communication between Kubernetes workloads. This dashboard highlights metrics such as total requests, dropped packets, and error rates.
- Error logs dashboard: Zoom in on only the logs that show errors, for faster log parsing.
- Service dependency graph: Visualize relationships between services, detect bottlenecks, and optimize network flows.

These dashboards provide filtering options to isolate specific logs, such as DNS errors or traffic patterns, enabling efficient root-cause analysis. Summary statistics and top-level metrics further enhance understanding of cluster health and activity.

Fig 2: Azure Managed Grafana dashboard for container network logs

Conclusion
Container network logs for AKS offer a powerful, cost-optimized way to monitor and analyze network activity, enhance troubleshooting and security, and ensure compliance. To get started, enable Advanced Container Networking Services in your AKS cluster and configure custom resources for logging. Visualize your logs in Grafana dashboards and Azure Log Analytics to unlock actionable insights. Learn more here.
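To make the filter-based stored-logs configuration described above concrete, a custom resource might look like the following. This is a minimal sketch only: the kind, apiVersion, and field names (RetinaNetworkFlowLog, acn.azure.com/v1alpha1, includefilters) as well as the namespace and pod prefixes are assumptions for illustration, so check the current Advanced Container Networking Services documentation for the exact schema.

```yaml
# Hypothetical sketch of a stored-logs filter CR; schema details are assumptions.
apiVersion: acn.azure.com/v1alpha1
kind: RetinaNetworkFlowLog
metadata:
  name: demo-flow-log-filter
spec:
  includefilters:
    # Collect only DNS and TCP flows between two demo namespaces,
    # keeping both forwarded and dropped verdicts for troubleshooting.
    - name: demo-filter
      from:
        namespacedPods:
          - demo-frontend/
      to:
        namespacedPods:
          - demo-backend/
      protocol:
        - dns
        - tcp
      verdict:
        - forwarded
        - dropped
```

Once a filter like this is applied, the stored-logs mode persists only the matching flows, which is how a customized logging scope keeps storage and ingestion costs down.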
Securing Microservices with Cilium and Istio

The adoption of Kubernetes and containerized applications is booming, leading to new challenges in visibility and security. As the landscape of cloud-native applications rapidly evolves, so does the number of sophisticated attacks targeting containerized workloads. Traditional tools often fall short in tracking usage and traffic flows within these applications. The immutable nature of container images and the short lifespan of containers further necessitate addressing vulnerabilities early in the delivery pipeline.

Comprehensive Security Controls in Kubernetes
Microsoft Azure offers a range of security controls to ensure comprehensive protection across the layers of a Kubernetes environment. These controls include, but are not limited to:
- Cluster security: Features such as private clusters, managed cluster identity, and API server authorized ranges enhance security at the cluster level.
- Node and pod security: Hardened bootstrapping, confidential nodes, and pod sandboxing secure the nodes and pods within a cluster.
- Network security: Advanced Container Networking Services and Cilium network policies offer granular control over network traffic.
- Authentication and authorization: Azure Policy in-cluster enforcement, Entra authentication, and Istio mTLS and authorization policies provide robust identity and access management.
- Image scanning: Microsoft Defender for Cloud provides both image and runtime scanning to identify vulnerabilities and threats.

Let's highlight how you can secure microservices while scaling your applications running on Azure Kubernetes Service (AKS), using a service mesh for robust traffic management and network policies for security.

Micro-segmentation with Network Policies
Micro-segmentation is crucial for enhancing security within Kubernetes clusters, allowing for the isolation of workloads and controlled traffic between microservices. Azure CNI powered by Cilium leverages eBPF to provide high-performance networking, security, and observability features. It dynamically inserts eBPF bytecode into the Linux kernel, offering efficient and flexible control over network traffic. Cilium network policies enable network isolation within and across Kubernetes clusters. Cilium also provides an identity-based security model, offers Layer 7 (L7) traffic control, and integrates deep observability for L4 to L7 metrics in Kubernetes clusters. A significant advantage of using Azure CNI powered by Cilium is its seamless integration with existing AKS environments, requiring minimal modifications to your infrastructure. Note that Cilium Clusterwide Network Policy (CCNP) is not supported at the time of writing this blog post. A minimal example of such an identity-based policy appears below.

FQDN Filtering with Advanced Container Networking Services (ACNS)
Traditional IP-based policies can be cumbersome to maintain. ACNS allows DNS-based policies, providing a more granular and user-friendly approach to managing network traffic. This is supported only with Azure CNI powered by Cilium and includes a security-agent DNS proxy so that FQDN resolution keeps working even during upgrades. It is worth noting that with Cilium's L7 enforcement you can control traffic based on HTTP methods, paths, and headers, making it ideal for APIs, microservices, and services that use protocols like HTTP, gRPC, or Kafka. At the time of writing this blog, this capability is not supported via ACNS. More on this in a future blog!
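To make the micro-segmentation idea above concrete, here is a minimal identity-based Cilium policy. It is a sketch with assumed app labels and namespace (frontend, backend, demo are illustrative names, not from the walkthrough below); it allows backend pods to accept traffic only from frontend pods on a single port.

```yaml
# Minimal sketch: identity-based (label-based) micro-segmentation with a
# CiliumNetworkPolicy. The namespace and app labels are illustrative.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend-only
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: backend          # the policy applies to backend pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend   # only frontend pods may connect
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Because the selector is based on pod identity rather than IP addresses, the policy keeps working as pods are rescheduled and their IPs change.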
AKS Istio Add-On: Mutual TLS (mTLS) and Authorization Policy
Istio enhances the security of microservices through its built-in features, including mutual TLS (mTLS) and authorization policies. The Istiod control plane, acting as a certificate authority, issues X.509 certificates to the Envoy sidecar proxies via the Secret Discovery Service (SDS). Integration with Azure Key Vault allows for secure management of root and intermediate certificates. The PeerAuthentication custom resource in Istio controls the traffic accepted by workloads. By default it is set to PERMISSIVE to facilitate migration, but it can be set to STRICT to enforce mTLS across the mesh. Istio also supports granular authorization policies, allowing control over IP blocks, namespaces, service accounts, request paths, methods, and headers. The Istio add-on also supports integration with Azure Key Vault (AKV) and the AKV Secrets Store CSI Driver add-on for plug-in CA certificates, where the root CA lives on a secure machine offline and the intermediate certificates for the Istiod control plane are synced to the cluster by the CSI Driver add-on. Additionally, certificates for the Istio ingress gateway, for TLS termination or SNI passthrough, can also be stored in AKV.

Defense in Depth with Cilium, ACNS, and Istio
Combining the capabilities of Cilium's eBPF technology through ACNS and the AKS managed Istio add-on, AKS provides a defense-in-depth strategy for securing Kubernetes clusters. Azure CNI's Cilium network policies and ACNS FQDN filtering enforce pod-to-pod and pod-to-egress policies at Layers 3 and 4, while Istio enforces STRICT mTLS and Layer 7 authorization policies. This multi-layered approach ensures comprehensive security coverage across all layers of the stack. Now, let's highlight the key steps in achieving this:

Step 1: Create an AKS cluster with Azure CNI (powered by Cilium), ACNS, and the Istio add-on enabled.

```bash
az aks create \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --location $LOCATION \
  --kubernetes-version 1.30.0 \
  --node-count 3 \
  --node-vm-size standard_d16_v3 \
  --enable-managed-identity \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --enable-asm \
  --enable-acns \
  --generate-ssh-keys
```

Step 2: Create a Cilium FQDN policy that allows egress traffic to google.com while blocking traffic to httpbin.org.

Sample policy (fqdn-filtering-policy.yaml):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: sleep-network-policy
  namespace: foo
spec:
  endpointSelector:
    matchLabels:
      app: sleep
  egress:
    - toFQDNs:
        - matchPattern: "*.google.com"
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": foo
            "k8s:app": helloworld
    - toEndpoints:
        - matchLabels:
            "k8s:io.kubernetes.pod.namespace": kube-system
            "k8s:k8s-app": kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
```

Apply the policy:

```bash
kubectl apply -f fqdn-filtering-policy.yaml
```

Step 3: Create an Istio deny-by-default AuthorizationPolicy. This denies all requests across the mesh unless specifically authorized with an ALLOW policy.

Sample policy (istio-deny-all-authz.yaml):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: aks-istio-system
spec: {}
```

Apply the policy:

```bash
kubectl apply -f istio-deny-all-authz.yaml
```

Step 4: Deploy an Istio L7 AuthorizationPolicy to explicitly allow HTTP GET requests to the "sample" pod in namespace foo.
Sample policy (istio-L7-allow-policy.yaml):

```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-get-requests
  namespace: foo
spec:
  selector:
    matchLabels:
      app: sample
  action: ALLOW
  rules:
    - to:
        - operation:
            methods: ["GET"]
```

Apply the policy:

```bash
kubectl apply -f istio-L7-allow-policy.yaml
```

Step 5: Deploy an Istio STRICT mTLS PeerAuthentication resource to enforce that all workloads in the mesh accept only Istio mTLS traffic.

Sample PeerAuthentication (istio-peerauth.yaml):

```yaml
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: aks-istio-system
spec:
  mtls:
    mode: STRICT
```

Apply the policy:

```bash
kubectl apply -f istio-peerauth.yaml
```

These examples demonstrate how you can manage traffic to specific FQDNs and enforce L7 authorization rules in your AKS cluster. A quick way to verify the combined behavior is sketched at the end of this post.

Conclusion
Traditional IP and perimeter security models are insufficient for the dynamic nature of cloud-native environments. More sophisticated security mechanisms, such as identity-based policies and DNS names, are required. Azure CNI powered by Cilium and ACNS provides robust FQDN filtering and Layer 3/4 network policy enforcement. The Istio add-on offers mTLS for identity-based encryption and Layer 7 authorization policies. A defense-in-depth model, incorporating both Azure CNI and service mesh mechanisms, is recommended for maximizing security posture. So, give these a try and let us know (Azure Kubernetes Service Roadmap (Public)) how we can evolve our roadmap to help you build the best with Azure.

Credit(s): Niranjan Shankar, Sr. Software Engineer, Microsoft
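To sanity-check the policies from the walkthrough above, one option is to send test requests from inside the cluster. This is a sketch under assumptions: it assumes a sleep deployment exists in namespace foo (per Step 2), a Service named sample fronts the sample workload, and a separate mesh client (deploy/client) is permitted to reach it; expected outcomes are noted in comments.

```bash
# FQDN policy check (Step 2), run from the sleep pod in namespace foo:
# www.google.com matches the allowed pattern; a domain outside the policy should fail.
kubectl exec -n foo deploy/sleep -- \
  curl -s -o /dev/null -w "%{http_code}\n" https://www.google.com      # expect 200
kubectl exec -n foo deploy/sleep -- \
  curl -s -m 5 -o /dev/null -w "%{http_code}\n" http://httpbin.org     # expect failure/timeout

# L7 authorization check (Steps 3-5), run from a mesh client that is allowed to
# reach the sample workload (the sleep pod's egress is restricted by the Step 2 policy).
# GET should succeed; other methods should be rejected by Istio, typically with
# an HTTP 403 and a body like "RBAC: access denied".
kubectl exec -n foo deploy/client -- curl -s http://sample.foo/          # GET: allowed
kubectl exec -n foo deploy/client -- curl -s -X POST http://sample.foo/  # POST: denied
```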
Azure CNI Powered by Cilium for Azure Kubernetes Service (AKS)

Azure CNI powered by Cilium integrates the scalable and flexible Azure IPAM control plane with the robust data plane offered by Cilium OSS to create a modern container networking stack that meets the demands of cloud-native workloads.
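For reference, the networking flags involved are the same ones used in the cluster-creation step of the Cilium and Istio walkthrough above. A minimal sketch of enabling the Cilium data plane on a new cluster looks like this (resource group, cluster name, and region are placeholders):

```bash
# Minimal sketch: create an AKS cluster with Azure CNI powered by Cilium
# in overlay mode. Names and region are placeholders.
az aks create \
  --resource-group my-rg \
  --name my-cilium-cluster \
  --location westus2 \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --generate-ssh-keys
```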
Introducing Layer 7 Network Policies with Advanced Container Networking Services for AKS Clusters!

We have been on an exciting journey to enhance the network observability and security capabilities of Azure Kubernetes Service (AKS) clusters through our Advanced Container Networking Services offering. The launch of Fully Qualified Domain Name (FQDN) filtering was a foundational step in enabling policy-driven egress control. By allowing traffic management at the domain level, we set the stage for more advanced and fine-grained security capabilities that align with modern, distributed workloads. This was just the beginning, a glimpse into our commitment to providing AKS users with robust and granular security controls.

Today, we are thrilled to announce the public preview of Layer 7 (L7) network policies for AKS users running Azure CNI powered by Cilium with Advanced Container Networking Services enabled. This update brings a whole new dimension of security to your containerized environments, offering much more granular control over your application-layer traffic.

Overview of L7 Policy
Unlike traditional Layer 3 and Layer 4 policies that operate at the network and transport layers, L7 policies operate at the application layer, enabling more precise and effective traffic management based on application-specific attributes. L7 policies let you define security rules based on application-layer protocols such as HTTP(S), gRPC, and Kafka. For example, you can create policies that allow traffic based on HTTP(S) methods (GET, POST, etc.), headers, paths, and other protocol-specific attributes. This level of control helps in implementing fine-grained access control, restricting traffic based on the actual content of the communication, and gaining deeper insights into your network traffic at the application layer.

Use cases of L7 policy
- API security: For applications exposing APIs, L7 policies provide fine-grained control over API access. You can define policies that allow only specific HTTP(S) methods (e.g., GET for read-only operations, POST for creating resources) on particular API paths. This helps enforce API security best practices and prevents unnecessary access.
- Zero-trust implementation: L7 policies are a key component in implementing a zero-trust security model within your AKS environment. By default-denying all traffic and then explicitly allowing only necessary communication based on application-layer context, you can significantly reduce the attack surface and improve overall security posture.
- Microservice segmentation and isolation: In a microservice architecture, it is essential to isolate services to limit the blast radius of potential security breaches. L7 policies allow you to define precise rules for inter-service communication. For example, you can ensure that a billing service can only be accessed by an order-processing service via specific API endpoints and HTTP(S) methods, preventing unauthorized access from other services.

How Does It Work?
When a pod sends out network traffic, the traffic is first checked against your defined rules by a small, efficient program called an extended Berkeley Packet Filter (eBPF) probe. If L7 policies are enabled for that pod, the probe marks the traffic and redirects it to a local Envoy proxy. This Envoy proxy is part of the ACNS security agent and is separate from the Cilium agent. It acts as a gatekeeper, deciding whether the traffic may proceed based on your policy criteria. If the traffic is permitted, it flows to its destination.
If not, the application receives an error message saying access is denied.

Example: Restricting HTTP POST Requests
Let's say you have a web application running on your AKS cluster, and you want to ensure that a specific backend service (backend-app) accepts only GET requests on the /data path from your frontend application (frontend-app). With L7 network policies, you can define a policy like this:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-products
spec:
  endpointSelector:
    matchLabels:
      app: backend-app          # Replace with your backend app name
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend-app   # Replace with your frontend app name
      toPorts:
        - ports:
            - port: "80"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/data"
```

How Can You Observe L7 Traffic?
Advanced Container Networking Services with L7 policies also provides observability into L7 traffic, including HTTP(S), gRPC, and Kafka, through Hubble, which is enabled by default with Advanced Container Networking Services. To facilitate easy analysis, pre-configured Azure Managed Grafana dashboards are available in the Dashboards > Azure Managed Prometheus folder. These dashboards, such as "Kubernetes/Networking/L7 (Namespace)" and "Kubernetes/Networking/L7 (Workload)", provide granular visibility into L7 flow data at the cluster, namespace, and workload levels.

The screenshot below shows a Grafana dashboard visualizing incoming HTTP traffic for the http-server-866b29bc75 workload in AKS over the last 5 minutes. It displays request rates, policy verdicts (forwarded/dropped), method/status breakdown, and dropped-flow rates for real-time monitoring and troubleshooting. This is just an example; similar detailed metrics and visualizations, including heatmaps, are also available for gRPC and Kafka traffic on the pre-created dashboards.

For real-time log analysis, you can also leverage the Hubble CLI and UI options offered as part of the Advanced Container Networking Services observability solution, allowing you to inspect individual L7 flows and troubleshoot policy enforcement.

Call to Action
We encourage you to try out the public preview of L7 network policies on your AKS clusters and level up your network security controls for containerized workloads. We value your feedback as we continue to develop and improve this feature. Please refer to the Layer 7 Policy Overview for more information and visit How to Apply L7 Policy for an example scenario.
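As a quick way to see the example policy above in action, you can send requests from the frontend pod and watch the verdicts with the Hubble CLI. This is a sketch: the deployment names mirror the example, and the exact rejection seen by the client can vary (Cilium's L7 proxy typically returns an HTTP 403 with an access-denied message).

```bash
# Allowed by the policy: GET on /data from frontend-app to backend-app.
kubectl exec deploy/frontend-app -- curl -s http://backend-app/data

# Not allowed: POST (or GET on a different path) should be rejected at Layer 7.
kubectl exec deploy/frontend-app -- curl -s -X POST http://backend-app/data

# Inspect the L7 verdicts for the backend workload with the Hubble CLI.
hubble observe --protocol http --to-pod backend-app --last 20
```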
Secure, High-Performance Networking for Data-Intensive Kubernetes Workloads

In today's data-driven world, AI and high-performance computing (HPC) workloads demand a robust, scalable, and secure networking infrastructure. As organizations rely on Kubernetes to manage these complex workloads, the need for advanced network performance becomes paramount. In this blog series, we explore how Azure CNI powered by Cilium, built on eBPF technology, is transforming Kubernetes networking. From high throughput and low latency to enhanced security and real-time observability, discover how these cutting-edge advancements are paving the way for secure, high-performance AI workloads. Ready to optimize your Kubernetes clusters?
Project Pavilion Presence at KubeCon NA 2025

KubeCon + CloudNativeCon NA took place in Atlanta, Georgia, from 10-13 November and continued to highlight the ongoing growth of the open source, cloud-native community. Microsoft participated throughout the event and supported several open source projects in the Project Pavilion. Microsoft's involvement reflected our commitment to upstream collaboration, open governance, and enabling developers to build secure, scalable, and portable applications across the ecosystem.

The Project Pavilion serves as a dedicated, vendor-neutral space on the KubeCon show floor reserved for CNCF projects. Unlike the corporate booths, it focuses entirely on open source collaboration. It brings maintainers and contributors together with end users for hands-on demos, technical discussions, and roadmap insights. This space helps attendees discover emerging technologies and understand how different projects fit into the cloud-native ecosystem, and it plays a critical role in exchanging ideas, resolving challenges, and strengthening collaboration across CNCF projects.

Why Our Presence Matters
KubeCon NA remains one of the most influential gatherings for developers and organizations shaping the future of cloud-native computing. For Microsoft, participating in the Project Pavilion helps advance our goals of:
- Open governance and community-driven innovation
- Scaling vital cloud-native technologies
- Secure and sustainable operations
- Learning from practitioners and adopters
- Enabling developers across clouds and platforms

Many of Microsoft's products and cloud services are built on or aligned with CNCF and open-source technologies. Being active within these communities ensures that we are contributing back to the ecosystem we depend on and designing by collaborating with the community, not just for it.

Microsoft-Supported Pavilion Projects

containerd
Representative: Wei Fu
The containerd team engaged with project maintainers and ecosystem partners to explore solutions for improving AI model workflows. A key focus was the challenge of handling large OCI artifacts (often 500+ GiB) used in AI training workloads. Current image-pulling flows require containerd to fetch and fully unpack blobs, which significantly delays pod startup for large models. Collaborators from Docker, NTT, and ModelPack discussed a non-unpacking workflow that would allow training workloads to consume model data directly. The team plans to prototype this behavior as an experimental feature in containerd. Additional discussions included updates related to nerdbox and next steps for the erofs snapshotter.

Copacetic
Representative: Joshua Duffney
The Copa booth attracted roughly 75 attendees, with strong representation from federal agencies and financial institutions, a sign of growing adoption in regulated industries. A lightning talk delivered at the conference significantly boosted traffic and engagement. Key feedback and insights included:
- High interest in customizable package update sources
- Demand for application-level patching beyond OS-level updates
- Need for clearer CI/CD integration patterns
- Expectations around in-cluster image patching
- Questions about runtime support, including Podman
The conversations revealed several documentation gaps and feature opportunities that will inform Copa's roadmap and future enablement efforts.

Drasi
Representative: Nandita Valsan
KubeCon NA 2025 marked Drasi's first in-person presence since its launch in October 2024 and its entry into the CNCF Sandbox in early 2025.
With multiple kiosk slots, the team interacted with roughly 70 visitors across shifts. Engagement highlights included:
- New community members joining the Drasi Discord and starring GitHub repositories
- Meaningful discussions with observability and incident-management vendors interested in change-driven architectures
- Positive reception to Aman Singh's conference talk, which led attendees back to the booth for deeper technical conversations
Post-event follow-ups are underway with several sponsors and partners to explore collaboration opportunities.

Flatcar Container Linux
Representatives: Sudhanva Huruli and Vamsi Kavuru
The Flatcar project had some fantastic conversations at the pavilion. Attendees were eager to learn about bare-metal provisioning, GPU support for AI workloads, and how Flatcar's fully automated build and test process keeps things simple and developer friendly. Questions around Talos vs. Flatcar and CoreOS sparked lively discussions, with the team emphasizing Flatcar's usability and independence from an OS-level API. Interest came from government agencies and financial institutions, and the preview of Flatcar on AKS opened the door to deeper conversations about real-world adoption. The Project Pavilion proved to be the perfect venue for authentic, technical exchanges.

Flux
Representative: Dipti Pai
The Flux booth was active throughout all three days of the Project Pavilion, where Microsoft joined other maintainers to highlight new capabilities in Flux 2.7, including improved multi-tenancy, enhanced observability, and streamlined cloud-native integrations. Visitors shared real-world GitOps experiences, both successes and challenges, which provided valuable insights for the project's ongoing development. Microsoft's involvement reinforced strong collaboration within the Flux community and a continued commitment to advancing GitOps practices.

Headlamp
Representatives: Joaquim Rocha, Will Case, and Oleksandr Dubenko
Headlamp had a booth for all three days of the conference, engaging with both longstanding users and first-time attendees. The increased visibility from becoming a Kubernetes sub-project was evident, with many attendees sharing their usage patterns across large tech organizations and smaller industrial teams. The booth enabled maintainers to:
- Gather insights into how teams use Headlamp in different environments
- Introduce Headlamp to new users discovering it via talks or hallway conversations
- Build stronger connections with the community and understand evolving needs

Inspektor Gadget
Representatives: Jose Blanquicet and Mauricio Vásquez Bernal
Hosting a half-day kiosk session, Inspektor Gadget welcomed approximately 25 visitors. Attendees included newcomers interested in learning the basics and existing users looking for updates. The team showcased new capabilities, including the tcpdump gadget and Prometheus metrics export, and invited visitors to the upcoming contribfest to encourage participation.

Istio
Representatives: Keith Mattix, Jackie Maertens, Steven Jin Xuan, Niranjan Shankar, and Mike Morris
The Istio booth continued to attract a mix of experienced adopters and newcomers seeking guidance.
Technical discussions focused on:
- Enhancements to multicluster support in ambient mode
- Migration paths from sidecars to ambient
- Improvements in Gateway API availability and usage
- Performance and operational benefits for large-scale deployments
Users, including several Azure customers, expressed appreciation for Microsoft's sustained investment in Istio as part of their service mesh infrastructure.

Notary Project
Representatives: Feynman Zhou and Toddy Mladenov
The Notary Project booth saw significant interest from practitioners concerned with software supply chain security. Attendees discussed signing, verification workflows, and integrations with Azure services and Kubernetes clusters. The conversations will influence upcoming improvements across Notary Project and Ratify, reinforcing Microsoft's commitment to secure artifacts and verifiable software distribution.

Open Policy Agent (OPA) - Gatekeeper
Representative: Jaydip Gabani
The OPA/Gatekeeper booth enabled maintainers to connect with both new and existing users to explore use cases around policy enforcement, Rego/CEL authoring, and managing large policy sets. Many conversations surfaced opportunities to simplify best practices and reduce management complexity. The team also promoted participation in an ongoing Gatekeeper/OPA survey to guide future improvements.

ORAS
Representatives: Feynman Zhou and Toddy Mladenov
ORAS engaged developers interested in OCI artifacts beyond container images, including AI/ML models, metadata, backups, and multi-cloud artifact workflows. Attendees appreciated ORAS's ecosystem integrations and found the booth examples useful for understanding how artifacts are tagged, packaged, and distributed. Many users shared how they leverage ORAS with Azure Container Registry and other OCI-compatible registries.

Radius
Representative: Zach Casper
The Radius booth attracted the attention of platform engineers looking for ways to simplify their developers' experience while enforcing enterprise-grade infrastructure and security best practices. Attendees saw demos of deploying a database to Kubernetes and using managed databases from AWS and Azure without modifying the application deployment logic. They also saw a preview of Radius integration with GitHub Copilot, enabling AI coding agents to autonomously deploy and test applications in the cloud.

Conclusion
KubeCon + CloudNativeCon North America 2025 reinforced the essential role of open source communities in driving innovation across cloud-native technologies. Through the Project Pavilion, Microsoft teams were able to exchange knowledge with other maintainers, gather user feedback, and support projects that form foundational components of modern cloud infrastructure. Microsoft remains committed to building alongside the community and strengthening the ecosystem that powers so much of today's cloud-native development. For anyone interested in exploring or contributing to these open source efforts, please reach out directly to each project's community to get involved, or contact Lexi Nadolski at lexinadolski@microsoft.com for more information.
Introducing eBPF Host Routing: High-performance AI networking with Azure CNI powered by Cilium

AI-driven applications demand low-latency workloads for an optimal user experience. To meet this need, services are moving to containerized environments, with Kubernetes as the standard. Kubernetes networking relies on the Container Network Interface (CNI) for pod connectivity and routing. Traditional CNI implementations use iptables for packet processing, adding latency and reducing throughput. Azure CNI powered by Cilium natively integrates the Azure Kubernetes Service (AKS) data plane with Azure CNI networking modes for superior performance, hardware offload support, and enterprise-grade reliability. Azure CNI powered by Cilium delivers up to 30% higher throughput in both benchmark and real-world customer tests compared to a bring-your-own Cilium setup on AKS.

The next leap forward: AKS data plane performance can now be optimized even further with eBPF host routing, an open-source Cilium CNI capability that accelerates packet forwarding by executing routing logic directly in eBPF. As shown in the figure, this architecture eliminates reliance on iptables and connection tracking (conntrack) within the host network namespace, significantly improving packet-processing efficiency, reducing CPU overhead, and optimizing performance for modern workloads.

Figure: Comparison of host routing using the Linux kernel stack vs. eBPF

Azure CNI powered by Cilium is battle-tested for mission-critical workloads, backed by Microsoft support, and enriched with Advanced Container Networking Services features for security, observability, and accelerated performance. eBPF host routing is now included in the Advanced Container Networking Services suite, delivering network performance acceleration. In this blog, we highlight the performance benefits of eBPF host routing, explain how to enable it in an AKS cluster, and provide a deep dive into its implementation on Azure. We start by examining AKS cluster performance before and after enabling eBPF host routing.

Performance comparison
Our comparative benchmarks measure the difference made by enabling eBPF host routing in Azure CNI powered by Cilium. For these measurements, we use AKS clusters on Kubernetes version 1.33, with 16-core host nodes running Ubuntu 24.04. We are interested in throughput and latency for pod-to-pod traffic in these clusters.

For throughput measurements, we deploy netperf client and server pods and measure TCP_STREAM throughput at varying message sizes in tests running 20 seconds each. The wide range of message sizes is meant to capture the variety of workloads running on AKS clusters, ranging from AI training and inference to messaging systems and media streaming. For latency, we run TCP_RR tests, measuring latency at various percentiles as well as transaction rates.

The following figure compares pods on the same node; eBPF-based routing results in a dramatic improvement in throughput (~30%). This is because, on the same node, throughput is not constrained by factors such as VM NIC limits and is almost entirely determined by host routing performance. eBPF host routing also improves pod-to-pod throughput across different nodes in the cluster, and the difference is more pronounced with smaller message sizes (as much as 3x). This is because, with smaller messages, the per-message overhead incurred in the host network stack has a bigger impact on performance.
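For readers who want to reproduce a similar comparison, the measurements described above can be approximated with plain netperf between two pods. This is a sketch, not the exact benchmark harness behind the published numbers; the container image and pod names are placeholders.

```bash
# Run a netperf server pod and a client pod (the image name is illustrative).
kubectl run netperf-server --image=networkstatic/netperf --command -- netserver -D
kubectl run netperf-client --image=networkstatic/netperf --command -- sleep infinity

# Throughput: 20-second TCP_STREAM runs at a few message sizes.
SERVER_IP=$(kubectl get pod netperf-server -o jsonpath='{.status.podIP}')
for size in 256 4096 65536; do
  kubectl exec netperf-client -- netperf -H "$SERVER_IP" -t TCP_STREAM -l 20 -- -m "$size"
done

# Latency and transaction rate: TCP_RR with latency percentiles.
kubectl exec netperf-client -- netperf -H "$SERVER_IP" -t TCP_RR -l 20 -- \
  -o min_latency,mean_latency,p90_latency,p99_latency,transaction_rate
```

Running the same commands before and after enabling eBPF host routing, with the pods pinned to the same node or to different nodes, mirrors the two scenarios discussed above.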
Next, we compare latency for pod-to-pod traffic. We limit this benchmark to intra-node traffic, because cross-node latency is determined by factors other than the routing latency incurred in the hosts. eBPF host routing results in reduced latency compared to the non-accelerated configuration at all measured percentiles. We have also measured the transaction rate between client and server pods, with and without eBPF host routing. This benchmark is an alternative measurement of latency, because a transaction is essentially a small TCP request/response pair. We observe that eBPF host routing improves transactions per second by around 27% compared to legacy host routing.

Transactions/second (same node):

| Azure CNI configuration | Transactions/second |
| --- | --- |
| eBPF host routing | 20396.9 |
| Traditional host routing | 16003.7 |

Enabling eBPF host routing through Advanced Container Networking Services
eBPF host routing is disabled by default in Advanced Container Networking Services because bypassing iptables in the host network namespace can ignore custom user rules and host-level security policies. This may lead to visible failures, such as dropped traffic or broken network policies, as well as silent issues like unintended access or missed audit logs. To mitigate these risks, eBPF host routing is offered as an opt-in feature, enabled through Advanced Container Networking Services on Azure CNI powered by Cilium.

The Advanced Container Networking Services advantage: Built-in safeguards
Enabling eBPF host routing in ACNS enhances the open-source offering with strong built-in safeguards. Before activation, ACNS validates existing iptables rules in the host network namespace and blocks enablement if user-defined rules are detected. Once enabled, kernel-level protections prevent new iptables rules and generate Kubernetes events for visibility. These measures allow customers to benefit from eBPF's performance gains while maintaining security and reliability. Thanks to these additional safeguards, eBPF host routing in Advanced Container Networking Services is a safer and more robust option for customers who want the best possible networking performance on their Kubernetes infrastructure.

How to enable eBPF Host Routing with ACNS
Visit the documentation on how to enable eBPF host routing for new and existing Azure CNI powered by Cilium clusters. Verify the network profile with the new performance `accelerationMode` field set to `BpfVeth`:

```
"networkProfile": {
  "advancedNetworking": {
    "enabled": true,
    "performance": {
      "accelerationMode": "BpfVeth"
    },
    …
```

For more information on Advanced Container Networking Services and ACNS performance, please visit https://aka.ms/acnsperformance.

Resources
- For more about Advanced Container Networking Services, please visit Container Network Security with Advanced Container Networking Services (ACNS) - Azure Kubernetes Service | Microsoft Learn.
- For more about Azure CNI Powered by Cilium, please visit Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn.
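As a complement to the verification step above, the same field can be read back from the command line. This is a sketch that simply queries the profile shown in the snippet; the resource group and cluster name are placeholders.

```bash
# Read back the acceleration mode from the cluster's network profile.
az aks show \
  --resource-group my-rg \
  --name my-cilium-cluster \
  --query "networkProfile.advancedNetworking.performance.accelerationMode" \
  --output tsv
# Expected output when eBPF host routing is enabled: BpfVeth
```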