Use cases of Advanced Network Observability for your Azure Kubernetes Service clusters
This blog explores the use cases of Advanced Network Observability for Azure Kubernetes Service (AKS) clusters. It introduces the Advanced Network Observability feature, which brings Hubble's control plane to both Cilium and non-Cilium Linux data planes. The feature provides deep insights into containerized workloads, enabling precise detection and root-cause analysis of network-related issues in Kubernetes clusters. The blog also walks through customer scenarios that demonstrate the benefits of Advanced Network Observability, such as DNS metrics, network policy drops at the pod level, and traffic imbalance for pods within a workload.
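As a taste of what that observability surfaces, here is a minimal sketch of how dropped flows and DNS traffic can be inspected with the Hubble CLI once Advanced Network Observability is enabled. The namespace and pod names are placeholders, and flag availability should be confirmed against your Hubble CLI version.

# Show flows that were dropped (for example, by a network policy) in the demo namespace
hubble observe --namespace demo --verdict DROPPED

# Focus on DNS traffic to spot resolution failures for a specific workload
hubble observe --namespace demo --pod web-frontend --protocol dns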
Secure, High-Performance Networking for Data-Intensive Kubernetes Workloads

In today’s data-driven world, AI and high-performance computing (HPC) workloads demand a robust, scalable, and secure networking infrastructure. As organizations rely on Kubernetes to manage these complex workloads, the need for advanced network performance becomes paramount. In this blog series, we explore how Azure CNI powered by Cilium, built on eBPF technology, is transforming Kubernetes networking. From high throughput and low latency to enhanced security and real-time observability, discover how these cutting-edge advancements are paving the way for secure, high-performance AI workloads. Ready to optimize your Kubernetes clusters?
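For readers who want to try this right away, the flags below (taken from the cluster-creation examples later on this page) are the core of enabling the Cilium-based dataplane; the resource names and region are placeholders.

az aks create \
  --resource-group my-rg \
  --name my-aks-cluster \
  --location eastus \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --generate-ssh-keys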
Securing Microservices with Cilium and Istio

The adoption of Kubernetes and containerized applications is booming, leading to new challenges in visibility and security. As the landscape of cloud-native applications rapidly evolves, so does the number of sophisticated attacks targeting containerized workloads. Traditional tools often fall short in tracking the usage and traffic flows within these applications. The immutable nature of container images and the short lifespan of containers further necessitate addressing vulnerabilities early in the delivery pipeline.

Comprehensive Security Controls in Kubernetes
Microsoft Azure offers a range of security controls to ensure comprehensive protection across various layers of the Kubernetes environment. These controls include, but are not limited to:
Cluster Security: Features such as private clusters, managed cluster identity, and API server authorized IP ranges enhance security at the cluster level.
Node and Pod Security: Hardened bootstrapping, confidential nodes, and pod sandboxing are implemented to secure the nodes and pods within a cluster.
Network Security: Advanced Container Networking Services and Cilium network policies offer granular control over network traffic.
Authentication and Authorization: Azure Policy in-cluster enforcement, Entra authentication, and Istio mTLS and authorization policies provide robust identity and access management.
Image Scanning: Microsoft Defender for Cloud provides both image and runtime scanning to identify vulnerabilities and threats.
Let’s highlight how you can secure microservices while scaling your applications running on Azure Kubernetes Service (AKS), using a service mesh for robust traffic management and network policies for security.

Microsegmentation with Network Policies
Microsegmentation is crucial for enhancing security within Kubernetes clusters, allowing for the isolation of workloads and controlled traffic between microservices. Azure CNI powered by Cilium leverages eBPF to provide high-performance networking, security, and observability features. It dynamically inserts eBPF bytecode into the Linux kernel, offering efficient and flexible control over network traffic. Cilium network policies enable network isolation within and across Kubernetes clusters. Cilium also provides an identity-based security model, offers Layer 7 (L7) traffic control, and integrates deep observability for L4 to L7 metrics in Kubernetes clusters. A significant advantage of using Azure CNI based on Cilium is its seamless integration with existing AKS environments, requiring minimal modifications to your infrastructure. Note that Cilium Clusterwide Network Policy (CCNP) is not supported at the time of writing this blog post.

FQDN Filtering with Advanced Container Networking Services (ACNS)
Traditional IP-based policies can be cumbersome to maintain. ACNS allows for DNS-based policies, providing a more granular and user-friendly approach to managing network traffic. This is supported only with Azure CNI powered by Cilium and includes a security-agent DNS proxy for FQDN resolution even during upgrades. It’s worth noting that with Cilium’s L7 enforcement, you can control traffic based on HTTP methods, paths, and headers, making it ideal for APIs, microservices, and services that use protocols like HTTP, gRPC, or Kafka. At the time of writing this blog, this capability is not supported via ACNS. More on this in a future blog!
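Before moving to the service-mesh layer, here is a minimal sketch of the kind of identity-based Cilium network policy the microsegmentation section above describes: it allows only pods labeled app: web-frontend to reach a backend on its service port, and everything else is implicitly denied once a policy selects the endpoint. The labels, namespace, and port are illustrative placeholders.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend-only
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web-frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP

Apply it with: kubectl apply -f backend-allow-frontend-only.yaml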
AKS Istio Add-On: Mutual TLS (mTLS) and Authorization Policy
Istio enhances the security of microservices through its built-in features, including mutual TLS (mTLS) and authorization policies. The Istiod control plane, acting as a certificate authority, issues X.509 certificates to the Envoy sidecar proxies via the Secret Discovery Service (SDS). Integration with Azure Key Vault allows for secure management of root and intermediate certificates.

The PeerAuthentication custom resource in Istio controls the traffic accepted by workloads. By default, it is set to PERMISSIVE to facilitate migration, but it can be set to STRICT to enforce mTLS across the mesh. Istio also supports granular authorization policies, allowing for control over IP blocks, namespaces, service accounts, request paths, methods, and headers.

The Istio add-on also supports integration with Azure Key Vault (AKV) and the AKV Secrets Store CSI Driver add-on for plug-in CA certificates: the root CA lives offline on a secure machine, and the intermediate certificates for the Istiod control plane are synced to the cluster by the CSI Driver add-on. Additionally, certificates for the Istio ingress gateway, used for TLS termination or SNI passthrough, can also be stored in AKV.

Defense-in-Depth with Cilium, ACNS and Istio
Combining the capabilities of Cilium's eBPF technologies through ACNS and the AKS-managed Istio add-on, AKS provides a defense-in-depth strategy for securing Kubernetes clusters. Azure CNI's Cilium network policies and ACNS FQDN filtering enforce pod-to-pod and pod-to-egress policies at Layers 3 and 4, while Istio enforces STRICT mTLS and Layer 7 authorization policies. This multi-layered approach ensures comprehensive security coverage across all layers of the stack. Now, let’s highlight the key steps in achieving this.

Step 1: Create an AKS cluster with Azure CNI (powered by Cilium), ACNS, and the Istio add-on enabled.

az aks create \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --location $LOCATION \
  --kubernetes-version 1.30.0 \
  --node-count 3 \
  --node-vm-size standard_d16_v3 \
  --enable-managed-identity \
  --network-plugin azure \
  --network-dataplane cilium \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16 \
  --enable-asm \
  --enable-acns \
  --generate-ssh-keys

Step 2: Create a Cilium FQDN policy that allows egress traffic to google.com while blocking traffic to httpbin.org.

Sample policy (fqdn-filtering-policy.yaml):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: sleep-network-policy
  namespace: foo
spec:
  endpointSelector:
    matchLabels:
      app: sleep
  egress:
  - toFQDNs:
    - matchPattern: "*.google.com"
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": foo
        "k8s:app": helloworld
  - toEndpoints:
    - matchLabels:
        "k8s:io.kubernetes.pod.namespace": kube-system
        "k8s:k8s-app": kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY

Apply the policy:

kubectl apply -f fqdn-filtering-policy.yaml

Step 3: Create an Istio deny-by-default AuthorizationPolicy. This denies all requests across the mesh unless they are specifically authorized with an ALLOW policy.

Sample policy (istio-deny-all-authz.yaml):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: aks-istio-system
spec: {}

Apply the policy:

kubectl apply -f istio-deny-all-authz.yaml

Step 4: Deploy an Istio L7 AuthorizationPolicy to explicitly allow traffic to the "sample" pod in namespace foo for HTTP GET requests.
Sample policy (istio-L7-allow-policy.yaml):

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-get-requests
  namespace: foo
spec:
  selector:
    matchLabels:
      app: sample
  action: ALLOW
  rules:
  - to:
    - operation:
        methods: ["GET"]

Apply the policy:

kubectl apply -f istio-L7-allow-policy.yaml

Step 5: Deploy an Istio STRICT mTLS PeerAuthentication resource to enforce that all workloads in the mesh only accept Istio mTLS traffic.

Sample PeerAuthentication (istio-peerauth.yaml):

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: strict-mtls
  namespace: aks-istio-system
spec:
  mtls:
    mode: STRICT

Apply the policy:

kubectl apply -f istio-peerauth.yaml

These examples demonstrate how you can manage traffic to specific FQDNs and enforce L7 authorization rules in your AKS cluster.

Conclusion
Traditional IP and perimeter security models are insufficient for the dynamic nature of cloud-native environments. More sophisticated security mechanisms, such as identity-based and DNS-name-based policies, are required. Azure CNI powered by Cilium with ACNS provides robust FQDN filtering and Layer 3/4 network policy enforcement. The Istio add-on offers mTLS for identity-based encryption and Layer 7 authorization policies. A defense-in-depth model, incorporating both Azure CNI and service mesh mechanisms, is recommended for maximizing security posture. So, give these a try and let us know (Azure Kubernetes Service Roadmap (Public)) how we can evolve our roadmap to help you build the best with Azure.

Credit(s): Niranjan Shankar, Sr. Software Engineer, Microsoft
Introducing Container Network Logs with Advanced Container Networking Services for AKS

Overview of container network logs
Container network logs offer a comprehensive way to monitor network traffic in AKS clusters. Two modes of support, stored logs and on-demand logs, provide debugging flexibility with cost optimization. The on-demand mode provides a snapshot of logs, with queries and visualization through the Hubble CLI and UI, for specific scenarios and does not use log storage to persist the logs. The stored-logs mode, when enabled, continuously collects and persists logs based on user-defined filters. Logs can be stored either in Azure Log Analytics (managed) or locally (unmanaged).
Managed storage: Logs are forwarded to Azure Log Analytics for secure, scalable, and compliant storage. This enables advanced analytics, anomaly detection, and historical trend analysis. Both the Basic and Analytics table plans are supported for storage.
Unmanaged storage: Logs are stored locally on the host nodes under /var/log/acns/hubble. These logs are rotated automatically at 50 MB to manage storage efficiently, and they can be exported to external logging systems or collectors for further analysis.

Use cases
Connectivity monitoring: Identify and visualize how Kubernetes workloads communicate within the cluster and with external endpoints, helping to resolve application connectivity issues efficiently.
Troubleshooting network errors: Gain deep, granular visibility into dropped packets, misconfigurations, or errors, with details on where and why errors are occurring (TCP/UDP, DNS, HTTP) for faster root-cause analysis.
Security policy enforcement: Detect and analyze suspicious traffic patterns to strengthen cluster security and ensure regulatory compliance.

How it works
Container network logs use eBPF technology with Cilium to capture network flows from AKS nodes. Log collection is disabled by default. Users can enable log collection by defining custom resources (CRs) to specify the types of traffic to monitor, such as namespaces, pods, services, or protocols. The Cilium agent collects and processes this traffic, storing logs in JSON format. These logs can either be retained locally or integrated with Azure Monitor for long-term storage, advanced analytics, and visualization with Azure Managed Grafana.

Fig 1: Container network logs overview

If using managed storage, users enable Azure Monitor log collection using the Azure CLI or ARM templates. Here’s a quick example of enabling container network logs with Azure Monitor using the CLI:

az aks enable-addons -a monitoring --enable-high-log-scale-mode -g $RESOURCE_GROUP -n $CLUSTER_NAME

az aks update --enable-acns \
  --enable-retina-flow-logs \
  -g $RESOURCE_GROUP \
  -n $CLUSTER_NAME

Key benefits
Faster issue resolution: Detailed logs enable quick identification of connectivity and performance issues.
Operational efficiency: Advanced filtering reduces data management overhead.
Enhanced application reliability: Proactive monitoring ensures smoother operations.
Cost optimization: Customized logging scopes minimize storage and data ingestion costs.
Streamlined compliance: Comprehensive logs support audits and security requirements.

Observing logs in Azure Managed Grafana dashboards
Users can visualize container network logs in Azure Managed Grafana dashboards, which simplify monitoring and analysis:
Flow logs dashboard: View internal communication between Kubernetes workloads. This dashboard highlights metrics such as total requests, dropped packets, and error rates.
Error logs dashboard: Zoom in only on the logs that show errors, for faster log parsing.
Service dependency graph: Visualize relationships between services, detect bottlenecks, and optimize network flows.
These dashboards provide filtering options to isolate specific logs, such as DNS errors or traffic patterns, enabling efficient root-cause analysis. Summary statistics and top-level metrics further enhance understanding of cluster health and activity.

Fig 2: Azure Managed Grafana dashboard for container network logs

Conclusion
Container network logs for AKS offer a powerful, cost-optimized way to monitor and analyze network activity, enhance troubleshooting and security, and ensure compliance. To get started, enable Advanced Container Networking Services in your AKS cluster and configure custom resources for logging. Visualize your logs in Grafana dashboards and Azure Log Analytics to unlock actionable insights. Learn more here.
High-Scale Kubernetes Networking with Azure CNI Powered by Cilium

Kubernetes users have diverse cluster networking needs, but paramount to them all are efficient pod networking and robust security features. Azure CNI (Container Networking Interface) powered by Cilium is a solution that addresses these needs by blending the capabilities of the Azure CNI control plane and Cilium’s eBPF dataplane. Cilium enables performant networking and security by leveraging the power of eBPF (extended Berkeley Packet Filter), a revolutionary Linux kernel technology. eBPF enables the execution of custom code within the Linux kernel, providing both flexibility and efficiency. This translates to:
High-performance networking: eBPF enables efficient packet processing, reduced latencies, and improved throughput.
Enhanced security: Azure CNI (AzCNI) powered by Cilium enables network security policies based on DNS to easily manage and secure network traffic through Advanced Network Security features.
Better observability: Our eBPF-based Advanced Network Observability suite provides detailed monitoring, tracing, and diagnostics tools for cluster users.

Introducing CiliumEndpointSlice
A performant CNI dataplane is crucial for low-latency, high-throughput pod communication, enhancing distributed application efficiency and user experience. While Cilium’s eBPF-powered dataplane provides high-performance networking today, we sought to further enhance its scalability and performance. To do this, we looked at enabling a new feature in the dataplane’s configuration, CiliumEndpointSlice, thereby achieving:
Lower traffic load on the Kubernetes control plane, leading to reduced control plane memory consumption and improved performance
Faster pod start-up latencies
Faster in-cluster network latencies for better application performance

In particular, this feature improves upon how Azure CNI powered by Cilium manages pods. Previously, Cilium managed pods using Custom Resource Definitions (CRDs) called CiliumEndpoints. Each pod has a CiliumEndpoint associated with it. The CRD contains information about a pod’s status and properties. The Cilium agent, a core component of the dataplane, runs on every node and watches each of these CiliumEndpoints for information about updates to pods. We have observed that this behavior can place significant stress and load on the control plane, leading to performance bottlenecks, especially for larger clusters.

To alleviate load on the control plane, we are bringing in CiliumEndpointSlice, a feature which batches CiliumEndpoints and their associated updates. This reduces the number of updates propagated to the control plane. Consequently, we greatly reduce the risk of overloading the control plane at scale, ensuring smoother operation of the cluster.

Performance Testing
We have conducted performance testing of Azure CNI powered by Cilium with and without CiliumEndpointSlice enabled. The testing was done on a cluster with the following dimensions:
1000 nodes (Standard_D4_v3)
20,000 pods (i.e. 20 pods per node)
1 service with 4000 backends
800 services with 20 backends each
The test involved repeating the following actions 10 times: creating deployments and services, restarting deployments, and deleting deployments and services. We detail the various performance metrics measured below.

Average APIServer Responsiveness
This metric measures the average latency of responses to LIST requests (one of the most expensive types of requests to the control plane) by the kube-apiserver.
With CiliumEndpointSlice enabled, we observed a remarkable 50% decrease in latency, dropping from an average of ~1.5 seconds to ~0.25 seconds! For cluster users, this means much faster processing of queries sent to the kube-apiserver, leading to improved performance.

Pod Startup Latencies
This metric measures the time taken for a pod to be reported as running. Here, an over 60% decrease in pod startup latency was observed with CiliumEndpointSlice enabled, allowing for faster deployment and scaling of applications.

In-Cluster Network Latency
This is a critical metric, measuring the latency of pings from a prober pod to a server. An over 80% decrease in latency was observed. This reduction in latency translates to better application performance.

Azure CNI powered by Cilium offers a powerful eBPF-based solution for Kubernetes networking and security. With the enablement of CiliumEndpointSlice from Kubernetes version 1.32 on AzCNI Powered by Cilium clusters, we see further improvements in application and control plane performance. For more information, visit https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium.
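If you want to confirm that a cluster is actually using CiliumEndpointSlice, a quick check like the following is one option. This is a sketch, assuming kubectl access to the cluster; the feature surfaces as a Cilium CRD and its batched objects.

# Check whether the CiliumEndpointSlice CRD is installed
kubectl get crd ciliumendpointslices.cilium.io

# List the batched endpoint slices the Cilium agents consume instead of individual CiliumEndpoints
kubectl get ciliumendpointslices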
What's New in the World of eBPF from Azure Container Networking!

Azure Container Networking Interface (CNI) continues to evolve, now bolstered by the innovative capabilities of Cilium. Azure CNI Powered by Cilium (ACPC) leverages Cilium’s extended Berkeley Packet Filter (eBPF) technologies to enable features such as network policy enforcement, deep observability, and improved service routing. Here’s a deeper look into the latest features that make management of Azure Kubernetes Service (AKS) clusters more efficient, scalable, and secure.

Improved Performance: Cilium Endpoint Slices
One of the standout features in the recent updates is the introduction of CiliumEndpointSlice. This feature significantly enhances the performance and scalability of the Cilium dataplane in AKS clusters. Previously, Cilium used Custom Resource Definitions (CRDs) called CiliumEndpoints to manage pods. Each pod had a CiliumEndpoint associated with it, which contained information about the pod’s status and properties. However, this approach placed significant stress on the control plane, especially in larger clusters. To alleviate this load, CiliumEndpointSlice batches CiliumEndpoints and their updates, reducing the number of updates propagated to the control plane. Our performance testing has shown remarkable improvements:
Average API server responsiveness: Up to 50% decrease in latency, meaning faster processing of queries.
Pod startup latencies: Up to 60% reduction, allowing for faster deployment and scaling.
In-cluster network latency: Up to 80% decrease, translating to better application performance.
Note that this feature is Generally Available in AKS clusters by default with the Cilium 1.17 release and above, and does not require additional configuration changes! Learn more about the improvements unlocked by CiliumEndpointSlice with Azure CNI powered by Cilium - High-Scale Kubernetes Networking with Azure CNI Powered by Cilium | Microsoft Community Hub.

Deployment Flexibility: Dual Stack for Cilium Network Policies
Kubernetes clusters operating on an IPv4/IPv6 dual-stack network enable workloads to natively access both IPv4 and IPv6 endpoints without incurring additional complexities or performance drawbacks. Previously, we had enabled the use of dual-stack networking on AKS clusters (starting with AKS 1.29) running Azure CNI powered by Cilium in preview mode. Now, we are happy to announce that the feature is Generally Available! By enabling both IPv4 and IPv6 addressing, you can manage your production AKS clusters in mixed environments, accommodating various network configurations seamlessly. More importantly, dual-stack support in Azure CNI’s Cilium network policies extends security benefits for AKS clusters in those complex environments. For instance, you can enable dual-stack AKS clusters using the eBPF dataplane as follows:

az aks create \
  --location <region> \
  --resource-group <resourceGroupName> \
  --name <clusterName> \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --ip-families ipv4,ipv6 \
  --generate-ssh-keys

Learn more about Azure CNI’s network policies - Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn

Ease of Use: Node Subnet Mode with Cilium
Azure CNI now supports Node Subnet IPAM mode with the Cilium dataplane. In Node Subnet mode, IP addresses are assigned to pods from the same subnet as the node itself, simplifying routing and policy management. This mode is particularly beneficial for smaller clusters where managing multiple subnets is cumbersome.
AKS clusters using this mode also gain the benefits of improved network observability, Cilium network policies, FQDN filtering, and the other capabilities unlocked by Advanced Container Networking Services (ACNS). More notably, with this feature we now support all IPAM configuration options with the eBPF dataplane on AKS clusters. You can create an AKS cluster with Node Subnet IPAM mode and the eBPF dataplane as follows:

az aks create \
  --name <clusterName> \
  --resource-group <resourceGroupName> \
  --location <location> \
  --network-plugin azure \
  --network-dataplane cilium \
  --generate-ssh-keys

Learn more about Node Subnet mode - Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS) - Azure Kubernetes Service | Microsoft Learn.

Defense-in-Depth: Cilium Layer 7 Policies
Azure CNI by Cilium extends its comprehensive Layer 4 network policy capabilities to Layer 7, offering granular control over application traffic. This feature enables users to define security policies based on application-level protocols and metadata, adding a powerful layer of security and compliance management. Layer 7 policies are implemented using Envoy, an open-source service proxy, which is part of the ACNS security agent operating in conjunction with the Cilium agent. Envoy handles traffic between services and provides the necessary visibility and control at the application layer. Policies can be enforced based on HTTP and gRPC methods, paths, headers, and other application-specific attributes. Additionally, Cilium network policies support Kafka-based workflows, enhancing security and traffic management. This feature is currently in public preview, and you can learn more about the getting-started experience here - Introducing Layer 7 Network Policies with Advanced Container Networking Services for AKS Clusters! | Microsoft Community Hub.

Coming Soon: Transparent Encryption with WireGuard
By leveraging Cilium’s WireGuard support, customers can achieve regulatory compliance by ensuring that all network traffic, whether HTTP-based or not, is encrypted. Users can enable inter-node transparent encryption in their Kubernetes environments using Cilium’s open-source-based solution. When WireGuard is enabled, the Cilium agent on each cluster node establishes a secure WireGuard tunnel with all other known nodes in the cluster to encrypt traffic between Cilium endpoints. This feature will soon be in public preview and will be enabled as part of ACNS. Stay tuned for more details.

Conclusion
These new features in Azure CNI Powered by Cilium underscore our commitment to enhancing default network performance and security in your AKS environments, all while collaborating with the open-source community. From the impressive performance boost with CiliumEndpointSlice to the adaptability of dual-stack support and the advanced security of Layer 7 policies and WireGuard-based encryption, these innovations ensure your AKS clusters are not just ready for today but are primed for the future. Also, don’t forget to dive into the fascinating world of eBPF-based observability in multi-cloud environments! Check out our latest post - Retina: Bridging Kubernetes Observability and eBPF Across the Clouds.

Why wait? Try these out now! Stay tuned to the AKS public roadmap for more exciting developments! For additional information, visit the following resources:
For more info about Azure CNI Powered by Cilium visit - Configure Azure CNI Powered by Cilium in AKS.
For more info about ACNS visit Advanced Container Networking Services (ACNS) for AKS | Microsoft Learn.
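To make the Layer 7 policy discussion above concrete, here is a minimal sketch of an HTTP-aware Cilium network policy that only admits GET requests on a given path. The names, namespace, port, and path are placeholders, and whether this exact form is supported through the ACNS Layer 7 preview should be confirmed against the linked documentation.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-products-only
  namespace: shop
spec:
  endpointSelector:
    matchLabels:
      app: products-api
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web-frontend
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/products.*"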
Deploying Third-Party Firewalls in Azure Landing Zones: Design, Configuration, and Best Practices

As enterprises adopt Microsoft Azure for large-scale workloads, securing network traffic becomes a critical part of the platform foundation. Microsoft's Cloud Adoption Framework provides the blueprint for enterprise-scale Landing Zone design and deployment, and while Azure Firewall is a built-in PaaS option, some organizations prefer third-party firewall appliances for familiarity, feature depth, and vendor alignment. This blog explains the basic design patterns, key configurations, and best practices for deploying third-party firewalls (Palo Alto, Fortinet, Check Point, etc.) as part of an Azure Landing Zone.

1. Landing Zone Architecture and Firewall Role
The Azure Landing Zone is Microsoft's recommended enterprise-scale architecture for adopting cloud at scale. It provides a standardized, modular design that organizations can use to deploy and govern workloads consistently across subscriptions and regions. At its core, the Landing Zone follows a hub-and-spoke topology:
Hub (Connectivity Subscription): The central place for shared services such as DNS, private endpoints, VPN/ExpressRoute gateways, Azure Firewall (or third-party firewall appliances), Bastion, and monitoring agents. It provides consistent security controls and connectivity for all workloads. Firewalls are deployed here to act as the traffic inspection and enforcement point.
Spokes (Workload Subscriptions): Application workloads (e.g., SAP, web apps, data platforms) are placed in spoke VNets. Additional specialized spokes may exist for Identity, Shared Services, Security, or Management. These are isolated for governance and compliance, but all connectivity back to other workloads or on-premises routes through the hub.

Traffic Flows Through Firewalls
North-south traffic: Inbound connections from the Internet (e.g., customer access to applications), outbound connections from Azure workloads to Internet services, and hybrid connectivity to on-premises datacenters or other clouds. This traffic is routed through the external firewall set for inspection and policy enforcement.
East-west traffic: Lateral traffic between spokes (e.g., application VNet to database VNet) and communication across environments such as Dev → Test → Prod (if allowed). This traffic is routed through an internal firewall set to apply segmentation and zero-trust principles and to prevent lateral movement of threats.

Why Firewalls Matter in the Landing Zone
While Azure provides NSGs (Network Security Groups) and route tables for basic packet filtering and routing, these are not sufficient for advanced security scenarios. Firewalls add:
Deep packet inspection (DPI) and application-level filtering.
Intrusion Detection/Prevention (IDS/IPS) capabilities.
Centralized policy management across multiple spokes.
Segmentation of workloads to reduce the blast radius of potential attacks.
Consistent enforcement of enterprise security baselines across hybrid and multi-cloud.

Organizations May Choose
Depending on security needs, cost tolerance, and operational complexity, organizations typically adopt one of two models for third-party firewalls:
Two sets of firewalls: One set dedicated to north-south traffic (external to Azure) and another set for east-west traffic (between VNets and spokes). This provides the highest security granularity, but comes with higher cost and management overhead.
A single set of firewalls: A consolidated deployment where the same firewall cluster handles both east-west and north-south traffic. This is simpler and more cost-effective, but may introduce complexity in routing and policy segregation.
This design choice is usually made during Landing Zone design, balancing security requirements, budget, and operational maturity.

2. Why Choose Third-Party Firewalls Over Azure Firewall?
While Azure Firewall provides simplicity as a managed service, customers often choose third-party solutions due to:
Advanced features – Deep packet inspection, IDS/IPS, SSL decryption, threat feeds.
Vendor familiarity – Network teams trained on Palo Alto, Fortinet, or Check Point.
Existing contracts – Enterprise license agreements and support channels.
Hybrid alignment – Same vendor firewalls across on-premises and Azure.

Azure Firewall is a fully managed PaaS service, ideal for customers who want a simple, cloud-native solution without worrying about underlying infrastructure. However, many enterprises continue to choose third-party firewall appliances (Palo Alto, Fortinet, Check Point, etc.) when implementing their Landing Zones. The decision usually depends on capabilities, familiarity, and enterprise strategy.

Key Reasons to Choose Third-Party Firewalls
Feature depth and advanced security: Third-party vendors offer advanced capabilities such as deep packet inspection (DPI) for application-aware filtering, intrusion detection and prevention (IDS/IPS), SSL/TLS decryption and inspection, and advanced threat feeds, malware protection, sandboxing, and botnet detection. While Azure Firewall continues to evolve, these vendors have a longer track record in advanced threat protection.
Operational familiarity and skills: Network and security teams often have years of experience managing Palo Alto, Fortinet, or Check Point appliances on-premises. Adopting the same technology in Azure reduces the learning curve and ensures faster troubleshooting, smoother operations, and reuse of existing playbooks.
Integration with the existing security ecosystem: Many organizations already use vendor-specific management platforms (e.g., Panorama for Palo Alto, FortiManager for Fortinet, or SmartConsole for Check Point). Extending the same tools into Azure allows centralized management of policies across on-premises and cloud, ensuring consistent enforcement.
Compliance and regulatory requirements: Certain industries (finance, healthcare, government) require proven, certified firewall vendors for security compliance. Customers may already have third-party solutions validated by auditors and prefer extending those to Azure for consistency.
Hybrid and multi-cloud alignment: Many enterprises run a hybrid model, with workloads split across on-premises, Azure, AWS, or GCP. Third-party firewalls provide a common security layer across environments, simplifying multi-cloud operations and governance.
Customization and flexibility: Unlike Azure Firewall, which is a managed service with limited backend visibility, third-party firewalls give admins full control over operating systems, patching, advanced routing, and custom integrations. This flexibility can be essential when supporting complex or non-standard workloads.
Licensing leverage (BYOL): Enterprises with existing enterprise agreements or volume discounts can bring their own firewall licenses (BYOL) to Azure. This often reduces cost compared to pay-as-you-go Azure Firewall pricing.

When Azure Firewall Might Still Be Enough
Organizations with simple security needs (basic north-south inspection, FQDN filtering).
Cloud-first teams that prefer managed services with minimal infrastructure overhead.
Customers who want to avoid the manual scaling and VM patching that come with IaaS appliances.
In practice, many large organizations use a hybrid approach: Azure Firewall for lightweight scenarios or specific environments, and third-party firewalls for enterprise workloads that require advanced inspection, vendor alignment, and compliance certifications.

3. Deployment Models in Azure
Third-party firewalls in Azure are primarily IaaS-based appliances deployed as virtual machines (VMs). Leading vendors publish Azure Marketplace images and ARM/Bicep templates, enabling rapid, repeatable deployments across multiple environments. These firewalls allow organizations to enforce advanced network security policies, perform deep packet inspection, and integrate with Azure-native services such as Virtual Network (VNet) peering, Azure Monitor, and Azure Sentinel.
Note: Some vendors now also release PaaS versions of their firewalls, offering managed firewall services with simplified operations. However, for the purposes of this blog, we focus mainly on IaaS-based firewall deployments.

Common Deployment Modes
Active-Active
Description: Multiple firewall VMs operate simultaneously, sharing the traffic load. An Azure Load Balancer distributes inbound and outbound traffic across all active firewall instances.
Use cases: Ideal for environments requiring high throughput, resilience, and near-zero downtime, such as enterprise data centers, multi-region deployments, or mission-critical applications.
Considerations: Requires careful route and policy synchronization between firewall instances to ensure consistent traffic handling. Typically involves BGP or user-defined routes (UDRs) for optimal traffic steering. Scaling is easier: additional firewall VMs can be added behind the load balancer to handle traffic spikes.

Active-Passive
Description: One firewall VM handles all traffic (active), while the secondary VM (passive) stands by for failover. When the active node fails, Azure service principals manage IP reassignment and traffic rerouting.
Use cases: Suitable for environments where simpler management and lower operational complexity are preferred over continuous load balancing.
Considerations: Failover may result in brief downtime, typically measured in seconds to a few minutes. Synchronization between the active and passive nodes ensures firewall policies, sessions, and configurations are mirrored. Recommended for smaller deployments or those with predictable traffic patterns.

Network Interfaces (NICs)
Third-party firewall VMs often include multiple NICs, each dedicated to a specific type of traffic:
Untrust/Public NIC: Connects to the Internet or external networks. Handles inbound/outbound public traffic and enforces perimeter security policies.
Trust/Internal NIC: Connects to private VNets or subnets. Manages internal traffic between application tiers and enforces internal segmentation.
Management NIC: Dedicated to firewall management traffic. Keeps administration separate from data plane traffic, improving security and reducing performance interference.
HA NIC (Active-Passive setups): Facilitates synchronization between active and passive firewall nodes, ensuring session and configuration state is maintained across failovers.

Figure: NICs of Palo Alto external firewalls and FortiGate internal firewalls in a two-sets-of-firewalls scenario
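To illustrate the multi-NIC layout described above, here is a rough Azure CLI sketch of attaching separate untrust, trust, and management NICs to a firewall VM. All resource names, subnets, and the Marketplace image are placeholders, and each vendor's own deployment template should be preferred for production use.

# Assumes the hub VNet and its subnets (snet-untrust, snet-trust, snet-mgmt) already exist
az network nic create -g rg-hub -n fw01-nic-untrust \
  --vnet-name vnet-hub --subnet snet-untrust --ip-forwarding true
az network nic create -g rg-hub -n fw01-nic-trust \
  --vnet-name vnet-hub --subnet snet-trust --ip-forwarding true
az network nic create -g rg-hub -n fw01-nic-mgmt \
  --vnet-name vnet-hub --subnet snet-mgmt

# Create the firewall VM with the three NICs (replace the image with the vendor's Marketplace URN)
az vm create -g rg-hub -n fw01 \
  --size Standard_D8s_v3 \
  --nics fw01-nic-untrust fw01-nic-trust fw01-nic-mgmt \
  --image <vendor-marketplace-image-urn>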
4. Key Configuration Considerations
When deploying third-party firewalls in Azure, several design and configuration elements play a critical role in ensuring security, performance, and high availability. These considerations should be carefully aligned with organizational security policies, compliance requirements, and operational practices.

Routing
User-defined routes (UDRs): Define UDRs in spoke Virtual Networks to ensure all outbound traffic flows through the firewall, enforcing inspection and security policies before reaching the Internet or other Virtual Networks. Centralized routing helps standardize controls across multiple application Virtual Networks. Depending on the architecture's traffic flow design, use the appropriate load balancer IP as the next hop in the spoke Virtual Networks' UDRs.
Symmetric routing: Ensure traffic follows symmetric paths (i.e., outbound and inbound flows pass through the same firewall instance). Avoid asymmetric routing, which can cause stateful firewalls to drop return traffic. Leverage BGP with Azure Route Server, where supported, to simplify route propagation across hub-and-spoke topologies.

Figure: Azure UDR directing all traffic from a spoke VNet to the firewall IP address
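As a concrete illustration of the routing guidance above, the following Azure CLI sketch sends all outbound traffic from a spoke subnet to a firewall (or firewall internal load balancer) front-end IP. The names, address prefix, and next-hop IP are placeholders for your own environment.

# Create a route table for the spoke and disable BGP route propagation so the default route wins
az network route-table create -g rg-spoke -n rt-spoke-egress --disable-bgp-route-propagation true

# Send all traffic (0.0.0.0/0) to the firewall's internal load balancer front-end IP
az network route-table route create -g rg-spoke --route-table-name rt-spoke-egress \
  -n default-via-firewall --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.2.4

# Associate the route table with the spoke workload subnet
az network vnet subnet update -g rg-spoke --vnet-name vnet-spoke -n snet-app \
  --route-table rt-spoke-egress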
Policies
NAT rules: Configure DNAT (destination NAT) rules to publish applications securely to the Internet. Use SNAT (source NAT) to mask private IPs when workloads access external resources.
Security rules: Define granular allow/deny rules for both north-south traffic (Internet to VNet) and east-west traffic (between VNets or subnets). Ensure least privilege by only allowing required ports, protocols, and destinations.
Segmentation: Apply firewall policies to separate workloads, environments, and tenants (e.g., Production vs. Development). Enforce compliance by isolating workloads subject to regulatory standards (PCI-DSS, HIPAA, GDPR).
Application-aware policies: Many vendors support Layer 7 inspection, enabling controls based on applications, users, and content (not just IP/port). Integrate with identity providers (Azure AD, LDAP, etc.) for user-based firewall rules.

Figure: Example configuration of NAT rules on a Palo Alto external firewall

Load Balancers
Internal Load Balancer (ILB): Use ILBs for east-west traffic inspection between VNets or subnets. This ensures that traffic between applications always passes through the firewall, regardless of origin.
External Load Balancer (ELB): Use ELBs for north-south traffic, handling Internet ingress and egress. Required in Active-Active firewall clusters to distribute traffic evenly across firewall nodes.
Other configurations: Configure health probes for firewall instances so that faulty nodes are automatically bypassed. Validate the Floating IP setting on load-balancing rules according to the respective vendor's recommendations.

Identity Integration
Azure service principals: In Active-Passive deployments, configure service principals to enable automated IP reassignment during failover. This ensures continuous service availability without manual intervention.
Role-based access control (RBAC): Integrate firewall management with Azure RBAC to control who can deploy, manage, or modify firewall configurations.
SIEM integration: Stream logs to Azure Monitor, Sentinel, or third-party SIEMs for auditing, monitoring, and incident response.

Licensing
Pay-as-you-go (PAYG): Licenses are bundled into the VM cost when deploying from the Azure Marketplace. Best for short-term projects, PoCs, or variable workloads.
Bring your own license (BYOL): Enterprises can apply existing vendor contracts and licenses to Azure deployments. Often more cost-effective for large-scale, long-term deployments.
Hybrid licensing models: Some vendors support license mobility from on-premises to Azure, reducing duplication of costs.

5. Common Challenges
Third-party firewalls in Azure provide strong security controls, but organizations often face practical challenges in day-to-day operations:
Misconfiguration: Incorrect UDRs, route tables, or NAT rules can cause dropped traffic or bypassed inspection. Asymmetric routing is a frequent issue in hub-and-spoke topologies, leading to session drops in stateful firewalls.
Performance bottlenecks: Firewall throughput depends on the VM SKU (CPU, memory, NIC limits). Under-sizing causes latency and packet loss, while over-sizing adds unnecessary cost. Continuous monitoring and vendor sizing guides are essential.
Failover downtime: Active-Passive models introduce brief service interruptions while IPs and routes are reassigned. Some sessions may be lost even with state sync, making Active-Active more attractive for mission-critical workloads.
Backup and recovery: Azure Backup does not support vendor firewall operating systems. Configurations must be exported and stored externally (e.g., in storage accounts, repositories, or vendor management tools). Without proper backups, recovery from failures or misconfigurations can be slow.
Azure platform limits on connections: Azure imposes a per-VM cap of 250,000 active connections, regardless of what the firewall vendor appliance supports. This means that even if an appliance is designed for millions of sessions, it will be constrained by Azure's networking fabric. Hitting this cap can lead to unexplained traffic drops despite available CPU and memory. The workaround is to scale out horizontally (multiple firewall VMs behind a load balancer) and carefully monitor connection distribution.

6. Best Practices for Third-Party Firewall Deployments
To maximize the security, reliability, and performance of third-party firewalls in Azure, organizations should follow these best practices:
Deploy in Availability Zones: Place firewall instances across different Availability Zones to ensure regional resilience and minimize downtime in case of zone-level failures.
Prefer Active-Active for critical workloads: Where zero downtime is a requirement, use Active-Active clusters behind an Azure Load Balancer. Active-Passive can be simpler but introduces failover delays.
Use dedicated subnets for interfaces: Separate trust, untrust, HA, and management NICs into their own subnets. This enforces segmentation, simplifies route management, and reduces misconfiguration risk.
Apply least-privilege policies: Always start with a deny-all baseline, then allow only necessary applications, ports, and protocols. Regularly review rules to avoid policy sprawl.
Standardize naming and tagging: Adopt consistent naming conventions and resource tags for firewalls, subnets, route tables, and policies. This aids troubleshooting, automation, and compliance reporting.
Validate end-to-end traffic flows: Test both north-south (Internet ↔ VNet) and east-west (VNet ↔ VNet/subnet) flows after deployment. Use tools like Azure Network Watcher and vendor traffic logs to confirm inspection.
Plan for scalability: Monitor throughput, CPU, memory, and session counts to anticipate when scale-out or higher VM SKUs are needed. Some vendors support autoscaling clusters for bursty workloads.
Maintain firmware and threat signatures: Regularly update the firewall's software, patches, and threat intelligence feeds to ensure protection against emerging vulnerabilities and attacks. Automate updates where possible.

Conclusion
Third-party firewalls remain a core building block in many enterprise Azure Landing Zones. They provide the deep security controls and operational familiarity enterprises need, while Azure provides the scalable infrastructure to host them. By following the hub-and-spoke architecture, carefully planning deployment models, and enforcing best practices for routing, redundancy, monitoring, and backup, organizations can ensure a secure and reliable network foundation in Azure.