eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina
Kubernetes simplifies container orchestration but introduces observability challenges due to dynamic pod lifecycles and complex inter-service communication. eBPF technology addresses these issues by providing deep system insights and efficient monitoring. The open-source Retina project leverages eBPF for comprehensive, cloud-agnostic network observability across AKS, GKE, and EKS, enhancing troubleshooting and optimization through real-world demo scenarios.

Scaling DNS on AKS with Cilium: NodeLocal DNSCache, LRP, and FQDN Policies
Why Adopt NodeLocal DNSCache?

The primary drivers for adoption are usually:

- Eliminating conntrack pressure: In high-QPS UDP DNS scenarios, conntrack contention and UDP tracking can cause intermittent DNS response loss and retries; depending on resolver retry/timeout settings, this can appear as multi-second lookup delays and sometimes much longer tails.
- Reducing latency: By placing a cache on every node, you remove the network hop to the CoreDNS service. Responses are practically instantaneous for cached records.
- Offloading CoreDNS: A DaemonSet architecture effectively shards the DNS query load across the entire cluster, preventing the central CoreDNS deployment from becoming a single point of congestion during bursty scaling events.

Who needs this? You should prioritize this architecture if you run:

- Large-scale clusters (hundreds of nodes or thousands of pods), where CoreDNS scaling becomes difficult to manage.
- High-churn endpoints, such as spot instances or frequent auto-scaling jobs that trigger massive waves of DNS queries.
- Real-time applications where multi-second (and occasionally longer) DNS lookup delays are unacceptable.

The Challenge with Cilium

Deploying NodeLocal DNSCache on a cluster managed by Cilium (CNI) requires a specific approach. Standard NodeLocal DNSCache relies on node-level interface and iptables setup. In Cilium environments, you can instead implement the interception via a Cilium Local Redirect Policy (LRP), which redirects traffic destined for the kube-dns ClusterIP service to a node-local backend pod.

This post details a production-ready deployment strategy aligned with Cilium's Local Redirect Policy model. It covers the configuration tweaks needed to avoid conflicts and explains how to maintain security filtering.

Architecture Overview

In a standard Kubernetes deployment, NodeLocal DNSCache creates a dummy network interface and uses extensive iptables rules to hijack traffic destined for the cluster DNS IP. When using Cilium, we can achieve this more elegantly and efficiently using Local Redirect Policies:

- DaemonSet: Runs node-local-dns on every node.
- Configuration: Configured to skip interface creation and iptables manipulation.
- Redirection: Cilium LRP intercepts traffic to the kube-dns Service IP and redirects it to the local pod on the same node.

1. The NodeLocal DNSCache DaemonSet

The critical difference in this manifest is the set of arguments passed to the node-local-dns binary: we must explicitly disable its networking setup functions and let Cilium handle the traffic.

The NodeLocal DNSCache deployment also requires the node-local-dns ConfigMap and the kube-dns-upstream Service (plus RBAC/ServiceAccount). For brevity, the snippet below focuses on the DaemonSet and the arguments that differ in the Cilium/LRP approach. The node-cache binary reads the template Corefile (/etc/coredns/Corefile.base) and generates the active Corefile (/etc/Corefile); the -conf flag points CoreDNS at that active Corefile. The binary accepts -localip as an IP list; 0.0.0.0 is a valid value and makes it listen on all interfaces, which is appropriate for the LRP-based redirection model.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    k8s-app: node-local-dns
spec:
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
      annotations:
        # Optional: policy.cilium.io/no-track-port can be used to bypass conntrack for DNS.
        # Validate the impact on your Cilium version and your observability/troubleshooting needs.
        policy.cilium.io/no-track-port: "53"
    spec:
      # IMPORTANT for the "LRP + listen broadly" approach:
      # keep hostNetwork off so you don't hijack node-wide :53
      hostNetwork: false
      dnsPolicy: ClusterFirst
      containers:
      - name: node-cache
        image: registry.k8s.io/dns/k8s-dns-node-cache:1.15.16
        args:
        - "-localip"
        # Use a bind-all approach. Ensure server blocks bind broadly in your Corefile.
        - "0.0.0.0"
        - "-conf"
        - "/etc/Corefile"
        - "-upstreamsvc"
        - "kube-dns-upstream"
        # CRITICAL: Disable internal setup
        - "-skipteardown=true"
        - "-setupinterface=false"
        - "-setupiptables=false"
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        # Ensure your Corefile includes health :8080 so the liveness probe works
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 60
          timeoutSeconds: 5
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
        - name: kube-dns-config
          mountPath: /etc/kube-dns
      volumes:
      - name: kube-dns-config
        configMap:
          name: kube-dns
          optional: true
      - name: config-volume
        configMap:
          name: node-local-dns
          items:
          - key: Corefile
            path: Corefile.base
```

2. The Cilium Local Redirect Policy (LRP)

Instead of iptables, we define a CRD that tells Cilium: "When you see traffic for `kube-dns`, send it to the `node-local-dns` pod on this same node."

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "nodelocaldns"
  namespace: kube-system
spec:
  redirectFrontend:
    # ServiceMatcher mode is for ClusterIP services
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    # The backend pods selected by localEndpointSelector must be in the same namespace as the LRP
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
    - port: "53"
      name: dns
      protocol: UDP
    - port: "53"
      name: dns-tcp
      protocol: TCP
```

This is an LRP-based NodeLocal DNSCache deployment: we disable node-cache's iptables/interface setup and let Cilium LRP handle the local redirection. This differs from the upstream NodeLocal DNSCache manifest, which uses hostNetwork plus a dummy interface and iptables. LRP must be enabled in Cilium (e.g., localRedirectPolicies.enabled=true) before applying the CRD; see the official Cilium LRP documentation.

DNS-Based FQDN Policy Enforcement Flow

Cilium enforces FQDN-based egress policies by combining DNS observation with datapath programming. During the DNS resolution phase, queries are redirected to NodeLocal DNS (or CoreDNS), where responses are observed and used to populate Cilium's FQDN-to-IP cache. Cilium then programs these mappings into eBPF maps in the datapath. In the connection phase, when the client initiates an HTTPS connection to the resolved IP, the datapath checks the IP against the learned FQDN map and applies the policy decision before allowing or denying the connection.

The Network Policy "Gotcha"

If you use CiliumNetworkPolicy to restrict egress traffic, specifically for FQDN filtering, you typically allow access to CoreDNS like this:

```yaml
- toEndpoints:
  - matchLabels:
      k8s:io.kubernetes.pod.namespace: kube-system
      k8s:k8s-app: kube-dns
  toPorts:
  - ports:
    - port: "53"
      protocol: ANY
```

This will break with local redirection. Why? Because LRP redirects the DNS request to the node-local-dns backend endpoint; strict egress policies must therefore allow both kube-dns (the upstream) and node-local-dns (the redirected destination).
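Before walking through the repro, it can help to watch the redirected flows directly. The sketch below shows one way to filter DNS traffic from a client pod with the Hubble CLI; the pod name (dns-client) and namespace (default) are illustrative assumptions, and the exact flag set may vary with your Hubble CLI version.

```bash
# Watch DNS traffic leaving an example client pod (names are illustrative).
# With the LRP active, the destination appears as kube-system/node-local-dns-*
# even though the client targeted the kube-dns ClusterIP.
hubble observe --pod default/dns-client --port 53 --protocol UDP --follow

# Show only dropped flows to spot policy denials on the redirected path.
hubble observe --pod default/dns-client --port 53 --verdict DROPPED
```

If the dropped flows report node-local-dns as the destination while the client queried the kube-dns ClusterIP, you are seeing exactly the redirection described above.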
The Repro Setup

To demonstrate this failure, the cluster is configured with:

- NodeLocal DNSCache: Deployed as a DaemonSet (node-local-dns) to cache DNS requests locally on every node.
- Local Redirect Policy (LRP): An active LRP intercepts traffic destined for the kube-dns Service IP and redirects it to the local node-local-dns pod.
- Incomplete network policy: A strict CiliumNetworkPolicy (CNP) is enforced on the client pod. While it explicitly allows egress to kube-dns, it misses the corresponding rule for node-local-dns.

Reveal the issue using Hubble: In this scenario, the client pod dns-client is attempting to resolve the external domain github.com. When inspecting the traffic flows, you will see EGRESS DENIED verdicts. Crucially, notice the destination pod reported by Hubble: kube-system/node-local-dns, not kube-dns. Although the application originally sent the packet to the ClusterIP of CoreDNS, Cilium's Local Redirect Policy modified the destination to the local node cache. Since strictly defined network policies assume traffic is going to the kube-dns identity, this redirected traffic falls outside the allowed rules and is dropped by the default-deny stance.

The Fix: You must allow egress to both labels.

```yaml
- toEndpoints:
  - matchLabels:
      k8s:io.kubernetes.pod.namespace: kube-system
      k8s:k8s-app: kube-dns
  # Add this selector for the local cache
  - matchLabels:
      k8s:io.kubernetes.pod.namespace: kube-system
      k8s:k8s-app: node-local-dns
  toPorts:
  - ports:
    - port: "53"
      protocol: ANY
```

Without this addition, pods protected by strict egress policies will time out resolving DNS, even though the cache is running. Use Hubble to observe the network flows: after adding matchLabels: k8s:k8s-app: node-local-dns, the traffic is allowed. Hubble confirms a policy verdict of EGRESS ALLOWED for UDP traffic on port 53. Because DNS resolution now succeeds, the response populates the Cilium FQDN cache, subsequently allowing the TCP traffic to github.com on port 443 as intended.

Real-World Example: Restricting Egress with FQDN Policies

Here is a complete CiliumNetworkPolicy that locks down a workload to only access api.example.com. Note how the DNS rule explicitly allows traffic to both kube-dns (for the upstream) and node-local-dns (for the local cache).

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: secure-workload-policy
spec:
  endpointSelector:
    matchLabels:
      app: critical-workload
  egress:
  # 1. Allow DNS Resolution (REQUIRED for FQDN policies)
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    # Allow traffic to the local cache redirection target
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: node-local-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: "*"
  # 2. Allow specific FQDN traffic (populated via DNS lookups)
  - toFQDNs:
    - matchName: "api.example.com"
    toPorts:
    - ports:
      - port: "443"
        protocol: TCP
```

Configuration & Upstream Loops

When configuring the ConfigMap for node-local-dns, use the standard placeholders provided by the image; the binary replaces them at runtime:

- __PILLAR__CLUSTER__DNS__: The upstream Service IP (kube-dns-upstream).
- __PILLAR__UPSTREAM__SERVERS__: The system resolvers (usually /etc/resolv.conf).

Ensure kube-dns-upstream exists as a Service selecting the CoreDNS pods so cache misses are forwarded to the actual CoreDNS backends.
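To make that concrete, here is a minimal sketch of what the node-local-dns ConfigMap and the kube-dns-upstream Service might look like. It is adapted from the upstream NodeLocal DNSCache template rather than taken from this deployment, and the zone list, cache sizes, bind address, and health port are assumptions; keep them consistent with the DaemonSet above (bind broadly, health on :8080).

```yaml
# Illustrative only: adapted from the upstream NodeLocal DNSCache template.
# Zone names, cache sizes, bind addresses, and ports are assumptions to adjust.
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-local-dns
  namespace: kube-system
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 0.0.0.0
        # Cache misses for cluster zones are forwarded to CoreDNS via the kube-dns-upstream Service IP.
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        health :8080
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0
        # Everything else goes to the system resolvers (usually /etc/resolv.conf).
        forward . __PILLAR__UPSTREAM__SERVERS__
    }
---
# A Service that selects the CoreDNS pods, giving node-local-dns a stable upstream
# that is not intercepted by the LRP (the LRP frontend matches the kube-dns Service only).
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-upstream
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
    targetPort: 53
  - name: dns-tcp
    port: 53
    protocol: TCP
    targetPort: 53
```

Because the LRP frontend matches only the kube-dns Service, forwarding cache misses to kube-dns-upstream keeps that traffic out of the redirect path and avoids looping back into the local cache.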
Alternative: AKS LocalDNS

LocalDNS is an Azure Kubernetes Service (AKS)-managed node-local DNS proxy/cache.

Pros:
- Managed lifecycle at the node pool level.
- Support for custom configuration via localdnsconfig.json (e.g., custom server blocks, cache tuning).
- No manual DaemonSet management required.

Cons & Limitations:
- Incompatibility with FQDN policies: As noted in the official documentation, LocalDNS isn't compatible with applied FQDN filter policies in ACNS/Cilium; if you rely on FQDN enforcement, prefer a DNS path that preserves FQDN learning and enforcement.
- Updating the configuration requires reimaging the node pool.

For environments that rely heavily on strict Cilium Network Policies and FQDN filtering, the manual deployment method described above (using LRP) can be more reliable and transparent. AKS recommends not enabling both upstream NodeLocal DNSCache and LocalDNS in the same node pool: DNS traffic is routed through LocalDNS and the results may be unexpected.

References
- Kubernetes Documentation: NodeLocal DNSCache
- Cilium Documentation: Local Redirect Policy
- AKS Documentation: Configure LocalDNS

Project Pavilion Presence at KubeCon NA 2025
KubeCon + CloudNativeCon NA took place in Atlanta, Georgia, from 10-13 November and continued to highlight the ongoing growth of the open source, cloud-native community. Microsoft participated throughout the event and supported several open source projects in the Project Pavilion. Microsoft's involvement reflected our commitment to upstream collaboration, open governance, and enabling developers to build secure, scalable, and portable applications across the ecosystem.

The Project Pavilion serves as a dedicated, vendor-neutral space on the KubeCon show floor reserved for CNCF projects. Unlike the corporate booths, it focuses entirely on open source collaboration. It brings maintainers and contributors together with end users for hands-on demos, technical discussions, and roadmap insights. This space helps attendees discover emerging technologies and understand how different projects fit into the cloud-native ecosystem, and it plays a critical role in exchanging ideas, resolving challenges, and strengthening collaboration across CNCF-approved technologies.

Why Our Presence Matters

KubeCon NA remains one of the most influential gatherings for developers and organizations shaping the future of cloud-native computing. For Microsoft, participating in the Project Pavilion helps advance our goals of:

- Open governance and community-driven innovation
- Scaling vital cloud-native technologies
- Secure and sustainable operations
- Learning from practitioners and adopters
- Enabling developers across clouds and platforms

Many of Microsoft's products and cloud services are built on or aligned with CNCF and open-source technologies. Being active within these communities ensures that we are contributing back to the ecosystem we depend on and designing by collaborating with the community, not just for it.

Microsoft-Supported Pavilion Projects

containerd
Representative: Wei Fu

The containerd team engaged with project maintainers and ecosystem partners to explore solutions for improving AI model workflows. A key focus was the challenge of handling large OCI artifacts (often 500+ GiB) used in AI training workloads. Current image-pulling flows require containerd to fetch and fully unpack blobs, which significantly delays pod startup for large models. Collaborators from Docker, NTT, and ModelPack discussed a non-unpacking workflow that would allow training workloads to consume model data directly. The team plans to prototype this behavior as an experimental feature in containerd. Additional discussions included updates related to nerdbox and next steps for the erofs snapshotter.

Copacetic
Representative: Joshua Duffney

The Copa booth attracted roughly 75 attendees, with strong representation from federal agencies and financial institutions, a sign of growing adoption in regulated industries. A lightning talk delivered at the conference significantly boosted traffic and engagement. Key feedback and insights included:

- High interest in customizable package update sources
- Demand for application-level patching beyond OS-level updates
- Need for clearer CI/CD integration patterns
- Expectations around in-cluster image patching
- Questions about runtime support, including Podman

The conversations revealed several documentation gaps and feature opportunities that will inform Copa's roadmap and future enablement efforts.

Drasi
Representative: Nandita Valsan

KubeCon NA 2025 marked Drasi's first in-person presence since its launch in October 2024 and its entry into the CNCF Sandbox in early 2025.
With multiple kiosk slots, the team interacted with ~70 visitors across shifts. Engagement highlights included:

- New community members joining the Drasi Discord and starring GitHub repositories
- Meaningful discussions with observability and incident management vendors interested in change-driven architectures
- Positive reception to Aman Singh's conference talk, which led attendees back to the booth for deeper technical conversations

Post-event follow-ups are underway with several sponsors and partners to explore collaboration opportunities.

Flatcar Container Linux
Representatives: Sudhanva Huruli and Vamsi Kavuru

The Flatcar project had some fantastic conversations at the pavilion. Attendees were eager to learn about bare metal provisioning, GPU support for AI workloads, and how Flatcar's fully automated build and test process keeps things simple and developer friendly. Questions around Talos vs. Flatcar and CoreOS sparked lively discussions, with the team emphasizing Flatcar's usability and independence from an OS-level API. Interest came from government agencies and financial institutions, and the preview of Flatcar on AKS opened the door to deeper conversations about real-world adoption. The Project Pavilion proved to be the perfect venue for authentic, technical exchanges.

Flux
Representative: Dipti Pai

The Flux booth was active throughout all three days of the Project Pavilion, where Microsoft joined other maintainers to highlight new capabilities in Flux 2.7, including improved multi-tenancy, enhanced observability, and streamlined cloud-native integrations. Visitors shared real-world GitOps experiences, both successes and challenges, which provided valuable insights for the project's ongoing development. Microsoft's involvement reinforced strong collaboration within the Flux community and continued commitment to advancing GitOps practices.

Headlamp
Representatives: Joaquim Rocha, Will Case, and Oleksandr Dubenko

Headlamp had a booth for all three days of the conference, engaging with both longstanding users and first-time attendees. The increased visibility from becoming a Kubernetes sub-project was evident, with many attendees sharing their usage patterns across large tech organizations and smaller industrial teams. The booth enabled maintainers to:

- Gather insights into how teams use Headlamp in different environments
- Introduce Headlamp to new users discovering it via talks or hallway conversations
- Build stronger connections with the community and understand evolving needs

Inspektor Gadget
Representatives: Jose Blanquicet and Mauricio Vásquez Bernal

Hosting a half-day kiosk session, Inspektor Gadget welcomed approximately 25 visitors. Attendees included newcomers interested in learning the basics and existing users looking for updates. The team showcased new capabilities, including the tcpdump gadget and Prometheus metrics export, and invited visitors to the upcoming contribfest to encourage participation.

Istio
Representatives: Keith Mattix, Jackie Maertens, Steven Jin Xuan, Niranjan Shankar, and Mike Morris

The Istio booth continued to attract a mix of experienced adopters and newcomers seeking guidance.
Technical discussions focused on:

- Enhancements to multicluster support in ambient mode
- Migration paths from sidecars to ambient
- Improvements in Gateway API availability and usage
- Performance and operational benefits for large-scale deployments

Users, including several Azure customers, expressed appreciation for Microsoft's sustained investment in Istio as part of their service mesh infrastructure.

Notary Project
Representatives: Feynman Zhou and Toddy Mladenov

The Notary Project booth saw significant interest from practitioners concerned with software supply chain security. Attendees discussed signing, verification workflows, and integrations with Azure services and Kubernetes clusters. The conversations will influence upcoming improvements across Notary Project and Ratify, reinforcing Microsoft's commitment to secure artifacts and verifiable software distribution.

Open Policy Agent (OPA) - Gatekeeper
Representative: Jaydip Gabani

The OPA/Gatekeeper booth enabled maintainers to connect with both new and existing users to explore use cases around policy enforcement, Rego/CEL authoring, and managing large policy sets. Many conversations surfaced opportunities around simplifying best practices and reducing management complexity. The team also promoted participation in an ongoing Gatekeeper/OPA survey to guide future improvements.

ORAS
Representatives: Feynman Zhou and Toddy Mladenov

ORAS engaged developers interested in OCI artifacts beyond container images, including AI/ML models, metadata, backups, and multi-cloud artifact workflows. Attendees appreciated ORAS's ecosystem integrations and found the booth examples useful for understanding how artifacts are tagged, packaged, and distributed. Many users shared how they leverage ORAS with Azure Container Registry and other OCI-compatible registries.

Radius
Representative: Zach Casper

The Radius booth attracted the attention of platform engineers looking for ways to simplify their developers' experience while enforcing enterprise-grade infrastructure and security best practices. Attendees saw demos of deploying a database to Kubernetes and using managed databases from AWS and Azure without modifying the application deployment logic. They also saw a preview of Radius integration with GitHub Copilot, enabling AI coding agents to autonomously deploy and test applications in the cloud.

Conclusion

KubeCon + CloudNativeCon North America 2025 reinforced the essential role of open source communities in driving innovation across cloud-native technologies. Through the Project Pavilion, Microsoft teams were able to exchange knowledge with other maintainers, gather user feedback, and support projects that form foundational components of modern cloud infrastructure. Microsoft remains committed to building alongside the community and strengthening the ecosystem that powers so much of today's cloud-native development.

For anyone interested in exploring or contributing to these open source efforts, please reach out directly to each project's community to get involved, or contact Lexi Nadolski at lexinadolski@microsoft.com for more information.