Standard Kubernetes DNS forces every pod to traverse the network fabric to a centralized CoreDNS service, a design that becomes a scaling and latency bottleneck as clusters grow. By default, pods send DNS queries to the kube-dns Service IP, which kube-proxy translates to CoreDNS endpoints via iptables rules. NodeLocal DNSCache removes this network hop by resolving queries locally on each node.
Why Adopt NodeLocal DNSCache?
The primary drivers for adoption are usually:
- Eliminating Conntrack Pressure: In high-QPS UDP DNS scenarios, conntrack contention on UDP flows can cause intermittent DNS response loss and retries; depending on resolver retry and timeout settings, this surfaces as multi-second lookup delays and occasionally much longer tails.
- Reducing Latency: By placing a cache on every node, you remove the network hop to the CoreDNS service. Responses are practically instantaneous for cached records.
- Offloading CoreDNS: A DaemonSet architecture effectively shards the DNS query load across the entire cluster, preventing the central CoreDNS deployment from becoming a single point of congestion during bursty scaling events.
Who needs this?
You should prioritize this architecture if you run:
- Large clusters (hundreds of nodes or thousands of pods), where scaling the central CoreDNS deployment becomes difficult to manage.
- High-churn environments, such as spot instances or frequent auto-scaling jobs, which trigger large waves of DNS queries.
- Real-time applications where multi-second (and occasionally longer) DNS lookup delays are unacceptable.
The Challenge with Cilium
Deploying NodeLocal DNSCache on a cluster managed by Cilium (as the CNI) requires a specific approach. The standard NodeLocal DNSCache setup relies on a node-level dummy interface and iptables rules. In Cilium environments, you can instead implement the interception via a Cilium Local Redirect Policy (LRP), which redirects traffic destined for the kube-dns ClusterIP Service to a node-local backend pod.
This post details a production-ready deployment strategy aligned with Cilium’s Local Redirect Policy model. It covers necessary configuration tweaks to avoid conflicts and explains how to maintain security filtering.
Architecture Overview
In a standard Kubernetes deployment, NodeLocal DNSCache creates a dummy network interface and uses extensive iptables rules to hijack traffic destined for the Cluster DNS IP.
When using Cilium, we can achieve this more elegantly and efficiently using Local Redirect Policies.
- DaemonSet: Runs node-local-dns on every node.
- Configuration: Configured to skip interface creation and iptables manipulation.
- Redirection: Cilium LRP intercepts traffic to the kube-dns Service IP and redirects it to the local pod on the same node.
1. The NodeLocal DNSCache DaemonSet
The critical difference in this manifest is the arguments passed to the node-local-dns binary. We must explicitly disable its networking setup functions to let Cilium handle the traffic.
The NodeLocal DNSCache deployment also requires the node-local-dns ConfigMap and the kube-dns-upstream Service (plus RBAC/ServiceAccount). For brevity, the snippet below shows only the DaemonSet arguments that differ in the Cilium/LRP approach. The node-cache reads the template Corefile (/etc/coredns/Corefile.base) and generates the active Corefile (/etc/Corefile). The -conf flag points CoreDNS at the active Corefile it should load.
The node-cache binary accepts -localip as an IP list; 0.0.0.0 is a valid value and makes it listen on all interfaces, appropriate for the LRP-based redirection model.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system
  labels:
    k8s-app: node-local-dns
spec:
  selector:
    matchLabels:
      k8s-app: node-local-dns
  template:
    metadata:
      labels:
        k8s-app: node-local-dns
      annotations:
        # Optional: policy.cilium.io/no-track-port can be used to bypass conntrack for DNS.
        # Validate the impact on your Cilium version and your observability/troubleshooting needs.
        policy.cilium.io/no-track-port: "53"
    spec:
      # IMPORTANT for the "LRP + listen broadly" approach:
      # keep hostNetwork off so you don't hijack node-wide :53
      hostNetwork: false
      dnsPolicy: ClusterFirst
      containers:
        - name: node-cache
          image: registry.k8s.io/dns/k8s-dns-node-cache:1.15.16
          args:
            - "-localip"
            # Use a bind-all approach. Ensure server blocks bind broadly in your Corefile.
            - "0.0.0.0"
            - "-conf"
            - "/etc/Corefile"
            - "-upstreamsvc"
            - "kube-dns-upstream"
            # CRITICAL: Disable internal setup
            - "-skipteardown=true"
            - "-setupinterface=false"
            - "-setupiptables=false"
          ports:
            - containerPort: 53
              name: dns
              protocol: UDP
            - containerPort: 53
              name: dns-tcp
              protocol: TCP
          # Ensure your Corefile includes health :8080 so the liveness probe works
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 60
            timeoutSeconds: 5
          volumeMounts:
            - name: config-volume
              mountPath: /etc/coredns
            - name: kube-dns-config
              mountPath: /etc/kube-dns
      volumes:
        - name: kube-dns-config
          configMap:
            name: kube-dns
            optional: true
        - name: config-volume
          configMap:
            name: node-local-dns
            items:
              - key: Corefile
                path: Corefile.base
2. The Cilium Local Redirect Policy (LRP)
Instead of iptables, we define a CRD that tells Cilium: "When you see traffic for `kube-dns`, send it to the `node-local-dns` pod on this same node."
apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
name: "nodelocaldns"
namespace: kube-system
spec:
redirectFrontend:
# ServiceMatcher mode is for ClusterIP services
serviceMatcher:
serviceName: kube-dns
namespace: kube-system
redirectBackend:
# The backend pods selected by localEndpointSelector must be in the same namespace as the LRP
localEndpointSelector:
matchLabels:
k8s-app: node-local-dns
toPorts:
- port: "53"
name: dns
protocol: UDP
- port: "53"
name: dns-tcp
protocol: TCP
This is an LRP-based NodeLocal DNSCache deployment: we disable node-cache’s iptables/interface setup and let Cilium LRP handle local redirection. This differs from the upstream NodeLocal DNSCache manifest, which uses hostNetwork + dummy interface + iptables.
LRP must be enabled in Cilium (e.g., localRedirectPolicies.enabled=true) before applying the CRD; see the official Cilium LRP documentation for details.
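With a Helm-based install, that boils down to a values toggle. A minimal sketch follows; the key name mirrors the localRedirectPolicies.enabled form above, while older Cilium charts exposed a single localRedirectPolicy flag, so check the chart version you run:
# Helm values excerpt (sketch): enables the Local Redirect Policy feature in the agent
localRedirectPolicies:
  enabled: true
The agents must restart with the new setting (a normal helm upgrade rollout handles this) before the CRD is honored.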
The Network Policy "Gotcha"
If you use CiliumNetworkPolicy to restrict egress traffic, specifically for FQDN filtering, you typically allow access to CoreDNS like this:
- toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
  toPorts:
    - ports:
        - port: "53"
          protocol: ANY
This will break with local redirection.
Why? Because LRP redirects the DNS request to the node-local-dns backend endpoint; strict egress policies must therefore allow both kube-dns (upstream) and node-local-dns (the redirected destination).
The Repro Setup
To demonstrate this failure, the cluster is configured with:
- NodeLocal DNSCache: Deployed as a DaemonSet (node-local-dns) to cache DNS requests locally on every node.
- Local Redirect Policy (LRP): An active LRP intercepts traffic destined for the kube-dns Service IP and redirects it to the local node-local-dns pod.
- Incomplete Network Policy: A strict CiliumNetworkPolicy (CNP) is enforced on the client pod. While it explicitly allows egress to kube-dns, it misses the corresponding rule for node-local-dns.
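For completeness, the client can be as simple as the sketch below; the pod name matches the scenario, while the image and the app label are illustrative assumptions:
apiVersion: v1
kind: Pod
metadata:
  name: dns-client
  namespace: default
  labels:
    app: dns-client        # illustrative label; the strict CNP in this repro selects it
spec:
  containers:
    - name: client
      image: busybox:1.36  # any image with nslookup works
      command: ["sleep", "86400"]
Exec into the pod and run nslookup github.com while watching Hubble to reproduce the drop.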
Revealing the issue with Hubble:
In this scenario, the client pod dns-client is attempting to resolve the external domain github.com.
When inspecting the traffic flows, you will see EGRESS DENIED verdicts. Crucially, notice the destination pod in the Hubble output: kube-system/node-local-dns, not kube-dns.
Although the application originally sent the packet to the ClusterIP of CoreDNS, Cilium's Local Redirect Policy rewrote the destination to the local node cache. Because the strict Network Policy only allows traffic to the kube-dns identity, the redirected traffic falls outside the allowed rules and is dropped by the default-deny stance.
The Fix: You must allow egress to both labels.
- toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: kube-dns
    # Add this selector for the local cache
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s:k8s-app: node-local-dns
  toPorts:
    - ports:
        - port: "53"
          protocol: ANY
Without this addition, pods protected by strict egress policies will time out resolving DNS, even though the cache is running.
Verifying the fix with Hubble:
After adding matchLabels: k8s:k8s-app: node-local-dns, the traffic is now allowed. Hubble confirms a policy verdict of EGRESS ALLOWED for UDP traffic on port 53. Because DNS resolution now succeeds, the response populates the Cilium FQDN cache, subsequently allowing the TCP traffic to github.com on port 443 as intended.
Real-World Example: Restricting Egress with FQDN Policies
Here is a complete CiliumNetworkPolicy that locks down a workload to only access api.example.com. Note how the DNS rule explicitly allows traffic to both kube-dns (for upstream) and node-local-dns (for the local cache).
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
name: secure-workload-policy
spec:
endpointSelector:
matchLabels:
app: critical-workload
egress:
# 1. Allow DNS Resolution (REQUIRED for FQDN policies)
- toEndpoints:
- matchLabels:
k8s:io.kubernetes.pod.namespace: kube-system
k8s:k8s-app: kube-dns
# Allow traffic to the local cache redirection target
- matchLabels:
k8s:io.kubernetes.pod.namespace: kube-system
k8s:k8s-app: node-local-dns
toPorts:
- ports:
- port: "53"
protocol: ANY
rules:
dns:
- matchPattern: "*"
# 2. Allow specific FQDN traffic (populated via DNS lookups)
- toFQDNs:
- matchName: "api.example.com"
toPorts:
- ports:
- port: "443"
protocol: TCP
Configuration & Upstream Loops
When configuring the ConfigMap for node-local-dns, use the standard placeholders provided by the image. The binary replaces them at runtime:
- __PILLAR__CLUSTER__DNS__: The Upstream Service IP (kube-dns-upstream).
- __PILLAR__UPSTREAM__SERVERS__: The system resolvers (usually /etc/resolv.conf).
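To see the placeholders in context, here is a trimmed sketch of the node-local-dns ConfigMap adapted for this LRP setup (the upstream template also carries reverse zones such as in-addr.arpa and ip6.arpa, omitted here). Note bind 0.0.0.0, matching -localip, and health :8080, matching the liveness probe above:
apiVersion: v1
kind: ConfigMap
metadata:
  name: node-local-dns
  namespace: kube-system
data:
  Corefile: |
    cluster.local:53 {
        errors
        cache {
            success 9984 30
            denial 9984 5
        }
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__CLUSTER__DNS__ {
            force_tcp
        }
        prometheus :9253
        health :8080
    }
    .:53 {
        errors
        cache 30
        reload
        loop
        bind 0.0.0.0
        forward . __PILLAR__UPSTREAM__SERVERS__
        prometheus :9253
    }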
Ensure kube-dns-upstream exists as a Service selecting the CoreDNS pods so cache misses are forwarded to the actual CoreDNS backends.
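A minimal sketch of that Service, assuming your CoreDNS pods carry the standard k8s-app: kube-dns label:
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-upstream
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "KubeDNSUpstream"
spec:
  # Selects the real CoreDNS pods so cache misses reach the upstream backends
  selector:
    k8s-app: kube-dns
  ports:
    - name: dns
      port: 53
      protocol: UDP
      targetPort: 53
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53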
Alternative: AKS LocalDNS
LocalDNS is an Azure Kubernetes Service (AKS)-managed node-local DNS proxy and cache.
Pros:
- Managed lifecycle at the node pool level.
- Support for custom configuration via localdnsconfig.json (e.g., custom server blocks, cache tuning).
- No manual DaemonSet management required.
Cons & Limitations:
- Incompatibility with FQDN Policies: As noted in the official documentation, LocalDNS isn’t compatible with applied FQDN filter policies in ACNS/Cilium; if you rely on FQDN enforcement, prefer a DNS path that preserves FQDN learning/enforcement.
- Updating configuration requires reimaging the node pool.
For environments heavily relying on strict Cilium Network Policies and FQDN filtering, the manual deployment method described above (using LRP) can be more reliable and transparent.
AKS recommends not enabling the upstream NodeLocal DNSCache and LocalDNS in the same node pool; since DNS traffic is routed through LocalDNS, the combination can produce unexpected results.