Linux and Open Source Blog

eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina

Simone Rodigari
Apr 17, 2025

Kubernetes has transformed application deployment and scaling by offering a dynamic, distributed platform that simplifies container orchestration. However, its inherent dynamism introduces significant observability challenges: continuously changing pod lifecycles, complex inter-service communication, and transient network anomalies that are difficult to capture and diagnose. In this post, we explore these challenges, introduce eBPF, and show how the open-source Retina project leverages eBPF to deliver comprehensive, cloud-agnostic network observability. We'll also highlight three demo scenarios across AKS, GKE, and EKS that illustrate Retina's real-world capabilities.

The Challenge of Kubernetes Observability

Kubernetes clusters are inherently complex:

  • Dynamic Environments: Pods spin up, scale, and terminate on the fly. This ephemeral nature makes it difficult to track and diagnose issues consistently.
  • Distributed Architecture: With services spread across multiple nodes and even different geographical locations, identifying communication bottlenecks or failures requires a consolidated view of network traffic.
  • Manual Debugging Pain Points: Traditionally, diagnosing network issues involves manually running tools like tcpdump on multiple nodes, aggregating logs, and piecing together fragments of data—which is both time-consuming and error-prone.

A modern approach to observability must automate and centralize data collection while exposing rich, actionable insights.

eBPF: Powering Next-Generation Observability

Extended Berkeley Packet Filter (eBPF) is a revolutionary technology embedded in the Linux kernel that allows safe, sandboxed execution of custom programs at runtime, without modifying kernel source code or loading kernel modules (https://ebpf.io). Its unique qualities include:

  • Efficiency & Safety: eBPF programs are verified by a dedicated kernel verifier and are JIT-compiled for near-native performance. This ensures that they run safely in a privileged context without compromising system stability.
  • Deep System Insights: By attaching to various kernel hooks—such as system calls, tracepoints, or network events—eBPF provides granular visibility into system behavior. This makes it ideal for performance troubleshooting, real-time monitoring, and security analysis.
  • Versatility: Because it eliminates the need for invasive agents, eBPF can monitor everything from network packet flows and DNS queries to API server latencies. Its programmability means new metrics and behaviors can be added on demand.

Together, these features have spurred a wave of innovative observability and security projects in the cloud-native ecosystem.
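
To make the hook-based model concrete, here is a minimal sketch of the idea using bpftrace (an illustrative one-liner, not part of Retina; it assumes bpftrace is installed on the node and a kernel exposing the skb:kfree_skb tracepoint):

    # Count kernel packet drops per process by attaching to the skb:kfree_skb tracepoint
    sudo bpftrace -e 'tracepoint:skb:kfree_skb { @drops[comm] = count(); }'

Stopping the program (Ctrl-C) prints the per-process drop counts. Retina's eBPF plugins follow the same attach-and-aggregate pattern, packaged and managed for you across the cluster.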

Introducing Retina: Open-Source, Cloud-Agnostic Observability

Retina is an open-source Kubernetes network observability platform that leverages the power of eBPF for deep network insights (https://retina.sh). Designed to work independently of your underlying cloud or on-premises environment, Retina offers:

  • Cloud and CNI Agnostic: Whether you're running Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), or even your own data center, Retina's architecture is built to operate seamlessly across all of them.
  • Hubble Integration: Retina integrates with Hubble (https://github.com/cilium/hubble) as its control plane, regardless of the underlying Container Networking Interface (CNI). By tapping into Hubble's distributed telemetry capabilities, Retina provides real-time network flows, DNS logs, and other critical metrics.
  • Actionable Telemetry: With industry-standard Prometheus metrics and distributed packet capture functionalities, Retina not only monitors network health continuously but also assists in on-demand debugging—capturing network traffic that can later be analyzed with tools like Wireshark.
  • Ease of Deployment: Thanks to modular templates such as Helm charts, deploying Retina across cloud platforms comes down to a single Helm command (an example follows in the next section).

Demo Scenarios: Retina in a Multi-Cloud World

The multi-cloud project in the Retina repository provides infrastructure templates and demo scenarios for deploying Retina on AKS, GKE, and EKS.

The architecture consists of three clusters deployed across three separate cloud providers, a single Grafana Cloud instance, and a Grafana Private Data Source Connect (PDC) network configured for each cluster. This setup allows all three clusters to be accessed as data sources from a single Grafana dashboard.

The environment setup is available in the open-source repository at https://github.com/microsoft/retina/tree/main/test/multicloud

The aim of this post is to demo Retina's capabilities in a multi-cloud setting; however, it is not necessary to configure the entire multi-cloud environment to test out Retina. Simply install it via Helm on your existing Kubernetes cluster, following https://retina.sh/docs/Installation/Setup
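
As a minimal sketch (based on the setup docs linked above; the chart location and values can change between releases, and version-pinning flags are omitted here):

    # Install the Retina agent from its OCI Helm chart with a basic set of plugins
    helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
        --set logLevel=info \
        --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"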

Let's dive into the three key demo use cases.

Packet Drops on AKS

In this scenario, an Azure Kubernetes Service (AKS) cluster is configured to simulate packet drops:

  • The Problem: Network traffic issues like dropped packets can severely affect application performance and are traditionally hard to diagnose.
  • The Setup: A client pod's network configuration is manipulated using an iptables DROP rule, effectively simulating traffic loss.
  • Retina's Role: With Retina's automated network observability, operators can quickly detect dropped packets through its Prometheus-integrated metrics and visualize the impact via dashboards. This rapid insight minimizes troubleshooting time and guides corrective actions.

The Hubble CLI can be used to analyze network flow logs and identify the cause of packet drops.
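
A sketch of the demo flow (the pod name, namespace, and port are illustrative assumptions; the iptables step requires an image that ships iptables and a pod with NET_ADMIN, and the Hubble CLI must be pointed at the cluster's Hubble Relay, e.g. via kubectl port-forward):

    # Simulate the fault: drop the client pod's outbound traffic to the server port
    kubectl exec client -- iptables -A OUTPUT -p tcp --dport 80 -j DROP

    # Inspect dropped flows with the Hubble CLI
    hubble observe --verdict DROPPED --namespace default --follow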

The anomaly in the network flow between the client and the server can also be detected by leveraging Retina's standard Prometheus metrics, which can be visualized in a Grafana dashboard.
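
For a quick look at the raw counters, the agent's Prometheus endpoint can be scraped directly (a sketch; the DaemonSet name, metrics port, and metric names below are assumptions based on Retina's defaults, so confirm them against your deployment):

    # Scrape a Retina agent's metrics endpoint and filter for drop counters
    kubectl port-forward -n kube-system ds/retina-agent 10093:10093 &
    sleep 2  # give the port-forward a moment to establish
    curl -s http://localhost:10093/metrics | grep networkobservability_drop

In Grafana, a query along the lines of rate(networkobservability_drop_count[5m]) surfaces the same signal over time.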

Grafana dashboard: Retina Network Observability Pod Flows (Workload)

DNS Resolution Failure on EKS

This demo focuses on Amazon's Elastic Kubernetes Service (EKS):

  • The Problem: DNS resolution issues may cause intermittent connectivity problems, impeding service discovery and communication between pods.
  • The Setup: CoreDNS is intentionally misconfigured using custom DNS response templates, creating controlled failures for specific domain queries (see the sketch after this list).
  • Retina's Role: Retina monitors for DNS anomalies in real time. The integration with Hubble and the robust metric collection enable administrators to spot spikes in DNS errors, providing the necessary details to trace and resolve misconfigurations.
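
A hypothetical version of the fault injection (the domain and response code are illustrative; the template plugin syntax is documented at coredns.io):

    # Make CoreDNS answer example.com A queries with SERVFAIL by adding a template
    # block to the Corefile in the coredns ConfigMap, for example:
    #
    #     template IN A example.com {
    #         rcode SERVFAIL
    #     }
    #
    kubectl -n kube-system edit configmap coredns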

The Hubble CLI can be used to visualize network flow logs and retrieve DNS RCodes.
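
With the Hubble CLI connected to the cluster's Hubble Relay, DNS flows and their response codes can be filtered directly, for example:

    # Show DNS traffic, including response codes, as it happens
    hubble observe --protocol dns --follow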

Retina's DNS metrics can help identify DNS issues. The pre-configured dashboards on GitHub (https://github.com/microsoft/retina/tree/main/deploy/grafana-dashboards) can visualize spikes in errors and provide details about DNS query RCodes and record types.

Grafana dashboard: Retina Network Observability DNS

Packet Capture on GKE

The third scenario highlights Google Kubernetes Engine (GKE):

  • The Problem: Sometimes diagnosing network issues requires capturing live network traffic, which traditionally involves manual steps and multiple tools.
  • The Setup: Retina is deployed to perform on-demand network packet captures. The captured data can then be exported for further analysis.
  • Retina's Role: With its built-in packet capture capability, Retina captures network traffic from designated pods or nodes into a capture file. This file can subsequently be extracted and analyzed using Wireshark for a detailed, forensic-level inspection of the traffic flow.

A Retina capture can be executed using the retina kubectl plugin; for further information, see https://retina.sh/docs/Captures/cli

Create a Retina capture using Retina CLI
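
A sketch of creating a capture (flag values here are illustrative; check the Captures CLI docs linked above for the options supported by your plugin version):

    # Capture 30 seconds of traffic on Linux nodes and store the tarball on the host
    kubectl retina capture create \
        --host-path /mnt/capture \
        --namespace default \
        --node-selectors "kubernetes.io/os=linux" \
        --duration 30s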

The generated tarball contains useful networking information that can help in a troubleshooting scenario. A .pcap file is also included in the Retina capture tarball; it can be used for packet analysis, for example with a tool like Wireshark (https://www.wireshark.org/).
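
Once copied off the node, the archive can be unpacked and previewed from the terminal before opening it in Wireshark (file names below are placeholders):

    # Unpack the capture tarball and preview the first packets with tshark
    tar -xzf retina-capture.tar.gz
    tshark -r ./*.pcap -c 20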

Access the tarball from localhost and visualize .pcap in Wireshark

Conclusions

Retina represents a major step forward in solving the complexities of Kubernetes observability by harnessing the power of eBPF. Its cloud-agnostic design, robust integration with Hubble, and support for both real-time metrics and packet captures make it an invaluable tool for DevOps, SecOps, and compliance teams across diverse environments.

Importantly, Retina allows for Hubble integration without requiring Cilium as the CNI. This means you can enable eBPF-powered observability regardless of the CNI plugin installed on your cluster.

Whether you face packet drops or DNS issues, or need to perform live network captures in your Kubernetes cluster, Retina equips you with the insights you need to troubleshoot and optimize your cloud-native network.

Explore the Retina project on GitHub (https://github.com/microsoft/retina) and give it a ⭐ if you want to support it! Join the community of users and contributors, and give your Kubernetes clusters the observability they deserve!
