We are excited to announce the first major release of Retina, a significant milestone for the project. This version brings many new features, enhancements, and bug fixes.
The Retina maintainer team would like to thank all contributors, community members, and early adopters who helped make this 1.0 release possible.
What is Retina?
Retina is an open-source, Kubernetes network observability platform. It enables you to continuously observe and measure network health, and investigate network issues on-demand with integrated Kubernetes-native workflows.
Why Retina?
Kubernetes networking failures are rarely isolated or easy to reproduce. Pods are ephemeral, services span multiple nodes, and network traffic crosses multiple layers (CNI, kube-proxy, node networking, policies), making crucial evidence difficult to capture. Manually connecting to nodes and stitching together logs or packet captures simply does not scale as clusters grow in size and complexity.
A modern approach to observability must automate and centralize data collection while exposing rich, actionable insights.
Retina represents a major step forward in solving the complexities of Kubernetes observability by leveraging the power of eBPF. Its cloud-agnostic design, deep integration with Hubble, and support for both real-time metrics and on-demand packet captures make it an invaluable tool for DevOps, SecOps, and compliance teams across diverse environments.
What Does It Do?
Retina can collect two types of telemetry: metrics and packet captures.
The Retina shell enables ad-hoc troubleshooting via pre-installed networking tools.
Metrics
Metrics provide continuous observability. They can be exported to multiple storage options such as Prometheus or Azure Monitor, and visualized in a variety of ways, including Grafana or Azure Log Analytics.
Retina supports two control planes: Hubble and Standard. Both are supported regardless of the underlying CNI. The choice of control plane affects the metrics which are collected.
You can customize which metrics are collected by enabling or disabling their corresponding plugins. Examples of available metrics include:
- Incoming/outgoing traffic
- Dropped packets
- TCP/UDP
- DNS
- API Server latency
- Node/interface statistics
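Plugins are selected at install time through the Helm value `enabledPlugin_linux`, which expects a bracketed list with backslash-escaped brackets and commas. As a minimal sketch, the helper below builds that value from plain plugin names. The function name `retina_plugin_list` is hypothetical and not part of Retina.

```shell
# Build the value for the chart's enabledPlugin_linux setting from plugin names.
# Brackets and commas are backslash-escaped so that `helm --set` treats the
# whole list as one value. The function name is hypothetical.
retina_plugin_list() (
  joined=""
  for name in "$@"; do
    if [ -n "$joined" ]; then
      joined="${joined}\\,"
    fi
    joined="${joined}${name}"
  done
  printf '%s\n' "\\[${joined}\\]"
)

# Prints: \[dropreason\,packetforward\,linuxutil\,dns\]
retina_plugin_list dropreason packetforward linuxutil dns
```

The result can be passed to Helm as `--set enabledPlugin_linux="$(retina_plugin_list dropreason dns)"`.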
Packet Captures
Captures provide on-demand observability. They allow users to perform distributed packet captures across the cluster, based on specified Nodes/Pods and other supported filters. They can be triggered via the CLI or through the capture CRD, and may be output to persistent storage options such as the host filesystem, a PVC, or a storage blob.
The result of the capture contains more than just a .pcap file. Retina also collects additional networking metadata such as iptables rules, socket statistics, kernel network information from /proc/net, and more.
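To make the CLI workflow concrete, here is a small dry-run sketch that prints a capture command rather than running it. The flag names (`--name`, `--namespace`, `--node-selectors`, `--host-path`, `--duration`) and all values are assumptions for illustration. Confirm them with `kubectl retina capture create --help` for your installed version.

```shell
# Dry-run helper: prints a capture command instead of executing it.
# ASSUMPTION: the flag names below are illustrative and should be checked
# against `kubectl retina capture create --help`. The helper name is hypothetical.
retina_capture_cmd() (
  name="$1"       # capture name chosen by the caller
  selector="$2"   # node label selector, e.g. kubernetes.io/os=linux
  host_path="$3"  # directory on each node where capture output is written
  printf 'kubectl retina capture create --name %s --namespace default --node-selectors %s --host-path %s --duration 30s\n' \
    "$name" "$selector" "$host_path"
)

retina_capture_cmd demo-capture kubernetes.io/os=linux /mnt/retina/captures
```

Printing the command first makes it easy to review the node selection and output location before starting a capture on a live cluster.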
Retina packet capture performed through the CLI.
Shell
The Retina shell enables deep ad-hoc troubleshooting by providing a suite of networking tools. The CLI command starts an interactive shell on a Kubernetes node that runs a container image which includes standard tools such as ping or curl, as well as specialized tools like bpftool, pwru, Inspektor Gadget and more.
The Retina shell is currently only available on Linux. Note that some tools require particular capabilities to execute. These can be passed as parameters through the CLI.
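As a rough sketch of how capabilities might be passed, the helper below prints a shell command for a given node and an optional list of Linux capabilities. The `--capabilities` flag, its comma-separated format, and the node name are assumptions. Check `kubectl retina shell --help` before use.

```shell
# Dry-run helper: prints the command that would open a Retina shell on a node.
# ASSUMPTION: the --capabilities flag and its comma-separated value format are
# illustrative; verify with `kubectl retina shell --help`. The helper name and
# node name are hypothetical.
retina_shell_cmd() (
  node="$1"
  shift
  caps=""
  for cap in "$@"; do
    # Append with a comma separator once the list is non-empty.
    caps="${caps:+${caps},}${cap}"
  done
  if [ -n "$caps" ]; then
    printf 'kubectl retina shell %s --capabilities %s\n' "$node" "$caps"
  else
    printf 'kubectl retina shell %s\n' "$node"
  fi
)

retina_shell_cmd node-1 NET_ADMIN NET_RAW
```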
Retina shell CLI - showcasing some of the available tools, including ping, dig, bpftool and pwru.
Use Cases
- Debugging Pod Connectivity Issues: When services can’t communicate, Retina enables rapid, automated distributed packet capture and drop metrics, drastically reducing troubleshooting time. The Retina shell also brings specialized tools for deep manual investigations.
- Continuous Monitoring of Network Health: Operators can set up alerts and dashboards for DNS failures, API server latency, or packet drops, gaining ongoing visibility into cluster networking.
- Security Auditing and Compliance: Flow logs (in Hubble mode) and metrics support security investigations and compliance reporting, enabling quick identification of unexpected connections or data transfers.
- Multi-Cluster / Multi-Cloud Visibility: Retina standardizes network observability across clouds, supporting unified dashboards and processes for SRE teams.
Where Does It Run?
Retina is designed for broad compatibility across Kubernetes distributions, cloud providers, and operating systems. There are no Azure-specific dependencies - Retina runs anywhere Kubernetes does.
- Operating Systems: Both Linux and Windows nodes are supported.
- Kubernetes Distributions: Retina is distribution-agnostic, deployable on managed services (AKS, EKS, GKE) or self-managed clusters.
- CNI / Network Stack: Retina works with any CNI, focusing on kernel-level events rather than CNI-specific logs.
- Cloud Integration: Retina exports metrics to Azure Monitor and Log Analytics, with pre-built Grafana dashboards for AKS. Integration with AWS CloudWatch or GCP Stackdriver is possible via Prometheus.
- Observability Stacks: Retina integrates with Prometheus & Grafana, Cilium Hubble (for flow logs and UI), and can be extended to other exporters.
Design Overview
Retina’s architecture consists of two layers: a data collection layer in kernel space, and a processing layer in user space that converts low-level signals into Kubernetes-aware telemetry.
When Retina is installed, each node in the cluster runs a Retina agent which collects raw network telemetry from the host kernel - backed by eBPF on Linux, and HNS/VFP on Windows. The agent processes the raw network data and enriches it with Kubernetes metadata, which is then exported for consumption by monitoring tools such as Prometheus, Grafana, or Hubble UI.
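The enrichment step can be illustrated with a toy example. This is not Retina's implementation. It is a sketch that joins hypothetical raw flow records (source IP, destination IP, bytes) against a hypothetical IP-to-pod map, which is the general idea behind turning kernel-level signals into Kubernetes-aware telemetry.

```shell
# Toy illustration of enrichment; all data and names here are hypothetical.
# IP-to-pod map, standing in for the Kubernetes metadata the agent knows about.
pods='10.0.0.4 default/frontend
10.0.0.7 default/backend'

# Raw flow records: source IP, destination IP, bytes.
flows='10.0.0.4 10.0.0.7 512
10.0.0.7 10.0.0.9 128'

enrich_flows() {
  # $1: pod map, $2: flow records. A "---" line separates the two inputs.
  printf '%s\n---\n%s\n' "$1" "$2" | awk '
    $0 == "---" { in_flows = 1; next }
    !in_flows   { pod[$1] = $2; next }   # first section: load the pod map
    {
      # second section: replace each IP with its pod, if known
      src = ($1 in pod) ? pod[$1] : "unknown"
      dst = ($2 in pod) ? pod[$2] : "unknown"
      print src, "->", dst, $3
    }'
}

# Prints:
#   default/frontend -> default/backend 512
#   default/backend -> unknown 128
enrich_flows "$pods" "$flows"
```

Traffic to an address with no matching pod is labeled `unknown` here. A real agent would resolve such addresses against services, nodes, or external endpoints.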
Modularity and extensibility are central to the design philosophy. Retina's plugin model lets you enable only the telemetry you need, and add new sources by implementing a common plugin interface. Built-in plugins include Drop Reason, DNS, Packet Forward, and more.
Check out our architecture docs for a deeper dive into Retina's design.
Get Started
Thanks to Helm charts, deploying Retina is streamlined across all environments and can be done with one configurable command. For complete documentation, visit our installation docs.
To install Retina with the Standard control plane and Basic metrics mode:
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
    --version "$VERSION" \
    --namespace kube-system \
    --set image.tag="$VERSION" \
    --set operator.tag="$VERSION" \
    --set logLevel=info \
    --set operator.enabled=true \
    --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"
Once Retina is running in your cluster, you can then configure Prometheus and Grafana to scrape and visualize your metrics.
Install the Retina CLI with Krew:
kubectl krew install retina
Get Involved
Retina is open-source under the MIT License and welcomes community contributions. Since its announcement in early 2024, the project has gained significant traction, with contributors from multiple organizations helping to expand its capabilities.
The project is hosted on GitHub at microsoft/retina, and documentation is available at retina.sh.
If you would like to contribute to Retina, you can follow our contributor guide.
What's Next?
Retina 1.1, of course!
We are also discussing the future roadmap, and exploring the possibility of moving the project to community ownership. Stay tuned!
In the meantime, we welcome you to raise an issue if you find any bugs, or start a discussion if you have any questions or suggestions.
You can also reach out to the Retina team via email; we would love to hear from you!