Introducing the new AKS monitoring experience: unified insights at your fingertips
We're excited to announce the public preview of our enhanced Monitoring experience for Azure Kubernetes Service (AKS). This redesign of the existing Insights experience brings comprehensive monitoring capabilities into a single, streamlined view, addressing some of the most common challenges users face when managing their AKS clusters.
Our new Monitoring experience provides both basic (free) and detailed insights (with enabled Prometheus metrics and logging), offering a unified, single-pane-of-glass experience. The basic experience is available for all AKS users with no configuration required at all.
Basic monitoring experience | Upgraded monitoring experience |
A significant benefit of this new experience is in diagnosing pod deployment failures. In the past, identifying pending or failed pods could be a cumbersome process. With the new KPI Card for Pod Status, you can now quickly pinpoint and address these issues before they escalate, ensuring smoother deployments and reduced downtime.
Another key scenario where this enhanced view shines is investigating node resource issues. Understanding node readiness and capacity is crucial for efficient cluster management. The Node Readiness Status card, along with detailed CPU and memory usage metrics, provides clear insights into whether your nodes are fully prepared to host pods. This helps prevent resource bottlenecks and optimizes the overall performance of your cluster.
Drill down to manage your nodesEnsuring cluster health during a scaling operation has never been easier. The new Summary Card for Events helps you monitor Kubernetes warning events and pending pod states, making it simple to track and respond to spikes. This ensures your cluster scales smoothly and efficiently, without unexpected hitches that could disrupt your services.
See warning events at a glance
Additionally, troubleshooting latency and connectivity issues in AKS is now more straightforward. With enhanced insights into node saturation metrics, including VMSS OS Disk Bandwidth and IOPS consumption, you can quickly identify and resolve issues causing latency. Detailed ETCD monitoring and Load Balancer metrics, such as % SNAT Port Usage, provide critical data to maintain optimal cluster performance, keeping your applications running smoothly.
The following comparison table highlights what data comes out of the box for free for ALL AKS users. When you upgrade, you get all the same data collected in the newer Prometheus format as well as access to more rich metrics and logs for your core troubleshooting scenarios.
Basic tier metrics | Additional metrics in upgraded experience |
Alert summary card | Historical Kubernetes events (30 days) |
Events summary card | Warning events by reason |
Pod status KPI card | Namespace CPU and memory % |
Node status KPI card | Container logs by volume |
Node CPU and memory % | Top five controllers by logs volume |
VMSS OS disk bandwidth consumed % (max) | Packets dropped I/O |
VMSS OS disk IOPS consumed % (max) | |
Load balancer SNAT port usage |
We’re committed to providing you with the tools you need to manage and optimize your AKS clusters effectively. Explore the new Monitoring experience in the Azure portal today and experience the future of AKS monitoring!
Updated Feb 06, 2025
Version 2.0austonli
Microsoft
Joined September 27, 2021
Azure Observability Blog
Follow this blog board to get notified when there's new activity