Public Preview: The New AKS Monitoring Experience

austonli

Microsoft

Nov 19, 2024

Introducing the new AKS monitoring experience: unified insights at your fingertips

We're excited to announce the public preview of our enhanced Monitoring experience for Azure Kubernetes Service (AKS). This redesign of the existing Insights experience brings comprehensive monitoring capabilities into a single, streamlined view, addressing some of the most common challenges users face when managing their AKS clusters.

Our new Monitoring experience provides both basic (free) insights and detailed insights (when Prometheus metrics and logging are enabled) in a unified, single-pane-of-glass view. The basic experience is available to all AKS users and requires no configuration.

Basic monitoring experience

Upgraded monitoring experience

A significant benefit of the new experience is faster diagnosis of pod deployment failures. In the past, identifying pending or failed pods could be cumbersome. With the new Pod Status KPI card, you can quickly pinpoint these pods and address the underlying issues before they escalate, for smoother deployments and less downtime.
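As a rough illustration of what the Pod Status KPI card summarizes, the following Python sketch counts pods by phase and lists pending or failed pods from the output of `kubectl get pods --all-namespaces -o json`. It assumes the standard Pod JSON fields (`metadata.namespace`, `metadata.name`, `status.phase`). The helper names are hypothetical, and this is not how the portal computes the card.

```python
import json
from collections import Counter

# The phases the Kubernetes API reports in a Pod's status.phase field.
POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def summarize_pod_phases(pod_list_json: str) -> dict:
    """Count pods by status.phase (hypothetical helper, for illustration)."""
    pods = json.loads(pod_list_json).get("items", [])
    counts = Counter(p.get("status", {}).get("phase", "Unknown") for p in pods)
    # Report every phase, including zeros, so the summary has a stable shape.
    return {phase: counts.get(phase, 0) for phase in POD_PHASES}


def problem_pods(pod_list_json: str) -> list:
    """Return ("namespace/name", phase) pairs for Pending or Failed pods."""
    pods = json.loads(pod_list_json).get("items", [])
    result = []
    for p in pods:
        phase = p.get("status", {}).get("phase", "Unknown")
        if phase in ("Pending", "Failed"):
            meta = p.get("metadata", {})
            result.append((f"{meta.get('namespace', '')}/{meta.get('name', '')}", phase))
    return result


if __name__ == "__main__":
    # Made-up pods; only the fields read above are included.
    sample = json.dumps({"items": [
        {"metadata": {"namespace": "default", "name": "web-1"}, "status": {"phase": "Running"}},
        {"metadata": {"namespace": "default", "name": "web-2"}, "status": {"phase": "Pending"}},
        {"metadata": {"namespace": "jobs", "name": "etl-7"}, "status": {"phase": "Failed"}},
    ]})
    print(summarize_pod_phases(sample))
    print(problem_pods(sample))
```

Note that `status.phase` is coarse: a container stuck in CrashLoopBackOff typically still leaves its pod in the Running phase, so a phase count alone will not surface every unhealthy pod.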

Another key scenario where the enhanced view shines is investigating node resource issues. Understanding node readiness and capacity is crucial for efficient cluster management. The Node Readiness Status card, together with detailed CPU and memory usage metrics, shows clearly whether your nodes are ready to host pods. This helps you prevent resource bottlenecks and optimize your cluster's overall performance.
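The readiness part of the Node Readiness Status card can be approximated with a similar sketch over `kubectl get nodes -o json` output. It assumes the standard Node `status.conditions` list and treats a node as ready only when its `Ready` condition has status `"True"`. The function names and node names are hypothetical, and CPU and memory usage are left out.

```python
import json


def node_readiness(node_list_json: str) -> dict:
    """Map each node name to True/False (hypothetical helper).

    A node counts as ready only if its "Ready" condition has status "True";
    "False", "Unknown", or a missing condition all count as not ready.
    """
    nodes = json.loads(node_list_json).get("items", [])
    readiness = {}
    for node in nodes:
        name = node.get("metadata", {}).get("name", "")
        conditions = node.get("status", {}).get("conditions", [])
        readiness[name] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return readiness


def readiness_summary(node_list_json: str) -> tuple:
    """Return (ready_count, total_count) across all nodes."""
    readiness = node_readiness(node_list_json)
    return sum(readiness.values()), len(readiness)


if __name__ == "__main__":
    # Made-up node names; only the fields read above are included.
    sample = json.dumps({"items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]})
    print(node_readiness(sample))     # {'node-a': True, 'node-b': False}
    print(readiness_summary(sample))  # (1, 2)
```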

Drill down to manage your nodes

Ensuring cluster health during a scaling operation has never been easier. The new Events summary card helps you monitor Kubernetes warning events and pending pod states, making it simple to spot and respond to spikes. This helps your cluster scale smoothly, without unexpected hitches that could disrupt your services.
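To see which warnings dominate during a scale-out, a sketch like the following groups Kubernetes Warning events by reason, similar in spirit to the "Warning events by reason" view listed for the upgraded experience. It reads `kubectl get events --all-namespaces -o json` output and uses the core/v1 Event fields `type`, `reason`, and `count`. The helper name and the top-five cutoff are illustrative assumptions.

```python
import json
from collections import Counter


def warning_events_by_reason(event_list_json: str, top_n: int = 5) -> list:
    """Return the top_n (reason, occurrences) pairs for Warning events.

    Hypothetical helper. Events of type "Normal" are ignored. When an event
    has no `count` field, it is counted once.
    """
    events = json.loads(event_list_json).get("items", [])
    counts = Counter()
    for ev in events:
        if ev.get("type") == "Warning":
            counts[ev.get("reason", "Unknown")] += ev.get("count") or 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up events; only the fields read above are included.
    sample = json.dumps({"items": [
        {"type": "Warning", "reason": "FailedScheduling", "count": 3},
        {"type": "Normal", "reason": "Scheduled", "count": 5},
        {"type": "Warning", "reason": "BackOff", "count": 2},
        {"type": "Warning", "reason": "FailedScheduling"},
    ]})
    print(warning_events_by_reason(sample))
    # [('FailedScheduling', 4), ('BackOff', 2)]
```

A burst of `FailedScheduling` warnings during a scale-out usually means new pods cannot yet be placed on any node, which is exactly the kind of spike the Events summary card is meant to surface.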

See warning events at a glance

Troubleshooting latency and connectivity issues in AKS is also more straightforward. Node saturation metrics, including virtual machine scale set (VMSS) OS disk bandwidth and IOPS consumption, help you quickly identify and resolve the resource limits that cause latency. Detailed etcd monitoring and load balancer metrics, such as SNAT port usage (%), provide the critical data you need to maintain optimal cluster performance and keep your applications running smoothly.
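Saturation metrics such as SNAT port usage and OS disk IOPS consumption are percentages of a limit. The sketch below assumes the straightforward definition, consumed divided by capacity (for example, used versus allocated SNAT ports), which this post does not spell out. The helper names, sample values, and the 80% alert threshold are hypothetical.

```python
def utilization_percent(used: float, capacity: float) -> float:
    """Return used / capacity as a percentage (capacity must be positive)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * used / capacity


def flag_saturated(samples: dict, threshold_pct: float = 80.0) -> list:
    """Return the sorted labels whose utilization is at or above threshold_pct.

    `samples` maps a label to a (used, capacity) pair. The 80% default is an
    illustrative choice, not an AKS or Azure default.
    """
    return sorted(
        name
        for name, (used, capacity) in samples.items()
        if utilization_percent(used, capacity) >= threshold_pct
    )


if __name__ == "__main__":
    # Made-up sample values for a single node.
    samples = {
        "snat_ports": (896, 1024),  # used vs. allocated SNAT ports
        "os_disk_iops": (60, 120),  # observed vs. provisioned IOPS
    }
    print(utilization_percent(*samples["snat_ports"]))  # 87.5
    print(flag_saturated(samples))                       # ['snat_ports']
```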

The following comparison table shows the data that all AKS users get out of the box, for free, alongside the additional data available in the upgraded experience. When you upgrade, the same basic data is also collected in Prometheus format, and you gain access to richer metrics and logs for your core troubleshooting scenarios.

Basic tier (free for all AKS users)	Additional data in upgraded experience
Alert summary card	Historical Kubernetes events (30 days)
Events summary card	Warning events by reason
Pod status KPI card	Namespace CPU and memory %
Node status KPI card	Container logs by volume
Node CPU and memory %	Top five controllers by logs volume
VMSS OS disk bandwidth consumed % (max)	Packets dropped I/O
VMSS OS disk IOPS consumed % (max)
Load balancer SNAT port usage

We’re committed to providing you with the tools you need to manage and optimize your AKS clusters effectively. Explore the new Monitoring experience in the Azure portal today and see the future of AKS monitoring for yourself!

Updated Feb 06, 2025

Version 2.0

azure monitor

azure monitor managed service for prometheus

microsoft ignite 2024

updates

Azure Observability Blog