azure kubernetes service
Generally Available - High scale mode in Azure Monitor - Container Insights
Container Insights is Azure Monitor’s solution for collecting logs from your Azure Kubernetes Service (AKS) clusters. As AKS adoption continues to grow, we are seeing an increasing number of customers whose log volumes hit the limits of log collection in Container Insights. Last August, we announced the public preview of High Scale mode in Container Insights to help customers achieve higher log collection throughput from their AKS clusters. Today, we are happy to announce the General Availability of High Scale mode.

High Scale mode is ideal for customers approaching or exceeding 10,000 logs/sec from a single node. When High Scale mode is enabled, Container Insights makes several configuration changes that together increase overall throughput. These include using a more powerful agent setup, using a different data pipeline, and allocating more memory to the agent, among others. All of these changes are made in the background by the service and do not require input or configuration from customers.

High Scale mode affects only the data collection layer (through a new data collection rule, or DCR); the rest of the experience remains the same. Data continues to flow to the existing tables, and your queries and alerts work as before.

High Scale mode is available to all customers but is currently turned off by default. In the future, we plan to enable High Scale mode by default for all customers to reduce the chance of log loss when workloads scale. To get started with High Scale mode, see our documentation at https://aka.ms/cihsmode

General Availability of Azure Monitor Network Security Perimeter Features
We’re excited to announce that Azure Monitor Network Security Perimeter features are now generally available! This update is an important step forward for Azure Monitor’s security, providing comprehensive network isolation for your monitoring data. In this post, we’ll explain what Network Security Perimeter is, why it matters, and how it benefits Azure Monitor users.

Network Security Perimeter is purpose-built to strengthen network security and monitoring, enabling customers to establish a more secure and isolated environment. As enterprise interest grows, it’s clear that this feature will play a key role in strengthening the protection of Azure PaaS resources against evolving security threats.

What is Network Security Perimeter and Why Does It Matter?

Network Security Perimeter is a network isolation feature for Azure PaaS services that creates a trusted boundary around your resources. Azure Monitor’s key components (like Log Analytics workspaces and Application Insights) run outside of customer virtual networks; Network Security Perimeter allows these services to communicate only within an explicit perimeter and blocks any unauthorized public access. In essence, the security perimeter acts as a virtual firewall at the Azure service level: by default it restricts public network access to resources inside the perimeter and permits only traffic that matches your defined rules. This prevents unwanted network connections and helps prevent data exfiltration, keeping sensitive monitoring data within your control.

For Azure Monitor customers, Network Security Perimeter is a game-changer. It addresses a common enterprise request for “zero trust” network security on Azure’s monitoring platform. Previously, while you could use Private Link to secure traffic from your VNets to Azure Monitor, Azure Monitor’s own service endpoints were still accessible over the public internet. The security perimeter closes that gap by enforcing network controls on Azure’s side.
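To make the rule model concrete, here is a minimal Python sketch of how a perimeter's access decisions could be modeled under the behavior described in this post: traffic between member resources is allowed, public inbound traffic must match an allowed IP range or subscription ID, outbound traffic must target an allowed FQDN, and every decision is recorded for auditing. The `PerimeterModel` class, its fields, and the sample resource names are hypothetical illustrations; this is not the Azure Network Security Perimeter API or its actual evaluation logic.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network
from typing import List, Optional, Set, Tuple


@dataclass
class PerimeterModel:
    """Hypothetical model of a network security perimeter (not the Azure API)."""
    members: Set[str] = field(default_factory=set)                # resources inside the perimeter
    inbound_cidrs: List[str] = field(default_factory=list)        # allowed public source IP ranges
    inbound_subscriptions: Set[str] = field(default_factory=set)  # allowed source subscription IDs
    outbound_fqdns: Set[str] = field(default_factory=set)         # allowed destination FQDNs
    access_log: List[Tuple[str, str, bool]] = field(default_factory=list)  # (direction, peer, allowed)

    def _record(self, direction: str, peer: str, allowed: bool) -> bool:
        # Every decision, allowed or denied, is appended to the access log.
        self.access_log.append((direction, peer, allowed))
        return allowed

    def allow_inbound(self, source_ip: Optional[str] = None,
                      source_subscription: Optional[str] = None,
                      source_resource: Optional[str] = None) -> bool:
        # Resources in the same perimeter can communicate freely.
        if source_resource is not None and source_resource in self.members:
            return self._record("inbound", source_resource, True)
        # Public traffic is denied by default unless an explicit rule matches.
        if source_ip is not None:
            addr = ip_address(source_ip)
            if any(addr in ip_network(cidr) for cidr in self.inbound_cidrs):
                return self._record("inbound", source_ip, True)
        if source_subscription is not None and source_subscription in self.inbound_subscriptions:
            return self._record("inbound", source_subscription, True)
        peer = source_ip or source_subscription or source_resource or "unknown"
        return self._record("inbound", peer, False)

    def allow_outbound(self, destination_fqdn: str) -> bool:
        # Outbound calls are allowed only to explicitly listed FQDNs (case-insensitive).
        allowed = destination_fqdn.lower() in {f.lower() for f in self.outbound_fqdns}
        return self._record("outbound", destination_fqdn, allowed)


perimeter = PerimeterModel(
    members={"workspace-a", "appinsights-a"},
    inbound_cidrs=["203.0.113.0/24"],  # documentation range standing in for a corporate range
    outbound_fqdns={"contoso-api.azurewebsites.net"},
)
print(perimeter.allow_inbound(source_ip="203.0.113.10"))          # True
print(perimeter.allow_inbound(source_ip="198.51.100.7"))          # False
print(perimeter.allow_inbound(source_resource="appinsights-a"))   # True
print(perimeter.allow_outbound("contoso-api.azurewebsites.net"))  # True
print(perimeter.allow_outbound("example.org"))                    # False
print(len(perimeter.access_log))                                  # 5
```

The default-deny ordering is the point of the sketch: a request is allowed only if some rule explicitly matches, and the denied attempts end up in the same log as the allowed ones, mirroring the auditing behavior described in this post.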
With the perimeter in place, you can lock down your Log Analytics workspace or Application Insights resource to accept data only from specific sources (e.g., certain IP ranges or other resources in your perimeter) and to send data only to authorized destinations. If anything outside those rules attempts to access your monitoring resources, Network Security Perimeter denies the attempt and logs it for auditing.

In short, Network Security Perimeter brings a new level of security to Azure Monitor: it lets organizations create a logical network boundary around their monitoring resources, much like a private enclave. This is crucial for customers in regulated industries (finance, government, healthcare) who need to ensure their cloud services adhere to strict network isolation policies. With the security perimeter, Azure Monitor can be safely deployed in environments that demand no public exposure and thorough auditing of network access. It’s an important step in strengthening Azure Monitor’s security posture and aligning with enterprise zero-trust networking principles.

Key Benefits of Network Security Perimeter in Azure Monitor

With Network Security Perimeter now generally available, Azure Monitor users gain several powerful capabilities:

🔒 Enhanced Security & Data Protection: Azure PaaS resources in a perimeter can communicate freely with each other, but external access is blocked by default. You define explicit inbound and outbound rules for any allowed public traffic, ensuring there is no unauthorized network access to your Log Analytics workspaces, Application Insights components, or other perimeter resources. This greatly reduces the risk of data exfiltration and unauthorized access to monitoring data.

⚖️ Granular Access Control: Network Security Perimeter supports fine-grained rules to tailor access. You can allow inbound access by specific IP address ranges or Azure subscription IDs, and allow outbound calls to specific Fully Qualified Domain Names (FQDNs).
For example, you might permit only your corporate IP range to send telemetry to a workspace, or allow a workspace to send data out only to contoso-api.azurewebsites.net. This level of control ensures that only trusted sources and destinations are used.

📜 Comprehensive Logging & Auditing: Every connection allowed or denied by Network Security Perimeter can be logged. Azure Monitor’s Network Security Perimeter integration provides unified access logs for all resources in the perimeter. These logs show exactly which connections were attempted, from where, and whether they were permitted or blocked. This is invaluable for auditing and compliance, for instance to prove that no external IPs accessed your workspace or to detect unexpected outbound calls. The logs can be sent to a Log Analytics workspace or a storage account for retention and analysis.

🔧 Seamless Integration with Azure Monitor Services: Network Security Perimeter is natively integrated across Azure Monitor’s services and workflows. Log Analytics workspaces and Application Insights components support Network Security Perimeter out of the box, meaning ingestion, queries, and alerts all enforce perimeter rules behind the scenes. Azure Monitor alerts (scheduled query rules) and action groups also work with Network Security Perimeter, so alert notifications and automation actions respect the perimeter (for example, an alert that sends to an Event Hub is checked against Network Security Perimeter rules). This end-to-end integration ensures that securing your monitoring environment with Network Security Perimeter doesn’t break any functionality: everything continues to work, but within your defined security boundary.

🤝 Consistent, Centralized Management: Network Security Perimeter introduces a uniform way to manage network access for multiple resources. You can group resources from different services (and even different subscriptions) into one perimeter and manage network rules in one place.
This “single pane of glass” approach simplifies operations: network admins can define a perimeter once and apply it to all relevant Azure Monitor components (and other supported services). It’s a more scalable and consistent method than maintaining separate firewall settings on each service. Network Security Perimeter uses Azure’s standard API and portal experience, so setting up a perimeter and its rules is straightforward.

🌐 No-Compromise Isolation (with Private Link): Network Security Perimeter complements existing network security options. If you’re already using Azure Private Link to keep traffic off the internet, Network Security Perimeter adds another layer of protection. Private Link secures traffic between your VNet and Azure Monitor; Network Security Perimeter secures Azure Monitor’s service endpoints themselves. Used together, they provide defense in depth: for example, a workspace can be accessible only via a private endpoint and, through Network Security Perimeter, accept data only from certain sources. This layered approach helps meet even the most stringent security requirements.

In conclusion, Network Security Perimeter for Azure Monitor provides strong network isolation, flexible control, and visibility, all integrated into the Azure platform. It helps organizations confidently use Azure Monitor in scenarios where they need to lock down network access and simplify compliance. For detailed information on configuring Azure Monitor with a Network Security Perimeter, see Configure Azure Monitor with Network Security Perimeter.

What’s new in Observability at Build 2025
At Build 2025, we are excited to announce new features in Azure Monitor designed to enhance observability for developers and SREs, making it easier for you to streamline troubleshooting, improve monitoring efficiency, and gain deeper insights into application performance. With our new AI-powered tools, customizable alerts, and advanced visualization capabilities, we’re empowering developers to deliver high-quality, resilient applications with greater operational efficiency.

AI-Powered Troubleshooting Capabilities

We are excited to announce two new AI-powered features, along with an update to a GA feature, that enhance troubleshooting and monitoring:

AI-powered investigations (Public Preview): Identifies possible explanations for service degradations through automated analyses, consolidating all observability-related data for faster problem mitigation. Attend our live demo at Build and learn more here.

Health models (Public Preview, coming in June 2025): Significantly improves the efficiency of detecting business-impacting issues in workloads, empowering organizations to deliver applications with operational efficiency and resilience through a full-stack view of workload health. Attend our live demo at Build to get a preview of the experience and learn more here.

AI-powered Application Insights Code Optimizations (GA): Provides code-level suggestions for .NET apps running on Azure. It is now easier to get code-level suggestions with GitHub Copilot coding agent (preview) and GitHub Copilot for Azure in VS Code. Learn more here.

Enhanced AI and agent observability

Azure Monitor and Azure AI Foundry now jointly offer real-time monitoring and continuous evaluation of AI apps and agentic systems in production. These capabilities are deeply integrated with the Foundry Observability experience and allow you to track key metrics such as performance, quality, safety, and resource usage.
Features include:

Unified observability dashboard for generative AI apps and agents (Public Preview): Provides full-stack visibility of AI apps and infrastructure, with AI app metrics surfaced in both Azure Monitor and Foundry Observability.

Alerts: Data is published to Azure Monitor Application Insights, allowing users to set alerts and analyze them for troubleshooting.

Debug with tracing capabilities: Enables detailed root-cause analysis of issues such as groundedness regressions.

Learn more in our breakout session at Build!

Improved Visualization

We have expanded our visualization capabilities, particularly for Kubernetes services:

Azure Monitor dashboards with Grafana (Public Preview): Create and edit Grafana dashboards directly in the Azure portal at no additional cost. This includes dashboards for Azure Kubernetes Service (AKS) and other Azure resources. Learn more.

Managed Prometheus Visualizations: Supports managed Prometheus visualizations for both AKS clusters (GA) and Arc-enabled Kubernetes clusters (Public Preview), offering a more cost-efficient and performant solution. Learn more.

Customized and Simplified Monitoring

Through enhancements to alert customization, we’re making it easier for you to get started with monitoring:

Prometheus community recommended alerts: Offers one-click enablement of Prometheus recommended alerts for AKS clusters (GA) and Arc-enabled Kubernetes clusters (Public Preview), providing comprehensive alerting coverage across cluster, node, and pod levels.

Simple log alerts (Public Preview): Designed to provide a simpler, more intuitive monitoring and alerting experience, simple log alerts evaluate each row individually, providing faster alerting than traditional log alerts. Simple log alerts support multiple log tiers, including Analytics and Basic Logs; Basic Logs previously had no alerting solution. Learn more.
Customizable email subjects for log search alerts (Public Preview): Allows customers to personalize alert email subject lines, including dynamic values, making it easier to quickly identify and respond to alerts.

Send a custom event from the Azure Monitor OpenTelemetry Distro (GA): Gives developers a way to track the user or system actions that matter most to their business objectives, now available in the Azure Monitor OpenTelemetry Distro. Learn more.

Application Insights auto-instrumentation for Java and Node microservices on AKS (Public Preview): Easily monitor your Java and Node deployments without changing your code by leveraging auto-instrumentation that is integrated into the AKS cluster. These capabilities help you assess the performance of your application and identify the cause of incidents efficiently. Learn more.

Enhancements for Large Enterprises and Government Entities

Azure Monitor Logs is introducing several new features aimed at supporting highly sensitive and high-volume logs for large enterprises and government entities. With better data control and access, developers at these organizations can work more effectively with IT professionals to improve the reliability of their applications.

Workspace replication (GA): Enhances resilience to regional incidents by enabling cross-regional workspace replication. Logs are ingested in both regions, ensuring continued observability through dashboards, alerts, and advanced solutions like Microsoft Sentinel.

Granular RBAC (Public Preview): Supports granular role-based access control (RBAC) using Azure attribute-based access control (ABAC). This gives organizations row-level control over which data is visible to specific users.

Data deletion capability (GA): Allows customers to quickly mark unwanted log entries, such as sensitive or corrupt data, as deleted without physically removing them from storage.
It’s useful for unplanned deletions: filters target specific records, helping preserve data integrity for analysis.

Process more log records in the Azure portal (GA): Supports up to 100,000 records per query in the Azure portal, enabling deeper investigations and broader data analysis directly within the portal without the need for additional tools.

We’re proud to further Azure Monitor's commitment to providing comprehensive and efficient observability solutions for developers, SREs, and IT professionals alike. For more information, chat with Observability experts at the following sessions at Build 2025:

BRK168: AI and Agent Observability with Azure AI Foundry and Azure Monitor
BRK188: Power your AI Apps Across Cloud and Edge with Azure Arc
DEM547: Enable application monitoring and troubleshooting faster with Azure Monitor
DEM537: Mastering Azure Monitor: Essential Tips in 15 Minutes
Expo Hall (Meet the Experts): Azure Arc and Azure Monitor booth

GA: Managed Prometheus visualizations in Azure Monitor for AKS — unified insights at your fingertips
We’re thrilled to announce the general availability (GA) of Managed Prometheus visualizations in Azure Monitor for AKS, along with an enhanced, unified AKS Monitoring experience.

Troubleshooting Kubernetes clusters is often time-consuming and complex, whether you're diagnosing failures, scaling issues, or performance bottlenecks. This redesign of the existing Insights experience brings all your key monitoring data into a single, streamlined view, reducing the time and effort it takes to diagnose, triage, and resolve problems so you can keep your applications running smoothly with less manual work. By using Managed Prometheus, customers can also realize up to 80% savings on metrics costs and up to 90% faster blade load performance, making it both a powerful and a cost-efficient way to monitor and optimize your AKS environment.

What’s New in GA

Since the preview release, we’ve added several capabilities:

Control plane metrics: Gain visibility into critical components like the API server and the etcd database, essential for diagnosing cluster-level performance bottlenecks.

Load balancer chart deep links: Jump directly into the networking drilldown view to troubleshoot failed connections and SNAT port issues more efficiently.

Improved at-scale cluster view: Get a faster, more comprehensive overview across all your AKS clusters, making multi-cluster monitoring easier.

Simplified Troubleshooting, End to End

The enhanced AKS Monitoring experience provides both a basic (free) tier and an upgraded experience with Prometheus metrics and logging, all within a unified, single-pane-of-glass dashboard. Here’s how it helps you troubleshoot faster:

Identify failing components immediately: With new KPI cards for pod and node status, you can quickly spot pending or failed pods, high CPU/memory usage, or saturation issues, decreasing diagnosis time.
Monitor and manage cluster scaling smoothly: The Events summary card surfaces Kubernetes warnings and pending pod states, helping you respond to scale-related disruptions before they impact production.

Pinpoint root causes of latency and connectivity problems: Detailed node saturation metrics, plus control plane and load balancer insights, make it easier to isolate where slowdowns or failures occur, whether at the node, cluster, or network layer.

Free vs. Upgraded Metrics Overview

Here’s a quick comparison of what’s included by default versus what you get with the enhanced experience:

Basic tier metrics: Alert summary card; Events summary card; Pod status KPI card; Node status KPI card; Node CPU and memory %; VMSS OS disk bandwidth consumed % (max); VMSS OS disk IOPS consumed % (max)

Additional metrics in upgraded experience: Historical Kubernetes events (30 days); Warning events by reason; Namespace CPU and memory %; Container logs by volume; Top five controllers by logs volume; Packets dropped I/O; Load balancer SNAT port usage; API server CPU % (max) (preview); API server memory % (max) (preview); ETCD database usage % (max) (preview)

See What Customers Are Saying

Early adopters have already seen meaningful improvements:

"Azure Monitor managed Prometheus visualizations for Container Insights has been a game-changer for our team. Offloading the burden of self-hosting and maintaining our own Prometheus infrastructure has significantly reduced our operational overhead. With the managed add-on, we get the powerful insights and metrics we need without worrying about scalability, upgrades, or reliability. It seamlessly integrates into our existing Azure environment, giving us out-of-the-box visibility into our container workloads. This solution allows our engineers to focus more on building and delivering features, rather than managing monitoring infrastructure."
– S500 customer in the healthcare industry

Get Started Today

We’re committed to helping you optimize and manage your AKS clusters with confidence. Visit the Azure portal and explore the new AKS Monitoring experience today!

Learn more: https://aka.ms/azmon-prometheus-visualizations

Public Preview: Metrics usage insights for Azure Monitor Workspace
As organizations expand their services and applications, reliability and high availability are a top priority to ensure they provide a high level of quality to their customers. As these services and applications grow more complex, organizations collect more telemetry to achieve higher observability. However, many face a common challenge: rising costs driven by the ever-growing volume of telemetry data. Over time, as products grow and evolve, not all telemetry remains valuable. In fact, over-instrumentation can create unnecessary noise, generating data that adds cost without delivering actionable insights. At a time when every team is being asked to do more with less, identifying which telemetry streams truly matter has become essential.

To address this need, we are announcing the Public Preview of metrics usage insights, a feature currently designed for Azure Managed Prometheus users that analyzes all metrics ingested into an Azure Monitor workspace (AMW), surfacing actionable insights to optimize your observability setup.

Metrics usage insights is built to give teams the visibility and tools they need to manage observability costs effectively. It helps customers pinpoint the metrics that align with their business objectives, uncover unnecessary spend by identifying unused metrics, and sustain a streamlined, cost-effective monitoring approach.

Metrics usage insights sends usage data to a Log Analytics workspace (LAW) for analysis. This is a free offering: there is no charge for the data sent to the Log Analytics workspace, or for its storage or queries. Customers will be guided to enable the feature as part of the standard out-of-the-box experience when creating a new AMW resource. For existing AMWs, it can be configured using diagnostic settings.
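The core idea behind identifying unused metrics, comparing what is ingested against what is actually read by dashboards, alerts, and API calls, can be sketched in a few lines of Python. This sketch assumes you already have per-metric sample counts and query counts for some analysis window; the `MetricUsage` record, the helper functions, and the sample numbers are hypothetical and do not reflect the schema that metrics usage insights writes to Log Analytics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MetricUsage:
    """Hypothetical per-metric usage record for one analysis window."""
    name: str
    samples_ingested: int  # samples ingested (billable volume)
    query_count: int       # reads from dashboards, alerts, and API calls


def unused_metrics(usage: List[MetricUsage]) -> List[MetricUsage]:
    """Metrics that were ingested but never queried, largest first."""
    unused = [m for m in usage if m.samples_ingested > 0 and m.query_count == 0]
    return sorted(unused, key=lambda m: m.samples_ingested, reverse=True)


def ingestion_share(subset: List[MetricUsage], usage: List[MetricUsage]) -> float:
    """Fraction of total ingested samples accounted for by `subset`."""
    total = sum(m.samples_ingested for m in usage)
    if total == 0:
        return 0.0
    return sum(m.samples_ingested for m in subset) / total


# Illustrative, made-up numbers for four Prometheus metrics.
usage = [
    MetricUsage("container_cpu_usage_seconds_total", 1_200_000, 35),
    MetricUsage("kube_pod_status_phase", 300_000, 12),
    MetricUsage("container_fs_reads_total", 900_000, 0),
    MetricUsage("node_network_receive_packets_total", 600_000, 0),
]
candidates = unused_metrics(usage)
print([m.name for m in candidates])        # ['container_fs_reads_total', 'node_network_receive_packets_total']
print(ingestion_share(candidates, usage))  # 0.5
```

A metric with zero queries in the window is only a candidate for removal: it may still be needed for incident investigations that fall outside the analysis window, so a review step before dropping it is prudent.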
Key Features

1. Understanding Limits and Quotas for Effective Resource Management
Monitoring limits and quotas is crucial for system performance and resource optimization. Tracking usage supports efficient scaling and helps avoid unnecessary costs. Metrics usage insights provides tools to monitor thresholds, resolve throttling, and ensure cost-effective operations without the need to create support incidents.

2. Workspace Exploration
This experience lets customers explore their AMW data and gain insights. It provides a detailed analysis of the data points and samples ingested for billing, at both the metric and workspace levels. Customers can evaluate individual metrics by examining their quantity, ingestion volume, and financial impact.

3. Identifying and Removing Unused Metrics
The metrics usage insights feature helps identify underutilized metrics that are ingested but not used by dashboards, monitors, or API calls. Users facing high storage and ingestion costs can use this feature to delete unused metrics, optimize high-cost metrics, and reclaim capacity.

Enable metrics usage insights

To enable metrics usage insights, you create a diagnostic setting, which instructs the AMW to send data supporting the insights queries and workbooks to a Log Analytics workspace (LAW). When you create a new Azure Monitor workspace, you'll be prompted to enable it; for an existing Azure Monitor workspace, you can enable it later.

Azure Monitor Application Insights Auto-Instrumentation for Java and Node Microservices on AKS
Key Takeaways (TL;DR)

Monitor Java and Node applications with zero code changes
Fast onboarding: just 2 steps
Supports distributed tracing, logs, and metrics
Correlates application-level telemetry in Application Insights with infrastructure-level telemetry in Container Insights
Available today in public preview

Introduction

Monitoring your applications is now easier than ever with the public preview release of Auto-Instrumentation for Azure Kubernetes Service (AKS). You can now monitor your Java and Node deployments without changing your code by leveraging auto-instrumentation that is integrated into the AKS cluster.

This feature is ideal for developers or operators who are:

Looking to add monitoring in the easiest way possible, without modifying code and avoiding ongoing SDK update maintenance.
Starting out on their monitoring journey and looking to benefit from carefully chosen default configurations, with the ability to tweak them over time.
Working with someone else’s code and looking to instrument at scale.
Considering monitoring for the first time at deployment.

Before this feature, users needed to manually instrument code, install language-specific SDKs, and manage updates on their own, a process that involved significant effort and numerous opportunities for errors. Now, all you need to do is follow a simple two-step process to instrument your applications and automatically send correlated OpenTelemetry-based application-level logs, metrics, and distributed traces to your Application Insights resource.

With AKS Auto-Instrumentation, you can assess the performance of your application and identify the cause of incidents more efficiently using the robust application performance monitoring capabilities of Azure Monitor Application Insights. This streamlined approach not only saves time but also helps keep your monitoring setup reliable and scalable.
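The correlation between Application Insights and Container Insights mentioned in the takeaways is possible because each span carries resource attributes identifying the Kubernetes objects it ran on. Below is a minimal Python sketch of a triage heuristic built on that idea: find slow or failed spans, look up the health of the pods that produced them, and point at the infrastructure only if one of those pods is unhealthy. The `Span` type, the `triage` function, the pod and operation names, and the 1-second threshold are hypothetical; `k8s.namespace.name` and `k8s.pod.name` are OpenTelemetry semantic-convention attribute names, but this is not how Application Insights implements the correlation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Tuple

PodKey = Tuple[str, str]  # (namespace, pod name)


@dataclass
class Span:
    operation: str
    duration_ms: float
    failed: bool
    # OpenTelemetry resource attributes describing where the span ran.
    resource: Dict[str, str] = field(default_factory=dict)

    def pod(self) -> PodKey:
        return (self.resource.get("k8s.namespace.name", ""),
                self.resource.get("k8s.pod.name", ""))


def triage(spans: Iterable[Span], pod_healthy: Dict[PodKey, bool],
           slow_ms: float = 1000.0) -> str:
    """Hypothetical heuristic: 'infrastructure' if a pod behind a bad span is unhealthy."""
    suspects = [s for s in spans if s.failed or s.duration_ms >= slow_ms]
    if not suspects:
        return "no issue"
    # Pods with no health data are treated as healthy in this sketch.
    if any(not pod_healthy.get(s.pod(), True) for s in suspects):
        return "infrastructure"
    return "application"


spans = [
    Span("rescuepet", 10_000, True,
         {"k8s.namespace.name": "demoapp", "k8s.pod.name": "microservicea-0"}),
    Span("listpets", 120, False,
         {"k8s.namespace.name": "demoapp", "k8s.pod.name": "microserviceb-0"}),
]
print(triage(spans, {("demoapp", "microservicea-0"): True}))   # application
print(triage(spans, {("demoapp", "microservicea-0"): False}))  # infrastructure
```

With healthy pods behind a slow, failing operation, the heuristic points at the application layer, which matches the out-of-memory conclusion reached in the troubleshooting walkthrough in this post.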
Feature Enablement and Onboarding

To onboard to this feature, follow a two-step process:

1. Prepare your cluster by installing the application monitoring webhook.
2. Choose between namespace-wide onboarding or per-deployment onboarding by creating Kubernetes custom resources.

Namespace-wide onboarding is the easiest method. It allows you to instrument all Java or Node deployments in your namespace and direct telemetry to a single Application Insights resource. Per-deployment onboarding allows more control by targeting specific deployments and directing telemetry to different Application Insights resources.

Once the custom resource is created, deploy or redeploy your application, and telemetry will start flowing to Application Insights. For step-by-step instructions and to learn more about onboarding, visit our official documentation on Microsoft Learn.

The Application Insights experience

Once telemetry begins flowing, you can take advantage of Application Insights features such as Application Map, Failures/Performance views, Availability, and more to help you efficiently diagnose and troubleshoot application issues.

Let’s look at an example: I have an auto-instrumented distributed application running in the demoapp namespace of my AKS cluster. It consists of:

One Java microservice
Two Node.js microservices
MongoDB and Redis as its data layer

Scenario: End users have been complaining about latency in the application. As the DRI, I start my troubleshooting journey by going to the Application Map to get a topological view of my distributed application.

I open Application Map and notice MicroserviceA has a red border: 50% of calls are erroring. The Container Insights card shows healthy pods, with no failed pods or high CPU/memory usage, so I can rule out infrastructure issues as the cause of the slowness.

In the Performance card, I spot that the rescuepet operation has an average duration of 10 seconds. That's pretty long.
I drill in to get a distributed trace of the operation and find the root cause: an OutOfMemoryError.

In this scenario, the issue was identified as an out-of-memory error at the application layer. However, when the root cause is in the infrastructure rather than the code, I get a full set of resource properties with every distributed trace, so I can easily identify the infrastructure resources running each span of my trace. I can select the investigate pods button to transition to Azure Monitor Container Insights and investigate my pods further. This correlation between application-level and infrastructure-level telemetry makes it much easier to determine whether an issue is caused by the application or the infrastructure.

Pricing

There is no additional cost to use AKS auto-instrumentation to send data to Azure Monitor. You are charged only according to current Azure Monitor pricing.

What’s Next

Language Support

This integration supports Java and Node workloads by leveraging the Azure Monitor OpenTelemetry distro. We also have distros for .NET and Python, and we are working to integrate them into this solution. At that point, this integration will support .NET, Python, Java, and Node.js.

For customers who want to instrument workloads in other languages, such as Go, Ruby, or PHP, we plan to leverage open-source instrumentations available in the OpenTelemetry community. In this scenario, customers will instrument their code using open-source OpenTelemetry instrumentations, and we will provide mechanisms that make it easy to channel the telemetry to Application Insights: Application Insights will expose an endpoint that accepts OpenTelemetry Protocol (OTLP) signals, and the instrumented workload will be configured to send its telemetry to this endpoint.
Operating Systems and Kubernetes Controllers

Right now, you can only instrument Kubernetes deployments running on Linux node pools, but we plan to add support for Linux ARM64 node pools as well as for the StatefulSet, Job, CronJob, and ReplicaSet controller types.

Portal Experiences

We are also working on Azure portal experiences to make onboarding easier. When our portal onboarding experiences are released, users will be able to install the Application Insights extension for AKS using the portal and use a portal user interface to instrument their workloads instead of having to create custom resources.

Beyond onboarding, we are working to build Application Insights consumption experiences within the AKS namespace and workloads blades. You will be able to see application-level telemetry right in the AKS portal without having to navigate away from your cluster to Application Insights.

FAQs

What are the advantages of AKS Auto-Instrumentation?
No code changes required
No access to source code required
No configuration changes required
Eliminates instrumentation maintenance

What languages are supported by AKS Auto-Instrumentation?
Currently, AKS Auto-Instrumentation supports Java and Node.js applications. Python and .NET support is coming soon. We will also add support for all OpenTelemetry-supported languages, such as Go, via native OTLP ingestion.

Does AKS Auto-Instrumentation support custom metrics?
For Node.js applications, custom metrics require manual instrumentation with the Azure Monitor OpenTelemetry Distro. Java applications support custom metrics with auto-instrumentation.

Click here for more FAQs.

This article was co-authored by Rishab Jolly and Abinet Abate
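The current support boundaries described in this post (Deployment controllers only, Linux node pools with ARM64 not yet supported, and Java or Node.js workloads) can be summarized as a small eligibility check. This is a minimal Python sketch under those stated constraints; the `Workload` type, its fields, and the `eligibility_issues` helper are hypothetical, the assumption that "amd64" is the only supported architecture is inferred from ARM64 being listed as planned, and the authoritative, up-to-date support matrix is the official documentation on Microsoft Learn.

```python
from dataclasses import dataclass
from typing import List

# Constraints as described in this post at the time of writing (public preview).
SUPPORTED_KINDS = {"Deployment"}
SUPPORTED_OS = {"linux"}
SUPPORTED_ARCH = {"amd64"}  # assumption: ARM64 node pools are listed as planned
SUPPORTED_LANGUAGES = {"java", "nodejs"}


@dataclass
class Workload:
    """Hypothetical description of a workload; not a Kubernetes API type."""
    name: str
    kind: str      # Kubernetes controller kind, e.g. "Deployment"
    os: str        # node pool operating system
    arch: str      # node pool CPU architecture
    language: str  # application runtime


def eligibility_issues(w: Workload) -> List[str]:
    """Return reasons a workload can't be auto-instrumented yet (empty list = eligible)."""
    issues = []
    if w.kind not in SUPPORTED_KINDS:
        issues.append(f"controller kind {w.kind} not yet supported")
    if w.os.lower() not in SUPPORTED_OS:
        issues.append(f"OS {w.os} not supported")
    if w.arch.lower() not in SUPPORTED_ARCH:
        issues.append(f"architecture {w.arch} not yet supported")
    if w.language.lower() not in SUPPORTED_LANGUAGES:
        issues.append(f"language {w.language} not yet supported")
    return issues


print(eligibility_issues(Workload("pets-api", "Deployment", "linux", "amd64", "java")))  # []
print(eligibility_issues(Workload("nightly-report", "CronJob", "linux", "arm64", "python")))
```

Returning a list of reasons rather than a single boolean makes it easy to see which of the planned expansions (new controller types, ARM64, additional languages) would unblock a given workload.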