azure monitor
1345 TopicsModern VM monitoring, powered by OpenTelemetry
At Build 2026, we're announcing the general availability of OpenTelemetry (OTel) Guest OS metrics for Azure VMs and Arc-enabled Servers. OTel provides a standards-based foundation for VM monitoring with consistent metrics across Windows and Linux, richer Guest OS and per-process visibility, and streamlined integration with open-source and cloud-native observability tools. Alongside the GA, we're introducing an enhanced VM monitoring experience, recommended alerts, and out-of-the-box Grafana dashboards, all powered by OTel Guest OS metrics. We're also sharing upcoming VM troubleshooting capabilities in the Azure Copilot observability agent enriched by OTel Guest OS metrics. What are OpenTelemetry Guest OS metrics OTel Guest OS metrics are collected from inside a VM. Today's coverage includes a curated set of CPU, memory, disk I/O, networking, and per-process metrics including CPU utilization, memory usage, uptime, and thread count. The supported set is point-in-time and will continue to expand as the OTel Host Metrics Receiver evolves upstream. This level of visibility helps customers diagnose operating system and application issues without manually signing into individual VMs. Why they matter 1. Lower cost and faster queries Default OTel Guest OS metrics are available at no additional cost. They are stored in Azure Monitor Workspace using metric-optimized storage and pricing, providing lower cost and faster query performance compared to LA-based metrics. 2. Per-process visibility for deeper troubleshooting Customers can optionally enable per-process metrics for deeper visibility into VM resource consumption. This helps identify noisy processes, memory leaks, runaway jobs, or resource-intensive applications without manually signing into the VM. 3. Consistent metrics across Windows and Linux Use the same metric names, dashboards, and alerts across operating systems without maintaining separate monitoring configurations. 4. Native PromQL support Use PromQL with the scale and managed experience of Azure Monitor Workspace. 5. OpenTelemetry-based standardization Use the same metrics across Azure Monitor, existing OTel pipelines, or other compatible observability backends. Log Analytics (LA)‑based metrics vs OTel‑based metrics Customers running workloads on Azure VMs and Arc-enabled Servers have long relied on Log Analytics (LA)-based metrics for fleet visibility. That experience continues to be generally available and trusted by thousands of customers. We recommend evaluating your requirements to determine which approach best suits your needs. LA-based metrics remain the foundation for customers who need advanced analytics and correlation, while OTel-based metrics open new possibilities for modern VM observability. Learn more. New Capabilities Powered by OpenTelemetry VM monitoring experience powered by OpenTelemetry (GA) We're excited to announce the general availability of the enhanced monitoring experience for Azure VMs and Arc servers. This experience brings comprehensive monitoring capabilities in a single, streamlined view, helping you more efficiently observe, diagnose, and optimize your virtual machines. The new experience offers two levels of insight within one unified interface: Basic view (Host OS-based): Available for all Azure VMs with no configuration required. This view surfaces key host-level metrics including CPU, disk, and network performance for quick health checks. Detailed view (Guest OS-based): Requires simple onboarding. Azure Monitor continues to support the GA detailed view powered by Log Analytics-based metrics. Customers can now choose to power the experience using OTel Guest OS metrics, which enable recommended alerts and provide expanded visibility into Guest OS and process-level resource consumption, including CPU, memory, disk I/O, and networking. Dashboards with Grafana for VMs For deeper analysis and customization, customers can leverage Azure Monitor dashboards with Grafana powered by OTel Guest OS metrics and PromQL at no additional cost. Built-in dashboards provide out-of-the-box visualizations for at-scale monitoring, host-level monitoring, Guest OS monitoring, and per-process monitoring, while still allowing teams to: Customize panels and dashboards Run ad hoc investigations Import dashboards from the Grafana community Share dashboards using Azure RBAC and ARM/Bicep deployment support Together, the enhanced VM monitoring experience and Grafana dashboards provide both streamlined day-to-day monitoring and flexible deep troubleshooting capabilities for modern VM environments. Query metrics in the context of your resources (GA) We’re also announcing the general availability of resource-scope querying for Azure Monitor Workspace (AMW) metrics, including OTel Guest OS metrics. With resource-scope query, you can query metrics directly from the context of a resource, resource group, or subscription, without needing to know which workspace stores the data. This simplifies troubleshooting, aligns with Azure-native workflows, and enforces least-privilege access using Azure RBAC. This capability powers scenarios like querying OTel Guest OS metrics directly from the Virtual Machine resource in Azure Portal, or resources can be scoped as a dedicated data source in Grafana to query with PromQL, making it easier for application and infrastructure teams to monitor and troubleshoot in the context of their workloads. Coming soon: Observability Agent Troubleshooting for VMs (Public Preview) Today, the Observability Agent helps customers investigate issues by correlating applications, infrastructure signals, LA-based metrics, logs, alerts, health information, and recent changes into a guided investigation narrative. Support for OTel Guest OS metrics is coming soon, extending investigations with richer Guest OS and per-process visibility. With OTel Guest OS metrics, the Observability Agent will be able to incorporate finer-grained operating system and process-level insights into its analysis, helping customers more quickly identify resource bottlenecks and understand their impact on application performance. Instead of manually piecing signals together across multiple tools and timelines, customers will receive a guided investigation summary with likely causes and recommended next steps. Combined with the new VM monitoring experience and Grafana dashboards, customers will have both AI-assisted investigations and powerful manual troubleshooting tools built on the same OTel foundation. Onboarding VMs at scale to OpenTelemetry Onboarding Azure VMs and Arc-enabled Servers to OTel Guest OS metrics is now simpler and more cost-efficient than ever. For teams getting started at scale, the easiest path is through the Monitoring Coverage experience in the Azure portal, where you can review recommended resources and onboard VMs through a guided workflow. Customers that prefer infrastructure-as-code can use ARM and Bicep templates to apply the same monitoring configuration programmatically. Azure Advisor recommendations provide another seamless entry point for onboarding, proactively identifying VMs that are not fully monitored and guiding customers to enable OTel -based monitoring with a few clicks. This helps teams continuously improve coverage across their fleet without needing to manually audit resources. Customers can now also reuse an existing Data Collection Rule (DCR) during onboarding, making it easier to standardize monitoring across large VM fleets. After onboarding, teams can centrally evolve their monitoring configuration by updating that DCR to collect additional metrics and logs, with changes applying across all associated VMs. Get Started Explore the new OpenTelemetry-powered experiences today: Enable enhanced monitoring for an Azure virtual machine - Azure Monitor Migrate from logs-based to OpenTelemetry metrics for Azure virtual machines - Azure Monitor Metrics experience for virtual machines in Azure Monitor - Azure Monitor Use Dashboards with Grafana for Azure Virtual Machines - Azure Monitor82Views0likes0CommentsAzure Monitor SLIs now Generally Available
Azure Monitor SLIs are now generally available Service Level Indicators (SLIs) and Service Level Objectives (SLOs) in Azure Monitor are now generally available. Teams can now measure reliability based on customer experience, not just infrastructure signals. SLI: A quantitative measure of how well an application or service is performing from the customer’s point of view. SLO: A defined target for an SLI that represents how good or bad the SLI is over a given time-period. This is also referred to as a baseline in Azure Monitor. Traditional monitoring shows what is happening across your systems, but not always what customers are experiencing. A service can be technically available and still feel unreliable because of latency, partial failures, or dependency issues. SLIs help close that gap by measuring reliability from the customer’s point of view. With GA, Azure Monitor now brings SLI authoring, SLO tracking, error budgets, and burn rate–based alerting into one experience, helping teams focus on whether they are meeting the reliability their customers expect. What Azure Monitor SLI helps you do Azure Monitor SLI lets you measure availability and latency with either request-based or window-based evaluation methods. In Azure Monitor, SLIs are defined at the Service Group level, which provides a logical representation of your application across multiple resources. This gives teams a clearer view of application health, customer impact, and the signals that matter most. SLIs continuously evaluate your service by using existing Azure Monitor metrics and store the resulting evaluations in your Azure Monitor Workspace. Azure Monitor uses these SLI evaluations to power error budgets, burn rate visualization, and alerting. This helps teams spot reliability issues earlier and make better release and incident response decisions. Get started To get started, you’ll need: A Service Group. Application metrics flowing into an Azure Monitor Workspace, for example through Managed Prometheus or OpenTelemetry Collect and analyze OpenTelemetry data with Azure Monitor (Preview) - Azure Monitor | Microsoft Learn Learn more here. Summary Azure Monitor SLI helps teams measure customer experience, track reliability against clear targets, and respond sooner with error budgets and burn rate–based alerting. Learn more in the product documentation and start defining SLIs for your services in Azure Monitor today.67Views0likes0CommentsAzure Monitor Metrics Export Generally Available
Today, we’re excited to announce the general availability of Azure Monitor Metrics Export using data collection rules (DCRs). A scalable, flexible way to continuously export platform metrics with dimensional fidelity, lower latency, and more control over what you send downstream. Azure Monitor Metrics Export is configured through data collection rules and can route platform metrics to Azure Storage accounts, Azure Event Hubs, or Azure Log Analytics workspaces. Compared to diagnostic settings, DCR-based metrics export supports multidimensional metrics, metric-name filtering, and improved scalability for large environments. Here are some of the key benefits of Azure Monitor Metrics Export: Control what you export: You can export all supported metrics for a resource type or filter to specific metric names, helping reduce downstream volume and manage cost. Preserve dimensional fidelity: The DCR-based metric export supports multidimensional metrics, making downstream analysis and correlation more meaningful. Get faster export latency: End-to-end export latency is typically within about 3 minutes, improving time to insight for operational and analytics workflows. With Azure Monitor Metrics Export, organizations can build more scalable observability pipelines, route metrics to the destinations that fit their architecture, and unlock richer analysis for operations, reporting, and integration scenarios. What’s new in GA With general availability, Azure Monitor Metrics Export offers a production-ready path to continuously stream supported platform metrics using data collection rules. Azure Monitor Metrics Export now covers 44 Azure regions, up from 12 regions previously. This expanded footprint helps more customers adopt DCR-based metrics export closer to where their resources run, improving rollout flexibility for global deployments. Customers can export metrics to Azure Storage, Azure Event Hubs, or Azure Log Analytics, preserve metric dimensions, and filter by metric name to better control downstream volume and cost. Learn more about metrics export using data collection rules. We’re excited to make Azure Monitor Metrics Export generally available and look forward to seeing how customers use it to build more reliable, cost-conscious, and extensible monitoring solutions on Azure.70Views0likes0CommentsPUBLIC PREVIEW - Azure Monitor - Collect Azure Resource Platform Logs at Scale with DCRs
PUBLIC PREVIEW - Azure Monitor - Collect Azure Resource Platform Logs at Scale with DCRs. How DCR-based platform logs simplify the telemetry collection for organizations managing 1,000+ resources.385Views2likes0CommentsAzure Monitor Copilot Observability Agent: What’s new at Build
The Observability agent in Azure Copilot is an AI-powered assistant built into Azure Monitor that helps engineers investigate issues and explore their systems using natural language. By grounding its analysis in telemetry data such as metrics, logs, and traces, it supports both open-ended exploration and guided troubleshooting. For more details, see the documentation. Since our initial public preview, the Observability agent in Azure Copilot has continued to evolve with new capabilities and expanded coverage (You can read more about the initial release in our previous blog) At Build 2026, we’re introducing updates that expand the Observability agent’s capabilities and the range of scenarios it can support. These updates provide deeper analysis and more detailed responses for both exploration and investigation. Expanded Investigation Scenarios The Observability agent now supports a broader set of scenarios across applications and infrastructure. These can be accessed directly from relevant product experiences, without requiring a prior alert, allowing teams to explore data conversationally and initiate deeper investigations as signals emerge. Integration with Microsoft Foundry AI Agent The Observability agent integrates with Microsoft Foundry AI Agents, enabling correlation of signals across key generative AI and agent observability scenarios such as latency spikes, error patterns, and tool invocation failures. Teams can interact with the Observability agent either from alerts - including alerts based on Foundry telemetry - or directly within Application Insights, where the Agents details experience serves as the primary entry point. From there, users can use the Observability agent to diagnose errors, analyze trends, and explore their data across one or multiple agents. Application Insights integration The Observability agent enables investigation of failure scenarios directly from Application Insights Failures blade, allowing teams to analyze application-level issues and move from symptom to root cause. Azure Kubernetes Service (AKS) integration The Observability agent enables deep investigation of issues in Azure Kubernetes Service (AKS) clusters. AKS investigations correlate signals from Azure Monitor with Kubernetes logs and events, and (coming soon) Prometheus metrics stored in an Azure Monitor Workspace. Together, these signals enable full‑stack analysis of applications running on AKS. The Observability agent helps teams determine whether an issue originates from the application or from the underlying Kubernetes platform, reducing time to diagnosis and resolution. Activity Logs integration Investigations can be initiated based on Azure Resource Health events surfaced in Activity Logs, enabling analysis of service-impacting signals related to the Azure platform. Deeper Insights across systems Multiple Application Insights - Coming soon! The Observability agent supports investigations that can span multiple Application Insights resources, enabling scenarios that involve multiple services within distributed applications. The agent can guide users to expand the investigation scope when cross-service issues are detected. Integration with Azure Service Health The Observability agent correlates investigation context with Azure Service Health events, helping teams understand potential platform impact as part of their investigation. This helps distinguish application-level issues from broader Azure platform conditions and prioritize active impacts. Issue management Enhancements Viewing issues Issues can now be viewed in multiple places, depending on the required scope: Azure Monitor: showing issues across all Azure Monitor Workspaces (AMWs) under the selected subscriptions Azure Monitor Workspace: showing issues stored within a specific AMW Issue actions & notifications Issue actions trigger notifications when issues are created or updated, enabling integration with workflows such as email, webhooks, and automation. Sharing and follow-up You can now download investigation results as a PDF, including supported data, enabling teams to capture and share investigation context for incident reviews and reporting. Coming Soon Billing for the Observability agent starts on July 1, 2026. The agent uses a consumption-based pricing model, so customers pay only for the AI work the agent performs. Agent consumption is measured in Azure Agent Credit (AAC) units, which reflect how many LLM tokens the agent used. For more details, see the documentation. Stay connected Follow this blog for ongoing updates and deeper dives into new capabilities Join our upcoming webinar for real-world scenarios, best practices, and a look at what’s coming next 👉 Register here We’d love your feedback The Observability Agent continues to evolve based on real-world usage and customer feedback. Share feedback through the Give Feedback option in the product or contact us at: azureobsagent@microsoft.com Want to learn more? Read our previous blog posts - Public Preview Update: Azure Copilot Observability Agent | Microsoft Community Hub The Azure Copilot Observability Agent Chat - Stop Writing Queries, Start Asking Questions. | Microsoft Community Hub Explore our documentation - Azure Copilot observability agent (preview) - Azure Monitor | Microsoft Learn358Views0likes1CommentIs 94% of your syslog just noise? Now you can filter it out before ingestion.
At Microsoft Build 2026, we are announcing the public preview of multi-stage transformations for Azure Monitor Data Collection Rules (DCRs). Multi-stage transformations let you filter, aggregate, parse, and map your logs at the point of collection, before data is ingested into your workspace. Processing happens in a defined sequence of steps called processors, and you can chain them together to build precise data pipelines that reduce ingestion volume, improve data quality, and lower monitoring costs. Processors in orange run on the agent (client-side). The KQL transform in green runs in the ingestion pipeline. Data volume shrinks at each stage. What are multi-stage transformations? A Data Collection Rule defines how Azure Monitor collects, transforms, and routes telemetry data. Until now, DCRs supported a single KQL transformation step on the ingestion side. Multi-stage transformations extend this model by introducing a processor pipeline: an ordered sequence of processing steps that run on the agent (client-side) or at the ingestion endpoint (ingestion-side), or both. Each processor performs one operation: filtering records, parsing structured fields from raw text, renaming or dropping columns, aggregating metrics, or running a KQL expression. Processors execute in order, and the output of one becomes the input to the next. This composable design replaces what previously required complex, monolithic KQL queries or external pre-processing scripts. Client-side processors run on the Azure Monitor Agent before data leaves the source machine. This means filtered and aggregated data never crosses the network, reducing both egress and ingestion costs. Ingestion-side processors run in the Log Analytics ingestion pipeline and support KQL-based transformations for more complex logic. Key applications The most immediate use case is cost reduction. When you can filter records on the agent before they leave the machine, you stop paying for data you never query. Syslog is the classic example: in many environments, informational and debug messages make up the vast majority of volume, and none of it gets looked at unless something breaks. A single filter processor can cut that stream by 90% or more. Aggregation is equally powerful for high-frequency telemetry. Performance counters sampled every 15 seconds produce millions of records per hour across a large fleet, but most dashboards and alert rules only need 5-minute granularity. Rolling up those samples on the agent, before they cross the network, dramatically reduces ingestion without losing the operational signal your team actually relies on. Beyond cost, multi-stage transformations improve the quality of the data that does reach your workspace. Parsing structured fields out of raw text (JSON payloads, XML event data, CEF security logs) at collection time means downstream queries are simpler and faster. And because each processor handles one step in a readable sequence, maintaining the pipeline is far easier than debugging a single monolithic KQL expression that tries to do everything at once. To make this concrete, let’s walk through the two highest-impact patterns we see with preview customers: filtering noisy syslog data and aggregating performance counters. Filter data before ingestion The filter processor evaluates each record against conditions you define and drops anything that does not match. Because filtering runs on the agent, dropped records are never serialized, transmitted, or ingested. This makes it the highest-impact processor for cost reduction. You configure filters using simple field-level conditions: specify a column name, an operator (equals, not equals, greater than, contains, etc.), and a value. Conditions can be combined with AND/OR logic for precise control. Scenario: Keep only warning-and-above syslog messages A typical syslog stream generates thousands of informational and debug messages for every actionable warning or error. With a filter processor, you set a severity threshold, and the agent drops everything below it before transmission. In this example, the filter keeps records where SeverityNumber >= 4 (Warning). The 57,000 debug and informational records per hour are dropped on the machine. Only the 3,250 actionable records are transmitted and ingested, a 94% reduction in syslog volume. Filters also support compound conditions. For example, you can keep auth-facility errors OR any critical message regardless of facility, all in a single processor step. This kind of targeted filtering is especially useful for security teams that need specific event categories without paying for the full syslog firehose. Aggregate logs before ingestion The aggregate processor rolls up high-frequency records into time-windowed summaries on the agent. This is especially valuable for performance counters, heartbeat signals, and any telemetry where per-second granularity is not needed for operational decisions. You configure the processor with a time window (for example, 5 minutes), the aggregation operators to apply (average, sum, min, max, count), and the dimension columns to group by (such as host name and counter name). The agent collects records within each window, computes the aggregates, and emits one summary record per group. Scenario: Roll up performance counters into 5-minute summaries A fleet of 500 VMs, each reporting 10 performance counters every 15 seconds, generates roughly 2 million raw records per hour. Most operational dashboards and alert rules use 5-minute granularity, making the per-sample detail redundant. With the aggregate processor, each agent rolls up its local counter stream into 5-minute windows, grouped by counter name. Each summary record contains the average, maximum, and sample count for that window. Raw data After aggregation (5-min windows) Records per VM per hour 2,400 (10 counters x 4/min x 60 min) 120 (10 counters x 12 windows) Records across 500 VMs per hour 1,200,000 60,000 Volume reduction 95% Operational fidelity Per-sample (15s) Avg, max, and count per 5 min Because the aggregation runs on the agent, the reduced data set is what gets transmitted and ingested. Dashboards and alerts that rely on 5-minute granularity work identically, but ingestion costs drop by 95%. Route the output to a custom table with columns that match the aggregate output (average, max, count, and your dimension columns). Chain processors for complete pipelines Processors are composable. A common pattern chains a header processor (to convert raw data into tabular format), a filter (to drop irrelevant records), a parse step (to extract fields from structured payloads), and a column drop (to remove fields not needed downstream). Scenario: Parse, filter, and slim down Windows Event logs Consider a security team that needs logon success and failure events (Event IDs 4624 and 4625) from the Windows Security log. The raw event stream contains hundreds of event types, each carrying a large XML payload. A four-step pipeline handles this: Header processor converts the raw event stream into tabular rows Parse processor extracts EventID and TargetUser from the XML payload into typed columns Filter processor keeps only logon success (4624) and failure (4625) events, dropping everything else Drop processor removes the bulky RawXml and RenderingInfo columns that are no longer needed The result is a lean, security-focused data set containing only the events and fields the team actually queries. Each step is independent and can be modified without affecting the others. Authoring multi-stage DCRs Multi-stage transformations are available through the Azure portal and through the REST API (version 2025-05-11). The portal provides a visual editor for building processor pipelines, previewing the schema at each stage, and validating the configuration before deployment. The Transform tab in the DCR data source configuration lets you add processors at each stage and preview the resulting schema. For infrastructure-as-code workflows, the full DCR JSON can be authored and deployed via ARM templates, Bicep, or direct REST API calls. To get started: Open Azure Monitor in the Azure portal and navigate to Data Collection Rules Create a new DCR or edit an existing one In the data source configuration, select Edit transformation Author your transformation logic across client and ingestion stages using the set of available processors Preview the schema output at each stage to verify the pipeline produces the expected result Save and associate the DCR with your target resources Preview notes: Multi-stage transformations are available in public preview starting June 3, 2026 Client-side processors require Azure Monitor Agent version 1.35 or later Aggregation output must be routed to custom tables (standard table schemas do not match aggregate output) Data collection, workspace ingestion, and alert rules may incur costs based on the settings you enable. Preview pricing may differ from general availability pricing. See Azure Monitor pricing for current rates To learn more, see: Data Collection Rules overview Looking ahead Multi-stage transformations are part of our continued investment in giving teams control over their data before it reaches the workspace. During the preview period, we plan to expand processor coverage, add support for additional data source types, and incorporate user feedback into the authoring and validation experience. We are also exploring how multi-stage transformations can serve as the foundation for advanced scenarios such as data scrubbing, inline enrichment from external reference data, and AI-assisted pipeline authoring. These capabilities will build on the same processor model, so pipelines you create today will extend naturally as new processors become available. We welcome your feedback as you try multi-stage transformations. Use the feedback options in the Azure portal, or reach out through your Microsoft account team. This feature is currently in preview. Previews are provided "as-is," "with all faults," and "as available," and are excluded from the service level agreements and limited warranty. For more information, see Supplemental Terms of Use for Microsoft Azure Previews]. Statements in this post about future plans and capabilities represent our current intentions and are subject to change. They should not be relied upon when making purchasing decisions.1.2KViews3likes2CommentsAny source. Any destination. Ready for AI-era.
Telemetry is exploding, every new app, edge node, and AI agent is a new firehose, and AI has raised the bar on what that telemetry must be: governed, on open standards, observable at agent scale. Today, most teams answer that by stitching together a stack of disconnected tools, each catering to a set of data sources, another that offers transforms, different ones for routing to each destination, and wrappers on top for some essence of much-needed enterprise governance, all struggling to be held together by glue code and tribal knowledge. This is the gap we're closing at Build 2026, with every announcement lining up with what modern, AI-shaped workloads need most: An AI-native standard, ready for enterprises: OpenTelemetry direct ingest, GA Headroom for bursty AI-agent traffic: Azure Monitor pipeline scaling to billions of events per day One governance plane for AI and Azure platform telemetry (via DCRs) AI-noise controlled at the right point in the journey: Multi-stage transforms Coverage AI can trust: Monitoring Coverage so AI can reason on complete signals instead of blind spots. …..All organized around the journey your data takes: 1 · Discover Most teams think they're monitoring everything, until an incident proves they aren't! Monitoring Coverage turns hope into evidence by answering 3 questions at fleet scale: is monitoring configured, are the right alerts in place, is telemetry actually flowing? Go from “I think we’re covered” to “I know we are”: Is Your Monitoring Actually Working? What's New in Monitoring Coverage | Microsoft Community Hub 2 · Collect Whatever your source, Azure-native or open standard, you shouldn't need a different platform, agent, or governance model to bring it in. At Build, two big shifts close that gap: Govern Azure platform telemetry like the rest of your data: No more per-resource diagnostic settings or separate tooling for platform metrics and logs. They now ride the same policy-based control plane you already use for the rest of Azure Monitor with one model, one audit story, scoped at scale. Platform metrics support - GA Platform logs support - Public preview coming soon! Bring OpenTelemetry straight in - GA: Send OTLP logs, metrics, and traces directly to Azure Monitor and land them in Application Insights, Log Analytics, Azure Monitor Workspace (Prometheus), and Grafana, no shim, no detour! Direct OpenTelemetry ingestion into Azure Monitor is now generally available Have additional OTel collection needs? Tell us us more by filling out this quick survey! 3 · Shape Observability and storage budgets are dying a death by a thousand low-value log lines. The question today is no longer whether to shape your telemetry, it's where. Multi-stage transformations (public preview) now lets you control telemetry where it matters: at the source, in-pipeline, or post-ingest before, all before data lands at its destination. Drop noise early, enrich centrally, and optimize cost without losing signal: Is 94% of your syslog just noise? Now you can filter it out before ingestion. | Microsoft Community Hub 4 · Ingest at scale When telemetry volume spikes, you need a pipeline that doesn't blink. 17 billion events, per day, per replica. That's what Azure Monitor pipeline now sustains, generally available since April ’26, as the living proof of ‘any source, any destination’. This is the high-scale, multi-cloud, edge-resilient engine already trusted in regulated banks, industrial OT networks, and globally distributed SOCs. That's the kind of headroom you want when AI agents start emitting in bursts you didn't plan for: When Telemetry Volume Gets Real: Azure Monitor pipeline’s Performance Story! | Microsoft Community Hub Get Started TODAY! Explore the links above, try the new experiences in Azure Monitor, and tell us in comments below what to build next. The next era of enterprise telemetry is here. We can't wait to see what you'll build on it. — Your Azure Monitor team170Views0likes0CommentsConnect Metrics to Traces with Exemplars in Azure Monitor
Following Microsoft’s recent GA announcement for OpenTelemetry (OTel) support, we are excited to announce support for Exemplars for customers instrumenting metrics with Prometheus or OpenTelemetry and traces using OpenTelemetry, enhancing Azure Monitor’s integrated observability experience for cloud-native applications. Modern cloud-native applications generate enormous volumes of telemetry. Metrics help teams detect that something is wrong, but traces explain why. Exemplars bridge these two worlds by attaching trace references directly to metric data points, making it dramatically easier to pivot from a spike in latency or errors to the exact distributed trace responsible for the issue. With Azure Monitor, customers can now ingest metrics with exemplars and visualize them in Azure Managed Grafana. This enables seamless correlation between metrics and traces, helping engineering teams troubleshoot issues faster and reduce mean time to resolution (MTTR). Why Exemplars Matter Traditional monitoring workflows often require users to manually correlate data across multiple systems. Exemplars simplify this workflow by embedding trace context directly into metric samples. For example, if a latency metric spikes at a specific timestamp, the exemplar associated with that data point can link directly to the distributed trace responsible for the outlier. This provides several benefits: Faster root cause analysis Quicker transition from aggregate metrics to request-level details Simplified debugging workflows for SRE and platform teams Better observability experiences for microservices and distributed applications Unified Observability with Azure Monitor With Azure Monitor and Azure Managed Grafana, you can now: Ingest OTLP or Prometheus metrics with exemplars into Azure Monitor Workspace Store and analyze traces in Azure Monitor Application Insights Visualize exemplar markers directly in Grafana charts Navigate from a metric spike to the exact distributed trace associated with that data point By combining these signals in a single observability platform, organizations can correlate infrastructure health, application behavior, and request traces without context switching between tooling. How It Works Once metrics, exemplars, and traces are ingested into Azure Monitor, Azure Managed Grafana can consume exemplar information from the configured Prometheus data source. When exemplars are enabled in Grafana dashboards, users will see markers associated with individual metric data points. Selecting an exemplar opens the associated trace in Azure Monitor, providing end-to-end diagnostic context. Getting Started Setup data ingestion: Instrument your application to emit OpenTelemetry traces, OpenTelemetry or Prometheus metrics with exemplars, and enable ingestion of the same to Azure Monitor using OpenTelemetry Collector. Follow the instructions in Ingest OTLP Data into Azure Monitor with OTel Collector - Azure Monitor | Microsoft Learn. After this step, you will have the Log Analytics Workspace, Azure Monitor Workspace and Application Insights resources all set up to store the telemetry data. Create an Azure Managed Grafana instance and connect it with the Azure Monitor Workspace by navigating to your Azure Monitor Workspace in the Azure portal and then clicking on “Linked Grafana workspaces”. To learn more, see Manage an Azure Monitor workspace - Azure Monitor | Microsoft Learn Optionally, enable Azure Managed Prometheus on your AKS cluster or use remote-write and configure it to use the same Azure Monitor Workspace to centralize infrastructure and application metrics. Enable Exemplars in Azure Managed Grafana: After setting up the data ingestion, ensure that logs and traces are flowing into Log Analytics Workspace, and metrics are flowing into Azure Monitor Workspace. Step 1: Enable Exemplars on Prometheus Data Source in Azure Managed Grafana Navigate to Connections -> Data Sources in Azure Managed Grafana. Since you have connected Azure Managed Grafana to Azure Monitor Workspace, you will see the data source (Managed_Prometheus_<AMW-Name>) already configured. If the data source is not configured, follow the steps here to add your Azure Monitor Workspace as a data source. Open the data source configuration. Click Add Exemplars to enable exemplar support. Step 2: Configure Trace Linking with Azure Monitor In the exemplar configuration section, toggle Internal Link to On. Select Azure Monitor as the data source. In the Label Name, enter the name of the field in the labels object that should be used to get the trace id, eg. trace_id. Click Save & Test. This configuration enables direct navigation from exemplar markers in Grafana charts to the associated traces stored in Azure Monitor. Azure Managed Grafana also supports trace correlation from other solutions like Jaeger etc. To use your trace solution, use the appropriate links. Step 3: Enable Exemplars in Dashboards Navigate to a Grafana dashboard that uses your configured Prometheus data source. Open the panel options for a metrics chart. Toggle Exemplars to On. Once enabled, exemplar markers will appear on supported metric visualizations. Clicking on it will show exemplar details along with an option to open the corresponding distributed trace in Azure Monitor. To learn more, visit https://aka.ms/azmon-exemplars151Views1like0CommentsAzure Monitor Health Model (Preview): What's New!
Azure Monitor Health Model is a modern observability capability that brings together telemetry, architecture, and business context of your workloads to generate health insights. It continuously aggregates signals across dependencies, producing a single, actionable health state which reduces alert noise and shifts team toward proactive operations with cohesive system view, clearer insights, and faster troubleshooting. It addresses the common operation question 'Is my system/service/app healthy?' and 'Which underlying unit / component is impacting health?' This refresh introduces flexible, workload-centric discovery (use application insights topology, Azure resource graph queries in addition to designing user and system flows) and smarter, faster health signal creation (use recommended signals, import existing alert rules, set dynamic thresholds). Expanded Discovery Scope As customers began modeling increasingly complex applications, we identified an opportunity to make discovery more flexible and intuitive. Teams naturally reason about their systems differently; some at the application level, others through infrastructure fleets or telemetry views. By expanding discovery options, we enable customers to build health models using the constructs they already use, making it easier to evolve health models as applications and architectures change. Azure Monitor health models now support multiple discovery mechanisms: Application Insights–based discovery for application-centric modelling Azure Resource Graph (ARG) discovery for scalable, query-based resource selection Continued support for Service Groups, now including nested Service Groups, as part of a broader set of discovery options This evolution reflects a shift toward loosely coupled modelling, enabling customers to define health based on application architecture rather than infrastructure-centric grouping. Learn more about Discovery Extended Health Signals Our goal has been to help customers achieve meaningful health insights faster with less manual effort. By introducing platform defaults and surfacing recommended signals, we make it easier to align health models with proven Azure best practices from day one. At the same time, we preserve support for existing alerting strategies and investments, ensuring customers can extend rather than replace what they already have. These enhancements balance simplicity, guidance, and flexibility as environments scale. Health Models now supports the following health signal capabilities: Resource Health as a default signal, ensuring every model starts with a reliable platform-provided baseline Recommended signals, automatically surfaced based on Azure service best practices and enhanced through Azure Monitor Baseline Alerts (AMBA) integration Reuse of existing signals, enabled by importing Azure Monitor alert rules as health signals Learn more about Signals Introducing Health Aggregation Rules Modern cloud applications are built for resiliency, redundancy, and tolerance of partial failure. Health Models are designed to reflect this reality by enabling customers to define what “healthy” means for their architecture. Flexible aggregation rules allow teams to model intent rather than individual component states, producing health views that better align with operational priorities and business impact. Health Models now supports advanced aggregation logic, enabling the following types of scenarios: Regional resiliency aggregation using numeric thresholds (e.g., 2 out of 4 regions must remain healthy) Cluster and fleet health aggregation using percentage thresholds (e.g., 60% of VMs in a cluster must be healthy) This enables modelling resiliency patterns, partial failures, and graceful degradation, providing a more accurate view of real business impact. Import Custom Signal Health is most valuable when it reflects both system behavior and application context. By enabling custom health inputs, customers can incorporate signals that are closest to their business logic and application state. Contextual annotations further enrich analysis, making health timelines easier to interpret and correlate with change events. To support this, Health Models now provides for: Custom health report ingestion for external application and system health signals Data annotations to overlay deployments, incidents, and configuration changes on health state Alert Experience To proactively learn about health state change, health models allow creating Alert rules and associated action group trigger automated responses sich as notifying user. It is now possible to view all the alerts on a Health Model and start troubleshooting. Alerts in Health Model Note: To avail these new capabilities, upgrade your health models to the new API version using built-in migration wizard in Azure portal for a simple, guided experience. Note: To avail these new capabilities, upgrade your health models to the new API version using built-in migration wizard in Azure portal for a simple, guided experience.363Views0likes2CommentsNew Capabilities to Observe Agents in Azure Monitor
Over the last six months, we have been listening to you and building new capabilities to help you observe your agents. You’ve been sharing with us that quality issues are tricky and evaluation is critical, that agent reasoning needs to be understood, that humans must be in the loop to review select agent interactions, and that security and privacy are essential. To address these concerns, we’re announcing several new capabilities that make agents a first-class artifact in Azure Monitor, so you can debug them in the context of your broader distributed application alongside non-agentic components. Microsoft Foundry remains the surface for building and evaluating agents within the context of your project, while Azure Monitor provides the full-stack observability platform and underlying data foundation that powers those experiences. Today, we’re announcing new capabilities in Azure Monitor across ingestion, performance, evaluation workflows, agent debugging, and instrumentation updates to help teams get telemetry faster, inspect agent behavior more deeply, and standardize observability across hosting environments and frameworks. What’s new Reducing pipeline latency from more than 60 seconds to 7.5 seconds at P90. This makes telemetry available faster for teams troubleshooting agents at scale. Emitting events up to 1MB and up to 256kB per attribute. Prompts and responses can get large, and this helps avoid data truncation. Introducing a new view that shows a list of all agents being monitored. Whether you use Microsoft Agent Framework, LangChain, Microsoft Copilot Studio, Foundry Hosting, AKS Hosting, or something else, they all show up here. Improving drill-in from Evaluations to underlying prompts/responses. Evaluations in Azure Monitor are powered by Foundry, and we continue to improve visuals. Showing conversation context in end-to-end transaction view. In chat agents, conversations have become critical glue that connects traces and eases debugging. Searching by text and showing prompt previews in end-to-end transaction view. Prompts and responses are essential to understanding agent logic, and now you can search based on keyword text in Search and End-to-end transaction details views. Show evaluation scores in end-to-end transaction details and sort by evaluation score in Search. Evaluation is emerging as a “4 th pillar” of telemetry, and you’ll see it surface more prominently across Azure Monitor Application Insights. Access the entire JSON blob of prompt/response text. This makes it easier to get to your underlying data and copy out of Azure Monitor for custom analysis/evaluation. Adding a “trace tree” to enhance traversing the agent’s reasoning logic. This new addition to end-to-end transaction view makes traversing long-traces much easier. Enabling builders to annotate (i.e., manual evaluations) from transaction details. Get rid of spreadsheets on the side and annotate from within Azure Monitor. Enabling capture of end-user feedback (i.e., thumbs up/down). Brings end-user feedback alongside other telemetry for more powerful troubleshooting. Extending AI-powered troubleshooting to agents. Observability agent offers full-stack, AI-powered troubleshooting and surfaces up findings in an issue. Learn More. Observability of Coding Agents. Get end-to-end visibility into agent and model usage, performance, and cost with Azure Monitor Application Insights, and built-in Grafana dashboards. Learn More. A unified “Microsoft OpenTelemetry Distro” to observe agents hosted anywhere. A unified Microsoft OpenTelemetry Distro for observing agents hosted anywhere gives teams a single starting point across Foundry, Azure Monitor, and A365, reducing fragmentation and simplifying onboarding (GH Repos: Python, .NET, JavaScript). Skills-based enablement. Getting started is easier. Just point your agent to a skill for AI-assisted instrumentation. We also plan to upgrade tools for instrumentation in Azure MCP. What’s next We’re continuing to invest in this area, with upcoming work focused on stronger security controls for prompts and responses, better cost transparency for agents, and clearer ways to measure ROI across your agent fleet. These updates make it possible to observe agents without adopting a separate toolchain. Explore the new capabilities, and if you see gaps, let us know so we can continue shaping the roadmap based on your feedback. Learn More.392Views1like0Comments