azure monitor
205 Topics
PUBLIC PREVIEW - Azure Monitor - Collect Azure Resource Platform Logs at Scale with DCRs
How DCR-based platform logs simplify telemetry collection for organizations managing 1,000+ resources.

Azure Monitor Copilot Observability Agent: What’s new at Build
The Observability agent in Azure Copilot is an AI-powered assistant built into Azure Monitor that helps engineers investigate issues and explore their systems using natural language. By grounding its analysis in telemetry data such as metrics, logs, and traces, it supports both open-ended exploration and guided troubleshooting. For more details, see the documentation.

Since our initial public preview, the Observability agent in Azure Copilot has continued to evolve with new capabilities and expanded coverage. (You can read more about the initial release in our previous blog.) At Build 2026, we’re introducing updates that expand the Observability agent’s capabilities and the range of scenarios it can support. These updates provide deeper analysis and more detailed responses for both exploration and investigation.

Expanded Investigation Scenarios

The Observability agent now supports a broader set of scenarios across applications and infrastructure. These can be accessed directly from relevant product experiences, without requiring a prior alert, allowing teams to explore data conversationally and initiate deeper investigations as signals emerge.

Integration with Microsoft Foundry AI Agents

The Observability agent integrates with Microsoft Foundry AI Agents, enabling correlation of signals across key generative AI and agent observability scenarios such as latency spikes, error patterns, and tool invocation failures. Teams can interact with the Observability agent either from alerts, including alerts based on Foundry telemetry, or directly within Application Insights, where the Agents details experience serves as the primary entry point. From there, users can use the Observability agent to diagnose errors, analyze trends, and explore their data across one or more agents.
Application Insights integration

The Observability agent enables investigation of failure scenarios directly from the Application Insights Failures blade, allowing teams to analyze application-level issues and move from symptom to root cause.

Azure Kubernetes Service (AKS) integration

The Observability agent enables deep investigation of issues in Azure Kubernetes Service (AKS) clusters. AKS investigations correlate signals from Azure Monitor with Kubernetes logs and events, and (coming soon) Prometheus metrics stored in an Azure Monitor Workspace. Together, these signals enable full-stack analysis of applications running on AKS. The Observability agent helps teams determine whether an issue originates from the application or from the underlying Kubernetes platform, reducing time to diagnosis and resolution.

Activity Logs integration

Investigations can be initiated based on Azure Resource Health events surfaced in Activity Logs, enabling analysis of service-impacting signals related to the Azure platform.

Deeper Insights across systems

Multiple Application Insights - Coming soon!

The Observability agent supports investigations that can span multiple Application Insights resources, enabling scenarios that involve multiple services within distributed applications. The agent can guide users to expand the investigation scope when cross-service issues are detected.

Integration with Azure Service Health

The Observability agent correlates investigation context with Azure Service Health events, helping teams understand potential platform impact as part of their investigation. This helps distinguish application-level issues from broader Azure platform conditions and prioritize active impacts.
Issue Management Enhancements

Viewing issues

Issues can now be viewed in multiple places, depending on the required scope:
- Azure Monitor: shows issues across all Azure Monitor Workspaces (AMWs) under the selected subscriptions
- Azure Monitor Workspace: shows issues stored within a specific AMW

Issue actions & notifications

Issue actions trigger notifications when issues are created or updated, enabling integration with workflows such as email, webhooks, and automation.

Sharing and follow-up

You can now download investigation results as a PDF, including supported data, enabling teams to capture and share investigation context for incident reviews and reporting.

Coming Soon

Billing for the Observability agent starts on July 1, 2026. The agent uses a consumption-based pricing model, so customers pay only for the AI work the agent performs. Agent consumption is measured in Azure Agent Credit (AAC) units, which reflect how many LLM tokens the agent used. For more details, see the documentation.

Stay connected

- Follow this blog for ongoing updates and deeper dives into new capabilities
- Join our upcoming webinar for real-world scenarios, best practices, and a look at what’s coming next 👉 Register here

We’d love your feedback

The Observability Agent continues to evolve based on real-world usage and customer feedback. Share feedback through the Give Feedback option in the product or contact us at: azureobsagent@microsoft.com

Want to learn more?

Read our previous blog posts:
- Public Preview Update: Azure Copilot Observability Agent | Microsoft Community Hub
- The Azure Copilot Observability Agent Chat - Stop Writing Queries, Start Asking Questions. | Microsoft Community Hub

Explore our documentation:
- Azure Copilot observability agent (preview) - Azure Monitor | Microsoft Learn

Is 94% of your syslog just noise? Now you can filter it out before ingestion.
At Microsoft Build 2026, we are announcing the public preview of multi-stage transformations for Azure Monitor Data Collection Rules (DCRs). Multi-stage transformations let you filter, aggregate, parse, and map your logs at the point of collection, before data is ingested into your workspace. Processing happens in a defined sequence of steps called processors, and you can chain them together to build precise data pipelines that reduce ingestion volume, improve data quality, and lower monitoring costs.

Figure: Processors in orange run on the agent (client-side). The KQL transform in green runs in the ingestion pipeline. Data volume shrinks at each stage.

What are multi-stage transformations?

A Data Collection Rule defines how Azure Monitor collects, transforms, and routes telemetry data. Until now, DCRs supported a single KQL transformation step on the ingestion side. Multi-stage transformations extend this model by introducing a processor pipeline: an ordered sequence of processing steps that run on the agent (client-side), at the ingestion endpoint (ingestion-side), or both.

Each processor performs one operation: filtering records, parsing structured fields from raw text, renaming or dropping columns, aggregating metrics, or running a KQL expression. Processors execute in order, and the output of one becomes the input to the next. This composable design replaces what previously required complex, monolithic KQL queries or external pre-processing scripts.

Client-side processors run on the Azure Monitor Agent before data leaves the source machine. This means filtered and aggregated data never crosses the network, reducing both egress and ingestion costs. Ingestion-side processors run in the Log Analytics ingestion pipeline and support KQL-based transformations for more complex logic.

Key applications

The most immediate use case is cost reduction. When you can filter records on the agent before they leave the machine, you stop paying for data you never query.
Syslog is the classic example: in many environments, informational and debug messages make up the vast majority of volume, and none of it gets looked at unless something breaks. A single filter processor can cut that stream by 90% or more.

Aggregation is equally powerful for high-frequency telemetry. Performance counters sampled every 15 seconds produce millions of records per hour across a large fleet, but most dashboards and alert rules only need 5-minute granularity. Rolling up those samples on the agent, before they cross the network, dramatically reduces ingestion without losing the operational signal your team actually relies on.

Beyond cost, multi-stage transformations improve the quality of the data that does reach your workspace. Parsing structured fields out of raw text (JSON payloads, XML event data, CEF security logs) at collection time means downstream queries are simpler and faster. And because each processor handles one step in a readable sequence, maintaining the pipeline is far easier than debugging a single monolithic KQL expression that tries to do everything at once.

To make this concrete, let’s walk through the two highest-impact patterns we see with preview customers: filtering noisy syslog data and aggregating performance counters.

Filter data before ingestion

The filter processor evaluates each record against conditions you define and drops anything that does not match. Because filtering runs on the agent, dropped records are never serialized, transmitted, or ingested. This makes it the highest-impact processor for cost reduction.

You configure filters using simple field-level conditions: specify a column name, an operator (equals, not equals, greater than, contains, etc.), and a value. Conditions can be combined with AND/OR logic for precise control.

Scenario: Keep only warning-and-above syslog messages

A typical syslog stream generates thousands of informational and debug messages for every actionable warning or error.
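As a rough illustration of the condition model described above (a column, an operator, and a value, combined with AND/OR), the following Python sketch mimics what a filter step does to a batch of records. It is not the DCR schema or the agent's implementation: the operator names, the helper functions, and the `Facility` and `Level` columns are assumptions made for this example. Only the `SeverityNumber >= 4` threshold is taken from the syslog scenario in this post.

```python
import operator
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Predicate = Callable[[Record], bool]

# Hypothetical operator names for illustration; not the DCR schema's keywords.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "equals": operator.eq,
    "not_equals": operator.ne,
    "greater_or_equal": operator.ge,
    "contains": lambda field, value: value in field,
}


def condition(column: str, op: str, value: Any) -> Predicate:
    """Build a predicate from one (column, operator, value) triple."""
    compare = OPERATORS[op]
    # A record that lacks the column never matches.
    return lambda record: column in record and compare(record[column], value)


def all_of(*predicates: Predicate) -> Predicate:
    """AND: every condition must hold."""
    return lambda record: all(p(record) for p in predicates)


def any_of(*predicates: Predicate) -> Predicate:
    """OR: at least one condition must hold."""
    return lambda record: any(p(record) for p in predicates)


def apply_filter(records: List[Record], keep: Predicate) -> List[Record]:
    """Keep matching records; everything else is dropped."""
    return [r for r in records if keep(r)]


# Threshold filter using the condition from the syslog scenario.
numbered = [{"SeverityNumber": n} for n in (1, 2, 4, 5, 3, 6)]
kept_by_threshold = apply_filter(
    numbered, condition("SeverityNumber", "greater_or_equal", 4)
)

# Compound filter: auth-facility errors OR any critical message.
# "Facility" and "Level" are made-up column names.
labelled = [
    {"Facility": "auth", "Level": "error"},
    {"Facility": "auth", "Level": "info"},
    {"Facility": "kern", "Level": "critical"},
    {"Facility": "cron", "Level": "error"},
]
auth_errors_or_critical = any_of(
    all_of(condition("Facility", "equals", "auth"),
           condition("Level", "equals", "error")),
    condition("Level", "equals", "critical"),
)
kept_by_compound = apply_filter(labelled, auth_errors_or_critical)

print(len(kept_by_threshold), len(kept_by_compound))  # 3 2
```

Building each condition as a small predicate keeps the AND/OR composition explicit, which mirrors the idea of expressing a compound filter in a single processor step.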
With a filter processor, you set a severity threshold, and the agent drops everything below it before transmission.

In this example, the filter keeps records where SeverityNumber >= 4 (Warning). The 57,000 debug and informational records per hour are dropped on the machine. Only the 3,250 actionable records are transmitted and ingested, a 94% reduction in syslog volume.

Filters also support compound conditions. For example, you can keep auth-facility errors OR any critical message regardless of facility, all in a single processor step. This kind of targeted filtering is especially useful for security teams that need specific event categories without paying for the full syslog firehose.

Aggregate logs before ingestion

The aggregate processor rolls up high-frequency records into time-windowed summaries on the agent. This is especially valuable for performance counters, heartbeat signals, and any telemetry where per-second granularity is not needed for operational decisions.

You configure the processor with a time window (for example, 5 minutes), the aggregation operators to apply (average, sum, min, max, count), and the dimension columns to group by (such as host name and counter name). The agent collects records within each window, computes the aggregates, and emits one summary record per group.

Scenario: Roll up performance counters into 5-minute summaries

A fleet of 500 VMs, each reporting 10 performance counters every 15 seconds, generates roughly 1.2 million raw records per hour. Most operational dashboards and alert rules use 5-minute granularity, making the per-sample detail redundant. With the aggregate processor, each agent rolls up its local counter stream into 5-minute windows, grouped by counter name. Each summary record contains the average, maximum, and sample count for that window.
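The roll-up in this scenario can be sketched in a few lines of Python. This is only a model of the windowing logic under stated assumptions, not the agent's code: the tuple layout, the output column names (`WindowStart`, `Avg`, `Max`, `Count`), and the synthetic samples for a single made-up host are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 300  # 5-minute windows, as in the scenario

# (timestamp in seconds, host, counter name, value)
Sample = Tuple[int, str, str, float]


def aggregate(samples: Iterable[Sample],
              window_seconds: int = WINDOW_SECONDS) -> List[Dict[str, object]]:
    """Group samples by (window, host, counter) and emit one summary per group."""
    groups: Dict[Tuple[int, str, str], List[float]] = defaultdict(list)
    for ts, host, counter, value in samples:
        window_start = ts - ts % window_seconds  # floor to the window boundary
        groups[(window_start, host, counter)].append(value)
    return [
        {
            "WindowStart": window_start,
            "Host": host,
            "Counter": counter,
            "Avg": sum(values) / len(values),
            "Max": max(values),
            "Count": len(values),
        }
        for (window_start, host, counter), values in sorted(groups.items())
    ]


# One hypothetical VM: 10 counters sampled every 15 seconds for one hour.
samples = [
    (t, "vm-001", f"counter-{i}", float(t % 7))
    for t in range(0, 3600, 15)
    for i in range(10)
]
summaries = aggregate(samples)
reduction = 1 - len(summaries) / len(samples)

print(len(samples), len(summaries), f"{reduction:.0%}")  # 2400 120 95%
```

Keeping `Count` alongside `Avg` and `Max` means a downstream query can still re-weight averages when it combines several windows.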
| | Raw data | After aggregation (5-min windows) |
|---|---|---|
| Records per VM per hour | 2,400 (10 counters x 4/min x 60 min) | 120 (10 counters x 12 windows) |
| Records across 500 VMs per hour | 1,200,000 | 60,000 |
| Volume reduction | | 95% |
| Operational fidelity | Per-sample (15s) | Avg, max, and count per 5 min |

Because the aggregation runs on the agent, the reduced data set is what gets transmitted and ingested. Dashboards and alerts that rely on 5-minute granularity work identically, but ingestion costs drop by 95%. Route the output to a custom table with columns that match the aggregate output (average, max, count, and your dimension columns).

Chain processors for complete pipelines

Processors are composable. A common pattern chains a header processor (to convert raw data into tabular format), a filter (to drop irrelevant records), a parse step (to extract fields from structured payloads), and a column drop (to remove fields not needed downstream).

Scenario: Parse, filter, and slim down Windows Event logs

Consider a security team that needs logon success and failure events (Event IDs 4624 and 4625) from the Windows Security log. The raw event stream contains hundreds of event types, each carrying a large XML payload. A four-step pipeline handles this:

1. Header processor converts the raw event stream into tabular rows
2. Parse processor extracts EventID and TargetUser from the XML payload into typed columns
3. Filter processor keeps only logon success (4624) and failure (4625) events, dropping everything else
4. Drop processor removes the bulky RawXml and RenderingInfo columns that are no longer needed

The result is a lean, security-focused data set containing only the events and fields the team actually queries. Each step is independent and can be modified without affecting the others.

Authoring multi-stage DCRs

Multi-stage transformations are available through the Azure portal and through the REST API (version 2025-05-11).
The portal provides a visual editor for building processor pipelines, previewing the schema at each stage, and validating the configuration before deployment.

Figure: The Transform tab in the DCR data source configuration lets you add processors at each stage and preview the resulting schema.

For infrastructure-as-code workflows, the full DCR JSON can be authored and deployed via ARM templates, Bicep, or direct REST API calls.

To get started:

1. Open Azure Monitor in the Azure portal and navigate to Data Collection Rules
2. Create a new DCR or edit an existing one
3. In the data source configuration, select Edit transformation
4. Author your transformation logic across client and ingestion stages using the set of available processors
5. Preview the schema output at each stage to verify the pipeline produces the expected result
6. Save and associate the DCR with your target resources

Preview notes:

- Multi-stage transformations are available in public preview starting June 3, 2026
- Client-side processors require Azure Monitor Agent version 1.35 or later
- Aggregation output must be routed to custom tables (standard table schemas do not match aggregate output)
- Data collection, workspace ingestion, and alert rules may incur costs based on the settings you enable. Preview pricing may differ from general availability pricing. See Azure Monitor pricing for current rates

To learn more, see: Data Collection Rules overview

Looking ahead

Multi-stage transformations are part of our continued investment in giving teams control over their data before it reaches the workspace. During the preview period, we plan to expand processor coverage, add support for additional data source types, and incorporate user feedback into the authoring and validation experience. We are also exploring how multi-stage transformations can serve as the foundation for advanced scenarios such as data scrubbing, inline enrichment from external reference data, and AI-assisted pipeline authoring.
These capabilities will build on the same processor model, so pipelines you create today will extend naturally as new processors become available.

We welcome your feedback as you try multi-stage transformations. Use the feedback options in the Azure portal, or reach out through your Microsoft account team.

This feature is currently in preview. Previews are provided "as-is," "with all faults," and "as available," and are excluded from the service level agreements and limited warranty. For more information, see Supplemental Terms of Use for Microsoft Azure Previews. Statements in this post about future plans and capabilities represent our current intentions and are subject to change. They should not be relied upon when making purchasing decisions.

Any source. Any destination. Ready for AI-era.
Telemetry is exploding: every new app, edge node, and AI agent is a new firehose, and AI has raised the bar on what that telemetry must be: governed, on open standards, observable at agent scale. Today, most teams answer that by stitching together a stack of disconnected tools: one for each set of data sources, another for transforms, different ones for routing to each destination, and wrappers on top for some semblance of much-needed enterprise governance, all barely held together by glue code and tribal knowledge.

This is the gap we're closing at Build 2026, with every announcement lining up with what modern, AI-shaped workloads need most:

- An AI-native standard, ready for enterprises: OpenTelemetry direct ingest, GA
- Headroom for bursty AI-agent traffic: Azure Monitor pipeline scaling to billions of events per day
- One governance plane for AI and Azure platform telemetry (via DCRs)
- AI-noise controlled at the right point in the journey: Multi-stage transforms
- Coverage AI can trust: Monitoring Coverage so AI can reason on complete signals instead of blind spots.

All organized around the journey your data takes:

1 · Discover

Most teams think they're monitoring everything, until an incident proves they aren't! Monitoring Coverage turns hope into evidence by answering 3 questions at fleet scale: is monitoring configured, are the right alerts in place, is telemetry actually flowing? Go from “I think we’re covered” to “I know we are”: Is Your Monitoring Actually Working? What's New in Monitoring Coverage | Microsoft Community Hub

2 · Collect

Whatever your source, Azure-native or open standard, you shouldn't need a different platform, agent, or governance model to bring it in. At Build, two big shifts close that gap:

Govern Azure platform telemetry like the rest of your data: No more per-resource diagnostic settings or separate tooling for platform metrics and logs.
They now ride the same policy-based control plane you already use for the rest of Azure Monitor with one model, one audit story, scoped at scale.
- Platform metrics support - GA
- Platform logs support - Public preview coming soon!

Bring OpenTelemetry straight in - GA: Send OTLP logs, metrics, and traces directly to Azure Monitor and land them in Application Insights, Log Analytics, Azure Monitor Workspace (Prometheus), and Grafana, no shim, no detour! Direct OpenTelemetry ingestion into Azure Monitor is now generally available

Have additional OTel collection needs? Tell us more by filling out this quick survey!

3 · Shape

Observability and storage budgets are dying a death by a thousand low-value log lines. The question today is no longer whether to shape your telemetry, it's where. Multi-stage transformations (public preview) now let you control telemetry where it matters: at the source, in-pipeline, or post-ingest, all before data lands at its destination. Drop noise early, enrich centrally, and optimize cost without losing signal: Is 94% of your syslog just noise? Now you can filter it out before ingestion. | Microsoft Community Hub

4 · Ingest at scale

When telemetry volume spikes, you need a pipeline that doesn't blink. 17 billion events, per day, per replica. That's what Azure Monitor pipeline now sustains, generally available since April ’26, as the living proof of ‘any source, any destination’. This is the high-scale, multi-cloud, edge-resilient engine already trusted in regulated banks, industrial OT networks, and globally distributed SOCs. That's the kind of headroom you want when AI agents start emitting in bursts you didn't plan for: When Telemetry Volume Gets Real: Azure Monitor pipeline’s Performance Story! | Microsoft Community Hub

Get Started TODAY!

Explore the links above, try the new experiences in Azure Monitor, and tell us in comments below what to build next. The next era of enterprise telemetry is here.
We can't wait to see what you'll build on it.

Your Azure Monitor team

Connect Metrics to Traces with Exemplars in Azure Monitor
Following Microsoft’s recent GA announcement for OpenTelemetry (OTel) support, we are excited to announce support for Exemplars for customers instrumenting metrics with Prometheus or OpenTelemetry and traces using OpenTelemetry, enhancing Azure Monitor’s integrated observability experience for cloud-native applications.

Modern cloud-native applications generate enormous volumes of telemetry. Metrics help teams detect that something is wrong, but traces explain why. Exemplars bridge these two worlds by attaching trace references directly to metric data points, making it dramatically easier to pivot from a spike in latency or errors to the exact distributed trace responsible for the issue.

With Azure Monitor, customers can now ingest metrics with exemplars and visualize them in Azure Managed Grafana. This enables seamless correlation between metrics and traces, helping engineering teams troubleshoot issues faster and reduce mean time to resolution (MTTR).

Why Exemplars Matter

Traditional monitoring workflows often require users to manually correlate data across multiple systems. Exemplars simplify this workflow by embedding trace context directly into metric samples. For example, if a latency metric spikes at a specific timestamp, the exemplar associated with that data point can link directly to the distributed trace responsible for the outlier.
This provides several benefits:

- Faster root cause analysis
- Quicker transition from aggregate metrics to request-level details
- Simplified debugging workflows for SRE and platform teams
- Better observability experiences for microservices and distributed applications

Unified Observability with Azure Monitor

With Azure Monitor and Azure Managed Grafana, you can now:

- Ingest OTLP or Prometheus metrics with exemplars into Azure Monitor Workspace
- Store and analyze traces in Azure Monitor Application Insights
- Visualize exemplar markers directly in Grafana charts
- Navigate from a metric spike to the exact distributed trace associated with that data point

By combining these signals in a single observability platform, organizations can correlate infrastructure health, application behavior, and request traces without context switching between tooling.

How It Works

Once metrics, exemplars, and traces are ingested into Azure Monitor, Azure Managed Grafana can consume exemplar information from the configured Prometheus data source. When exemplars are enabled in Grafana dashboards, users will see markers associated with individual metric data points. Selecting an exemplar opens the associated trace in Azure Monitor, providing end-to-end diagnostic context.

Getting Started

Set up data ingestion:

- Instrument your application to emit OpenTelemetry traces and OpenTelemetry or Prometheus metrics with exemplars, and enable ingestion into Azure Monitor using the OpenTelemetry Collector. Follow the instructions in Ingest OTLP Data into Azure Monitor with OTel Collector - Azure Monitor | Microsoft Learn. After this step, you will have the Log Analytics Workspace, Azure Monitor Workspace, and Application Insights resources all set up to store the telemetry data.
- Create an Azure Managed Grafana instance and connect it with the Azure Monitor Workspace by navigating to your Azure Monitor Workspace in the Azure portal and then clicking on “Linked Grafana workspaces”.
To learn more, see Manage an Azure Monitor workspace - Azure Monitor | Microsoft Learn
- Optionally, enable Azure Managed Prometheus on your AKS cluster or use remote-write and configure it to use the same Azure Monitor Workspace to centralize infrastructure and application metrics.

Enable Exemplars in Azure Managed Grafana:

After setting up the data ingestion, ensure that logs and traces are flowing into Log Analytics Workspace, and metrics are flowing into Azure Monitor Workspace.

Step 1: Enable Exemplars on Prometheus Data Source in Azure Managed Grafana

- Navigate to Connections -> Data Sources in Azure Managed Grafana. Since you have connected Azure Managed Grafana to Azure Monitor Workspace, you will see the data source (Managed_Prometheus_<AMW-Name>) already configured. If the data source is not configured, follow the steps here to add your Azure Monitor Workspace as a data source.
- Open the data source configuration.
- Click Add Exemplars to enable exemplar support.

Step 2: Configure Trace Linking with Azure Monitor

- In the exemplar configuration section, toggle Internal Link to On.
- Select Azure Monitor as the data source.
- In the Label Name field, enter the name of the field in the labels object that holds the trace ID, e.g., trace_id.
- Click Save & Test.

This configuration enables direct navigation from exemplar markers in Grafana charts to the associated traces stored in Azure Monitor. Azure Managed Grafana also supports trace correlation from other solutions such as Jaeger. To use your own trace solution, use the appropriate links.

Step 3: Enable Exemplars in Dashboards

- Navigate to a Grafana dashboard that uses your configured Prometheus data source.
- Open the panel options for a metrics chart.
- Toggle Exemplars to On.

Once enabled, exemplar markers will appear on supported metric visualizations. Clicking a marker shows the exemplar details along with an option to open the corresponding distributed trace in Azure Monitor.
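To make the Label Name setting in Step 2 more tangible, here is a small Python sketch that renders one histogram bucket sample with an exemplar in the OpenMetrics text format, where the exemplar follows the sample after a `#` and carries its own label set. The metric name, bucket boundary, values, and trace ID are made up for illustration, and the helper functions are hypothetical; your instrumentation library normally produces this for you, and label values are not escaped here.

```python
from typing import Dict, Optional


def format_labels(labels: Dict[str, str]) -> str:
    """Render a label set as {k="v",...} (no escaping, illustration only)."""
    return "{" + ",".join(f'{key}="{value}"' for key, value in labels.items()) + "}"


def render_sample(name: str,
                  labels: Dict[str, str],
                  value: float,
                  exemplar_labels: Dict[str, str],
                  exemplar_value: float,
                  exemplar_timestamp: Optional[float] = None) -> str:
    """Render a metric sample followed by its exemplar."""
    line = (f"{name}{format_labels(labels)} {value} "
            f"# {format_labels(exemplar_labels)} {exemplar_value}")
    if exemplar_timestamp is not None:
        line += f" {exemplar_timestamp}"
    return line


line = render_sample(
    name="http_request_duration_seconds_bucket",   # hypothetical metric
    labels={"le": "0.5"},
    value=12,
    exemplar_labels={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"},  # made-up ID
    exemplar_value=0.43,
)
print(line)
# http_request_duration_seconds_bucket{le="0.5"} 12 # {trace_id="4bf92f3577b34da6a3ce929d0e0e4736"} 0.43
```

The `trace_id` key inside the exemplar's label set is the kind of field name the Label Name setting refers to; if your instrumentation uses a different key, that is the value to enter instead.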
To learn more, visit https://aka.ms/azmon-exemplars

Azure Monitor Health Model (Preview): What's New!
Azure Monitor Health Model is a modern observability capability that brings together the telemetry, architecture, and business context of your workloads to generate health insights. It continuously aggregates signals across dependencies, producing a single, actionable health state that reduces alert noise and shifts teams toward proactive operations with a cohesive system view, clearer insights, and faster troubleshooting. It addresses the common operational questions 'Is my system/service/app healthy?' and 'Which underlying unit / component is impacting health?'

This refresh introduces flexible, workload-centric discovery (use Application Insights topology and Azure Resource Graph queries in addition to designing user and system flows) and smarter, faster health signal creation (use recommended signals, import existing alert rules, set dynamic thresholds).

Expanded Discovery Scope

As customers began modeling increasingly complex applications, we identified an opportunity to make discovery more flexible and intuitive. Teams naturally reason about their systems differently: some at the application level, others through infrastructure fleets or telemetry views. By expanding discovery options, we enable customers to build health models using the constructs they already use, making it easier to evolve health models as applications and architectures change.

Azure Monitor health models now support multiple discovery mechanisms:

- Application Insights-based discovery for application-centric modelling
- Azure Resource Graph (ARG) discovery for scalable, query-based resource selection
- Continued support for Service Groups, now including nested Service Groups, as part of a broader set of discovery options

This evolution reflects a shift toward loosely coupled modelling, enabling customers to define health based on application architecture rather than infrastructure-centric grouping.
Learn more about Discovery

Extended Health Signals

Our goal has been to help customers achieve meaningful health insights faster with less manual effort. By introducing platform defaults and surfacing recommended signals, we make it easier to align health models with proven Azure best practices from day one. At the same time, we preserve support for existing alerting strategies and investments, ensuring customers can extend rather than replace what they already have. These enhancements balance simplicity, guidance, and flexibility as environments scale.

Health Models now supports the following health signal capabilities:

- Resource Health as a default signal, ensuring every model starts with a reliable platform-provided baseline
- Recommended signals, automatically surfaced based on Azure service best practices and enhanced through Azure Monitor Baseline Alerts (AMBA) integration
- Reuse of existing signals, enabled by importing Azure Monitor alert rules as health signals

Learn more about Signals

Introducing Health Aggregation Rules

Modern cloud applications are built for resiliency, redundancy, and tolerance of partial failure. Health Models are designed to reflect this reality by enabling customers to define what “healthy” means for their architecture. Flexible aggregation rules allow teams to model intent rather than individual component states, producing health views that better align with operational priorities and business impact.

Health Models now supports advanced aggregation logic, enabling the following types of scenarios:

- Regional resiliency aggregation using numeric thresholds (e.g., 2 out of 4 regions must remain healthy)
- Cluster and fleet health aggregation using percentage thresholds (e.g., 60% of VMs in a cluster must be healthy)

This enables modelling resiliency patterns, partial failures, and graceful degradation, providing a more accurate view of real business impact.
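The two threshold styles above can be illustrated with a short Python sketch. It only models the arithmetic of "N of M" and percentage rules; the function names and the "Healthy" / "Unhealthy" state strings are assumptions for this example and are not the Health Model API or its actual state vocabulary.

```python
from typing import Sequence

HEALTHY = "Healthy"  # hypothetical state label used only in this sketch


def healthy_by_count(states: Sequence[str], minimum_healthy: int) -> bool:
    """Numeric threshold: at least `minimum_healthy` children must be healthy."""
    return sum(1 for state in states if state == HEALTHY) >= minimum_healthy


def healthy_by_percentage(states: Sequence[str], minimum_percent: float) -> bool:
    """Percentage threshold: at least `minimum_percent` of children must be healthy."""
    if not states:
        return False  # design choice for this sketch: no children means not healthy
    healthy = sum(1 for state in states if state == HEALTHY)
    return 100.0 * healthy / len(states) >= minimum_percent


# Regional resiliency: 2 out of 4 regions must remain healthy.
regions = ["Healthy", "Unhealthy", "Healthy", "Unhealthy"]
print(healthy_by_count(regions, minimum_healthy=2))  # True

# Fleet health: 60% of VMs in a cluster must be healthy.
vms_ok = ["Healthy"] * 6 + ["Unhealthy"] * 4
vms_degraded = ["Healthy"] * 5 + ["Unhealthy"] * 5
print(healthy_by_percentage(vms_ok, minimum_percent=60))        # True
print(healthy_by_percentage(vms_degraded, minimum_percent=60))  # False
```

Expressing the rule on the parent, rather than alerting on each child, is what lets a model tolerate a partial failure without reporting the whole workload as unhealthy.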
Import Custom Signal

Health is most valuable when it reflects both system behavior and application context. By enabling custom health inputs, customers can incorporate signals that are closest to their business logic and application state. Contextual annotations further enrich analysis, making health timelines easier to interpret and correlate with change events.

To support this, Health Models now provides:

- Custom health report ingestion for external application and system health signals
- Data annotations to overlay deployments, incidents, and configuration changes on health state

Alert Experience

To proactively learn about health state changes, health models let you create alert rules, and the associated action groups trigger automated responses such as notifying users. It is now possible to view all the alerts on a Health Model and start troubleshooting.

Figure: Alerts in Health Model

Note: To use these new capabilities, upgrade your health models to the new API version using the built-in migration wizard in the Azure portal for a simple, guided experience.

New Capabilities to Observe Agents in Azure Monitor
Over the last six months, we have been listening to you and building new capabilities to help you observe your agents. You’ve been sharing with us that quality issues are tricky and evaluation is critical, that agent reasoning needs to be understood, that humans must be in the loop to review select agent interactions, and that security and privacy are essential.

To address these concerns, we’re announcing several new capabilities that make agents a first-class artifact in Azure Monitor, so you can debug them in the context of your broader distributed application alongside non-agentic components. Microsoft Foundry remains the surface for building and evaluating agents within the context of your project, while Azure Monitor provides the full-stack observability platform and underlying data foundation that powers those experiences.

Today, we’re announcing new capabilities in Azure Monitor across ingestion, performance, evaluation workflows, agent debugging, and instrumentation updates to help teams get telemetry faster, inspect agent behavior more deeply, and standardize observability across hosting environments and frameworks.

What’s new

- Reducing pipeline latency from more than 60 seconds to 7.5 seconds at P90. This makes telemetry available faster for teams troubleshooting agents at scale.
- Emitting events up to 1MB and up to 256kB per attribute. Prompts and responses can get large, and this helps avoid data truncation.
- Introducing a new view that shows a list of all agents being monitored. Whether you use Microsoft Agent Framework, LangChain, Microsoft Copilot Studio, Foundry Hosting, AKS Hosting, or something else, they all show up here.
- Improving drill-in from Evaluations to underlying prompts/responses. Evaluations in Azure Monitor are powered by Foundry, and we continue to improve visuals.
- Showing conversation context in end-to-end transaction view. In chat agents, conversations have become critical glue that connects traces and eases debugging.
- Searching by text and showing prompt previews in end-to-end transaction view. Prompts and responses are essential to understanding agent logic, and now you can search by keyword in the Search and End-to-end transaction details views.
- Showing evaluation scores in end-to-end transaction details and sorting by evaluation score in Search. Evaluation is emerging as a “4th pillar” of telemetry, and you’ll see it surface more prominently across Azure Monitor Application Insights.
- Accessing the entire JSON blob of prompt/response text. This makes it easier to get to your underlying data and copy it out of Azure Monitor for custom analysis or evaluation.
- Adding a “trace tree” to enhance traversing the agent’s reasoning logic. This new addition to end-to-end transaction view makes traversing long traces much easier.
- Enabling builders to annotate (i.e., manual evaluations) from transaction details. Get rid of spreadsheets on the side and annotate from within Azure Monitor.
- Enabling capture of end-user feedback (i.e., thumbs up/down). This brings end-user feedback alongside other telemetry for more powerful troubleshooting.
- Extending AI-powered troubleshooting to agents. The Observability agent offers full-stack, AI-powered troubleshooting and surfaces findings in an issue. Learn More.
- Observability of Coding Agents. Get end-to-end visibility into agent and model usage, performance, and cost with Azure Monitor Application Insights and built-in Grafana dashboards. Learn More.
- A unified “Microsoft OpenTelemetry Distro” to observe agents hosted anywhere. The distro gives teams a single starting point across Foundry, Azure Monitor, and A365, reducing fragmentation and simplifying onboarding (GH Repos: Python, .NET, JavaScript).
- Skills-based enablement. Getting started is easier: just point your agent to a skill for AI-assisted instrumentation. We also plan to upgrade tools for instrumentation in Azure MCP.
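The size limits mentioned in the list above (events up to 1 MB, up to 256 kB per attribute) can be checked before telemetry is emitted. The sketch below is only an illustration under stated assumptions: the post does not say whether the limits use binary or decimal units or how size is measured, so the constants, the UTF-8 byte counting, and the helper name `check_event` are all hypothetical and not part of any Azure Monitor SDK.

```python
# Assumed limits: binary units (1 MB = 1024 * 1024 bytes). The post gives
# only "1MB" and "256kB", so treat these constants as placeholders.
MAX_EVENT_BYTES = 1024 * 1024
MAX_ATTRIBUTE_BYTES = 256 * 1024


def utf8_len(text: str) -> int:
    """Size of a string in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def check_event(attributes: dict[str, str]) -> list[str]:
    """Return human-readable problems for one event (hypothetical helper).

    Event size is approximated as the sum of attribute names and values,
    which is an assumption about how the service measures it.
    """
    problems = []
    total = 0
    for name, value in attributes.items():
        size = utf8_len(value)
        total += utf8_len(name) + size
        if size > MAX_ATTRIBUTE_BYTES:
            problems.append(
                f"attribute {name!r} is {size} bytes (limit {MAX_ATTRIBUTE_BYTES})"
            )
    if total > MAX_EVENT_BYTES:
        problems.append(f"event is {total} bytes (limit {MAX_EVENT_BYTES})")
    return problems


if __name__ == "__main__":
    event = {"prompt": "Summarize this ticket", "response": "x" * 300_000}
    for problem in check_event(event):
        print(problem)
```

A check like this is mainly useful for spotting which prompts or responses would be at risk of truncation; the actual enforcement happens in the service.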
What’s next
We’re continuing to invest in this area, with upcoming work focused on stronger security controls for prompts and responses, better cost transparency for agents, and clearer ways to measure ROI across your agent fleet. These updates make it possible to observe agents without adopting a separate toolchain. Explore the new capabilities, and if you see gaps, let us know so we can continue shaping the roadmap based on your feedback. Learn More.
356 Views, 1 like, 0 Comments
What’s new in Observability at Build 2026
At Build 2026, Azure Monitor introduces major advancements in end-to-end observability, extending across AI agents, applications, and infrastructure with OpenTelemetry at its core. New capabilities with Azure Copilot Observability agent, SLI/SLO support, and smarter alerting help teams move faster from detection to root cause while reducing noise and manual effort. Together, these innovations enable developers and SREs to operate modern, AI-driven systems with greater insight, efficiency, and alignment to customer experience.
504 Views, 2 likes, 0 Comments
When Telemetry Volume Gets Real: Azure Monitor pipeline’s Performance Story!
What is Azure Monitor pipeline?
Azure Monitor pipeline provides centralized governance and a single point of control that runs close to your data sources, so you can filter, transform, aggregate, and route telemetry before it's sent to Azure Monitor. This approach helps you reduce ingestion volume, improve reliability in disconnected environments, and apply consistent data processing across hybrid and multi-cloud deployments. Built on OpenTelemetry technology, the pipeline supports standard ingestion protocols including Syslog and OTLP, enabling it to receive telemetry from a wide range of clients and environments. Read more about Azure Monitor pipeline here - Azure Monitor pipeline GA: Centralized, Secure Telemetry Ingestion
Azure Monitor pipeline Performance
A single replica on a stock 8-core node sustains ~200,000 Syslog messages per second end-to-end into Log Analytics (roughly 17 billion events or ~20 TB per day) using only ~2.8 GB of working-set memory. That's ~2.5 TB/day of throughput per vCPU, on commodity hardware, with no special tuning. (Measured on pipeline v1.1.1, May 2026.) The table below gives more detailed performance figures:

| vCPUs | Example node | Syslog Basic* | Syslog Fully Formed* | CEF Fully Formed* |
|---|---|---|---|---|
| 2 | Standard_D2as_v6 | ~50,000/sec | ~35,000/sec | ~17,000/sec |
| 4 | Standard_D4as_v6 | ~100,000/sec | ~70,000/sec | ~35,000/sec |
| 8 | Standard_D8as_v6 | ~200,000/sec | ~150,000/sec | ~65,000/sec |
| 16 | Standard_D16as_v6 | ~400,000/sec | ~300,000/sec | ~130,000/sec |

Syslog Basic* – Azure Monitor pipeline ingesting raw syslog data into an Azure Monitor custom table
Syslog Fully Formed* – Azure Monitor pipeline ingesting syslog data into the Azure Monitor standard syslog table
CEF Fully Formed* – Azure Monitor pipeline ingesting CEF data into the Azure Monitor standard CEF table
Further, adding replicas scales throughput linearly.
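The headline figures above can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and it assumes decimal terabytes (10^12 bytes), which the post does not state. Under that assumption, ~20 TB/day at ~200,000 messages/sec implies an average message of a bit over 1 KB.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_day(events_per_second: float) -> float:
    """Daily event count for a sustained per-second rate."""
    return events_per_second * SECONDS_PER_DAY


def tb_per_day_per_vcpu(tb_per_day: float, vcpus: int) -> float:
    """Daily volume attributed to each vCPU."""
    return tb_per_day / vcpus


def implied_event_bytes(tb_per_day: float, events_per_second: float,
                        bytes_per_tb: int = 10**12) -> float:
    """Average event size implied by a daily volume and an event rate.

    bytes_per_tb defaults to decimal terabytes, which is an assumption.
    """
    return tb_per_day * bytes_per_tb / events_per_day(events_per_second)


def scaled_rate(rate_per_replica: float, replicas: int) -> float:
    """Aggregate rate if throughput scales linearly with replicas."""
    return rate_per_replica * replicas


if __name__ == "__main__":
    print(f"{events_per_day(200_000):,.0f} events/day")       # ~17 billion
    print(f"{tb_per_day_per_vcpu(20, 8)} TB/day per vCPU")    # 2.5
    print(f"{implied_event_bytes(20, 200_000):,.0f} bytes per event on average")
    print(f"{scaled_rate(100_000, 8):,.0f} logs/sec across 8 replicas")
```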
Linear scaling is what makes the rest of the performance story credible in practice: if one 4-core node handles about 100,000 Syslog logs per second, eight replicas scale that to roughly 800,000 logs per second without changing the architecture. In other words, you do not hit an arbitrary throughput wall as volume grows; you add cores or replicas and get predictable capacity growth. We are continuously improving these numbers, and the latest guidance is documented here -- Azure Monitor pipeline performance and sizing - Azure Monitor | Microsoft Learn
Why this Performance Story Matters?
- Zero-config core usage. The pipeline automatically uses every available CPU core. Move to a bigger node and it just goes faster, with no tuning and no config.
- Backpressure, not data loss. When you exceed capacity, the pipeline applies TCP backpressure to senders instead of dropping messages. Rising send latency is your scale-up signal.
- Predictable sizing math. Pick the per-vCPU rate for your log type, divide your peak logs/sec by it, add 30% headroom, and round up to get the vCPU count.
- Efficient memory usage. ~2.8 GB working-set to push 200,000 logs/sec means you're paying for throughput, not overhead.
One sizing tip worth knowing: make sure senders open at least as many concurrent TCP connections as there are cores on the pipeline node. The pipeline distributes traffic across cores by source connection, so too few connections leave cores idle.
How this Stacks Up?
Telemetry pipelines are usually sized per CPU core, making per-core throughput a practical way to reason about capacity and scaling. Against that backdrop, ~2.5 TB/day per vCPU for Syslog Basic, and ~65,000–150,000 logs/sec on 8 cores for fully formed records, highlights the per-core efficiency of Azure Monitor pipeline for edge log collection. Exact numbers will vary based on event size and processing applied, but the key point is consistency: you get substantial throughput per core, and it scales linearly as you add capacity.
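The sizing rule described above (divide peak logs/sec by the per-vCPU rate, add 30% headroom, round up) can be sketched as a small helper. This is a minimal illustration, not official sizing guidance: the function names are hypothetical, and the 25,000 logs/sec per vCPU used in the example is simply the Syslog Basic figure from the post divided by the core count (~200,000/sec on 8 vCPUs).

```python
import math


def required_vcpus(peak_logs_per_sec: float, per_vcpu_rate: float,
                   headroom: float = 0.30) -> int:
    """vCPUs needed: peak / per-vCPU rate, plus headroom, rounded up."""
    if per_vcpu_rate <= 0:
        raise ValueError("per_vcpu_rate must be positive")
    return math.ceil(peak_logs_per_sec * (1 + headroom) / per_vcpu_rate)


def required_replicas(vcpus_needed: int, vcpus_per_node: int) -> int:
    """Replicas needed when each replica runs on a node of a fixed size."""
    return math.ceil(vcpus_needed / vcpus_per_node)


def min_sender_connections(cores_per_node: int) -> int:
    """Per the sizing tip: at least one concurrent TCP connection per core."""
    return cores_per_node


if __name__ == "__main__":
    # Example: 120,000 logs/sec peak of Syslog Basic traffic.
    vcpus = required_vcpus(120_000, per_vcpu_rate=25_000)
    replicas = required_replicas(vcpus, vcpus_per_node=4)
    print(f"vCPUs needed: {vcpus}")
    print(f"4-vCPU replicas needed: {replicas}")
    print(f"Connections per node: at least {min_sender_connections(4)}")
```

For fully formed Syslog or CEF records, the per-vCPU rate is lower, so the same peak would need more vCPUs; pass the rate that matches your log type.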
Less hardware to move the same volume, efficient memory usage, backpressure instead of loss, and linear growth: that's the performance case for Azure Monitor pipeline.
Get started
Spin up a pipeline group on your Arc-enabled cluster, point your Syslog/CEF senders at it, and watch the throughput numbers above hold up in your own environment! Read more about getting started here -- What is Azure Monitor pipeline? - Azure Monitor | Microsoft Learn
124 Views, 0 likes, 0 Comments
Is Your Monitoring Actually Working? What's New in Monitoring Coverage
Monitoring is only useful when the right signals are collected, the right alerts are in place, and the data is actually flowing when teams need it. In large Azure environments, confirming all three across every VM and AKS cluster can still take too much manual work. At Microsoft Ignite, we introduced Monitoring Coverage in Azure Monitor, a centralized preview experience for finding coverage gaps and enabling recommended VM and container monitoring at scale. At Microsoft Build, we are expanding that experience with two new capabilities that make monitoring easier to operationalize: data flow status and at-scale recommended alert enablement for virtual machines and Azure Kubernetes Service (AKS). With these updates, teams can move beyond asking whether monitoring was configured. They can see whether recommended monitoring is enabled, whether important alert coverage is missing, and whether configuration issues may prevent monitoring data from reaching its destination.
[Figure: Monitoring Coverage overview with recommendations and data flow status.]
What is Monitoring Coverage?
Monitoring Coverage in Azure Monitor gives you a single place to review recommended monitoring across supported Azure resources. The Overview page summarizes coverage across your selected scope, shows Azure Advisor observability recommendations, and provides quick actions to enable recommended monitoring settings. Coverage is grouped into basic, partial, and enhanced monitoring so you can quickly understand whether a resource is using only default monitoring or has the Microsoft-recommended configuration enabled. From there, you can drill into the Monitoring Details tab to review individual resources and take action.
New: data flow status
The most important question after enabling monitoring is simple: is the data flowing? Data flow status helps answer that question directly from Monitoring Coverage.
The new data flow status summary shows how many resources need attention, passed initial checks, or are not configured for validation. It also highlights top resources that need attention so operators can start with the most important issues first. When you open data flow status for a resource, Azure Monitor shows validation checks across areas such as:
- Resource configuration
- Data collection rule associations
- Network connectivity
- Data flows to the configured destination
Detected issues are prioritized at the top of the details pane, and each validation check includes a recommended action. After making a fix, you can run validation again to confirm that data flow issues are resolved.
[Figure: Data flow status details with validation checks and recommended actions.]
Alternatively, you can visualize your data flows and identify problems from there.
New: enable recommended alerts at scale
Monitoring Coverage now also helps close alerting gaps. From the Overview page, you can see recommendations such as Enable VM Recommended Alerts and Enable AKS Recommended Alerts, then select Apply to configure recommended alert rules from a centralized flow.
For virtual machines, you can enable alerts across an entire subscription or choose selected resources. Subscription scope is useful when you want recommended alerts to apply broadly, including to future VMs in the selected subscription. Selected resource scope gives you more granular control when you want to enable alert rules for a specific set of VMs. The enablement flow lets you review recommended alert rules, adjust thresholds, and configure notification options such as email, Azure Resource Manager role notifications, Azure mobile app notifications, or an existing action group. Some VMs may already have alerts configured, and new rules are designed not to duplicate existing alerts.
For AKS, Monitoring Coverage can surface recommended alert gaps and start the same guided pattern: review impacted resources, configure recommended alert settings, and use Review + Enable to create the alert rules.
A resource-centric view for follow-up
The Monitoring Details tab brings coverage and data flow into the same resource list. Two columns are especially useful for triage: Monitoring coverage and Data flow status. Select either value to open resource-level details. Monitoring coverage details show what is configured for the resource, including VM Insights, recommended alerts, data collection rules, data sources, destinations, and agent version when available. Data flow details show validation results and recommended remediation steps. This makes it easier to move from a high-level gap to the specific resource and configuration that needs attention.
Getting started
Monitoring Coverage is available in preview from the Azure portal. Open Monitor, select Monitoring Coverage (preview), and choose the subscriptions and resources you want to review. From the Overview page, you can:
- Review coverage across VMs and AKS resources.
- Apply recommendations to enable VM Insights, container monitoring, and recommended alerts.
- Use data flow status to find resources whose monitoring data needs attention.
- Open Monitoring Details for resource-level coverage and validation results.
A few preview notes: enablement operations include up to 100 resources at a time, and enabling monitoring or alert rules may create data collection rules, deploy Azure Monitor Agent, configure destinations, or create alert rules. Data collection, workspace ingestion, and alert rules may incur costs based on the settings you enable. To learn more, see Monitoring coverage in Azure Monitor (preview).
Looking ahead
Monitoring Coverage is part of our continued work to make Azure Monitor easier to operationalize at scale.
We want teams to spend less time hunting for monitoring gaps and more time acting on reliable, validated signals. We would love your feedback as you try these new Build updates and as we look to expand support beyond this set of resource types. Use the Azure portal feedback options or share feedback through your Microsoft account team.
252 Views, 1 like, 0 Comments