Azure Monitor
1329 Topics

Azure Monitor Health Model - API Refresh
API Refresh?

- Discovery scope extended to include Application Insights
- Azure Resource Graph query
- Signals
  - Enhancement for Azure resources
  - Resource health is used as a default signal
  - Recommended signals are available
  - Alert rules can be imported as signals
  - Dynamic thresholds for metric signals
- Aggregation rules
- Import health reports and data annotations

-->> Link to docs

Public Preview Update: Azure Copilot Observability Agent
Modern cloud applications generate massive amounts of telemetry: metrics, logs, traces, alerts, and platform signals. Yet whether you're asking questions about your observability data or responding when things go wrong, discovering insights and root causes requires a deep understanding of the application, the observability signals it emits, and the tools, all while your business and customers are impacted.

The Observability agent is designed to be your monitoring companion across the full observability lifecycle, enabling you to interact via chat to better understand your observability data. Our aspiration is to support the full range of activities, from onboarding and detection through triage and root cause analysis, to significantly reduce human toil and customer downtime. Today, the agent already covers key investigation and exploration scenarios, and we're rapidly expanding its capabilities across more workflows and entry points.

Deep, agentic investigations

Deep investigations are designed for situations where something is already wrong and the goal is to understand what happened and what to do next. The Observability agent is optimized for real-world, full-stack investigations in distributed systems, including environments built on Azure Kubernetes Service (AKS) and Virtual Machines (VMs).
To discover the root cause, the agent applies deep reasoning, using an array of Machine Learning (ML) models and Large Language Models (LLMs) to discover and correlate anomalies across a huge volume of signals from the application, infrastructure, and Azure platform layers, and to converge on likely root-cause candidates in scenarios such as:

- Application issues, including deployment and performance regressions, request or dependency failures, resource exhaustion, and identity or configuration errors
- Infrastructure issues, such as compute saturation, disk I/O throttling, misconfigured dependencies, or network connectivity failures in AKS clusters and VMs
- Platform incidents, including Azure maintenance or outages and managed infrastructure issues like SNAT port exhaustion or upgrade blockers

The easiest way to start a deep investigation is directly from an Azure Monitor alert, whether in the Azure portal or from an alert notification. Investigations can also be initiated from other entry points (for example, the agent chat, Logs, and Activity logs), with additional entry points being added over time.

When a deep investigation runs, the agent produces an investigation report that captures the analysis, root cause, and suggested next steps, along with the key signals and supporting data. The agent also surfaces granular insight into its reasoning (chain-of-thought), including the data accessed, the queries run, and more. You do not need to stop there: you can continue interacting with the agent in the context of the investigation to explore deeper or guide the agent toward additional hypotheses:

- What changed shortly before the incident started?
- Are there any issues in VM <vm_id> and are they related? If yes, run a deep investigation including this VM
- Which dependencies are most correlated with this failure spike?
- Are there related alerts or configuration changes that explain this behavior?
Investigation results can be saved as an Azure Monitor Issue, preserving the full investigation context for collaboration and continuity.

Data exploration and analytics

The Observability agent supports data exploration and analytics for ad-hoc understanding and hypothesis building, without starting from an alert or running a full investigation. To get started, click the "Observability Agent" button in the Logs blade (or another supported entry point). From there, you can explore observability data such as logs and metrics using natural language prompts like:

- Show the top errors over the last hour
- Is there a correlation between application errors and dependency errors?
- Chart the trend of application errors and storage related errors
- What operations in my app are impacted by the ongoing authentication issue?
- Find latency spikes in my app over the last 3 days and where they are coming from (specific users or regions)

If you already have a query or query results in the Logs blade, the agent picks them up automatically, and you can ask it to explain the results, help you evolve the query, or even optimize it. Moreover, when exploration surfaces a broader or more complex problem, operators can choose to run a deep investigation directly from the exploration context and persist the results as an Issue.
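To give a feel for the kind of analysis a prompt such as "Is there a correlation between application errors and dependency errors?" asks for, here is a minimal sketch that computes a Pearson correlation between two error-count series. It is not how the agent works internally: the `pearson` helper and the sample counts are invented for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented error counts per 5-minute bin over the same time range.
app_errors = [2, 3, 2, 15, 22, 18, 4, 2]
dependency_errors = [0, 1, 0, 9, 14, 11, 2, 1]

r = pearson(app_errors, dependency_errors)
print(f"Pearson r = {r:.2f}")
```

A value of r close to 1 means the two series rise and fall together, which would make the dependency a reasonable place to look next. It does not by itself establish which failure caused the other.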
Looking ahead

We're continuing to expand the Observability agent to cover more of the observability lifecycle, moving from reactive investigation toward more proactive and continuous system understanding:

- Deeper integration across Azure Monitor experiences: expanding beyond alerts into additional entry points and workflows across the platform
- Autonomous observability: when signals indicate emerging or ongoing incidents, the agent can proactively correlate alerts, run investigations, and create Azure Monitor Issues automatically, reducing the need for manual triage
- Integration with external systems: extending investigation context beyond Azure Monitor, so insights and conclusions can flow into existing engineering workflows

Stay connected

Follow this blog for ongoing deep dives, updates on current capabilities, and a preview of what's coming next.

Live webinar

A walkthrough of real Observability agent scenarios, best practices, and what's available today, along with a look at what's coming next and a live Q&A with the product team. 👉 Register here

We'd love your feedback

The Observability agent continues to evolve based on real-world usage and operator feedback. Share your thoughts directly through the Give Feedback option in the experience, or reach us at: azureobsagent@microsoft.com

Azure Monitor Service Level Indicators (SLI)
Announcing Public Preview of Azure Monitor SLIs

Today, we are excited to introduce Service Level Indicators (SLI) and Service Level Objectives (SLO) in Azure Monitor, a step forward in helping teams measure how customers are experiencing their applications.

- SLI: A quantitative measure of how well an application or service is performing from the customer's point of view.
- SLO: A defined target for an SLI that represents how good or bad the SLI is over a given time period. This is also referred to as a baseline in Azure Monitor.

One of the biggest advantages of SLIs is that they quantify real customer impact. In many environments, multiple alerts may fire across infrastructure and services, but not all of them translate to user-visible issues. Metrics like CPU Percentage can measure what is happening in an environment, but do not always indicate whether, or how, a spike in CPU impacted the user experience. SLIs provide a clear lens to evaluate whether those signals actually affect customers, helping teams cut through noise and focus on what truly matters.

This also represents a shift from traditional thinking about reliability. An application can be "up" and still feel slow or unreliable to users due to latency, partial failures, or downstream dependencies. On the flip side, not every system issue results in a degraded user experience. SLIs bridge this gap by helping measure actual customer experience, not just uptime.

This release brings native SLI authoring, error budgets, a baseline (SLO), and burn rate-based alerting directly into Azure Monitor. Instead of reacting to isolated metrics or alerts, teams can now answer the question: are we meeting our promise to customers?

Overview: What is Azure Monitor SLI?

You can now measure both Availability and Latency SLIs using the request-based or window-based evaluation methods. In Azure Monitor, SLIs are defined at the Service Group level, a logical representation of your application composed of multiple resources.
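To make the SLI, SLO, error budget, and burn rate terms concrete, here is a minimal sketch of the arithmetic for a request-based availability SLI. The function names, the 99.9% target, and the sample counts are illustrative assumptions, not Azure Monitor APIs or defaults. The burn rate definition used here (observed error rate divided by the error rate the SLO allows) is the common SRE convention and may differ in detail from how Azure Monitor computes it.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """Request-based SLI: fraction of requests that were good."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as meeting the objective
    return good_requests / total_requests


def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of bad requests the SLO tolerates over the evaluation window."""
    return (1.0 - slo_target) * total_requests


def burn_rate(sli: float, slo_target: float) -> float:
    """Observed error rate relative to the allowed error rate.

    1.0 means the budget would be used up exactly at the end of the window;
    values above 1.0 mean it runs out early.
    """
    return (1.0 - sli) / (1.0 - slo_target)


# Illustrative numbers only.
slo = 0.999
total, good = 1_000_000, 998_500

sli = availability_sli(good, total)
budget = error_budget(slo, total)
used = (total - good) / budget
rate = burn_rate(sli, slo)

print(f"SLI: {sli:.4%}")
print(f"Error budget: {budget:.0f} bad requests, {used:.0%} used")
print(f"Burn rate: {rate:.1f}x")
```

With these sample numbers the service saw 1,500 bad requests against a budget of 1,000, so the budget is overspent and the burn rate is about 1.5. A burn rate alert would typically fire when this ratio stays above a chosen threshold for some period.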
Defining SLIs at the Service Group level enables a shift from resource-level monitoring and fragmented alerts to application-level health, customer-impact measurement, and actionable signals. SLIs continuously evaluate your service using existing Azure Monitor metrics and store results in your Azure Monitor Workspace. These SLIs then power downstream experiences such as baseline tracking, error budgets, burn rate visualization, and alerting, all within Azure Monitor.

Error budgets help teams determine how much degradation they can afford within a given time window and guide decisions such as whether to continue feature rollouts or prioritize reliability improvements. Burn rates indicate how quickly the error budget is being consumed, enabling teams to detect excessive degradation early and take corrective action before user experience is significantly impacted.

Getting Started

To create Application SLIs, you'll need:

- A Service Group.
- Application metrics emitted to an Azure Monitor Workspace (AMW), via Managed Prometheus or OpenTelemetry.

Learn more here.

Summary

Azure Monitor SLI brings service health management directly into the Azure platform. By focusing on user experience, tracking error budgets, and alerting on burn rates, teams can understand their workload health alongside platform signals, move from reactive monitoring to proactive reliability engineering, and prioritize issues based on real user impact. We're excited to see how you use Azure Monitor SLIs to build more reliable applications on Azure.

Ingest at Scale, Securely - Azure Monitor pipeline Is Now Generally Available
Today, we're thrilled to announce the general availability of Azure Monitor pipeline, a telemetry pipeline built for secure, high-scale ingestion across any environment. But the best way to understand what makes it powerful isn't to start with features. It's to start with the problems that kept showing up, over and over, in our conversations with customers. So, let's dig in...

Chances are, this sounds a lot like your environment

Imagine a large enterprise rolling out Microsoft Sentinel as their SIEM. They have sites across regions, a mix of on-premises and cloud environments, and security telemetry streaming in from firewalls, network devices, and Linux servers: 100,000 to 1 million events per second in some locations. Traditional forwarders buckle under the load, drop events during network blips, and ship everything, signal and noise, straight into Sentinel. The result: skyrocketing ingestion costs, degraded detections, and a brittle forwarding infrastructure that demands constant babysitting.

If you're managing environments like these, these questions are probably top of mind:

- How do I securely ingest telemetry without opening hundreds of risky endpoints?
- How do I reduce ingestion costs when telemetry spikes across thousands of sources simultaneously?
- How do I centrally standardize logs across sites and device types before they ever reach Azure?
- What happens to telemetry from an entire location when connectivity drops?
- And how do I do all of this consistently, at massive scale, and centrally across environments instead of configuring each host individually?

These aren't edge cases. For many teams, getting data into the system is itself the hardest part of observability, and by the time telemetry reaches Azure Monitor or Sentinel, it's already too late to fix these problems. Customers need control before the data hits the cloud.

What is Azure Monitor pipeline (and why it's different)?
Azure Monitor pipeline provides a centralized control point for telemetry ingestion and transformation, designed specifically for secure, high-throughput, enterprise-scale scenarios. It's built on open-source technologies from the OpenTelemetry ecosystem and includes the components needed to receive telemetry from local clients, process that telemetry, and forward it to Azure Monitor.

It's not another agent. And NO, you do not need to install it on all the resources… Agents such as Azure Monitor agent are great for collecting telemetry from individual machines and services. Azure Monitor pipeline solves a different problem: "How do I ingest telemetry from across my environment through a centralized pipeline, instead of configuring each host, while maintaining control over reliability, security, and ingestion cost?"

With Azure Monitor pipeline, you can:

- Ensure logs land directly in Azure-native schemas: automatic schematization into tables such as Syslog and CommonSecurityLog
- Prevent data loss during intermittent connectivity across sites: local buffering in persistent storage with automated backfill
- Reduce ingestion costs before data reaches the cloud: centralized filtering, aggregation, and transformation
- Ingest telemetry at sustained high volumes in the range of hundreds of thousands of events per second: horizontally scalable pipeline architecture
- Secure telemetry ingestion without managing certificates on each host individually: centralized TLS/mTLS with automated certificate provisioning and zero-downtime rotation
- Maintain visibility into ingestion infrastructure health: pipeline performance and health monitoring
- Plan deployments confidently at large scale: infrastructure sizing guidance for expected telemetry volume

And all of this is fully supported and production-ready in GA. Learn more.

So, let's talk a little bit about these in detail!

Tired of broken detections because logs don't match your table schema?
- Automatic schematization (a customer favorite!)

A consistent theme from preview customers was how painful it is to deal with log formats. Azure Monitor pipeline is the only solution that automatically shapes and schematizes data, so it lands directly in standard Azure tables such as Syslog and CommonSecurityLog. Learn more.

That means:

- No custom parsing pipelines downstream
- No broken detections due to schema drift
- Faster time to value for security teams

This happens before data reaches the cloud, right where it matters most.

What happens to my telemetry when the network goes down?
- Local buffering in persistent storage and automated backfill

Networks fail. Maintenance happens. Sites go offline. Azure Monitor pipeline is built for this reality. It buffers telemetry locally in your configured persistent storage during network interruptions and automatically backfills data when connectivity is restored. Learn more.

The result:

- No gaps in security visibility
- No manual replays
- Confidence that critical telemetry isn't lost

How do I reduce ingestion costs without sacrificing signal quality?
- Filter and aggregate at the edge

Nobody likes to pay for the data that they do not need... With Azure Monitor pipeline, customers can filter, aggregate, and shape the telemetry at the edge, sending only high-value data to Azure. Learn more.

This helps teams:

- Reduce ingestion costs
- Improve detection quality
- Keep cloud analytics focused on signal, not volume

Cost optimization and signal quality are no longer trade-offs: you get both.

How do I keep up when telemetry volumes spike to hundreds of thousands of events per second?
- Scaling

One of the biggest pain points we hear is scale. Azure Monitor pipeline is designed for sustained high-throughput ingestion, scaling horizontally and vertically to handle hundreds of thousands to millions of events per second. Learn more.
This isn't about theoretical limits; it's about handling the real-world extremes that break traditional forwarders.

How do I send telemetry in a secure manner?
- Secure ingestion with TLS and mTLS

Security teams consistently tell us that plain TCP ingestion just isn't acceptable, especially in regulated environments. Azure Monitor pipeline addresses this head-on by providing TLS-secured ingestion endpoints with mutual authentication, ensuring telemetry is encrypted in transit and accepted only from trusted sources. Learn more.

The result:

- Secure ingestion at the boundary: data is encrypted in transit using TLS, with automated certificate provisioning and zero-downtime rotation.
- With mutual authentication (mTLS) enabled, clients and Azure Monitor pipeline endpoints validate each other before ingestion, and it's easy to set up with our default experience.
- Do you have your own PKI and certificate management systems? Feel free to bring your own certificates to enable secure ingestion.

If the pipeline is this critical, how do I know it's healthy?

One thing we heard loud and clear during preview: "If this pipeline is critical, I need to see how it's doing." Azure Monitor pipeline now exposes health and performance signals, so it's no longer a black box. Learn more.

Customers can answer questions like:

- Is my pipeline receiving, processing, and sending telemetry?
- What's the CPU and memory usage of each pipeline instance?
- Why is a pipeline unhealthy, or down?

Observability for observability felt like the right bar to meet.

How do I plan infrastructure without over- or under-provisioning?

Planning pipeline infrastructure shouldn't be a guessing game, and we heard this loud and clear during preview. GA includes clear sizing guidance to help you plan the right infrastructure based on your expected telemetry volume and workload characteristics.
Not rigid formulas, but practical starting points that give you a confident baseline so you can design intentionally, deploy faster, and avoid costly over- or under-provisioning. Learn more.

Alright, these are a bunch of exciting features. How much do I need to pay for them?

Azure Monitor pipeline is included at no additional cost for ingesting telemetry into Azure Monitor and Microsoft Sentinel.

With general availability, Azure Monitor pipeline is production-ready so you can run the most demanding ingestion scenarios with confidence. If you're already using it in preview, welcome to GA. If you're just getting started, there's never been a better time to dive in.

As always, your feedback is what drives this forward. Drop a comment below, reach out directly, or share what you're building. We'd love to hear from you.

Troubleshoot with OpenTelemetry in Azure Monitor - Public Preview
OpenTelemetry is fast becoming the industry standard for modern telemetry collection and ingestion pipelines. With Azure Monitor's new OpenTelemetry Protocol (OTLP) support, you can ship logs, metrics, and traces from wherever you run workloads to analyze and act on your observability data in one place.

What's in the preview

- Direct OTLP ingestion into Azure Monitor for logs, metrics, and traces.
- Automated onboarding for AKS workloads.
- Application Insights on OTLP for distributed tracing, performance and troubleshooting experiences.
- Pre-built Grafana dashboards to visualize signals quickly.
- Prometheus for metric storage and query.
- OpenTelemetry semantic conventions for logs and traces, so your data lands in a familiar standard-based schema.

How to send OTLP to Azure Monitor: pick your path

- AKS: Auto-instrument Java and Node.js workloads using the Azure Monitor OpenTelemetry distro, or auto-configure any OpenTelemetry SDK-instrumented workload to export OTLP to Azure Monitor. Get started
- Limited preview: Auto-instrumentation for .NET and Python is also available. Get started
- VMs/VM Scale Sets (and Azure Arc-enabled compute): Use the Azure Monitor Agent (AMA) to receive OTLP from your apps and export it to Azure Monitor. Get started
- Any environment: Use the OpenTelemetry Collector to receive OTLP signals and export directly to Azure Monitor cloud ingestion endpoints. Get started

Under the hood: where your telemetry lands

- Metrics: Stored in an Azure Monitor Workspace, a Prometheus metrics store.
- Logs + traces: Stored in a Log Analytics workspace using an OpenTelemetry semantic conventions-based schema.
- Troubleshooting: Application Insights lights up distributed tracing and end-to-end performance investigations, backed by Azure Monitor.

Why it matters

- Standardize once: Instrument with OpenTelemetry and keep your telemetry portable.
- Reduce overhead: Fewer bespoke exporters and pipelines to maintain.
- Debug faster: Correlate metrics, logs, and traces to get from alert to root cause with less guesswork.
- Observe with confidence: Use dashboards and tracing views that are ready on day one.

Next step: Try the OTLP preview in your environment, then validate end-to-end signal flow with Application Insights and Grafana dashboards. Learn More

Azure VMs host (platform) metrics (not guest metrics) to the log analytics workspace?
Hi Team,

Can someone help me send Azure VM host (platform) metrics (not guest metrics) to a Log Analytics workspace? Some years ago I used to do this by clicking "Diagnostic Settings", but now when I open the "Diagnostic Settings" tab it asks me to enable guest-level monitoring (I don't want guest-level metrics) and points to a Storage Account. I don't see an option to send these metrics to a Log Analytics workspace. I have around 500 Azure VMs whose host (platform) metrics (not guest metrics) I want to send to the Log Analytics workspace.

Introducing Azure Managed Grafana MCP: The Managed Telemetry Gateway for AI Agents
AI agents are rapidly becoming a core part of how teams build, operate, and improve cloud systems, from coding assistants to autonomous remediation workflows. To deliver on that promise in the enterprise, agents need a secure, governed way to access real production telemetry. Azure Managed Grafana MCP lets AI agents securely query the same production telemetry you already connect to Azure Managed Grafana, like Azure Monitor metrics and logs, Application Insights, and Kusto, using your existing Azure RBAC and managed identities.

How do you securely connect AI agents to real production telemetry, without standing up yet another piece of infrastructure?

Today, enabling an agent to query systems like Azure Monitor, Application Insights, or Kusto often requires deploying and operating a self-hosted MCP server, wiring up identity and networking, and maintaining additional runtime infrastructure. That friction slows adoption and expands the security surface area. Azure Managed Grafana MCP removes that entire layer. With this release, every Azure Managed Grafana instance now includes a fully managed, remote MCP server that is ready by default.

What is Azure Managed Grafana MCP?

Azure Managed Grafana MCP is a built-in, managed MCP endpoint that allows AI agents to securely query enterprise telemetry and operational data through Azure Managed Grafana. Instead of deploying your own MCP server, customers can simply:

- Point their agent to the Azure Managed Grafana MCP endpoint
- Grant the agent a managed identity
- Start querying production data immediately

No containers. No extra infrastructure. No duplicated auth systems.
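As a rough sketch of what "pointing an agent at the endpoint" involves on the wire: MCP is built on JSON-RPC 2.0, and a client discovers and invokes tools with `tools/list` and `tools/call` requests. The snippet below only builds those request payloads and does not contact any server. The endpoint URL, the tool name `query_logs`, and its arguments are hypothetical placeholders rather than the actual Azure Managed Grafana MCP tool catalog, and acquiring a token for the agent's identity is left out.

```python
import json
from itertools import count
from typing import Optional

# Hypothetical placeholder; use the MCP endpoint shown for your own instance.
MCP_ENDPOINT = "https://<your-grafana-instance>/mcp"

_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Ask the server which tools it exposes (sent after the MCP initialize handshake).
list_tools = jsonrpc_request("tools/list")

# Invoke one tool. The tool name and arguments are invented for illustration.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "query_logs", "arguments": {"query": "exceptions | take 10"}},
)

# A real client would send each payload to MCP_ENDPOINT along with credentials
# for the agent's identity (assumed here to be a bearer token).
for payload in (list_tools, call_tool):
    print(json.dumps(payload))
```

In practice an MCP client library or agent framework handles this exchange for you. The point of the sketch is that the agent needs only an endpoint and an identity, not a separately hosted server.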
Figure: Azure Managed Grafana MCP is very easy to configure with your existing AMG instance

Because most Azure Managed Grafana customers already connect data sources like Azure Monitor metrics, logs, Kusto, and Application Insights to Azure Managed Grafana, the MCP server can expose that telemetry to AI agents instantly, using the same RBAC and access controls teams already trust.

Why we built this

As we've talked with customers experimenting with Foundry and coding agents, a consistent theme has emerged: agents are only as useful as the data they can reason over. Requiring teams to stand up and operate a separate MCP layer introduces real cost:

- Additional infrastructure to deploy and maintain
- Custom identity and token handling
- Expanded attack surface
- Slower experimentation and adoption

Azure Managed Grafana MCP takes a different approach. Rather than asking customers to build new infrastructure for agents, we leverage infrastructure they already run and trust: Azure Managed Grafana. This shifts Grafana from being just a visualization layer to something more strategic:

- A secure telemetry access plane
- An analytical engine for agent reasoning
- A bridge between operational data and autonomous action

Core value propositions

Zero infrastructure overhead

Azure Managed Grafana MCP is fully managed and enabled by default:

- No self-hosted MCP servers
- No additional networking configuration

Agents connect directly to Azure Managed Grafana and start querying data.

Secure by design

Security is not bolted on, it's inherited:

- Uses existing Azure RBAC
- Supports managed identities
- Respects current Azure Managed Grafana access controls

There's no need to duplicate authentication or authorization logic, and the security posture remains consistent with existing observability access patterns.
Immediate enterprise scenarios

By exposing production telemetry through MCP, teams can unlock high-value agent workflows immediately:

- Root cause analysis using Application Insights
- Automated operational summaries
- Real-time diagnostics
- Cross-resource telemetry correlation
- Structured data access via Kusto

Figure: Chatting with an agent using Azure Managed Grafana MCP

These are scenarios customers already run manually today, and this MCP server makes them accessible to agents.

Closing the loop: from insight to action

One of the most powerful aspects of Azure Managed Grafana MCP is what happens when agents have access to both code context and live telemetry. For example, an agent:

- Queries Application Insights for production errors
- Identifies recurring exception patterns
- Locates the source code emitting those errors
- Generates a fix and submits a pull request

This closes the loop between observability and remediation, something that's been largely manual until now.

Designing for agents, not just dashboards

Humans and agents consume data very differently.

Humans:

- Navigate dashboards sequentially
- Are limited by cognitive bandwidth
- Correlate issues manually

Agents:

- Process large datasets in parallel
- Perform iterative drill-downs without fatigue
- Detect statistically significant patterns quickly

Azure Managed Grafana MCP is designed with this in mind. Instead of only exposing raw data, it enables agent-optimized tools, like aggregated failure views across dozens of Application Insights instances, so agents can reason efficiently at scale.

To make it easier for our customers, it is now available as a native tool within Microsoft Foundry, so you can easily connect it to your Foundry Agents.
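To illustrate what an "aggregated failure view" might look like, the sketch below rolls up exception records from several Application Insights instances into one ranked summary, so an agent receives a compact table instead of raw rows. The record shape, the instance names, and the `aggregate_failures` helper are assumptions made for this example and are not part of the MCP server.

```python
from collections import Counter, defaultdict


def aggregate_failures(records, top_n=3):
    """Group exception records by type across all instances, ranked by count."""
    counts = Counter()
    instances = defaultdict(set)
    for rec in records:
        counts[rec["exception"]] += 1
        instances[rec["exception"]].add(rec["instance"])
    return [
        {"exception": exc, "count": n, "instances": sorted(instances[exc])}
        for exc, n in counts.most_common(top_n)
    ]


# Invented sample records standing in for rows from multiple instances.
sample = [
    {"instance": "checkout-ai", "exception": "TimeoutException"},
    {"instance": "orders-ai", "exception": "TimeoutException"},
    {"instance": "orders-ai", "exception": "NullReferenceException"},
    {"instance": "checkout-ai", "exception": "TimeoutException"},
]

summary = aggregate_failures(sample)
for row in summary:
    print(row)
```

Here the recurring pattern is `TimeoutException`, seen three times across both sample instances. A summary of this shape is the kind of input an agent could use for the "identifies recurring exception patterns" step described above.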
Figure: Azure Managed Grafana MCP as a native Foundry tool

Looking ahead

Azure Managed Grafana MCP is the foundation for a broader vision:

- Observability-driven autonomous agents
- Secure enterprise telemetry reasoning
- AI systems that detect, diagnose, and act

Over time, this transforms Azure Managed Grafana from dashboard software into a strategic AI integration layer for Azure. This isn't just a visualization feature. It's an infrastructure shift.

Check out the doc for more information: Configure an Azure Managed Grafana remote MCP server | Microsoft Learn