azure copilot

4 Topics

Find and fix app issues - Azure Copilot Observability Agent
The Azure Copilot Observability Agent autonomously investigates incidents, correlates signals across logs, metrics, alerts, application health, and ML anomalies, then surfaces root cause with charts and recommended next steps. Extend coverage to your AI agents in Microsoft Foundry, track Gen AI errors and token consumption with trace-level detail, and write plain-language instructions to tune autonomous behavior to match your team’s workflow. Matt McSpirit, Microsoft Azure expert, shares how to take full control of incident response at scale.

Any alert, full stack. The Azure Copilot Observability Agent correlates OpenTelemetry & native Azure signals end-to-end. Trace any issue to deep failure analysis across every tier. Check it out.

Logs. Metrics. Alerts. ML anomalies. The Azure Copilot Observability Agent correlates them all, surfaces root cause, and recommends mitigation steps automatically. See it here.

Tune the Azure Copilot Observability Agent to match your team’s workflow. Write natural-language instructions to set alert groupings, escalation rules, and issue priorities. See how.

QUICK LINKS:
00:00 - Azure Copilot Observability Agent
00:43 - How to use it as you work
01:33 - Unified Full-Stack Telemetry
02:39 - Root Cause Investigation
04:12 - Investigate further
04:55 - Re-run the investigation
05:36 - Autonomous Alert Correlation & Triage
07:13 - Natural Language Agent Customization
07:34 - Wrap up

Link References
Get started at https://aka.ms/ObservabilityAgent

Unfamiliar with Microsoft Mechanics?
As Microsoft’s official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.
Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries
Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog
Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast

Keep getting this insider knowledge, join us on social:
Follow us on Twitter: https://twitter.com/MSFTMechanics
Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/
Enjoy us on Instagram: https://www.instagram.com/msftmechanics/
Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Video Transcript:

-When you run compute services at scale, issues are bound to happen. The real question is how quickly you can understand and fix them. This is where AI can cut through the noise, bringing you the right signals so that you can start your day with critical issues already investigated. Azure Copilot Observability Agent, powered by Azure Monitor, can run autonomously with knowledge of your environment, application topology, its dependencies and typical patterns. It monitors alerts streaming into Azure Monitor, triaging and correlating related alerts into issues requiring your attention so that you experience less alert fatigue and noise. The agent investigates critical issues to determine the root cause, and then recommends mitigation steps. Let me walk you through how you can use the Observability Agent to your advantage as you work.

-Using Azure Monitor, I can observe my app, across tiers. The App map shows me the topological view of my app. It’s a multi-tier app with a modern micro-service backend, hosted on Kubernetes, with a SQL database, Redis cache and more.
The app also has an agent built in Microsoft Foundry, and with the new Foundry capabilities, I can also observe how it is performing from operational metrics, including Agent Runs, Gen AI Errors, Tool Calls, underlying Models, and Token Consumption. And scrolling up, we can even view traces with Gen AI errors and can see a list of traces. I’ll look at this one with 93 items. And if I click in, I can see a trace view of all the agent’s activities with direct integration into our existing backend APIs through an MCP server. This is where Azure Monitor and unified logging is foundational to connecting the dots for a more complete view of the health of your service. It unifies telemetry in real time across the stack using OpenTelemetry as well as native integrations with Azure Services and normalizes those signals into a consistent schema.

-And because everything shares the same data foundation, it’s easy for either you or the Observability Agent to move instantly from insights, such as failed requests, to more details across different failure categories, which can be drilled into and queried further without having to stitch anything together. So, this way even though there are several teams involved across app and infrastructure managing my app, everyone shares a common operational view to keep it operating smoothly.

-In fact, let’s go back into Azure Monitor to see how my app is doing, specifically my agent. This time I’ll start in our agent alerts view. It looks like my agent is experiencing some errors. I’ll choose this alert to view additional details, and this is where you can invoke the Observability Agent using the Investigate button. Importantly, once you have Azure Monitor running in your environment, there is no setup needed.

-So, now I have the agent open. Let’s start the investigation. Now, when invoked for the first time, the Observability Agent will learn about your app, its topology, patterns, baselines, and more.
In this case, the agent already has deep knowledge of my app so it launches directly into an investigation. It’s able to correlate and reason across signals spanning the full application stack (logs, metrics, alerts, application health, and ML-detected anomalies) to pinpoint the root cause of the incident. It plans, executes, backtracks, runs multiple queries and validates or disproves hypotheses, just like a human expert would. After completing the investigation, the Observability Agent produces a detailed report with its findings. Now in this case, it’s discovered failures when the agent hit the MCP product catalog, where the backend telemetry shows the underlying failure was a SQL execution timeout. We can see a consistent timeout pattern with 117 failed SQL dependency calls. It’s ruled out any issues with the Redis cache and has determined that there was no SQL outage.

-Instead, SQL metrics show a small saturation window at the incident start, which dropped after the initial spike and would have led to connection failure. And it also charts its findings visually with search failures, latency spikes, failures by dependency and CPU and DTU spikes. And back on the left we see it’s recommending what can be done next, including how to improve querying and code, as well as monitoring.

-Now at this point, you can either start a new investigation or build off the Observability Agent’s findings. I want to investigate further. There are a few things that cause spikes like this, so I’m going to clarify if our SQL server saw a spike in traffic leading to timeouts. The Observability Agent responds quickly and notes stable to moderate SQL volume with many of the details we saw earlier. This time though, it’s found the root cause. It appears that an expensive query consumed significant SQL CPU resources. I am also concerned about token usage during the incident, so I’ll ask the Observability Agent to check for anomalies.
And after it’s done its analysis, the agent rules out my hypothesis there too.

-So, it’s easy to collaborate with the Observability Agent, in the context of the incident. You can even ask the agent to re-run the entire investigation with additional instructions if you need to. Next, I want to create an issue that captures the progress of the investigation. I’ll keep all the details and agree to share the agent chat around this issue so that it can be viewed by others on my team, which lets you hand off the issue to a colleague without losing context. The issue will retain the history of the incident, so no one has to start from scratch. In fact, this becomes a shared case file for the investigation.

-So, I just walked you through a manual flow for how you can use the new Observability Agent as you work. Next, let me show you how you can get the Observability Agent to work alongside you autonomously on your behalf with a new capability in public preview. Either from the marketplace or Azure Monitor, you can create a new Observability Agent resource.

-First, you start by giving it a name, selecting the region, and your Azure Monitor workspace. Next, you’ll point it at your Application Insights resource. We’ll use our “ct-agent” from before. And then you can configure whether you want the agent instance to run autonomously on auto-created issues to provide root cause analysis and next steps. From there, I just review and confirm. The Observability Agent will then learn about your application to establish deep expertise and knowledge of your environment and preserve it in the agent instance. And if an alert arrives, the agent creates an issue automatically and correlates any additional related alerts into that same issue, before launching an investigation.

-Now once an investigation is underway, the Observability Agent notifies you, for example, via email, with background on the detected issue. And from there, you can review the overview and look at the associated alerts.
As you can see, in our case, the Observability Agent was able to use deep, agentic reasoning to correlate separate alerts both for the agent in this application and for the backend APIs invoked independently and then bring them together into a single, unified view. This significantly reduces alert noise.

-Now this correlation is also important for issue triage. Because individually, these alerts might have lower severity, but when correlated, they increase the overall severity of the issue. And clicking into the Observability Agent tab we can see that the investigation is complete. This is a comprehensive report like we saw earlier with its analysis, corresponding chart visualizations, and recommendations on what to do next. Now as you set up agents, you can also customize them with your own instructions to prioritize what’s important. So, if we go back to where you create a new agent instance, here’s where you would add specific instructions, written in natural language. For example, you can tell an agent to group certain alerts, and prioritize the ones that should always trigger an issue so that its autonomous behavior matches how your team operates.

-So that’s the power of Azure Monitor with the Azure Copilot Observability Agent, which together turn alerts into fully investigated issues with clear, actionable insights. By correlating signals across your entire stack, it helps you move from detection, to root cause, to action faster, and without losing context. The result: less time triaging, faster resolution, and more resilient systems at scale. To learn more, visit aka.ms/ObservabilityAgent. Keep watching Microsoft Mechanics for the latest tech updates and thanks for watching.
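
The triage idea described in the transcript, where related alerts are grouped into one issue and the combined issue ends up with a higher severity than any single alert, can be illustrated with a small Python sketch. This is not how the Observability Agent works internally, which the source does not describe. The `Alert` and `Issue` types, the `correlate` function, the 10-minute grouping window, the "same app" correlation key, and the one-level escalation rule are all assumptions made for illustration. Only the Sev0 (critical) to Sev4 (verbose) scale and the "ct-agent" resource name come from Azure Monitor and the transcript.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Alert:
    name: str
    resource: str    # resource that fired the alert
    app: str         # hypothetical correlation key: the owning application
    severity: int    # 0 (critical) .. 4 (verbose), lower is more severe
    fired_at: float  # seconds since some reference time


@dataclass
class Issue:
    app: str
    alerts: list = field(default_factory=list)

    @property
    def severity(self) -> int:
        # Assumed escalation rule: take the most severe member alert and
        # raise it one level when more than one resource is affected.
        worst = min(a.severity for a in self.alerts)
        resources = {a.resource for a in self.alerts}
        return max(0, worst - 1) if len(resources) > 1 else worst


def correlate(alerts, window_seconds=600):
    """Group alerts of the same app that fire within window_seconds of the
    previous alert in that app's open issue (an assumed heuristic)."""
    issues = []
    open_issue = {}  # app -> most recent Issue for that app
    for alert in sorted(alerts, key=lambda a: a.fired_at):
        issue = open_issue.get(alert.app)
        if issue is None or alert.fired_at - issue.alerts[-1].fired_at > window_seconds:
            issue = Issue(app=alert.app)
            issues.append(issue)
            open_issue[alert.app] = issue
        issue.alerts.append(alert)
    return issues


# Hypothetical alerts: two Sev3 alerts on different tiers of one app,
# plus an unrelated Sev4 alert on another app.
alerts = [
    Alert("Gen AI errors", "ct-agent", "ct-app", 3, 0.0),
    Alert("SQL dependency failures", "backend-api", "ct-app", 3, 120.0),
    Alert("Disk usage notice", "other-vm", "other-app", 4, 60.0),
]
issues = correlate(alerts)
for issue in issues:
    print(issue.app, [a.name for a in issue.alerts], f"Sev{issue.severity}")
```

In this sketch the two Sev3 alerts for the same app land in one issue that is escalated to Sev2, while the unrelated alert stays a separate Sev4 issue. A real correlation engine would reason over topology and telemetry rather than a fixed time window.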
Zachary-Cavanell
Jun 28, 2026 in Microsoft Mechanics Blog
45 Views
0 likes
0 Comments
LangChain.js + Azure: A Generative AI App Journey
In an era where the landscape of Generative Artificial Intelligence (GenAI) is rapidly evolving, developers find themselves at the crossroads of innovation and practical implementation. The Azure Developers JavaScript Day 2024 provided an illuminating session, presented by Yohan, a Senior Cloud Developer Advocate at Microsoft, focusing on the development of GenAI applications using LangChain.js and Azure. This article delves into the highlights of this session, offering insights into the processes, tools, and methodologies that pave the way for developers to harness the power of GenAI.
Glaucia_Lemos
Aug 23, 2024 in Educator Developer Blog
2.9K Views
0 likes
0 Comments
Intelligent FinOps in Azure
Leverage FinOps in Azure to optimize your cloud spend and drive accountability across your organization.
Zachary-Cavanell
Apr 26, 2024 in Microsoft Mechanics Blog
3.3K Views
2 likes
0 Comments
Microsoft Copilot for Azure is not providing appropriate answers.
I have been testing Azure Copilot and saw some wrong answers; it does not seem to be integrated with Service Health properly. When I asked whether there was any planned maintenance for any of the services, it checked whether all the services were working fine. When I asked whether there was any security advisory for any of the services, it again checked whether all the services were working fine. Is anyone else facing the same issue, or is my prompt wrong?
shrey_mittal
Jan 20, 2024 in Azure
541 Views
1 like
1 Comment