As organizations accelerate their adoption of AI agents, the need for robust observability has never been greater. At Microsoft Ignite, we introduced Observability in Foundry Control Plane - a comprehensive and enhanced suite of tools that empowers teams to evaluate, monitor, and optimize the quality, performance, cost, and safety of multi-agent AI systems across their entire lifecycle. Watch the “Monitor, optimize and scale with AI Observability Microsoft Foundry” session, where we demonstrate how to get end-to-end observability for your AI agents.
During the above session, CarMax—a leading automotive retailer—explained how they use Observability features in Microsoft Foundry to monitor, troubleshoot, and improve the performance of their AI agent, Skye.
“CarMax uses Microsoft Foundry not just for evaluations, but as a foundation for agentic observability. Every interaction from our agent Skye is captured, analyzed, and scored through a mix of out-of-the-box and custom evaluators, giving us deep visibility into how agents perform across safety, compliance, and quality metrics. This observability enables proactive monitoring, faster iteration, and confidence that our multi-agent architecture operates responsibly and effectively. We are excited to explore the ability to get fleet-wide observability in one view with Foundry Control Plane.”
Abhi Bhatt, Data & AI Engineering, CarMax
Why Observability Matters for AI Agents
AI agents are inherently non-deterministic: their outputs can vary from run to run, and even small changes in input or configuration can lead to unexpected results. This unpredictability, combined with the high stakes of deploying AI in customer-facing and regulated environments, makes continuous evaluation and monitoring essential. Observability is the foundation for building trust and reliability in AI systems.
Get Complete Visibility into All your Agents Throughout their Lifecycle
Observability in Foundry Control Plane addresses these challenges with a comprehensive suite of observability tools, including:
- Comprehensive evaluation: Build robust agents with out-of-the-box and custom evaluators, synthetic datasets, continuous evaluations, and built-in cluster analysis that pinpoints problematic areas in evaluation results. Evaluations can be run in the playground, on test datasets in CI/CD pipelines, continuously on production traffic, or on a schedule to detect drift.
- Unified monitoring dashboards: Track agent cost, performance, and safety with customizable dashboards and actionable insights.
- End-to-end tracing: Debug issues with OpenTelemetry-based tracing, following every agent run from input to tool call.
- Fleet-wide oversight: Observe agents built on Foundry and third-party platforms, plus agents built with popular agent frameworks (Microsoft Semantic Kernel, LangChain, LangGraph, OpenAI, etc.) in a single view.
- AI Red Teaming Agent: Continuously test and harden generative AI systems against real-world risks, automating red-teaming runs for ongoing safety.
These observability tools are integrated throughout the agent lifecycle - from prototyping in the playground, to monitoring and optimizing in production, to managing a fleet of agents. In addition, all evaluations, traces, and red-teaming results are published to Azure Monitor, where agent signals are correlated with the KPIs of dependent AI infrastructure and other app signals to deliver an end-to-end operational picture.
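To make the idea of scheduled evaluations that detect drift more concrete, here is a minimal sketch in plain Python. It is not the Foundry API: the `detect_drift` helper, the 0-to-1 score scale, and the `max_drop` threshold are all assumptions chosen for illustration. It simply compares the mean evaluator score of a recent window against a baseline window.

```python
from statistics import mean


def detect_drift(baseline_scores, recent_scores, max_drop=0.05):
    """Return (drifted, drop), where drop is the fall in mean score.

    Hypothetical helper: scores are assumed to be floats in [0, 1]
    produced by any evaluator; max_drop is an illustrative threshold.
    """
    if not baseline_scores or not recent_scores:
        raise ValueError("both score lists must be non-empty")
    drop = mean(baseline_scores) - mean(recent_scores)
    return drop > max_drop, drop


# Illustrative (made-up) scores from two scheduled evaluation runs.
baseline = [0.92, 0.88, 0.95, 0.90]
recent = [0.81, 0.79, 0.85, 0.83]

drifted, drop = detect_drift(baseline, recent)
print(f"drifted={drifted}, drop={drop:.3f}")
```

A real drift check would likely use larger windows and a statistical test rather than a fixed threshold; the point here is only the shape of the comparison.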
Step 1: Accelerate Reliable Agent Development with Integrated & Continuous Evaluations
Rapidly developing robust AI agents requires more than just code—it demands continuous evaluation and feedback. Observability in Foundry Control Plane integrates evaluations directly into the agent playground, offering:
- A comprehensive set of out-of-the-box quality, risk, and safety evaluators
- The ability to create custom evaluators tailored to your use cases
- Human evaluation for supplemental measurements
- Built-in synthetic datasets and recommended evaluators (new at Ignite) to streamline evaluation creation
- Cluster analysis for evaluations, providing visual insights that help developers quickly identify patterns and mistakes, and take recommended actions to improve their agents
These capabilities empower teams to build production-grade AI with confidence and speed, reducing time-to-value and minimizing risk.
Accelerate reliable agent development with built-in evaluations and tracing
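As a rough illustration of what a custom evaluator and a CI/CD quality gate could look like, the sketch below uses plain Python only. The `EvalResult` type, the `keyword_coverage_evaluator` function, the dataset rows, and the thresholds are hypothetical and are not part of any Foundry SDK; a real evaluator would typically be registered with the platform rather than called directly.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    name: str
    score: float  # 0.0 (worst) to 1.0 (best)
    passed: bool


def keyword_coverage_evaluator(response, required_keywords, threshold=0.8):
    """Hypothetical custom evaluator: fraction of required keywords present."""
    if not required_keywords:
        raise ValueError("required_keywords must be non-empty")
    text = response.lower()
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    score = hits / len(required_keywords)
    return EvalResult("keyword_coverage", score, score >= threshold)


def run_eval_suite(dataset, threshold=0.8):
    """Evaluate every row and return (pass_rate, results)."""
    if not dataset:
        raise ValueError("dataset must be non-empty")
    results = [
        keyword_coverage_evaluator(row["response"], row["keywords"], threshold)
        for row in dataset
    ]
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate, results


# Made-up test rows standing in for a CI test dataset.
test_dataset = [
    {"response": "This sedan has a 30-day return policy and a warranty.",
     "keywords": ["return policy", "warranty"]},
    {"response": "We can schedule a test drive for you.",
     "keywords": ["test drive", "appointment"]},
]

pass_rate, results = run_eval_suite(test_dataset)
# A CI job could fail the build when the pass rate falls below a chosen bar.
ci_gate_ok = pass_rate >= 0.9
print(f"pass_rate={pass_rate:.2f}, ci_gate_ok={ci_gate_ok}")
```

Keyword matching is deliberately simplistic; it stands in for whatever quality, risk, or safety signal a team actually needs to score.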
Step 2: Monitor and Optimize Agents with Actionable Insights in Production
Once agents are in production, continuous monitoring and optimization are essential. Foundry Control Plane provides:
- Out-of-the-box, customizable monitoring dashboards to track agent cost, performance, evaluation results, and red-teaming scans
- End-to-end, OpenTelemetry (OTel)-compliant tracing that lets teams follow every agent run and evaluation result, from agent calls and LLM inference to individual tool calls
- Actionable insights for debugging and optimization, including a curated model upgrade experience to improve cost and performance
- AI Gateway for granular control of model, agent, and tool usage, ensuring performance and cost management
With these tools, teams can proactively detect issues, optimize agent behavior, and ensure their AI systems deliver value reliably and efficiently.
Agent monitoring dashboard showing cost, performance and safety metrics in one view
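The tracing described above follows an agent run through LLM inference and individual tool calls. The sketch below mimics that parent/child span structure with a toy in-memory tracer written in plain Python. It is not the OpenTelemetry SDK and not a Foundry API; the `InMemoryTracer` class, the span names, and the attribute keys are assumptions made for illustration.

```python
import time
import uuid
from contextlib import contextmanager


class InMemoryTracer:
    """Toy tracer that records spans with parent links (not the OTel SDK)."""

    def __init__(self):
        self.spans = []   # finished spans, in completion order
        self._stack = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name, **attributes):
        record = {
            "span_id": uuid.uuid4().hex[:16],
            "parent_id": self._stack[-1]["span_id"] if self._stack else None,
            "name": name,
            "attributes": dict(attributes),
            "start": time.perf_counter(),
        }
        self._stack.append(record)
        try:
            yield record
        finally:
            record["duration_s"] = time.perf_counter() - record["start"]
            self._stack.pop()
            self.spans.append(record)


tracer = InMemoryTracer()

# One agent run containing an LLM call and a tool call (all names hypothetical).
with tracer.span("agent_run", agent="demo-agent") as run:
    with tracer.span("llm_inference", model="hypothetical-model"):
        pass  # the model call would happen here
    with tracer.span("tool_call", tool="inventory_lookup") as tool:
        tool["attributes"]["result_count"] = 3  # attach a result attribute

for s in tracer.spans:
    print(s["name"], "parent:", s["parent_id"])
```

Child spans finish before their parent, so `agent_run` is recorded last. In a real OTel setup, an exporter would ship these spans to a backend where the parent links are used to reconstruct the full run.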
Step 3: Get Unified Observability Across Your Fleet of Agents - On Any Platform
Modern organizations often deploy agents across multiple platforms and frameworks. Foundry Control Plane delivers unified oversight by:
- Enabling teams to register, observe and evaluate all agents—regardless of where they were built—within one unified observability solution.
- Supporting agents built on Foundry, on third-party platforms, and with popular agent frameworks (including Microsoft Agent Framework, LangChain, and LangGraph) via OTel-compliant traces.
- Providing a single view of cost, performance, red-teaming scan results, and alerts across your entire agent fleet in the Operate experience.
This fleet-wide visibility ensures consistent governance, compliance, and operational excellence, even in complex, heterogeneous environments.
Observe all your agents in one place - built in Foundry or third-party agents
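To show what rolling per-run signals up into a single fleet view might involve, here is a small plain-Python sketch. The record fields (`agent`, `platform`, `cost_usd`, `latency_ms`, `alert`), the agent names, and the `summarize_fleet` helper are hypothetical; the Operate experience does this aggregation for you, and its actual schema is not shown here.

```python
from collections import defaultdict


def summarize_fleet(run_records):
    """Roll per-run records up into one summary row per agent."""
    totals = defaultdict(
        lambda: {"platform": None, "runs": 0, "cost_usd": 0.0,
                 "latency_ms": 0.0, "alerts": 0}
    )
    for rec in run_records:
        row = totals[rec["agent"]]
        row["platform"] = rec["platform"]
        row["runs"] += 1
        row["cost_usd"] += rec["cost_usd"]
        row["latency_ms"] += rec["latency_ms"]
        row["alerts"] += int(rec.get("alert", False))

    return {
        agent: {
            "platform": row["platform"],
            "runs": row["runs"],
            "total_cost_usd": round(row["cost_usd"], 4),
            "avg_latency_ms": row["latency_ms"] / row["runs"],
            "alerts": row["alerts"],
        }
        for agent, row in totals.items()
    }


# Made-up run records from agents on different platforms.
records = [
    {"agent": "support-bot", "platform": "Foundry",
     "cost_usd": 0.012, "latency_ms": 800, "alert": False},
    {"agent": "support-bot", "platform": "Foundry",
     "cost_usd": 0.018, "latency_ms": 1200, "alert": True},
    {"agent": "search-agent", "platform": "LangGraph",
     "cost_usd": 0.007, "latency_ms": 500},
]

fleet = summarize_fleet(records)
for agent, row in fleet.items():
    print(agent, row)
```

Because every record carries the same fields regardless of where the agent was built, one aggregation covers the whole fleet, which is the practical benefit of standardizing on OTel-compliant traces.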
Conclusion
With Observability in Foundry Control Plane, organizations can confidently evaluate, monitor, and optimize every agent—across platforms and at scale. Whether you’re building your first agent or managing a global fleet, these capabilities provide the visibility and control needed to deliver production-grade AI responsibly. Refer to the Observability in Foundry Control Plane webpage and documentation for more information.