Microsoft is enhancing multi-agent observability by introducing new semantic conventions to OpenTelemetry, developed collaboratively with Outshift, Cisco’s incubation engine. These additions—built upon OpenTelemetry and W3C Trace Context—establish standardized practices for tracing and telemetry within multi-agent systems, facilitating consistent logging of key metrics for quality, performance, safety, and cost. This systematic approach enables more comprehensive visibility into multi-agent workflows, including tool invocations and collaboration.
These advancements have been integrated into Azure AI Foundry, Microsoft Agent Framework, Semantic Kernel, and Azure AI packages for LangChain, LangGraph, and the OpenAI Agents SDK, enabling customers to get unified observability for agentic systems built using any of these frameworks with Azure AI Foundry. The additional semantic conventions and integration across different frameworks equip developers to monitor, troubleshoot, and optimize their AI agents in a unified solution with increased efficiency and valuable insights.
“Outshift, Cisco's Incubation Engine, worked with Microsoft to add new semantic conventions in OpenTelemetry. These conventions standardize multi-agent observability and evaluation, giving teams comprehensive insights into their AI systems.” Giovanna Carofiglio, Distinguished Engineer, Cisco
Multi-agent observability challenges
Multi-agent systems involve multiple interacting agents with diverse roles and architectures. Such systems are inherently dynamic—they adapt in real time by decomposing complex tasks into smaller, manageable subtasks and distributing them across specialized agents. Each agent may invoke multiple tools, often in parallel or sequence, to fulfill its part of the task, resulting in emergent workflows that are non-linear and highly context dependent. Given the dynamic nature of multi-agent systems and the management across agents, observability becomes critical for debugging, performance tuning, security, and compliance for such systems.
Multi-agent observability presents unique challenges that traditional GenAI telemetry standards fail to address. Traditional observability conventions are optimized for single-agent reasoning path visibility and lack the semantic depth to capture collaborative workflows across multiple agents. In multi-agent systems, tasks are often decomposed and distributed dynamically, requiring visibility into agent roles, task hierarchies, tool usage, and decision-making processes. Without standardized task spans and a unified namespace, it's difficult to trace cross-agent coordination, evaluate task outcomes, or analyze retry patterns. These gaps hinder white-box observability, making it hard to assess performance, safety, and quality across complex agentic workflows.
Extending OpenTelemetry with multi-agent observability
Microsoft has proposed new spans and attributes to OpenTelemetry semantic convention for GenAI agent and framework spans. They will enrich insights and capture the complexity of agent, tool, tasks and plans interactions in multi-agent systems. Below is a list of all the new additions proposed to OpenTelemetry.
New Span/Trace/Attributes |
Name |
Purpose |
New span |
execute_task |
Captures task planning and event propagation, providing insights into how tasks are decomposed and distributed. |
New child spans under “invoke_agent” |
agent_to_agent_interaction |
Traces communication between agents |
agent.state.management |
Effective context, short or long term memory management | |
agent_planning |
Logs the agent’s internal planning steps | |
|
agent orchestration |
Capture agent-to-agent orchestration |
New attributes in invoke_agent span |
tool_definitions |
Describes the tool’s purpose or configuration |
llm_spans |
Records model call spans | |
New attributes in “execute_tool” span |
tool.call.arguments |
Logs the arguments passed during tool invocation |
tool.call.results |
Records the results returned by the tool | |
New event |
Evaluation - attributes (name, error.type, label) |
Enables structured evaluation of agent performance and decision-making |
More details can be found in the following pull-requests merged into OpenTelemetry
- Add tool definition plus tool-related attributes in invoke-agent, inference, and execute-tool spans
- Capture evaluation results for GenAI applications
Azure AI Foundry delivers unified observability for Microsoft Agent Framework, LangChain, LangGraph, OpenAI Agents SDK
Agents built with Azure AI Foundry (SDK or portal) get out-of-the box observability in Foundry. With the new addition, agents built on different frameworks including Microsoft Agent Framework, Semantic Kernel, LangChain, LangGraph and OpenAI Agents SDK can use Foundry for monitoring, analyzing, debugging and evaluation with full observability. Agents built on Microsoft Agent Framework and Semantic Kernel get out-of-box tracing and evaluations support in Foundry Observability. Agents built with LangChain, LangGraph and OpenAI Agents SDK can use the corresponding packages and detailed instructions listed in the documentation to get tracing and evaluations support in Foundry Observability.
Customer benefits
With standardized multi-agent observability and support across multiple agent frameworks, customers get the following benefits:
- Unified Observability Platform for Multi-agent systems: Foundry Observability is the unified multi-agent observability platform for agents built with Foundry SDK or other agent frameworks like Microsoft Agent Framework, LangGraph, Lang Chain, OpenAI SDK.
- End-to-End Visibility into multi-agent systems: Customers can see not just what the system did, but how and why—from user request, through agent collaboration, tool usage, to final output.
- Faster Debugging & Root Cause Analysis: When something goes wrong (e.g., wrong answer, safety violation, performance bottleneck), customers can trace the exact path, see which agent/tool/task failed, and why.
- Quality & Safety Assurance: Built-in metrics and evaluation events (e.g. task success and validation scores) help customers monitor and improve the quality and safety of their AI workflows.
- Cost & Performance Optimization: Detailed metrics on token usage, API calls, resource consumption, and latency help customers optimize efficiency and cost.
Get Started today with end-to-end agent observability with Foundry
Azure AI Foundry Observability is a unified solution for evaluating, monitoring, tracing, and governing the quality, performance, and safety of your AI systems end-to-end— all built into your AI development loop and backed by the power of Azure Monitor for full stack observability. From model selection to real-time debugging, Foundry Observability capabilities empower teams to ship production-grade AI with confidence and speed. It’s observability, reimagined for the enterprise AI era.
With the above OpenTelemetry enhancements Azure AI Foundry now provides detailed multi-agent observability for agents built with different frameworks including Azure AI Foundry, Microsoft Agent Framework, LangChain, LangGraph and OpenAI Agents SDK. Learn more about Azure AI Foundry Observability and get end-to-end agent observability today!