Instrument your AI agents with OpenTelemetry GenAI semantic conventions and light up the new Agents (Preview) experience in Application Insights for full visibility into agent runs, token usage, and tool calls.
Part 2 of 3: In Blog 1, we deployed a multi-agent travel planner on Azure App Service using the Microsoft Agent Framework (MAF) 1.0 GA. This post dives deep into how we instrumented those agents with OpenTelemetry and lit up the brand-new Agents (Preview) view in Application Insights.
📖 Prerequisite: This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first — you'll need a running App Service with the agents, Service Bus, Cosmos DB, and Azure OpenAI provisioned before the monitoring steps in this post will work.
Deploying Agents Is Only Half the Battle
In Blog 1, we walked through deploying a multi-agent travel planning application on Azure App Service. Six specialized agents — a Coordinator, Currency Converter, Weather Advisor, Local Knowledge Expert, Itinerary Planner, and Budget Optimizer — work together to generate comprehensive travel plans. The architecture uses an ASP.NET Core API backed by a WebJob for async processing, Azure Service Bus for messaging, and Azure OpenAI for the brains.
But here's the thing: deploying agents to production is only half the battle. Once they're running, you need answers to questions like:
- Which agent is consuming the most tokens?
- How long does the Itinerary Planner take compared to the Weather Advisor?
- Is the Coordinator making too many LLM calls per workflow?
- When something goes wrong, which agent in the pipeline failed?
Traditional APM gives you HTTP latencies and exception rates. That's table stakes. For AI agents, you need to see inside the agent — the model calls, the tool invocations, the token spend. And that's exactly what Application Insights' new Agents (Preview) view delivers, powered by OpenTelemetry and the GenAI semantic conventions.
Let's break down how it all works.
The Agents (Preview) View in Application Insights
Azure Application Insights now includes a dedicated Agents (Preview) blade that provides unified monitoring purpose-built for AI agents. It's not just a generic dashboard β it understands agent concepts natively. Whether your agents are built with Microsoft Agent Framework, Azure AI Foundry, Copilot Studio, or a third-party framework, this view lights up as long as your telemetry follows the GenAI semantic conventions.
Here's what you get out of the box:
- Agent dropdown filter — A dropdown populated by `gen_ai.agent.name` values from your telemetry. In our travel planner, this shows all six agents: "Travel Planning Coordinator", "Currency Conversion Specialist", "Weather & Packing Advisor", "Local Expert & Cultural Guide", "Itinerary Planning Expert", and "Budget Optimization Specialist". You can filter the entire dashboard to one agent or view them all.
- Token usage metrics — Visualizations of input and output token consumption, broken down by agent. Instantly see which agents are the most expensive to run.
- Operational metrics — Latency distributions, error rates, and throughput for each agent. Spot performance regressions before users notice.
- End-to-end transaction details — Click into any trace to see the full workflow: which agents were invoked, what tools they called, how long each step took. The "simple view" renders agent steps in a story-like format that's remarkably easy to follow.
- Grafana integration — One-click export to Azure Managed Grafana for custom dashboards and alerting.
The key insight: this view isn't magic. It works because the telemetry is structured using well-defined semantic conventions. Let's look at those next.
📖 Docs: Application Insights Agents (Preview) view documentation
GenAI Semantic Conventions — The Foundation
The entire Agents view is powered by the OpenTelemetry GenAI semantic conventions. These are a standardized set of span attributes that describe AI agent behavior in a way that any observability backend can understand. Think of them as the "contract" between your instrumented code and Application Insights.
Let's walk through the key attributes and why each one matters:
gen_ai.agent.name
This is the human-readable name of the agent. In our travel planner, each agent sets this via the `name` parameter when constructing the MAF `ChatClientAgent` — for example, "Weather & Packing Advisor" or "Budget Optimization Specialist". This is what populates the agent dropdown in the Agents view. Without this attribute, Application Insights would have no way to distinguish one agent from another in your telemetry. It's the single most important attribute for agent-level monitoring.
gen_ai.agent.description
A brief description of what the agent does. Our Weather Advisor, for example, is described as "Provides weather forecasts, packing recommendations, and activity suggestions based on destination weather conditions." This metadata helps operators and on-call engineers quickly understand an agent's role without diving into source code. It shows up in trace details and helps contextualize what you're looking at when debugging.
gen_ai.agent.id
A unique identifier for the agent instance. In MAF, this is typically an auto-generated GUID. While gen_ai.agent.name is the human-friendly label, gen_ai.agent.id is the machine-stable identifier. If you rename an agent, the ID stays the same, which is important for tracking agent behavior across code deployments.
gen_ai.operation.name
The type of operation being performed. Values include "chat" for standard LLM calls and "execute_tool" for tool/function invocations. In our travel planner, when the Weather Advisor calls the `GetWeatherForecast` function via NWS, or when the Currency Converter calls `ConvertCurrency` via the Frankfurter API, those tool calls get their own spans with `gen_ai.operation.name = "execute_tool"`. This lets you measure LLM think-time separately from tool execution time — a critical distinction for performance optimization.
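To make the tool-call side concrete, here's a minimal sketch of how a tool can be exposed through Microsoft.Extensions.AI's `AIFunctionFactory` — the function body and names here are illustrative placeholders, not the sample's actual implementation (the real `GetWeatherForecast` calls the NWS API). Tools registered this way are surfaced to the model, and each invocation shows up as its own `execute_tool` span:

```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

// Illustrative placeholder: the real tool calls the National Weather Service API.
[Description("Gets the weather forecast for a destination city")]
static string GetWeatherForecast(string city) =>
    $"Forecast for {city}: partly cloudy"; // stubbed result for illustration

// Registering the function makes it callable by the model; the instrumented
// agent emits a span with gen_ai.operation.name = "execute_tool" per call.
var chatOptions = new ChatOptions
{
    Tools = [AIFunctionFactory.Create(GetWeatherForecast)]
};
```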
gen_ai.request.model / gen_ai.response.model
The model used for the request and the model that actually served the response (these can differ when providers do model routing). In our case, both are "gpt-4o" since that's what we deploy via Azure OpenAI. These attributes let you track model usage across agents, spot unexpected model assignments, and correlate performance changes with model updates.
gen_ai.usage.input_tokens / gen_ai.usage.output_tokens
Token consumption per LLM call. This is what powers the token usage visualizations in the Agents view. The Coordinator agent, which aggregates results from all five specialist agents, tends to have higher output token counts because it's synthesizing a full travel plan. The Currency Converter, which makes focused API calls, uses fewer tokens overall. These attributes let you answer the question "which agent is costing me the most?" — and more importantly, let you set alerts when token usage spikes unexpectedly.
gen_ai.system
The AI system or provider. In our case, this is "openai" (set by the Azure OpenAI client instrumentation). If you're using multiple AI providers — say, Azure OpenAI for planning and a local model for classification — this attribute lets you filter and compare.
Together, these attributes create a rich, structured view of agent behavior that goes far beyond generic tracing. They're the reason Application Insights can render agent-specific dashboards with token breakdowns, latency distributions, and end-to-end workflow views. Without these conventions, all you'd see is opaque HTTP calls to an OpenAI endpoint.
💡 Key takeaway: The GenAI semantic conventions are what transform generic distributed traces into agent-aware observability. They're the bridge between your code and the Agents view. Any framework that emits these attributes — MAF, Semantic Kernel, LangChain — can light up this dashboard.
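You can also inspect these attributes in raw form with a Log Analytics query. This is a sketch that assumes the default Azure Monitor exporter mapping, where client spans land in the `dependencies` table and span attributes land in `customDimensions` — verify the column names against your own data before relying on it:

```kusto
// Per-agent call counts and latency, grouped by GenAI operation type
dependencies
| where timestamp > ago(1h)
| extend agent = tostring(customDimensions["gen_ai.agent.name"]),
         op    = tostring(customDimensions["gen_ai.operation.name"])
| where isnotempty(agent)
| summarize calls = count(), avgDurationMs = avg(duration) by agent, op
| order by calls desc
```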
Two Layers of OpenTelemetry Instrumentation
Our travel planner sample instruments at two distinct levels, each capturing different aspects of agent behavior. Let's look at both.
Layer 1: IChatClient-Level Instrumentation
The first layer instruments at the IChatClient level using Microsoft.Extensions.AI. This is where we wrap the Azure OpenAI chat client with OpenTelemetry:
```csharp
var client = new AzureOpenAIClient(azureOpenAIEndpoint, new DefaultAzureCredential());

// Wrap with OpenTelemetry to emit GenAI semantic convention spans
return client.GetChatClient(modelDeploymentName).AsIChatClient()
    .AsBuilder()
    .UseOpenTelemetry()
    .Build();
```
This single `.UseOpenTelemetry()` call intercepts every LLM call and emits spans with:

- `gen_ai.system` — the AI provider (e.g., `"openai"`)
- `gen_ai.request.model` / `gen_ai.response.model` — which model was used
- `gen_ai.usage.input_tokens` / `gen_ai.usage.output_tokens` — token consumption per call
- `gen_ai.operation.name` — the operation type (`"chat"`)

Think of this as the "LLM layer" — it captures what the model is doing regardless of which agent called it. It's model-centric telemetry.
Layer 2: Agent-Level Instrumentation
The second layer instruments at the agent level using MAF 1.0 GA's built-in OpenTelemetry support. This happens in the BaseAgent class that all our agents inherit from:
```csharp
Agent = new ChatClientAgent(
        chatClient,
        instructions: Instructions,
        name: AgentName,
        description: Description,
        tools: chatOptions.Tools?.ToList())
    .AsBuilder()
    .UseOpenTelemetry(sourceName: AgentName)
    .Build();
```
The `.UseOpenTelemetry(sourceName: AgentName)` call on the MAF agent builder emits a different set of spans:

- `gen_ai.agent.name` — the human-readable agent name (e.g., `"Weather & Packing Advisor"`)
- `gen_ai.agent.description` — what the agent does
- `gen_ai.agent.id` — the unique agent identifier
- Agent invocation traces — spans that represent the full lifecycle of an agent call

This is the "agent layer" — it captures which agent is doing the work and provides the identity information that powers the Agents view dropdown and per-agent filtering.
Why Both Layers?
When both layers are active, you get the richest possible telemetry. The agent-level spans nest around the LLM-level spans, creating a trace hierarchy that looks like:
```
Agent: "Weather & Packing Advisor" (gen_ai.agent.name)
├── chat (gen_ai.operation.name)
│   └── model: gpt-4o, input_tokens: 450, output_tokens: 120
├── execute_tool: GetWeatherForecast
└── chat (follow-up with tool results)
    └── model: gpt-4o, input_tokens: 680, output_tokens: 350
```
There is a tradeoff: with both layers active, you may see some span duplication since both the IChatClient wrapper and the MAF agent wrapper emit spans for the same underlying LLM call. If you find the telemetry too noisy, you can disable one layer:
- Agent layer only (remove `.UseOpenTelemetry()` from the `IChatClient`) — You get agent identity but lose per-call token breakdowns.
- IChatClient layer only (remove `.UseOpenTelemetry()` from the agent builder) — You get detailed LLM metrics but lose agent identity in the Agents view.
For the fullest experience with the Agents (Preview) view, we recommend keeping both layers active. The official sample uses both, and the Agents view is designed to handle the overlapping spans gracefully.
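One related knob worth knowing about: by default, the Microsoft.Extensions.AI wrapper does not record prompt or completion text in spans. The wrapper exposes an `EnableSensitiveData` option through its `configure` callback if you need message content for debugging. This is a sketch varying our Layer 1 setup — only enable it in environments where logging prompts and responses is acceptable:

```csharp
// Variation on the Layer 1 setup: opt in to capturing message content in spans.
// EnableSensitiveData records prompts/responses, so treat this as dev-only.
return client.GetChatClient(modelDeploymentName).AsIChatClient()
    .AsBuilder()
    .UseOpenTelemetry(configure: o => o.EnableSensitiveData = true)
    .Build();
```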
📖 Docs: MAF Observability Guide
Exporting Telemetry to Application Insights
Emitting OpenTelemetry spans is only useful if they land somewhere you can query them. The good news is that Azure App Service and Application Insights have deep native integration — App Service can auto-instrument your app, forward platform logs, and surface health metrics out of the box. For a full overview of monitoring capabilities, see Monitor Azure App Service.
For our AI agent scenario, we go beyond the built-in platform telemetry. We need the GenAI semantic convention spans that we configured in the previous sections to flow into App Insights so the Agents (Preview) view can render them. Our travel planner has two host processes — the ASP.NET Core API and a WebJob — and each requires a slightly different exporter setup.
ASP.NET Core API — Azure Monitor OpenTelemetry Distro
For the API, it's a single line. The Azure Monitor OpenTelemetry Distro handles everything:
```csharp
// Configure OpenTelemetry with Azure Monitor for traces, metrics, and logs.
// The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered.
builder.Services.AddOpenTelemetry().UseAzureMonitor();
```
That's it. The distro automatically:
- Discovers the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable
- Configures trace, metric, and log exporters to Application Insights
- Sets up appropriate sampling and batching
- Registers standard ASP.NET Core HTTP instrumentation

This is the recommended approach for any ASP.NET Core application. One NuGet package (`Azure.Monitor.OpenTelemetry.AspNetCore`), one line of code, zero configuration files.
WebJob — Manual Exporter Setup
The WebJob is a non-ASP.NET Core host (it uses `Host.CreateApplicationBuilder`), so the distro's convenience method isn't available. Instead, we configure the exporters explicitly:
```csharp
// Configure OpenTelemetry with Azure Monitor for the WebJob (non-ASP.NET Core host).
// The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(r => r.AddService("TravelPlanner.WebJob"))
    .WithTracing(t => t
        .AddSource("*")
        .AddAzureMonitorTraceExporter())
    .WithMetrics(m => m
        .AddMeter("*")
        .AddAzureMonitorMetricExporter());

builder.Logging.AddOpenTelemetry(o => o.AddAzureMonitorLogExporter());
```
A few things to note:
- `.AddSource("*")` — Subscribes to all trace sources, including the ones emitted by MAF's `.UseOpenTelemetry(sourceName: AgentName)`. In production, you might narrow this to specific source names for performance.
- `.AddMeter("*")` — Similarly captures all metrics, including the GenAI metrics emitted by the instrumentation layers.
- `.ConfigureResource(r => r.AddService("TravelPlanner.WebJob"))` — Tags all telemetry with the service name so you can distinguish API vs. WebJob telemetry in Application Insights.
- The connection string is still auto-discovered from the `APPLICATIONINSIGHTS_CONNECTION_STRING` environment variable — no need to pass it explicitly.
The key difference between these two approaches is ceremony, not capability. Both send the same GenAI spans to Application Insights; the Agents view works identically regardless of which exporter setup you use.
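You can confirm that both hosts are reporting by splitting agent spans on the role name. A sketch, assuming the Azure Monitor exporter's default behavior of mapping the OpenTelemetry service name to `cloud_RoleName` and span attributes to `customDimensions`:

```kusto
// Count agent spans per host process (API vs. WebJob)
dependencies
| extend agent = tostring(customDimensions["gen_ai.agent.name"])
| where isnotempty(agent)
| summarize spans = count() by cloud_RoleName, agent
```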
📖 Docs: Azure Monitor OpenTelemetry Distro
Infrastructure as Code — Provisioning the Monitoring Stack
The monitoring infrastructure is provisioned via Bicep modules alongside the rest of the application's Azure resources. Here's how it fits together.
Log Analytics Workspace
`infra/core/monitor/loganalytics.bicep` creates the Log Analytics workspace that backs Application Insights:
```bicep
resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
  name: name
  location: location
  tags: tags
  properties: {
    sku: {
      name: 'PerGB2018'
    }
    retentionInDays: 30
  }
}
```
Application Insights
`infra/core/monitor/appinsights.bicep` creates a workspace-based Application Insights resource connected to Log Analytics:
```bicep
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: name
  location: location
  tags: tags
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspaceId
  }
}

output connectionString string = appInsights.properties.ConnectionString
```
Wiring It All Together
In `infra/main.bicep`, the Application Insights connection string is passed as an app setting to the App Service:
```bicep
appSettings: {
  APPLICATIONINSIGHTS_CONNECTION_STRING: appInsights.outputs.connectionString
  // ... other app settings
}
```
This is the critical glue: when the app starts, the OpenTelemetry distro (or manual exporters) auto-discover this environment variable and start sending telemetry to your Application Insights resource. No connection strings in code, no configuration files — it's all infrastructure-driven.
The same connection string is available to both the API and the WebJob since they run on the same App Service. All agent telemetry from both host processes flows into a single Application Insights resource, giving you a unified view across the entire application.
See It in Action
Once the application is deployed and processing travel plan requests, here's how to explore the agent telemetry in Application Insights.
Step 1: Open the Agents (Preview) View
In the Azure portal, navigate to your Application Insights resource. In the left nav, look for Agents (Preview) under the Investigations section. This opens the unified agent monitoring dashboard.
Step 2: Filter by Agent
The agent dropdown at the top of the page is populated by the `gen_ai.agent.name` values in your telemetry. You'll see all six agents listed:
- Travel Planning Coordinator
- Currency Conversion Specialist
- Weather & Packing Advisor
- Local Expert & Cultural Guide
- Itinerary Planning Expert
- Budget Optimization Specialist
Select a specific agent to filter the entire dashboard — token usage, latency, error rate — down to that one agent.
Step 3: Review Token Usage
The token usage tile shows total input and output token consumption over your selected time range. Compare agents to find your biggest spenders. In our testing, the Coordinator agent consistently uses the most output tokens because it aggregates and synthesizes results from all five specialists.
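If you want the same breakdown outside the prebuilt tile, a Log Analytics query can aggregate the token attributes per agent. This sketch assumes the default exporter mapping of GenAI spans into `dependencies` with attributes in `customDimensions`:

```kusto
// Total input/output tokens per agent over the last 24 hours
dependencies
| where timestamp > ago(24h)
| extend agent = tostring(customDimensions["gen_ai.agent.name"])
| where isnotempty(agent)
| summarize inputTokens  = sum(tolong(customDimensions["gen_ai.usage.input_tokens"])),
            outputTokens = sum(tolong(customDimensions["gen_ai.usage.output_tokens"])) by agent
| order by outputTokens desc
```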
Step 4: Drill into Traces
Click "View Traces with Agent Runs" to see all agent executions. Each row represents a workflow run. You can filter by time range, status (success/failure), and specific agent.
Step 5: End-to-End Transaction Details
Click any trace to open the end-to-end transaction details. The "simple view" renders the agent workflow as a story — showing each step, which agent handled it, how long it took, and what tools were called. For a full travel plan, you'll see the Coordinator dispatch work to each specialist, tool calls to the NWS weather API and Frankfurter currency API, and the final aggregation step.
Grafana Dashboards
The Agents (Preview) view in Application Insights is great for ad-hoc investigation. For ongoing monitoring and alerting, Azure Managed Grafana provides prebuilt dashboards specifically designed for agent workloads.
From the Agents view, click "Explore in Grafana" to jump directly into these dashboards:
- Agent Framework Dashboard — Per-agent metrics including token usage trends, latency percentiles, error rates, and throughput over time. Pin this to your operations wall.
- Agent Framework Workflow Dashboard — Workflow-level metrics showing how multi-agent orchestrations perform end-to-end. See how long complete travel plans take, identify bottleneck agents, and track success rates.
These dashboards query the same underlying data in Log Analytics, so there's zero additional instrumentation needed. If your telemetry lights up the Agents view, it lights up Grafana too.
Key Packages Summary
Here are the NuGet packages that make this work, pulled from the actual project files:
| Package | Version | Purpose |
|---|---|---|
| `Azure.Monitor.OpenTelemetry.AspNetCore` | 1.3.0 | Azure Monitor OTEL Distro for ASP.NET Core (API). One-line setup for traces, metrics, and logs. |
| `Azure.Monitor.OpenTelemetry.Exporter` | 1.3.0 | Azure Monitor OTEL exporter for non-ASP.NET Core hosts (WebJob). Trace, metric, and log exporters. |
| `Microsoft.Agents.AI` | 1.0.0 | MAF 1.0 GA — `ChatClientAgent`, `.UseOpenTelemetry()` for agent-level instrumentation. |
| `Microsoft.Extensions.AI` | 10.4.1 | `IChatClient` abstraction with `.UseOpenTelemetry()` for LLM-level instrumentation. |
| `OpenTelemetry.Extensions.Hosting` | 1.11.2 | OTEL dependency injection integration for `Host.CreateApplicationBuilder` (WebJob). |
| `Microsoft.Extensions.AI.OpenAI` | 10.4.1 | OpenAI/Azure OpenAI adapter for `IChatClient`. Bridges the Azure OpenAI SDK to the M.E.AI abstraction. |
Wrapping Up
Let's zoom out. In this three-part series, so far we've gone from zero to a fully observable, production-grade multi-agent AI application on Azure App Service:
- Blog 1 covered deploying the multi-agent travel planner with MAF 1.0 GA — the agents, the architecture, the infrastructure.
- Blog 2 (this post) showed how to instrument those agents with OpenTelemetry, explained the GenAI semantic conventions that make agent-aware monitoring possible, and walked through the new Agents (Preview) view in Application Insights.
- Blog 3 will show you how to secure those agents for production with the Microsoft Agent Governance Toolkit.
The pattern is straightforward:
- Add `.UseOpenTelemetry()` at the `IChatClient` level for LLM metrics.
- Add `.UseOpenTelemetry(sourceName: AgentName)` at the MAF agent level for agent identity.
- Export to Application Insights via the Azure Monitor distro (one line) or manual exporters.
- Wire the connection string through Bicep and environment variables.
- Open the Agents (Preview) view and start monitoring.
With MAF 1.0 GA's built-in OpenTelemetry support and Application Insights' new Agents view, you get production-grade observability for AI agents with minimal code. The GenAI semantic conventions ensure your telemetry is structured, portable, and understood by any compliant backend. And because it's all standard OpenTelemetry, you're not locked into any single vendor — swap the exporter and your telemetry goes to Jaeger, Grafana, Datadog, or wherever you need it.
Now go see what your agents are up to and check out Blog 3.
Resources
- Sample repository: seligj95/app-service-multi-agent-maf-otel
- App Insights Agents (Preview) view: Documentation
- GenAI Semantic Conventions: OpenTelemetry GenAI Registry
- MAF Observability Guide: Microsoft Agent Framework Observability
- Azure Monitor OpenTelemetry Distro: Enable OpenTelemetry for .NET
- Grafana Agent Framework Dashboard: aka.ms/amg/dash/af-agent
- Grafana Workflow Dashboard: aka.ms/amg/dash/af-workflow
- Blog 1: Deploy Multi-Agent AI Apps on Azure App Service with MAF 1.0 GA
- Blog 3: Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit