Agentic applications are revolutionizing enterprise automation, but their dynamic toolchains and latent reasoning make them notoriously hard to operate. In this post, you'll learn how to instrument a Microsoft Agent Framework–based service with OpenTelemetry, ship traces to Azure AI Foundry observability, and adopt a practical workflow to debug, evaluate, and improve multi-agent behavior in production.
We'll show how to wire spans around reasoning steps and tool calls (OpenAPI / MCP), enabling deep visibility into your agentic workflows.
Who Should Read This?
- Developers building agents with Microsoft Agent Framework (MAF) in .NET or Python
- Architects/SREs seeking enterprise-grade visibility, governance, and reliability for deployments on Azure AI Foundry
Why Observability Is Non-Negotiable for Agents
Traditional logs fall short for agentic systems:
- Reasoning and routing (which tool? which doc?) are opaque without explicit spans/events
- Failures often occur between components (e.g., retrieval mismatch, tool schema drift)
- Without traces across agents ⇄ tools ⇄ data stores, you can't reproduce or evaluate behavior
Microsoft has introduced multi-agent observability patterns and OpenTelemetry (OTel) conventions that unify traces across Agent Framework, Foundry, and popular stacks—so you can see one coherent timeline for each task.
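As a sketch of what a convention-aligned span can look like, the snippet below tags an agent invocation with attribute names from the OpenTelemetry GenAI semantic conventions (the `ActivitySource` name and tag values are illustrative, not from a specific framework):

```csharp
using System.Diagnostics;

// Illustrative ActivitySource; in a real service this is the same source
// name you register with your TracerProvider.
var source = new ActivitySource("MyCompany.Agents");

using var span = source.StartActivity("invoke_agent OrderAgent", ActivityKind.Client);

// Attribute names follow the OpenTelemetry GenAI semantic conventions
span?.SetTag("gen_ai.operation.name", "invoke_agent");
span?.SetTag("gen_ai.agent.name", "OrderAgent");
span?.SetTag("gen_ai.request.model", "gpt-4o-mini");
```

Because Agent Framework, Foundry, and instrumented model clients emit the same attribute names, a backend can stitch these spans into one timeline per task.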
Reference Architecture
Key Capabilities
- Agent orchestration & deployment via Microsoft Agent Framework
- Model access using Foundry’s OpenAI-compatible endpoint
- OpenTelemetry for traces/spans + attributes (agent, tool, retrieval, latency, tokens)
Step-by-Step Implementation
Assumption: This article uses Azure Monitor (via Application Insights) as the OpenTelemetry exporter, but you can configure other supported exporters in the same way.
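For instance, to ship the same traces to any OTLP-compatible backend instead, you could swap the Azure Monitor exporter for the OTLP exporter. This is a sketch only: it assumes the `OpenTelemetry.Exporter.OpenTelemetryProtocol` package is installed, and the collector endpoint shown is illustrative.

```csharp
using OpenTelemetry;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

// Same tracing pipeline as in this article, but exporting over OTLP
// instead of Azure Monitor.
var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("AgentObservabilityDemo"))
    .AddSource("Microsoft.Agents.AI")
    .AddOtlpExporter(options =>
    {
        // Illustrative endpoint; point this at your own OTLP collector
        options.Endpoint = new Uri("http://localhost:4317");
    })
    .Build();
```

Everything else in the walkthrough (sources, meters, custom attributes) stays the same; only the exporter registration changes.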
Prerequisites
- .NET 8 SDK or later
- Azure OpenAI service (endpoint, API key, deployed model)
- An Application Insights resource (used by the Azure Monitor exporter); optionally Azure Managed Grafana for dashboards
1. Create an Agent with OpenTelemetry (ASP.NET Core or Console App)
Install required packages:
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Monitor.OpenTelemetry.Exporter
dotnet add package Microsoft.Agents.AI.OpenAI
dotnet add package Microsoft.Extensions.Logging
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging.ApplicationInsights
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.Http
(Note: OpenTelemetry.Trace and OpenTelemetry.Metrics are namespaces inside the core OpenTelemetry package, not separate NuGet packages.)
Set up the following environment variables:
AZURE_OPENAI_ENDPOINT: https://<your_service_name>.openai.azure.com/
AZURE_OPENAI_API_KEY: <your_azure_openai_apikey>
APPLICATIONINSIGHTS_CONNECTION_STRING: <your_application_insights_connectionstring_for_azuremonitor_exporter>
Configure tracing once at startup:
// Namespaces used throughout: System.Diagnostics, System.Diagnostics.Metrics,
// Azure, Azure.AI.OpenAI, Microsoft.Agents.AI, Microsoft.Extensions.AI,
// Microsoft.Extensions.DependencyInjection, Microsoft.Extensions.Logging,
// OpenTelemetry, OpenTelemetry.Logs, OpenTelemetry.Metrics,
// OpenTelemetry.Resources, OpenTelemetry.Trace, Azure.Monitor.OpenTelemetry.Exporter

const string ServiceName = "AgentObservabilityDemo";
const string SourceName = "AgentObservabilityDemo.Tracing";

var applicationInsightsConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING")
    ?? throw new InvalidOperationException("APPLICATIONINSIGHTS_CONNECTION_STRING is not set");

// Create a resource builder describing the service and reuse it for traces and metrics
var resourceBuilder = ResourceBuilder.CreateDefault()
    .AddService(serviceName: ServiceName)
    .AddAttributes(new Dictionary<string, object>
    {
        ["deployment.environment"] = "development",
        ["service.instance.id"] = Environment.MachineName
    });

// Setup OpenTelemetry TracerProvider
using var traceProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(resourceBuilder)
    .AddSource(SourceName)
    .AddSource("Microsoft.Agents.AI")   // spans emitted by the Agent Framework
    .AddHttpClientInstrumentation()
    .AddAzureMonitorTraceExporter(options =>
    {
        options.ConnectionString = applicationInsightsConnectionString;
    })
    .Build();

// Setup OpenTelemetry MeterProvider
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .SetResourceBuilder(resourceBuilder)
    .AddMeter(SourceName)
    .AddAzureMonitorMetricExporter(options =>
    {
        options.ConnectionString = applicationInsightsConnectionString;
    })
    .Build();
// Configure DI and OpenTelemetry
var serviceCollection = new ServiceCollection();

// Setup logging with OpenTelemetry and Application Insights.
// Note: routing logs through both the OTel exporter and the classic
// Application Insights provider ingests each entry twice; keep only one in production.
serviceCollection.AddLogging(loggingBuilder =>
{
    loggingBuilder.SetMinimumLevel(LogLevel.Debug);
    loggingBuilder.AddOpenTelemetry(options =>
    {
        options.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName));
        options.IncludeScopes = true;
        options.IncludeFormattedMessage = true;
        options.AddAzureMonitorLogExporter(exporterOptions =>
        {
            exporterOptions.ConnectionString = applicationInsightsConnectionString;
        });
    });
    loggingBuilder.AddApplicationInsights(
        configureTelemetryConfiguration: (config) =>
        {
            config.ConnectionString = applicationInsightsConnectionString;
        },
        configureApplicationInsightsLoggerOptions: options =>
        {
            options.TrackExceptionsAsExceptionTelemetry = true;
            options.IncludeScopes = true;
        });
});

// Build the provider and resolve the logger used in the rest of the sample
using var serviceProvider = serviceCollection.BuildServiceProvider();
var logger = serviceProvider.GetRequiredService<ILogger<Program>>();
Configure custom metrics and activity source for tracing:
using var activitySource = new ActivitySource(SourceName);
using var meter = new Meter(SourceName);
// Create custom metrics
var interactionCounter = meter.CreateCounter<long>("chat_interactions_total", description: "Total number of chat interactions");
var responseTimeHistogram = meter.CreateHistogram<double>("chat_response_time_ms", description: "Chat response time in milliseconds");
2. Wire up the AI Agent:
// Create OpenAI client
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")
    ?? throw new InvalidOperationException("AZURE_OPENAI_ENDPOINT is not set");
var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")
    ?? throw new InvalidOperationException("AZURE_OPENAI_API_KEY is not set");
var deploymentName = "gpt-4o-mini";

// Wrap the chat client with OpenTelemetry instrumentation.
// EnableSensitiveData records prompts and completions in spans; disable it in production.
using var client = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey))
    .GetChatClient(deploymentName)
    .AsIChatClient()
    .AsBuilder()
    .UseOpenTelemetry(sourceName: SourceName, configure: (cfg) => cfg.EnableSensitiveData = true)
    .Build();
logger.LogInformation("Creating Agent with OpenTelemetry instrumentation");
// Create AI Agent
var agent = new ChatClientAgent(
client,
name: "AgentObservabilityDemo",
instructions: "You are a helpful assistant that provides concise and informative responses.")
.AsBuilder()
.UseOpenTelemetry(SourceName, configure: (cfg) => cfg.EnableSensitiveData = true)
.Build();
var thread = agent.GetNewThread();
logger.LogInformation("Agent created successfully with ID: {AgentId}", agent.Id);
3. Instrument Agent logic with semantic attributes and call OpenAI-compatible API:
// Create a parent span for the entire agent session
using var sessionActivity = activitySource.StartActivity("Agent Session");
Console.WriteLine($"Trace ID: {sessionActivity?.TraceId} ");
var sessionId = Guid.NewGuid().ToString("N");
sessionActivity?
.SetTag("agent.name", "AgentObservabilityDemo")
.SetTag("session.id", sessionId)
.SetTag("session.start_time", DateTimeOffset.UtcNow.ToString("O"));
logger.LogInformation("Starting agent session with ID: {SessionId}", sessionId);
using (logger.BeginScope(new Dictionary<string, object> { ["SessionId"] = sessionId, ["AgentName"] = "AgentObservabilityDemo" }))
{
var interactionCount = 0;
while (true)
{
Console.Write("You (or 'exit' to quit): ");
var input = Console.ReadLine();
if (string.IsNullOrWhiteSpace(input) || input.Equals("exit", StringComparison.OrdinalIgnoreCase))
{
logger.LogInformation("User requested to exit the session");
break;
}
interactionCount++;
logger.LogInformation("Processing interaction #{InteractionCount}", interactionCount);
// Create a child span for each individual interaction
using var activity = activitySource.StartActivity("Agent Interaction");
activity?
.SetTag("user.input", input)
.SetTag("agent.name", "AgentObservabilityDemo")
.SetTag("interaction.number", interactionCount);
var stopwatch = Stopwatch.StartNew();
try
{
logger.LogInformation("Starting agent execution for interaction #{InteractionCount}", interactionCount);
// Pass the thread so the conversation keeps its context across turns
var response = await agent.RunAsync(input, thread);
Console.WriteLine($"Agent: {response}");
Console.WriteLine();
stopwatch.Stop();
var responseTimeMs = stopwatch.Elapsed.TotalMilliseconds;
// Record metrics
interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
responseTimeHistogram.Record(responseTimeMs, new KeyValuePair<string, object?>("status", "success"));
activity?.SetTag("interaction.status", "success");
logger.LogInformation("Agent interaction #{InteractionNumber} completed successfully in {ResponseTime:F2} ms", interactionCount, responseTimeMs);
}
catch (Exception ex)
{
Console.WriteLine($"Error: {ex.Message}");
Console.WriteLine();
stopwatch.Stop();
var responseTimeMs = stopwatch.Elapsed.TotalMilliseconds;
// Record error metrics
interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "error"));
responseTimeHistogram.Record(responseTimeMs,
new KeyValuePair<string, object?>("status", "error"));
activity?
    .SetTag("interaction.status", "error")
    .SetTag("error.message", ex.Message)
    .SetStatus(ActivityStatusCode.Error, ex.Message);
logger.LogError(ex, "Agent interaction #{InteractionNumber} failed after {ResponseTime:F2} ms: {ErrorMessage}",
    interactionCount, responseTimeMs, ex.Message);
}
}
// Add session summary to the parent span
sessionActivity?
.SetTag("session.total_interactions", interactionCount)
.SetTag("session.end_time", DateTimeOffset.UtcNow.ToString("O"));
logger.LogInformation("Agent session completed. Total interactions: {TotalInteractions}", interactionCount);
}
Azure Monitor dashboard
Once you run the agent and generate some traffic, your dashboard in Azure Monitor will be populated as shown below:
You can drill down to specific service / activity source / spans by applying relevant filters:
Key Features Demonstrated
- OpenTelemetry instrumentation with Microsoft Agent Framework
- Custom metrics for user interactions
- End-to-end telemetry correlation
- Real-time telemetry visualization, alongside metrics and logged interactions
Further reading
- Introducing Microsoft Agent Framework
- Azure AI Foundry docs
- OpenTelemetry Aspire Demo with Azure OpenAI