observability

eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina
Kubernetes simplifies container orchestration but introduces observability challenges due to dynamic pod lifecycles and complex inter-service communication. eBPF technology addresses these issues by providing deep system insights and efficient monitoring. The open-source Retina project leverages eBPF for comprehensive, cloud-agnostic network observability across AKS, GKE, and EKS, enhancing troubleshooting and optimization through real-world demo scenarios.

Discover the Future of Data Engineering with Microsoft Fabric for Technical Students & Entrepreneurs
Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. This makes it an ideal platform for technical students and entrepreneurial developers looking to streamline their data engineering and analytics workflows.

Foundry Agent Service at Ignite 2025: Simple to Build. Powerful to Deploy. Trusted to Operate.
The upgraded Foundry Agent Service delivers a unified, simplified platform with managed hosting, built-in memory, tool catalogs, and seamless integration with Microsoft Agent Framework. Developers can now deploy agents faster and more securely, leveraging one-click publishing to Microsoft 365 and advanced governance features for streamlined enterprise AI operations.

Observability for Multi-Agent Systems with Microsoft Agent Framework and Azure AI Foundry
Agentic applications are revolutionizing enterprise automation, but their dynamic toolchains and latent reasoning make them notoriously hard to operate. In this post, you'll learn how to instrument a Microsoft Agent Framework–based service with OpenTelemetry, ship traces to Azure AI Foundry observability, and adopt a practical workflow to debug, evaluate, and improve multi-agent behavior in production. We'll show how to wire spans around reasoning steps and tool calls (OpenAPI / MCP), enabling deep visibility into your agentic workflows.

Who Should Read This?

- Developers building agents with Microsoft Agent Framework (MAF) in .NET or Python
- Architects/SREs seeking enterprise-grade visibility, governance, and reliability for deployments on Azure AI Foundry

Why Observability Is Non-Negotiable for Agents

Traditional logs fall short for agentic systems:

- Reasoning and routing (which tool? which doc?) are opaque without explicit spans/events
- Failures often occur between components (e.g., retrieval mismatch, tool schema drift)
- Without traces across agents ⇄ tools ⇄ data stores, you can't reproduce or evaluate behavior

Microsoft has introduced multi-agent observability patterns and OpenTelemetry (OTel) conventions that unify traces across Agent Framework, Foundry, and popular stacks, so you can see one coherent timeline for each task.

Reference Architecture

Key Capabilities

- Agent orchestration & deployment via Microsoft Agent Framework
- Model access using Foundry's OpenAI-compatible endpoint
- OpenTelemetry for traces/spans + attributes (agent, tool, retrieval, latency, tokens)

Step-by-Step Implementation

Assumption: This article uses Azure Monitor (via Application Insights) as the OpenTelemetry exporter, but you can configure other supported exporters in the same way.
Prerequisites

- .NET 8 SDK or later
- Azure OpenAI service (endpoint, API key, deployed model)
- Application Insights and Grafana

1. Create an Agent with OpenTelemetry (ASP.NET Core or Console App)

Install required packages:

```shell
dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Monitor.OpenTelemetry.Exporter
dotnet add package Microsoft.Agents.AI.OpenAI
dotnet add package Microsoft.Extensions.Logging
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Trace
dotnet add package OpenTelemetry.Metrics
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.Http
```

Set up environment variables:

- AZURE_OPENAI_ENDPOINT: https://<your_service_name>.openai.azure.com/
- AZURE_OPENAI_API_KEY: <your_azure_openai_apikey>
- APPLICATIONINSIGHTS_CONNECTION_STRING: <your_application_insights_connectionstring_for_azuremonitor_exporter>

Configure tracing once at startup:

```csharp
var applicationInsightsConnectionString =
    Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");

// Create a resource describing the service
var resource = ResourceBuilder.CreateDefault()
    .AddService(serviceName: ServiceName)
    .AddAttributes(new Dictionary<string, object>
    {
        ["deployment.environment"] = "development",
        ["service.instance.id"] = Environment.MachineName
    })
    .Build();

// Setup OpenTelemetry TracerProvider
var traceProvider = Sdk.CreateTracerProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName))
    .AddSource(SourceName)
    .AddSource("Microsoft.Agents.AI")
    .AddHttpClientInstrumentation()
    .AddAzureMonitorTraceExporter(options =>
    {
        options.ConnectionString = applicationInsightsConnectionString;
    })
    .Build();

// Setup OpenTelemetry MeterProvider
var meterProvider = Sdk.CreateMeterProviderBuilder()
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName))
    .AddMeter(SourceName)
    .AddAzureMonitorMetricExporter(options =>
    {
        options.ConnectionString = applicationInsightsConnectionString;
    })
    .Build();
```

Configure dependency injection and logging:

```csharp
// Configure DI and OpenTelemetry
var serviceCollection = new ServiceCollection();

// Setup Logging with OpenTelemetry and Application Insights
serviceCollection.AddLogging(loggingBuilder =>
{
    loggingBuilder.SetMinimumLevel(LogLevel.Debug);
    loggingBuilder.AddOpenTelemetry(options =>
    {
        options.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService(ServiceName));
        options.IncludeScopes = true;
        options.IncludeFormattedMessage = true;
        options.AddAzureMonitorLogExporter(exporterOptions =>
        {
            exporterOptions.ConnectionString = applicationInsightsConnectionString;
        });
    });
    loggingBuilder.AddApplicationInsights(
        configureTelemetryConfiguration: (config) =>
        {
            config.ConnectionString = Environment.GetEnvironmentVariable("APPLICATIONINSIGHTS_CONNECTION_STRING");
        },
        configureApplicationInsightsLoggerOptions: options =>
        {
            options.TrackExceptionsAsExceptionTelemetry = true;
            options.IncludeScopes = true;
        });
});
```

Configure custom metrics and an activity source for tracing:

```csharp
using var activitySource = new ActivitySource(SourceName);
using var meter = new Meter(SourceName);

// Create custom metrics
var interactionCounter = meter.CreateCounter<long>(
    "chat_interactions_total",
    description: "Total number of chat interactions");
var responseTimeHistogram = meter.CreateHistogram<double>(
    "chat_response_time_ms",
    description: "Chat response time in milliseconds");
```
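The chat_response_time_ms histogram above is exported to the backend as pre-aggregated bucket counts rather than raw samples; percentiles are then estimated from those buckets. The math the backend performs can be sketched in a few lines of plain Python (illustrative only — the bucket boundaries here are arbitrary examples, not the OpenTelemetry .NET SDK's default boundaries):

```python
from bisect import bisect_left

# Hypothetical explicit bucket bounds, in milliseconds
BOUNDS = [50, 100, 250, 500, 1000, 2500]

def bucketize(samples):
    """Count samples into histogram buckets (last bucket = overflow)."""
    counts = [0] * (len(BOUNDS) + 1)
    for s in samples:
        counts[bisect_left(BOUNDS, s)] += 1
    return counts

def p95_upper_bound(counts):
    """Estimate p95 as the upper bound of the bucket holding the 95th percentile."""
    total = sum(counts)
    target, cum = 0.95 * total, 0
    for i, c in enumerate(counts):
        cum += c
        if cum >= target:
            return BOUNDS[i] if i < len(BOUNDS) else float("inf")

latencies = [80, 120, 130, 200, 220, 240, 300, 310, 450, 600]
counts = bucketize(latencies)
print(counts)                 # → [0, 1, 5, 3, 1, 0, 0]
print(p95_upper_bound(counts))  # → 1000
```

The takeaway for instrumentation: percentile accuracy is bounded by bucket granularity, so a histogram (not just an average) is the right instrument for response times.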
2. Wire up the AI agent:

```csharp
// Create OpenAI client
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT");
var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY");
var deploymentName = "gpt-4o-mini";

using var client = new AzureOpenAIClient(new Uri(endpoint), new AzureKeyCredential(apiKey))
    .GetChatClient(deploymentName)
    .AsIChatClient()
    .AsBuilder()
    .UseOpenTelemetry(sourceName: SourceName, configure: (cfg) => cfg.EnableSensitiveData = true)
    .Build();

logger.LogInformation("Creating Agent with OpenTelemetry instrumentation");

// Create AI Agent
var agent = new ChatClientAgent(
        client,
        name: "AgentObservabilityDemo",
        instructions: "You are a helpful assistant that provides concise and informative responses.")
    .AsBuilder()
    .UseOpenTelemetry(SourceName, configure: (cfg) => cfg.EnableSensitiveData = true)
    .Build();

var thread = agent.GetNewThread();
logger.LogInformation("Agent created successfully with ID: {AgentId}", agent.Id);
```

3. Instrument agent logic with semantic attributes and call the OpenAI-compatible API:

```csharp
// Create a parent span for the entire agent session
using var sessionActivity = activitySource.StartActivity("Agent Session");
Console.WriteLine($"Trace ID: {sessionActivity?.TraceId}");

var sessionId = Guid.NewGuid().ToString("N");
sessionActivity?
    .SetTag("agent.name", "AgentObservabilityDemo")
    .SetTag("session.id", sessionId)
    .SetTag("session.start_time", DateTimeOffset.UtcNow.ToString("O"));

logger.LogInformation("Starting agent session with ID: {SessionId}", sessionId);

using (logger.BeginScope(new Dictionary<string, object>
{
    ["SessionId"] = sessionId,
    ["AgentName"] = "AgentObservabilityDemo"
}))
{
    var interactionCount = 0;
    while (true)
    {
        Console.Write("You (or 'exit' to quit): ");
        var input = Console.ReadLine();
        if (string.IsNullOrWhiteSpace(input) || input.Equals("exit", StringComparison.OrdinalIgnoreCase))
        {
            logger.LogInformation("User requested to exit the session");
            break;
        }

        interactionCount++;
        logger.LogInformation("Processing interaction #{InteractionCount}", interactionCount);

        // Create a child span for each individual interaction
        using var activity = activitySource.StartActivity("Agent Interaction");
        activity?
            .SetTag("user.input", input)
            .SetTag("agent.name", "AgentObservabilityDemo")
            .SetTag("interaction.number", interactionCount);

        var stopwatch = Stopwatch.StartNew();
        try
        {
            logger.LogInformation("Starting agent execution for interaction #{InteractionCount}", interactionCount);
            var response = await agent.RunAsync(input);
            Console.WriteLine($"Agent: {response}");
            Console.WriteLine();

            stopwatch.Stop();
            var responseTimeMs = stopwatch.Elapsed.TotalMilliseconds;

            // Record metrics
            interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "success"));
            responseTimeHistogram.Record(responseTimeMs, new KeyValuePair<string, object?>("status", "success"));
            activity?.SetTag("interaction.status", "success");

            logger.LogInformation("Agent interaction #{InteractionNumber} completed successfully in {ResponseTime:F2} ms",
                interactionCount, responseTimeMs);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Error: {ex.Message}");
            Console.WriteLine();

            stopwatch.Stop();
            var responseTimeMs = stopwatch.Elapsed.TotalMilliseconds;

            // Record error metrics
            interactionCounter.Add(1, new KeyValuePair<string, object?>("status", "error"));
            responseTimeHistogram.Record(responseTimeMs, new KeyValuePair<string, object?>("status", "error"));
            activity?
                .SetTag("response.success", false)
                .SetTag("error.message", ex.Message)
                .SetStatus(ActivityStatusCode.Error, ex.Message);

            logger.LogError(ex, "Agent interaction #{InteractionNumber} failed after {ResponseTime:F2} ms: {ErrorMessage}",
                interactionCount, responseTimeMs, ex.Message);
        }
    }

    // Add session summary to the parent span
    sessionActivity?
        .SetTag("session.total_interactions", interactionCount)
        .SetTag("session.end_time", DateTimeOffset.UtcNow.ToString("O"));

    logger.LogInformation("Agent session completed. Total interactions: {TotalInteractions}", interactionCount);
}
```

Azure Monitor dashboard

Once you run the agent and generate some traffic, your dashboard in Azure Monitor will be populated as shown below. You can drill down to a specific service, activity source, or span by applying the relevant filters.

Key Features Demonstrated

- OpenTelemetry instrumentation with Microsoft Agent Framework
- Custom metrics for user interactions
- End-to-end telemetry correlation
- Real-time telemetry visualization along with metrics and logging interactions

Further reading

- Introducing Microsoft Agent Framework
- Azure AI Foundry docs
- OpenTelemetry
- Aspire Demo with Azure OpenAI

Announcing new security and observability features in Azure Database for PostgreSQL
Today we’re excited to announce new enterprise-grade security features available in Azure Database for PostgreSQL – Flexible Server, giving developers more control and peace of mind when developing secure applications in Azure.