modern apps
108 TopicsMonitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View
Part 2 of 2: In Blog 1, we deployed a multi-agent travel planner on Azure App Service using the Microsoft Agent Framework (MAF) 1.0 GA. This post dives deep into how we instrumented those agents with OpenTelemetry and lit up the brand-new Agents (Preview) view in Application Insights. π Prerequisite: This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first β you'll need a running App Service with the agents, Service Bus, Cosmos DB, and Azure OpenAI provisioned before the monitoring steps in this post will work. Deploying Agents Is Only Half the Battle In Blog 1, we walked through deploying a multi-agent travel planning application on Azure App Service. Six specialized agents β a Coordinator, Currency Converter, Weather Advisor, Local Knowledge Expert, Itinerary Planner, and Budget Optimizer β work together to generate comprehensive travel plans. The architecture uses an ASP.NET Core API backed by a WebJob for async processing, Azure Service Bus for messaging, and Azure OpenAI for the brains. But here's the thing: deploying agents to production is only half the battle. Once they're running, you need answers to questions like: Which agent is consuming the most tokens? How long does the Itinerary Planner take compared to the Weather Advisor? Is the Coordinator making too many LLM calls per workflow? When something goes wrong, which agent in the pipeline failed? Traditional APM gives you HTTP latencies and exception rates. That's table stakes. For AI agents, you need to see inside the agent β the model calls, the tool invocations, the token spend. And that's exactly what Application Insights' new Agents (Preview) view delivers, powered by OpenTelemetry and the GenAI semantic conventions. Let's break down how it all works. The Agents (Preview) View in Application Insights Azure Application Insights now includes a dedicated Agents (Preview) blade that provides unified monitoring purpose-built for AI agents. It's not just a generic dashboard β it understands agent concepts natively. Whether your agents are built with Microsoft Agent Framework, Azure AI Foundry, Copilot Studio, or a third-party framework, this view lights up as long as your telemetry follows the GenAI semantic conventions. Here's what you get out of the box: Agent dropdown filter β A dropdown populated by gen_ai.agent.name values from your telemetry. In our travel planner, this shows all six agents: "Travel Planning Coordinator", "Currency Conversion Specialist", "Weather & Packing Advisor", "Local Expert & Cultural Guide", "Itinerary Planning Expert", and "Budget Optimization Specialist". You can filter the entire dashboard to one agent or view them all. Token usage metrics β Visualizations of input and output token consumption, broken down by agent. Instantly see which agents are the most expensive to run. Operational metrics β Latency distributions, error rates, and throughput for each agent. Spot performance regressions before users notice. End-to-end transaction details β Click into any trace to see the full workflow: which agents were invoked, what tools they called, how long each step took. The "simple view" renders agent steps in a story-like format that's remarkably easy to follow. Grafana integration β One-click export to Azure Managed Grafana for custom dashboards and alerting. The key insight: this view isn't magic. It works because the telemetry is structured using well-defined semantic conventions. Let's look at those next. π Docs: Application Insights Agents (Preview) view documentation GenAI Semantic Conventions β The Foundation The entire Agents view is powered by the OpenTelemetry GenAI semantic conventions. These are a standardized set of span attributes that describe AI agent behavior in a way that any observability backend can understand. Think of them as the "contract" between your instrumented code and Application Insights. Let's walk through the key attributes and why each one matters: gen_ai.agent.name This is the human-readable name of the agent. In our travel planner, each agent sets this via the name parameter when constructing the MAF ChatClientAgent β for example, "Weather & Packing Advisor" or "Budget Optimization Specialist" . This is what populates the agent dropdown in the Agents view. Without this attribute, Application Insights would have no way to distinguish one agent from another in your telemetry. It's the single most important attribute for agent-level monitoring. gen_ai.agent.description A brief description of what the agent does. Our Weather Advisor, for example, is described as "Provides weather forecasts, packing recommendations, and activity suggestions based on destination weather conditions." This metadata helps operators and on-call engineers quickly understand an agent's role without diving into source code. It shows up in trace details and helps contextualize what you're looking at when debugging. gen_ai.agent.id A unique identifier for the agent instance. In MAF, this is typically an auto-generated GUID. While gen_ai.agent.name is the human-friendly label, gen_ai.agent.id is the machine-stable identifier. If you rename an agent, the ID stays the same, which is important for tracking agent behavior across code deployments. gen_ai.operation.name The type of operation being performed. Values include "chat" for standard LLM calls and "execute_tool" for tool/function invocations. In our travel planner, when the Weather Advisor calls the GetWeatherForecast function via NWS, or when the Currency Converter calls ConvertCurrency via the Frankfurter API, those tool calls get their own spans with gen_ai.operation.name = "execute_tool" . This lets you measure LLM think-time separately from tool execution time β a critical distinction for performance optimization. gen_ai.request.model / gen_ai.response.model The model used for the request and the model that actually served the response (these can differ when providers do model routing). In our case, both are "gpt-4o" since that's what we deploy via Azure OpenAI. These attributes let you track model usage across agents, spot unexpected model assignments, and correlate performance changes with model updates. gen_ai.usage.input_tokens / gen_ai.usage.output_tokens Token consumption per LLM call. This is what powers the token usage visualizations in the Agents view. The Coordinator agent, which aggregates results from all five specialist agents, tends to have higher output token counts because it's synthesizing a full travel plan. The Currency Converter, which makes focused API calls, uses fewer tokens overall. These attributes let you answer the question "which agent is costing me the most?" β and more importantly, let you set alerts when token usage spikes unexpectedly. gen_ai.system The AI system or provider. In our case, this is "openai" (set by the Azure OpenAI client instrumentation). If you're using multiple AI providers β say, Azure OpenAI for planning and a local model for classification β this attribute lets you filter and compare. Together, these attributes create a rich, structured view of agent behavior that goes far beyond generic tracing. They're the reason Application Insights can render agent-specific dashboards with token breakdowns, latency distributions, and end-to-end workflow views. Without these conventions, all you'd see is opaque HTTP calls to an OpenAI endpoint. π‘ Key takeaway: The GenAI semantic conventions are what transform generic distributed traces into agent-aware observability. They're the bridge between your code and the Agents view. Any framework that emits these attributes β MAF, Semantic Kernel, LangChain β can light up this dashboard. Two Layers of OpenTelemetry Instrumentation Our travel planner sample instruments at two distinct levels, each capturing different aspects of agent behavior. Let's look at both. Layer 1: IChatClient-Level Instrumentation The first layer instruments at the IChatClient level using Microsoft.Extensions.AI . This is where we wrap the Azure OpenAI chat client with OpenTelemetry: var client = new AzureOpenAIClient(azureOpenAIEndpoint, new DefaultAzureCredential()); // Wrap with OpenTelemetry to emit GenAI semantic convention spans return client.GetChatClient(modelDeploymentName).AsIChatClient() .AsBuilder() .UseOpenTelemetry() .Build(); This single .UseOpenTelemetry() call intercepts every LLM call and emits spans with: gen_ai.system β the AI provider (e.g., "openai" ) gen_ai.request.model / gen_ai.response.model β which model was used gen_ai.usage.input_tokens / gen_ai.usage.output_tokens β token consumption per call gen_ai.operation.name β the operation type ( "chat" ) Think of this as the "LLM layer" β it captures what the model is doing regardless of which agent called it. It's model-centric telemetry. Layer 2: Agent-Level Instrumentation The second layer instruments at the agent level using MAF 1.0 GA's built-in OpenTelemetry support. This happens in the BaseAgent class that all our agents inherit from: Agent = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description, tools: chatOptions.Tools?.ToList()) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName) .Build(); The .UseOpenTelemetry(sourceName: AgentName) call on the MAF agent builder emits a different set of spans: gen_ai.agent.name β the human-readable agent name (e.g., "Weather & Packing Advisor" ) gen_ai.agent.description β what the agent does gen_ai.agent.id β the unique agent identifier Agent invocation traces β spans that represent the full lifecycle of an agent call This is the "agent layer" β it captures which agent is doing the work and provides the identity information that powers the Agents view dropdown and per-agent filtering. Why Both Layers? When both layers are active, you get the richest possible telemetry. The agent-level spans nest around the LLM-level spans, creating a trace hierarchy that looks like: Agent: "Weather & Packing Advisor" (gen_ai.agent.name) βββ chat (gen_ai.operation.name) βββ model: gpt-4o, input_tokens: 450, output_tokens: 120 βββ execute_tool: GetWeatherForecast βββ chat (follow-up with tool results) βββ model: gpt-4o, input_tokens: 680, output_tokens: 350 There is a tradeoff: with both layers active, you may see some span duplication since both the IChatClient wrapper and the MAF agent wrapper emit spans for the same underlying LLM call. If you find the telemetry too noisy, you can disable one layer: Agent layer only (remove .UseOpenTelemetry() from the IChatClient ) β You get agent identity but lose per-call token breakdowns. IChatClient layer only (remove .UseOpenTelemetry() from the agent builder) β You get detailed LLM metrics but lose agent identity in the Agents view. For the fullest experience with the Agents (Preview) view, we recommend keeping both layers active. The official sample uses both, and the Agents view is designed to handle the overlapping spans gracefully. π Docs: MAF Observability Guide Exporting Telemetry to Application Insights Emitting OpenTelemetry spans is only useful if they land somewhere you can query them. The good news is that Azure App Service and Application Insights have deep native integration β App Service can auto-instrument your app, forward platform logs, and surface health metrics out of the box. For a full overview of monitoring capabilities, see Monitor Azure App Service. For our AI agent scenario, we go beyond the built-in platform telemetry. We need the GenAI semantic convention spans that we configured in the previous sections to flow into App Insights so the Agents (Preview) view can render them. Our travel planner has two host processes β the ASP.NET Core API and a WebJob β and each requires a slightly different exporter setup. ASP.NET Core API β Azure Monitor OpenTelemetry Distro For the API, it's a single line. The Azure Monitor OpenTelemetry Distro handles everything: // Configure OpenTelemetry with Azure Monitor for traces, metrics, and logs. // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered. builder.Services.AddOpenTelemetry().UseAzureMonitor(); That's it. The distro automatically: Discovers the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable Configures trace, metric, and log exporters to Application Insights Sets up appropriate sampling and batching Registers standard ASP.NET Core HTTP instrumentation This is the recommended approach for any ASP.NET Core application. One NuGet package ( Azure.Monitor.OpenTelemetry.AspNetCore ), one line of code, zero configuration files. WebJob β Manual Exporter Setup The WebJob is a non-ASP.NET Core host (it uses Host.CreateApplicationBuilder ), so the distro's convenience method isn't available. Instead, we configure the exporters explicitly: // Configure OpenTelemetry with Azure Monitor for the WebJob (non-ASP.NET Core host). // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered. builder.Services.AddOpenTelemetry() .ConfigureResource(r => r.AddService("TravelPlanner.WebJob")) .WithTracing(t => t .AddSource("*") .AddAzureMonitorTraceExporter()) .WithMetrics(m => m .AddMeter("*") .AddAzureMonitorMetricExporter()); builder.Logging.AddOpenTelemetry(o => o.AddAzureMonitorLogExporter()); A few things to note: .AddSource("*") β Subscribes to all trace sources, including the ones emitted by MAF's .UseOpenTelemetry(sourceName: AgentName) . In production, you might narrow this to specific source names for performance. .AddMeter("*") β Similarly captures all metrics, including the GenAI metrics emitted by the instrumentation layers. .ConfigureResource(r => r.AddService("TravelPlanner.WebJob")) β Tags all telemetry with the service name so you can distinguish API vs. WebJob telemetry in Application Insights. The connection string is still auto-discovered from the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable β no need to pass it explicitly. The key difference between these two approaches is ceremony, not capability. Both send the same GenAI spans to Application Insights; the Agents view works identically regardless of which exporter setup you use. π Docs: Azure Monitor OpenTelemetry Distro Infrastructure as Code β Provisioning the Monitoring Stack The monitoring infrastructure is provisioned via Bicep modules alongside the rest of the application's Azure resources. Here's how it fits together. Log Analytics Workspace infra/core/monitor/loganalytics.bicep creates the Log Analytics workspace that backs Application Insights: resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = { name: name location: location tags: tags properties: { sku: { name: 'PerGB2018' } retentionInDays: 30 } } Application Insights infra/core/monitor/appinsights.bicep creates a workspace-based Application Insights resource connected to Log Analytics: resource appInsights 'Microsoft.Insights/components@2020-02-02' = { name: name location: location tags: tags kind: 'web' properties: { Application_Type: 'web' WorkspaceResourceId: logAnalyticsWorkspaceId } } output connectionString string = appInsights.properties.ConnectionString Wiring It All Together In infra/main.bicep , the Application Insights connection string is passed as an app setting to the App Service: appSettings: { APPLICATIONINSIGHTS_CONNECTION_STRING: appInsights.outputs.connectionString // ... other app settings } This is the critical glue: when the app starts, the OpenTelemetry distro (or manual exporters) auto-discover this environment variable and start sending telemetry to your Application Insights resource. No connection strings in code, no configuration files β it's all infrastructure-driven. The same connection string is available to both the API and the WebJob since they run on the same App Service. All agent telemetry from both host processes flows into a single Application Insights resource, giving you a unified view across the entire application. See It in Action Once the application is deployed and processing travel plan requests, here's how to explore the agent telemetry in Application Insights. Step 1: Open the Agents (Preview) View In the Azure portal, navigate to your Application Insights resource. In the left nav, look for Agents (Preview) under the Investigations section. This opens the unified agent monitoring dashboard. Step 2: Filter by Agent The agent dropdown at the top of the page is populated by the gen_ai.agent.name values in your telemetry. You'll see all six agents listed: Travel Planning Coordinator Currency Conversion Specialist Weather & Packing Advisor Local Expert & Cultural Guide Itinerary Planning Expert Budget Optimization Specialist Select a specific agent to filter the entire dashboard β token usage, latency, error rate β down to that one agent. Step 3: Review Token Usage The token usage tile shows total input and output token consumption over your selected time range. Compare agents to find your biggest spenders. In our testing, the Coordinator agent consistently uses the most output tokens because it aggregates and synthesizes results from all five specialists. Step 4: Drill into Traces Click "View Traces with Agent Runs" to see all agent executions. Each row represents a workflow run. You can filter by time range, status (success/failure), and specific agent. Step 5: End-to-End Transaction Details Click any trace to open the end-to-end transaction details. The "simple view" renders the agent workflow as a story β showing each step, which agent handled it, how long it took, and what tools were called. For a full travel plan, you'll see the Coordinator dispatch work to each specialist, tool calls to the NWS weather API and Frankfurter currency API, and the final aggregation step. Grafana Dashboards The Agents (Preview) view in Application Insights is great for ad-hoc investigation. For ongoing monitoring and alerting, Azure Managed Grafana provides prebuilt dashboards specifically designed for agent workloads. From the Agents view, click "Explore in Grafana" to jump directly into these dashboards: Agent Framework Dashboard β Per-agent metrics including token usage trends, latency percentiles, error rates, and throughput over time. Pin this to your operations wall. Agent Framework Workflow Dashboard β Workflow-level metrics showing how multi-agent orchestrations perform end-to-end. See how long complete travel plans take, identify bottleneck agents, and track success rates. These dashboards query the same underlying data in Log Analytics, so there's zero additional instrumentation needed. If your telemetry lights up the Agents view, it lights up Grafana too. Key Packages Summary Here are the NuGet packages that make this work, pulled from the actual project files: Package Version Purpose Azure.Monitor.OpenTelemetry.AspNetCore 1.3.0 Azure Monitor OTEL Distro for ASP.NET Core (API). One-line setup for traces, metrics, and logs. Azure.Monitor.OpenTelemetry.Exporter 1.3.0 Azure Monitor OTEL exporter for non-ASP.NET Core hosts (WebJob). Trace, metric, and log exporters. Microsoft.Agents.AI 1.0.0 MAF 1.0 GA β ChatClientAgent , .UseOpenTelemetry() for agent-level instrumentation. Microsoft.Extensions.AI 10.4.1 IChatClient abstraction with .UseOpenTelemetry() for LLM-level instrumentation. OpenTelemetry.Extensions.Hosting 1.11.2 OTEL dependency injection integration for Host.CreateApplicationBuilder (WebJob). Microsoft.Extensions.AI.OpenAI 10.4.1 OpenAI/Azure OpenAI adapter for IChatClient . Bridges the Azure OpenAI SDK to the M.E.AI abstraction. Wrapping Up Let's zoom out. In this two-part series, we've gone from zero to a fully observable, production-grade multi-agent AI application on Azure App Service: Blog 1 covered deploying the multi-agent travel planner with MAF 1.0 GA β the agents, the architecture, the infrastructure. Blog 2 (this post) showed how to instrument those agents with OpenTelemetry, explained the GenAI semantic conventions that make agent-aware monitoring possible, and walked through the new Agents (Preview) view in Application Insights. The pattern is straightforward: Add .UseOpenTelemetry() at the IChatClient level for LLM metrics. Add .UseOpenTelemetry(sourceName: AgentName) at the MAF agent level for agent identity. Export to Application Insights via the Azure Monitor distro (one line) or manual exporters. Wire the connection string through Bicep and environment variables. Open the Agents (Preview) view and start monitoring. With MAF 1.0 GA's built-in OpenTelemetry support and Application Insights' new Agents view, you get production-grade observability for AI agents with minimal code. The GenAI semantic conventions ensure your telemetry is structured, portable, and understood by any compliant backend. And because it's all standard OpenTelemetry, you're not locked into any single vendor β swap the exporter and your telemetry goes to Jaeger, Grafana, Datadog, or wherever you need it. Now go see what your agents are up to. Resources Sample repository: seligj95/app-service-multi-agent-maf-otel App Insights Agents (Preview) view: Documentation GenAI Semantic Conventions: OpenTelemetry GenAI Registry MAF Observability Guide: Microsoft Agent Framework Observability Azure Monitor OpenTelemetry Distro: Enable OpenTelemetry for .NET Grafana Agent Framework Dashboard: aka.ms/amg/dash/af-agent Grafana Workflow Dashboard: aka.ms/amg/dash/af-workflow Blog 1: Deploy Multi-Agent AI Apps on Azure App Service with MAF 1.0 GA99Views0likes0CommentsBuild Multi-Agent AI Apps on Azure App Service with Microsoft Agent Framework 1.0
A couple of months ago, we published a three-part series showing how to build multi-agent AI systems on Azure App Service using preview packages from the Microsoft Agent Framework (MAF) (formerly AutoGen / Semantic Kernel Agents). The series walked through async processing, the request-reply pattern, and client-side multi-agent orchestration β all running on App Service. Since then, Microsoft Agent Framework has reached 1.0 GA β unifying AutoGen and Semantic Kernel into a single, production-ready agent platform. This post is a fresh start with the GA bits. We'll rebuild our travel-planner sample on the stable API surface, call out the breaking changes from preview, and get you up and running fast. All of the code is in the companion repo: seligj95/app-service-multi-agent-maf-otel. What Changed in MAF 1.0 GA The 1.0 release is more than a version bump. Here's what moved: Unified platform. AutoGen and Semantic Kernel agent capabilities have converged into Microsoft.Agents.AI . One package, one API surface. Stable APIs with long-term support. The 1.0 contract is now locked for servicing. No more preview churn. Breaking change β Instructions on options removed. In preview, you set instructions through ChatClientAgentOptions.Instructions . In GA, pass them directly to the ChatClientAgent constructor. Breaking change β RunAsync parameter rename. The thread parameter is now session (type AgentSession ). If you were using named arguments, this is a compile error. Microsoft.Extensions.AI upgraded. The framework moved from the 9.x preview of Microsoft.Extensions.AI to the stable 10.4.1 release. OpenTelemetry integration built in. The builder pipeline now includes UseOpenTelemetry() out of the box β more on that in Blog 2. Our project references reflect the GA stack: <PackageReference Include="Microsoft.Agents.AI" Version="1.0.0" /> <PackageReference Include="Microsoft.Extensions.AI" Version="10.4.1" /> <PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" /> Why Azure App Service for AI Agents? If you're building with Microsoft Agent Framework, you need somewhere to run your agents. You could reach for Kubernetes, containers, or serverless β but for most agent workloads, Azure App Service is the sweet spot. Here's why: No infrastructure management β App Service is fully managed. No clusters to configure, no container orchestration to learn. Deploy your .NET or Python agent code and it just runs. Always On β Agent workflows can take minutes. App Service's Always On feature (on Premium tiers) ensures your background workers never go cold, so agents are ready to process requests instantly. WebJobs for background processing β Long-running agent workflows don't belong in HTTP request handlers. App Service's built-in WebJob support gives you a dedicated background worker that shares the same deployment, configuration, and managed identity β no separate compute resource needed. Managed Identity everywhere β Zero secrets in your code. App Service's system-assigned managed identity authenticates to Azure OpenAI, Service Bus, Cosmos DB, and Application Insights automatically. No connection strings, no API keys, no rotation headaches. Built-in observability β Native integration with Application Insights and OpenTelemetry means you can see exactly what your agents are doing in production (more on this in Part 2). Enterprise-ready β VNet integration, deployment slots for safe rollouts, custom domains, auto-scaling rules, and built-in authentication. All the things you'll need when your agent POC becomes a production service. Cost-effective β A single P0v4 instance (~$75/month) hosts both your API and WebJob worker. Compare that to running separate container apps or a Kubernetes cluster for the same workload. The bottom line: App Service lets you focus on building your agents, not managing infrastructure. And since MAF supports both .NET and Python β both first-class citizens on App Service β you're covered regardless of your language preference. Architecture Overview The sample is a travel planner that coordinates six specialized agents to build a personalized trip itinerary. Users fill out a form (destination, dates, budget, interests), and the system returns a comprehensive travel plan complete with weather forecasts, currency advice, a day-by-day itinerary, and a budget breakdown. The Six Agents Currency Converter β calls the Frankfurter API for real-time exchange rates Weather Advisor β calls the National Weather Service API for forecasts and packing tips Local Knowledge Expert β cultural insights, customs, and hidden gems Itinerary Planner β day-by-day scheduling with timing and costs Budget Optimizer β allocates spend across categories and suggests savings Coordinator β assembles everything into a polished final plan Four-Phase Workflow Phase Agents Execution 1 β Parallel Gathering Currency, Weather, Local Knowledge Task.WhenAll 2 β Itinerary Itinerary Planner Sequential (uses Phase 1 context) 3 β Budget Budget Optimizer Sequential (uses Phase 2 output) 4 β Assembly Coordinator Final synthesis Infrastructure Azure App Service (P0v4) β hosts the API and a continuous WebJob for background processing Azure Service Bus β decouples the API from heavy AI work (async request-reply) Azure Cosmos DB β stores task state, results, and per-agent chat histories (24-hour TTL) Azure OpenAI (GPT-4o) β powers all agent LLM calls Application Insights + Log Analytics β monitoring and diagnostics ChatClientAgent Deep Dive At the core of every agent is ChatClientAgent from Microsoft.Agents.AI . It wraps an IChatClient (from Microsoft.Extensions.AI ) with instructions, a name, a description, and optionally a set of tools. This is client-side orchestration β you control the chat history, lifecycle, and execution order. No server-side Foundry agent resources are created. Here's the BaseAgent pattern used by all six agents in the sample: // BaseAgent.cs β constructor for agents with tools Agent = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description, tools: chatOptions.Tools?.ToList()) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName) .Build(); Notice the builder pipeline: .AsBuilder().UseOpenTelemetry(...).Build() . This opts every agent into the framework's built-in OpenTelemetry instrumentation with a single line. We'll explore what that telemetry looks like in Blog 2. Invoking an agent is equally straightforward: // BaseAgent.cs β InvokeAsync public async Task<ChatMessage> InvokeAsync( IList<ChatMessage> chatHistory, CancellationToken cancellationToken = default) { var response = await Agent.RunAsync( chatHistory, session: null, options: null, cancellationToken); return response.Messages.LastOrDefault() ?? new ChatMessage(ChatRole.Assistant, "No response generated."); } Key things to note: session: null β this is the renamed parameter (was thread in preview). We pass null because we manage chat history ourselves. The agent receives the full chatHistory list, so context accumulates across turns. Simple agents (Local Knowledge, Itinerary Planner, Budget Optimizer, Coordinator) use the tool-less constructor; agents that call external APIs (Currency, Weather) use the constructor that accepts ChatOptions with tools. Tool Integration Two of our agents β Weather Advisor and Currency Converter β call real external APIs through the MAF tool-calling pipeline. Tools are registered using AIFunctionFactory.Create() from Microsoft.Extensions.AI . Here's how the WeatherAdvisorAgent wires up its tool: // WeatherAdvisorAgent.cs private static ChatOptions CreateChatOptions( IWeatherService weatherService, ILogger logger) { var chatOptions = new ChatOptions { Tools = new List<AITool> { AIFunctionFactory.Create( GetWeatherForecastFunction(weatherService, logger)) } }; return chatOptions; } GetWeatherForecastFunction returns a Func<double, double, int, Task<string>> that the model can call with latitude, longitude, and number of days. Under the hood, it hits the National Weather Service API and returns a formatted forecast string. The Currency Converter follows the same pattern with the Frankfurter API. This is one of the nicest parts of the GA API: you write a plain C# method, wrap it with AIFunctionFactory.Create() , and the framework handles the JSON schema generation, function-call parsing, and response routing automatically. Multi-Phase Workflow Orchestration The TravelPlanningWorkflow class coordinates all six agents. The key insight is that the orchestration is just C# code β no YAML, no graph DSL, no special runtime. You decide when agents run, what context they receive, and how results flow between phases. // Phase 1: Parallel Information Gathering var gatheringTasks = new[] { GatherCurrencyInfoAsync(request, state, progress, cancellationToken), GatherWeatherInfoAsync(request, state, progress, cancellationToken), GatherLocalKnowledgeAsync(request, state, progress, cancellationToken) }; await Task.WhenAll(gatheringTasks); After Phase 1 completes, results are stored in a WorkflowState object β a simple dictionary-backed container that holds per-agent chat histories and contextual data: // WorkflowState.cs public Dictionary<string, object> Context { get; set; } = new(); public Dictionary<string, List<ChatMessage>> AgentChatHistories { get; set; } = new(); Phases 2β4 run sequentially, each pulling context from the previous phase. For example, the Itinerary Planner receives weather and local knowledge gathered in Phase 1: var localKnowledge = state.GetFromContext<string>("LocalKnowledge") ?? ""; var weatherAdvice = state.GetFromContext<string>("WeatherAdvice") ?? ""; var itineraryChatHistory = state.GetChatHistory("ItineraryPlanner"); itineraryChatHistory.Add(new ChatMessage(ChatRole.User, $"Create a detailed {days}-day itinerary for {request.Destination}..." + $"\n\nWEATHER INFORMATION:\n{weatherAdvice}" + $"\n\nLOCAL KNOWLEDGE & TIPS:\n{localKnowledge}")); var itineraryResponse = await _itineraryAgent.InvokeAsync( itineraryChatHistory, cancellationToken); This pattern β parallel fan-out followed by sequential context enrichment β is simple, testable, and easy to extend. Need a seventh agent? Add it to the appropriate phase and wire it into WorkflowState . Async Request-Reply Pattern A multi-agent workflow with six LLM calls (some with tool invocations) can easily run 30β60 seconds. That's well beyond typical HTTP timeout expectations and not a great user experience for a synchronous request. We use the Async Request-Reply pattern to handle this: The API receives the travel plan request and immediately queues a message to Service Bus. It stores an initial task record in Cosmos DB with status queued and returns a taskId to the client. A continuous WebJob (running as a separate process on the same App Service plan) picks up the message, executes the full multi-agent workflow, and writes the result back to Cosmos DB. The client polls the API for status updates until the task reaches completed . This pattern keeps the API responsive, makes the heavy work retriable (Service Bus handles retries and dead-lettering), and lets the WebJob run independently β you can restart it without affecting the API. We covered this pattern in detail in the previous series, so we won't repeat the plumbing here. Deploy with azd The repo is wired up with the Azure Developer CLI for one-command provisioning and deployment: git clone https://github.com/seligj95/app-service-multi-agent-maf-otel.git cd app-service-multi-agent-maf-otel azd auth login azd up azd up provisions the following resources via Bicep: Azure App Service (P0v4 Windows) with a continuous WebJob Azure Service Bus namespace and queue Azure Cosmos DB account, database, and containers Azure AI Services (Azure OpenAI with GPT-4o deployment) Application Insights and Log Analytics workspace Managed Identity with all necessary role assignments After deployment completes, azd outputs the App Service URL. Open it in your browser, fill in the travel form, and watch six agents collaborate on your trip plan in real time. What's Next We now have a production-ready multi-agent app running on App Service with the GA Microsoft Agent Framework. But how do you actually observe what these agents are doing? When six agents are making LLM calls, invoking tools, and passing context between phases β you need visibility into every step. In the next post, we'll dive deep into how we instrumented these agents with OpenTelemetry and the new Agents (Preview) view in Application Insights β giving you full visibility into agent runs, token usage, tool calls, and model performance. You already saw the .UseOpenTelemetry() call in the builder pipeline; Blog 2 shows what that telemetry looks like end to end and how to light up the new Agents experience in the Azure portal. Stay tuned! Resources Sample repo β app-service-multi-agent-maf-otel Microsoft Agent Framework 1.0 GA Announcement Microsoft Agent Framework Documentation Previous Series β Part 3: Client-Side Multi-Agent Orchestration on App Service Microsoft.Extensions.AI Documentation Azure App Service Documentation173Views0likes0CommentsBuild and Host MCP Apps on Azure App Service
MCP Apps are here, and they're a game-changer for building AI tools with interactive UIs. If you've been following the Model Context Protocol (MCP) ecosystem, you've probably heard about the MCP Apps spec β the first official MCP extension that lets your tools return rich, interactive UIs that render directly inside AI chat clients like Claude Desktop, ChatGPT, VS Code Copilot, Goose, and Postman. And here's the best part: you can host them on Azure App Service. In this post, I'll walk you through building a weather widget MCP App and deploying it to App Service. You'll have a production-ready MCP server serving interactive UIs in under 10 minutes. What Are MCP Apps? MCP Apps extend the Model Context Protocol by combining tools (the functions your AI client can call) with UI resources (the interactive interfaces that display the results). The pattern is simple: A tool declares a _meta.ui.resourceUri in its metadata When the tool is invoked, the MCP host fetches that UI resource The UI renders in a sandboxed iframe inside the chat client The key insight? MCP Apps are just web apps β HTML, JavaScript, and CSS served through MCP. And that's exactly what App Service does best. The MCP Apps spec supports cross-client rendering, so the same UI works in Claude Desktop, VS Code Copilot, ChatGPT, and other MCP-enabled clients. Your weather widget, map viewer, or data dashboard becomes a universal component in the AI ecosystem. Why App Service for MCP Apps? Azure App Service is a natural fit for hosting MCP Apps. Here's why: Always On β No cold starts. Your UI resources are served instantly, every time. Easy Auth β Secure your MCP endpoint with Entra ID authentication out of the box, no code required. Custom domains + TLS β Professional MCP server endpoints with your own domain and managed certificates. Deployment slots β Canary and staged rollouts for MCP App updates without downtime. Sidecars β Run backend services (Redis, message queues, monitoring agents) alongside your MCP server. App Insights β Built-in telemetry to see which tools and UIs are being invoked, response times, and error rates. Now, these are all capabilities you can add to a production MCP App, but the sample we're building today keeps things simple. We're focusing on the core pattern: serving MCP tools with interactive UIs from App Service. The production features are there when you need them. When to Use Functions vs App Service for MCP Apps Before we dive into the code, let's talk about Azure Functions. The Functions team has done great work with their MCP Apps quickstart, and if serverless is your preferred model, that's a fantastic option. Functions and App Service both host MCP Apps beautifully β they just serve different needs. Azure Functions Azure App Service Best for New, purpose-built MCP Apps that benefit from serverless scaling MCP Apps that need always-on hosting, persistent state, or are part of larger web apps Scaling Scale to zero, pay per invocation Dedicated plans, always running Cold start Possible (mitigated by premium plan) None (Always On) Deployment azd up with Functions template azd up with App Service template MCP Apps quickstart Available This blog post! Additional capabilities Event-driven triggers, durable functions Easy Auth, custom domains, deployment slots, sidecars Think of it this way: if you're building a new MCP App from scratch and want serverless economics, go with Functions. If you're adding MCP capabilities to an existing web app, need zero cold starts, or want production features like Easy Auth and deployment slots, App Service is your friend. Build the Weather Widget MCP App Let's build a simple MCP App that fetches weather data from the Open-Meteo API and displays it in an interactive widget. The sample uses ASP.NET Core for the MCP server and Vite for the frontend UI. Here's the structure: app-service-mcp-app-sample/ βββ src/ β βββ Program.cs # MCP server setup β βββ WeatherTool.cs # Weather tool with UI metadata β βββ WeatherUIResource.cs # MCP resource serving the UI β βββ WeatherService.cs # Open-Meteo API integration β βββ app/ # Vite frontend (weather widget) β βββ src/ β βββ weather-app.ts # MCP Apps SDK integration βββ .vscode/ β βββ mcp.json # VS Code MCP server config βββ azure.yaml # Azure Developer CLI config βββ infra/ # Bicep infrastructure Program.cs β MCP Server Setup The MCP server is an ASP.NET Core app that registers tools and UI resources: using ModelContextProtocol; var builder = WebApplication.CreateBuilder(args); // Register WeatherService builder.Services.AddSingleton<WeatherService>(sp => new WeatherService(WeatherService.CreateDefaultClient())); // Add MCP Server with HTTP transport, tools, and resources builder.Services.AddMcpServer() .WithHttpTransport(t => t.Stateless = true) .WithTools<WeatherTool>() .WithResources<WeatherUIResource>(); var app = builder.Build(); // Map MCP endpoints (no auth required for this sample) app.MapMcp("/mcp").AllowAnonymous(); app.Run(); AddMcpServer() configures the MCP protocol handler. WithHttpTransport() enables Streamable HTTP with stateless mode (no session management needed). WithTools<WeatherTool>() registers our weather tool, and WithResources<WeatherUIResource>() registers the UI resource that the MCP host will fetch and render. MapMcp("/mcp") maps the MCP endpoint at /mcp . WeatherTool.cs β Tool with UI Metadata The WeatherTool class defines the tool and uses the [McpMeta] attribute to declare a ui metadata block containing the resourceUri . This tells the MCP host where to fetch the interactive UI: using System.ComponentModel; using ModelContextProtocol.Server; [McpServerToolType] public class WeatherTool { private readonly WeatherService _weatherService; public WeatherTool(WeatherService weatherService) { _weatherService = weatherService; } [McpServerTool] [Description("Get current weather for a location via Open-Meteo. Returns weather data that displays in an interactive widget.")] [McpMeta("ui", JsonValue = """{"resourceUri": "ui://weather/index.html"}""")] public async Task<object> GetWeather( [Description("City name to check weather for (e.g., Seattle, New York, Miami)")] string location) { var result = await _weatherService.GetCurrentWeatherAsync(location); return result; } } The key line is the [McpMeta("ui", ...)] attribute. This adds _meta.ui.resourceUri to the tool definition, pointing to the ui://weather/index.html resource. When the AI client calls this tool, the host fetches that resource and renders it in a sandboxed iframe alongside the tool result. WeatherUIResource.cs β UI Resource The UI resource class serves the bundled HTML as an MCP resource with the ui:// scheme and text/html;profile=mcp-app MIME type required by the MCP Apps spec: using ModelContextProtocol.Protocol; using ModelContextProtocol.Server; [McpServerResourceType] public class WeatherUIResource { [McpServerResource( UriTemplate = "ui://weather/index.html", Name = "weather_ui", MimeType = "text/html;profile=mcp-app")] public static ResourceContents GetWeatherUI() { var filePath = Path.Combine( AppContext.BaseDirectory, "app", "dist", "index.html"); var html = File.ReadAllText(filePath); return new TextResourceContents { Uri = "ui://weather/index.html", MimeType = "text/html;profile=mcp-app", Text = html }; } } The [McpServerResource] attribute registers this method as the handler for the ui://weather/index.html resource. When the host fetches it, the bundled single-file HTML (built by Vite) is returned with the correct MIME type. WeatherService.cs β Open-Meteo API Integration The WeatherService class handles geocoding and weather data from the Open-Meteo API. Nothing MCP-specific here β it's just a standard HTTP client that geocodes a city name and fetches current weather observations. The UI Resource (Vite Frontend) The app/ directory contains a TypeScript app built with Vite that renders the weather widget. It uses the @modelcontextprotocol/ext-apps SDK to communicate with the host: import { App } from "@modelcontextprotocol/ext-apps"; const app = new App({ name: "Weather Widget", version: "1.0.0" }); // Handle tool results from the server app.ontoolresult = (params) => { const data = parseToolResultContent(params.content); if (data) render(data); }; // Adapt to host theme (light/dark) app.onhostcontextchanged = (ctx) => { if (ctx.theme) applyTheme(ctx.theme); }; await app.connect(); The SDK's App class handles the postMessage communication with the host. When the tool returns weather data, ontoolresult fires and the widget renders the temperature, conditions, humidity, and wind. The app also adapts to the host's theme so it looks native in both light and dark mode. The frontend is bundled into a single index.html file using Vite and the vite-plugin-singlefile plugin, which inlines all JavaScript and CSS. This makes it easy to serve as a single MCP resource. Run Locally To run the sample locally, you'll need the .NET 9 SDK and Node.js 18+ installed. Clone the repo and run: # Clone the repo git clone https://github.com/seligj95/app-service-mcp-app-sample.git cd app-service-mcp-app-sample # Build the frontend cd src/app npm install npm run build # Run the MCP server cd .. dotnet run The server starts on http://localhost:5000 . Now connect from VS Code Copilot: Open your workspace in VS Code The sample includes a .vscode/mcp.json that configures the local MCP server: { "servers": { "local-mcp-appservice": { "type": "http", "url": "http://localhost:5000/mcp" } } } Open the GitHub Copilot Chat panel Ask: "What's the weather in Seattle?" Copilot will invoke the GetWeather tool, and the interactive weather widget will render inline in the chat: Weather widget MCP App rendering inline in VS Code Copilot Chat Deploy to Azure Deploying to Azure is even easier. The sample includes an azure.yaml file and Bicep templates for App Service, so you can deploy with a single command: cd app-service-mcp-app-sample azd auth login azd up azd up will: Provision an App Service plan and web app in your subscription Build the .NET app and Vite frontend Deploy the app to App Service Output the public MCP endpoint URL After deployment, azd will output a URL like https://app-abc123.azurewebsites.net . Update your .vscode/mcp.json to point to the remote server: { "servers": { "remote-weather-app": { "type": "http", "url": "https://app-abc123.azurewebsites.net/mcp" } } } From that point forward, your MCP App is live. Any AI client that supports MCP Apps can invoke your weather tool and render the interactive widget β no local server required. What's Next? You've now built and deployed an MCP App to Azure App Service. Here's what you can explore next: Read the MCP Apps spec to understand the full capabilities of the extension, including input forms, persistent state, and multi-step workflows. Check out the ext-apps examples on GitHub β there are samples for map viewers, PDF renderers, system monitors, and more. Try the Azure Functions MCP Apps quickstart if you want to build a serverless MCP App. Learn about hosting remote MCP servers in App Service for more patterns and best practices. Clone the sample repo and customize it for your own use cases. And remember: App Service gives you a full production hosting platform for your MCP Apps. You can add Easy Auth to secure your endpoints with Entra ID, wire up App Insights for telemetry, configure custom domains and TLS certificates, and set up deployment slots for blue/green rollouts. These features make App Service a great choice when you're ready to take your MCP App to production. If you build something cool with MCP Apps and App Service, let me know β I'd love to see what you create!173Views0likes0CommentsTake Control of Every Message: Partial Failure Handling for Service Bus Triggers in Azure Functions
The Problem: All-or-Nothing Batch Processing in Azure Service Bus Azure Service Bus is one of the most widely used messaging services for building event-driven applications on Azure. When you use Azure Functions with a Service Bus trigger in batch mode, your function receives multiple messages at once for efficient, high-throughput processing. But what happens when one message in the batch fails? Your function receives a batch of 50 Service Bus messages. 49 process perfectly. 1 fails. What happens? In the default model, the entire batch fails. All 50 messages go back on the queue and get reprocessed, including the 49 that already succeeded. This leads to: Duplicate processing β messages that were already handled successfully get processed again Wasted compute β you pay for re-executing work that already completed Infinite retry loops β if that one "poison" message keeps failing, it blocks the entire batch indefinitely Idempotency burden β your downstream systems must handle duplicates gracefully, adding complexity to every consumer This is the classic all-or-nothing batch failure problem. Azure Functions solves it with per-message settlement. The Solution: Per-Message Settlement for Azure Service Bus Azure Functions gives you direct control over how each individual message is settled in real time, as you process it. Instead of treating the batch as all-or-nothing, you settle each message independently based on its processing outcome. With Service Bus message settlement actions in Azure Functions, you can: Action What It Does Complete Remove the message from the queue (successfully processed) Abandon Release the lock so the message returns to the queue for retry, optionally modifying application properties Dead-letter Move the message to the dead-letter queue (poison message handling) Defer Keep the message in the queue but make it only retrievable by sequence number This means in a batch of 50 messages, you can: Complete 47 that processed successfully Abandon 2 that hit a transient error (with updated retry metadata) Dead-letter 1 that is malformed and will never succeed All in a single function invocation. No reprocessing of successful messages. No building failure response objects. No all-or-nothing. Why This Matters 1. Eliminates Duplicate Processing When you complete messages individually, successfully processed messages are immediately removed from the queue. There's no chance of them being redelivered, even if other messages in the same batch fail. 2. Enables Granular Error Handling Different failures deserve different treatments. A malformed message should be dead-lettered immediately. A message that failed due to a transient database timeout should be abandoned for retry. A message that requires manual intervention should be deferred. Per-message settlement gives you this granularity. 3. Implements Exponential Backoff Without External Infrastructure By combining abandon with modified application properties, you can track retry counts per message and implement exponential backoff patterns directly in your function code, no additional queues or Durable Functions required. 4. Reduces Cost You stop paying for redundant re-execution of already-successful work. In high-throughput systems processing millions of messages, this can be a material cost reduction. 5. Simplifies Idempotency Requirements When successful messages are never redelivered, your downstream systems don't need to guard against duplicates as aggressively. This reduces architectural complexity and potential for bugs. Before: One Message = One Function Invocation Before batch support, there was no cardinality option, Azure Functions processed each Service Bus message as a separate function invocation. If your queue had 50 messages, the runtime spun up 50 individual executions. Single-Message Processing (The Old Way) import { app, InvocationContext } from '@azure/functions'; async function processOrder( message: unknown, // β One message at a time, no batch context: InvocationContext ): Promise<void> { try { const order = message as Order; await processOrder(order); } catch (error) { context.error('Failed to process message:', error); // Message auto-complete by default. throw error; } } app.serviceBusQueue('processOrder', { connection: 'ServiceBusConnection', queueName: 'orders-queue', handler: processOrder, }); What this cost you: 50 messages on the queue Old (single-message) New (batch + settlement) Function invocations 50 separate invocations 1 invocation Connection overhead 50 separate DB/API connections 1 connection, reused across batch Compute cost 50Γ invocation overhead 1Γ invocation overhead Settlement control Binary: throw or don't 4 actions per message Every message paid the full price of a function invocation, startup, connection setup, teardown. At scale (millions of messages/day), this was a significant cost and latency penalty. And when a message failed, your only option was to throw (retry the whole message) or swallow the error (lose it silently). Code Examples Let's see how this looks across all three major Azure Functions language stacks. Node.js (TypeScript with @ azure/functions-extensions-servicebus) import '@azure/functions-extensions-servicebus'; import { app, InvocationContext } from '@azure/functions'; import { ServiceBusMessageContext, messageBodyAsJson } from '@azure/functions-extensions-servicebus'; interface Order { id: string; product: string; amount: number; } export async function processOrderBatch( sbContext: ServiceBusMessageContext, context: InvocationContext ): Promise<void> { const { messages, actions } = sbContext; for (const message of messages) { try { const order = messageBodyAsJson<Order>(message); await processOrder(order); await actions.complete(message); // β Done } catch (error) { context.error(`Failed ${message.messageId}:`, error); await actions.deadletter(message); // β οΈ Poison } } } app.serviceBusQueue('processOrderBatch', { connection: 'ServiceBusConnection', queueName: 'orders-queue', sdkBinding: true, autoCompleteMessages: false, cardinality: 'many', handler: processOrderBatch, }); Key points: Enable sdkBinding: true and autoCompleteMessages: false to gain manual settlement control ServiceBusMessageContext provides both the messages array and actions object Settlement actions: complete(), abandon(), deadletter(), defer() Application properties can be passed to abandon() for retry tracking Built-in helpers like messageBodyAsJson<T>() handle Buffer-to-object parsing Full sample: serviceBusSampleWithComplete Python (V2 Programming Model) import json import logging from typing import List import azure.functions as func import azurefunctions.extensions.bindings.servicebus as servicebus app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION) @app.service_bus_queue_trigger(arg_name="messages", queue_name="orders-queue", connection="SERVICEBUS_CONNECTION", auto_complete_messages=False, cardinality="many") def process_order_batch(messages: List[servicebus.ServiceBusReceivedMessage], message_actions: servicebus.ServiceBusMessageActions): for message in messages: try: order = json.loads(message.body) process_order(order) message_actions.complete(message) # β Done except Exception as e: logging.error(f"Failed {message.message_id}: {e}") message_actions.dead_letter(message) # β οΈ Poison def process_order(order): logging.info(f"Processing order: {order['id']}") Key points: Uses azurefunctions.extensions.bindings.servicebus for SDK-type bindings with ServiceBusReceivedMessage Supports both queue and topic triggers with cardinality="many" for batch processing Each message exposes SDK properties like body, enqueued_time_utc, lock_token, message_id, and sequence_number Full sample: servicebus_samples_settlement .NET (C# Isolated Worker) using Azure.Messaging.ServiceBus; using Microsoft.Azure.Functions.Worker; public class ServiceBusBatchProcessor(ILogger<ServiceBusBatchProcessor> logger) { [Function(nameof(ProcessOrderBatch))] public async Task ProcessOrderBatch( [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection")] ServiceBusReceivedMessage[] messages, ServiceBusMessageActions messageActions) { foreach (var message in messages) { try { var order = message.Body.ToObjectFromJson<Order>(); await ProcessOrder(order); await messageActions.CompleteMessageAsync(message); // β Done } catch (Exception ex) { logger.LogError(ex, "Failed {MessageId}", message.MessageId); await messageActions.DeadLetterMessageAsync(message); // β οΈ Poison } } } private Task ProcessOrder(Order order) => Task.CompletedTask; } public record Order(string Id, string Product, decimal Amount); Key points: Inject ServiceBusMessageActions directly alongside the message array Each message is individually settled with CompleteMessageAsync, DeadLetterMessageAsync, or AbandonMessageAsync Application properties can be modified on abandon to track retry metadata Full sample: ServiceBusReceivedMessageFunctions.cs364Views3likes0CommentsHTTP Triggers in Azure SRE Agent: From Jira Ticket to Automated Investigation
Introduction Many teams run their observability, incident management, ticketing, and deployment on platforms outside of AzureβJira, Opsgenie, Grafana, Zendesk, GitLab, Jenkins, Harness, or homegrown internal tools. These are the systems where alerts fire, tickets get filed, deployments happen, and operational decisions are made every day. HTTP Triggers make it easy to connect any of them to Azure SRE Agentβturning events from any platform into automated agent actions with a simple HTTP POST. No manual copy-paste, no context-switching, no delay between detection and response. In this blog, we'll demonstrate by connecting Jira to SRE Agentβso that every new incident ticket automatically triggers an investigation, and the agent posts its findings back to the Jira ticket when it's done. The Scenario: Jira Incident β Automated Investigation Your team manages production applications backed by Azure PostgreSQL Flexible Server. You use Jira for incident tracking. Today, when a P1 or P2 incident is filed, your on-call engineer has to manually triageβreading through the ticket, checking dashboards, querying logs, correlating recent deploymentsβbefore they can even begin working on a fix. Some teams have Jira automations that route or label tickets, but the actual investigation still starts with a human. HTTP Triggers let you bring SRE Agent directly into that existing workflow. Instead of adding another tool for engineers to check, the agent meets them where they already work. Jira ticket created β SRE Agent automatically investigates β Agent writes findings back to Jira The on-call engineer opens the Jira ticket and the investigation is already thereβroot cause analysis, evidence from logs and metrics, and recommended next stepsβposted as a comment by the agent. Here's how to set this up. Architecture Overview Here's the end-to-end flow we'll build: Jira β A new issue is created in your project Logic App β The Jira connector detects the new issue, and the Logic App calls the SRE Agent HTTP Trigger, using Managed Identity for authentication HTTP Trigger β The agent prompt is rendered with the Jira ticket details (key, summary, priority, etc.) via payload placeholders Agent Investigation β The agent uses Jira MCP tools to read the ticket and search related issues, queries Azure logs, metrics, and recent deployments, then posts its findings back to the Jira ticket as a comment How HTTP Triggers Work Every HTTP Trigger you create in Azure SRE Agent exposes a unique webhook URL: https://<your-agent>.<instance>.azuresre.ai/api/v1/httptriggers/trigger/<trigger-id> When an external system sends a POST request to this URL with a JSON payload, the SRE Agent: Validates the trigger exists and is enabled Renders your agent prompt by injecting payload values into {payload.X} placeholders Creates a new investigation thread (or reuses an existing one) Executes the agent with the rendered promptβautonomously or in review mode Records the execution in the trigger's history for auditing Payload Placeholders The real power of HTTP Triggers is in payload placeholders. When you configure a trigger, you write an agent prompt with {payload.X} tokens that get replaced at runtime with values from the incoming JSON. For example, a prompt like: Investigate Jira incident {payload.key}: {payload.summary} (Priority: {payload.priority}) Gets rendered with actual incident data before the agent sees it, giving it immediate context to begin investigating. If your prompt doesn't use any placeholders, the raw JSON payload is automatically appended to the prompt, so the agent always has access to the full context regardless. Thread Modes HTTP Triggers support two thread modes: New Thread (recommended for incidents): Every trigger invocation creates a fresh investigation thread, giving each incident its own isolated workspace Same Thread: All invocations share a single thread, building up a continuous conversationβuseful for accumulating alerts from a single source Authenticating External Platforms The HTTP Trigger endpoint is secured with Azure AD authentication, ensuring only authorized callers can create agent investigation threads. Every request requires a valid bearer token scoped to the SRE Agent's data plane. External platforms like Jira send standard HTTP webhooks and don't natively acquire Azure AD tokens. To bridge this, you can use any Azure service that supports Managed Identity as an intermediaryβthis approach means zero secrets to store or rotate in the external platform. Common options include: Approach Best For Azure Logic Apps Native connectors for many platforms, no code required, visual workflow designer Azure Functions Simple relay with ~15 lines of code, clean URL for any webhook source API Management (APIM) Enterprise environments needing rate limiting, IP filtering, or API key management All three support Managed Identity and can transparently acquire the Azure AD token before forwarding requests to the SRE Agent HTTP Trigger. In this walkthrough, we'll use Azure Logic Apps with the built-in Jira connector. Step-by-Step: Connecting Jira to SRE Agent Prerequisites An Azure SRE Agent resource deployed in your subscription A Jira Cloud project with API token access An Azure subscription for the Logic App Step 1: Set Up the Jira MCP Connector First, let's give the SRE Agent the ability to interact with Jira directly. In your agent's MCP Tool settings, add the Jira connector: Setting Value Package mcp-atlassian (npm, version 2.0.0) Transport STDIO Configure these environment variables: Variable Value ATLASSIAN_BASE_URL https://your-site.atlassian.net ATLASSIAN_EMAIL Your Jira account email ATLASSIAN_API_TOKEN Your Jira API token Once the connector is added, select the specific MCP tools you want the agent to use. The connector provides 18 Jira tools out of 80 available. For our incident investigation workflow, the key tools include: jira-mcp_read_jira_issue β Read details from a Jira issue by issue key jira-mcp_search_jira_issues β Search for Jira issues using JQL (Jira Query Language) jira-mcp_add_jira_comment β Add a comment to a Jira issue (post investigation findings back) jira-mcp_list_jira_projects β List available Jira projects jira-mcp_create_jira_issue β Create a new Jira issue This gives the SRE Agent bidirectional access to Jiraβit can read ticket details, fetch comments, query related issues, and post investigation findings back as comments on the original ticket. This closes the loop so your on-call engineers see the agent's analysis directly in Jira without switching tools. Step 2: Create the HTTP Trigger Navigate to Builder β HTTP Triggers in the SRE Agent UI and click Create. Setting Value Name jira-incident-handler Agent Mode Autonomous Thread Mode New Thread (one investigation per incident) Sub-Agent (optional) Select a specialized incident response agent Agent Prompt: A new Jira incident has been filed that requires investigation: Jira Ticket: {payload.key} Summary: {payload.summary} Priority: {payload.priority} Reporter: {payload.reporter} Description: {payload.description} Jira URL: {payload.ticketUrl} Investigate this incident by: Identifying the affected Azure resources mentioned in the description Querying recent metrics and logs for anomalies Checking for recent deployments or configuration changes Providing a structured analysis with Root Cause, Evidence, and Recommended Actions Once your investigation is complete, use the Jira MCP tools to post a summary of your findings as a comment on the original ticket ({payload.key}). After saving, enable the trigger and open the trigger detail view. Copy the Trigger URLβyou'll need it for the Logic App. Step 3: Create the Azure Logic App In the Azure Portal, create a new Logic App: Setting Value Type Consumption (Multi-tenant, Stateful) Name jira-sre-agent-bridge Region Same region as your SRE Agent (e.g., East US 2) Resource Group Same resource group as your SRE Agent (recommended for simplicity) Step 4: Enable Managed Identity In the Logic App β Identity β System assigned: Set Status to On Click Save Step 5: Assign the SRE Agent Admin Role Navigate to your SRE Agent resource β Access control (IAM) β Add role assignment: Setting Value Role SRE Agent Admin Assign to Managed Identity β select your Logic App This grants the Logic App's Managed Identity the data-plane permissions needed to invoke HTTP Triggers. Important: The Contributor role alone is not sufficient. Contributor covers the Azure control plane, but SRE Agent uses a separate data plane with its own RBAC. The SRE Agent Admin role provides the required data-plane permissions. Step 6: Create the Jira Connection Open the Logic App designer. When adding the Jira trigger, it will prompt you to create a connection: Setting Value Connection name jira-connection Jira instance https://your-site.atlassian.net Email Your Jira email API Token Your Jira API token Step 7: Configure the Logic App Workflow Switch to the Logic App Code view and paste this workflow definition: { "definition": { "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#", "contentVersion": "1.0.0.0", "triggers": { "When_a_new_issue_is_created_(V2)": { "recurrence": { "interval": 3, "frequency": "Minute" }, "splitOn": "@triggerBody()", "type": "ApiConnection", "inputs": { "host": { "connection": { "name": "@parameters('$connections')['jira']['connectionId']" } }, "method": "get", "path": "/v2/new_issue_trigger/search", "queries": { "X-Request-Jirainstance": "https://YOUR-SITE.atlassian.net", "projectKey": "YOUR_PROJECT_ID" } } } }, "actions": { "Call_SRE_Agent_HTTP_Trigger": { "runAfter": {}, "type": "Http", "inputs": { "uri": "https://YOUR-AGENT.azuresre.ai/api/v1/httptriggers/trigger/YOUR-TRIGGER-ID", "method": "POST", "headers": { "Content-Type": "application/json" }, "body": { "key": "@{triggerBody()?['key']}", "summary": "@{triggerBody()?['fields']?['summary']}", "priority": "@{triggerBody()?['fields']?['priority']?['name']}", "reporter": "@{triggerBody()?['fields']?['reporter']?['displayName']}", "description": "@{triggerBody()?['fields']?['description']}", "ticketUrl": "@{concat('https://YOUR-SITE.atlassian.net/browse/', triggerBody()?['key'])}" }, "authentication": { "type": "ManagedServiceIdentity", "audience": "https://azuresre.dev" } } } }, "outputs": {}, "parameters": { "$connections": { "type": "Object", "defaultValue": {} } } }, "parameters": { "$connections": { "type": "Object", "value": { "jira": { "id": "/subscriptions/YOUR-SUB/providers/Microsoft.Web/locations/YOUR-REGION/managedApis/jira", "connectionId": "/subscriptions/YOUR-SUB/resourceGroups/YOUR-RG/providers/Microsoft.Web/connections/jira", "connectionName": "jira" } } } } } Replace the YOUR-* placeholders with your actual values. To find your Jira project ID, navigate to https://your-site.atlassian.net/rest/api/3/project/YOUR-PROJECT-KEY in your browser and find the "id" field in the JSON response. The critical piece is the authentication block: "authentication": { "type": "ManagedServiceIdentity", "audience": "https://azuresre.dev" } This tells the Logic App to automatically acquire an Azure AD token for the SRE Agent data plane and attach it as a Bearer token. No secrets, no expiration management, no manual token refresh. After pasting the JSON and clicking Save, switch back to the Designer view. The Logic App automatically generates the visual workflow from the code β you'll see the Jira trigger ("When a new issue is created (V2)") connected to the HTTP action ("Call SRE Agent HTTP Trigger") as a two-step flow, with all the field mappings and authentication settings already configured What Happens Inside the Agent When the HTTP Trigger fires, the SRE Agent receives a fully contextualized prompt with all the Jira incident data injected: A new Jira incident has been filed that requires investigation: Jira Ticket: KAN-16 Summary: Elevated API Response Times β PostgreSQL Table Lock Causing Request Blocking on Listings Service Priority: High Reporter: Vineela Suri Description: Severity: P2 β High. Affected Service: Production API (octopets-prod-postgres). Impact: End users experience slow or unresponsive listing pages. Jira URL: https://your-site.atlassian.net/browse/KAN-16 Investigate this incident by: Identifying the affected Azure resources mentioned in the description Querying recent metrics and logs for anomalies ... The agent then uses its configured tools to investigateβAzure CLI to query metrics, Kusto to analyze logs, and the Jira MCP connector to read the ticket for additional context. Once the investigation is complete, the agent posts its findings as a comment directly on the Jira ticket, closing the loop without any manual copy-paste. Each execution is recorded in the trigger's history with timestamp, thread ID, success status, duration, and an AI-generated summaryβgiving you full observability into your automated investigation pipeline. Extending to Other Platforms The pattern we built here works for any external platform that isn't natively supported by SRE Agent. The core architecture stays the same: External Platform β Auth Bridge (Managed Identity) β SRE Agent HTTP Trigger You only need to swap the inbound side of the bridge. For example: External Platform Auth Bridge Configuration Jira Logic App with Jira V2 connector (polling) OpsGenie Logic App with OpsGenie connector, or Azure Function relay receiving OpsGenie webhooks Datadog Azure Function relay or APIM policy receiving Datadog webhook notifications Grafana Azure Function relay or APIM policy receiving Grafana alert webhooks Splunk APIM with webhook endpoint and Managed Identity forwarding Custom / Internal tools Logic App HTTP trigger, Azure Function relay, or APIM β any service that supports Managed Identity The SRE Agent HTTP Trigger and the Managed Identity authentication remain the same regardless of the source platform. You configure the trigger once, set up the auth bridge, and connect as many external sources as needed. Each trigger can have its own tailored prompt, sub-agent, and thread mode optimized for the type of incoming event. Key Takeaways HTTP Triggers extend Azure SRE Agent's reach to any external platform: Connect What You Use: If your incident platform isn't natively supported, HTTP Triggers provide the integration pointβno code changes to SRE Agent required Secure by Design: Azure AD authentication with Managed Identity keeps the data plane protected while making integration straightforward through standard Azure services Bidirectional with MCP: Combine HTTP Triggers (inbound) with MCP connectors (outbound) for full round-trip integrationβreceive incidents automatically and post findings back to the source platform Full Observability: Every trigger execution is recorded with timestamps, thread IDs, duration, and AI-generated summaries Flexible Context Injection: Payload placeholders let you craft precise investigation prompts from incident data, while raw payload passthrough ensures the agent always has full context Getting Started HTTP Triggers are available now in the Azure SRE Agent platform: Create a Trigger: Navigate to Builder β HTTP Triggers β Create. Define your agent prompt with {payload.X} placeholders Set Up an Auth Bridge: Use Logic Apps, Azure Functions, or APIM with Managed Identity to handle Azure AD authentication Connect Your Platform: Point your external platform at the bridge and create a test event Within minutes, you'll have an automated pipeline that turns every incident ticket into an AI-driven investigation. Learn More HTTP Triggers Documentation Agent Hooks Blog Post β Governance controls for automated investigations YAML Schema Reference SRE Agent Getting Started Guide Ready to extend your SRE Agent to platforms it doesn't support natively? Set up your first HTTP Trigger today at sre.azure.com.412Views0likes0CommentsMigrating Ant Builds to Maven with GitHub Copilot app modernization
Many legacy Java applications still rely on Apache Ant for building, packaging, and dependency management. While Ant remains flexible, it lacks the structured lifecycle, dependency resolution, and ecosystem support that modern build tools like Maven provide. Migrating from Ant to Maven improves maintainability, build reproducibility, IDE compatibility, and enables modern Java workflows such as dependency upgrades, framework updates, and containerization. GitHub Copilot app modernization accelerates this transition by analyzing an Antβbased project, generating a migration plan, and applying transformations to produce a Mavenβbased build aligned with modern Java tooling. What GitHub Copilot app modernization Supports GitHub Copilot app modernization can help teams: Detect Ant build scripts (build.xml) and related custom task files Recommend Maven project structure and lifecycle alignment Generate an initial pom.xml with matched project metadata Map Ant targets to Maven phases where possible Identify external dependencies and translate them into Maven coordinates Migrate resource directories and compiled output locations Surface code or configuration changes required for a Mavenβdriven build Validate the new Maven configuration through iterative builds This modernizes the build foundation before performing other upgrades such as JDK, Spring, Jakarta, or containerβreadiness transformations. Project Analysis When you open an Antβbased project in Visual Studio Code or IntelliJ IDEA, GitHub Copilot app modernization performs an analysis: Detects build.xml and auxiliary Ant scripts Identifies classpaths defined across Ant targets Evaluates manually referenced JARs in lib directories Inspects source layout and output directories Determines project metadata such as groupId, artifactId, and version Determines whether frameworks or libraries require updates before Maven migration This analysis forms the basis of the migration plan. Migration Plan Generation GitHub Copilot app modernization produces a migration plan that outlines: The recommended Maven project layout (src/main/java, src/test/java, resources directories) A generated pom.xml with discovered dependencies Mapped Ant targets to Maven lifecycle phases (compile, test, package) Plugin configurations needed to replicate custom Ant functionality Suggested removal of lib directory JARs in favor of dependency management Notes on unsupported or manualβreview areas (custom Ant tasks, scriptβheavy targets, specialized packaging logic) You can review and adjust the plan before proceeding. Automated Transformations Once confirmed, GitHub Copilot app modernization applies targeted updates: Generates the projectβs pom.xml Migrates dependency JAR references to Maven dependency entries Moves source and resource files into Mavenβcompatible structure Updates ignore files, build output directories, and paths Introduces common Maven plugins for compiler, surefire, assembly, or shading Suggests replacements for custom Ant tasks if builtβin Maven plugins exist This automated work removes most of the manual lifting normally required for Ant β Maven transitions. Build & Fix Iteration After applying the transformations, the tool attempts to build the new Maven project: Runs the build Captures missing dependencies, incorrect scopes, or misaligned plugin versions Suggests targeted fixes Applies adjustments and rebuilds Iterates until the project compiles or no further automated fixes are possible This helps stabilize the migration quickly. Security & Behavior Validation GitHub Copilot app modernization also performs additional validation: Flags CVEs introduced or resolved through dependency discovery Alerts you to behavioral differences between Antβdriven and Mavenβdriven builds Highlights test failures, packaging differences, or altered classpaths that may need review These findings allow developers to refine the migration safely. Expected Output After the migration, you can expect: A newly generated and fully structured Maven project A populated pom.xml with dependencies, plugins, and metadata Updated project layout aligned with Maven standards Removed or deprecated Ant build files where appropriate Aligned dependency versions ready for further modernization A summary file detailing: Build changes Dependency mappings Code or config adjustments Remaining manual review items Developer Responsibilities While GitHub Copilot app modernization automates the mechanical migration from Ant to Maven, developers remain responsible for: Reviewing tests and build artifacts for behavioral differences Validating packaging steps for WAR/EAR/JAR outputs Replacing complex custom Ant scripts with proper Maven plugins Verifying deployment and CI workflows dependent on Ant build logic Confirming integration points that rely on Antβspecific tasks or ordering Once validated, the Mavenβbased structure becomes a strong foundation for further modernization such as JDK upgrades, Spring migration, Jakarta adoption, and containerization. Learn More For project setup and the complete modernization workflow, refer to the Microsoft Learn guide for upgrading Java projects with GitHub Copilot app modernization. Quickstart: Upgrade a Java Project with GitHub Copilot App Modernization | Microsoft Learn136Views1like0CommentsA Practical Path Forward for Heroku Customers with Azure
On February 6, 2026, Heroku announced it is moving to a sustaining engineering model focused on stability, security, reliability, and ongoing support. Many customers are now reassessing how their application platforms will support todayβs workloads and future innovation. Microsoft is committed to helping customers migrate and modernize applications from platforms like Heroku to Azure.213Views0likes0CommentsIndustry-Wide Certificate Changes Impacting Azure App Service Certificates
Executive Summary In early 2026, industry-wide changes mandated by browser applications and the CA/B Forum will affect both how TLS certificates are issued as well as their validity period. The CA/B Forum is a vendor body that establishes standards for securing websites and online communications through SSL/TLS certificates. Azure App Service is aligning with these standards for both App Service Managed Certificates (ASMC, free, DigiCert-issued) and App Service Certificates (ASC, paid, GoDaddy-issued). Most customers will experience no disruption. Action is required only if you pin certificates or use them for client authentication (mTLS). Update: February 17, 2026 Weβve published new Microsoft Learn documentation, Industry-wide certificate changes impacting Azure App Service , which provides more detailed guidance on these compliance-driven changes. The documentation also includes additional information not previously covered in this blog, such as updates to domain validation reuse, along with an expanding FAQ section. The Microsoft Learn documentation now represents the most complete and up-to-date overview of these changes. Going forward, any new details or clarifications will be published there, and we recommend bookmarking the documentation for the latest guidance. Who Should Read This? App Service administrators Security and compliance teams Anyone responsible for certificate management or application security Quick Reference: Whatβs Changing & What To Do Topic ASMC (Managed, free) ASC (GoDaddy, paid) Required Action New Cert Chain New chain (no action unless pinned) New chain (no action unless pinned) Remove certificate pinning Client Auth EKU Not supported (no action unless cert is used for mTLS) Not supported (no action unless cert is used for mTLS) Transition from mTLS Validity No change (already compliant) Two overlapping certs issued for the full year None (automated) If you do not pin certificates or use them for mTLS, no action is required. Timeline of Key Dates Date Change Action Required Mid-Jan 2026 and after ASMC migrates to new chain ASMC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Mar 2026 and after ASC validity shortened ASC migrates to new chain ASC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Actions Checklist For All Users Review your use of App Service certificates. If you do not pin these certificates and do not use them for mTLS, no action is required. If You Pin Certificates (ASMC or ASC) Remove all certificate or chain pinning before their respective key change dates to avoid service disruption. See Best Practices: Certificate Pinning. If You Use Certificates for Client Authentication (mTLS) Switch to an alternative authentication method before their respective key change dates to avoid service disruption, as client authentication EKU will no longer be supported for these certificates. See Sunsetting the client authentication EKU from DigiCert public TLS certificates. See Set Up TLS Mutual Authentication - Azure App Service Details & Rationale Why Are These Changes Happening? These updates are required by major browser programs (e.g., Chrome) and apply to all public CAs. They are designed to enhance security and compliance across the industry. Azure App Service is automating updates to minimize customer impact. Whatβs Changing? New Certificate Chain Certificates will be issued from a new chain to maintain browser trust. Impact: Remove any certificate pinning to avoid disruption. Removal of Client Authentication EKU Newly issued certificates will not support client authentication EKU. This change aligns with Google Chromeβs root program requirements to enhance security. Impact: If you use these certificates for mTLS, transition to an alternate authentication method. Shortening of Certificate Validity Certificate validity is now limited to a maximum of 200 days. Impact: ASMC is already compliant; ASC will automatically issue two overlapping certificates to cover one year. No billing impact. Frequently Asked Questions (FAQs) Will I lose coverage due to shorter validity? No. For App Service Certificate, App Service will issue two certificates to span the full year you purchased. Is this unique to DigiCert and GoDaddy? No. This is an industry-wide change. Do these changes impact certificates from other CAs? Yes. These changes are an industry-wide change. We recommend you reach out to your certificatesβ CA for more information. Do I need to act today? If you do not pin or use these certs for mTLS, no action is required. Glossary ASMC: App Service Managed Certificate (free, DigiCert-issued) ASC: App Service Certificate (paid, GoDaddy-issued) EKU: Extended Key Usage mTLS: Mutual TLS (client certificate authentication) CA/B Forum: Certification Authority/Browser Forum Additional Resources Changes to the Managed TLS Feature Set Up TLS Mutual Authentication Azure App Service Best Practices β Certificate pinning DigiCert Root and Intermediate CA Certificate Updates 2023 Sunsetting the client authentication EKU from DigiCert public TLS certificates Feedback & Support If you have questions or need help, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.5.3KViews1like0CommentsFrom Local MCP Server to Hosted Web Agent: App Service Observability, Part 2
In Part 1, we introduced the App Service Observability MCP Server β a proof-of-concept that lets GitHub Copilot (and other AI assistants) query your App Service logs, analyze errors, and help debug issues through natural language. That version runs locally alongside your IDE, and it's great for individual developers who want to investigate their apps without leaving VS Code. A local MCP server is powerful, but it's personal. Your teammate has to clone the repo, configure their IDE, and run it themselves. What if your on-call engineer could just open a browser and start asking questions? What if your whole team had a shared observability assistant β no setup required? In this post, we'll show how we took the same set of MCP tools and wrapped them in a hosted web application β deployed to Azure App Service with a chat UI and a built-in Azure OpenAI agent. We'll cover what changed, what stayed the same, and why this pattern opens the door to far more than just a web app. Quick Recap: The Local MCP Server If you haven't read Part 1, here's the short version: We built an MCP (Model Context Protocol) server that exposes ~15 observability tools for App Service β things like querying Log Analytics, fetching Kudu container logs, analyzing HTTP errors, correlating deployments with failures, and checking logging configurations. You point your AI assistant (GitHub Copilot, Claude, etc.) at the server, and it calls those tools on your behalf to answer questions about your apps. That version: Runs locally on your machine via node Uses stdio transport (your IDE spawns the process) Relies on your Azure credentials ( az login ) β the AI operates with your exact permissions Requires no additional Azure resources It works. It's fast. And for a developer investigating their own apps, it's the simplest path. This is still a perfectly valid way to use the project β nothing about the hosted version replaces it. The Problem: Sharing Is Hard The local MCP server has a limitation: it's tied to one developer's machine and IDE. In practice, this means: On-call engineers need to clone the repo and configure their environment before they can use it Team leads can't point someone at a URL and say "go investigate" Non-IDE users (PMs, support engineers) are left out entirely Consistent configuration (which subscription, which resource group) has to be managed per-person We wanted to keep the same tools and the same observability capabilities, but make them accessible to anyone with a browser. The Solution: Host It on App Service The answer turned out to be straightforward: deploy the MCP server itself to Azure App Service, give it a web frontend, and bring its own AI agent along for the ride. Here's what the hosted version adds on top of the local MCP server: Local MCP Server Hosted Web Agent How it works Runs locally, your IDE's AI calls the tools Deployed to Azure App Service with its own AI agent Interface VS Code, Claude Desktop, or any MCP client Browser-based chat UI Agent Your existing AI assistant (Copilot, Claude, etc.) Built-in Azure OpenAI (GPT-5-mini) Azure resources needed None beyond az login App Service, Azure OpenAI, VNet Best for Individual developers in their IDE Teams who want a shared, centralized tool Authentication Your local az login credentials Managed identity + Easy Auth (Entra ID) Deploy npm install && npm run build azd up The key insight: the MCP tools are identical. Both versions use the exact same set of observability tools β the only difference is who's calling them (your IDE's AI vs. the built-in Azure OpenAI agent) and where the server runs (your laptop vs. App Service). What We Built Architecture βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Web Browser β β React Chat UI β resource selectors, tool steps, markdown responses β ββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββββββ β HTTP (REST API) βΌ βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β Azure App Service (Node.js 20) β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β Express Server β β β β βββ /api/chat β Agent loop (OpenAI β tool calls β respond) β β β β βββ /api/set-context β Set target app for investigation β β β β βββ /api/resource-groups, /api/apps β Resource discovery β β β β βββ /mcp β MCP protocol endpoint (Streamable HTTP) β β β β βββ / β Static SPA (React chat UI) β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β VNet Integration (snet-app) β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β βΌ βΌ βΌ ββββββββββββββββ βββββββββββββββββββββ ββββββββββββββββββββββ β Azure OpenAI β β Log Analytics / β β ARM API / Kudu β β (GPT-5-mini) β β KQL Queries β β (app metadata, β β Private EP β βββββββββββββββββββββ β container logs) β ββββββββββββββββ ββββββββββββββββββββββ The Express server does double duty: it serves the React chat UI as static files and exposes the MCP endpoint for remote IDE connections. The agent loop is simple β when a user sends a message, the server calls Azure OpenAI, which may request tool calls, the server executes those tools, and the loop continues until the AI has a final answer. Demo The following screenshots show how this app can be used. The first screenshot shows what happens when you ask about a functioning app. You can see the agent made 5 tool calls and was able to give a thorough summary of the current app's status, recent deployments, as well as provide some recommendations for how to improve observability of the app itself. I expanded the tools section so you could see exactly what the agent was doing behind the scenes and get a sense of how it was thinking. At this point, you can proceed to ask more questions about your app if there were other pieces of information you wanted to pull from your logs. I then injected a fault into this app by initiating a deployment pointing to a config file that didn't actually exist. The goal here was to prove that the agent could correlate an application issue to a specific deployment event, something that currently involves manual effort and deep investigation into logs and source code. Having an agent that can do this for you in a matter of seconds saves so much time and effort that could be directed to more important activities and ensures that you find the issue the first time. A few minutes after initiating the bad deployment, I saw that my app was no longer responding. Rather than going to the logs and investigating myself, I asked the agent "I'm getting an application error now, what happened?" I obviously know what happened and what the source of the error was, but let's see if the agent can pick that up. The agent was able to see that something was wrong and then point me in the direction to address the issue. It ran a number of tool calls following our investigation steps called out in the skills file and was successfully able to identify the source of the error. And lastly, I wanted to confirm the error was associated with the recent deployment, something that our agent should be able to do because we built in the tools it needs to be able to corrleate these kinds of events with errors. I asked it directly and here was the response, exactly what I expected to see. Infrastructure (one command) Everything is defined in Bicep and deployed with the Azure Developer CLI: azd up This provisions: App Service Plan (P0v3) with App Service (Node.js 20 LTS, VNet-integrated) Azure OpenAI (GPT-5-mini, Global Standard) with a private endpoint and private DNS zone VNet (10.0.0.0/16) with dedicated subnets for the app and private endpoints Managed Identity with RBAC roles: Reader, Website Contributor, Log Analytics Reader, Cognitive Services OpenAI User No API keys anywhere. The App Service authenticates to Azure OpenAI over a private network using its managed identity. The Chat UI The web interface is designed to get out of the way and let you focus on investigating: Resource group and app dropdowns β Browse your subscription, pick the app you want to investigate Tool step visibility β A collapsible panel shows exactly which tools the agent called, what arguments it used, and how long each took Session management β Start fresh conversations, with confirmation dialogs when switching context mid-investigation Markdown responses β The agent's answers are rendered with full formatting, code blocks, and tables When you first open the app, it auto-discovers your subscription and populates the resource group dropdown. Select an app, hit "Tell me about this app," and the agent starts investigating. Security Since this app has subscription-wide read access to your App Services and Log Analytics workspaces, you should definitely enable authentication. After deploying, configure Easy Auth in the Azure Portal: Go to your App Service β Authentication Click Add identity provider β select Microsoft Entra ID Set Unauthenticated requests to "HTTP 401 Unauthorized" This ensures only authorized members of your organization can access the tool. The connection to Azure OpenAI is secured via a private endpoint β traffic never traverses the public internet. The app authenticates using its managed identity with the Cognitive Services OpenAI User role. What Stayed the Same This is the part worth emphasizing: the core tools didn't change at all. Whether you're using the local MCP server or the hosted web agent, you get the same 15 tools. The Agent Skill (SKILL.md) from Part 1 also carries over. The hosted agent has the same domain expertise for App Service debugging baked into its system prompt β the same debugging workflows, common error patterns, KQL templates, and SKU reference that make the local version effective. The Bigger Picture: It's Not Just a Web App Here's what makes this interesting beyond our specific implementation: the pattern is the point. We took a set of domain-specific tools (App Service observability), wrapped them in a standard protocol (MCP), and showed two ways to use them: Local MCP server β Your IDE's AI calls the tools Hosted web agent β A deployed app with its own AI calls the same tools But those are just two examples. The same tools could power: A Microsoft Teams bot β Your on-call channel gets an observability assistant that anyone can InvalidUsernameβ A Slack integration β Same idea, different platform A CLI agent β A terminal-based chat for engineers who live in the command line An automated monitor β An agent that periodically checks your apps and files alerts An Azure Portal extension β Observability chat embedded directly in the portal experience A mobile app β Check on your apps from your phone during an incident The MCP tools are the foundation. The agent and interface are just the delivery mechanism. Build whatever surface makes sense for your team. This is one of the core ideas behind MCP: write the tools once, use them everywhere. The protocol standardizes how AI assistants discover and call tools, so you're not locked into any single client or agent. Try It Yourself Both versions are open-source: Local MCP server (Part 1): github.com/seligj95/app-service-observability-agent Hosted web agent (Part 2): github.com/seligj95/app-service-observability-agent-hosted To deploy the hosted version: git clone https://github.com/seligj95/app-service-observability-agent-hosted.git cd app-service-observability-agent-hosted azd up To run the local version, see the Getting Started section in Part 1. What's Next? This is still a proof-of-concept, and we're continuing to explore how AI-powered observability can become a first-class part of the App Service platform. Some things we're thinking about: More tools β Resource health, autoscale history, certificate expiration, network diagnostics Multi-app investigations β Correlate issues across multiple apps in a resource group Proactive monitoring β Agents that watch your apps and alert you before users notice Deeper integration β What if every App Service came with a built-in observability endpoint? We'd love your feedback. Try it out, open an issue, or submit a PR if you have ideas for additional tools or debugging patterns. And if you build something interesting on top of these MCP tools β a Teams bot, a CLI agent, anything β we'd love to hear about it.370Views0likes0Comments