azure paas
183 TopicsA simpler way to deploy your code to Azure App Service for Linux
We’ve added a new deployment experience for Azure App Service for Linux that makes it easier to get your code running on your web app. To get started, go to the Kudu/SCM site for your app: <sitename>.scm.azurewebsites.net From there, open the new Deployments experience. You can now deploy your app by simply dragging and dropping a zip file containing your code. Once your file is uploaded, App Service shows you the contents of the zip so you can quickly verify what you’re about to deploy. If your application is already built and ready to run, you also have the option to skip server-side build. Otherwise, App Service can handle the build step for you. When you’re ready, select Deploy. From there, the deployment starts right away, and you can follow each phase of the process as it happens. The experience shows clear progress through upload, build, and deployment, along with deployment logs to help you understand what’s happening behind the scenes. After the deployment succeeds, you can also view runtime logs, which makes it easier to confirm that your app has started successfully. This experience is ideal if you’re getting started with Azure App Service and want the quickest path from code to a running app. For production workloads and teams with established release processes, you’ll typically continue using an automated CI/CD pipeline (for example, GitHub Actions or Azure DevOps) for repeatable deployments. We’re continuing to improve the developer experience on App Service for Linux. Give it a try and let us know what you think.102Views1like0CommentsMonitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View
Part 2 of 2: In Blog 1, we deployed a multi-agent travel planner on Azure App Service using the Microsoft Agent Framework (MAF) 1.0 GA. This post dives deep into how we instrumented those agents with OpenTelemetry and lit up the brand-new Agents (Preview) view in Application Insights. 📋 Prerequisite: This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first — you'll need a running App Service with the agents, Service Bus, Cosmos DB, and Azure OpenAI provisioned before the monitoring steps in this post will work. Deploying Agents Is Only Half the Battle In Blog 1, we walked through deploying a multi-agent travel planning application on Azure App Service. Six specialized agents — a Coordinator, Currency Converter, Weather Advisor, Local Knowledge Expert, Itinerary Planner, and Budget Optimizer — work together to generate comprehensive travel plans. The architecture uses an ASP.NET Core API backed by a WebJob for async processing, Azure Service Bus for messaging, and Azure OpenAI for the brains. But here's the thing: deploying agents to production is only half the battle. Once they're running, you need answers to questions like: Which agent is consuming the most tokens? How long does the Itinerary Planner take compared to the Weather Advisor? Is the Coordinator making too many LLM calls per workflow? When something goes wrong, which agent in the pipeline failed? Traditional APM gives you HTTP latencies and exception rates. That's table stakes. For AI agents, you need to see inside the agent — the model calls, the tool invocations, the token spend. And that's exactly what Application Insights' new Agents (Preview) view delivers, powered by OpenTelemetry and the GenAI semantic conventions. Let's break down how it all works. The Agents (Preview) View in Application Insights Azure Application Insights now includes a dedicated Agents (Preview) blade that provides unified monitoring purpose-built for AI agents. It's not just a generic dashboard — it understands agent concepts natively. Whether your agents are built with Microsoft Agent Framework, Azure AI Foundry, Copilot Studio, or a third-party framework, this view lights up as long as your telemetry follows the GenAI semantic conventions. Here's what you get out of the box: Agent dropdown filter — A dropdown populated by gen_ai.agent.name values from your telemetry. In our travel planner, this shows all six agents: "Travel Planning Coordinator", "Currency Conversion Specialist", "Weather & Packing Advisor", "Local Expert & Cultural Guide", "Itinerary Planning Expert", and "Budget Optimization Specialist". You can filter the entire dashboard to one agent or view them all. Token usage metrics — Visualizations of input and output token consumption, broken down by agent. Instantly see which agents are the most expensive to run. Operational metrics — Latency distributions, error rates, and throughput for each agent. Spot performance regressions before users notice. End-to-end transaction details — Click into any trace to see the full workflow: which agents were invoked, what tools they called, how long each step took. The "simple view" renders agent steps in a story-like format that's remarkably easy to follow. Grafana integration — One-click export to Azure Managed Grafana for custom dashboards and alerting. The key insight: this view isn't magic. It works because the telemetry is structured using well-defined semantic conventions. Let's look at those next. 📖 Docs: Application Insights Agents (Preview) view documentation GenAI Semantic Conventions — The Foundation The entire Agents view is powered by the OpenTelemetry GenAI semantic conventions. These are a standardized set of span attributes that describe AI agent behavior in a way that any observability backend can understand. Think of them as the "contract" between your instrumented code and Application Insights. Let's walk through the key attributes and why each one matters: gen_ai.agent.name This is the human-readable name of the agent. In our travel planner, each agent sets this via the name parameter when constructing the MAF ChatClientAgent — for example, "Weather & Packing Advisor" or "Budget Optimization Specialist" . This is what populates the agent dropdown in the Agents view. Without this attribute, Application Insights would have no way to distinguish one agent from another in your telemetry. It's the single most important attribute for agent-level monitoring. gen_ai.agent.description A brief description of what the agent does. Our Weather Advisor, for example, is described as "Provides weather forecasts, packing recommendations, and activity suggestions based on destination weather conditions." This metadata helps operators and on-call engineers quickly understand an agent's role without diving into source code. It shows up in trace details and helps contextualize what you're looking at when debugging. gen_ai.agent.id A unique identifier for the agent instance. In MAF, this is typically an auto-generated GUID. While gen_ai.agent.name is the human-friendly label, gen_ai.agent.id is the machine-stable identifier. If you rename an agent, the ID stays the same, which is important for tracking agent behavior across code deployments. gen_ai.operation.name The type of operation being performed. Values include "chat" for standard LLM calls and "execute_tool" for tool/function invocations. In our travel planner, when the Weather Advisor calls the GetWeatherForecast function via NWS, or when the Currency Converter calls ConvertCurrency via the Frankfurter API, those tool calls get their own spans with gen_ai.operation.name = "execute_tool" . This lets you measure LLM think-time separately from tool execution time — a critical distinction for performance optimization. gen_ai.request.model / gen_ai.response.model The model used for the request and the model that actually served the response (these can differ when providers do model routing). In our case, both are "gpt-4o" since that's what we deploy via Azure OpenAI. These attributes let you track model usage across agents, spot unexpected model assignments, and correlate performance changes with model updates. gen_ai.usage.input_tokens / gen_ai.usage.output_tokens Token consumption per LLM call. This is what powers the token usage visualizations in the Agents view. The Coordinator agent, which aggregates results from all five specialist agents, tends to have higher output token counts because it's synthesizing a full travel plan. The Currency Converter, which makes focused API calls, uses fewer tokens overall. These attributes let you answer the question "which agent is costing me the most?" — and more importantly, let you set alerts when token usage spikes unexpectedly. gen_ai.system The AI system or provider. In our case, this is "openai" (set by the Azure OpenAI client instrumentation). If you're using multiple AI providers — say, Azure OpenAI for planning and a local model for classification — this attribute lets you filter and compare. Together, these attributes create a rich, structured view of agent behavior that goes far beyond generic tracing. They're the reason Application Insights can render agent-specific dashboards with token breakdowns, latency distributions, and end-to-end workflow views. Without these conventions, all you'd see is opaque HTTP calls to an OpenAI endpoint. 💡 Key takeaway: The GenAI semantic conventions are what transform generic distributed traces into agent-aware observability. They're the bridge between your code and the Agents view. Any framework that emits these attributes — MAF, Semantic Kernel, LangChain — can light up this dashboard. Two Layers of OpenTelemetry Instrumentation Our travel planner sample instruments at two distinct levels, each capturing different aspects of agent behavior. Let's look at both. Layer 1: IChatClient-Level Instrumentation The first layer instruments at the IChatClient level using Microsoft.Extensions.AI . This is where we wrap the Azure OpenAI chat client with OpenTelemetry: var client = new AzureOpenAIClient(azureOpenAIEndpoint, new DefaultAzureCredential()); // Wrap with OpenTelemetry to emit GenAI semantic convention spans return client.GetChatClient(modelDeploymentName).AsIChatClient() .AsBuilder() .UseOpenTelemetry() .Build(); This single .UseOpenTelemetry() call intercepts every LLM call and emits spans with: gen_ai.system — the AI provider (e.g., "openai" ) gen_ai.request.model / gen_ai.response.model — which model was used gen_ai.usage.input_tokens / gen_ai.usage.output_tokens — token consumption per call gen_ai.operation.name — the operation type ( "chat" ) Think of this as the "LLM layer" — it captures what the model is doing regardless of which agent called it. It's model-centric telemetry. Layer 2: Agent-Level Instrumentation The second layer instruments at the agent level using MAF 1.0 GA's built-in OpenTelemetry support. This happens in the BaseAgent class that all our agents inherit from: Agent = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description, tools: chatOptions.Tools?.ToList()) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName) .Build(); The .UseOpenTelemetry(sourceName: AgentName) call on the MAF agent builder emits a different set of spans: gen_ai.agent.name — the human-readable agent name (e.g., "Weather & Packing Advisor" ) gen_ai.agent.description — what the agent does gen_ai.agent.id — the unique agent identifier Agent invocation traces — spans that represent the full lifecycle of an agent call This is the "agent layer" — it captures which agent is doing the work and provides the identity information that powers the Agents view dropdown and per-agent filtering. Why Both Layers? When both layers are active, you get the richest possible telemetry. The agent-level spans nest around the LLM-level spans, creating a trace hierarchy that looks like: Agent: "Weather & Packing Advisor" (gen_ai.agent.name) └── chat (gen_ai.operation.name) ├── model: gpt-4o, input_tokens: 450, output_tokens: 120 └── execute_tool: GetWeatherForecast └── chat (follow-up with tool results) └── model: gpt-4o, input_tokens: 680, output_tokens: 350 There is a tradeoff: with both layers active, you may see some span duplication since both the IChatClient wrapper and the MAF agent wrapper emit spans for the same underlying LLM call. If you find the telemetry too noisy, you can disable one layer: Agent layer only (remove .UseOpenTelemetry() from the IChatClient ) — You get agent identity but lose per-call token breakdowns. IChatClient layer only (remove .UseOpenTelemetry() from the agent builder) — You get detailed LLM metrics but lose agent identity in the Agents view. For the fullest experience with the Agents (Preview) view, we recommend keeping both layers active. The official sample uses both, and the Agents view is designed to handle the overlapping spans gracefully. 📖 Docs: MAF Observability Guide Exporting Telemetry to Application Insights Emitting OpenTelemetry spans is only useful if they land somewhere you can query them. The good news is that Azure App Service and Application Insights have deep native integration — App Service can auto-instrument your app, forward platform logs, and surface health metrics out of the box. For a full overview of monitoring capabilities, see Monitor Azure App Service. For our AI agent scenario, we go beyond the built-in platform telemetry. We need the GenAI semantic convention spans that we configured in the previous sections to flow into App Insights so the Agents (Preview) view can render them. Our travel planner has two host processes — the ASP.NET Core API and a WebJob — and each requires a slightly different exporter setup. ASP.NET Core API — Azure Monitor OpenTelemetry Distro For the API, it's a single line. The Azure Monitor OpenTelemetry Distro handles everything: // Configure OpenTelemetry with Azure Monitor for traces, metrics, and logs. // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered. builder.Services.AddOpenTelemetry().UseAzureMonitor(); That's it. The distro automatically: Discovers the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable Configures trace, metric, and log exporters to Application Insights Sets up appropriate sampling and batching Registers standard ASP.NET Core HTTP instrumentation This is the recommended approach for any ASP.NET Core application. One NuGet package ( Azure.Monitor.OpenTelemetry.AspNetCore ), one line of code, zero configuration files. WebJob — Manual Exporter Setup The WebJob is a non-ASP.NET Core host (it uses Host.CreateApplicationBuilder ), so the distro's convenience method isn't available. Instead, we configure the exporters explicitly: // Configure OpenTelemetry with Azure Monitor for the WebJob (non-ASP.NET Core host). // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered. builder.Services.AddOpenTelemetry() .ConfigureResource(r => r.AddService("TravelPlanner.WebJob")) .WithTracing(t => t .AddSource("*") .AddAzureMonitorTraceExporter()) .WithMetrics(m => m .AddMeter("*") .AddAzureMonitorMetricExporter()); builder.Logging.AddOpenTelemetry(o => o.AddAzureMonitorLogExporter()); A few things to note: .AddSource("*") — Subscribes to all trace sources, including the ones emitted by MAF's .UseOpenTelemetry(sourceName: AgentName) . In production, you might narrow this to specific source names for performance. .AddMeter("*") — Similarly captures all metrics, including the GenAI metrics emitted by the instrumentation layers. .ConfigureResource(r => r.AddService("TravelPlanner.WebJob")) — Tags all telemetry with the service name so you can distinguish API vs. WebJob telemetry in Application Insights. The connection string is still auto-discovered from the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable — no need to pass it explicitly. The key difference between these two approaches is ceremony, not capability. Both send the same GenAI spans to Application Insights; the Agents view works identically regardless of which exporter setup you use. 📖 Docs: Azure Monitor OpenTelemetry Distro Infrastructure as Code — Provisioning the Monitoring Stack The monitoring infrastructure is provisioned via Bicep modules alongside the rest of the application's Azure resources. Here's how it fits together. Log Analytics Workspace infra/core/monitor/loganalytics.bicep creates the Log Analytics workspace that backs Application Insights: resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = { name: name location: location tags: tags properties: { sku: { name: 'PerGB2018' } retentionInDays: 30 } } Application Insights infra/core/monitor/appinsights.bicep creates a workspace-based Application Insights resource connected to Log Analytics: resource appInsights 'Microsoft.Insights/components@2020-02-02' = { name: name location: location tags: tags kind: 'web' properties: { Application_Type: 'web' WorkspaceResourceId: logAnalyticsWorkspaceId } } output connectionString string = appInsights.properties.ConnectionString Wiring It All Together In infra/main.bicep , the Application Insights connection string is passed as an app setting to the App Service: appSettings: { APPLICATIONINSIGHTS_CONNECTION_STRING: appInsights.outputs.connectionString // ... other app settings } This is the critical glue: when the app starts, the OpenTelemetry distro (or manual exporters) auto-discover this environment variable and start sending telemetry to your Application Insights resource. No connection strings in code, no configuration files — it's all infrastructure-driven. The same connection string is available to both the API and the WebJob since they run on the same App Service. All agent telemetry from both host processes flows into a single Application Insights resource, giving you a unified view across the entire application. See It in Action Once the application is deployed and processing travel plan requests, here's how to explore the agent telemetry in Application Insights. Step 1: Open the Agents (Preview) View In the Azure portal, navigate to your Application Insights resource. In the left nav, look for Agents (Preview) under the Investigations section. This opens the unified agent monitoring dashboard. Step 2: Filter by Agent The agent dropdown at the top of the page is populated by the gen_ai.agent.name values in your telemetry. You'll see all six agents listed: Travel Planning Coordinator Currency Conversion Specialist Weather & Packing Advisor Local Expert & Cultural Guide Itinerary Planning Expert Budget Optimization Specialist Select a specific agent to filter the entire dashboard — token usage, latency, error rate — down to that one agent. Step 3: Review Token Usage The token usage tile shows total input and output token consumption over your selected time range. Compare agents to find your biggest spenders. In our testing, the Coordinator agent consistently uses the most output tokens because it aggregates and synthesizes results from all five specialists. Step 4: Drill into Traces Click "View Traces with Agent Runs" to see all agent executions. Each row represents a workflow run. You can filter by time range, status (success/failure), and specific agent. Step 5: End-to-End Transaction Details Click any trace to open the end-to-end transaction details. The "simple view" renders the agent workflow as a story — showing each step, which agent handled it, how long it took, and what tools were called. For a full travel plan, you'll see the Coordinator dispatch work to each specialist, tool calls to the NWS weather API and Frankfurter currency API, and the final aggregation step. Grafana Dashboards The Agents (Preview) view in Application Insights is great for ad-hoc investigation. For ongoing monitoring and alerting, Azure Managed Grafana provides prebuilt dashboards specifically designed for agent workloads. From the Agents view, click "Explore in Grafana" to jump directly into these dashboards: Agent Framework Dashboard — Per-agent metrics including token usage trends, latency percentiles, error rates, and throughput over time. Pin this to your operations wall. Agent Framework Workflow Dashboard — Workflow-level metrics showing how multi-agent orchestrations perform end-to-end. See how long complete travel plans take, identify bottleneck agents, and track success rates. These dashboards query the same underlying data in Log Analytics, so there's zero additional instrumentation needed. If your telemetry lights up the Agents view, it lights up Grafana too. Key Packages Summary Here are the NuGet packages that make this work, pulled from the actual project files: Package Version Purpose Azure.Monitor.OpenTelemetry.AspNetCore 1.3.0 Azure Monitor OTEL Distro for ASP.NET Core (API). One-line setup for traces, metrics, and logs. Azure.Monitor.OpenTelemetry.Exporter 1.3.0 Azure Monitor OTEL exporter for non-ASP.NET Core hosts (WebJob). Trace, metric, and log exporters. Microsoft.Agents.AI 1.0.0 MAF 1.0 GA — ChatClientAgent , .UseOpenTelemetry() for agent-level instrumentation. Microsoft.Extensions.AI 10.4.1 IChatClient abstraction with .UseOpenTelemetry() for LLM-level instrumentation. OpenTelemetry.Extensions.Hosting 1.11.2 OTEL dependency injection integration for Host.CreateApplicationBuilder (WebJob). Microsoft.Extensions.AI.OpenAI 10.4.1 OpenAI/Azure OpenAI adapter for IChatClient . Bridges the Azure OpenAI SDK to the M.E.AI abstraction. Wrapping Up Let's zoom out. In this two-part series, we've gone from zero to a fully observable, production-grade multi-agent AI application on Azure App Service: Blog 1 covered deploying the multi-agent travel planner with MAF 1.0 GA — the agents, the architecture, the infrastructure. Blog 2 (this post) showed how to instrument those agents with OpenTelemetry, explained the GenAI semantic conventions that make agent-aware monitoring possible, and walked through the new Agents (Preview) view in Application Insights. The pattern is straightforward: Add .UseOpenTelemetry() at the IChatClient level for LLM metrics. Add .UseOpenTelemetry(sourceName: AgentName) at the MAF agent level for agent identity. Export to Application Insights via the Azure Monitor distro (one line) or manual exporters. Wire the connection string through Bicep and environment variables. Open the Agents (Preview) view and start monitoring. With MAF 1.0 GA's built-in OpenTelemetry support and Application Insights' new Agents view, you get production-grade observability for AI agents with minimal code. The GenAI semantic conventions ensure your telemetry is structured, portable, and understood by any compliant backend. And because it's all standard OpenTelemetry, you're not locked into any single vendor — swap the exporter and your telemetry goes to Jaeger, Grafana, Datadog, or wherever you need it. Now go see what your agents are up to. Resources Sample repository: seligj95/app-service-multi-agent-maf-otel App Insights Agents (Preview) view: Documentation GenAI Semantic Conventions: OpenTelemetry GenAI Registry MAF Observability Guide: Microsoft Agent Framework Observability Azure Monitor OpenTelemetry Distro: Enable OpenTelemetry for .NET Grafana Agent Framework Dashboard: aka.ms/amg/dash/af-agent Grafana Workflow Dashboard: aka.ms/amg/dash/af-workflow Blog 1: Deploy Multi-Agent AI Apps on Azure App Service with MAF 1.0 GA173Views0likes0CommentsDeploying to Azure Web App from Azure DevOps Using UAMI
TOC UAMI Configuration App Configuration Azure DevOps Configuration Logs UAMI Configuration Create a User Assigned Managed Identity with no additional configuration. This identity will be mentioned in later steps, especially at Object ID. App Configuration On an existing Azure Web App, enable Diagnostic Settings and configure it to retain certain types of logs, such as Access Audit Logs. These logs will be discussed in the final section of this article. Next, navigate to Access Control (IAM) and assign the previously created User Assigned Managed Identity the Website Contributor role. Azure DevOps Configuration Go to Azure DevOps → Project Settings → Service Connections, and create a new ARM (Azure Resource Manager) connection. While creating the connection: Select the corresponding User Assigned Managed Identity Grant it appropriate permissions at the Resource Group level During this process, you will be prompted to sign in again using your own account. This authentication will later be reflected in the deployment logs discussed below. Assuming the following deployment template is used in the pipeline, you will notice that additional steps appear in the deployment process compared to traditional service principal–based authentication. Logs A few minutes after deployment, related log records will appear. In the AppServiceAuditLogs table, you can observe that the deployment initiator is shown as the Object ID from UAMI, and the Source is listed as Azure (DevOps). This indicates that the User Assigned Managed Identity is authorized under my user context, while the deployment action itself is initiated by Azure DevOps.116Views0likes0CommentsAzure Functions Ignite 2025 Update
Azure Functions is redefining event-driven applications and high-scale APIs in 2025, accelerating innovation for developers building the next generation of intelligent, resilient, and scalable workloads. This year, our focus has been on empowering AI and agentic scenarios: remote MCP server hosting, bulletproofing agents with Durable Functions, and first-class support for critical technologies like OpenTelemetry, .NET 10 and Aspire. With major advances in serverless Flex Consumption, enhanced performance, security, and deployment fundamentals across Elastic Premium and Flex, Azure Functions is the platform of choice for building modern, enterprise-grade solutions. Remote MCP Model Context Protocol (MCP) has taken the world by storm, offering an agent a mechanism to discover and work deeply with the capabilities and context of tools. When you want to expose MCP/tools to your enterprise or the world securely, we recommend you think deeply about building remote MCP servers that are designed to run securely at scale. Azure Functions is uniquely optimized to run your MCP servers at scale, offering serverless and highly scalable features of Flex Consumption plan, plus two flexible programming model options discussed below. All come together using the hardened Functions service plus new authentication modes for Entra and OAuth using Built-in authentication. Remote MCP Triggers and Bindings Extension GA Back in April, we shared a new extension that allows you to author MCP servers using functions with the MCP tool trigger. That MCP extension is now generally available, with support for C#(.NET), Java, JavaScript (Node.js), Python, and Typescript (Node.js). The MCP tool trigger allows you to focus on what matters most: the logic of the tool you want to expose to agents. Functions will take care of all the protocol and server logistics, with the ability to scale out to support as many sessions as you want to throw at it. [Function(nameof(GetSnippet))] public object GetSnippet( [McpToolTrigger(GetSnippetToolName, GetSnippetToolDescription)] ToolInvocationContext context, [BlobInput(BlobPath)] string snippetContent ) { return snippetContent; } New: Self-hosted MCP Server (Preview) If you’ve built servers with official MCP SDKs and want to run them as remote cloud‑scale servers without re‑writing any code, this public preview is for you. You can now self‑host your MCP server on Azure Functions—keep your existing Python, TypeScript, .NET, or Java code and get rapid 0 to N scaling, built-in server authentication and authorization, consumption-based billing, and more from the underlying Azure Functions service. This feature complements the Azure Functions MCP extension for building MCP servers using the Functions programming model (triggers & bindings). Pick the path that fits your scenario—build with the extension or standard MCP SDKs. Either way you benefit from the same scalable, secure, and serverless platform. Use the official MCP SDKs: # MCP.tool() async def get_alerts(state: str) -> str: """Get weather alerts for a US state. Args: state: Two-letter US state code (e.g. CA, NY) """ url = f"{NWS_API_BASE}/alerts/active/area/{state}" data = await make_nws_request(url) if not data or "features" not in data: return "Unable to fetch alerts or no alerts found." if not data["features"]: return "No active alerts for this state." alerts = [format_alert(feature) for feature in data["features"]] return "\n---\n".join(alerts) Use Azure Functions Flex Consumption Plan's serverless compute using Custom Handlers in host.json: { "version": "2.0", "configurationProfile": "mcp-custom-handler", "customHandler": { "description": { "defaultExecutablePath": "python", "arguments": ["weather.py"] }, "http": { "DefaultAuthorizationLevel": "anonymous" }, "port": "8000" } } Learn more about MCPTrigger and self-hosted MCP servers at https://aka.ms/remote-mcp Built-in MCP server authorization (Preview) The built-in authentication and authorization feature can now be used for MCP server authorization, using a new preview option. You can quickly define identity-based access control for your MCP servers with Microsoft Entra ID or other OpenID Connect providers. Learn more at https://aka.ms/functions-mcp-server-authorization. Better together with Foundry agents Microsoft Foundry is the starting point for building intelligent agents, and Azure Functions is the natural next step for extending those agents with remote MCP tools. Running your tools on Functions gives you clean separation of concerns, reuse across multiple agents, and strong security isolation. And with built-in authorization, Functions enables enterprise-ready authentication patterns, from calling downstream services with the agent’s identity to operating on behalf of end users with their delegated permissions. Build your first remote MCP server and connect it to your Foundry agent at https://aka.ms/foundry-functions-mcp-tutorial. Agents Microsoft Agent Framework 2.0 (Public Preview Refresh) We’re excited about the preview refresh 2.0 release of Microsoft Agent Framework that builds on battle hardened work from Semantic Kernel and AutoGen. Agent Framework is an outstanding solution for building multi-agent orchestrations that are both simple and powerful. Azure Functions is a strong fit to host Agent Framework with the service’s extreme scale, serverless billing, and enterprise grade features like VNET networking and built-in auth. Durable Task Extension for Microsoft Agent Framework (Preview) The durable task extension for Microsoft Agent Framework transforms how you build production-ready, resilient and scalable AI agents by bringing the proven durable execution (survives crashes and restarts) and distributed execution (runs across multiple instances) capabilities of Azure Durable Functions directly into the Microsoft Agent Framework. Combined with Azure Functions for hosting and event-driven execution, you can now deploy stateful, resilient AI agents that automatically handle session management, failure recovery, and scaling, freeing you to focus entirely on your agent logic. Key features of the durable task extension include: Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture. Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard Create a durable agent: endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini") # Create an AI agent following the standard Microsoft Agent Framework pattern agent = AzureOpenAIChatClient( endpoint=endpoint, deployment_name=deployment_name, credential=AzureCliCredential() ).create_agent( instructions="""You are a professional content writer who creates engaging, well-structured documents for any given topic. When given a topic, you will: 1. Research the topic using the web search tool 2. Generate an outline for the document 3. Write a compelling document with proper formatting 4. Include relevant examples and citations""", name="DocumentPublisher", tools=[ AIFunctionFactory.Create(search_web), AIFunctionFactory.Create(generate_outline) ] ) # Configure the function app to host the agent with durable session management app = AgentFunctionApp(agents=[agent]) app.run() Durable Task Scheduler dashboard for agent and agent workflow observability and debugging For more information on the durable task extension for Agent Framework, see the announcement: https://aka.ms/durable-extension-for-af-blog. Flex Consumption Updates As you know, Flex Consumption means serverless without compromise. It combines elastic scale and pay‑for‑what‑you‑use pricing with the controls you expect: per‑instance concurrency, longer executions, VNet/private networking, and Always Ready instances to minimize cold starts. Since launching GA at Ignite 2024 last year, Flex Consumption has had tremendous growth with over 1.5 billion function executions per day and nearly 40 thousand apps. Here’s what’s new for Ignite 2025: 512 MB instance size (GA). Right‑size lighter workloads, scale farther within default quota. Availability Zones (GA). Distribute instances across zones. Rolling updates (Public Preview). Unlock zero-downtime deployments of code or config by setting a single configuration. See below for more information. Even more improvements including: new diagnostic settingsto route logs/metrics, use Key Vault App Config references, new regions, and Custom Handler support. To get started, review Flex Consumption samples, or dive into the documentation to see how Flex can support your workloads. Migrating to Azure Functions Flex Consumption Migrating to Flex Consumption is simple with our step-by-step guides and agentic tools. Move your Azure Functions apps or AWS Lambda workloads, update your code and configuration, and take advantage of new automation tools. With Linux Consumption retiring, now is the time to switch. For more information, see: Migrate Consumption plan apps to the Flex Consumption plan Migrate AWS Lambda workloads to Azure Functions Durable Functions Durable Functions introduces powerful new features to help you build resilient, production-ready workflows: Distributed Tracing: lets you track requests across components and systems, giving you deep visibility into orchestration and activities with support for App Insights and OpenTelemetry. Extended Sessions support in .NET isolated: improves performance by caching orchestrations in memory, ideal for fast sequential activities and large fan-out/fan-in patterns. Orchestration versioning (public preview): enables zero-downtime deployments and backward compatibility, so you can safely roll out changes without disrupting in-flight workflows Durable Task Scheduler Updates Durable Task Scheduler Dedicated SKU (GA): Now generally available, the Dedicated SKU offers advanced orchestration for complex workflows and intelligent apps. It provides predictable pricing for steady workloads, automatic checkpointing, state protection, and advanced monitoring for resilient, reliable execution. Durable Task Scheduler Consumption SKU (Public Preview): The new Consumption SKU brings serverless, pay-as-you-go orchestration to dynamic and variable workloads. It delivers the same orchestration capabilities with flexible billing, making it easy to scale intelligent applications as needed. For more information see: https://aka.ms/dts-ga-blog OpenTelemetry support in GA Azure Functions OpenTelemetry is now generally available, bringing unified, production-ready observability to serverless applications. Developers can now export logs, traces, and metrics using open standards—enabling consistent monitoring and troubleshooting across every workload. Key capabilities include: Unified observability: Standardize logs, traces, and metrics across all your serverless workloads for consistent monitoring and troubleshooting. Vendor-neutral telemetry: Integrate seamlessly with Azure Monitor or any OpenTelemetry-compliant backend, ensuring flexibility and choice. Broad language support: Works with .NET (isolated), Java, JavaScript, Python, PowerShell, and TypeScript. Start using OpenTelemetry in Azure Functions today to unlock standards-based observability for your apps. For step-by-step guidance on enabling OpenTelemetry and configuring exporters for your preferred backend, see the documentation. Deployment with Rolling Updates (Preview) Achieving zero-downtime deployments has never been easier. The Flex Consumption plan now offers rolling updates as a site update strategy. Set a single property, and all future code deployments and configuration changes will be released with zero-downtime. Instead of restarting all instances at once, the platform now drains existing instances in batches while scaling out the latest version to match real-time demand. This ensures uninterrupted in-flight executions and resilient throughput across your HTTP, non-HTTP, and Durable workloads – even during intensive scale-out scenarios. Rolling updates are now in public preview. Learn more at https://aka.ms/functions/rolling-updates. Secure Identity and Networking Everywhere By Design Security and trust are paramount. Azure Functions incorporates proven best practices by design, with full support for managed identity—eliminating secrets and simplifying secure authentication and authorization. Flex Consumption and other plans offer enterprise-grade networking features like VNETs, private endpoints, and NAT gateways for deep protection. The Azure Portal streamlines secure function creation, and updated scenarios and samples showcase these identity and networking capabilities in action. Built-in authentication (discussed above) enables inbound client traffic to use identity as well. Check out our updated Functions Scenarios page with quickstarts or our secure samples gallery to see these identity and networking best practices in action. .NET 10 Azure Functions now supports .NET 10, bringing in a great suite of new features and performance benefits for your code. .NET 10 is supported on the isolated worker model, and it’s available for all plan types except Linux Consumption. As a reminder, support ends for the legacy in-process model on November 10, 2026, and the in-process model is not being updated with .NET 10. To stay supported and take advantage of the latest features, migrate to the isolated worker model. Aspire Aspire is an opinionated stack that simplifies development of distributed applications in the cloud. The Azure Functions integration for Aspire enables you to develop, debug, and orchestrate an Azure Functions .NET project as part of an Aspire solution. Aspire publish directly deploys to your functions to Azure Functions on Azure Container Apps. Aspire 13 includes an updated preview version of the Functions integration that acts as a release candidate with go-live support. The package will be moved to GA quality with Aspire 13.1. Java 25, Node.js 24 Azure Functions now supports Java 25 and Node.js 24 in preview. You can now develop functions using these versions locally and deploy them to Azure Functions plans. Learn how to upgrade your apps to these versions here In Summary Ready to build what’s next? Update your Azure Functions Core Tools today and explore the latest samples and quickstarts to unlock new capabilities for your scenarios. The guided quickstarts run and deploy in under 5 minutes, and incorporate best practices—from architecture to security to deployment. We’ve made it easier than ever to scaffold, deploy, and scale real-world solutions with confidence. The future of intelligent, scalable, and secure applications starts now—jump in and see what you can create!3.4KViews0likes1CommentAnnouncing a flexible, predictable billing model for Azure SRE Agent
Billing for Azure SRE Agent will start on September 1, 2025. Announced at Microsoft Build 2025, Azure SRE Agent is a pre-built AI agent for root cause analysis, uptime improvement, and operational cost reduction. Learn more about the billing model and example scenarios.4.2KViews1like1CommentContinued Investment in Azure App Service
This blog was originally published to the App Service team blog Recent Investments Premium v4 (Pv4) Azure App Service Premium v4 delivers higher performance and scalability on newer Azure infrastructure while preserving the fully managed PaaS experience developers rely on. Premium v4 offers expanded CPU and memory options, improved price-performance, and continued support for App Service capabilities such as deployment slots, integrated monitoring, and availability zone resiliency. These improvements help teams modernize and scale demanding workloads without taking on additional operational complexity. App Service Managed Instance App Service Managed Instance extends the App Service model to support Windows web applications that require deeper environment control. It enables plan-level isolation, optional private networking, and operating system customization while retaining managed scaling, patching, identity, and diagnostics. Managed Instance is designed to reduce migration friction for existing applications, allowing teams to move to a modern PaaS environment without code changes. Faster Runtime and Language Support Azure App Service continues to invest in keeping pace with modern application stacks. Regular updates across .NET, Node.js, Python, Java, and PHP help developers adopt new language versions and runtime improvements without managing underlying infrastructure. Reliability and Availability Improvements Ongoing investments in platform reliability and resiliency strengthen production confidence. Expanded Availability Zone support and related infrastructure improvements help applications achieve higher availability with more flexible configuration options as workloads scale. Deployment Workflow Enhancements Deployment workflows across Azure App Service continue to evolve, with ongoing improvements to GitHub Actions, Azure DevOps, and platform tooling. These enhancements reduce friction from build to production while preserving the managed App Service experience. A Platform That Grows With You These recent investments reflect a consistent direction for Azure App Service: active development focused on performance, reliability, and developer productivity. Improvements to runtimes, infrastructure, availability, and deployment workflows are designed to work together, so applications benefit from platform progress without needing to re-architect or change operating models. The recent General Availability of Aspire on Azure App Service is another example of this direction. Developers building distributed .NET applications can now use the Aspire AppHost model to define, orchestrate, and deploy their services directly to App Service — bringing a code-first development experience to a fully managed platform. We are also seeing many customers build and run AI-powered applications on Azure App Service, integrating models, agents, and intelligent features directly into their web apps and APIs. App Service continues to evolve to support these scenarios, providing a managed, scalable foundation that works seamlessly with Azure's broader AI services and tooling. Whether you are modernizing with Premium v4, migrating existing workloads using App Service Managed Instance, or running production applications at scale - including AI-enabled workloads - Azure App Service provides a predictable and transparent foundation that evolves alongside your applications. Azure App Service continues to focus on long-term value through sustained investment in a managed platform developers can rely on as requirements grow, change, and increasingly incorporate AI. Get Started Ready to build on Azure App Service? Here are some resources to help you get started: Create your first web app — Deploy a web app in minutes using the Azure portal, CLI, or VS Code. App Service documentation — Explore guides, tutorials, and reference for the full platform. Aspire on Azure App Service — Now generally available. Deploy distributed .NET applications to App Service using the Aspire AppHost model. Pricing and plans — Compare tiers including Premium v4 and find the right fit for your workload. App Service on Azure Architecture Center — Reference architectures and best practices for production deployments.298Views1like0CommentsThe Durable Task Scheduler Consumption SKU is now Generally Available
Today, we're excited to announce that the Durable Task Scheduler Consumption SKU has reached General Availability. Developers can now run durable workflows and agents on Azure with pay-per-use pricing, no storage to manage, no capacity to plan, and no idle costs. Just create a scheduler, connect your app, and start orchestrating. Whether you're coordinating AI agent workflows, processing event-driven pipelines, or running background jobs, the Consumption SKU is ready to go. GET STARTED WITH THE DURABLE TASK SCHEDULER CONSUMPTION SKU Since launching the Consumption SKU in public preview last November, we've seen incredible adoption and have incorporated feedback from developers around the world to ensure the GA release is truly production ready. “The Durable Task Scheduler has become a foundational piece of what we call ‘workflows’. It gives us the reliability guarantees we need for processing financial documents and sensitive workflows, while keeping the programming model straightforward. The combination of durable execution, external event correlation, deterministic idempotency, and the local emulator experience has made it a natural fit for our event-driven architecture. We have been delighted with the consumption SKUs cost model for our lower environments.”– Emily Lewis, CarMax What is the Durable Task Scheduler? If you're new to the Durable Task Scheduler, we recommend checking out our previous blog posts for a detailed background: Announcing Limited Early Access of the Durable Task Scheduler Announcing Workflow in Azure Container Apps with the Durable Task Scheduler Announcing Dedicated SKU GA & Consumption SKU Public Preview In brief, the Durable Task Scheduler is a fully managed orchestration backend for durable execution on Azure, meaning your workflows and agent sessions can reliably resume and run to completion, even through process failures, restarts, and scaling events. Whether you’re running workflows or orchestrating durable agents, it handles task scheduling, state persistence, fault tolerance, and built-in monitoring, freeing developers from the operational overhead of managing their own execution engines and storage backends. The Durable Task Scheduler works across Azure compute environments: Azure Functions: Using the Durable Functions extension across all Function App SKUs, including Flex Consumption. Azure Container Apps: Using the Durable Functions or Durable Task SDKs with built-in workflow support and auto-scaling. Any compute: Azure Kubernetes Service, Azure App Service, or any environment where you can run the Durable Task SDKs (.NET, Python, Java, JavaScript). Why choose the Consumption SKU? With the Consumption SKU you’re charged only for actions dispatched, with no minimum commitments or idle costs. There’s no capacity to size or throughput to reserve. Create a scheduler, connect your app, and you’re running. The Consumption SKU is a natural fit for workloads with unpredictable or bursty usage patterns: AI agent orchestration: Multi-step agent workflows that call LLMs, retrieve data, and take actions. Users trigger these on demand, so volume is spiky and pay-per-use avoids idle costs between bursts. Event-driven pipelines: Processing events from queues, webhooks, or streams with reliable orchestration and automatic checkpointing, where volumes spike and dip unpredictably. API-triggered workflows: User signups, form submissions, payment flows, and other request-driven processing where volume varies throughout the day. Distributed transactions: Retries and compensation logic across microservices with durable sagas that survive failures and restarts. What's included in the Consumption SKU at GA The Consumption SKU has been hardened based on feedback and real-world usage during the public preview. Here's what's included at GA: Performance Up to 500 actions per second: Sufficient throughput for a wide range of workloads, with the option to move to the Dedicated SKU for higher-scale scenarios. Up to 30 days of data retention: View and manage orchestration history, debug failures, and audit execution data for up to 30 days. Built-in monitoring dashboard Filter orchestrations by status, drill into execution history, view visual Gantt and sequence charts, and manage orchestrations (pause, resume, terminate, or raise events), all from the dashboard, secured with Role-Based Access Control (RBAC). Identity-based security The Consumption SKU uses Entra ID for authentication and RBAC for authorization. No SAS tokens or access keys to manage, just assign the appropriate role and connect. Get started with the Durable Task Scheduler today The Consumption SKU is available now Generally Available. Provision a scheduler in the Azure portal, connect your app, and start orchestrating. You only pay for what you use. Documentation Getting started Samples Pricing Consumption SKU docs We'd love to hear your feedback. Reach out to us by filing an issue on our GitHub repository317Views0likes0CommentsBuilding the agentic future together at JDConf 2026
JDConf 2026 is just weeks away, and I’m excited to welcome Java developers, architects, and engineering leaders from around the world for two days of learning and connection. Now in its sixth year, JDConf has become a place where the Java community compares notes on their real-world production experience: patterns, tooling, and hard-earned lessons you can take back to your team, while we keep moving the Java systems that run businesses and services forward in the AI era. This year’s program lines up with a shift many of us are seeing first-hand: delivery is getting more intelligent, more automated, and more tightly coupled to the systems and data we already own. Agentic approaches are moving from demos to backlog items, and that raises practical questions: what’s the right architecture, where do you draw trust boundaries, how do you keep secrets safe, and how do you ship without trading reliability for novelty? JDConf is for and by the people who build and manage the mission-critical apps powering organizations worldwide. Across three regional livestreams, you’ll hear from open source and enterprise practitioners who are making the same tradeoffs you are—velocity vs. safety, modernization vs. continuity, experimentation vs. operational excellence. Expect sessions that go beyond “what” and get into “how”: design choices, integration patterns, migration steps, and the guardrails that make AI features safe to run in production. You’ll find several practical themes for shipping Java in the AI era: connecting agents to enterprise systems with clear governance; frameworks and runtimes adapting to AI-native workloads; and how testing and delivery pipelines evolve as automation gets more capable. To make this more concrete, a sampling of sessions would include topics like Secrets of Agentic Memory Management (patterns for short- and long-term memory and safe retrieval), Modernizing a Java App with GitHub Copilot (end-to-end upgrade and migration with AI-powered technologies), and Docker Sandboxes for AI Agents (guardrails for running agent workflows without risking your filesystem or secrets). The goal is to help you adopt what’s new while hardening your long lived codebases. JDConf is built for community learning—free to attend, accessible worldwide, and designed for an interactive live experience in three time zones. You’ll not only get 23 practitioner-led sessions with production-ready guidance but also free on-demand access after the event to re-watch with your whole team. Pro tip: join live and get more value by discussing practical implications and ideas with your peers in the chat. This is where the “how” details and tradeoffs become clearer. JDConf 2026 Keynote Building the Agentic Future Together Rod Johnson, Embabel | Bruno Borges, Microsoft | Ayan Gupta, Microsoft The JDConf 2026 keynote features Rod Johnson, creator of the Spring Framework and founder of Embabel, joined by Bruno Borges and Ayan Gupta to explore where the Java ecosystem is headed in the agentic era. Expect a practitioner-level discussion on how frameworks like Spring continue to evolve, how MCP is changing the way agents interact with enterprise systems, and what Java developers should be paying attention to right now. Register. Attend. Earn. Register for JDConf 2026 to earn Microsoft Rewards points, which you can use for gift cards, sweepstakes entries, and more. Earn 1,000 points simply by signing up. When you register for any regional JDConf 2026 event with your Microsoft account, you'll automatically receive these points. Get 5,000 additional points for attending live (limited to the first 300 attendees per stream). On the day of your regional event, check in through the Reactor page or your email confirmation link to qualify. Disclaimer: Points are added to your Microsoft account within 60 days after the event. Must register with a Microsoft account email. Up to 10,000 developers eligible. Points will be applied upon registration and attendance and will not be counted multiple times for registering or attending at different events. Terms | Privacy JDConf 2026 Regional Live Streams Americas – April 8, 8:30 AM – 12:30 PM PDT (UTC -7) Bruno Borges hosts the Americas stream, discussing practical agentic Java topics like memory management, multi-agent system design, LLM integration, modernization with AI, and dependency security. Experts from Redis, IBM, Hammerspace, HeroDevs, AI Collective, Tekskills, and Microsoft share their insights. Register for Americas → Asia-Pacific – April 9, 10:00 AM – 2:00 PM SGT (UTC +8) Brian Benz and Ayan Gupta co-host the APAC stream, highlighting Java frameworks and practices for agentic delivery. Topics include Spring AI, multi-agent orchestration, spec-driven development, scalable DevOps, and legacy modernization, with speakers from Broadcom, Alibaba, CERN, MHP (A Porsche Company), and Microsoft. Register for Asia-Pacific → Europe, Middle East and Africa – April 9, 9:00 AM – 12:30 PM GMT (UTC +0) The EMEA stream, hosted by Sandra Ahlgrimm, will address the implementation of agentic Java in production environments. Topics include self-improving systems utilizing Spring AI, Docker sandboxes for agent workflow management, Retrieval-Augmented Generation (RAG) pipelines, modernization initiatives from a national tax authority, and AI-driven CI/CD enhancements. Presentations will feature experts from Broadcom, Docker, Elastic, Azul Systems, IBM, Team Rockstars IT, and Microsoft. Register for EMEA → Make It Interactive: Join Live Come prepared with an actual challenge you’re facing, whether you’re modernizing a legacy application, connecting agents to internal APIs, or refining CI/CD processes. Test your strategies by participating in live chats and Q&As with presenters and fellow professionals. If you’re attending with your team, schedule a debrief after the live stream to discuss how to quickly use key takeaways and insights in your pilots and projects. Learning Resources Java and AI for Beginners Video Series: Practical, episode-based walkthroughs on MCP, GenAI integration, and building AI-powered apps from scratch. Modernize Java Apps Guide: Step-by-step guide using GitHub Copilot agent mode for legacy Java project upgrades, automated fixes, and cloud-ready migrations. AI Agents for Java Webinar: Embedding AI Agent capabilities into Java applications using Microsoft Foundry, from project setup to production deployment. Java Practitioner’s Guide: Learning plan for deploying, managing, and optimizing Java applications on Azure using modern cloud-native approaches. Register Now JDConf 2026 is a free global event for Java teams. Join live to ask questions, connect, and gain practical patterns. All 23 sessions will be available on-demand. Register now to earn Microsoft Rewards points for attending. Register at JDConf.com174Views0likes0CommentsThe Agent that investigates itself
Azure SRE Agent handles tens of thousands of incident investigations each week for internal Microsoft services and external teams running it for their own systems. Last month, one of those incidents was about the agent itself. Our KV cache hit rate alert started firing. Cached token percentage was dropping across the fleet. We didn't open dashboards. We simply asked the agent. It spawned parallel subagents, searched logs, read through its own source code, and produced the analysis. First finding: Claude Haiku at 0% cache hits. The agent checked the input distribution and found that the average call was ~180 tokens, well below Anthropic’s 4,096-token minimum for Haiku prompt caching. Structurally, these requests could never be cached. They were false positives. The real regression was in Claude Opus: cache hit rate fell from ~70% to ~48% over a week. The agent correlated the drop against the deployment history and traced it to a single PR that restructured prompt ordering, breaking the common prefix that caching relies on. It submitted two fixes: one to exclude all uncacheable requests from the alert, and the other to restore prefix stability in the prompt pipeline. That investigation is how we develop now. We rarely start with dashboards or manual log queries. We start by asking the agent. Three months earlier, it could not have done any of this. The breakthrough was not building better playbooks. It was harness engineering: enabling the agent to discover context as the investigation unfolded. This post is about the architecture decisions that made it possible. Where we started In our last post, Context Engineering for Reliable AI Agents: Lessons from Building Azure SRE Agent, we described how moving to a single generalist agent unlocked more complex investigations. The resolution rates were climbing, and for many internal teams, the agent could now autonomously investigate and mitigate roughly 50% of incidents. We were moving in the right direction. But the scores weren't uniform, and when we dug into why, the pattern was uncomfortable. The high-performing scenarios shared a trait: they'd been built with heavy human scaffolding. They relied on custom response plans for specific incident types, hand-built subagents for known failure modes, and pre-written log queries exposed as opaque tools. We weren’t measuring the agent’s reasoning – we were measuring how much engineering had gone into the scenario beforehand. On anything new, the agent had nowhere to start. We found these gaps through manual review. Every week, engineers read through lower-scored investigation threads and pushed fixes: tighten a prompt, fix a tool schema, add a guardrail. Each fix was real. But we could only review fifty threads a week. The agent was handling ten thousand. We were debugging at human speed. The gap between those two numbers was where our blind spots lived. We needed an agent powerful enough to take this toil off us. An agent which could investigate itself. Dogfooding wasn't a philosophy - it was the only way to scale. The Inversion: Three bets The problem we faced was structural - and the KV cache investigation shows it clearly. The cache rate drop was visible in telemetry, but the cause was not. The agent had to correlate telemetry with deployment history, inspect the relevant code, and reason over the diff that broke prefix stability. We kept hitting the same gap in different forms: logs pointing in multiple directions, failure modes in uninstrumented paths, regressions that only made sense at the commit level. Telemetry showed symptoms, but not what actually changed. We'd been building the agent to reason over telemetry. We needed it to reason over the system itself. The instinct when agents fail is to restrict them: pre-write the queries, pre-fetch the context, pre-curate the tools. It feels like control. In practice, it creates a ceiling. The agent can only handle what engineers anticipated in advance. The answer is an agent that can discover what it needs as the investigation unfolds. In the KV cache incident, each step, from metric anomaly to deployment history to a specific diff, followed from what the previous step revealed. It was not a pre-scripted path. Navigating towards the right context with progressive discovery is key to creating deep agents which can handle novel scenarios. Three architectural decisions made this possible – and each one compounded on the last. Bet 1: The Filesystem as the Agent's World Our first bet was to give the agent a filesystem as its workspace instead of a custom API layer. Everything it reasons over – source code, runbooks, query schemas, past investigation notes – is exposed as files. It interacts with that world using read_file, grep, find, and shell. No SearchCodebase API. No RetrieveMemory endpoint. This is an old Unix idea: reduce heterogeneous resources to a single interface. Coding agents already work this way. It turns out the same pattern works for an SRE agent. Frontier models are trained on developer workflows: navigating repositories, grepping logs, patching files, running commands. The filesystem is not an abstraction layered on top of that prior. It matches it. When we materialized the agent’s world as a repo-like workspace, our human "Intent Met" score - whether the agent's investigation addressed the actual root cause as judged by the on-call engineer - rose from 45% to 75% on novel incidents. But interface design is only half the story. The other half is what you put inside it. Code Repositories: the highest-leverage context Teams had prewritten log queries because they did not trust the agent to generate correct ones. That distrust was justified. Models hallucinate table names, guess column schemas, and write queries against the wrong cluster. But the answer was not tighter restriction. It was better grounding. The repo is the schema. Everything else is derived from it. When the agent reads the code that produces the logs, query construction stops being guesswork. It knows the exact exceptions thrown, and the conditions under which each path executes. Stack traces start making sense, and logs become legible. But beyond query grounding, code access unlocked three new capabilities that telemetry alone could not provide: Ground truth over documentation. Docs drift and dashboards show symptoms. The code is what the service actually does. In practice, most investigations only made sense when logs were read alongside implementation. Point-in-time investigation. The agent checks out the exact commit at incident time, not current HEAD, so it can correlate the failure against the actual diffs. That's what cracked the KV cache investigation: a PR broke prefix stability, and the diff was the only place this was visible. Without commit history, you can't distinguish a code regression from external factors. Reasoning even where telemetry is absent. Some code paths are not well instrumented. The agent can still trace logic through source and explain behavior even when logs do not exist. This is especially valuable in novel failure modes – the ones most likely to be missed precisely because no one thought to instrument them. Memory as a filesystem, not a vector store Our first memory system used RAG over past session learnings. It had a circular dependency: a limited agent learned from limited sessions and produced limited knowledge. Garbage in, garbage out. But the deeper problem was retrieval. In SRE Context, embedding similarity is a weak proxy for relevance. “KV cache regression” and “prompt prefix instability” may be distant in embedding space yet still describe the same causal chain. We tried re-ranking, query expansion, and hybrid search. None fixed the core mismatch between semantic similarity and diagnostic relevance. We replaced RAG with structured Markdown files that the agent reads and writes through its standard tool interface. The model names each file semantically: overview.md for a service summary, team.md for ownership and escalation paths, logs.md for cluster access and query patterns, debugging.md for failure modes and prior learnings. Each carry just enough context to orient the agent, with links to deeper files when needed. The key design choice was to let the model navigate memory, not retrieve it through query matching. The agent starts from a structured entry point and follows the evidence toward what matters. RAG assumes you know the right query before you know what you need. File traversal lets relevance emerge as context accumulates. This removed chunking, overlap tuning, and re-ranking entirely. It also proved more accurate, because frontier models are better at following context than embeddings are at guessing relevance. As a side benefit, memory state can be snapshotted periodically. One problem remains unsolved: staleness. When two sessions write conflicting patterns to debugging.md, the model must reconcile them. When a service changes behavior, old entries can become misleading. We rely on timestamps and explicit deprecation notes, but we do not have a systemic solution yet. This is an active area of work, and anyone building memory at scale will run into it. The sandbox as epistemic boundary The filesystem also defines what the agent can see. If something is not in the sandbox, the agent cannot reason about it. We treat that as a feature, not a limitation. Security boundaries and epistemic boundaries are enforced by the same mechanism. Inside that boundary, the agent has full execution: arbitrary bash, python, jq, and package installs through pip or apt. That scope unlocks capabilities we never would have built as custom tools. It opens PRs with gh cli, like the prompt-ordering fix from KV cache incident. It pushes Grafana dashboards, like a cache-hit-rate dashboard we now track by model. It installs domain-specific CLI tools mid-investigation when needed. No bespoke integration required, just a shell. The recurring lesson was simple: a generally capable agent in the right execution environment outperforms a specialized agent with bespoke tooling. Custom tools accumulate maintenance costs. Shell commands compose for free. Bet 2: Context Layering Code access tells the agent what a service does. It does not tell the agent what it can access, which resources its tools are scoped to, or where an investigation should begin. This gap surfaced immediately. Users would ask "which team do you handle incidents for?" and the agent had no answer. Tools alone are not enough. An integration also needs ambient context so the model knows what exists, how it is configured, and when to use it. We fixed this with context hooks: structured context injected at prompt construction time to orient the agent before it takes action. Connectors - what can I access? A manifest of wired systems such as Log Analytics, Outlook, and Grafana, along with their configuration. Repositories - what does this system do? Serialized repo trees, plus files like AGENTS.md, Copilot.md, and CLAUDE.md with team-specific instructions. Knowledge map - what have I learned before? A two-tier memory index with a top-level file linking to deeper scenario-specific files, so the model can drill down only when needed. Azure resource topology - where do things live? A serialized map of relationships across subscriptions, resource groups, and regions, so investigations start in the right scope. Together, these context hooks turn a cold start into an informed one. That matters because a bad early choice does not just waste tokens. It sends the investigation down the wrong trajectory. A capable agent still needs to know what exists, what matters, and where to start. Bet 3: Frugal Context Management Layered context creates a new problem: budget. Serialized repo trees, resource topology, connector manifests, and a memory index fill context fast. Once the agent starts reading source files and logs, complex incidents hit context limits. We needed our context usage to be deliberately frugal. Tool result compression via the filesystem Large tool outputs are expensive because they consume context before the agent has extracted any value from them. In many cases, only a small slice or a derived summary of that output is actually useful. Our framework exposes these results as files to the agent. The agent can then use tools like grep, jq, or python to process them outside the model interface, so that only the final result enters context. The filesystem isn't just a capability abstraction - it's also a budget management primitive. Context Pruning and Auto Compact Long investigations accumulate dead weight. As hypotheses narrow, earlier context becomes noise. We handle this with two compaction strategies. Context Pruning runs mid-session. When context usage crosses a threshold, we trim or drop stale tool calls and outputs - keeping the window focused on what still matters. Auto-Compact kicks in when a session approaches its context limit. The framework summarizes findings and working hypotheses, then resumes from that summary. From the user's perspective, there's no visible limit. Long investigations just work. Parallel subagents The KV cache investigation required reasoning along two independent hypotheses: whether the alert definition was sound, and whether cache behavior had actually regressed. The agent spawned parallel subagents for each task, each operating in its own context window. Once both finished, it merged their conclusions. This pattern generalizes to any task with independent components. It speeds up the search, keeps intermediate work from consuming the main context window, and prevents one hypothesis from biasing another. The Feedback loop These architectural bets have enabled us to close the original scaling gap. Instead of debugging the agent at human speed, we could finally start using it to fix itself. As an example, we were hitting various LLM errors: timeouts, 429s (too many requests), failures in the middle of response streaming, 400s from code bugs that produced malformed payloads. These paper cuts would cause investigations to stall midway and some conversations broke entirely. So, we set up a daily monitoring task for these failures. The agent searches for the last 24 hours of errors, clusters the top hitters, traces each to its root cause in the codebase, and submits a PR. We review it manually before merging. Over two weeks, the errors were reduced by more than 80%. Over the last month, we have successfully used our agent across a wide range of scenarios: Analyzed our user churn rate and built dashboards we now review weekly. Correlated which builds needed the most hotfixes, surfacing flaky areas of the codebase. Ran security analysis and found vulnerabilities in the read path. Helped fill out parts of its own Responsible AI review, with strict human review. Handles customer-reported issues and LiveSite alerts end to end. Whenever it gets stuck, we talk to it and teach it, ask it to update its memory, and it doesn't fail that class of problem again. The title of this post is literal. The agent investigating itself is not a metaphor. It is a real workflow, driven by scheduled tasks, incident triggers, and direct conversations with users. What We Learned We spent months building scaffolding to compensate for what the agent could not do. The breakthrough was removing it. Every prewritten query was a place we told the model not to think. Every curated tool was a decision made on its behalf. Every pre-fetched context was a guess about what would matter before we understood the problem. The inversion was simple but hard to accept: stop pre-computing the answer space. Give the model a structured starting point, a filesystem it knows how to navigate, context hooks that tell it what it can access, and budget management that keeps it sharp through long investigations. The agent that investigates itself is both the proof and the product of this approach. It finds its own bugs, traces them to root causes in its own code, and submits its own fixes. Not because we designed it to. Because we designed it to reason over systems, and it happens to be one. We are still learning. Staleness is unsolved, budget tuning remains largely empirical, and we regularly discover assumptions baked into context that quietly constrain the agent. But we have crossed a new threshold: from an agent that follows your playbook to one that writes the next one. Thanks to visagarwal for co-authoring this post.13KViews6likes0CommentsIndustry-Wide Certificate Changes Impacting Azure App Service Certificates
Executive Summary In early 2026, industry-wide changes mandated by browser applications and the CA/B Forum will affect both how TLS certificates are issued as well as their validity period. The CA/B Forum is a vendor body that establishes standards for securing websites and online communications through SSL/TLS certificates. Azure App Service is aligning with these standards for both App Service Managed Certificates (ASMC, free, DigiCert-issued) and App Service Certificates (ASC, paid, GoDaddy-issued). Most customers will experience no disruption. Action is required only if you pin certificates or use them for client authentication (mTLS). Update: February 17, 2026 We’ve published new Microsoft Learn documentation, Industry-wide certificate changes impacting Azure App Service , which provides more detailed guidance on these compliance-driven changes. The documentation also includes additional information not previously covered in this blog, such as updates to domain validation reuse, along with an expanding FAQ section. The Microsoft Learn documentation now represents the most complete and up-to-date overview of these changes. Going forward, any new details or clarifications will be published there, and we recommend bookmarking the documentation for the latest guidance. Who Should Read This? App Service administrators Security and compliance teams Anyone responsible for certificate management or application security Quick Reference: What’s Changing & What To Do Topic ASMC (Managed, free) ASC (GoDaddy, paid) Required Action New Cert Chain New chain (no action unless pinned) New chain (no action unless pinned) Remove certificate pinning Client Auth EKU Not supported (no action unless cert is used for mTLS) Not supported (no action unless cert is used for mTLS) Transition from mTLS Validity No change (already compliant) Two overlapping certs issued for the full year None (automated) If you do not pin certificates or use them for mTLS, no action is required. Timeline of Key Dates Date Change Action Required Mid-Jan 2026 and after ASMC migrates to new chain ASMC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Mar 2026 and after ASC validity shortened ASC migrates to new chain ASC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Actions Checklist For All Users Review your use of App Service certificates. If you do not pin these certificates and do not use them for mTLS, no action is required. If You Pin Certificates (ASMC or ASC) Remove all certificate or chain pinning before their respective key change dates to avoid service disruption. See Best Practices: Certificate Pinning. If You Use Certificates for Client Authentication (mTLS) Switch to an alternative authentication method before their respective key change dates to avoid service disruption, as client authentication EKU will no longer be supported for these certificates. See Sunsetting the client authentication EKU from DigiCert public TLS certificates. See Set Up TLS Mutual Authentication - Azure App Service Details & Rationale Why Are These Changes Happening? These updates are required by major browser programs (e.g., Chrome) and apply to all public CAs. They are designed to enhance security and compliance across the industry. Azure App Service is automating updates to minimize customer impact. What’s Changing? New Certificate Chain Certificates will be issued from a new chain to maintain browser trust. Impact: Remove any certificate pinning to avoid disruption. Removal of Client Authentication EKU Newly issued certificates will not support client authentication EKU. This change aligns with Google Chrome’s root program requirements to enhance security. Impact: If you use these certificates for mTLS, transition to an alternate authentication method. Shortening of Certificate Validity Certificate validity is now limited to a maximum of 200 days. Impact: ASMC is already compliant; ASC will automatically issue two overlapping certificates to cover one year. No billing impact. Frequently Asked Questions (FAQs) Will I lose coverage due to shorter validity? No. For App Service Certificate, App Service will issue two certificates to span the full year you purchased. Is this unique to DigiCert and GoDaddy? No. This is an industry-wide change. Do these changes impact certificates from other CAs? Yes. These changes are an industry-wide change. We recommend you reach out to your certificates’ CA for more information. Do I need to act today? If you do not pin or use these certs for mTLS, no action is required. Glossary ASMC: App Service Managed Certificate (free, DigiCert-issued) ASC: App Service Certificate (paid, GoDaddy-issued) EKU: Extended Key Usage mTLS: Mutual TLS (client certificate authentication) CA/B Forum: Certification Authority/Browser Forum Additional Resources Changes to the Managed TLS Feature Set Up TLS Mutual Authentication Azure App Service Best Practices – Certificate pinning DigiCert Root and Intermediate CA Certificate Updates 2023 Sunsetting the client authentication EKU from DigiCert public TLS certificates Feedback & Support If you have questions or need help, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.5.3KViews1like0Comments