web apps
A Better Way to View Logs in Kudu for Azure App Service on Linux
Logs are often the fastest way to understand what is happening inside your application. Whether you are investigating startup behavior, runtime errors, failed requests, dependency issues, or unexpected application behavior, having the right log view can make troubleshooting much easier. To make this easier, we have added a new Log stream page in Kudu for Azure App Service on Linux, available under the Logs dropdown. This experience gives you a single place to stream, browse, search, and filter logs so you can understand what is happening in your app faster.

Opening the Logs page

You can open Kudu from the Azure portal:

1. Go to your App Service.
2. Select Advanced Tools.
3. Click Go.

You can also open Kudu directly by going to: https://<app-name>.scm.azurewebsites.net

From there, open the Logs page.

View live logs across your app and platform

The Logs page lets you view logs as they are being written, with filters for timeframe, instance, container, log type, and level. This helps when you want to focus on a specific instance, look only at errors, or separate application logs from platform events. For example, you can use platform logs to understand container lifecycle events, restarts, startup behavior, warmup probe activity, and other platform-side events related to your app.

Quickly find the log entries that matter

You can use keyword search to narrow down the log stream or historical logs. This is useful when you are looking for a specific error message, request path, exception, dependency failure, timeout, or any application-specific keyword. Instead of scanning through hundreds of entries, you can search for the terms that are relevant to the issue you are investigating.

Investigate issues within a specific timeframe

The Log stream page also supports viewing logs for a selected time range. This is useful when you know when an issue occurred and want to inspect both application and platform activity around that time. For example, you can filter to a specific timeframe, switch to Application logs, and check what your app was doing when the issue happened. This can help you troubleshoot scenarios such as failed requests, application exceptions, slow startup, container restarts, dependency issues, or configuration problems.

Summary

The new Log stream page in Kudu makes it easier to work with logs for Azure App Service on Linux. With live streaming, keyword search, historical views, and filters for application and platform logs, you can quickly narrow down the information you need and troubleshoot issues more efficiently. We are continuing to improve the App Service Linux experience to make diagnostics simpler and more useful for day-to-day development and operations.

Azure Container Apps Express for Shipping Container Apps Fast
ACA Express Apps are a strong fit for teams that need to ship quickly and can't afford long platform setup cycles. This includes startups, internal platform teams, and product groups deploying APIs, web apps, or agent endpoints that scale with uneven demand. If the priority is fast path-to-production, predictable wake-up behavior, and minimal infrastructure overhead, this model is likely the right choice. To put real numbers behind that, I built a live demo that races Express against a Consumption environment on the same app. The measurements below come from that demo, not from a spec sheet.

MicroVMs make cold starts practical

Cold start delays usually come from rebuilding runtime state whenever an app wakes up. ACA Express Apps reduce that overhead with MicroVM-based startup paths built for fast boot and isolation. The result is faster instance readiness without trading off security. The gap shows up clearly when both apps have scaled all the way to zero. Waking from a genuine cold start, Express comes back in about 1.5 seconds. The same app in a Consumption environment takes about 20 seconds to answer the first request. Both were measured live in the browser, from request to first response.

Disk and memory state restore is the speed multiplier

State restoration skips the app's internal boot sequence entirely. Instead of replaying the same initialization work on every start, ACA Express Apps can restore disk and memory state so the app starts closer to ready. That reduces time-to-first-request and smooths scale events, especially for framework-heavy workloads. It's also what lets scale-to-zero stay practical: the app costs nothing while idle, but the wake-up penalty stays in the low single-digit seconds instead of the tens of seconds you'd otherwise pay.

Environmentless changes the deployment experience

Skipping the environment setup completely changes the deployment workflow. Teams can ship the container app without first managing environment sprawl, while still getting the runtime foundations they need. For fast-moving teams, that means less setup overhead and a shorter path to production. You can see how little there is to fill in. Creating an Express app is a single short form. There is no environment to stand up first. And once it's created, the manage view gives you the live URL, status, and the basics you need to operate it.

The numbers, side by side

Everything below was measured on the same container image, in the West Central US region.

| What's measured | Express | Consumption |
|---|---|---|
| Cold start from zero (request to first response) | ~1.5 s | ~20 s |
| Environment provisioning | ~14 s | ~120 s |
| First-time deploy (environment + app, zero to live URL) | ~52 s | ~166 s |
| App deploy only (environment already exists) | ~30 s | ~30 s |

Express is much faster on the two steps that build infrastructure from scratch: cold start and environment provisioning. Once an environment already exists, the two are about the same. Express isn't a different app runtime, it's the same platform with the first-time setup cost stripped down.

Get started

Express is in public preview. You can have a container on a live URL in the time it takes to read this post.

- Azure Container Apps Express overview: concepts, capabilities, and the current feature support matrix.
- Create your first Express app: the CLI commands and portal steps to get an app running.
- New Container Apps portal: create and manage Express apps in the streamlined UI.
- Test Express apps locally: validate your container before you deploy.
- Express FAQ: preview status, limits, regions, and how Express relates to standard Container Apps.

Deploy an Express app · Read the docs · Browse the FAQ

When speed matters, ACA Express is the best tool for deploying containers. It skips the platform setup delays without sacrificing reliability under load.

App Service Easy MCP: Add AI Agent Capabilities to Your Existing Apps with Zero Code Changes
The age of AI agents is here. Tools like GitHub Copilot, Claude, and other AI assistants are no longer just answering questions; they're taking actions, calling APIs, and automating complex workflows. But how do you make your existing applications and APIs accessible to these intelligent agents? At Microsoft Ignite, I teamed up to present session BRK116: Apps, agents, and MCP is the AI innovation recipe, where I demonstrated how you can add agentic capabilities to your existing applications with little to no code changes. Today, I'm excited to share a concrete example of that vision: Easy MCP, a way to expose any REST API to AI agents with absolutely zero code changes to your existing apps.

The Challenge: Bridging REST APIs and AI Agents

Most organizations have invested years building REST APIs that power their applications. These APIs represent critical business logic, data access patterns, and integrations. But AI agents speak a different language: they use protocols like Model Context Protocol (MCP) to discover and invoke tools. The traditional approach would require you to:

- Learn the MCP SDK
- Write new MCP server code
- Manually map each API endpoint to an MCP tool
- Deploy and maintain additional infrastructure

What if you could skip all of that?

Introducing Easy MCP (a proof of concept not associated with the App Service platform)

Easy MCP is an OpenAPI-to-MCP translation layer that automatically generates MCP tools from your existing REST APIs. If your API has an OpenAPI (Swagger) specification (which most modern APIs do), you can make it accessible to AI agents in minutes. This means that if you have existing apps with OpenAPI specifications already running on App Service, or really any hosting platform, this tool makes enabling MCP seamless.
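To make the translation idea concrete, here is a minimal Python sketch of how an OpenAPI document could be turned into MCP-style tool definitions. It is not Easy MCP's actual code. The function name `openapi_to_tools`, the fallback naming scheme, the sample spec, and the `http` routing hint are all assumptions made for illustration; only the `name`, `description`, and `inputSchema` fields follow the general shape MCP uses to describe a tool.

```python
"""Sketch: derive MCP-style tool definitions from an OpenAPI document.

Illustrative only. This is not the Easy MCP implementation.
"""
from typing import Any

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_tools(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one tool definition per OpenAPI operation."""
    tools = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            # Hypothetical fallback name when operationId is missing.
            slug = path.strip("/").replace("/", "_").replace("{", "").replace("}", "")
            name = operation.get("operationId") or f"{method}_{slug}"
            properties = {}
            required = []
            for param in operation.get("parameters", []):
                properties[param["name"]] = param.get("schema", {"type": "string"})
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                "name": name,
                "description": operation.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
                # Assumed routing hint so a gateway could call the REST API.
                "http": {"method": method.upper(), "path": path},
            })
    return tools


# Hypothetical Todo API spec, trimmed to the parts the sketch reads.
SAMPLE_SPEC = {
    "openapi": "3.0.0",
    "paths": {
        "/todos": {
            "get": {"operationId": "listTodos", "summary": "List all todos"},
        },
        "/todos/{id}": {
            "delete": {
                "summary": "Delete a todo",
                "parameters": [
                    {"name": "id", "in": "path", "required": True,
                     "schema": {"type": "integer"}},
                ],
            },
        },
    },
}

if __name__ == "__main__":
    for tool in openapi_to_tools(SAMPLE_SPEC):
        print(tool["name"], tool["http"])
```

A real gateway would also need to handle request bodies, authentication, and `$ref` resolution, all of which this sketch leaves out.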
How It Works

1. Point the gateway at your API's base URL
2. Detect your OpenAPI specification automatically
3. Connect and the gateway generates MCP tools for every endpoint
4. Use the MCP endpoint URL with any MCP-compatible AI client

That's it. No code changes. No SDK integration. No manual tool definitions.

See It in Action

Let's say you have a Todo API running on Azure App Service at `https://my-todo-app.azurewebsites.net`. In just a few clicks:

1. Open the Easy MCP web UI
2. Enter your API URL
3. Click "Detect" to find your OpenAPI spec
4. Click "Connect"

Now configure your AI client (like VS Code with GitHub Copilot) to use the gateway's MCP endpoint:

    {
      "servers": {
        "my-api": {
          "type": "http",
          "url": "https://my-gateway.azurewebsites.net/mcp"
        }
      }
    }

Instantly, your AI assistant can handle requests like:

- "What's on my todo list?"
- "Add 'Review PR #123' to my todos with high priority"
- "Mark all tasks as complete"

All powered by your existing REST API, with zero modifications.

The Bigger Picture: Modernization Without Rewrites

This approach aligns perfectly with a broader modernization strategy we're enabling on Azure App Service.

App Service Managed Instance: Move and Modernize Legacy Apps

For organizations with legacy applications (whether they're running on older Windows frameworks, custom configurations, or traditional hosting environments), Azure App Service Managed Instance provides a path to the cloud with minimal friction. You can migrate these applications to a fully managed platform without rewriting code.

Easy MCP: Add AI Capabilities Post-Migration

Once your legacy applications are running on App Service, Easy MCP becomes the next step in your modernization journey. That 10-year-old internal API? It can now be accessed by AI agents. That legacy inventory system? AI assistants can query and update it. No code changes needed.

The modernization path:

1. Migrate legacy apps to App Service with Managed Instance (no code changes)
2. Expose APIs to AI agents with Easy MCP Gateway (no code changes)
3. Empower your organization with AI-assisted workflows

Deploy It Yourself

Easy MCP is open source and ready to deploy. If you already have an existing API to use with this tool, go for it. If you need an app to test with, check out this sample. Make sure you complete the "Add OpenAPI functionality to your web app" step. You don't need to go beyond that.

GitHub Repository: seligj95/app-service-easy-mcp

Deploy to Azure in minutes with Azure Developer CLI:

    azd auth login
    azd init
    azd up

Or run it locally for testing:

    npm install
    npm run dev
    # Open http://localhost:3000

What's Next: Native App Service Integration

Here's where it gets really exciting. We're exploring ways to build this capability directly into the Azure App Service platform so you won't have to deploy a second app or additional resources to get this capability. Azure API Management recently released a feature with functionality to expose a REST API, including an API on App Service, as an MCP server, which I highly recommend that you check out if you're familiar with Azure API Management. But in this case, imagine a future where adding AI agent capabilities to your App Service apps is as simple as flipping a switch in the Azure Portal: no gateway or API Management deployment required, no additional infrastructure or services to manage, and built-in security, monitoring, scaling, and so on, all of the features you're already using and are familiar with on App Service.

Stay tuned for updates as we continue to make Azure App Service the best platform for AI-powered applications. And please share your feedback on Easy MCP; we want to hear how you're using it and what features you'd like to see next as we consider this feature for native integration.

Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit
Part 3 of 3: Multi-Agent AI on Azure App Service

In Blog 1, we built a multi-agent travel planner with Microsoft Agent Framework 1.0 on App Service. In Blog 2, we added observability with OpenTelemetry and the new Application Insights Agents view. Now in Part 3, we secure those agents for production with the Microsoft Agent Governance Toolkit. This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first.

The governance gap

Our travel planner works. It's observable. But here's the question I'm hearing from customers: "How do I make sure my agents don't do something they shouldn't?" It's a fair question. Our six agents (Coordinator, Currency Converter, Weather Advisor, Local Knowledge, Itinerary Planner, and Budget Optimizer) can call external APIs, process user data, and make autonomous decisions. In a demo, that's impressive. In production, that's a risk surface. Consider what can go wrong with ungoverned agents:

- Unauthorized API calls: An agent calls an external API it was never intended to use, leaking data or incurring costs
- Sensitive data exposure: An agent passes PII to a third-party service without consent controls
- Runaway token spend: A recursive agent loop burns through your OpenAI budget in minutes
- Tool misuse: A prompt injection tricks an agent into executing a tool it shouldn't
- Cascading failures: One agent's error propagates through the entire multi-agent workflow

These aren't theoretical. In December 2025, OWASP published the Top 10 for Agentic Applications, the first formal taxonomy of risks specific to autonomous AI agents, including goal hijacking, tool misuse, identity abuse, memory poisoning, and rogue agents. Regulators are paying attention too: the EU AI Act's high-risk AI obligations take effect in August 2026, and the Colorado AI Act becomes enforceable in June 2026.

The bottom line: if you're running agents in production, you need governance. Not eventually. Now.

What the Agent Governance Toolkit does

The Agent Governance Toolkit is an open-source project (MIT license) from Microsoft that brings runtime security governance to autonomous AI agents. It's the first toolkit to address all 10 OWASP agentic AI risks with deterministic, sub-millisecond policy enforcement. The toolkit is organized into 7 packages:

| Package | What it does | Think of it as... |
|---|---|---|
| Agent OS | Stateless policy engine, intercepts every action before execution (<0.1ms p99) | The kernel for AI agents |
| Agent Mesh | Cryptographic identity (DIDs), inter-agent trust protocol, dynamic trust scoring | mTLS for agents |
| Agent Runtime | Execution rings (like CPU privilege levels), saga orchestration, kill switch | Process isolation for agents |
| Agent SRE | SLOs, error budgets, circuit breakers, chaos engineering | SRE practices for agents |
| Agent Compliance | Automated governance verification, regulatory mapping (EU AI Act, HIPAA, SOC2) | Compliance-as-code |
| Agent Marketplace | Plugin lifecycle management, Ed25519 signing, supply-chain security | Package manager security |
| Agent Lightning | RL training governance with policy-enforced runners | Safe training guardrails |

The toolkit is available in Python, TypeScript, Rust, Go, and .NET. It's framework-agnostic: it works with MAF, LangChain, CrewAI, Google ADK, and more. For our ASP.NET Core travel planner, we'll use the .NET SDK via NuGet (Microsoft.AgentGovernance).

For this blog, we're focusing on three packages:

- Agent OS: the policy engine that intercepts and evaluates every agent action
- Agent Compliance: regulatory mapping and audit trail generation
- Agent SRE: SLOs and circuit breakers for agent reliability

How easy it was to add governance

Here's the part that surprised me. I expected adding governance to a production agent system to be a multi-hour effort: new infrastructure, complex configuration, extensive refactoring.
Instead, it took about 30 minutes. Here's exactly what we changed:

Step 1: Add NuGet packages

Three package references in TravelPlanner.Shared.csproj, one of them new:

    <ItemGroup>
      <!-- Existing packages -->
      <PackageReference Include="Azure.Monitor.OpenTelemetry.AspNetCore" Version="1.3.0" />
      <PackageReference Include="Microsoft.Agents.AI" Version="1.0.0" />
      <!-- NEW: Agent Governance Toolkit (single package, all features included) -->
      <PackageReference Include="Microsoft.AgentGovernance" Version="3.0.2" />
    </ItemGroup>

Step 2: Create the policy file

One new file: governance-policies.yaml in the project root. This is where all your governance rules live:

    apiVersion: governance.toolkit/v1
    name: travel-planner-governance
    description: Policy enforcement for the multi-agent travel planner on App Service
    scope: global
    defaultAction: deny
    rules:
      - name: allow-currency-conversion
        condition: "tool == 'ConvertCurrency'"
        action: allow
        priority: 10
        description: Allow Currency Converter agent to call Frankfurter exchange rate API
      - name: allow-weather-forecast
        condition: "tool == 'GetWeatherForecast'"
        action: allow
        priority: 10
        description: Allow Weather Advisor agent to call NWS forecast API
      - name: allow-weather-alerts
        condition: "tool == 'GetWeatherAlerts'"
        action: allow
        priority: 10
        description: Allow Weather Advisor agent to check NWS weather alerts

Step 3: One line in BaseAgent.cs

This is the moment. Here's our BaseAgent.cs before:

    Agent = new ChatClientAgent(
            chatClient,
            instructions: Instructions,
            name: AgentName,
            description: Description)
        .AsBuilder()
        .UseOpenTelemetry(sourceName: AgentName)
        .Build();

And after (the builder chain is now assigned to a `builder` variable instead of calling `.Build()` directly, as the full constructor below shows):

    var kernel = serviceProvider.GetService<GovernanceKernel>();
    if (kernel is not null)
        builder.UseGovernance(kernel, AgentName);
    Agent = builder.Build();

One line of intent, two lines of null-safety. The .UseGovernance(kernel, AgentName) call intercepts every tool/function invocation in the agent's pipeline, evaluating it against the loaded policies before execution. If the GovernanceKernel isn't registered (governance disabled), agents work exactly as before, with no crash and no code change needed. Here's the full updated constructor using IServiceProvider to optionally resolve governance:

    using AgentGovernance;
    using Microsoft.Extensions.DependencyInjection;

    public abstract class BaseAgent : IAgent
    {
        protected readonly ILogger Logger;
        protected readonly AgentOptions Options;
        protected readonly AIAgent Agent;

        // Constructor for simple agents without tools
        protected BaseAgent(
            ILogger logger,
            IOptions<AgentOptions> options,
            IChatClient chatClient,
            IServiceProvider serviceProvider)
        {
            Logger = logger;
            Options = options.Value;

            var builder = new ChatClientAgent(
                    chatClient,
                    instructions: Instructions,
                    name: AgentName,
                    description: Description)
                .AsBuilder()
                .UseOpenTelemetry(sourceName: AgentName);

            var kernel = serviceProvider.GetService<GovernanceKernel>();
            if (kernel is not null)
                builder.UseGovernance(kernel, AgentName);

            Agent = builder.Build();
        }

        // Constructor for agents with tools
        protected BaseAgent(
            ILogger logger,
            IOptions<AgentOptions> options,
            IChatClient chatClient,
            ChatOptions chatOptions,
            IServiceProvider serviceProvider)
        {
            Logger = logger;
            Options = options.Value;

            var builder = new ChatClientAgent(
                    chatClient,
                    instructions: Instructions,
                    name: AgentName,
                    description: Description,
                    tools: chatOptions.Tools?.ToList())
                .AsBuilder()
                .UseOpenTelemetry(sourceName: AgentName);

            var kernel = serviceProvider.GetService<GovernanceKernel>();
            if (kernel is not null)
                builder.UseGovernance(kernel, AgentName);

            Agent = builder.Build();
        }

        // ... rest unchanged
    }

Step 4: DI registrations in Program.cs

A few lines to wire up governance in the dependency injection container:

    using AgentGovernance;

    // ... existing builder setup ...

    // Configure OpenTelemetry with Azure Monitor (existing, from Blog 2)
    builder.Services.AddOpenTelemetry().UseAzureMonitor();

    // NEW: Configure Agent Governance Toolkit
    // Load policy from YAML, register as singleton. Agents resolve via IServiceProvider.
    var policyPath = Path.Combine(builder.Environment.ContentRootPath, "governance-policies.yaml");
    if (File.Exists(policyPath))
    {
        try
        {
            var yaml = File.ReadAllText(policyPath);
            var kernel = new GovernanceKernel(new GovernanceOptions
            {
                EnableAudit = true,
                EnableMetrics = true
            });
            kernel.LoadPolicyFromYaml(yaml);
            builder.Services.AddSingleton(kernel);
            Console.WriteLine($"[Governance] Loaded policies from {policyPath}");
        }
        catch (Exception ex)
        {
            Console.WriteLine($"[Governance] Failed to load: {ex.Message}. Running without governance.");
        }
    }

That's it. Your agents are now governed. Let me repeat that because it's the core message of this blog: we added production governance to a six-agent system by adding one NuGet package, creating one YAML policy file, adding a few lines to our base agent class, and registering the governance kernel in DI. No new infrastructure. No complex rewiring. No multi-sprint project. If you followed Blog 1 and Blog 2, you can do this in 30 minutes.

Policy flexibility deep-dive

The YAML policy language is intentionally simple to start with, but it supports real complexity when you need it. Let's walk through what each policy in our file does.

API allowlists and blocklists

Our travel planner calls two external APIs: Frankfurter (currency exchange) and the National Weather Service. The defaultAction: deny combined with explicit allow rules ensures agents can only call these approved tools.
If an agent attempts to call any other function, whether through a prompt injection or a bug, the call is blocked before it executes:

    defaultAction: deny
    rules:
      - name: allow-currency-conversion
        condition: "tool == 'ConvertCurrency'"
        action: allow
        priority: 10
      - name: allow-weather-forecast
        condition: "tool == 'GetWeatherForecast'"
        action: allow
        priority: 10

When a blocked call happens, you'll see output like this in your logs:

    [Governance] Tool call 'DeleteDatabase' blocked for agent 'LocalKnowledgeAgent': No matching rules; default action is deny.

Condition language

The condition field supports equality checks, pattern matching, and boolean logic. You can match on tool name, agent ID, or any key in the evaluation context:

    # Match a specific tool
    condition: "tool == 'ConvertCurrency'"

    # Match multiple tools with OR
    condition: "tool == 'GetWeatherForecast' or tool == 'GetWeatherAlerts'"

    # Match by agent
    condition: "agent == 'CurrencyConverterAgent' and tool == 'ConvertCurrency'"

Priority and conflict resolution

When multiple rules match, the toolkit evaluates by priority (higher number = higher priority). A deny rule at priority 100 will override an allow rule at priority 10. This lets you layer broad allows with specific denies:

    rules:
      - name: allow-all-weather-tools
        condition: "tool == 'GetWeatherForecast' or tool == 'GetWeatherAlerts'"
        action: allow
        priority: 10
      - name: block-during-maintenance
        condition: "tool == 'GetWeatherForecast'"
        action: deny
        priority: 100
        description: Temporarily block NWS calls during API maintenance

Advanced: OPA Rego and Cedar

The YAML policy language handles most scenarios, but for teams with advanced needs, the toolkit also supports OPA Rego and Cedar policy languages. You can mix them: use YAML for simple rules and Rego for complex conditional logic:

    # policies/advanced.rego
    # Example: time-based access control
    package travel_planner.governance

    default allow_tool_call = false

    allow_tool_call {
        input.agent == "CurrencyConverterAgent"
        input.tool == "get_exchange_rate"
        time.weekday(time.now_ns()) != "Sunday"  # Markets closed
    }

Start simple with YAML. Add complexity only when you need it.

Why App Service for governed agent workloads

You might be wondering: why does hosting platform matter for governance? It matters a lot. The governance toolkit handles the application-level policies, but a production agent system also needs platform-level security, networking, identity, and deployment controls. App Service gives you these out of the box.

Managed Identity

Governance policies enforce what agents can access. Managed Identity handles how they authenticate, without secrets to manage, rotate, or leak. Our travel planner already uses DefaultAzureCredential for Azure OpenAI, Cosmos DB, and Service Bus. Governance layers on top of this identity foundation.

VNet Integration + Private Endpoints

The governance toolkit enforces API allowlists at the application level. App Service's VNet integration and private endpoints enforce network boundaries at the infrastructure level. This is defense in depth: even if a governance policy is misconfigured, the network layer prevents unauthorized egress. Your agents can only reach the networks you've explicitly allowed.

Easy Auth

App Service's built-in authentication (Easy Auth) protects your agent APIs without custom code. Before a request even reaches your governance engine, App Service has already validated the caller's identity. No custom auth middleware. No JWT parsing. Just toggle it on.

Deployment Slots

This is underrated for governance. With deployment slots, you can test new governance policies in a staging slot before swapping to production.
Deploy updated governance-policies.yaml to staging, run your test suite, verify the policies work as expected, and then swap. Zero-downtime policy updates with full rollback capability.

App Insights integration

Governance audit events flow into the same Application Insights instance we configured in Blog 2. This means your governance decisions appear alongside your OTel traces in the Agents view. One pane of glass for agent behavior and governance enforcement.

Always-on + WebJobs

Our travel planner uses WebJobs for long-running agent workflows. With App Service's Always-on feature, those workflows stay warm, and governance is continuous, with no cold-start gaps where agents run unmonitored.

azd deployment

One command deploys the full governed stack (application code, governance policies, infrastructure, and monitoring):

    azd up

App Service gives you the enterprise production features governance needs (identity, networking, observability, safe deployment) out of the box. The governance toolkit handles agent-level policy enforcement; App Service handles platform-level security. Together, they're a complete governed agent platform.

Governance audit events in App Insights

In Blog 2, we set up OpenTelemetry and the Application Insights Agents view to monitor agent behavior. With the governance toolkit, those same traces now include governance audit events: every policy decision is recorded as a span attribute on the agent's trace. When you open a trace in the Agents view, you'll see governance events inline:

- Policy: api-allowlist -> ALLOWED: CurrencyConverterAgent called Frankfurter API, permitted
- Policy: token-budget -> ALLOWED: Request used 3,200 tokens, within per-request limit of 8,000
- Policy: rate-limit -> THROTTLED: WeatherAdvisorAgent exceeded 60 calls/min, request delayed

For deeper analysis, use KQL to query governance events directly. Here's a query that finds all policy violations in the last 24 hours:

    // Find all governance policy violations in the last 24 hours
    traces
    | where timestamp > ago(24h)
    | where customDimensions["governance.decision"] != "ALLOWED"
    | extend
        agentName = tostring(customDimensions["agent.name"]),
        policyName = tostring(customDimensions["governance.policy"]),
        decision = tostring(customDimensions["governance.decision"]),
        violationReason = tostring(customDimensions["governance.reason"]),
        targetUrl = tostring(customDimensions["tool.target_url"])
    | project timestamp, agentName, policyName, decision, violationReason, targetUrl
    | order by timestamp desc

And here's one for tracking token budget consumption across agents:

    // Token budget consumption by agent over the last hour
    customMetrics
    | where timestamp > ago(1h)
    | where name == "governance.tokens.consumed"
    | extend agentName = tostring(customDimensions["agent.name"])
    | summarize
        totalTokens = sum(value),
        avgTokensPerRequest = avg(value),
        maxTokensPerRequest = max(value)
        by agentName, bin(timestamp, 5m)
    | order by totalTokens desc

This is the power of integrating governance with your existing observability stack. You don't need a separate governance dashboard; everything lives in the same App Insights workspace you already know.

SRE for agents

The Agent SRE package brings Site Reliability Engineering practices to agent systems. This was the part that got me most excited, because it addresses a question I hear constantly: "How do I know my agents are actually reliable?"

Service Level Objectives (SLOs)

We defined SLOs in our policy file:

    slos:
      - name: weather-agent-latency
        agent: "WeatherAdvisorAgent"
        metric: latency-p99
        target: 5000ms
        window: 5m

This says: "The Weather Advisor Agent must respond within 5 seconds at the 99th percentile, measured over a 5-minute rolling window." When the SLO is breached, the toolkit emits an alert event and can trigger automated responses.
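To make "p99 over a rolling window" concrete, here is a minimal Python sketch of how such a check could be computed. It is an illustration under stated assumptions, not the Agent SRE package's implementation: the class name `LatencySlo`, its methods, and the nearest-rank percentile method are all choices made for this example, and timestamps are assumed to arrive in non-decreasing order.

```python
"""Sketch: evaluate a latency-p99 SLO over a rolling time window.

Hypothetical helper for illustration. Not the Agent SRE API.
"""
import math
from collections import deque
from typing import Optional


class LatencySlo:
    def __init__(self, target_ms: float, window_s: float, percentile: float = 99.0):
        self.target_ms = target_ms
        self.window_s = window_s
        self.percentile = percentile
        self._samples = deque()  # (timestamp_s, latency_ms), oldest first

    def record(self, timestamp_s: float, latency_ms: float) -> None:
        """Store one observation and drop anything older than the window."""
        self._samples.append((timestamp_s, latency_ms))
        self._evict(timestamp_s)

    def _evict(self, now_s: float) -> None:
        cutoff = now_s - self.window_s
        while self._samples and self._samples[0][0] <= cutoff:
            self._samples.popleft()

    def percentile_ms(self, now_s: float) -> Optional[float]:
        """Nearest-rank percentile of latencies still inside the window."""
        self._evict(now_s)
        if not self._samples:
            return None
        ordered = sorted(latency for _, latency in self._samples)
        rank = math.ceil(self.percentile * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]

    def is_breached(self, now_s: float) -> bool:
        value = self.percentile_ms(now_s)
        return value is not None and value > self.target_ms


if __name__ == "__main__":
    # Mirrors the policy above: target 5000 ms, window 5 minutes.
    slo = LatencySlo(target_ms=5000, window_s=300)
    for second in range(99):
        slo.record(second, 1000)
    slo.record(99, 9000)
    print("breached after one slow call:", slo.is_breached(99))
    slo.record(100, 9000)
    print("breached after two slow calls:", slo.is_breached(100))
```

With 100 samples, a single slow call sits above the 99th percentile and does not breach the target; a second slow call pushes the p99 itself over 5000 ms. Production systems typically use histograms or sketches rather than sorting raw samples.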
Circuit breakers

Circuit breakers prevent cascading failures. If an agent fails 5 times in a row, the circuit opens, and subsequent requests get a fast failure response instead of waiting for another timeout:

    circuit-breakers:
      - agent: "*"
        failure-threshold: 5
        recovery-timeout: 30s
        half-open-max-calls: 2

After 30 seconds, the circuit enters a half-open state, allowing 2 test calls through. If those succeed, the circuit closes and normal operation resumes. If they fail, the circuit opens again. This pattern is battle-tested in microservices, and now it protects your agents too.

Error budgets

Error budgets tie SLOs to business decisions. If your Coordinator Agent's success rate target is 99.5% over a 15-minute window, that means you have an error budget of 0.5%. When the budget is consumed, the toolkit can automatically reduce agent autonomy, for example by requiring human approval for high-risk actions until the error budget recovers.

SRE practices turn agent reliability from a hope into a measurable, enforceable contract.

Architecture

Here's how everything fits together after adding governance:

    Azure App Service
      Frontend (Static) --> ASP.NET Core API
        Coordinator Agent
          OTel --> Governance Engine (Policies)
        Specialist Agents (Currency, Weather, etc.)
          Each with OTel + Governance
      Managed Identity (no keys)
      VNet Integration (network boundary)
      App Insights (Traces + Governance Audit)
          |
          |  Only allowed APIs
          v
    External APIs
      [allowed] Frankfurter API
      [allowed] NWS Weather API
      [blocked] Everything else

The key insight: governance is a transparent layer in the agent pipeline. It sits between the agent's decision and the action's execution. The agent code doesn't know or care about governance; it just builds the agent with .UseGovernance() and the policy engine handles the rest.

Bring it to your own agents

We've shown governance with Microsoft Agent Framework on .NET, but the toolkit is framework-agnostic.
Here's how to add it to other popular frameworks:

LangChain (Python)

    from agent_governance import PolicyEngine, GovernanceCallbackHandler

    policy_engine = PolicyEngine.from_yaml("governance-policies.yaml")

    # Add governance as a LangChain callback handler
    agent = create_react_agent(
        llm=llm,
        tools=tools,
        callbacks=[GovernanceCallbackHandler(policy_engine)]
    )

CrewAI (Python)

    from agent_governance import PolicyEngine
    from agent_governance.integrations.crewai import GovernanceTaskDecorator

    policy_engine = PolicyEngine.from_yaml("governance-policies.yaml")

    # Add governance as a CrewAI task decorator
    @GovernanceTaskDecorator(policy_engine)
    def research_task(agent, context):
        return agent.execute(context)

Google ADK (Python)

    from agent_governance import PolicyEngine
    from agent_governance.integrations.google_adk import GovernancePlugin

    policy_engine = PolicyEngine.from_yaml("governance-policies.yaml")

    # Add governance as a Google ADK plugin
    agent = Agent(
        model="gemini-2.0-flash",
        tools=[...],
        plugins=[GovernancePlugin(policy_engine)]
    )

TypeScript / Node.js

    import { PolicyEngine } from '@microsoft/agentmesh-sdk';

    const policyEngine = PolicyEngine.fromYaml('governance-policies.yaml');

    // Use as middleware in your agent pipeline
    agent.use(policyEngine.middleware());

Every integration hooks into the framework's native extension points (callbacks, decorators, plugins, middleware), so adding governance doesn't require rewriting your agent code. Install the package, point it at your policy file, and you're governed.
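To make the circuit-breaker settings from the SRE section concrete (open after 5 consecutive failures, wait 30 seconds, then let 2 half-open test calls through), here is a minimal state-machine sketch in plain Python. It is an illustration only, not the toolkit's implementation: the `CircuitBreaker` and `CircuitOpenError` names are hypothetical, the parameters simply mirror the YAML keys, and the clock is injectable so the behavior can be exercised without sleeping.

```python
import time
from typing import Callable


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the agent while the circuit is open."""


class CircuitBreaker:
    # Hypothetical illustration; parameter names mirror the YAML keys.
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 half_open_max_calls: int = 2,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.half_open_max_calls = half_open_max_calls
        self._clock = clock
        self.state = "closed"
        self._failures = 0         # consecutive failures while closed
        self._opened_at = 0.0      # clock reading when the circuit opened
        self._trial_calls = 0      # calls admitted while half-open
        self._trial_successes = 0  # successful half-open calls

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self._clock() - self._opened_at < self.recovery_timeout:
                raise CircuitOpenError("circuit open: failing fast")
            # Recovery timeout elapsed: start admitting test calls.
            self.state = "half-open"
            self._trial_calls = 0
            self._trial_successes = 0
        if self.state == "half-open":
            if self._trial_calls >= self.half_open_max_calls:
                raise CircuitOpenError("half-open: test-call limit reached")
            self._trial_calls += 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        if self.state == "half-open":
            self._trial_successes += 1
            if self._trial_successes >= self.half_open_max_calls:
                self.state = "closed"  # all test calls passed
                self._failures = 0
        else:
            self._failures = 0  # a success resets the consecutive count

    def _on_failure(self):
        if self.state == "half-open":
            self._open()  # any failed test call reopens the circuit
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._open()

    def _open(self):
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```

Wrapping an agent invocation would then look like `breaker.call(agent_fn, request)`, where `agent_fn` stands in for whatever callable runs the agent. While the circuit is open, callers get `CircuitOpenError` immediately rather than waiting on another timeout.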
What's next

This wraps up our three-part series on building production-ready multi-agent AI applications on Azure App Service:

- Blog 1: Build - Deploy a multi-agent travel planner with Microsoft Agent Framework 1.0
- Blog 2: Monitor - Add observability with OpenTelemetry and the Application Insights Agents view
- Blog 3: Govern - Secure agents for production with the Agent Governance Toolkit (you are here)

The progression is intentional: first make it work, then make it visible, then make it safe. And the consistent theme across all three parts is that App Service makes each step easier: managed hosting for Blog 1, integrated monitoring for Blog 2, and platform-level security features for Blog 3.

Next steps for your agents

- Explore the Agent Governance Toolkit: star the repo, browse the 20 tutorials, try the demo.
- Customize policies for your compliance needs: start with our YAML template and adapt it to your domain. Healthcare teams: enable HIPAA mappings. Finance teams: add SOC2 controls.
- Explore Agent Mesh for multi-agent trust: if you have agents communicating across services or trust boundaries, Agent Mesh's cryptographic identity and trust scoring add another layer of defense.
- Deploy the sample: clone our travel planner repo, run azd up, and see governed agents in action.

AI agents are becoming autonomous decision-makers in high-stakes domains. The question isn't whether we need governance; it's whether we build it proactively, before incidents force our hand. With the Agent Governance Toolkit and Azure App Service, you can add production governance to your agents today. In about 30 minutes.

Build Multi-Agent AI Apps on Azure App Service with Microsoft Agent Framework 1.0
Part 1 of 3 - Multi-Agent AI on Azure App Service

This is part 1 of a 3-part series on deploying and working with multi-agent AI on Azure App Service. Follow along to learn how to deploy, manage, observe, and secure your agents on Azure App Service.

A couple of months ago, we published a three-part series showing how to build multi-agent AI systems on Azure App Service using preview packages from the Microsoft Agent Framework (MAF) (formerly AutoGen / Semantic Kernel Agents). The series walked through async processing, the request-reply pattern, and client-side multi-agent orchestration, all running on App Service.

Since then, Microsoft Agent Framework has reached 1.0 GA, unifying AutoGen and Semantic Kernel into a single, production-ready agent platform. This post is a fresh start with the GA bits. We'll rebuild our travel-planner sample on the stable API surface, call out the breaking changes from preview, and get you up and running fast. All of the code is in the companion repo: seligj95/app-service-multi-agent-maf-otel.

What Changed in MAF 1.0 GA

The 1.0 release is more than a version bump. Here's what moved:

- Unified platform. AutoGen and Semantic Kernel agent capabilities have converged into Microsoft.Agents.AI. One package, one API surface.
- Stable APIs with long-term support. The 1.0 contract is now locked for servicing. No more preview churn.
- Breaking change: Instructions on options removed. In preview, you set instructions through ChatClientAgentOptions.Instructions. In GA, pass them directly to the ChatClientAgent constructor.
- Breaking change: RunAsync parameter rename. The thread parameter is now session (type AgentSession). If you were using named arguments, this is a compile error.
- Microsoft.Extensions.AI upgraded. The framework moved from the 9.x preview of Microsoft.Extensions.AI to the stable 10.4.1 release.
- OpenTelemetry integration built in. The builder pipeline now includes UseOpenTelemetry() out of the box (more on that in Blog 2).
Our project references reflect the GA stack:

    <PackageReference Include="Microsoft.Agents.AI" Version="1.0.0" />
    <PackageReference Include="Microsoft.Extensions.AI" Version="10.4.1" />
    <PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" />

Why Azure App Service for AI Agents?

If you're building with Microsoft Agent Framework, you need somewhere to run your agents. You could reach for Kubernetes, containers, or serverless, but for most agent workloads, Azure App Service is the sweet spot. Here's why:

- No infrastructure management: App Service is fully managed. No clusters to configure, no container orchestration to learn. Deploy your .NET or Python agent code and it just runs.
- Always On: Agent workflows can take minutes. App Service's Always On feature (on Premium tiers) ensures your background workers never go cold, so agents are ready to process requests instantly.
- WebJobs for background processing: Long-running agent workflows don't belong in HTTP request handlers. App Service's built-in WebJob support gives you a dedicated background worker that shares the same deployment, configuration, and managed identity, with no separate compute resource needed.
- Managed Identity everywhere: Zero secrets in your code. App Service's system-assigned managed identity authenticates to Azure OpenAI, Service Bus, Cosmos DB, and Application Insights automatically. No connection strings, no API keys, no rotation headaches.
- Built-in observability: Native integration with Application Insights and OpenTelemetry means you can see exactly what your agents are doing in production (more on this in Part 2).
- Enterprise-ready: VNet integration, deployment slots for safe rollouts, custom domains, auto-scaling rules, and built-in authentication. All the things you'll need when your agent POC becomes a production service.
- Cost-effective: A single P0v4 instance (~$75/month) hosts both your API and WebJob worker.
Compare that to running separate container apps or a Kubernetes cluster for the same workload.

The bottom line: App Service lets you focus on building your agents, not managing infrastructure. And since MAF supports both .NET and Python, both first-class citizens on App Service, you're covered regardless of your language preference.

Architecture Overview

The sample is a travel planner that coordinates six specialized agents to build a personalized trip itinerary. Users fill out a form (destination, dates, budget, interests), and the system returns a comprehensive travel plan complete with weather forecasts, currency advice, a day-by-day itinerary, and a budget breakdown.

The Six Agents

- Currency Converter: calls the Frankfurter API for real-time exchange rates
- Weather Advisor: calls the National Weather Service API for forecasts and packing tips
- Local Knowledge Expert: cultural insights, customs, and hidden gems
- Itinerary Planner: day-by-day scheduling with timing and costs
- Budget Optimizer: allocates spend across categories and suggests savings
- Coordinator: assembles everything into a polished final plan

Four-Phase Workflow

Phase                  | Agents                             | Execution
1 - Parallel Gathering | Currency, Weather, Local Knowledge | Task.WhenAll
2 - Itinerary          | Itinerary Planner                  | Sequential (uses Phase 1 context)
3 - Budget             | Budget Optimizer                   | Sequential (uses Phase 2 output)
4 - Assembly           | Coordinator                        | Final synthesis

Infrastructure

- Azure App Service (P0v4): hosts the API and a continuous WebJob for background processing
- Azure Service Bus: decouples the API from heavy AI work (async request-reply)
- Azure Cosmos DB: stores task state, results, and per-agent chat histories (24-hour TTL)
- Azure OpenAI (GPT-4o): powers all agent LLM calls
- Application Insights + Log Analytics: monitoring and diagnostics

ChatClientAgent Deep Dive

At the core of every agent is ChatClientAgent from Microsoft.Agents.AI.
It wraps an IChatClient (from Microsoft.Extensions.AI) with instructions, a name, a description, and optionally a set of tools. This is client-side orchestration: you control the chat history, lifecycle, and execution order. No server-side Foundry agent resources are created.

Here's the BaseAgent pattern used by all six agents in the sample:

    // BaseAgent.cs - constructor for agents with tools
    Agent = new ChatClientAgent(
            chatClient,
            instructions: Instructions,
            name: AgentName,
            description: Description,
            tools: chatOptions.Tools?.ToList())
        .AsBuilder()
        .UseOpenTelemetry(sourceName: AgentName)
        .Build();

Notice the builder pipeline: .AsBuilder().UseOpenTelemetry(...).Build(). This opts every agent into the framework's built-in OpenTelemetry instrumentation with a single line. We'll explore what that telemetry looks like in Blog 2.

Invoking an agent is equally straightforward:

    // BaseAgent.cs - InvokeAsync
    public async Task<ChatMessage> InvokeAsync(
        IList<ChatMessage> chatHistory,
        CancellationToken cancellationToken = default)
    {
        var response = await Agent.RunAsync(
            chatHistory,
            session: null,
            options: null,
            cancellationToken);

        return response.Messages.LastOrDefault()
            ?? new ChatMessage(ChatRole.Assistant, "No response generated.");
    }

Key things to note:

- session: null is the renamed parameter (was thread in preview). We pass null because we manage chat history ourselves.
- The agent receives the full chatHistory list, so context accumulates across turns.
- Simple agents (Local Knowledge, Itinerary Planner, Budget Optimizer, Coordinator) use the tool-less constructor; agents that call external APIs (Currency, Weather) use the constructor that accepts ChatOptions with tools.

Tool Integration

Two of our agents, Weather Advisor and Currency Converter, call real external APIs through the MAF tool-calling pipeline. Tools are registered using AIFunctionFactory.Create() from Microsoft.Extensions.AI.
Here's how the WeatherAdvisorAgent wires up its tool:

    // WeatherAdvisorAgent.cs
    private static ChatOptions CreateChatOptions(
        IWeatherService weatherService, ILogger logger)
    {
        var chatOptions = new ChatOptions
        {
            Tools = new List<AITool>
            {
                AIFunctionFactory.Create(
                    GetWeatherForecastFunction(weatherService, logger))
            }
        };
        return chatOptions;
    }

GetWeatherForecastFunction returns a Func<double, double, int, Task<string>> that the model can call with latitude, longitude, and number of days. Under the hood, it hits the National Weather Service API and returns a formatted forecast string. The Currency Converter follows the same pattern with the Frankfurter API.

This is one of the nicest parts of the GA API: you write a plain C# method, wrap it with AIFunctionFactory.Create(), and the framework handles the JSON schema generation, function-call parsing, and response routing automatically.

Multi-Phase Workflow Orchestration

The TravelPlanningWorkflow class coordinates all six agents. The key insight is that the orchestration is just C# code: no YAML, no graph DSL, no special runtime. You decide when agents run, what context they receive, and how results flow between phases.

    // Phase 1: Parallel Information Gathering
    var gatheringTasks = new[]
    {
        GatherCurrencyInfoAsync(request, state, progress, cancellationToken),
        GatherWeatherInfoAsync(request, state, progress, cancellationToken),
        GatherLocalKnowledgeAsync(request, state, progress, cancellationToken)
    };
    await Task.WhenAll(gatheringTasks);

After Phase 1 completes, results are stored in a WorkflowState object, a simple dictionary-backed container that holds per-agent chat histories and contextual data:

    // WorkflowState.cs
    public Dictionary<string, object> Context { get; set; } = new();
    public Dictionary<string, List<ChatMessage>> AgentChatHistories { get; set; } = new();

Phases 2-4 run sequentially, each pulling context from the previous phase.
For example, the Itinerary Planner receives weather and local knowledge gathered in Phase 1:

    var localKnowledge = state.GetFromContext<string>("LocalKnowledge") ?? "";
    var weatherAdvice = state.GetFromContext<string>("WeatherAdvice") ?? "";

    var itineraryChatHistory = state.GetChatHistory("ItineraryPlanner");
    itineraryChatHistory.Add(new ChatMessage(ChatRole.User,
        $"Create a detailed {days}-day itinerary for {request.Destination}..."
        + $"\n\nWEATHER INFORMATION:\n{weatherAdvice}"
        + $"\n\nLOCAL KNOWLEDGE & TIPS:\n{localKnowledge}"));

    var itineraryResponse = await _itineraryAgent.InvokeAsync(
        itineraryChatHistory, cancellationToken);

This pattern, parallel fan-out followed by sequential context enrichment, is simple, testable, and easy to extend. Need a seventh agent? Add it to the appropriate phase and wire it into WorkflowState.

Async Request-Reply Pattern

A multi-agent workflow with six LLM calls (some with tool invocations) can easily run 30-60 seconds. That's well beyond typical HTTP timeout expectations and not a great user experience for a synchronous request. We use the Async Request-Reply pattern to handle this:

1. The API receives the travel plan request and immediately queues a message to Service Bus. It stores an initial task record in Cosmos DB with status queued and returns a taskId to the client.
2. A continuous WebJob (running as a separate process on the same App Service plan) picks up the message, executes the full multi-agent workflow, and writes the result back to Cosmos DB.
3. The client polls the API for status updates until the task reaches completed.

This pattern keeps the API responsive, makes the heavy work retriable (Service Bus handles retries and dead-lettering), and lets the WebJob run independently, so you can restart it without affecting the API. We covered this pattern in detail in the previous series, so we won't repeat the plumbing here.
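As a rough illustration of the request-reply flow just described, the following Python sketch models the three roles (API, background worker, polling client) in a single process. It is not the sample's code: an `asyncio.Queue` stands in for Service Bus, a dictionary stands in for Cosmos DB, `run_workflow` is a hypothetical placeholder for the multi-agent workflow, and the `processing` and `failed` statuses are assumptions added for the example.

```python
import asyncio
import uuid

# In-memory stand-in for the task store (the sample uses Cosmos DB).
task_store: dict[str, dict] = {}


async def submit(queue: asyncio.Queue, request: dict) -> str:
    """API side: record the task as queued, enqueue it, return a task id."""
    task_id = str(uuid.uuid4())
    task_store[task_id] = {"status": "queued", "result": None}
    await queue.put((task_id, request))
    return task_id


async def run_workflow(request: dict) -> str:
    """Hypothetical placeholder for the multi-agent workflow."""
    await asyncio.sleep(0.05)  # simulate slow LLM calls
    return f"Plan for {request['destination']}"


async def worker(queue: asyncio.Queue) -> None:
    """Worker side: pull messages, run the workflow, store the result."""
    while True:
        task_id, request = await queue.get()
        task_store[task_id]["status"] = "processing"  # assumed status name
        try:
            result = await run_workflow(request)
            task_store[task_id].update(status="completed", result=result)
        except Exception as exc:
            task_store[task_id].update(status="failed", result=str(exc))
        finally:
            queue.task_done()


async def poll(task_id: str, interval: float = 0.01) -> dict:
    """Client side: poll until the task reaches a terminal status."""
    while task_store[task_id]["status"] not in ("completed", "failed"):
        await asyncio.sleep(interval)
    return task_store[task_id]


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for Service Bus
    worker_task = asyncio.create_task(worker(queue))
    task_id = await submit(queue, {"destination": "Lisbon"})
    final = await poll(task_id)
    worker_task.cancel()
    return final
```

Running `asyncio.run(main())` returns the stored record once the worker marks it completed. The point of the shape is that `submit` returns as soon as the message is queued, so the caller never waits on the workflow itself.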
Deploy with azd

The repo is wired up with the Azure Developer CLI for one-command provisioning and deployment:

    git clone https://github.com/seligj95/app-service-multi-agent-maf-otel.git
    cd app-service-multi-agent-maf-otel
    azd auth login
    azd up

azd up provisions the following resources via Bicep:

- Azure App Service (P0v4 Windows) with a continuous WebJob
- Azure Service Bus namespace and queue
- Azure Cosmos DB account, database, and containers
- Azure AI Services (Azure OpenAI with GPT-4o deployment)
- Application Insights and Log Analytics workspace
- Managed Identity with all necessary role assignments

After deployment completes, azd outputs the App Service URL. Open it in your browser, fill in the travel form, and watch six agents collaborate on your trip plan in real time.

What's Next

We now have a production-ready multi-agent app running on App Service with the GA Microsoft Agent Framework. But how do you actually observe what these agents are doing? When six agents are making LLM calls, invoking tools, and passing context between phases, you need visibility into every step.

In the next post, we'll dive deep into how we instrumented these agents with OpenTelemetry and the new Agents (Preview) view in Application Insights, giving you full visibility into agent runs, token usage, tool calls, and model performance. You already saw the .UseOpenTelemetry() call in the builder pipeline; Blog 2 shows what that telemetry looks like end to end and how to light up the new Agents experience in the Azure portal. Stay tuned!
Resources

- Sample repo: app-service-multi-agent-maf-otel
- Microsoft Agent Framework 1.0 GA Announcement
- Microsoft Agent Framework Documentation
- Previous Series, Part 3: Client-Side Multi-Agent Orchestration on App Service
- Microsoft.Extensions.AI Documentation
- Azure App Service Documentation
- Blog 2: Monitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View
- Blog 3: Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit

Only 8.5% of MCP Servers Use OAuth - Here's How to Host One Securely on App Service
The Model Context Protocol exploded onto the scene because it's easy. Stand up a server, expose a few tools, point Claude or VS Code at it, and your agent can suddenly read files, hit APIs, and run code. That same ease is the problem: most MCP servers ship with no authentication at all, and they're getting pushed straight to the internet.

The numbers are bleak-into-an-incident-report bad. Astrix Research's State of MCP Server Security 2025 found that only 8.5% of MCP servers use OAuth; the rest lean on static API keys or nothing. And the CVEs have already started:

- CVE-2025-6514: a CVSS 9.6 OS command-injection flaw in mcp-remote. If a client connects to a malicious or hijacked MCP server, the server can inject shell commands through the OAuth authorization_endpoint during discovery and achieve remote code execution on the client. Roughly half a million downloads were exposed.
- CVE-2025-49596: RCE in the MCP Inspector dev tool, which shipped with no authentication on its local web UI. A crafted request from a webpage you happened to visit could execute code on your machine.

The throughline: MCP doesn't enforce security at the protocol level. The spec is explicit that authorization is optional and implementation-dependent. That's a reasonable design choice for a transport, but it means you own the perimeter. Skip it, and you've published an unauthenticated RPC endpoint that can read secrets and run tools.

So let's not skip it. This post walks through a hardened MCP server on Azure App Service that closes every gap above, and most of it is platform configuration, not code you have to write and get right yourself.

Sample: seligj95/app-service-secure-mcp. One azd up (plus an Entra app registration the hook creates for you) gives you an MCP server behind Easy Auth, talking to Key Vault over a managed identity, with no public network access, fronted by API Management, and an Application Insights alert watching for abuse.
The threat model for a hosted MCP server

Before the architecture, be honest about the attack surface. When an MCP server is internet-reachable, the bad days look like this:

- Unauthenticated tool invocation. Anyone who finds the endpoint calls your tools. If one of them reads a database or a secret, that's the whole game.
- Credential exfiltration. A tool that returns a secret value (even "helpfully," for debugging) hands credentials to whatever is driving the session.
- Prompt injection via tool responses. A compromised or malicious tool return can carry instructions that hijack the calling agent.
- Path traversal / injection. A tool that concatenates user input into a file path or shell command is the same class of bug we've fought for 25 years, now with an LLM cheerfully supplying the payload.
- Lateral movement. A server running with a broad identity or a network line of sight to everything becomes a pivot point.

The architecture below maps a defense to each one. None of it is exotic; it's the App Service security stack, pointed at MCP.

The architecture

Five layers, each one a checkbox or a few lines of Bicep. Let's take them in order.

1. Easy Auth - spec-compliant OAuth you don't have to write

The single most important fix is also the easiest: turn on App Service built-in authentication (Easy Auth) and point it at Entra ID. Now App Service validates the token and rejects unauthenticated requests at the platform, before a single line of your Python runs.

App Service Authentication also has a built-in MCP server authorization mode (Preview) that makes your server comply with the MCP authorization spec: it serves Protected Resource Metadata (PRM) so a compliant MCP client can discover the authorization server and complete the OAuth handshake itself, instead of just getting a bare 401.
In the sample that's an authsettingsV2 resource:

    resource authSettings 'Microsoft.Web/sites/config@2024-04-01' = {
      parent: web
      name: 'authsettingsV2'
      properties: {
        globalValidation: {
          requireAuthentication: true
          unauthenticatedClientAction: 'Return401' // reject, don't redirect
        }
        identityProviders: {
          azureActiveDirectory: {
            enabled: true
            registration: {
              clientId: authClientId
              openIdIssuer: '${environment().authentication.loginEndpoint}${authTenantId}/v2.0'
            }
            validation: {
              allowedAudiences: [
                'api://${authClientId}'
              ]
            }
          }
        }
      }
    }

The piece that makes it MCP-compliant, not just "returns 401", is enabling PRM. That's one app setting that publishes the metadata document MCP clients look for:

    {
      name: 'WEBSITE_AUTH_PRM_DEFAULT_WITH_SCOPES'
      value: 'api://${authClientId}/user_impersonation'
    }

unauthenticatedClientAction: 'Return401' gives a clean 401 instead of a login redirect, and PRM turns that 401 into a discoverable OAuth challenge: the client follows the metadata, signs the user in, and retries with a valid token. Recall that 8.5% figure: this is the spec-compliant OAuth the other 91.5% are missing, and you got it from configuration, not code.

One gotcha worth calling out: when App Service creates the Entra registration for you, the default policy only accepts tokens the app itself obtained. For a real MCP client to connect, add its client id to the allowed-applications policy and preauthorize it on the app registration. (Entra has no Dynamic Client Registration, so the client ships a known client id; for VS Code / GitHub Copilot, preauthorization avoids a consent prompt the client won't surface.)

The bonus is that the validated identity is handed to your code. App Service injects the caller's claims into every forwarded request as the X-MS-CLIENT-PRINCIPAL headers, and crucially, it strips any client-supplied copy first, so they can't be forged.
The whoami tool just reads them:

    def _client_principal(request: Request) -> Dict[str, Any]:
        raw = request.headers.get("x-ms-client-principal")
        # base64-encoded JSON of the caller's claims, injected by Easy Auth
        ...
        return {"authenticated": bool(raw), "name": name, ...}

Your tools now know who's calling without you owning any of the token machinery.

2. Managed identity - stop storing the keys to the kingdom

The static-API-key habit is how secrets leak. Replace it with a system-assigned managed identity: App Service gets an Entra identity that Azure manages, and your code authenticates to Key Vault, Storage, or Azure OpenAI with no stored credential.

This matters for a subtle reason the MCP authorization guidance calls out explicitly: the token a client presents represents access to your server, not to Key Vault. Never forward it downstream; use the managed identity (or an on-behalf-of token) for that hop. Pass-through is a vulnerability; delegation is the fix, and the managed identity is how you delegate without holding a secret.

    resource web 'Microsoft.Web/sites@2024-04-01' = {
      identity: { type: 'SystemAssigned' }
      ...
    }

In Python, DefaultAzureCredential resolves to that identity automatically, so the same code runs locally against your az login and in Azure against the MI:

    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=KEY_VAULT_URI, credential=credential)

And least privilege is one role assignment.
The sample grants the identity exactly Key Vault Secrets User: read secret values, nothing else:

    var keyVaultSecretsUserRoleId = '4633458b-17de-408a-b874-0445c86b69e6'

    resource appSecretsUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
      name: guid(keyVault.id, appPrincipalId, keyVaultSecretsUserRoleId)
      scope: keyVault
      properties: {
        roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', keyVaultSecretsUserRoleId)
        principalId: appPrincipalId
        principalType: 'ServicePrincipal'
      }
    }

There is now no secret used to read secrets. That's the chain of custody you want.

3. Key Vault references - and a tool that won't leak them

Two pieces here. First, Key Vault references keep secrets out of your configuration. You point an app setting at a vault URI, and App Service resolves the value at runtime via the managed identity:

    {
      name: 'SECURE_CONFIG_VALUE'
      value: '@Microsoft.KeyVault(SecretUri=${secureConfigSecretUri})'
    }

The plaintext never appears in your repo, your Bicep, or the portal's app settings blade; it shows up as a resolved reference.

Second, and this is the part developers get wrong: a tool that reads a secret should never return the secret. The sample's read_secret_metadata proves the managed-identity path works end to end, then deliberately withholds the value:

    async def tool_read_secret_metadata(secret_name: str = "demo-secret"):
        secret = client.get_secret(secret_name)
        return {
            "available": True,
            "version": secret.properties.version,
            "value_length": len(secret.value),  # length, never the value
            "note": "Value intentionally withheld - metadata only.",
        }

If your MCP server has a get_secret tool that returns the secret, you've built a credential-exfiltration API with a friendly name. Return metadata; act on the value server-side.

The same discipline applies to input.
The safe_lookup tool matches against a fixed allow-list and refuses anything that smells like traversal or injection; it never touches a filesystem or a shell:

    suspicious = any(t in key for t in ("..", "/", "\\", ";", "|", "$(", "`"))
    if key in DOCS:
        return {"topic": key, "doc": DOCS[key], "found": True}
    return {"found": False, "rejected_as_suspicious": suspicious, ...}

safe_lookup("../../etc/passwd") comes back rejected_as_suspicious: true. That is the entire fix for a whole class of CVEs.

4. Private endpoints + APIM - take the server off the internet

Authentication is necessary but not sufficient. The strongest version of "don't expose your MCP server" is to not expose it: give the App Service and Key Vault private endpoints, disable public network access, and let API Management be the only public door.

    resource web 'Microsoft.Web/sites@2024-04-01' = {
      properties: {
        virtualNetworkSubnetId: appSubnetId  // outbound: reach KV's private endpoint
        publicNetworkAccess: 'Disabled'      // inbound: no public access at all
        ...
      }
    }

Now the App Service hostname returns nothing from the internet. The only ingress is the APIM gateway, which runs the security policy before traffic ever reaches the VNet: validate the Entra JWT, rate-limit per caller, and (the documented extension point) run a content-safety check:

    <inbound>
      <base />
      <validate-jwt header-name="Authorization" failed-validation-httpcode="401">
        <openid-config url="https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration" />
        <required-claims>
          <claim name="aud"><value>api://{clientId}</value></claim>
        </required-claims>
      </validate-jwt>
      <rate-limit-by-key calls="60" renewal-period="60" counter-key="@(context.Request.IpAddress)" />
      <set-backend-service backend-id="mcp-backend" />
    </inbound>

This is defense in depth: APIM validates the token, and Easy Auth validates it again at the app.
An attacker has to get past a public gateway with JWT enforcement and rate limiting just to reach a private endpoint that also demands a valid token. Compare that to the median MCP server, which is a raw port on the internet.

The honest trade-off: this is a security reference architecture, not a 60-second demo. APIM takes ~30-45 minutes to provision, and because the app is private, you test through the gateway, not the App Service hostname. That friction is the point; it's the same friction an attacker hits.

5. Monitoring - see the abuse before it's an incident

The last layer is visibility. The Azure Monitor OpenTelemetry distro auto-instruments FastAPI, and the audit_event tool emits a structured custom event per call. A scheduled-query alert watches the rate of those events and fires when tool invocations spike, which is the signature of an agent looping over a sensitive tool, or someone probing the surface:

    criteria: {
      allOf: [{
        query: 'customEvents | where name == "mcp_tool_audit" | summarize calls = count() by bin(timestamp, 5m)'
        metricMeasureColumn: 'calls'
        operator: 'GreaterThan'
        threshold: 100
      }]
    }

Tune the threshold to your baseline. The point is that "is someone hammering my credential-reading tool?" becomes an alert, not a forensic exercise after the fact.

Deploy it

    azd auth login
    azd up

A preprovision hook creates the Entra ID app registration and stashes its client id in the azd environment, so Easy Auth and the APIM policy wire themselves up. Then Bicep provisions the VNet, private endpoints, Key Vault, App Service, APIM, and the monitoring stack. (Grab a coffee for the APIM step.)
To verify, get a token and call through the gateway:

    TOKEN=$(az account get-access-token \
      --resource "api://$(azd env get-value AZURE_AUTH_CLIENT_ID)" \
      --query accessToken -o tsv)

    curl -s -X POST "$(azd env get-value APIM_MCP_URL)" \
      -H "Authorization: Bearer $TOKEN" -H 'content-type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"whoami","arguments":{}}}'

The response shows the authenticated principal. Drop the token and you get a 401 from APIM, which is exactly what you want an unauthenticated caller to see.

To let a real MCP client (VS Code, Claude) sign users in itself rather than pasting a bearer token, point it at the same URL: PRM is already published, so the client discovers the auth server and runs the OAuth flow. Just make sure its app id is allowed (azd env set AZURE_MCP_CLIENT_APP_ID <client-id> before azd up adds it to the allowed-applications policy) and preauthorize it on the server's app registration so clients that don't surface a consent prompt can connect.

Once it's deployed and you've verified it, take the App Service off the public internet with a one-line flip: azd env set LOCK_DOWN_WEB_APP true && azd provision. (The first deploy keeps public access on just long enough to push the code, because a fully-private app can only be deployed from inside its VNet. The sample's README walks through both phases.)

Why this matters

MCP is going to be the USB-C of agent tooling, and right now most of the connectors are unauthenticated and exposed. The CVEs aren't hypothetical; they have numbers and CVSS scores. But the fix isn't a research project. On App Service, the perimeter is mostly configuration: flip on Easy Auth, use a managed identity, reference Key Vault, go private, front it with APIM, and alert on the logs. That's the difference between "I shipped an MCP server" and "I shipped an MCP server I'd put in production."
If you're hosting MCP, especially anywhere a compliance auditor will eventually look, start from the secure shape, not the demo shape.

Try it

- Sample repo: github.com/seligj95/app-service-secure-mcp
- Astrix, State of MCP Server Security 2025: astrix.security
- MCP authorization spec: modelcontextprotocol.io
- App Service authentication: learn.microsoft.com

MCP Just Went Stateless - What the 2026 Spec Changes About Scaling on App Service
A couple of months ago I wrote about scaling MCP servers behind App Service's built-in load balancer. The trick back then was to lean on stateless HTTP transport so any instance could serve any request, and to make sure you turned off ARR affinity so the load balancer was actually free to spread traffic around.

That post still works. But the MCP spec just caught up to it in a big way. The 2026-07-28 release candidate is the largest revision of the Model Context Protocol since it launched, and the headline change is exactly the thing we were working around: MCP is now stateless at the protocol layer. The handshake is gone, the session header is gone, and the sticky-routing-and-shared-session-store dance that horizontal deployments used to need is no longer part of the protocol at all.

If you're hosting an MCP server on App Service, this is good news, and it means a few of the steps from my last post are now things the protocol does for you. Here's what changed, and what (if anything) you need to do about it.

Heads up on timing: 2026-07-28 is a release candidate as I write this; the final spec ships July 28, 2026. It contains breaking changes, so treat this as "get ready" guidance rather than "rip everything out today."

Quick recap: how we scaled MCP before

In the original post, the recipe looked like this:

1. Run the MCP server in stateless HTTP mode (the 2025-11-25 transport).
2. Scale App Service out to N instances (the sample used three).
3. Set clientAffinityEnabled: false so there's no ARR affinity cookie pinning a client to one instance.
If you genuinely needed cross-request state, externalize it, typically into Azure Cache for Redis, so every instance saw the same data.
Watch traffic spread across instances in Application Insights via cloud_RoleInstance.

The catch: even in "stateless HTTP" mode, the 2025-11-25 protocol still started every connection with an initialize handshake and handed back an Mcp-Session-Id that the client had to send on every follow-up request. That session ID pinned a client to whichever instance issued it, so to scale cleanly you either kept affinity on (and gave up even load balancing) or did real work to share session state across instances. That's the part the 2026 spec deletes.

What the 2026 spec actually changes

The handshake and the session are gone

Two proposals do the heavy lifting:

SEP-2575 removes the initialize / initialized handshake. The protocol version, client info, and client capabilities that used to be exchanged once at connect time now ride along in _meta on every request. A new server/discover method lets a client ask for server capabilities when it actually wants them.
SEP-2567 removes the Mcp-Session-Id header and the protocol-level session that came with it.

With both gone, any MCP request can land on any instance. The sticky routing and shared session stores that horizontal deployments needed before just aren't required at the protocol layer anymore.

Here's the before and after, straight from the spec.
In 2025-11-25, the client POSTs an initialize call to /mcp first and gets a session ID back:

{"jsonrpc":"2.0","id":1,"method":"initialize",
 "params":{"protocolVersion":"2025-11-25","capabilities":{},
           "clientInfo":{"name":"my-app","version":"1.0"}}}

...then every later call has to carry the Mcp-Session-Id header the server handed back, which pins it to that instance:

{"jsonrpc":"2.0","id":2,"method":"tools/call",
 "params":{"name":"search","arguments":{"q":"otters"}}}

In 2026-07-28, the same tool call is one self-contained request that any instance can answer. The routing info rides in headers (MCP-Protocol-Version, Mcp-Method, and Mcp-Name) and the body carries everything else:

{"jsonrpc":"2.0","id":1,"method":"tools/call",
 "params":{"name":"search","arguments":{"q":"otters"},
           "_meta":{"io.modelcontextprotocol/clientInfo":{"name":"my-app","version":"1.0"}}}}

No handshake, no session ID, nothing to pin.

Traffic you can route and cache at the edge

A few smaller changes make this traffic much friendlier to the infrastructure App Service already gives you:

Routable headers (SEP-2243): Streamable HTTP now requires Mcp-Method and Mcp-Name headers, so load balancers, gateways, and rate-limiters can route or throttle on the operation without cracking open the request body. (Servers reject requests where the headers and body disagree.)
Cacheable lists (SEP-2549): tools/list and resource-read results now carry ttlMs and cacheScope, modeled on HTTP Cache-Control. Clients know exactly how long a tool list is fresh and whether it's safe to share across users: no more holding an SSE stream open just to learn the list changed.
Traceable calls (SEP-414): W3C Trace Context (traceparent, tracestate, baggage) propagation in _meta is now documented with fixed key names.
A trace that starts in the host app can follow a tool call through the client SDK, your MCP server, and whatever it calls downstream, and show up as one span tree in any OpenTelemetry backend, including Application Insights.

That last one pairs really nicely with the App Insights setup from the original sample, which already tags spans with cloud_RoleInstance.

Why this is easier on App Service now

App Service's built-in load balancer has always wanted to round-robin your requests. The thing stopping it from doing that cleanly with MCP was the protocol's own session affinity. Now that the protocol is stateless:

No affinity tuning to reason about. You still want clientAffinityEnabled: false, but there's no longer a protocol session fighting it.
Any instance serves any request, for real. Scale from 3 to 10 instances and the load balancer just spreads the work, with no shared session store required for protocol state.
Less Redis glue. In the old model, Redis was often there to share protocol session state. That reason is gone (see the next section for what Redis is still great for).

"Stateless protocol" doesn't mean "stateless app"

This is the part I want to be really clear about, because it's easy to over-read the headline. Removing the protocol session does not mean your application can't have state. It means the protocol stops carrying state for you. If your server needs to remember something across calls, you do what HTTP APIs have always done: mint an explicit handle and let the model pass it back as an argument. The spec calls this the explicit-handle pattern.
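One way to picture an explicit handle that works on any instance is to sign it, so the state travels inside the handle and no instance has to look anything up. What follows is a minimal sketch under stated assumptions, not the spec's wording or any sample's actual code: the names mint_handle, read_handle, and tally are hypothetical, the handle format is made up for illustration, and it assumes every instance shares the same secret (for example through an app setting).

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: every instance is configured with the same secret value.
SECRET = b"replace-with-a-shared-secret"


def mint_handle(state: dict) -> str:
    """Encode state and append an HMAC so any instance can verify it later."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return f"{payload.decode()}.{sig}"


def read_handle(handle: str) -> dict:
    """Check the signature and return the embedded state, or raise ValueError."""
    payload, _, sig = handle.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        raise ValueError("handle signature mismatch")
    return json.loads(base64.urlsafe_b64decode(payload))


def tally(amount: int, handle: Optional[str] = None) -> dict:
    """Hypothetical tool: add amount to the total carried in the handle."""
    total = read_handle(handle)["total"] if handle else 0
    total += amount
    # Return a fresh handle; the caller passes it back as an ordinary argument.
    return {"total": total, "handle": mint_handle({"total": total})}
```

Calling tally(10), then tally(5, handle) with the handle that came back, gives the same totals no matter which process handles each call. A signed handle like this does not stop an old handle from being replayed, so anything that needs that guarantee still belongs in a shared store.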
A tool returns a basket_id (or browser_id, or whatever), and later calls include that ID as a normal parameter:

// 1) create returns a handle
{"name": "create_basket", "arguments": {}}
// -> { "basket_id": "b_12345" }
// 2) later calls pass it back as an ordinary argument
{"name": "add_item", "arguments": {"basket_id": "b_12345", "sku": "ABC"}}

The nice side effect: the model can see the handle, compose it across tools, and hand it off between steps, in ways that session state hidden in transport metadata never really allowed.

So where does Redis fit now? Exactly where it always belonged: your application's data, not the protocol's plumbing:

Backing store for those explicit handles (what's actually in basket b_12345).
Caching expensive lookups or model responses across instances.
App-level conversation memory or rate-limit counters.

Stateless protocol, stateful application. You externalize state because your app needs it shared, not because the transport forces you to.

Migrating an existing MCP server on App Service

If you deployed the original sample (or something like it), here's the punch list to get to the 2026 model. The good news: the App Service / infra side barely changes; most of the work is in the protocol layer your SDK handles for you.

App Service config (mostly already done):

Keep clientAffinityEnabled: false. (Still the right call.)
Keep scaling out to N instances. Nothing here changes.
Keep Application Insights + OpenTelemetry, and lean into the new Trace Context key names for cleaner end-to-end traces.

Protocol layer (the real work):

Update to an SDK build that speaks 2026-07-28. The handshake and session handling go away; your server reads protocol version and client info from _meta per request instead of from an initialize exchange.
Emit ttlMs / cacheScope on tools/list and resource reads so clients (and your gateway) can cache them.
Make sure your server honors / validates the Mcp-Method and Mcp-Name headers.
If you were storing anything keyed off Mcp-Session-Id, move it to the explicit-handle pattern (handle in, handle out, state in Redis/Cosmos/etc.).
Audit for the breaking bits: tasks/list is removed, Roots/Sampling/Logging are deprecated, and the "resource not found" error code moves from -32002 to the standard -32602.

I built a standalone companion sample for exactly this: the 2026-07-28 version of the original, with the handshake gone, everything read from _meta, server/discover implemented, and the explicit-handle pattern shown in a real tool. Link below.

Try it yourself

I built a companion sample for this post: a FastAPI MCP server that speaks 2026-07-28 natively (no handshake, no session), running on three App Service instances behind the built-in load balancer, with a staging slot, App Insights, a spec-compliant client, and a k6 load test:

seligj95/app-service-mcp-stateless-scale-2026-python

azd auth login
azd up

That provisions a Premium v3 plan with capacity: 3, the web app with clientAffinityEnabled: false, a staging slot, and Log Analytics + Application Insights. No initialize, no Mcp-Session-Id anywhere: discovery is a single server/discover call, and every request carries its own protocol version and client info in _meta.

The part I like best is the tally tool. It keeps a running total across calls using an explicit, signed handle instead of a session, so you can watch the total stay correct even as the load balancer routes each call to a different instance:

+10  -> total=10  served_by=2103650c...
+5   -> total=15  served_by=08fc7022... (different instance, total still right)
+100 -> total=115 served_by=08fc7022...

That's the stateless handle pattern from earlier, made concrete: state travels with the request, not the connection.

Then watch the load spread in Application Insights:

requests
| where timestamp > ago(15m)
| where name contains "/mcp"
| summarize count() by cloud_RoleInstance

Want the 2025-11-25 version for comparison?
That's the original Part 1 sample: seligj95/app-service-mcp-stateless-scale-python. Diff the two main.py files and you can see the handshake and session handling simply disappear.

The takeaway

When I wrote the first post, "make MCP stateless so App Service can load-balance it" was a pattern you had to apply. With the 2026 spec, it's just how MCP works. The protocol deleted the exact friction we were routing around, which means hosting a horizontally scaled MCP server on App Service is now closer to "deploy a normal web app and scale it out" than ever.

If you're already running MCP on App Service: you did the hard part early. The spec just made it official.

Got an MCP server running on App Service? I'd love to hear how the migration goes; drop a comment.

Announcing the Public Preview of the New App Service Quota Self-Service Experience
Update 10/30/2025: The App Service Quota Self-Service experience is back online after a short period where we were incorporating your feedback and making needed updates. As this is public preview, availability and features are subject to change as we receive and incorporate feedback.

What's New?

The updated experience introduces a dedicated App Service Quota blade in the Azure portal, offering a streamlined and intuitive interface to:

View current usage and limits across the various SKUs
Set custom quotas tailored to your App Service plan needs

This new experience empowers developers and IT admins to proactively manage resources, avoid service disruptions, and optimize performance.

Quick Reference - Start here!

Leverage the new self-service experience to increase your quota automatically.
If your deployment requires quota for ten or more subscriptions, then file a support ticket with problem type Quota following the instructions at the bottom of this post.
If any subscription included in your request requires zone redundancy (note that most Isolated v2 deployments require ZR), then file a support ticket with problem type Quota following the instructions at the bottom of this post.

Self-service Quota Requests

For non-zone-redundant needs, quota alone is sufficient to enable App Service deployment or scale-out. Follow the provided steps to place your request.

1. Navigate to the Quotas resource provider in the Azure portal

2. Select App Service (Public Preview)

Navigating the primary interface:

Each App Service VM size is represented as a separate SKU. If the intention is to be able to scale up or down within a specific offering (e.g., Premium v3), then an equivalent number of VMs needs to be requested for each applicable size of that offering (e.g., request 5 instances for both P1v3 and P3v3).
As with other quotas, you can filter by region, subscription, provider, or usage. Note that your portal will now show "App Service (Public Preview)" for the Provider name.
You can also group the results by usage, quota (App Service VM type), or location (region).
Current usage is represented as App Service VMs. This allows you to quickly identify which SKUs are nearing their quota limits.
Adjustments can be made inline: no need to visit another page. This is covered in detail in the next section.

Total Regional VMs:

There is a SKU in each region called Total Regional VMs. This SKU summarizes your usage and available quota across all individual SKUs in that region. There are three key points about using Total Regional VMs.

You should never request Total Regional VMs quota directly - it will automatically increase in response to your request for individual SKU quota.
If you are unable to deploy a given SKU, then you must request more quota for that SKU to unblock deployment.
For your deployment to succeed, you must have sufficient quota in the individual SKU as well as Total Regional VMs. If either usage is at its respective limit, then you will be unable to deploy and must request more of that individual SKU's quota to proceed.

In some regions, Total Regional VMs appears as "0 of 0" usage and limit and no individual SKU quotas are shown. This is an indication that you should not interact with the portal to resolve any quota-related issues in this region. Instead, you should try the deployment and observe any error messages that arise. If any error messages indicate more quota is needed, then this must be requested by filing a support ticket with problem type Quota following the instructions at the bottom of this post so that App Service can identify and fix any potential quota issues. In most cases, this will not be necessary, and your deployment will work without requesting quota wherever "0 of 0" is shown for Total Regional VMs and no individual SKU quotas are visible. See the example below:

3. Request quota adjustments

Clicking the pen icon opens a flyout window to capture the quota request:

The quota type (App Service SKU) is already populated, along with current usage. Note that your request is not incremental: you must specify the new limit that you wish to see reflected in the portal. For example, to request two additional instances of P1v2 VMs, you would file the request like this:

Click submit to send the request for automatic processing.

How quota approvals work:

Immediately upon submitting a quota request, you will see a processing dialog like the one shown:

If the quota request can be automatically fulfilled, then no support request is needed. You should receive this confirmation within a few minutes of submission:

If the request cannot be automatically fulfilled, then you will be given the option to file a support request with the same information. In the example below, the requested new limit exceeds what can be automatically granted for the region:

4. If applicable, create support ticket

If automatic quota fulfillment fails and you are prompted to "Create a support request", then follow the steps given at the end of this post.

Known issues

The self-service quota request experience for App Service is in public preview. Here are some caveats worth mentioning while the team finalizes the release for general availability:

Closing the quota request flyout window will stop meaningful notifications for that request. You can still view the outcome of your quota requests by checking actual quota, but if you want to rely on notifications for alerts, then we recommend leaving the quota request window open for the few minutes that it is processing.
Some SKUs are not yet represented in the quota dashboard. These will be added later in the public preview.
The Activity Log does not currently provide a meaningful summary of previous quota requests and their outcomes. This will also be addressed during the public preview.
As noted in the walkthrough, the new experience does not enable zone-redundant deployments. Quota is an inherently regional construct, and zone-redundant enablement requires a separate step that can only be taken in response to a support ticket being filed.
Quota API documentation is being drafted to enable bulk non-zone-redundant quota requests without requiring you to file a support ticket.

Create a support request

While we are continuously improving the system to automatically process quota requests, there are certain scenarios in which you might need to create a support request:

The automatic fulfillment request failed on the quota blade.
Your deployment requires zone redundancy.
You want to make a bulk request for ten or more subscriptions.

When creating a support request, select your Issue type as "Service and subscription limits (quotas)" and Quota type as "Function or Web App (Windows and Linux)". Select Next. You can then fill in your quota requirements by clicking on "Enter details".

There are 4 mandatory fields you must provide in your request:

Region: Quota limits are based on region. If you are facing quota limits in one region, you can always try deployment in a geographically paired region. For example, West US 2 and West Central US are paired regions. East Asia (Hong Kong) and Southeast Asia (Singapore) are also paired regions. See Azure cross-region replication pairings for all geographies for more information.
Deployment type: This is another important criterion when submitting a quota request. If you are not sure which deployment type your App Service Plan is using, you can check it here on the portal: App >> App Service Plan >> Zone redundant >> Enabled\Disabled.
App Service plan: Each SKU in your subscription and the region selected above will have its own limit. Choose the SKU based on your deployment requirement. You will be able to see the current usage and the limit on that SKU.
New limit: You must submit the new limit that you want for the SKU selected above.
Do not add the increment value. The new limit must be higher than the existing limit.

If you choose to create a support ticket, then you will interact with the capacity management team for that region. This is a 24x7 service, so requests may be created at any time. Once you have filed the support request, you can track its status via the Help + support dashboard.

Note for Logic Apps

You can now self-serve your Logic App quota requirements using the same App Services quotas blade. You must choose one of the Logic App SKUs (WS1, WS2, WS3) when making the request, and it will be processed in the same way App Services requests are processed.

We want your feedback!

If you notice any aspect of the experience that does not work as expected, or you have feedback on how to make it better, please use the comments below to share your thoughts!

Learn JavaScript with this series of videos for beginners
Learning a new framework or development environment is made even more difficult when you don't know the programming language. To help you with that, we've created this series of videos to focus on the core concepts of JavaScript.
Debug App Startup Faster on Azure App Service for Linux with Startup Logs
When an app fails to start on Azure App Service for Linux, one of the first things you need is visibility into what happened during startup. This can include container initialization, runtime setup, startup command execution, application output, and warmup probe results. To make this easier, we have added new Azure CLI commands that let you list and view App Service startup logs directly from the command line.

List available startup logs

You can list startup logs for an app using:

az webapp log startup list \
  --name <app-name> \
  --resource-group <resource-group>

The output shows whether the startup attempt succeeded or failed, along with the instance name and log file size. This helps you quickly identify the right log file, especially when there are multiple startup attempts across different instances.

Show startup log content

To view the latest startup log, run:

az webapp log startup show \
  --name <app-name> \
  --resource-group <resource-group>

You can also view a specific log file by name:

az webapp log startup show \
  --name <app-name> \
  --resource-group <resource-group> \
  --log-file-name <log-file-name>

The log content includes startup events from the platform and the application. For example, you can see the container image being pulled, the startup script being generated, the app command being run, and the warmup probe result. In a successful startup, the log shows that the site startup probe succeeded and the site started successfully.

Failure logs are prioritized by default

When you run az webapp log startup show without specifying a log file name, the command automatically prefers failure logs from the most recent date. This helps reduce the time spent looking for the right log when debugging startup failures. Instead of manually searching through multiple files, you can run one command and immediately see the most relevant failure details.
For example, if the app fails because the worker process does not start within the allotted time, the log shows the timeout details and the platform actions taken during startup cancellation.

Better hints for common startup failures

The command also includes improved handling for common failure scenarios, including runtime startup failures and container startup timeouts. For example, if the app starts but does not respond on the expected port, the startup log may show application output such as:

listening on 3000 (wrong port)

while the platform is expecting the app to respond on a different port. This makes it much easier to understand why the warmup probe failed.

Slot support

The startup log commands also support deployment slots.

To list startup logs for a slot:

az webapp log startup list \
  --name <app-name> \
  --resource-group <resource-group> \
  --slot <slot-name>

To show startup logs for a slot:

az webapp log startup show \
  --name <app-name> \
  --resource-group <resource-group> \
  --slot <slot-name>

This is useful when debugging slot-specific startup issues before swapping traffic to production.

Summary

The new az webapp log startup commands make it easier to inspect startup behavior for Azure App Service for Linux apps directly from Azure CLI. These commands are currently in preview. Try them out the next time you need to understand why your App Service Linux app did or did not start successfully.
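For scripted triage (say, checking a slot before a swap), the commands above can be wrapped in a few lines of Python. This is a minimal sketch, assuming the Azure CLI is installed, on PATH, and already logged in. The helper names startup_log_command and run are hypothetical, and it uses only the flags shown in this post; because the commands are in preview, those flags may change.

```python
import subprocess
from typing import List, Optional


def startup_log_command(action: str, app: str, resource_group: str,
                        slot: Optional[str] = None,
                        log_file_name: Optional[str] = None) -> List[str]:
    """Build the argument list for `az webapp log startup list|show`."""
    if action not in ("list", "show"):
        raise ValueError("action must be 'list' or 'show'")
    cmd = ["az", "webapp", "log", "startup", action,
           "--name", app, "--resource-group", resource_group]
    if slot:
        cmd += ["--slot", slot]
    if log_file_name:
        # The post only shows --log-file-name together with `show`.
        if action != "show":
            raise ValueError("log_file_name only applies to 'show'")
        cmd += ["--log-file-name", log_file_name]
    return cmd


def run(cmd: List[str]) -> str:
    """Run the command and return its stdout; raises if the CLI exits non-zero."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run(startup_log_command("show", "my-app", "my-rg", slot="staging")) would return the log text that the CLI prints for the staging slot, where my-app, my-rg, and staging are placeholder names.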