azure app service

504 Topics

Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit
Part 3 of 3 — Multi-Agent AI on Azure App Service In Blog 1, we built a multi-agent travel planner with Microsoft Agent Framework 1.0 on App Service. In Blog 2, we added observability with OpenTelemetry and the new Application Insights Agents view. Now in Part 3, we secure those agents for production with the Microsoft Agent Governance Toolkit. This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first. The governance gap Our travel planner works. It's observable. But here's the question I'm hearing from customers: "How do I make sure my agents don't do something they shouldn't?" It's a fair question. Our six agents — Coordinator, Currency Converter, Weather Advisor, Local Knowledge, Itinerary Planner, and Budget Optimizer — can call external APIs, process user data, and make autonomous decisions. In a demo, that's impressive. In production, that's a risk surface. Consider what can go wrong with ungoverned agents: Unauthorized API calls — An agent calls an external API it was never intended to use, leaking data or incurring costs Sensitive data exposure — An agent passes PII to a third-party service without consent controls Runaway token spend — A recursive agent loop burns through your OpenAI budget in minutes Tool misuse — A prompt injection tricks an agent into executing a tool it shouldn't Cascading failures — One agent's error propagates through the entire multi-agent workflow These aren't theoretical. In December 2025, OWASP published the Top 10 for Agentic Applications — the first formal taxonomy of risks specific to autonomous AI agents, including goal hijacking, tool misuse, identity abuse, memory poisoning, and rogue agents. Regulators are paying attention too: the EU AI Act's high-risk AI obligations take effect in August 2026, and the Colorado AI Act becomes enforceable in June 2026. The bottom line: if you're running agents in production, you need governance. Not eventually — now. What the Agent Governance Toolkit does The Agent Governance Toolkit is an open-source project (MIT license) from Microsoft that brings runtime security governance to autonomous AI agents. It's the first toolkit to address all 10 OWASP agentic AI risks with deterministic, sub-millisecond policy enforcement. The toolkit is organized into 7 packages: Package What it does Think of it as... Agent OS Stateless policy engine, intercepts every action before execution (<0.1ms p99) The kernel for AI agents Agent Mesh Cryptographic identity (DIDs), inter-agent trust protocol, dynamic trust scoring mTLS for agents Agent Runtime Execution rings (like CPU privilege levels), saga orchestration, kill switch Process isolation for agents Agent SRE SLOs, error budgets, circuit breakers, chaos engineering SRE practices for agents Agent Compliance Automated governance verification, regulatory mapping (EU AI Act, HIPAA, SOC2) Compliance-as-code Agent Marketplace Plugin lifecycle management, Ed25519 signing, supply-chain security Package manager security Agent Lightning RL training governance with policy-enforced runners Safe training guardrails The toolkit is available in Python, TypeScript, Rust, Go, and .NET. It's framework-agnostic — it works with MAF, LangChain, CrewAI, Google ADK, and more. For our ASP.NET Core travel planner, we'll use the .NET SDK via NuGet ( Microsoft.AgentGovernance ). For this blog, we're focusing on three packages: Agent OS — the policy engine that intercepts and evaluates every agent action Agent Compliance — regulatory mapping and audit trail generation Agent SRE — SLOs and circuit breakers for agent reliability How easy it was to add governance Here's the part that surprised me. I expected adding governance to a production agent system to be a multi-hour effort — new infrastructure, complex configuration, extensive refactoring. Instead, it took about 30 minutes. Here's exactly what we changed: Step 1: Add NuGet packages Three packages added to TravelPlanner.Shared.csproj : <itemgroup>  <packagereference include="Azure.Monitor.OpenTelemetry.AspNetCore" version="1.3.0"> <packagereference include="Microsoft.Agents.AI" version="1.0.0">  <packagereference include="Microsoft.AgentGovernance" version="3.0.2"> </packagereference></packagereference></packagereference></itemgroup> Step 2: Create the policy file One new file: governance-policies.yaml in the project root. This is where all your governance rules live: apiVersion: governance.toolkit/v1 name: travel-planner-governance description: Policy enforcement for the multi-agent travel planner on App Service scope: global defaultAction: deny rules: - name: allow-currency-conversion condition: "tool == 'ConvertCurrency'" action: allow priority: 10 description: Allow Currency Converter agent to call Frankfurter exchange rate API - name: allow-weather-forecast condition: "tool == 'GetWeatherForecast'" action: allow priority: 10 description: Allow Weather Advisor agent to call NWS forecast API - name: allow-weather-alerts condition: "tool == 'GetWeatherAlerts'" action: allow priority: 10 description: Allow Weather Advisor agent to check NWS weather alerts Step 3: One line in BaseAgent.cs This is the moment. Here's our BaseAgent.cs before: Agent = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName) .Build(); And after: var kernel = serviceProvider.GetService<GovernanceKernel>(); if (kernel is not null) builder.UseGovernance(kernel, AgentName); Agent = builder.Build(); One line of intent, two lines of null-safety. The .UseGovernance(kernel, AgentName) call intercepts every tool/function invocation in the agent's pipeline, evaluating it against the loaded policies before execution. If the GovernanceKernel isn't registered (governance disabled), agents work exactly as before — no crash, no code change needed. Here's the full updated constructor using IServiceProvider to optionally resolve governance: using AgentGovernance; using Microsoft.Extensions.DependencyInjection; public abstract class BaseAgent : IAgent { protected readonly ILogger Logger; protected readonly AgentOptions Options; protected readonly AIAgent Agent; // Constructor for simple agents without tools protected BaseAgent( ILogger logger, IOptions<AgentOptions> options, IChatClient chatClient, IServiceProvider serviceProvider) { Logger = logger; Options = options.Value; var builder = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName); var kernel = serviceProvider.GetService<GovernanceKernel>(); if (kernel is not null) builder.UseGovernance(kernel, AgentName); Agent = builder.Build(); } // Constructor for agents with tools protected BaseAgent( ILogger logger, IOptions<AgentOptions> options, IChatClient chatClient, ChatOptions chatOptions, IServiceProvider serviceProvider) { Logger = logger; Options = options.Value; var builder = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description, tools: chatOptions.Tools?.ToList()) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName); var kernel = serviceProvider.GetService<GovernanceKernel>(); if (kernel is not null) builder.UseGovernance(kernel, AgentName); Agent = builder.Build(); } // ... rest unchanged } Step 4: DI registrations in Program.cs A few lines to wire up governance in the dependency injection container: using AgentGovernance; // ... existing builder setup ... // Configure OpenTelemetry with Azure Monitor (existing — from Blog 2) builder.Services.AddOpenTelemetry().UseAzureMonitor(); // NEW: Configure Agent Governance Toolkit // Load policy from YAML, register as singleton. Agents resolve via IServiceProvider. var policyPath = Path.Combine(builder.Environment.ContentRootPath, "governance-policies.yaml"); if (File.Exists(policyPath)) { try { var yaml = File.ReadAllText(policyPath); var kernel = new GovernanceKernel(new GovernanceOptions { EnableAudit = true, EnableMetrics = true }); kernel.LoadPolicyFromYaml(yaml); builder.Services.AddSingleton(kernel); Console.WriteLine($"[Governance] Loaded policies from {policyPath}"); } catch (Exception ex) { Console.WriteLine($"[Governance] Failed to load: {ex.Message}. Running without governance."); } } That's it. Your agents are now governed. Let me repeat that because it's the core message of this blog: we added production governance to a six-agent system by adding one NuGet package, creating one YAML policy file, adding a few lines to our base agent class, and registering the governance kernel in DI. No new infrastructure. No complex rewiring. No multi-sprint project. If you followed Blog 1 and Blog 2, you can do this in 30 minutes. Policy flexibility deep-dive The YAML policy language is intentionally simple to start with, but it supports real complexity when you need it. Let's walk through what each policy in our file does. API allowlists and blocklists Our travel planner calls two external APIs: Frankfurter (currency exchange) and the National Weather Service. The defaultAction: deny combined with explicit allow rules ensures agents can only call these approved tools. If an agent attempts to call any other function — whether through a prompt injection or a bug — the call is blocked before it executes: defaultAction: deny rules: - name: allow-currency-conversion condition: "tool == 'ConvertCurrency'" action: allow priority: 10 - name: allow-weather-forecast condition: "tool == 'GetWeatherForecast'" action: allow priority: 10 When a blocked call happens, you'll see output like this in your logs: [Governance] Tool call 'DeleteDatabase' blocked for agent 'LocalKnowledgeAgent': No matching rules; default action is deny. Condition language The condition field supports equality checks, pattern matching, and boolean logic. You can match on tool name, agent ID, or any key in the evaluation context: # Match a specific tool condition: "tool == 'ConvertCurrency'" # Match multiple tools with OR condition: "tool == 'GetWeatherForecast' or tool == 'GetWeatherAlerts'" # Match by agent condition: "agent == 'CurrencyConverterAgent' and tool == 'ConvertCurrency'" Priority and conflict resolution When multiple rules match, the toolkit evaluates by priority (higher number = higher priority). A deny rule at priority 100 will override an allow rule at priority 10. This lets you layer broad allows with specific denies: rules: - name: allow-all-weather-tools condition: "tool == 'GetWeatherForecast' or tool == 'GetWeatherAlerts'" action: allow priority: 10 - name: block-during-maintenance condition: "tool == 'GetWeatherForecast'" action: deny priority: 100 description: Temporarily block NWS calls during API maintenance Advanced: OPA Rego and Cedar The YAML policy language handles most scenarios, but for teams with advanced needs, the toolkit also supports OPA Rego and Cedar policy languages. You can mix them — use YAML for simple rules and Rego for complex conditional logic: # policies/advanced.rego — Example: time-based access control package travel_planner.governance default allow_tool_call = false allow_tool_call { input.agent == "CurrencyConverterAgent" input.tool == "get_exchange_rate" time.weekday(time.now_ns()) != "Sunday" # Markets closed } Start simple with YAML. Add complexity only when you need it. Why App Service for governed agent workloads You might be wondering: why does hosting platform matter for governance? It matters a lot. The governance toolkit handles the application-level policies, but a production agent system also needs platform-level security, networking, identity, and deployment controls. App Service gives you these out of the box. Managed Identity Governance policies enforce what agents can access. Managed Identity handles how they authenticate — without secrets to manage, rotate, or leak. Our travel planner already uses DefaultAzureCredential for Azure OpenAI, Cosmos DB, and Service Bus. Governance layers on top of this identity foundation. VNet Integration + Private Endpoints The governance toolkit enforces API allowlists at the application level. App Service's VNet integration and private endpoints enforce network boundaries at the infrastructure level. This is defense in depth: even if a governance policy is misconfigured, the network layer prevents unauthorized egress. Your agents can only reach the networks you've explicitly allowed. Easy Auth App Service's built-in authentication (Easy Auth) protects your agent APIs without custom code. Before a request even reaches your governance engine, App Service has already validated the caller's identity. No custom auth middleware. No JWT parsing. Just toggle it on. Deployment Slots This is underrated for governance. With deployment slots, you can test new governance policies in a staging slot before swapping to production. Deploy updated governance-policies.yaml to staging, run your test suite, verify the policies work as expected, and then swap. Zero-downtime policy updates with full rollback capability. App Insights integration Governance audit events flow into the same Application Insights instance we configured in Blog 2. This means your governance decisions appear alongside your OTel traces in the Agents view. One pane of glass for agent behavior and governance enforcement. Always-on + WebJobs Our travel planner uses WebJobs for long-running agent workflows. With App Service's Always-on feature, those workflows stay warm, and governance is continuous — no cold-start gaps where agents run unmonitored. azd deployment One command deploys the full governed stack — application code, governance policies, infrastructure, and monitoring: azd up App Service gives you the enterprise production features governance needs — identity, networking, observability, safe deployment — out of the box. The governance toolkit handles agent-level policy enforcement; App Service handles platform-level security. Together, they're a complete governed agent platform. Governance audit events in App Insights In Blog 2, we set up OpenTelemetry and the Application Insights Agents view to monitor agent behavior. With the governance toolkit, those same traces now include governance audit events — every policy decision is recorded as a span attribute on the agent's trace. When you open a trace in the Agents view, you'll see governance events inline: Policy: api-allowlist → ALLOWED — CurrencyConverterAgent called Frankfurter API, permitted Policy: token-budget → ALLOWED — Request used 3,200 tokens, within per-request limit of 8,000 Policy: rate-limit → THROTTLED — WeatherAdvisorAgent exceeded 60 calls/min, request delayed For deeper analysis, use KQL to query governance events directly. Here's a query that finds all policy violations in the last 24 hours: // Find all governance policy violations in the last 24 hours traces | where timestamp > ago(24h) | where customDimensions["governance.decision"] != "ALLOWED" | extend agentName = tostring(customDimensions["agent.name"]), policyName = tostring(customDimensions["governance.policy"]), decision = tostring(customDimensions["governance.decision"]), violationReason = tostring(customDimensions["governance.reason"]), targetUrl = tostring(customDimensions["tool.target_url"]) | project timestamp, agentName, policyName, decision, violationReason, targetUrl | order by timestamp desc And here's one for tracking token budget consumption across agents: // Token budget consumption by agent over the last hour customMetrics | where timestamp > ago(1h) | where name == "governance.tokens.consumed" | extend agentName = tostring(customDimensions["agent.name"]) | summarize totalTokens = sum(value), avgTokensPerRequest = avg(value), maxTokensPerRequest = max(value) by agentName, bin(timestamp, 5m) | order by totalTokens desc This is the power of integrating governance with your existing observability stack. You don't need a separate governance dashboard — everything lives in the same App Insights workspace you already know. SRE for agents The Agent SRE package brings Site Reliability Engineering practices to agent systems. This was the part that got me most excited, because it addresses a question I hear constantly: "How do I know my agents are actually reliable?" Service Level Objectives (SLOs) We defined SLOs in our policy file: slos: - name: weather-agent-latency agent: "WeatherAdvisorAgent" metric: latency-p99 target: 5000ms window: 5m This says: "The Weather Advisor Agent must respond within 5 seconds at the 99th percentile, measured over a 5-minute rolling window." When the SLO is breached, the toolkit emits an alert event and can trigger automated responses. Circuit breakers Circuit breakers prevent cascading failures. If an agent fails 5 times in a row, the circuit opens, and subsequent requests get a fast failure response instead of waiting for another timeout: circuit-breakers: - agent: "*" failure-threshold: 5 recovery-timeout: 30s half-open-max-calls: 2 After 30 seconds, the circuit enters a half-open state, allowing 2 test calls through. If those succeed, the circuit closes and normal operation resumes. If they fail, the circuit opens again. This pattern is battle-tested in microservices — now it protects your agents too. Error budgets Error budgets tie SLOs to business decisions. If your Coordinator Agent's success rate target is 99.5% over a 15-minute window, that means you have an error budget of 0.5%. When the budget is consumed, the toolkit can automatically reduce agent autonomy — for example, requiring human approval for high-risk actions until the error budget recovers. SRE practices turn agent reliability from a hope into a measurable, enforceable contract. Architecture Here's how everything fits together after adding governance: ┌─────────────────────────────────────────────────────────────────┐ │ Azure App Service │ │ ┌──────────────┐ ┌─────────────────────────────────────┐ │ │ │ Frontend │───▶│ ASP.NET Core API │ │ │ │ (Static) │ │ │ │ │ └──────────────┘ │ ┌─────────────────────────────┐ │ │ │ │ │ Coordinator Agent │ │ │ │ │ │ ┌───────┐ ┌────────────┐ │ │ │ │ │ │ │ OTel │─▶│ Governance │ │ │ │ │ │ │ └───────┘ │ Engine │ │ │ │ │ │ │ │ ┌────────┐ │ │ │ │ │ │ │ │ │Policies│ │ │ │ │ │ │ │ │ └────────┘ │ │ │ │ │ │ │ └─────┬──────┘ │ │ │ │ │ └───────────────────┼─────────┘ │ │ │ │ ┌───────────────────┼──────────┐ │ │ │ │ │ Specialist Agents │ │ │ │ │ │ │ (Currency, Weather, etc.) │ │ │ │ │ │ Each with OTel + Governance │ │ │ │ │ └───────────────────┼──────────┘ │ │ │ └──────────────────────┼──────────────┘ │ │ │ │ │ ┌────────────┐ ┌───────────┐ ┌───────────┼─────────┐ │ │ │ Managed │ │ VNet │ │ App Insights │ │ │ │ Identity │ │Integration│ │ (Traces + │ │ │ │ (no keys) │ │(network │ │ Governance Audit) │ │ │ │ │ │ boundary) │ │ │ │ │ └────────────┘ └───────────┘ └─────────────────────┘ │ └──────────────────────────────┬──────────────────────────────────┘ │ Only allowed APIs ▼ ┌──────────────────────┐ │ External APIs │ │ ✅ Frankfurter API │ │ ✅ NWS Weather API │ │ ❌ Everything else │ └──────────────────────┘ The key insight: governance is a transparent layer in the agent pipeline. It sits between the agent's decision and the action's execution. The agent code doesn't know or care about governance — it just builds the agent with .UseGovernance() and the policy engine handles the rest. Bring it to your own agents We've shown governance with Microsoft Agent Framework on .NET, but the toolkit is framework-agnostic. Here's how to add it to other popular frameworks: LangChain (Python) from agent_governance import PolicyEngine, GovernanceCallbackHandler policy_engine = PolicyEngine.from_yaml("governance-policies.yaml") # Add governance as a LangChain callback handler agent = create_react_agent( llm=llm, tools=tools, callbacks=[GovernanceCallbackHandler(policy_engine)] ) CrewAI (Python) from agent_governance import PolicyEngine from agent_governance.integrations.crewai import GovernanceTaskDecorator policy_engine = PolicyEngine.from_yaml("governance-policies.yaml") # Add governance as a CrewAI task decorator @GovernanceTaskDecorator(policy_engine) def research_task(agent, context): return agent.execute(context) Google ADK (Python) from agent_governance import PolicyEngine from agent_governance.integrations.google_adk import GovernancePlugin policy_engine = PolicyEngine.from_yaml("governance-policies.yaml") # Add governance as a Google ADK plugin agent = Agent( model="gemini-2.0-flash", tools=[...], plugins=[GovernancePlugin(policy_engine)] ) TypeScript / Node.js import { PolicyEngine } from '@microsoft/agentmesh-sdk'; const policyEngine = PolicyEngine.fromYaml('governance-policies.yaml'); // Use as middleware in your agent pipeline agent.use(policyEngine.middleware()); Every integration hooks into the framework's native extension points — callbacks, decorators, plugins, middleware — so adding governance doesn't require rewriting your agent code. Install the package, point it at your policy file, and you're governed. What's next This wraps up our three-part series on building production-ready multi-agent AI applications on Azure App Service: Blog 1: Build — Deploy a multi-agent travel planner with Microsoft Agent Framework 1.0 Blog 2: Monitor — Add observability with OpenTelemetry and the Application Insights Agents view Blog 3: Govern — Secure agents for production with the Agent Governance Toolkit (you are here) The progression is intentional: first make it work, then make it visible, then make it safe. And the consistent theme across all three parts is that App Service makes each step easier — managed hosting for Blog 1, integrated monitoring for Blog 2, and platform-level security features for Blog 3. Next steps for your agents Explore the Agent Governance Toolkit — star the repo, browse the 20 tutorials, try the demo Customize policies for your compliance needs — start with our YAML template and adapt it to your domain. Healthcare teams: enable HIPAA mappings. Finance teams: add SOC2 controls. Explore Agent Mesh for multi-agent trust — if you have agents communicating across services or trust boundaries, Agent Mesh's cryptographic identity and trust scoring add another layer of defense Deploy the sample — clone our travel planner repo, run azd up , and see governed agents in action AI agents are becoming autonomous decision-makers in high-stakes domains. The question isn't whether we need governance — it's whether we build it proactively, before incidents force our hand. With the Agent Governance Toolkit and Azure App Service, you can add production governance to your agents today. In about 30 minutes.
jordanselig
Jul 08, 2026 Place Apps on Azure Blog
992Views
0likes
0Comments
Auditing and Telemetry for the Agent Governance Toolkit - Getting Started with .NET Core
We've entered an era where AI agents autonomously invoke tools — reading and writing files, calling APIs, querying databases. Convenient as this is, without a mechanism to control who can call what, and under what conditions, you can't put it into production. The Agent Governance Toolkit (AGT), open-sourced by Microsoft, is exactly the toolkit for embedding that "gatekeeper" into AI agents. This article walks through getting started with AGT in .NET (C#), based on the following GitHub repository sample.
daisami
Jul 02, 2026 Place Apps on Azure Blog
216Views
5likes
0Comments
A Better Way to View Logs in Kudu for Azure App Service on Linux
Logs are often the fastest way to understand what is happening inside your application. Whether you are investigating startup behavior, runtime errors, failed requests, dependency issues, or unexpected application behavior, having the right log view can make troubleshooting much easier. To make this easier, we have added a new Log stream page in Kudu for Azure App Service on Linux, available under the Logs dropdown. This experience gives you a single place to stream, browse, search, and filter logs so you can understand what is happening in your app faster. Opening the Logs page You can open Kudu from the Azure portal: Go to your App Service. Select Advanced Tools. Click Go. You can also open Kudu directly by going to: https://<app-name>.scm.azurewebsites.net From there, open the Logs page. View live logs across your app and platform The Logs page lets you view logs as they are being written, with filters for timeframe, instance, container, log type, and level. This helps when you want to focus on a specific instance, look only at errors, or separate application logs from platform events. For example, you can use platform logs to understand container lifecycle events, restarts, startup behavior, warmup probe activity, and other platform-side events related to your app. Quickly find the log entries that matter You can use keyword search to narrow down the log stream or historical logs. This is useful when you are looking for a specific error message, request path, exception, dependency failure, timeout, or any application-specific keyword. Instead of scanning through hundreds of entries, you can search for the terms that are relevant to the issue you are investigating. Investigate issues within a specific timeframe The Log stream page also supports viewing logs for a selected time range. This is useful when you know when an issue occurred and want to inspect both application and platform activity around that time. For example, you can filter to a specific timeframe, switch to Application logs, and check what your app was doing when the issue happened. This can help you troubleshoot scenarios such as failed requests, application exceptions, slow startup, container restarts, dependency issues, or configuration problems. Summary The new Log stream page in Kudu makes it easier to work with logs for Azure App Service on Linux. With live streaming, keyword search, historical views, and filters for application and platform logs, you can quickly narrow down the information you need and troubleshoot issues more efficiently. We are continuing to improve the App Service Linux experience to make diagnostics simpler and more useful for day-to-day development and operations.
TulikaC
Jul 01, 2026 Place Apps on Azure Blog
215Views
1like
1Comment
App Service Easy MCP: Add AI Agent Capabilities to Your Existing Apps with Zero Code Changes
The age of AI agents is here. Tools like GitHub Copilot, Claude, and other AI assistants are no longer just answering questions—they're taking actions, calling APIs, and automating complex workflows. But how do you make your existing applications and APIs accessible to these intelligent agents? At Microsoft Ignite, I teamed up to present session BRK116: Apps, agents, and MCP is the AI innovation recipe, where I demonstrated how you can add agentic capabilities to your existing applications with little to no code changes. Today, I'm excited to share a concrete example of that vision: Easy MCP—a way to expose any REST API to AI agents with absolutely zero code changes to your existing apps. The Challenge: Bridging REST APIs and AI Agents Most organizations have invested years building REST APIs that power their applications. These APIs represent critical business logic, data access patterns, and integrations. But AI agents speak a different language—they use protocols like Model Context Protocol (MCP) to discover and invoke tools. The traditional approach would require you to: Learn the MCP SDK Write new MCP server code Manually map each API endpoint to an MCP tool Deploy and maintain additional infrastructure What if you could skip all of that? Introducing Easy MCP (a proof of concept not associated with the App Service platform) Easy MCP is an OpenAPI-to-MCP translation layer that automatically generates MCP tools from your existing REST APIs. If your API has an OpenAPI (Swagger) specification—which most modern APIs do—you can make it accessible to AI agents in minutes. This means that if you have existing apps with OpenAPI specifications already running on App Service, or really any hosting platform, this tool makes enabling MCP seamless. How It Works Point the gateway at your API's base URL Detect your OpenAPI specification automatically Connect and the gateway generates MCP tools for every endpoint Use the MCP endpoint URL with any MCP-compatible AI client That's it. No code changes. No SDK integration. No manual tool definitions. See It in Action Let's say you have a Todo API running on Azure App Service at `https://my-todo-app.azurewebsites.net`. In just a few clicks: Open the Easy MCP web UI Enter your API URL Click "Detect" to find your OpenAPI spec Click "Connect" Now configure your AI client (like VS Code with GitHub Copilot) to use the gateway's MCP endpoint: { "servers": { "my-api": { "type": "http", "url": "https://my-gateway.azurewebsites.net/mcp" } } } Instantly, your AI assistant can: "What's on my todo list?" "Add 'Review PR #123' to my todos with high priority" "Mark all tasks as complete" All powered by your existing REST API, with zero modifications. The Bigger Picture: Modernization Without Rewrites This approach aligns perfectly with a broader modernization strategy we're enabling on Azure App Service. App Service Managed Instance: Move and Modernize Legacy Apps For organizations with legacy applications—whether they're running on older Windows frameworks, custom configurations, or traditional hosting environments—Azure App Service Managed Instance provides a path to the cloud with minimal friction. You can migrate these applications to a fully managed platform without rewriting code. Easy MCP: Add AI Capabilities Post-Migration Once your legacy applications are running on App Service, Easy MCP becomes the next step in your modernization journey. That 10-year-old internal API? It can now be accessed by AI agents. That legacy inventory system? AI assistants can query and update it. No code changes needed. The modernization path: Migrate legacy apps to App Service with Managed Instance (no code changes) Expose APIs to AI agents with Easy MCP Gateway (no code changes) Empower your organization with AI-assisted workflows Deploy It Yourself Easy MCP is open source and ready to deploy. If you already have an existing API to use with this tool, go for it. If you need an app to test with, check out this sample. Make sure you complete the "Add OpenAPI functionality to your web app" step. You don't need to go beyond that. GitHub Repository: seligj95/app-service-easy-mcp Deploy to Azure in minutes with Azure Developer CLI: azd auth login azd init azd up Or run it locally for testing: npm install npm run dev # Open http://localhost:3000 What's Next: Native App Service Integration Here's where it gets really exciting. We're exploring ways to build this capability directly into the Azure App Service platform so you won't have to deploy a second app or additional resources to get this capability. Azure API Management recently released a feature with functionality to expose a REST API, including an API on App Service, as an MCP server, which I highly recommend that you check out if you're familiar with Azure API Management. But in this case, imagine a future where adding AI agent capabilities to your App Service apps is as simple as flipping a switch in the Azure Portal—no gateway or API Management deployment required, no additional infrastructure or services to manage, and built-in security, monitoring, scaling, etc.—all of the features you're already using and are familiar with on App Service. Stay tuned for updates as we continue to make Azure App Service the best platform for AI-powered applications. And please share your feedback on Easy MCP—we want to hear how you're using it and what features you'd like to see next as we consider this feature for native integration.
jordanselig
Jun 25, 2026 Place Apps on Azure Blog
1.2KViews
1like
2Comments
Build Multi-Agent AI Apps on Azure App Service with Microsoft Agent Framework 1.0
Part 1 of 3 — Multi-Agent AI on Azure App Service This is part 1 of a 3 part series on deploying and working with multi-agent AI on Azure App Service. Follow allong to learn how to deploy, manage, observe, and secure your agents on Azure App Service. A couple of months ago, we published a three-part series showing how to build multi-agent AI systems on Azure App Service using preview packages from the Microsoft Agent Framework (MAF) (formerly AutoGen / Semantic Kernel Agents). The series walked through async processing, the request-reply pattern, and client-side multi-agent orchestration — all running on App Service. Since then, Microsoft Agent Framework has reached 1.0 GA — unifying AutoGen and Semantic Kernel into a single, production-ready agent platform. This post is a fresh start with the GA bits. We'll rebuild our travel-planner sample on the stable API surface, call out the breaking changes from preview, and get you up and running fast. All of the code is in the companion repo: seligj95/app-service-multi-agent-maf-otel. What Changed in MAF 1.0 GA The 1.0 release is more than a version bump. Here's what moved: Unified platform. AutoGen and Semantic Kernel agent capabilities have converged into Microsoft.Agents.AI . One package, one API surface. Stable APIs with long-term support. The 1.0 contract is now locked for servicing. No more preview churn. Breaking change — Instructions on options removed. In preview, you set instructions through ChatClientAgentOptions.Instructions . In GA, pass them directly to the ChatClientAgent constructor. Breaking change — RunAsync parameter rename. The thread parameter is now session (type AgentSession ). If you were using named arguments, this is a compile error. Microsoft.Extensions.AI upgraded. The framework moved from the 9.x preview of Microsoft.Extensions.AI to the stable 10.4.1 release. OpenTelemetry integration built in. The builder pipeline now includes UseOpenTelemetry() out of the box — more on that in Blog 2. Our project references reflect the GA stack: <PackageReference Include="Microsoft.Agents.AI" Version="1.0.0" /> <PackageReference Include="Microsoft.Extensions.AI" Version="10.4.1" /> <PackageReference Include="Azure.AI.OpenAI" Version="2.1.0" /> Why Azure App Service for AI Agents? If you're building with Microsoft Agent Framework, you need somewhere to run your agents. You could reach for Kubernetes, containers, or serverless — but for most agent workloads, Azure App Service is the sweet spot. Here's why: No infrastructure management — App Service is fully managed. No clusters to configure, no container orchestration to learn. Deploy your .NET or Python agent code and it just runs. Always On — Agent workflows can take minutes. App Service's Always On feature (on Premium tiers) ensures your background workers never go cold, so agents are ready to process requests instantly. WebJobs for background processing — Long-running agent workflows don't belong in HTTP request handlers. App Service's built-in WebJob support gives you a dedicated background worker that shares the same deployment, configuration, and managed identity — no separate compute resource needed. Managed Identity everywhere — Zero secrets in your code. App Service's system-assigned managed identity authenticates to Azure OpenAI, Service Bus, Cosmos DB, and Application Insights automatically. No connection strings, no API keys, no rotation headaches. Built-in observability — Native integration with Application Insights and OpenTelemetry means you can see exactly what your agents are doing in production (more on this in Part 2). Enterprise-ready — VNet integration, deployment slots for safe rollouts, custom domains, auto-scaling rules, and built-in authentication. All the things you'll need when your agent POC becomes a production service. Cost-effective — A single P0v4 instance (~$75/month) hosts both your API and WebJob worker. Compare that to running separate container apps or a Kubernetes cluster for the same workload. The bottom line: App Service lets you focus on building your agents, not managing infrastructure. And since MAF supports both .NET and Python — both first-class citizens on App Service — you're covered regardless of your language preference. Architecture Overview The sample is a travel planner that coordinates six specialized agents to build a personalized trip itinerary. Users fill out a form (destination, dates, budget, interests), and the system returns a comprehensive travel plan complete with weather forecasts, currency advice, a day-by-day itinerary, and a budget breakdown. The Six Agents Currency Converter — calls the Frankfurter API for real-time exchange rates Weather Advisor — calls the National Weather Service API for forecasts and packing tips Local Knowledge Expert — cultural insights, customs, and hidden gems Itinerary Planner — day-by-day scheduling with timing and costs Budget Optimizer — allocates spend across categories and suggests savings Coordinator — assembles everything into a polished final plan Four-Phase Workflow Phase Agents Execution 1 — Parallel Gathering Currency, Weather, Local Knowledge Task.WhenAll 2 — Itinerary Itinerary Planner Sequential (uses Phase 1 context) 3 — Budget Budget Optimizer Sequential (uses Phase 2 output) 4 — Assembly Coordinator Final synthesis Infrastructure Azure App Service (P0v4) — hosts the API and a continuous WebJob for background processing Azure Service Bus — decouples the API from heavy AI work (async request-reply) Azure Cosmos DB — stores task state, results, and per-agent chat histories (24-hour TTL) Azure OpenAI (GPT-4o) — powers all agent LLM calls Application Insights + Log Analytics — monitoring and diagnostics ChatClientAgent Deep Dive At the core of every agent is ChatClientAgent from Microsoft.Agents.AI . It wraps an IChatClient (from Microsoft.Extensions.AI ) with instructions, a name, a description, and optionally a set of tools. This is client-side orchestration — you control the chat history, lifecycle, and execution order. No server-side Foundry agent resources are created. Here's the BaseAgent pattern used by all six agents in the sample: // BaseAgent.cs — constructor for agents with tools Agent = new ChatClientAgent( chatClient, instructions: Instructions, name: AgentName, description: Description, tools: chatOptions.Tools?.ToList()) .AsBuilder() .UseOpenTelemetry(sourceName: AgentName) .Build(); Notice the builder pipeline: .AsBuilder().UseOpenTelemetry(...).Build() . This opts every agent into the framework's built-in OpenTelemetry instrumentation with a single line. We'll explore what that telemetry looks like in Blog 2. Invoking an agent is equally straightforward: // BaseAgent.cs — InvokeAsync public async Task<ChatMessage> InvokeAsync( IList<ChatMessage> chatHistory, CancellationToken cancellationToken = default) { var response = await Agent.RunAsync( chatHistory, session: null, options: null, cancellationToken); return response.Messages.LastOrDefault() ?? new ChatMessage(ChatRole.Assistant, "No response generated."); } Key things to note: session: null — this is the renamed parameter (was thread in preview). We pass null because we manage chat history ourselves. The agent receives the full chatHistory list, so context accumulates across turns. Simple agents (Local Knowledge, Itinerary Planner, Budget Optimizer, Coordinator) use the tool-less constructor; agents that call external APIs (Currency, Weather) use the constructor that accepts ChatOptions with tools. Tool Integration Two of our agents — Weather Advisor and Currency Converter — call real external APIs through the MAF tool-calling pipeline. Tools are registered using AIFunctionFactory.Create() from Microsoft.Extensions.AI . Here's how the WeatherAdvisorAgent wires up its tool: // WeatherAdvisorAgent.cs private static ChatOptions CreateChatOptions( IWeatherService weatherService, ILogger logger) { var chatOptions = new ChatOptions { Tools = new List<AITool> { AIFunctionFactory.Create( GetWeatherForecastFunction(weatherService, logger)) } }; return chatOptions; } GetWeatherForecastFunction returns a Func<double, double, int, Task<string>> that the model can call with latitude, longitude, and number of days. Under the hood, it hits the National Weather Service API and returns a formatted forecast string. The Currency Converter follows the same pattern with the Frankfurter API. This is one of the nicest parts of the GA API: you write a plain C# method, wrap it with AIFunctionFactory.Create() , and the framework handles the JSON schema generation, function-call parsing, and response routing automatically. Multi-Phase Workflow Orchestration The TravelPlanningWorkflow class coordinates all six agents. The key insight is that the orchestration is just C# code — no YAML, no graph DSL, no special runtime. You decide when agents run, what context they receive, and how results flow between phases. // Phase 1: Parallel Information Gathering var gatheringTasks = new[] { GatherCurrencyInfoAsync(request, state, progress, cancellationToken), GatherWeatherInfoAsync(request, state, progress, cancellationToken), GatherLocalKnowledgeAsync(request, state, progress, cancellationToken) }; await Task.WhenAll(gatheringTasks); After Phase 1 completes, results are stored in a WorkflowState object — a simple dictionary-backed container that holds per-agent chat histories and contextual data: // WorkflowState.cs public Dictionary<string, object> Context { get; set; } = new(); public Dictionary<string, List<ChatMessage>> AgentChatHistories { get; set; } = new(); Phases 2–4 run sequentially, each pulling context from the previous phase. For example, the Itinerary Planner receives weather and local knowledge gathered in Phase 1: var localKnowledge = state.GetFromContext<string>("LocalKnowledge") ?? ""; var weatherAdvice = state.GetFromContext<string>("WeatherAdvice") ?? ""; var itineraryChatHistory = state.GetChatHistory("ItineraryPlanner"); itineraryChatHistory.Add(new ChatMessage(ChatRole.User, $"Create a detailed {days}-day itinerary for {request.Destination}..." + $"\n\nWEATHER INFORMATION:\n{weatherAdvice}" + $"\n\nLOCAL KNOWLEDGE & TIPS:\n{localKnowledge}")); var itineraryResponse = await _itineraryAgent.InvokeAsync( itineraryChatHistory, cancellationToken); This pattern — parallel fan-out followed by sequential context enrichment — is simple, testable, and easy to extend. Need a seventh agent? Add it to the appropriate phase and wire it into WorkflowState . Async Request-Reply Pattern A multi-agent workflow with six LLM calls (some with tool invocations) can easily run 30–60 seconds. That's well beyond typical HTTP timeout expectations and not a great user experience for a synchronous request. We use the Async Request-Reply pattern to handle this: The API receives the travel plan request and immediately queues a message to Service Bus. It stores an initial task record in Cosmos DB with status queued and returns a taskId to the client. A continuous WebJob (running as a separate process on the same App Service plan) picks up the message, executes the full multi-agent workflow, and writes the result back to Cosmos DB. The client polls the API for status updates until the task reaches completed . This pattern keeps the API responsive, makes the heavy work retriable (Service Bus handles retries and dead-lettering), and lets the WebJob run independently — you can restart it without affecting the API. We covered this pattern in detail in the previous series, so we won't repeat the plumbing here. Deploy with azd The repo is wired up with the Azure Developer CLI for one-command provisioning and deployment: git clone https://github.com/seligj95/app-service-multi-agent-maf-otel.git cd app-service-multi-agent-maf-otel azd auth login azd up azd up provisions the following resources via Bicep: Azure App Service (P0v4 Windows) with a continuous WebJob Azure Service Bus namespace and queue Azure Cosmos DB account, database, and containers Azure AI Services (Azure OpenAI with GPT-4o deployment) Application Insights and Log Analytics workspace Managed Identity with all necessary role assignments After deployment completes, azd outputs the App Service URL. Open it in your browser, fill in the travel form, and watch six agents collaborate on your trip plan in real time. What's Next We now have a production-ready multi-agent app running on App Service with the GA Microsoft Agent Framework. But how do you actually observe what these agents are doing? When six agents are making LLM calls, invoking tools, and passing context between phases — you need visibility into every step. In the next post, we'll dive deep into how we instrumented these agents with OpenTelemetry and the new Agents (Preview) view in Application Insights — giving you full visibility into agent runs, token usage, tool calls, and model performance. You already saw the .UseOpenTelemetry() call in the builder pipeline; Blog 2 shows what that telemetry looks like end to end and how to light up the new Agents experience in the Azure portal. Stay tuned! Resources Sample repo — app-service-multi-agent-maf-otel Microsoft Agent Framework 1.0 GA Announcement Microsoft Agent Framework Documentation Previous Series — Part 3: Client-Side Multi-Agent Orchestration on App Service Microsoft.Extensions.AI Documentation Azure App Service Documentation Blog 2: Monitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View Blog 3: Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit
jordanselig
Jun 24, 2026 Place Apps on Azure Blog
1.6KViews
0likes
0Comments
Only 8.5% of MCP Servers Use OAuth — Here's How to Host One Securely on App Service
The Model Context Protocol exploded onto the scene because it's easy. Stand up a server, expose a few tools, point Claude or VS Code at it, and your agent can suddenly read files, hit APIs, and run code. That same ease is the problem: most MCP servers ship with no authentication at all, and they're getting pushed straight to the internet. The numbers are bleak-into-an-incident-report bad. Astrix Research's State of MCP Server Security 2025 found that only 8.5% of MCP servers use OAuth — the rest lean on static API keys or nothing. And the CVEs have already started: CVE-2025-6514 — a CVSS 9.6 OS command-injection flaw in mcp-remote . If a client connects to a malicious or hijacked MCP server, the server can inject shell commands through the OAuth authorization_endpoint during discovery and achieve remote code execution on the client. Roughly half a million downloads were exposed. CVE-2025-49596 — RCE in the MCP Inspector dev tool, which shipped with no authentication on its local web UI. A crafted request from a webpage you happened to visit could execute code on your machine. The throughline: MCP doesn't enforce security at the protocol level. The spec is explicit that authorization is optional and implementation-dependent. That's a reasonable design choice for a transport, but it means you own the perimeter. Skip it, and you've published an unauthenticated RPC endpoint that can read secrets and run tools. So let's not skip it. This post walks through a hardened MCP server on Azure App Service that closes every gap above — and most of it is platform configuration, not code you have to write and get right yourself. Sample: seligj95/app-service-secure-mcp. One azd up (plus an Entra app registration the hook creates for you) gives you an MCP server behind Easy Auth, talking to Key Vault over a managed identity, with no public network access, fronted by API Management, and an Application Insights alert watching for abuse. The threat model for a hosted MCP server Before the architecture, be honest about the attack surface. When an MCP server is internet-reachable, the bad days look like this: Unauthenticated tool invocation. Anyone who finds the endpoint calls your tools. If one of them reads a database or a secret, that's the whole game. Credential exfiltration. A tool that returns a secret value — even "helpfully," for debugging — hands credentials to whatever is driving the session. Prompt injection via tool responses. A compromised or malicious tool return can carry instructions that hijack the calling agent. Path traversal / injection. A tool that concatenates user input into a file path or shell command is the same class of bug we've fought for 25 years, now with an LLM cheerfully supplying the payload. Lateral movement. A server running with a broad identity or a network line of sight to everything becomes a pivot point. The architecture below maps a defense to each one. None of it is exotic — it's the App Service security stack, pointed at MCP. The architecture Five layers, each one a checkbox or a few lines of Bicep. Let's take them in order. 1. Easy Auth — spec-compliant OAuth you don't have to write The single most important fix is also the easiest: turn on App Service built-in authentication (Easy Auth) and point it at Entra ID. Now App Service validates the token and rejects unauthenticated requests at the platform, before a single line of your Python runs. App Service Authentication also has a built-in MCP server authorization mode (Preview) that makes your server comply with the MCP authorization spec: it serves Protected Resource Metadata (PRM) so a compliant MCP client can discover the authorization server and complete the OAuth handshake itself — instead of just getting a bare 401. In the sample that's an authsettingsV2 resource: resource authSettings 'Microsoft.Web/sites/config@2024-04-01' = { parent: web name: 'authsettingsV2' properties: { globalValidation: { requireAuthentication: true unauthenticatedClientAction: 'Return401' // reject, don't redirect } identityProviders: { azureActiveDirectory: { enabled: true registration: { clientId: authClientId openIdIssuer: '${environment().authentication.loginEndpoint}${authTenantId}/v2.0' } validation: { allowedAudiences: [ 'api://${authClientId}' ] } } } } } The piece that makes it MCP-compliant — not just "returns 401" — is enabling PRM. That's one app setting that publishes the metadata document MCP clients look for: { name: 'WEBSITE_AUTH_PRM_DEFAULT_WITH_SCOPES' value: 'api://${authClientId}/user_impersonation' } unauthenticatedClientAction: 'Return401' gives a clean 401 instead of a login redirect, and PRM turns that 401 into a discoverable OAuth challenge — the client follows the metadata, signs the user in, and retries with a valid token. Recall that 8.5% figure: this is the spec-compliant OAuth the other 91.5% are missing, and you got it from configuration, not code. One gotcha worth calling out: when App Service creates the Entra registration for you, the default policy only accepts tokens the app itself obtained. For a real MCP client to connect, add its client id to the allowed-applications policy and preauthorize it on the app registration. (Entra has no Dynamic Client Registration, so the client ships a known client id; for VS Code / GitHub Copilot, preauthorization avoids a consent prompt the client won't surface.) The bonus is that the validated identity is handed to your code. App Service injects the caller's claims into every forwarded request as the X-MS-CLIENT-PRINCIPAL headers — and crucially, it strips any client-supplied copy first, so they can't be forged. The whoami tool just reads them: def _client_principal(request: Request) -> Dict[str, Any]: raw = request.headers.get("x-ms-client-principal") # base64-encoded JSON of the caller's claims, injected by Easy Auth ... return {"authenticated": bool(raw), "name": name, ...} Your tools now know who's calling without you owning any of the token machinery. 2. Managed identity — stop storing the keys to the kingdom The static-API-key habit is how secrets leak. Replace it with a system-assigned managed identity: App Service gets an Entra identity that Azure manages, and your code authenticates to Key Vault, Storage, or Azure OpenAI with no stored credential. This matters for a subtle reason the MCP authorization guidance calls out explicitly: the token a client presents represents access to your server, not to Key Vault. Never forward it downstream — use the managed identity (or an on-behalf-of token) for that hop. Pass-through is a vulnerability; delegation is the fix, and the managed identity is how you delegate without holding a secret. resource web 'Microsoft.Web/sites@2024-04-01' = { identity: { type: 'SystemAssigned' } ... } In Python, DefaultAzureCredential resolves to that identity automatically — the same code runs locally against your az login and in Azure against the MI: from azure.identity import DefaultAzureCredential from azure.keyvault.secrets import SecretClient credential = DefaultAzureCredential() client = SecretClient(vault_url=KEY_VAULT_URI, credential=credential) And least privilege is one role assignment. The sample grants the identity exactly Key Vault Secrets User — read secret values, nothing else: var keyVaultSecretsUserRoleId = '4633458b-17de-408a-b874-0445c86b69e6' resource appSecretsUser 'Microsoft.Authorization/roleAssignments@2022-04-01' = { name: guid(keyVault.id, appPrincipalId, keyVaultSecretsUserRoleId) scope: keyVault properties: { roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', keyVaultSecretsUserRoleId) principalId: appPrincipalId principalType: 'ServicePrincipal' } } There is now no secret used to read secrets. That's the chain of custody you want. 3. Key Vault references — and a tool that won't leak them Two pieces here. First, Key Vault references keep secrets out of your configuration. You point an app setting at a vault URI, and App Service resolves the value at runtime via the managed identity: { name: 'SECURE_CONFIG_VALUE' value: '@Microsoft.KeyVault(SecretUri=${secureConfigSecretUri})' } The plaintext never appears in your repo, your Bicep, or the portal's app settings blade — it shows up as a resolved reference. Second, and this is the part developers get wrong: a tool that reads a secret should never return the secret. The sample's read_secret_metadata proves the managed-identity path works end to end, then deliberately withholds the value: async def tool_read_secret_metadata(secret_name: str = "demo-secret"): secret = client.get_secret(secret_name) return { "available": True, "version": secret.properties.version, "value_length": len(secret.value), # length, never the value "note": "Value intentionally withheld — metadata only.", } If your MCP server has a get_secret tool that returns the secret, you've built a credential-exfiltration API with a friendly name. Return metadata; act on the value server-side. The same discipline applies to input. The safe_lookup tool matches against a fixed allow-list and refuses anything that smells like traversal or injection — it never touches a filesystem or a shell: suspicious = any(t in key for t in ("..", "/", "\\", ";", "|", "$(", "`")) if key in DOCS: return {"topic": key, "doc": DOCS[key], "found": True} return {"found": False, "rejected_as_suspicious": suspicious, ...} safe_lookup("../../etc/passwd") comes back rejected_as_suspicious: true . That is the entire fix for a whole class of CVEs. 4. Private endpoints + APIM — take the server off the internet Authentication is necessary but not sufficient. The strongest version of "don't expose your MCP server" is to not expose it — give the App Service and Key Vault private endpoints, disable public network access, and let API Management be the only public door. resource web 'Microsoft.Web/sites@2024-04-01' = { properties: { virtualNetworkSubnetId: appSubnetId // outbound: reach KV's private endpoint publicNetworkAccess: 'Disabled' // inbound: no public access at all ... } } Now the App Service hostname returns nothing from the internet. The only ingress is the APIM gateway, which runs the security policy before traffic ever reaches the VNet — validate the Entra JWT, rate-limit per caller, and (the documented extension point) run a content-safety check: <inbound> <base /> <validate-jwt header-name="Authorization" failed-validation-httpcode="401"> <openid-config url="https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration" /> <required-claims> <claim name="aud"><value>api://{clientId}</value></claim> </required-claims> </validate-jwt> <rate-limit-by-key calls="60" renewal-period="60" counter-key="@(context.Request.IpAddress)" /> <set-backend-service backend-id="mcp-backend" /> </inbound> This is defense in depth: APIM validates the token, and Easy Auth validates it again at the app. An attacker has to get past a public gateway with JWT enforcement and rate limiting just to reach a private endpoint that also demands a valid token. Compare that to the median MCP server, which is a raw port on the internet. The honest trade-off: this is a security reference architecture, not a 60-second demo. APIM takes ~30–45 minutes to provision, and because the app is private, you test through the gateway, not the App Service hostname. That friction is the point — it's the same friction an attacker hits. 5. Monitoring — see the abuse before it's an incident The last layer is visibility. The Azure Monitor OpenTelemetry distro auto-instruments FastAPI, and the audit_event tool emits a structured custom event per call. A scheduled-query alert watches the rate of those events and fires when tool invocations spike — the signature of an agent looping over a sensitive tool, or someone probing the surface: criteria: { allOf: [{ query: 'customEvents | where name == "mcp_tool_audit" | summarize calls = count() by bin(timestamp, 5m)' metricMeasureColumn: 'calls' operator: 'GreaterThan' threshold: 100 }] } Tune the threshold to your baseline. The point is that "is someone hammering my credential-reading tool?" becomes an alert, not a forensic exercise after the fact. Deploy it azd auth login azd up A preprovision hook creates the Entra ID app registration and stashes its client id in the azd environment, so Easy Auth and the APIM policy wire themselves up. Then Bicep provisions the VNet, private endpoints, Key Vault, App Service, APIM, and the monitoring stack. (Grab a coffee for the APIM step.) To verify, get a token and call through the gateway: TOKEN=$(az account get-access-token \ --resource "api://$(azd env get-value AZURE_AUTH_CLIENT_ID)" \ --query accessToken -o tsv) curl -s -X POST "$(azd env get-value APIM_MCP_URL)" \ -H "Authorization: Bearer $TOKEN" -H 'content-type: application/json' \ -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"whoami","arguments":{}}}' The response shows the authenticated principal. Drop the token and you get a 401 from APIM — exactly what you want an unauthenticated caller to see. To let a real MCP client (VS Code, Claude) sign users in itself rather than pasting a bearer token, point it at the same URL: PRM is already published, so the client discovers the auth server and runs the OAuth flow. Just make sure its app id is allowed — azd env set AZURE_MCP_CLIENT_APP_ID <client-id> before azd up adds it to the allowed-applications policy — and preauthorize it on the server's app registration so clients that don't surface a consent prompt can connect. Once it's deployed and you've verified it, take the App Service off the public internet with a one-line flip — azd env set LOCK_DOWN_WEB_APP true && azd provision . (The first deploy keeps public access on just long enough to push the code, because a fully-private app can only be deployed from inside its VNet. The sample's README walks through both phases.) Why this matters MCP is going to be the USB-C of agent tooling, and right now most of the connectors are unauthenticated and exposed. The CVEs aren't hypothetical — they have numbers and CVSS scores. But the fix isn't a research project. On App Service, the perimeter is mostly configuration: flip on Easy Auth, use a managed identity, reference Key Vault, go private, front it with APIM, and alert on the logs. That's the difference between "I shipped an MCP server" and "I shipped an MCP server I'd put in production." If you're hosting MCP — especially anywhere a compliance auditor will eventually look — start from the secure shape, not the demo shape. Try it Sample repo: github.com/seligj95/app-service-secure-mcp Astrix — State of MCP Server Security 2025: astrix.security MCP authorization spec: modelcontextprotocol.io App Service authentication: learn.microsoft.com
jordanselig
Jun 23, 2026 Place Apps on Azure Blog
593Views
0likes
0Comments
MCP Just Went Stateless — What the 2026 Spec Changes About Scaling on App Service
A couple of months ago I wrote about scaling MCP servers behind App Service's built-in load balancer. The trick back then was to lean on stateless HTTP transport so any instance could serve any request — and to make sure you turned off ARR affinity so the load balancer was actually free to spread traffic around. That post still works. But the MCP spec just caught up to it in a big way. The 2026-07-28 release candidate is the largest revision of the Model Context Protocol since it launched, and the headline change is exactly the thing we were working around: MCP is now stateless at the protocol layer. The handshake is gone, the session header is gone, and the sticky-routing-and-shared-session-store dance that horizontal deployments used to need is no longer part of the protocol at all. If you're hosting an MCP server on App Service, this is good news — and it means a few of the steps from my last post are now things the protocol does for you. Here's what changed, and what (if anything) you need to do about it. Here's the before and after, straight from the spec. In 2025-11-25 , the client POST s an initialize call to /mcp first and gets a session ID back: {"jsonrpc":"2.0","id":1,"method":"initialize", "params":{"protocolVersion":"2025-11-25","capabilities":{}, "clientInfo":{"name":"my-app","version":"1.0"}}} Heads up on timing: 2026-07-28 is a release candidate as I write this; the final spec ships July 28, 2026. It contains breaking changes, so treat this as "get ready" guidance rather than "rip everything out today." Quick recap: how we scaled MCP before In the original post, the recipe looked like this: Run the MCP server in stateless HTTP mode (the 2025-11-25 transport). Scale App Service out to N instances (the sample used three). Set clientAffinityEnabled: false so there's no ARR affinity cookie pinning a client to one instance. If you genuinely needed cross-request state, externalize it — typically into Azure Cache for Redis — so every instance saw the same data. Watch traffic spread across instances in Application Insights via cloud_RoleInstance . The catch: even in "stateless HTTP" mode, the 2025-11-25 protocol still started every connection with an initialize handshake and handed back an Mcp-Session-Id that the client had to send on every follow-up request. That session ID pinned a client to whichever instance issued it — so to scale cleanly you either kept affinity on (and gave up even load balancing) or did real work to share session state across instances. That's the part the 2026 spec deletes. What the 2026 spec actually changes The handshake and the session are gone Two proposals do the heavy lifting: SEP-2575 removes the initialize / initialized handshake. The protocol version, client info, and client capabilities that used to be exchanged once at connect time now ride along in _meta on every request. A new server/discover method lets a client ask for server capabilities when it actually wants them. SEP-2567 removes the Mcp-Session-Id header and the protocol-level session that came with it. With both gone, any MCP request can land on any instance. The sticky routing and shared session stores that horizontal deployments needed before just aren't required at the protocol layer anymore. Here's the before and after, straight from the spec. In 2025-11-25 , the client POST s an initialize call to /mcp first and gets a session ID back: {"jsonrpc":"2.0","id":1,"method":"initialize", "params":{"protocolVersion":"2025-11-25","capabilities":{}, "clientInfo":{"name":"my-app","version":"1.0"}}} …then every later call has to carry the Mcp-Session-Id header the server handed back, which pins it to that instance: {"jsonrpc":"2.0","id":2,"method":"tools/call", "params":{"name":"search","arguments":{"q":"otters"}}} In 2026-07-28 , the same tool call is one self-contained request that any instance can answer. The routing info rides in headers — MCP-Protocol-Version , Mcp-Method , and Mcp-Name — and the body carries everything else: {"jsonrpc":"2.0","id":1,"method":"tools/call", "params":{"name":"search","arguments":{"q":"otters"}, "_meta":{"io.modelcontextprotocol/clientInfo":{"name":"my-app","version":"1.0"}}}} No handshake, no session ID, nothing to pin. Traffic you can route and cache at the edge A few smaller changes make this traffic much friendlier to the infrastructure App Service already gives you: Routable headers (SEP-2243): Streamable HTTP now requires Mcp-Method and Mcp-Name headers, so load balancers, gateways, and rate-limiters can route or throttle on the operation without cracking open the request body. (Servers reject requests where the headers and body disagree.) Cacheable lists (SEP-2549): tools/list and resource-read results now carry ttlMs and cacheScope , modeled on HTTP Cache-Control . Clients know exactly how long a tool list is fresh and whether it's safe to share across users — no more holding an SSE stream open just to learn the list changed. Traceable calls (SEP-414): W3C Trace Context ( traceparent , tracestate , baggage ) propagation in _meta is now documented with fixed key names. A trace that starts in the host app can follow a tool call through the client SDK, your MCP server, and whatever it calls downstream — and show up as one span tree in any OpenTelemetry backend, including Application Insights. That last one pairs really nicely with the App Insights setup from the original sample, which already tags spans with cloud_RoleInstance . Why this is easier on App Service now App Service's built-in load balancer has always wanted to round-robin your requests. The thing stopping it from doing that cleanly with MCP was the protocol's own session affinity. Now that the protocol is stateless: No affinity tuning to reason about. You still want clientAffinityEnabled: false , but there's no longer a protocol session fighting it. Any instance serves any request, for real. Scale from 3 to 10 instances and the load balancer just spreads the work — no shared session store required for protocol state. Less Redis glue. In the old model, Redis was often there to share protocol session state. That reason is gone (see the next section for what Redis is still great for). "Stateless protocol" doesn't mean "stateless app" This is the part I want to be really clear about, because it's easy to over-read the headline. Removing the protocol session does not mean your application can't have state. It means the protocol stops carrying state for you. If your server needs to remember something across calls, you do what HTTP APIs have always done: mint an explicit handle and let the model pass it back as an argument. The spec calls this the explicit-handle pattern. A tool returns a basket_id (or browser_id , or whatever), and later calls include that ID as a normal parameter: // 1) create returns a handle {"name": "create_basket", "arguments": {}} // -> { "basket_id": "b_12345" } // 2) later calls pass it back as an ordinary argument {"name": "add_item", "arguments": {"basket_id": "b_12345", "sku": "ABC"}} The nice side effect: the model can see the handle, compose it across tools, and hand it off between steps — in ways that session state hidden in transport metadata never really allowed. So where does Redis fit now? Exactly where it always belonged — your application's data, not the protocol's plumbing: Backing store for those explicit handles (what's actually in basket b_12345 ). Caching expensive lookups or model responses across instances. App-level conversation memory or rate-limit counters. Stateless protocol, stateful application. You externalize state because your app needs it shared, not because the transport forces you to. Migrating an existing MCP server on App Service If you deployed the original sample (or something like it), here's the punch list to get to the 2026 model. The good news: the App Service / infra side barely changes — most of the work is in the protocol layer your SDK handles for you. App Service config — mostly already done: Keep clientAffinityEnabled: false . (Still the right call.) Keep scaling out to N instances. Nothing here changes. Keep Application Insights + OpenTelemetry — and lean into the new Trace Context key names for cleaner end-to-end traces. Protocol layer — the real work: Update to an SDK build that speaks 2026-07-28 . The handshake and session handling go away; your server reads protocol version and client info from _meta per request instead of from an initialize exchange. Emit ttlMs / cacheScope on tools/list and resource reads so clients (and your gateway) can cache them. Make sure your server honors / validates the Mcp-Method and Mcp-Name headers. If you were storing anything keyed off Mcp-Session-Id , move it to the explicit-handle pattern (handle in, handle out, state in Redis/Cosmos/etc.). Audit for the breaking bits: tasks/list is removed, Roots/Sampling/Logging are deprecated, and the "resource not found" error code moves from -32002 to the standard -32602 . I built a standalone companion sample for exactly this — the 2026-07-28 version of the original, with the handshake gone, everything read from _meta , server/discover implemented, and the explicit-handle pattern shown in a real tool. Link below. Try it yourself I built a companion sample for this post: a FastAPI MCP server that speaks 2026-07-28 natively — no handshake, no session — running on three App Service instances behind the built-in load balancer, with a staging slot, App Insights, a spec-compliant client, and a k6 load test: 👉 seligj95/app-service-mcp-stateless-scale-2026-python azd auth login azd up That provisions a Premium v3 plan with capacity: 3 , the web app with clientAffinityEnabled: false , a staging slot, and Log Analytics + Application Insights. No initialize , no Mcp-Session-Id anywhere — discovery is a single server/discover call, and every request carries its own protocol version and client info in _meta . The part I like best is the tally tool. It keeps a running total across calls using an explicit, signed handle instead of a session — so you can watch the total stay correct even as the load balancer routes each call to a different instance: +10 -> total=10 served_by=2103650c... +5 -> total=15 served_by=08fc7022... (different instance, total still right) +100 -> total=115 served_by=08fc7022... That's the stateless handle pattern from earlier, made concrete: state travels with the request, not the connection. Then watch the load spread in Application Insights: requests | where timestamp > ago(15m) | where name contains "/mcp" | summarize count() by cloud_RoleInstance Want the 2025-11-25 version for comparison? That's the original Part 1 sample: seligj95/app-service-mcp-stateless-scale-python. Diff the two main.py files and you can see the handshake and session handling simply disappear. The takeaway When I wrote the first post, "make MCP stateless so App Service can load-balance it" was a pattern you had to apply. With the 2026 spec, it's just how MCP works. The protocol deleted the exact friction we were routing around — which means hosting a horizontally scaled MCP server on App Service is now closer to "deploy a normal web app and scale it out" than ever. If you're already running MCP on App Service: you did the hard part early. The spec just made it official. Got an MCP server running on App Service? I'd love to hear how the migration goes — drop a comment.
jordanselig
Jun 23, 2026 Place Apps on Azure Blog
1.1KViews
0likes
0Comments
How Many Copies of Each Layer Does Your Container Registry Actually Need?
Authors: Payal Mahesh and Vicky Lin Azure Container Registry team: Jeanine Burke and Johnson Shi Introduction It's Monday morning. You spin up a fresh 1,000-node AKS cluster for a big training run or a fleet-wide rollout. Every node reaches for the same large container image at the same instant. What actually happens in the next ten minutes - and whether your pods reach Ready in 9 minutes or 14 - turns out to depend on a single number you've probably never thought about: how many copies of each image layer exist behind your registry. At the surface, you see a single capacity number for your registry size - but behind that abstraction, Azure Container Registry maintains copies of your layer data to optimize pull performance. That number of copies directly determines the read throughput available per layer. Each copy can serve requests independently, so distributing the layer across storage allows it to be read in parallel. More copies mean more independent readers - and higher aggregate throughput when thousands of nodes pull at once. The intuitive answer is that more is better: add copies, get faster pulls. When we actually tested it at 1,000-node scale, the truth turned out to be more interesting: A few extra copies helped a little. A moderate number helped a lot, and eliminated storage throttling entirely. A large number helped no more than the moderate one. A huge number actually made pulls slower again. Think of it like opening checkout lanes at a grocery store. Opening a few more lanes when the store is slammed cuts the line dramatically. Past a certain point, though, extra lanes barely help, because by then it's the customers, not the cashiers, who are the bottleneck. And open too many? Now the staff is spread thin and tripping over each other, and the line moves worse than it did at the sweet spot. This post walks through what we measured, why the curve bends where it does, and what we're building next so finding that sweet spot isn't something anyone has to do by hand. Key Takeaways There's a sweet spot, not a slope. Adding copies per layer cut pod-startup P99 by 27% and raised P50 per-node egress throughput by 244%, but only up to a point. Past that, the returns vanish, and far past it, latency actually regresses. Storage throttling is the real enemy. The win comes from spreading load across enough storage backends that no single backend gets pinned at its egress ceiling. Once throttling is gone, more copies stop helping. Storage scale alone has a ceiling. Even at the sweet spot, the per-backend egress limit caps total throughput. The next jump in performance has to come from somewhere else, which is exactly what we're building (see What's Next). This isn't something customers should need to manage. We're building a proactive, on-demand storage scaling capability that automatically grows the footprint before throttling happens and shrinks it back when the burst is over. A quick bit of background Within a region, the layer data behind your container images is backed by Azure storage. The number of copies ACR maintains per layer determines how many independent storage backends a concurrent-pull workload can spread its reads across. That's what matters, because each backend has a finite egress ceiling. Once concurrent reads against one backend get close to that ceiling, requests start getting throttled, and your pulls slow down in proportion. The principle is simple: more copies per layer means more backends serving the same data, which means more total egress headroom and fewer throttled requests. What we wanted data on was how many, and where it stops helping. How we tested We ran a controlled series of large-scale pull tests against ACR Premium on a roughly 1,000-node cluster, with every node pulling the same large image cold at the same time (no local cache on any node). The only thing we changed between runs was the number of per-layer copies behind a single registry endpoint. Everything else, including rate limits, the image, node count, and concurrency, stayed constant. For each run we measured pod-startup latency (P50/P90/P99), end-to-end storage read latency, egress throughput distributions (P50-P99.9), and storage throttling events. Pod-startup latency is our headline metric, because it's the one number that reflects the actual customer experience no matter where the bottleneck happens to be. Per-node egress throughput matters too, though. It tells you directly how much pull bandwidth ACR delivers to your fleet, and it's usually what customers have in mind when they ask how much faster extra copies will make their pulls. We report egress as a distribution rather than a single average, since per-request and per-time-window views can tell very different stories about the same set of pulls. These are observations from a single controlled environment, not a service guarantee. Absolute numbers will move with image size, node count, layer composition, network topology, and concurrency. What we found We tested five configurations, sweeping from a low baseline number of per-layer copies up to a very high one. We name them by relative copy count rather than exact instance counts: Baseline: the lowest level, our reference point. Low: a modest step up from Baseline. Mid: a meaningful step up from Low. Higher: a further step up from Mid. Very high: the largest configuration we tested, well above Higher. Here are the numbers. All percent changes are relative to Baseline. Configuration Pod startup P50 Pod startup P90 Pod startup P99 Storage throttling events Peak per-backend egress Baseline (fewest copies) 9m 36s 11m 0s 14m 16s Many; all top backends above the egress ceiling Highest Low 9m 27s (−2%) 10m 14s (−7%) 12m 59s (−9%) Some; one backend still above the ceiling High Mid 9m 25s (−2%) 9m 45s (−11%) 10m 22s (−27%) Zero Below the ceiling Higher 9m 20s (−3%) 9m 37s (−13%) 10m 22s (−27%) Zero Well below the ceiling Very high 9m 28s (−1%) 10m 31s (−4%) 13m 48s (−3%) Zero Lowest Look at the P99 pod-startup column from top to bottom: 14m 16s, 12m 59s, 10m 22s, 10m 22s, 13m 48s. It improves, flattens out, then climbs back up. Three things explain that shape: 1. The win: Throttling falls off a cliff at the Mid configuration As we added copies per layer, per-backend egress fell and storage-side throttling decreased. At the Mid configuration, throttling errors hit zero, and they stayed at zero for every configuration above it. The upside isn't just that the errors went away, though. It's raw pull bandwidth. At the Mid sweet spot, the typical node saw its P50 egress throughput jump 244% over Baseline. With load spread across enough copies, each node pulled its layers off storage much faster, not just without stalling. For a workload owner, that's the difference between watching pods come up in a steady stream and watching them stall for tens of seconds at a time while throttling clears. Same image, same node count, same registry, very different experience. To put it in concrete terms: if your team runs a daily AI training kickoff that needs all 1,000 nodes pulling before the job can start, this is the difference between starting on time and starting four minutes late every day. Over a quarter of training runs, that adds up. 2. The surprise: more copies made pulls slower This is the finding that genuinely surprised us. Going from Higher to Very high, the largest configuration we tested, cost us 3 minutes and 26 seconds at P99: 10m 22s climbing back up to 13m 48s. That gave back almost the entire benefit we'd built up over the previous four configurations. Tail storage-read latency at Very high actually came out worse than Baseline. The Very high run is where the wheels came off, and the reason is the trade-off underneath. Once storage throttling is gone, more copies stop buying you anything, and the cost of fanning reads across that many backends starts to take over. The throughput distribution shows it clearly. P50 and P75 throughput had been climbing steadily and getting smoother through Mid and Higher, then dropped sharply at Very high while the peak P99/P99.9 spikes came back. Spread the same load across too many backends and it fragments into smaller, less consistent bursts. The takeaway is that "more is better" stops being true past the sweet spot, and the failure mode is quiet. You won't see throttling errors. You'll just see your pulls get slower. 3. What we didn't expect: at few copies, the hottest backend is what hurts you At the lowest copy counts, pull traffic wasn't spread evenly across the underlying storage footprint. Some backends absorbed far more traffic than others. As we added copies, that distribution evened out and the hottest backends cooled down. The implication is sharp. You can saturate the busiest backend, and trigger throttling, even when the total headroom across all your backends is large in aggregate. What matters is the load on the hottest backend, not the average. That's exactly the failure mode that demand-driven, proactive scaling (described below) is meant to head off before it happens. So how should you think about this? You don't size copies yourself; ACR manages the storage footprint behind your registry. Still, it helps to understand what moves the sweet spot, because the shape of your own workload is what decides where it lands. The bigger your worst-case concurrent burst (more nodes, larger images, higher concurrency), the more copies per layer it takes to keep pulls off the throttling ceiling, and the further out the sweet spot sits. Smaller workloads may already be sitting on the flat part of the curve. One thing is worth saying plainly. The storage footprint underneath is managed by ACR and shared across many registries, so there's no fixed, private storage budget that maps one-to-one to your workload. The sweet spot isn't a number you compute and provision; it's a behavior the platform has to land on for you, which is exactly why we're moving toward demand-driven scaling that handles it automatically. That's what brings us to what we're building next. What's next: proactive, on-demand storage scaling and a caching layer The fixed-copy tests above answer the question "how many should the ACR system provision?" but they assume a single, static answer. Real workloads aren't static. A 1,000-node burst happens at deploy time, not at 3 a.m. on a Tuesday. And no matter how many copies are provisioned, the per-backend storage ceiling still bounds peak deliverable throughput. So we're investing along two complementary directions. 1. Proactive, demand-driven storage scaling We're building a capability that adjusts the number of per-layer copies automatically based on real-time pull demand: Proactive, not reactive. The system scales the storage footprint before concurrent pull pressure pushes any single backend near the throttling threshold, so throttling is prevented before it forms rather than cleaned up after the fact. On-demand scale-out. The footprint expands automatically as sustained pull demand grows. Scale-in when demand subsides. The footprint contracts so you're not paying for steady-state capacity you only needed during a burst. Tiering for cold content. Long-tail, rarely-pulled content can sit on colder storage, so the redundant footprint of frequently-pulled content doesn't pay full hot-storage cost everywhere. The benefit to customers is straightforward: smoother pulls under burst, higher delivered throughput on average, no permanent over-provisioning, and no manual re-tuning as workloads grow. 2. A caching layer to absorb burst beyond the storage ceiling Even a perfectly scaled storage footprint runs into the per-backend egress ceiling at extreme scale. To push past it, we're investing in a caching layer in the registry service that absorbs burst traffic before it ever reaches storage. A pull surge that hits the same set of layers, which is the common case for fleet-wide deployments, can be served largely from cache. That takes a lot of load off any single storage backend and complements the storage scaling above. We'll share results from this work in follow-up posts. If you have questions about scaling ACR for your workload, or about how we measure storage performance, reach out on the Azure Container Registry GitHub repository. Note: All results in this post are based on controlled internal testing configurations and are intended to illustrate general scaling behavior rather than prescribe exact configurations.
payalmahesh
Jun 22, 2026 Place Apps on Azure Blog
222Views
0likes
0Comments
Announcing the Public Preview of the New App Service Quota Self-Service Experience
Update 10/30/2025: The App Service Quota Self-Service experience is back online after a short period where we were incorporating your feedback and making needed updates. As this is public preview, availability and features are subject to change as we receive and incorporate feedback. What’s New? The updated experience introduces a dedicated App Service Quota blade in the Azure portal, offering a streamlined and intuitive interface to: View current usage and limits across the various SKUs Set custom quotas tailored to your App Service plan needs This new experience empowers developers and IT admins to proactively manage resources, avoid service disruptions, and optimize performance. Quick Reference - Start here! Leverage the new self-service experience to increase your quota automatically. If your deployment requires quota for ten or more subscriptions, then file a support ticket with problem type Quota following the instructions at the bottom of this post. If any subscription included in your request requires zone redundancy (note that most Isolated v2 deployments require ZR), then file a support ticket with problem type Quota following the instructions at the bottom of this post. Self-service Quota Requests For non-zone-redundant needs, quota alone is sufficient to enable App Service deployment or scale-out. Follow the provided steps to place your request. 1. Navigate to the Quotas resource provider in the Azure portal 2. Select App Service (Pubic Preview) Navigating the primary interface: Each App Service VM size is represented as a separate SKU. If the intention is to be able to scale up or down within a specific offering (e.g., Premium v3), then equivalent number of VMs need to be requested for each applicable size of that offering (e.g., request 5 instances for both P1v3 and P3v3). As with other quotas, you can filter by region, subscription, provider, or usage. Note that your portal will now show "App Service (Public Preview)" for the Provider name. You can also group the results by usage, quota (App Service VM type), or location (region). Current usage is represented as App Service VMs. This allows you to quickly identify which SKUs are nearing their quota limits. Adjustments can be made inline: no need to visit another page. This is covered in detail in the next section. Total Regional VMs: There is a SKU in each region called Total Regional VMs. This SKU summarizes your usage and available quota across all individual SKUs in that region. There are three key points about using Total Regional VMs. You should never request Total Regional VMs quota directly - it will automatically increase in response to your request for individual SKU quota. If you are unable to deploy a given SKU, then you must request more quota for that SKU to unblock deployment. For your deployment to succeed, you must have sufficient quota in the individual SKU as well as Total Regional VMs. If either usage is at its respective limit, then you will be unable to deploy and must request more of that individual SKU's quota to proceed. In some regions, Total Regional VMs appears as "0 of 0" usage and limit and no individual SKU quotas are shown. This is an indication that you should not interact with the portal to resolve any quota-related issues in this region. Instead, you should try the deployment and observe any error messages that arise. If any error messages indicate more quota is needed, then this must be requested by filing a support ticket with problem type Quota following the instructions at the bottom of this post so that App Service can identify and fix any potential quota issues. In most cases, this will not be necessary, and your deployment will work without requesting quota wherever "0 of 0" is shown for Total Regional VMs and no individual SKU quotas are visible. See the example below: 3. Request quota adjustments Clicking the pen icon opens a flyout window to capture the quota request: The quota type (App Service SKU) is already populated, along with current usage. Note that your request is not incremental: you must specify the new limit that you wish to see reflected in the portal. For example, to request two additional instances of P1v2 VMs, you would file the request like this: Click submit to send the request for automatic processing. How quota approvals work: Immediately upon submitting a quota request, you will see a processing dialog like the one shown: If the quota request can be automatically fulfilled, then no support request is needed. You should receive this confirmation within a few minutes of submission: If the request cannot be automatically fulfilled, then you will be given the option to file a support request with the same information. In the example below, the requested new limit exceeds what can be automatically granted for the region: 4. If applicable, create support ticket If automatic quota fulfillment fails, and it recommend you “Create a support request”, then follow the steps given in the at the end of this post. Known issues The self-service quota request experience for App Service is in public preview. Here are some caveats worth mentioning while the team finalizes the release for general availability: Closing the quota request flyout window will stop meaningful notifications for that request. You can still view the outcome of your quota requests by checking actual quota, but if you want to rely on notifications for alerts, then we recommend leaving the quota request window open for the few minutes that it is processing. Some SKUs are not yet represented in the quota dashboard. These will be added later in the public preview. The Activity Log does not currently provide a meaningful summary of previous quota requests and their outcomes. This will also be addressed during the public preview. As noted in the walkthrough, the new experience does not enable zone-redundant deployments. Quota is an inherently regional construct, and zone-redundant enablement requires a separate step that can only be taken in response to a support ticket being filed. Quota API documentation is being drafted to enable bulk non-zone redundant quota requests without requiring you to file a support ticket. Create a support request While we are continuously improving the system to automatically process quota requests, there are certain scenarios you might need to create a support request: Automatic fulfillment request failed on quota blade. Deployment requires zone-redundancy You want to make bulk request for ten or more subscriptions When creating a support request, select your Issue type as “Service and subscription limits (quotas)” and Quota type as “Function or Web App (Windows and Linux)”. Select Next. You can then fill in your quota requirements by clicking on “Enter details”. There are 4 mandatory fields you must provide in your request: Region – Quota limits are based on region. If you are facing quota limits in one region, you can always try deployment in a geographically paired region. For example, West US 2 and West Central US are paired regions. East Asia (Hong Kong) and Southeast Asia (Singapore) are also paired regions. See Azure cross-region replication pairings for all geographies for more information. Deployment type – This is another important criterion when submitting quota request. If you are not sure which deployment type your App Service Plan is using, you can check it here on the portal: App >> App Service Plan >> Zone redundant >> Enabled\Disabled. App Service plan – Each SKU in your subscription and the region selected above, will have its own limit. Choose the SKU based on your deployment requirement. You will be able to see the current usage and the limit on that SKU. New limit – You must submit the new limit that you want for the SKU selected above. Do not add the increment value. The new limit must be higher than the existing limit. If you choose to create a support ticket, then you will interact with the capacity management team for that region. This is a 24x7 service, so requests may be created at any time. Once you have filed the support request, you can track its status via the Help + support dashboard. Note for Logic Apps You can now self-serve your Logic App quota requirements using the same App Services quotas blade. You must choose one of the Logic App SKUs (WS1, WS2, WS3) when making the request, and it will be processed in the same way App Services requests are processed. We want your feedback! If you notice any aspect of the experience that does not work as expected, or you have feedback on how to make it better, please use the comments below to share your thoughts!
jordanselig
Jun 22, 2026 Place Apps on Azure Blog
15KViews
4likes
36Comments
Tutorial:A graceful process to develop and deploy Docker Containers to Azure with Visual Studio Code
Creating and deploying Docker containers to Azure resources manually can be a complicated and time-consuming process. This tutorial outlines a graceful process for developing and deploying a Linux Docker container on your Windows PC, making it easy to deploy to Azure resources. This tutorial emphasizes using the user interface to complete most of the steps, making the process more reliable and understandable. While there are a few steps that require the use of command lines, the majority of tasks can be completed using the UI. This focus on the UI is what makes the process graceful and user-friendly. In this tutorial, we will use a Python Flask application as an example, but the steps should be similar for other languages such as Node.js. Prerequisites: Before you begin, you'll need to have the following prerequisites set up: WSL 2 installation WSL provides a great way to develop your Linux application on a Windows machine, without worrying about compatibility issues when running in a Linux environment. We recommend installing WSL 2 as it has better support with Docker. To install WSL 2, open PowerShell or Windows Command Prompt in administrator mode, enter below command: wsl --install And then restart your machine. You'll also need to install the WSL extension in your Visual Studio Code. Python 3 installation Run “wsl” in your command prompt. Then run following commands to install python 3.10 (if you use Python 3.5 or a lower version, you may need to install venv by yourself): sudo apt-get update sudo apt-get upgrade sudo apt install python3.10 Docker for Linux You'll need to install Docker in your Linux environment. For Ubuntu, please refer to below official documentation: https://docs.docker.com/engine/install/ubuntu/ Docker for Windows To create an image for your application in WSL, you'll need Docker Desktop for Windows. Download the installer from below Docker website and run the downloaded file to install it. https://www.docker.com/products/docker-desktop/ Steps for Developing and Deployment 1. Connect Visual Studio Code to WSL To develop your project in Visual Studio Code in WSL, you need to click the bottom left blue button: Then select “Connect to WSL” or “Connect to WSL using Distro”: 2. Install some extensions for Visual Studio Code Below two extensions have to be installed after you connect Visual Studio Code to WSL. The Docker extension can help you create Dockerfile automatically and highlight the syntax of Dockerfile. Please search and install via Visual Studio Code Extension. To deploy your container to Azure in Visual Studio Code, you also need to have Azure Tools installed. 3. Create your project folder Click "Terminal" in menu, and click "New Terminal": Then you should see a terminal for your WSL. I use a quick simple Flask application here for example, so I run below command to clone its git project: git clone https://github.com/Azure-Samples/msdocs-python-flask-webapp-quickstart 4. Python Environment setup (optional) After you install Python 3 and create project folder. It is recommended to create your own project python environment. It makes your runtime and modules easy to be managed. To setup your Python Environment in your project, you need to run below commands in the terminal: cd msdocs-python-flask-webapp-quickstart python3 -m venv .venv Then after you open the folder, you will be able to see some folders are created in your project: Then if you open the app.py file, you can see it used the newly created python environment as your python environment: If you open a new terminal, you also find the prompt shows that you are now in new python environment as well: Then run below command to install the modules required in the requirement.txt: pip install -r requirements.txt 5. Generate a Dockerfile for your application To create a docker image, you need to have a Dockerfile for your application. You can use Docker extension to create the Dockerfile for you automatically. To do this, enter ctrl+shift+P and search "Dockerfile" in your Visual Studio Code. Then select “Docker: Add Docker Files to Workspace” You will be required to select your programming languages and framework(It also supports other language such as node.js, java, node). I select “Python Flask”. Firstly, you will be asked to select the entry point file. I select app.py for my project. Secondly, you will be asked the port your application listens on. I select 80. Finally, you will be asked if Docker Compose file is included. I select no as it is not multi-container. A Dockefile like below is generated: Note: If you do not have requirements.txt file in the project, the Docker extension will create one for you. However, it DOES NOT contain all the modules you installed for this project. Therefore, it is recommended to have the requirements.txt file before you create the Dockerfile. You can run below command in the terminal to create the requirements.txt file: pip freeze > requirements.txt After the file is generated, please add “gunicorn” in the requirements.txt if there is no "gunicorn" as the Dockerfile use it to launch your application for Flask application. Please review the Dockerfile it generated and see if there is anything need to modify. You will also find there is a .dockerignore file is generated too. It contains the file and the folder to be excluded from the image. Please also check it too see if it meets your requirement. 6. Build the Docker Image You can use the Docker command line to build image. However, you can also right-click anywhere in the Dockefile and select build image to build the image: Please make sure that you have Docker Desktop running in your Windows. Then you should be able to see the docker image with the name of the project and tag as "latest" in the Docker extension. 7. Push the Image to Azure Container Registry Click "Run" for the Docker image you created and check if it works as you expected. Then, you can push it to the Azure Container Registry (ACR). Click "Push" and select "Azure". You may need to create a new registry if there isn't one. Answer the questions that Visual Studio Code asks you, such as subscription and ACR name, and then push the image to the ACR. 8. Deploy the image to Azure Resources Follow the instructions in the following documents to deploy the image to the corresponding Azure resource: Azure App Service or Azure Container App: Deploy a containerized app to Azure (visualstudio.com) Opens in new window or tab Container Instance: Deploy container image from Azure Container Registry using a service principal - Azure Container Instances | Microsoft Learn Opens in new window or tab
yorkzhang
Jun 18, 2026 Place Apps on Azure Blog
6.9KViews
4likes
1Comment