MCP Just Went Stateless - What the 2026 Spec Changes About Scaling on App Service
A couple of months ago I wrote about scaling MCP servers behind App Service's built-in load balancer. The trick back then was to lean on stateless HTTP transport so any instance could serve any request, and to make sure you turned off ARR affinity so the load balancer was actually free to spread traffic around. That post still works. But the MCP spec just caught up to it in a big way.

The 2026-07-28 release candidate is the largest revision of the Model Context Protocol since it launched, and the headline change is exactly the thing we were working around: MCP is now stateless at the protocol layer. The handshake is gone, the session header is gone, and the sticky-routing-and-shared-session-store dance that horizontal deployments used to need is no longer part of the protocol at all.

If you're hosting an MCP server on App Service, this is good news, and it means a few of the steps from my last post are now things the protocol does for you. Here's what changed, and what (if anything) you need to do about it.

Heads up on timing: 2026-07-28 is a release candidate as I write this; the final spec ships July 28, 2026. It contains breaking changes, so treat this as "get ready" guidance rather than "rip everything out today."

Quick recap: how we scaled MCP before

In the original post, the recipe looked like this:

• Run the MCP server in stateless HTTP mode (the 2025-11-25 transport).
• Scale App Service out to N instances (the sample used three).
• Set clientAffinityEnabled: false so there's no ARR affinity cookie pinning a client to one instance.
• If you genuinely needed cross-request state, externalize it (typically into Azure Cache for Redis) so every instance saw the same data.
• Watch traffic spread across instances in Application Insights via cloud_RoleInstance.

The catch: even in "stateless HTTP" mode, the 2025-11-25 protocol still started every connection with an initialize handshake and handed back an Mcp-Session-Id that the client had to send on every follow-up request. That session ID pinned a client to whichever instance issued it, so to scale cleanly you either kept affinity on (and gave up even load balancing) or did real work to share session state across instances. That's the part the 2026 spec deletes.

What the 2026 spec actually changes

The handshake and the session are gone

Two proposals do the heavy lifting:

• SEP-2575 removes the initialize / initialized handshake. The protocol version, client info, and client capabilities that used to be exchanged once at connect time now ride along in _meta on every request. A new server/discover method lets a client ask for server capabilities when it actually wants them.
• SEP-2567 removes the Mcp-Session-Id header and the protocol-level session that came with it.

With both gone, any MCP request can land on any instance. The sticky routing and shared session stores that horizontal deployments needed before just aren't required at the protocol layer anymore.

Here's the before and after, straight from the spec.
In 2025-11-25, the client POSTs an initialize call to /mcp first and gets a session ID back:

    {"jsonrpc":"2.0","id":1,"method":"initialize",
     "params":{"protocolVersion":"2025-11-25","capabilities":{},
               "clientInfo":{"name":"my-app","version":"1.0"}}}

…then every later call has to carry the Mcp-Session-Id header the server handed back, which pins it to that instance:

    {"jsonrpc":"2.0","id":2,"method":"tools/call",
     "params":{"name":"search","arguments":{"q":"otters"}}}

In 2026-07-28, the same tool call is one self-contained request that any instance can answer. The routing info rides in headers (MCP-Protocol-Version, Mcp-Method, and Mcp-Name) and the body carries everything else:

    {"jsonrpc":"2.0","id":1,"method":"tools/call",
     "params":{"name":"search","arguments":{"q":"otters"},
               "_meta":{"io.modelcontextprotocol/clientInfo":{"name":"my-app","version":"1.0"}}}}

No handshake, no session ID, nothing to pin.

Traffic you can route and cache at the edge

A few smaller changes make this traffic much friendlier to the infrastructure App Service already gives you:

• Routable headers (SEP-2243): Streamable HTTP now requires Mcp-Method and Mcp-Name headers, so load balancers, gateways, and rate-limiters can route or throttle on the operation without cracking open the request body. (Servers reject requests where the headers and body disagree.)
• Cacheable lists (SEP-2549): tools/list and resource-read results now carry ttlMs and cacheScope, modeled on HTTP Cache-Control. Clients know exactly how long a tool list is fresh and whether it's safe to share across users, so there's no more holding an SSE stream open just to learn the list changed.
• Traceable calls (SEP-414): W3C Trace Context (traceparent, tracestate, baggage) propagation in _meta is now documented with fixed key names.
A trace that starts in the host app can follow a tool call through the client SDK, your MCP server, and whatever it calls downstream, and show up as one span tree in any OpenTelemetry backend, including Application Insights.

That last one pairs really nicely with the App Insights setup from the original sample, which already tags spans with cloud_RoleInstance.

Why this is easier on App Service now

App Service's built-in load balancer has always wanted to round-robin your requests. The thing stopping it from doing that cleanly with MCP was the protocol's own session affinity. Now that the protocol is stateless:

• No affinity tuning to reason about. You still want clientAffinityEnabled: false, but there's no longer a protocol session fighting it.
• Any instance serves any request, for real. Scale from 3 to 10 instances and the load balancer just spreads the work, with no shared session store required for protocol state.
• Less Redis glue. In the old model, Redis was often there to share protocol session state. That reason is gone (see the next section for what Redis is still great for).

"Stateless protocol" doesn't mean "stateless app"

This is the part I want to be really clear about, because it's easy to over-read the headline.

Removing the protocol session does not mean your application can't have state. It means the protocol stops carrying state for you. If your server needs to remember something across calls, you do what HTTP APIs have always done: mint an explicit handle and let the model pass it back as an argument. The spec calls this the explicit-handle pattern.
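To make "mint an explicit handle" a bit more tangible, here's a minimal Python sketch of one way to do it: the state rides inside the handle itself, and an HMAC signature lets any instance holding the same key verify it without a shared store. Everything here is an assumption for illustration. The function names, the SECRET value, and the payload.signature encoding are hypothetical and don't come from the spec or any SDK.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

# Assumption: every instance loads the same key (for example from app settings).
SECRET = b"demo-key-shared-by-all-instances"


def mint_handle(state: dict) -> str:
    """Serialize state and append an HMAC-SHA256 signature."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def read_handle(handle: str) -> dict:
    """Verify the signature and return the state, or raise ValueError."""
    payload_b64, sig_b64 = handle.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(sig_b64)):
        raise ValueError("handle signature mismatch")
    return json.loads(payload)


def tally_add(amount: int, handle: Optional[str] = None) -> dict:
    """Hypothetical tool body: the handle comes in as a normal argument,
    and a fresh handle goes back out with the result."""
    total = read_handle(handle)["total"] if handle else 0
    total += amount
    return {"total": total, "handle": mint_handle({"total": total})}


first = tally_add(10)                    # could be served by instance A
second = tally_add(5, first["handle"])   # could be served by instance B
print(second["total"])
```

If the state is too large or too sensitive to hand to the model, the same shape still works with only an ID inside the handle and the real data in a store.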
A tool returns a basket_id (or browser_id, or whatever), and later calls include that ID as a normal parameter:

    // 1) create returns a handle
    {"name": "create_basket", "arguments": {}}
    // -> { "basket_id": "b_12345" }

    // 2) later calls pass it back as an ordinary argument
    {"name": "add_item", "arguments": {"basket_id": "b_12345", "sku": "ABC"}}

The nice side effect: the model can see the handle, compose it across tools, and hand it off between steps, in ways that session state hidden in transport metadata never really allowed.

So where does Redis fit now? Exactly where it always belonged: in your application's data, not the protocol's plumbing:

• Backing store for those explicit handles (what's actually in basket b_12345).
• Caching expensive lookups or model responses across instances.
• App-level conversation memory or rate-limit counters.

Stateless protocol, stateful application. You externalize state because your app needs it shared, not because the transport forces you to.

Migrating an existing MCP server on App Service

If you deployed the original sample (or something like it), here's the punch list to get to the 2026 model. The good news: the App Service / infra side barely changes, and most of the work is in the protocol layer your SDK handles for you.

App Service config (mostly already done):

• Keep clientAffinityEnabled: false. (Still the right call.)
• Keep scaling out to N instances. Nothing here changes.
• Keep Application Insights + OpenTelemetry, and lean into the new Trace Context key names for cleaner end-to-end traces.

Protocol layer (the real work):

• Update to an SDK build that speaks 2026-07-28. The handshake and session handling go away; your server reads protocol version and client info from _meta per request instead of from an initialize exchange.
• Emit ttlMs / cacheScope on tools/list and resource reads so clients (and your gateway) can cache them.
• Make sure your server honors / validates the Mcp-Method and Mcp-Name headers.
• If you were storing anything keyed off Mcp-Session-Id, move it to the explicit-handle pattern (handle in, handle out, state in Redis/Cosmos/etc.).
• Audit for the breaking bits: tasks/list is removed, Roots/Sampling/Logging are deprecated, and the "resource not found" error code moves from -32002 to the standard -32602.

I built a standalone companion sample for exactly this: the 2026-07-28 version of the original, with the handshake gone, everything read from _meta, server/discover implemented, and the explicit-handle pattern shown in a real tool. Link below.

Try it yourself

I built a companion sample for this post: a FastAPI MCP server that speaks 2026-07-28 natively (no handshake, no session), running on three App Service instances behind the built-in load balancer, with a staging slot, App Insights, a spec-compliant client, and a k6 load test:

👉 seligj95/app-service-mcp-stateless-scale-2026-python

    azd auth login
    azd up

That provisions a Premium v3 plan with capacity: 3, the web app with clientAffinityEnabled: false, a staging slot, and Log Analytics + Application Insights. No initialize, no Mcp-Session-Id anywhere: discovery is a single server/discover call, and every request carries its own protocol version and client info in _meta.

The part I like best is the tally tool. It keeps a running total across calls using an explicit, signed handle instead of a session, so you can watch the total stay correct even as the load balancer routes each call to a different instance:

    +10  -> total=10   served_by=2103650c...
    +5   -> total=15   served_by=08fc7022...   (different instance, total still right)
    +100 -> total=115  served_by=08fc7022...

That's the stateless handle pattern from earlier, made concrete: state travels with the request, not the connection.

Then watch the load spread in Application Insights:

    requests
    | where timestamp > ago(15m)
    | where name contains "/mcp"
    | summarize count() by cloud_RoleInstance

Want the 2025-11-25 version for comparison?
That's the original Part 1 sample: seligj95/app-service-mcp-stateless-scale-python. Diff the two main.py files and you can see the handshake and session handling simply disappear.

The takeaway

When I wrote the first post, "make MCP stateless so App Service can load-balance it" was a pattern you had to apply. With the 2026 spec, it's just how MCP works. The protocol deleted the exact friction we were routing around, which means hosting a horizontally scaled MCP server on App Service is now closer to "deploy a normal web app and scale it out" than ever.

If you're already running MCP on App Service: you did the hard part early. The spec just made it official.

Got an MCP server running on App Service? I'd love to hear how the migration goes. Drop a comment.

Tutorial: A graceful process to develop and deploy Docker Containers to Azure with Visual Studio Code
Creating and deploying Docker containers to Azure resources manually can be a complicated and time-consuming process. This tutorial outlines a graceful process for developing a Linux Docker container on your Windows PC and deploying it to Azure resources.

This tutorial emphasizes using the user interface to complete most of the steps, making the process more reliable and understandable. A few steps require the command line, but the majority of tasks can be completed in the UI. This focus on the UI is what makes the process graceful and user-friendly.

In this tutorial, we will use a Python Flask application as an example, but the steps should be similar for other languages such as Node.js.

Prerequisites

Before you begin, you'll need to have the following prerequisites set up:

WSL 2 installation

WSL provides a great way to develop your Linux application on a Windows machine, without worrying about compatibility issues when running in a Linux environment. We recommend installing WSL 2 as it has better support for Docker. To install WSL 2, open PowerShell or Windows Command Prompt in administrator mode and enter the command below:

    wsl --install

Then restart your machine. You'll also need to install the WSL extension in Visual Studio Code.

Python 3 installation

Run “wsl” in your command prompt. Then run the following commands to install Python 3.10 (if you use Python 3.5 or a lower version, you may need to install venv yourself):

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt install python3.10

Docker for Linux

You'll need to install Docker in your Linux environment. For Ubuntu, please refer to the official documentation below:

https://docs.docker.com/engine/install/ubuntu/

Docker for Windows

To create an image for your application in WSL, you'll need Docker Desktop for Windows. Download the installer from the Docker website below and run the downloaded file to install it.
https://www.docker.com/products/docker-desktop/

Steps for Developing and Deployment

1. Connect Visual Studio Code to WSL

To develop your project in Visual Studio Code in WSL, click the blue button in the bottom-left corner, then select “Connect to WSL” or “Connect to WSL using Distro”.

2. Install some extensions for Visual Studio Code

The two extensions below have to be installed after you connect Visual Studio Code to WSL.

The Docker extension can create a Dockerfile for you automatically and highlights Dockerfile syntax. Search for it and install it from the Visual Studio Code Extensions view.

To deploy your container to Azure from Visual Studio Code, you also need to have Azure Tools installed.

3. Create your project folder

Click "Terminal" in the menu, then click "New Terminal". You should see a terminal for your WSL environment. I use a simple Flask application as the example here, so I run the command below to clone its Git project:

    git clone https://github.com/Azure-Samples/msdocs-python-flask-webapp-quickstart

4. Python Environment setup (optional)

After you install Python 3 and create the project folder, it is recommended to create a Python environment dedicated to the project. It makes your runtime and modules easier to manage. To set up the Python environment in your project, run the commands below in the terminal:

    cd msdocs-python-flask-webapp-quickstart
    python3 -m venv .venv

After you open the folder, you will see that some new folders have been created in your project. If you open the app.py file, you can see that it uses the newly created Python environment. If you open a new terminal, the prompt also shows that you are now in the new Python environment.

Then run the command below to install the modules listed in requirements.txt:

    pip install -r requirements.txt

5. Generate a Dockerfile for your application

To create a Docker image, you need to have a Dockerfile for your application.
You can use the Docker extension to create the Dockerfile for you automatically. To do this, press Ctrl+Shift+P in Visual Studio Code and search for "Dockerfile". Then select “Docker: Add Docker Files to Workspace”.

You will be asked to select your programming language and framework (other languages such as Node.js and Java are also supported). I select “Python Flask”.

First, you will be asked to select the entry point file. I select app.py for my project.
Second, you will be asked which port your application listens on. I select 80.
Finally, you will be asked whether to include a Docker Compose file. I select no, as this is not a multi-container application.

A Dockerfile is then generated.

Note: If you do not have a requirements.txt file in the project, the Docker extension will create one for you. However, it DOES NOT contain all the modules you installed for this project. Therefore, it is recommended to have the requirements.txt file in place before you create the Dockerfile. You can run the command below in the terminal to create the requirements.txt file:

    pip freeze > requirements.txt

After the file is generated, add “gunicorn” to requirements.txt if it is not already listed, because the generated Dockerfile uses gunicorn to launch the Flask application.

Please review the generated Dockerfile and check whether anything needs to be modified.

You will also find that a .dockerignore file has been generated. It lists the files and folders to be excluded from the image. Please check it too, to see if it meets your requirements.

6. Build the Docker Image

You can use the Docker command line to build the image. However, you can also right-click anywhere in the Dockerfile and select "Build Image".

Please make sure that Docker Desktop is running in Windows.

You should then be able to see the Docker image, named after the project and tagged "latest", in the Docker extension.

7. Push the Image to Azure Container Registry

Click "Run" for the Docker image you created and check that it works as you expect. Then you can push it to the Azure Container Registry (ACR). Click "Push" and select "Azure".

You may need to create a new registry if there isn't one. Answer the questions that Visual Studio Code asks you, such as subscription and ACR name, and then push the image to the ACR.

8. Deploy the image to Azure Resources

Follow the instructions in the following documents to deploy the image to the corresponding Azure resource:

Azure App Service or Azure Container App: Deploy a containerized app to Azure (visualstudio.com)
Container Instance: Deploy container image from Azure Container Registry using a service principal - Azure Container Instances | Microsoft Learn

Introducing Azure Container Apps Sandboxes: Secure Infrastructure for Agentic Workloads
Today we are announcing the public preview of Azure Container Apps Sandboxes - a new first-class resource type that gives you fast, secure, ephemeral compute environments with built-in suspend and resume. This is the underlying infrastructure on which products like Cloud sandboxes in GitHub Copilot, Foundry Hosted Agents, and Azure Container Apps Express are built, and you now have the opportunity to build your own solutions on the same infrastructure.

Azure Container Apps Sandboxes unlocks two massive opportunities.

For platform developers and ISVs, sandboxes give you the same isolated compute fabric that powers many Microsoft products. You get the building blocks to create your own multi-tenant platform on proven, enterprise-scale infrastructure.

For AI agents, sandboxes become a self-configurable tool that lets agents extend their own capabilities on the fly. An agent can spin up a fresh sandbox in milliseconds and use it to execute untrusted code, compile source, test HTTP requests against a live app, launch a browser session, or tackle anything else that needs quick, scalable infrastructure.

On one side it empowers humans to build platforms; on the other it empowers agents to build their own capabilities. Both get enterprise-grade isolation, instant startup, and snapshot-based persistence out of the box.

We'll walk through the resource model, sandbox lifecycle, the features that set Sandboxes apart - like snapshots, lifecycle policies, network egress controls, volumes, and managed identities - and show you how to get started with the portal and CLI.

What Are Container Apps Sandboxes?

Container Apps Sandboxes are secure, isolated compute environments that start in sub-second time, scale to thousands, and cost nothing when idle. Each sandbox runs in its own hardware-isolated microVM boundary - fully separated from the host, the platform, and every other sandbox.
You bring your own Open Container Initiative (OCI) image, and Sandboxes handle the rest: provisioning from prewarmed pools, strong multi-tenant isolation, and snapshot-based suspend/resume that preserves full memory and disk state across sessions.

There are many ways Sandboxes can help you build your next project - here are a few:

• Your own build & test systems - wire a Sandbox into your CI/CD flow to run builds while your laptop stays cool.
• Agents that can run anything safely - an agent spawns a sandbox, executes work inside it, and returns the output with no agent host privileges required.
• Agent swarms - decompose a research question, spawn N sandbox workers in parallel (each pinned to its own image and egress policy), and synthesize the result.

Early access customers are already unlocking significant benefits by leveraging Azure Container Apps Sandboxes.

"With Azure Container Apps sandboxes, SitecoreAI can safely enable agents to take real action. The combination of multi-tenant isolation, rapid scale-out, and full automation allows Sitecore to run long-lived, autonomous agents that securely execute code, manage workflows, and interact with enterprise systems within secure, governed environments. With this foundation, we can build agents that do real work: assembling content, personalizing experiences, and optimizing campaigns in production. Agents that operate continuously, learn from results, and improve over time, so our customers get better outcomes without giving up control."
- Mo Cherif, VP of AI and Innovation, Sitecore

"We got early access to Azure Container Apps Sandboxes, and got the first prototype integrated with Atlas AI in hours, and it's already shaping a new Atlas AI capability that we plan to launch in preview in Q3. It gives every Atlas AI agent a safe, sandboxed workspace (file system, terminal, code execution) on a customer's live data in Cognite Data Fusion.
The value: Industrial process, reliability, and production engineers spend days and weeks on questions like "which wells are underperforming and why?" These questions are tractable but expensive, so they are asked rarely and decisions are made on gut feel. With this, an agent pulls the data, runs the analysis, cross-references maintenance and inspection records, and returns a cited draft in minutes. Sandboxes make it practical: Aligned feature set, per-customer isolation, pause/resume across multi-day investigations, scale-to-zero economics."
- Kelvin Sundli, Product manager, Atlas AI, Cognite

Resource Model: Sandbox Groups and Sandboxes

The top-level ARM resource is Microsoft.App/SandboxGroups. A Sandbox Group is the management boundary for a collection of sandboxes that share configuration - think of it like a Container Apps Environment, but purpose-built for sandboxes.

When you create a Sandbox Group, you specify:

• Subscription, Resource Group, and Region
• Sandbox defaults (optional): default CPU, memory, disk, max sandbox count, and default idle timeout
• Networking: optionally deploy into a custom VNet with a dedicated subnet for private networking
• Identity: system-assigned or user-assigned Entra identity

Individual sandboxes are created within a Sandbox Group. Each sandbox has its own source (disk image or snapshot), resource tier, lifecycle policy, network egress policy, environment variables, ports, volumes, and connections.
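As a rough mental model of the group/sandbox relationship described above, the Python sketch below treats a Sandbox Group as a container that holds optional defaults and a sandbox cap. It is only an illustration: the class and field names are hypothetical (not the ARM schema or the Python SDK), only two of the listed settings are modeled, and the rule that a sandbox falls back to the group default when it sets no value of its own is our assumption about how "defaults" behave.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Sandbox:
    """Hypothetical, trimmed-down view of one sandbox."""
    name: str
    source: str  # reference to a disk image or a snapshot
    idle_timeout_minutes: Optional[int]


@dataclass
class SandboxGroup:
    """Hypothetical management boundary holding shared defaults."""
    name: str
    region: str
    default_idle_timeout_minutes: Optional[int] = None
    max_sandboxes: Optional[int] = None
    sandboxes: Dict[str, Sandbox] = field(default_factory=dict)

    def create_sandbox(
        self,
        name: str,
        source: str,
        idle_timeout_minutes: Optional[int] = None,
    ) -> Sandbox:
        # Enforce the group's optional cap on the number of sandboxes.
        if self.max_sandboxes is not None and len(self.sandboxes) >= self.max_sandboxes:
            raise RuntimeError("sandbox group is at its max sandbox count")
        # Assumed rule: a per-sandbox value wins, otherwise use the group default.
        effective = (
            idle_timeout_minutes
            if idle_timeout_minutes is not None
            else self.default_idle_timeout_minutes
        )
        sandbox = Sandbox(name=name, source=source, idle_timeout_minutes=effective)
        self.sandboxes[name] = sandbox
        return sandbox


group = SandboxGroup("demo-group", "demo-region", default_idle_timeout_minutes=5, max_sandboxes=2)
print(group.create_sandbox("worker-1", "example-image").idle_timeout_minutes)
```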
Sandbox Lifecycle

Sandboxes have a well-defined lifecycle with the following states:

State | Description
Creating | Provisioning the sandbox from a disk image or snapshot
Running | Actively executing - backed by a live microVM
Idle | System-suspended after inactivity; can auto-resume on the next request
Suspended | Full state (memory + disk) preserved as a snapshot; no compute costs
Resuming | Restoring from a suspended or idle state - sub-second for most workloads
Stopped | User-initiated stop; can be resumed
Stopping | Graceful shutdown in progress
Deleting | Teardown in progress

The key insight here is the distinction between Idle and Suspended. When a sandbox goes idle (e.g., no traffic for a configured timeout), the system can automatically suspend it and capture a snapshot. When a new request arrives, the sandbox resumes transparently. This gives you scale-to-zero economics with stateful compute - something that wasn't possible before without significant custom engineering.

Disk Images: Bring Your Own Container

Sandboxes boot from Disk Images - Open Container Initiative (OCI) images converted into an optimized root filesystem format. You point to any OCI image (public or private registry), and the platform builds a bootable disk image from it.

You can start with public, pre-built images maintained by the platform (for example, Ubuntu base images), or bring your own private images. For private registries, you can authenticate with a username/token or use a user-assigned managed identity for Azure Container Registry (ACR), integrated with Azure as you would expect.

Snapshots: Full-State Persistence

Snapshots capture the complete state of a running sandbox - memory, disk, and all running processes. When you resume a sandbox from a snapshot, every process, open file handle, and in-memory data structure is restored exactly as it was.
There are two ways to make a snapshot: automatically on suspend, or manually on demand. Snapshots are great for three things:

• Checkpointing mid-task so a long-running agent can resume exactly where it left off
• Cloning an environment that's already warm - dependencies installed, caches populated, services running
• Shipping a "ready-to-go" state that resumes in sub-second time instead of cold-booting

Snapshots are free during the preview, after which they will be billed at standard Azure Blob Storage rates. Each snapshot records the source sandbox, resource allocation (CPU, memory, disk), and container metadata - so what you get back is exactly what you snapshotted.

Resource Tiers

Every sandbox is assigned to a resource tier that determines its CPU, memory, and disk allocation:

Tier | CPU | Memory | Disk
XS | 0.25 vCPU | 0.5 GB | 5 GB
S | 0.5 vCPU | 1 GB | 10 GB
M (default) | 1 vCPU | 2 GB | 20 GB
L | 2 vCPU | 4 GB | 40 GB
XL | 4 vCPU | 8 GB | 80 GB

When creating a sandbox from a snapshot, the resource tier is inherited from the snapshot and cannot be changed - this ensures the restored environment has the exact resources it was running with when the snapshot was taken.

Lifecycle Policies: Auto-Suspend and Auto-Delete

Every sandbox can be configured with lifecycle policies that automate state transitions and cleanup:

Auto-Suspend

• Idle timeout: How long a sandbox can sit idle before being suspended (configurable: 1m, 2m, 5m, 10m, 30m, 60m)
• Suspend mode:
  • Disk + Memory (default): Full snapshot including memory state - resume picks up exactly where you left off, with all processes and in-memory data intact.
  • Disk: Only the disk is preserved; the VM restarts fresh on resume. Useful when you only need file persistence, not process continuity.

Auto-Delete

• Automatically delete sandboxes after a configurable number of days of inactivity
• Prevents accumulation of abandoned sandboxes that consume snapshot storage

These lifecycle policies are what make Sandboxes economically viable at scale.
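To illustrate how the two policies might interact, here is a small Python sketch. It is not the platform's implementation or the SDK's API: the LifecyclePolicy class, its field names, and the next_action helper are hypothetical, and the only values taken from the text above are the allowed idle timeouts and the two suspend modes. We also assume that "inactivity" for auto-delete is measured from the same last-activity timestamp as the idle timeout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Idle timeouts and suspend modes listed above.
ALLOWED_IDLE_TIMEOUTS_MIN = (1, 2, 5, 10, 30, 60)
SUSPEND_MODES = ("disk+memory", "disk")


class Action(Enum):
    NONE = "none"
    SUSPEND = "suspend"
    DELETE = "delete"


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical shape for a per-sandbox lifecycle policy."""
    idle_timeout_minutes: int
    suspend_mode: str = "disk+memory"       # documented default mode
    auto_delete_days: Optional[int] = None  # None means never auto-delete

    def __post_init__(self) -> None:
        if self.idle_timeout_minutes not in ALLOWED_IDLE_TIMEOUTS_MIN:
            raise ValueError(f"unsupported idle timeout: {self.idle_timeout_minutes}m")
        if self.suspend_mode not in SUSPEND_MODES:
            raise ValueError(f"unsupported suspend mode: {self.suspend_mode}")


def next_action(policy: LifecyclePolicy, last_activity: datetime, now: datetime) -> Action:
    """Decide what the policy calls for, given how long the sandbox has been idle."""
    idle = now - last_activity
    # Auto-delete is checked first because it is the longer, terminal threshold.
    if policy.auto_delete_days is not None and idle >= timedelta(days=policy.auto_delete_days):
        return Action.DELETE
    if idle >= timedelta(minutes=policy.idle_timeout_minutes):
        return Action.SUSPEND
    return Action.NONE


policy = LifecyclePolicy(idle_timeout_minutes=1, auto_delete_days=7)
start = datetime(2026, 1, 1, 12, 0, 0)
print(next_action(policy, start, start + timedelta(minutes=2)))
```

A real controller would also track the sandbox's current state, so an already-suspended sandbox is not suspended again.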
A platform serving thousands of tenants can configure aggressive idle timeouts (say, 60 seconds) with the Disk + Memory suspend mode, and each tenant's sandbox disappears from the billing meter almost immediately - but resumes in sub-second time the moment they return.

Network Egress Policy

For scenarios involving untrusted code - AI agents executing LLM-generated scripts, multi-tenant SaaS with user-submitted workloads - controlling outbound network access is critical. Sandboxes provide a per-sandbox Network Egress Policy:

• Default action: Allow or Deny all outbound traffic
• Host rules: Domain-pattern rules (e.g., *.github.com → Allow) to permit specific destinations
• Custom CIDR rules: Network-level rules for IP ranges (e.g., 10.0.0.0/8 → Deny)
• Skip egress proxy: Option to bypass the egress proxy entirely when custom VNet routing handles policy enforcement

This means you can run a sandbox in a deny-by-default posture and allowlist only the specific endpoints it needs (your API server, a package registry, etc.) - without setting up NSGs or firewall appliances.

Managed Volumes: Persistent and Shared Storage

Sandboxes support two types of mountable volumes, both managed by Microsoft:

Volume Type | Backed By | Best For
Managed Azure Blob | Azure Blob Storage | Shared data across sandboxes, file uploads/downloads, persistent artifacts
Managed Data Disk | Azure Disk Storage | High-performance storage for databases, build caches, large working sets - only available to one sandbox at a time

Blob volumes come with a built-in file explorer in the portal - you can browse, upload, download, create folders, and drag-and-drop files directly. Data Disk volumes provide dedicated block storage with configurable sizes.

Secrets and Identity

Secrets

Sandbox Groups support key-value secrets scoped to the group. Secrets can be created, edited, and referenced by sandboxes within the group.
These secrets can be used in egress policies to modify requests with transform or header-injection rules, without exposing the secrets to code running inside the sandbox.

Managed Identity

Sandbox Groups support both system-assigned and user-assigned managed identities, with full RBAC role assignment management. This means your sandboxes can authenticate to Azure services (Key Vault, Storage, Cosmos DB, etc.) without managing credentials - the same identity model you use everywhere else in Azure.

MCP Connectors and Triggers

ACA Sandboxes now supports managed connectors through the Model Context Protocol (MCP), giving sandboxes access to external APIs - including Microsoft 365, Salesforce, ServiceNow, GitHub, and 1,400+ other systems - without managing credentials directly. Attach a Connector Gateway to your sandbox group, and every sandbox in the group can call external APIs through a standardized MCP interface at runtime.

Pair connectors with triggers to build event-driven automation: route an Outlook email to a sandbox that triages it with an AI agent, or react to a SharePoint file upload by extracting and processing the document, all without writing glue code. Triggers can fire a shell command inside a sandbox or invoke an HTTP endpoint the sandbox exposes, so your automation shapes fit naturally around your workload.

The integration is built on the new Connector Namespace service (az connector-namespace), the same runtime behind Logic Apps and Power Platform connectors, now available as a programmable layer for sandboxes. See the end-to-end samples for runnable azd up-deployable examples covering email triage and document automation scenarios.

The Portal Experience

Azure Container Apps Sandboxes are only available in the new Azure Container Apps portal that provides a rich, IDE-like experience for working with sandboxes.
Creating a Sandbox

The portal offers multiple creation paths:

• Standard Sandbox - full configuration control over source, resources, lifecycle, networking, and volumes
• GitHub Copilot Sandbox - a preset with Copilot CLI ready to go; GitHub credentials can be wired in through the Access Token before the sandbox is created
• Claude Sandbox - Claude CLI pre-installed, ready for agentic coding inside the sandbox

Using Coding Agents (Copilot CLI / Claude Code)

If you live inside Copilot CLI or Claude Code, you don't need to learn a new CLI. Install the azure-sandbox skill once and your agent picks up the right skills:

    # GitHub Copilot CLI
    # Add as a plugin marketplace
    /plugin marketplace add microsoft/azure-container-apps
    # Install all skills
    /plugin install sandboxes@Azure-Container-Apps

    # Claude Code
    claude plugin add microsoft/azure-container-apps

The skill runs prerequisite checks silently (az --version, az account show, node --version, aca --version), prompts only if something's missing, and maps natural-language asks to the right aca commands. Bundled runbooks cover Copilot CLI BYOK (bring your own Azure OpenAI key), the deploy-a-web-app walkthrough, and shell setup.

Sandbox Detail Page

Once your sandbox is running, the detail page gives you immediate access to the sandbox terminal and additional details, such as:

• Network Audit - real-time egress traffic log showing allowed and denied requests
• Monitor - live CPU, memory, disk, and network utilization charts
• Connectors - attached connections with an "Add" action
• Volumes - mounted volumes with an "Add" action
• Log Stream - streaming container logs
• Processes - running process list inside the sandbox
• Files - file explorer to browse the sandbox filesystem

The toolbar actions let you manage the state of the sandbox - Resume or Stop.
In the Ellipsis menu (⁝) you can find additional settings to manage network Egress Policy and ingress (Add port), take a Snapshot of the sandbox, Commit (save disk state as a new disk image), set Lifecycle Policy, or permanently Delete the sandbox. Finally, you can see additional Details in a side panel.

Getting Started with the CLI and Python SDK

All sandbox and sandbox-group operations go through the aca CLI. There are no az containerapp sandbox commands; az is only used for az login, az account show, and resource-group management.

Install (CLI)

```
# Mac, Linux
curl -fsSL https://aka.ms/aca-cli-install | sh

# Windows
irm https://aka.ms/aca-cli-install-ps | iex
```

Run aca --help to get started.

Install (Python SDK)

```
pip install azure-containerapps-sandbox
```

For more details, a quickstart, and examples for the ACA CLI and Python SDK, go to https://sandboxes.azure.com

Evolution from Dynamic Sessions

If you've used Azure Container Apps Dynamic Sessions, Sandboxes are the next evolution of that capability. Everything Sessions can do, Sandboxes can do - and significantly more:

| Capability | Dynamic Sessions | Sandboxes |
| --- | --- | --- |
| Sub-second startup | ✓ | ✓ |
| Strong isolation | ✓ | ✓ |
| Custom container images | ✓ | ✓ |
| Custom VNet integration | ✓ (Partial) | ✓ |
| Suspend/resume with Memory and Disk snapshots | - | ✓ |
| Lifecycle policies (auto-suspend, auto-delete) | - | ✓ |
| Network egress policy (per-sandbox) | - | ✓ |
| Persistent managed volumes (Blob, Data Disk) | - | ✓ |
| Managed identity (system + user-assigned) | - | ✓ |
| Secrets management | - | ✓ |
| Configurable resource tiers | - | ✓ |
| Direct access to sandbox in Portal experience | - | ✓ |

We will continue to support Dynamic Sessions, but all new investment goes into Sandboxes. If you're building new workloads on isolated ephemeral compute, start with Sandboxes.

How It All Fits Together

ACA Sandboxes is a platform primitive. It's the foundation on which multiple Microsoft products are already built - including ACA Express, Cloud sandboxes in GitHub Copilot, and Foundry Hosted Agents.
When you build on Sandboxes, you're building on the same infrastructure that powers Microsoft's own portfolio. This is the evolution of what we shared with Project Legion in 2024. Legion described the internal infrastructure; Sandboxes exposes it as a customer-facing primitive that you can use directly.

What's Next

• Deeper Azure integrations - first-class connectivity with Azure networking, identity, storage, and AI services
• Enhanced SDK and CLI - richer programmatic experiences for managing sandboxes at scale
• More Microsoft services built on Sandboxes - this is just the beginning

Get Started Today

• Portal: https://sandboxes.azure.com/
• Documentation: Azure Container Apps Sandboxes
• Pricing: Azure Container Apps Pricing (per-second vCPU/memory billing, scale-to-zero, snapshots at Blob Storage rates)

We'd love to hear your feedback. You can ask questions or file issues on the Azure Container Apps GitHub (prefix with [Sandbox] for Sandboxes-specific issues).

What's new in Azure App Service at #MSBuild 2026
At Microsoft Build 2026, Azure App Service introduced a powerful set of updates designed to help organizations accelerate their journey into AI, without increasing complexity or cost. These innovations focus on one clear business outcome: enabling teams to build, deploy, and scale AI-powered applications and agents faster, more securely, and with greater operational efficiency. A key highlight is the new Easy AI experience, which allows existing web apps to become AI-ready with no rearchitecting required. With capabilities like built-in Model Context Protocol (MCP), developers can instantly expose app functionality as agent-ready endpoints, enabling AI agents to interact with business logic securely and seamlessly. This dramatically reduces development time, allowing teams to move from idea to intelligent application in a fraction of the usual effort. Security and compliance are also strengthened with the general availability of Isolated v4 for Azure App Service Environments, delivering improved performance for customers that need single-tenant isolation and strong data residency guarantees. For enterprises operating in regulated industries, this ensures AI applications meet strict governance requirements without sacrificing scalability or speed. For modernization scenarios, Managed Instance on Azure App Service simplifies the migration of legacy applications, including those with OS-level dependencies. Faster restarts, enhanced diagnostics, and AI-assisted migration workflows help organizations modernize existing systems cost-effectively—avoiding expensive rewrites while unlocking AI capabilities. Recent updates include an AI-assisted approach to migrating legacy IIS applications using a multi-agent workflow powered by MCP. Managed Instance is supported on both Premium v4 and Isolated v4, laying the foundation for a modern compute infrastructure across the board. 
Operational efficiency is further enhanced through platform and CLI improvements designed for the “agent era.” From structured deployment diagnostics to optimized Python pipelines delivering faster deployments, these updates reduce friction and infrastructure overhead, lowering total cost of ownership. Together, these innovations position Azure App Service as a future-ready platform where businesses can rapidly build intelligent, agent-driven applications securely, efficiently, and at scale. 👉 Learn more in the full announcement: Deep dive into Azure App Service Build 2026 updates

Azure Functions MCP Extension: What's New at Build 2026
The Azure Functions MCP extension has had a breakout year! Since its initial preview, the extension has grown from a single trigger type into a full-featured platform for building remote MCP servers: with tool, resource, and prompt triggers across multiple languages, MCP Apps for interactive UIs, built-in MCP authentication, and feature enhancements. Here's what's new and what it means for developers building MCP servers on Azure Functions. The full MCP primitive set: Tools, resources, and prompts When the MCP extension first shipped, it supported tool triggers. Declare a function as an MCP tool, and any MCP client can discover and call it. That was the starting point. Since then, we've shipped the remaining MCP primitives: Resource triggers: expose a function as an MCP resource. Prompt triggers: expose a function as an MCP prompt, letting clients request structured prompt templates from your server. Like tool triggers, resource and prompt triggers are supported in multiple languages including .NET, Java, Python, TypeScript, and JavaScript. MCP Apps: interactive UI from your MCP server MCP Apps let your tools return interactive user interfaces instead of plain text. Combine tool triggers with resource triggers, and your MCP server can serve rich, rendered experiences to MCP-aware clients. The Azure Functions MCP extension supports MCP Apps natively, meaning the same function app that exposes tools and resources can also serve UI components. The launch blog post on the Azure Apps Blog walked through the pattern in detail. For .NET developers, the new fluent builder API (available in the latest NuGet release) makes it easier to compose MCP Apps by chaining tool and resource definitions in a declarative style. MCP authentication The extension supports built-in MCP authentication, implementing the requirements of the MCP auth spec. All samples in the aka.ms/remote-mcp repo enable built-in MCP auth by default with Microsoft Entra ID as the identity provider. 
Samples have also been updated to demonstrate how to exchange tokens in the On-Behalf-Of (OBO) flow, so your MCP tools can access downstream APIs using the invoking user's identity.

Auth configuration in the Azure portal: in preview at Build is a one-click experience in the Azure portal for configuring built-in MCP auth. No more manually creating an app registration, configuring it, and wiring it to the server. Just open your server app in the portal and click to enable MCP auth. Try it out!

Feature enhancements

Beyond the headline primitives and auth, the extension has shipped a steady stream of capabilities over the past few months. The following are the notable additions.

Structured content

Structured content lets you return machine-readable JSON metadata alongside your tool's response via the `structuredContent` field. Clients that support it can programmatically consume the data (e.g. parse fields, render tables, drive downstream logic) rather than just displaying text. Clients that don't support it still get the regular content blocks as a fallback.

Rich content types

Tools aren't limited to returning plain text. The extension supports the full set of MCP content block types, e.g. `TextContent`, `ImageContent`, `AudioContent`, `ResourceLink`, and `EmbeddedResource`, so your tools can return images, audio clips, references to resources, and inline file content alongside text.

Input and output schemas

`WithInputSchema` and `WithOutputSchema` give you explicit control over the JSON schemas advertised for your tools. This is especially useful when the auto-generated schema from function parameters doesn't capture the full contract, for example when your tool accepts a complex nested object or returns a specific shape that clients depend on. Input and output schemas are currently supported in .NET, with support for other languages coming soon.
```csharp
builder.ConfigureMcpTool("SearchDocs")
    .WithOutputSchema("""
        {
          "type": "object",
          "properties": {
            "results": { "type": "array", "items": { "type": "string" } },
            "query": { "type": "string" }
          },
          "required": ["results", "query"]
        }
        """);
```

Fluent configuration APIs in .NET

A set of fluent builder APIs lets you configure MCP primitives declaratively in `Program.cs`:

• ConfigureMcpTool: add properties, metadata, input/output schemas, or promote a tool to an MCP App
• ConfigureMcpResource: attach metadata to resources
• ConfigureMcpPrompt: define prompt arguments and metadata

```csharp
builder.ConfigureMcpTool("sayhello")
    .WithProperty("name", McpToolPropertyType.String, "Name of the user", required: true)
    .WithMetadata("ui", new { resourceUri = "ui://index.html" });
```

What's next

Usage of the MCP extension has grown steadily since its preview launch. Tool execution volume has increased 15x over the past several months as more customers move from experimentation to production. As adoption grows, so do the expectations. Developers building production MCP servers are hitting real friction around auth complexity, client configuration, and observability. We're continuing to invest in the extension to address these gaps and help customers be more successful building and hosting MCP servers on Azure Functions. Here's where we're focusing next.

Continued auth simplification

Auth remains the biggest barrier to getting an MCP server into production. We'll work on:

• Smoother client setup: making it easier to connect any MCP client to an authenticated Azure Functions MCP server, not just VS Code.
• Simplified OBO flow: streamlining the experience of On-Behalf-Of authentication so developers can delegate user identity to downstream services with less configuration.

Our goal: the secure path should be the easy path.

Deeper integration with Microsoft Foundry

We'll build tighter integration between Azure Functions MCP servers and Microsoft Foundry.
This includes surfacing MCP servers in Foundry Toolbox, a new feature introduced to help Foundry agents discover and consume tools from a single endpoint. Developers will be able to publish an MCP server from Functions and have it available to Foundry agents through Toolbox without manual endpoint configuration.

Continued feature enhancement

We prioritize based on feedback from the community raised in our GitHub repo. For example, support for streaming output and pagination are top items in our backlog today based on user demand. We also track the MCP spec's evolution closely and will continue shipping support for strategic features as they land. Examples of proposals we're following:

• MCP Tasks: the Tasks extension (SEP-2663) defines a standard pattern for async, long-running tool calls with durable task handles. This replaces hand-rolled polling patterns and aligns well with Functions' execute-and-return model.
• Stateless MCP: SEP-2575 proposes removing the mandatory initialization handshake, which is a natural fit for serverless platforms like Azure Functions where fresh instances can handle any request.

Have something you'd like us to prioritize? Let us know by filing a request on GitHub.

Get started

Samples: Samples showcasing most up-to-date features: aka.ms/remote-mcp Documentation: Model Context Protocol for Azure Functions MCP Extension GitHub repo: Azure Functions MCP Extension

Anyscale on Azure: Powering Enterprise AI at Massive Scale on Azure Kubernetes Service
Somewhere on your AI platform team, an engineer is on call this weekend — not for the model, not for the training run, but for the integration code holding five separate AI processing systems together. Data preparation on one. Training on a second. Evaluation on a third. Serving on a fourth. Observability bolted on across all of it. The glue between them has quietly evolved into a production system of its own, complete with its own failure modes and its own pager. This is what running AI at scale looks like for most enterprises in 2026. To process the full breadth of AI workloads, teams don’t have one platform, but a stack of multiple compute engines — stitched together and monitored around the clock. Training failures become increasingly costly as multi-node GPU clusters remain underutilized and difficult to operate. Inference costs climb in a straight line when they should be bending the other way. And the accelerators underneath, at six figures a year per node, sit at 30–40% utilization. None of this is a model problem. It is a systems problem, and it exposes a divide that is widening across the industry. The AI shift: Moving from API inference calls only to end-to-end AI Most enterprises start an AI journey by calling hosted model APIs. It’s the fastest way to experiment and ship. But as adoption grows, inference costs scale in a straight line while differentiation remains limited. The organizations pulling ahead are doing more than consuming models. They are customizing them with proprietary data, operating them at scale, and owning the infrastructure between their data and their models. Their unit economics improve as they scale. The dividing line isn’t budget. It isn’t ambition. It is a single architectural decision: whether the layer between your data and your models is something you rent in pieces or run as a single system. 
That unified system for end-to-end AI, almost without exception, is built on one runtime: Ray, an open-source framework widely adopted by AI-natives such as Cursor, Mistral and xAI to act as the engine that powers many of their workloads from multimodal data processing to reinforcement learning. Anyscale on Azure: Build and run end-to-end AI on your Azure subscription Anyscale on Azure brings the distributed compute runtime the AI industry has converged on — Ray— into your Azure tenant as an Azure Native service, that includes purpose-built developer tooling and unified pane for cluster management, built through deep engineering collaboration between Anyscale and Microsoft. Unlike other processing engines which either only support one hardware type (e.g. CPUs) or focus on a single workload (e.g. inference), Ray turns a heterogeneous cluster of CPUs and GPUs into a single Python runtime composing data preparation, distributed training, fine-tuning, reinforcement learning, high-throughput inference, and agentic execution as one program, not five interlocking systems. Anyscale created Ray and stewards the open-source Ray project, now governed by the PyTorch Foundation; the Anyscale Runtime is the production-grade layer that enterprises can utilize on critical paths from day one, bringing managed cluster operations, enterprise-grade support, and the operational reliability needed to run AI and data workloads at scale. On Azure, that runtime executes on your Azure Kubernetes Service (AKS) clusters, inside your subscription, and under Microsoft Entra ID workload identity. Your data, models, and weights never leave your cloud, and consumption is billed through Azure with drawdown against your existing Azure commitment (MACC). Sovereignty isn't a label bolted on after the fact. It is the architectural starting point: customer-owned data and models in the customer-owned tenant and governance boundary. 
The variable per-token economics of hosted APIs are replaced with compute you govern directly. Your proprietary data becomes a compounding advantage rather than a payload shipped to a third-party endpoint. A single runtime for the full AI lifecycle The cost profile of enterprise AI is largely architectural. Fragmented stacks — separate systems for prep, training, evaluation, and serving — produce a predictable set of failure modes such as Idle GPU time, Integration code and cross-system data movement. The result: production GPU utilization only in the 30–40% range, against accelerators that cost six figures per node per year. On the same fleet, Anyscale customers run those accelerators at 80%+ sustained utilization and report 40–60% lower GPU spend versus static, single-tenant clusters — driven by fractional GPU allocation (down to 0.2 of a device), bin-packing across complementary memory and compute profiles, gang scheduling for distributed training, priority-aware preemption that lets production inference take precedence over ad-hoc training, and spot integration with checkpoint-aware preemption so long-running jobs survive reclamation without lost work. Anyscale on Azure replaces this with a single Ray-powered runtime that spans the lifecycle as one distributed computation graph: Ray Data (distributed preparation) → Ray Train (fault-tolerant training) → Ray Tune (hyperparameter search) → Ray Serve (inference) — under one managed control plane. On top of open-source Ray, the Anyscale Runtime adds fault-tolerant training with checkpoint/restart, optimized scheduling, faster cluster bring-up, inference-aware autoscaling, and per-stage observability. 
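To make the bin-packing idea mentioned above a little more concrete, here is a minimal first-fit-decreasing sketch in plain Python. It is an illustration only, not Anyscale's or Ray's scheduler: the task names, the request sizes, and the `pack_fractional_gpus` helper are all hypothetical, and capacity is tracked in thousandths of a device so that fractions such as 0.2 stay exact.

```python
from typing import Dict, List

GPU_CAPACITY = 1000  # one whole device, in thousandths (0.2 GPU == 200)


def pack_fractional_gpus(requests: Dict[str, int]) -> List[List[str]]:
    """First-fit-decreasing: put each task on the first device with room.

    Hypothetical helper for illustration; real schedulers also weigh
    memory, priority, and gang constraints.
    """
    free: List[int] = []          # remaining capacity per device
    placed: List[List[str]] = []  # task names per device
    # Largest requests first; ties broken by name so the result is stable.
    ordered = sorted(requests.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, need in ordered:
        if not 0 < need <= GPU_CAPACITY:
            raise ValueError(f"{name}: request must be within one device")
        for i, room in enumerate(free):
            if need <= room:
                free[i] -= need
                placed[i].append(name)
                break
        else:
            # No existing device had room, so open a new one.
            free.append(GPU_CAPACITY - need)
            placed.append([name])
    return placed


# Made-up workload mix: two small jobs, two half-device servers, one tuner.
requests = {"embed": 200, "rerank": 200, "serve-a": 500, "serve-b": 500, "tune": 600}
placement = pack_fractional_gpus(requests)
```

With these made-up numbers, five tasks that would each hold a whole device on a static, one-task-per-GPU cluster fit onto two devices.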
Ray is the unifying layer that, rather than replacing, streamlines distributed processing of the framework stack the AI industry already uses: PyTorch, Hugging Face Transformers, FSDP, DeepSpeed, and Megatron for training, vLLM and SGLang for high-throughput inference with continuous batching, paged attention, and speculative decoding. Ray Train orchestrates the three parallelism patterns modern training requires — data parallel, model parallel, and hybrid 3D parallel (data + tensor + pipeline) — for trillion-parameter models, without requiring teams to write custom distributed code. The architectural payoff is direct: a single Python program defines a graph spanning CPU-heavy preparation and GPU-heavy training. The model produced by Ray Train is served by Ray Serve in the same cluster, against the same storage. The operational, identity, and observability surface is unified instead of fragmented. What enterprises deploy with Anyscale on Azure There are five workloads that power the development of modern AI systems, spanning data processing, training, inference, and simulation. But in most environments, each depends on separate engines, frameworks, and orchestration layers. The resulting fragmentation drives up infrastructure spend, latency, and engineering complexity. This makes a single Ray-based runtime under Anyscale’s managed control plane the operationally rational choice. Anyscale on Azure provides a complete platform to build and deploy AI applications using the same APIs as open-source Ray. While the data plane runs inside the customer’s AKS cluster, the managed control plane provides a unified interface for development, debugging, and cluster operations. AI in your trust boundary by design: the architecture Anyscale on Azure is an Azure Native product — discoverable via the Azure portal and provisioned through Azure Resource Manager with every resource tagged, scoped, and policy‑bound like any other in your subscription. 
Anyscale on Azure is a split-plane deployment:

• Control plane (managed by Anyscale): scheduling, jobs, services, workspaces, and observability.
• Data plane (your Azure subscription): Ray clusters run on your AKS, in your VNet, on your storage (Azure Blob / ADLS Gen2 via BlobFuse2).

The trust boundary, more than any individual data plane feature, is what matters for regulated workloads (financial services, healthcare, public sector) and any enterprise where proprietary data is the differentiation.

The execution model:

• Workloads run inside your AKS cluster: your subscription, your VNet. Model weights, training data, KV caches, checkpoints, and inference traffic never leave the boundary.
• Provisioning is ARM-native: resources are tagged, scoped, and inherit Azure Policy like anything else in the subscription.
• Identity is Microsoft Entra ID end to end: workload identity issues pod credentials; RBAC governs access. No long-lived keys, no parallel secret store.
• Network controls are yours: Private Link, NSGs, Cilium-based Azure CNI policies, and customer-managed encryption keys via Key Vault.
• Audit is the Azure Activity Log: the same surface your compliance team already monitors.
• The Anyscale Operator is the only Anyscale-controlled component in your environment: it runs inside your AKS, communicates with the control plane via egress only, and accepts no inbound access from Anyscale.

The result: code and data stay in your Azure subscription. Your existing compliance posture, audit surface, and data residency certifications carry forward, with nothing new to attest. Billing rolls through the same Azure invoice with MACC drawdown: no second invoice, no parallel procurement.

Production evidence

Xoople: planetary-scale satellite imagery on Anyscale on Azure; multimodal AI turns spectral data into operational intelligence. "Anyscale lets our teams focus on models and outcomes rather than infrastructure, dramatically accelerating the path from experimentation to deployment," says Milos Colic, VP of Engineering, Xoople.

Wayve trains the next generation of autonomous-driving foundation models on Anyscale on Azure, running distributed ML and data pipelines across large CPU and GPU fleets. The operational driver is GPU-capacity aggregation at a scale that no single region or cluster can deliver.

Beyond Anyscale on Azure, the same Ray runtime is used in production at Cursor, Physical Intelligence, xAI, Coinbase, Bedrock Robotics, and Runway. Bedrock Robotics scaled compute 85x on Anyscale without linearly increasing costs. With 12M+ weekly downloads (+400% YoY), 42K+ GitHub stars, and open governance under the PyTorch Foundation (Linux Foundation), Ray is becoming the de facto open-source standard rather than a single-vendor runtime.

Pricing

Pricing is usage-based and consolidates onto the same Azure invoice as the rest of the customer's subscription, including drawdown against existing Azure commitment (MACC):

• Azure infrastructure: standard Azure compute and GPU charges for the AKS substrate the workload runs on, scaling directly with actual usage.
• Anyscale service layer: pay-as-you-go through Azure service meters with no upfront commitment, priced by CPU, memory, and GPU type.

Where Anyscale on Azure fits

Base-model intelligence is converging. Enterprises can buy access to the same frontier models, so the model itself is no longer the moat. What separates the enterprises pulling ahead is the layer underneath: how efficiently they run the full AI lifecycle at scale, how much compounding leverage they extract from their proprietary data, and whether they own the runtime that ties it all together.
Anyscale on Azure is the Azure Native runtime layer for that posture, bringing the open-source distributed compute standard the AI industry has converged on into the same Azure governance, identity, and procurement model as the rest of the tenant. The shape of enterprise AI is settling. The teams pulling ahead are not the ones renting the most intelligence through APIs. They are the ones building and operating AI systems inside their own cloud, on their own data, under their own governance, and scaling those systems on the open distributed runtime the industry has already converged on. Anyscale on Azure is that runtime, delivered as an Azure Native product:

• Ray, productionized: the open-source distributed compute standard for AI, hardened with the Anyscale Runtime, a managed control plane, and observability designed for foundation-model-scale workloads.
• One runtime, the full AI lifecycle: data preparation, training, fine-tuning, reinforcement learning, inference, and agentic workloads in a single Python program, on a single substrate, with no cross-system glue.
• Inside your Azure tenant, on the AKS you already run: customer-owned data, customer-owned models, customer-owned governance. Entra identity, Azure RBAC, Private Link, Activity Log audit, and customer-managed keys end to end.
• One Azure invoice: usage-based pricing through the Marketplace with MACC drawdown; no parallel procurement, no second vendor contract.

If your team is wrestling with GPU utilization, fragmented data-to-serving stacks, training jobs that exceed any single region's capacity, or hosted-API costs that scale faster than your usage, this is the runtime built for that problem.

Try it now

Provision your first Anyscale Cloud by navigating to the Azure portal. Click on "Create" to begin creating the Anyscale cloud resource and link the necessary Azure resources.
Explore the quickstart guides and documentation on Microsoft Learn to get started. For architectural deep-dives, capacity planning, or a hands-on workshop with the Anyscale on Azure solution architects, reach out through your Microsoft account team. Deepen your expertise with a deep dive on best practices in the upcoming virtual webinar. Register here. The infrastructure for the next decade of enterprise AI is here. Build on it.

Links and Resources

• Press Release: Anyscale Launches on Microsoft Azure as a Native Integration for Enterprises
• Announcing Anyscale on Azure public preview: Powered by Ray on AKS
• YouTube Video: Anyscale on Azure: Scale Python AI workloads with managed Ray on AKS
• Azure on Anyscale overview
• Architecture
• Create an Anyscale Cloud in Azure Portal
• Pricing
• Support model
• Terms and Conditions
• Frequently asked questions

Announcing Anyscale on Azure public preview: Powered by Ray on AKS
Today, I’m excited to announce the public preview of Anyscale on Azure, bringing Anyscale’s managed Ray platform and the Anyscale Runtime natively to Azure, all running on Azure Kubernetes Service (AKS). It is the fastest path I have seen from a single notebook to a multi-region distributed AI job, running on the AKS clusters your platform team already operates.

Azure Functions MCP extension now supports MCP Prompts
We are thrilled to announce that the MCP prompt trigger is now available in public preview in the Azure Functions MCP extension! With this release, the extension now supports all three core MCP server primitives (tools, resources, and prompts), giving you a complete platform for building rich MCP servers on Azure Functions. In case you missed it, the MCP resource trigger is generally available for serving resources and building interactive UIs in MCP Apps.

What are MCP Prompts

In the Model Context Protocol (MCP), prompts are reusable templates that allow server authors to provide parameterized prompts for a domain, or showcase how to best use the MCP server. Prompts are user-controlled in that they require explicit invocation rather than automatic triggering, and can be context-aware, referencing available resources and tools to create comprehensive workflows. Unlike tools (which are model-controlled) and resources (which are application-controlled), prompts are exposed from servers to clients so users can explicitly select them. Applications typically expose prompts through slash commands, command palettes, dedicated UI buttons, or context menus.

How It Works

In Python, defining a prompt is as simple as decorating a function. Here's a prompt that returns a code review checklist:

```python
import logging

import azure.functions as func

# `app` is the function app object defined elsewhere in the sample.


@app.mcp_prompt_trigger(
    arg_name="context",
    prompt_name="code_review_checklist",
    description="Returns a structured code review checklist prompt for evaluating code changes."
)
def code_review_checklist(context: func.PromptInvocationContext) -> str:
    logging.info("Code review checklist prompt invoked.")
    return """You are a senior software engineer performing a code review.
Use the following checklist to evaluate the code:

1. **Correctness** — Does the code do what it's supposed to?
2. **Error Handling** — Are edge cases and failures handled?
3. **Security** — Are there any vulnerabilities (injection, auth, secrets)?
4. **Performance** — Are there obvious inefficiencies?
5. **Readability** — Is the code clear and well-named?
6. **Tests** — Are there adequate tests for the changes?

Provide your feedback in a structured format with a severity level (critical, warning, suggestion) for each finding."""
```

Prompts can accept arguments, allowing clients to customize the generated message. Here's a prompt that generates documentation with configurable parameters:

```python
@app.mcp_prompt_trigger(
    arg_name="context",
    prompt_name="generate_documentation",
    prompt_arguments=[
        func.PromptArgument("function_name", "The name of the function to document.", required=False),
        func.PromptArgument("style", "Documentation style: 'concise', 'detailed', or 'tutorial'.", required=False)
    ],
    description="Generates API documentation for a function."
)
def generate_documentation(context: func.PromptInvocationContext) -> str:
    # Both arguments are optional, so fall back to defaults when absent.
    function_name = context.arguments.get("function_name", "(unknown)")
    style = context.arguments.get("style", "concise")
    logging.info(f"Generate docs prompt invoked for function: {function_name}")
    return f"""Generate API documentation for the function named **{function_name}**.

Documentation style: **{style}**

Include the following sections:
- **Description** — What the function does.
- **Parameters** — List each parameter with its type and purpose.
- **Return Value** — What the function returns.
- **Example Usage** — A short code example showing how to call it."""
```

Check out the Get Started section for the complete sample and samples in different languages.

Why Azure Functions

Azure Functions is the ideal platform for hosting remote MCP servers because of its built-in MCP authentication, event-driven scaling from 0 to N, and serverless billing. This ensures your agentic tools are secure, cost-effective, and ready to handle any load. With the MCP extension, you focus on implementing the primitives you want to expose (tools, resources, and prompts) instead of worrying about MCP protocol details and server logistics.
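Returning to the `generate_documentation` prompt shown earlier: its argument handling can be exercised without the Functions host by pulling the string-building into a plain function. The sketch below is an assumption-laden illustration, not part of the extension. `render_documentation_prompt` and `ALLOWED_STYLES` are hypothetical names, the template is shortened, and the fallback for an unknown style is a choice the original sample does not make.

```python
from typing import Dict, Optional

# Hypothetical: the three styles named in the prompt argument description.
ALLOWED_STYLES = ("concise", "detailed", "tutorial")


def render_documentation_prompt(arguments: Optional[Dict[str, str]] = None) -> str:
    """Build the prompt text, applying the same defaults as the sample."""
    args = arguments or {}
    function_name = args.get("function_name", "(unknown)")
    style = args.get("style", "concise")
    if style not in ALLOWED_STYLES:
        # Assumption: fall back quietly; the original sample does not validate.
        style = "concise"
    return (
        f"Generate API documentation for the function named **{function_name}**.\n"
        f"Documentation style: **{style}**"
    )


# No arguments supplied: both defaults apply.
default_prompt = render_documentation_prompt()

# A client that supplies both optional arguments.
custom_prompt = render_documentation_prompt(
    {"function_name": "parse_config", "style": "tutorial"}
)
```

Keeping the template logic in a pure function like this makes the defaults easy to unit test; the decorated trigger function would then only need to pass its arguments through.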
Get Started

You can start building today using our quickstarts and samples: Python TypeScript .NET Java

Documentation

Azure Functions MCP extension overview Prompt trigger

We'd Love to Hear from You!

Let us know your thoughts about the new prompt trigger. What kinds of prompts are you building for your MCP servers? What would you like us to prioritize next? Share your feedback in our GitHub repo.

You Can Scale MCP Servers Behind a Load Balancer on App Service — Here's How
Most MCP servers in the wild are single-instance processes. That's fine when they're driving a local Claude or VS Code session — but it's the wrong shape for a production agent fleet that has to absorb traffic spikes, ride through deploys, and survive instance failures. The good news: the MCP spec already grew up. The 2025-06-18 revision formalizes stateless HTTP transport (and the current 2025-11-25 revision keeps it), which means a single request carries everything the server needs to answer. No long-lived connection, no in-process session table, no sticky-session hacks to keep a client glued to one box. That tiny protocol change unlocks something big: you can stick an MCP server behind App Service's built-in load balancer and scale it like any other web API. This post walks through how, with a runnable sample. Sample: seligj95/app-service-mcp-stateless-scale-python. One azd up and you have a stateless FastAPI MCP server running on three App Service instances behind the platform load balancer, with a staging slot, Application Insights, and a k6 script that visualizes load distribution from the client side. Why "stateless" is the whole story Earlier MCP transports leaned on persistent connections — SSE channels and WebSocket-style sessions where the server held per-client state in memory (open tools, subscriptions, partial streams). That model is great for a local IDE talking to a local process. It's hostile to load balancing, because routing a follow-up request to a different instance breaks the session. The stateless HTTP transport flips that. Each request is a complete JSON-RPC envelope ( initialize , tools/list , tools/call ), every response is self-contained, and the server is allowed to forget the client between requests. Any instance can serve any call. That is the property a load balancer needs. 
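As a small illustration of what "a complete JSON-RPC envelope" means here, the sketch below builds the three request bodies named above in plain Python. The `make_request` helper is hypothetical, and the `initialize` parameters and the `whoami` tool name are example values rather than anything the transport mandates.

```python
import json
from typing import Any, Dict, Optional


def make_request(msg_id: int, method: str,
                 params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one self-contained JSON-RPC 2.0 request (hypothetical helper)."""
    envelope: Dict[str, Any] = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        envelope["params"] = params
    return json.dumps(envelope)


# Each body stands alone: nothing refers to an earlier request or a
# connection, so a load balancer can hand each one to a different instance.
envelopes = [
    make_request(1, "initialize", {
        "protocolVersion": "2025-11-25",
        "capabilities": {},
        "clientInfo": {"name": "my-app", "version": "1.0"},
    }),
    make_request(2, "tools/list"),
    make_request(3, "tools/call", {"name": "whoami", "arguments": {}}),
]

methods = [json.loads(body)["method"] for body in envelopes]
```

Any of these strings could be sent as the body of a POST to the server's MCP endpoint.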
In the sample, every tool is a pure function of its arguments: whoami reports the serving instance, lookup_fact reads a static dictionary, compute_primes runs a sieve. None of them touches per-client memory. That's not a constraint of the protocol; it's a discipline you adopt to keep statelessness intact.

Why App Service, and not Functions or AKS

Functions and AKS are two of the many good options for hosting an MCP server, depending on what the server is used for. The use case here is a scaled MCP server, i.e. one that must reach a large and broad audience. A few defaults make App Service a solid option for this scenario:

Always On. Reasoning tools call into LLMs and external APIs; latencies routinely sit in the multi-second range. On the Consumption plan, Functions caps a single execution at ten minutes (five by default) and aggressively scales workers to zero between bursts, which kills warm caches. App Service keeps the process resident.

Horizontal scale is one parameter. Pick a Premium SKU, set the plan's capacity to N, and you have N instances behind a managed load balancer. No VMSS to declare, no ingress controller to wire up, no Service to reconcile.

Deployment slots. Swap a warmed-up staging slot into production for zero-downtime deploys. Critical when your "API" is an LLM tool surface that an agent is actively driving.

Easy Auth. OAuth 2.1 in front of the MCP endpoint without writing the flow yourself: turn on the App Service authentication blade and point it at Entra ID. The sample leaves this off so the deploy is one command, but the wiring is a checkbox away.

The TL;DR: it's PaaS that already knows how to run a long-lived, stateless process at horizontal scale, which is exactly the shape of a scaled MCP server.

The FastAPI MCP server, end-to-end stateless

The whole transport is one POST handler.
The full source is in main.py, but here are the load-bearing pieces:

    @app.post("/mcp")
    async def mcp_endpoint(request: Request):
        body = await request.json()
        method = body.get("method", "")
        msg_id = body.get("id")

        if method == "initialize":
            return {"jsonrpc": "2.0", "id": msg_id, "result": _server_info()}

        if method == "tools/list":
            return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": [...]}}

        if method == "tools/call":
            params = body.get("params", {})
            result = await MCP_TOOLS[params["name"]]["function"](**params.get("arguments", {}))
            return {
                "jsonrpc": "2.0",
                "id": msg_id,
                "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
            }

There is no session table. There is no client_id cookie. There is no AsyncIterator held open between requests. initialize, tools/list, and tools/call all return in a single round trip, which is the shape App Service's load balancer expects.

The most useful debugging tool in the sample is whoami:

    async def tool_whoami() -> Dict[str, Any]:
        return {
            "instance_id": os.environ.get("WEBSITE_INSTANCE_ID", "local"),
            "hostname": socket.gethostname(),
            ...
        }

WEBSITE_INSTANCE_ID is unique per App Service worker. Call whoami a few times from your MCP client and the value rotates; that's the load balancer working. If it doesn't rotate, something is pinning your traffic (almost always the ARR Affinity cookie; we'll get there).
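The "call whoami a few times and watch the value rotate" check can be scripted. The sketch below is an assumption-laden illustration, not part of the sample: it assumes the response shape shown in the handler above (the tool result serialized as JSON inside result.content[0].text) and that the tool is registered under the name whoami. The helper names are made up for this example.

```python
import json
import urllib.request
from collections import Counter
from typing import Any, Dict, Iterable


def build_whoami_request(msg_id: int) -> Dict[str, Any]:
    """JSON-RPC envelope for a tools/call to the (assumed) whoami tool."""
    return {
        "jsonrpc": "2.0",
        "id": msg_id,
        "method": "tools/call",
        "params": {"name": "whoami", "arguments": {}},
    }


def extract_instance_id(response: Dict[str, Any]) -> str:
    """Pull instance_id out of the JSON text the handler wraps in content[0]."""
    text = response["result"]["content"][0]["text"]
    return json.loads(text)["instance_id"]


def tally_instances(responses: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many responses each instance served."""
    return Counter(extract_instance_id(r) for r in responses)


def call_whoami(base_url: str, msg_id: int) -> Dict[str, Any]:
    """POST one whoami call to <base_url>/mcp and return the parsed response."""
    body = json.dumps(build_whoami_request(msg_id)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/mcp",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Usage against a deployed app (placeholder URL, not run here):
#   responses = [call_whoami("https://<your-app>.azurewebsites.net", i) for i in range(30)]
#   print(tally_instances(responses))
```

One caveat on the design: urllib.request.urlopen does not keep cookies between calls, so this script never replays an ARRAffinity cookie. It shows whether the load balancer spreads cookie-less traffic; a client that does persist cookies can still be pinned if affinity is left on.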
The Bicep that actually makes it scale

The infra is a P0v3 plan with capacity: 3, a web app with affinity disabled, and a staging slot on the same plan:

    resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
      name: name
      sku: {
        name: 'P0v3'
        capacity: instanceCount // 3 by default
      }
      properties: {
        reserved: true
      }
    }

    resource web 'Microsoft.Web/sites@2024-04-01' = {
      name: name
      properties: {
        serverFarmId: appServicePlanId
        httpsOnly: true
        clientAffinityEnabled: false // ← the one line that matters
        siteConfig: {
          linuxFxVersion: 'PYTHON|3.11'
          alwaysOn: true
          healthCheckPath: '/health'
          appCommandLine: 'python -m uvicorn main:app --host 0.0.0.0 --port 8000'
        }
      }
    }

    resource staging 'Microsoft.Web/sites/slots@2024-04-01' = {
      parent: web
      name: 'staging'
      properties: { /* same shape: separate hostname, same plan */ }
    }

The single most important line in that template is clientAffinityEnabled: false. App Service defaults to on, which sets the ARRAffinity cookie and pins every subsequent request from a given client to the instance that handled the first one. That default exists because legacy ASP.NET apps used in-process session state. Stateless MCP does not. Leaving affinity on silently undoes everything we just built.

The sample uses Premium v3 (P0v3) for two reasons: it gives you Always On and deployment slots. Always On needs at least the Basic tier and slots need at least Standard, so Free and Shared plans give you neither.

Application Insights without writing telemetry code

The sample drops one line of bootstrap into main.py:

    from azure.monitor.opentelemetry import configure_azure_monitor

    if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
        configure_azure_monitor(logger_name="mcp")

The Azure Monitor OpenTelemetry distro auto-instruments FastAPI and outbound HTTP. Every request span App Service emits is tagged with cloud_RoleInstance, which Application Insights populates from WEBSITE_INSTANCE_ID. That makes the question "is traffic actually spreading across my instances?"
a one-liner in Logs:

    requests
    | where timestamp > ago(15m)
    | where name contains "/mcp"
    | summarize count() by cloud_RoleInstance
    | order by count_ desc

If you see three roughly-equal rows, you're done. If you see one row, affinity is most likely still on and your client is replaying the ARRAffinity cookie; turn affinity off and redeploy.

Deploy

    azd auth login
    azd up

That provisions the resource group, plan, web app, staging slot, Log Analytics workspace, and Application Insights resource, then deploys the Python app via Oryx. The output prints both WEB_URI and WEB_STAGING_URI. Open the production URI; the home page renders the instance ID that served it. Refresh. The ID changes.

To swap the staging slot into production with no downtime:

    az webapp deployment slot swap \
      --resource-group <rg> --name <app> \
      --slot staging --target-slot production

App Service warms the staging instances, redirects traffic, and the old production becomes the new staging: the classic blue-green pattern, but free.

Prove it scales

The sample ships a k6 script that hammers /mcp with tools/call requests and tags every response with the instance_id the server returned:

    BASE_URL=https://<your-app>.azurewebsites.net \
      k6 run --summary-export=summary.json loadtest/k6-mcp.js
    jq '.metrics.mcp_instance_hits.values' summary.json

The output groups hits per instance tag. On a three-instance plan with a 60-second steady load you should see something close to:

    {
      "count": 1842,
      "instance0d3e2f...": 614,
      "instance7a91bc...": 612,
      "instance19f0c4...": 616
    }

Roughly 33% on each box: the App Service load balancer round-robining new connections, with no help from the application.

What I'd do next

The sample is intentionally a starting point. Two extensions are the obvious next moves:

Add Easy Auth. Turn on App Service authentication, pick Entra ID, require auth on /mcp. The token surfaces as headers; your tool handlers can use it to identify the calling agent without you owning any of the OAuth machinery.

Autoscale on CPU.
instanceCount: 3 is a starting point. Wire up Microsoft.Insights/autoscalesettings against the plan and let it scale from 3 to 10 instances when the prime-counting tool drives CPU up. The architecture already supports it; that's the whole point of stateless.

Try it

Sample repo: github.com/seligj95/app-service-mcp-stateless-scale-python
MCP spec: modelcontextprotocol.io/specification/2025-11-25
App Service docs: learn.microsoft.com/azure/app-service/overview

If you ship something with it, I'd love to hear how it held up.

Debugging Python apps on App Service with the new SSH helper aliases
You shipped a Python app to App Service. It worked in the demo. It works locally. In production, /chat is returning 502s, but /health is green, the deployment succeeded, the logs are quiet, and your laptop can't reproduce it.

What you actually need is a shell on the running container so you can poke at DNS, env vars, installed packages, the listening port, and the AI endpoint your app is calling. The platform has had SSH for a while, but the playbook of "open SSH, then remember which 14 commands to run" was tribal knowledge.

We just shipped a set of SSH helper aliases that turn that tribal knowledge into one-word commands. apphelp shows you everything; appconfig, showpkgs, and appcurl cover the app side; ai-test, ai-diagnose, ai-curl, ai-latency, ai-dns, and ai-access-check cover the Azure AI Foundry side.

This post is a hands-on tour. We built a deliberately fragile FastAPI sample with six different fault modes, deployed it, broke it, and SSH'd in to watch the aliases drive each one to root cause. Every transcript below is real output from the deployed sample.

📦 Sample repo: seligj95/app-service-ssh-diagnostics-python. Run azd up and you have a fault-injectable Python + Foundry app live in your subscription in about 4 minutes.

The sample, in one breath

- FastAPI app, Python 3.14, App Service Linux on P0v3. It uses the new Oryx FastAPI auto-detection, so no custom startup command is needed.
- Calls Azure OpenAI (gpt-4o-mini) via managed identity, with no keys.
- POST /admin/fault toggles one of seven modes: off, bad-creds, wrong-endpoint, dns-fail, port-mismatch, dep-import-error, latency-spike.
- GET / is a landing page with a built-in cheat sheet of the SSH aliases.

The endpoints are intentionally boring. The point is to give the aliases something realistic to chew on.

A quick note on Azure OpenAI vs. AI Foundry. This sample provisions an Azure OpenAI account (kind: OpenAI).
The new ai-* aliases speak the OpenAI chat-completions API (/openai/deployments/<model>/chat/completions), which is identical on Azure OpenAI and on Azure AI Foundry projects: both expose *.openai.azure.com endpoints, both accept managed-identity bearer tokens, both speak the same schema. The aliases work against either; the env-var name AZURE_AI_FOUNDRY_ENDPOINT is just the alias contract. Drop a Foundry endpoint into it and the same walkthrough applies.

Shout-out to the new FastAPI auto-detect on Python 3.14. This sample also benefits from another recent App Service change: on Python 3.14+, App Service automatically detects FastAPI apps and starts them with gunicorn -k uvicorn_worker.UvicornWorker, so no custom startup command is needed. Our Bicep ships an empty appCommandLine and lets Oryx do the right thing. The whole sample is a nice tour of recent App Service Python improvements landing together.

Step zero: apphelp

After azd up finishes, the first thing to do over SSH is:

    az webapp ssh -g rg-ssh-diag-demo -n app-web-<token>

Then inside the container:

    $ apphelp

apphelp prints every alias the image ships with, grouped by category. You don't need to memorize anything: when you forget what checkport does, you run apphelp and it's right there. We'll lean on most of these:

- App info: showpkgs, appconfig, appenv
- Logs: applogs, deploylogs, logfiles
- Reachability: appcurl, checkport, gohome, gosrc
- AI/Foundry: ai-test, ai-dns, ai-access-check, ai-curl, ai-latency, ai-diagnose
- Network tools: install-nettools

The healthy baseline

Before breaking anything, run ai-diagnose. This is the one-shot "is my AI path healthy?" check, and it's the alias we reach for most:

    $ ai-diagnose
    ────────────────────────────────────────────────────────────────
      AI Foundry Diagnostics
    ────────────────────────────────────────────────────────────────
      [✓] Managed identity token
      [✓] DNS resolution (d8f9grasb7ewc7h8.ai-gateway.eastus2-01.azure-api.net. - public)
      [✓] Foundry connectivity (761ms)
    ────────────────────────────────────────────────────────────────

Three green checks tell you three different things: the managed identity is issuing tokens, the Foundry hostname resolves, and the endpoint responded in a reasonable time. If any of these are red, you already know which layer the fault is in.

For more detail, the individual aliases are worth knowing:

    $ ai-test
    ✓ Connected | 1009ms | Model: gpt-4o-mini | Auth: Managed Identity

    $ ai-access-check
    ✓ Foundry endpoint: https://cog-ftirxupt2yjoe.openai.azure.com/
    ✓ Model: gpt-4o-mini
    ✓ Using auth mode: Managed Identity
    ✓ Access check passed: authorized to call Foundry

    $ ai-latency
    Running 5 requests to gpt-4o-mini...
    Request 1: 679ms ✓
    Request 2: 826ms ✓
    Request 3: 758ms ✓
    Request 4: 641ms ✓
    Request 5: 664ms ✓
    Results (5/5 successful): Avg: 713ms | Min: 641ms | Max: 826ms

And the app side:

    $ checkport
    ✓ App is listening on port 8000

    $ appcurl /health
    HTTP Status: 200
    Time: 0.002417s
    Size: 5423 bytes

That's our "everything is fine" reference. Now let's break things.

One trick: applying a fault inside the SSH shell

A subtle thing trips people up the first time. POST /admin/fault mutates the app process's environment, but your SSH shell is a separate process. It inherited the container's env when you opened the session, so ai-test will still see the healthy values.
The sample handles this by also writing a small file to the persistent share:

    # app/faults.py
    def _write_env_file() -> None:
        """Write fault env to /home/site/diagnostics/fault.env so SSH can `source` it."""
        diag = Path("/home/site/diagnostics")
        diag.mkdir(parents=True, exist_ok=True)
        snap = _snapshot_unlocked()
        lines = [f"# Active fault: {snap['mode']}", ""]
        for k, v in snap["env"].items():
            # Quote the value for the shell; emit '' for an empty value.
            value = shlex.quote(v) if v else "''"
            lines.append(f"export {k}={value}")
        (diag / "fault.env").write_text("\n".join(lines) + "\n")

After toggling a fault, run this once in your SSH session:

    source /home/site/diagnostics/fault.env

Now the aliases see the same env the broken app sees. This pattern (flip a flag from outside, source the change inside) is worth stealing for your own debugging workflows.

Group A: faults the AI aliases catch directly

Some faults are in the path between App Service and Foundry: wrong endpoint, broken DNS, network. The ai-* aliases reproduce the failure end-to-end, and they tell you exactly which layer.

Fault 1: wrong-endpoint, a typo in the AOAI endpoint

The most common AI-side incident: someone fat-fingers an app setting. The endpoint still looks plausible (it's still *.openai.azure.com) but it's not your resource.
    curl -X POST $URL/admin/fault -H 'content-type: application/json' \
      -d '{"mode":"wrong-endpoint"}'
    curl $URL/chat -H 'content-type: application/json' \
      -d '{"prompt":"hi"}'
    # HTTP 502
    # {"detail":"APIConnectionError: Connection error."}

SSH in, source the fault env, run the AI aliases:

    $ source /home/site/diagnostics/fault.env

    $ ai-dns
    Resolving: this-resource-does-not-exist.openai.azure.com
    ✗ DNS resolution failed for this-resource-does-not-exist.openai.azure.com

    $ ai-curl
    Request: POST https://this-resource-does-not-exist.openai.azure.com//openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-02-01
    Authorization: Bearer [hidden]
    Content-Type: application/json
    curl: (6) Could not resolve host: this-resource-does-not-exist.openai.azure.com

    $ ai-diagnose
    [✓] Managed identity token
    [✗] DNS resolution failed for this-resource-does-not-exist.openai.azure.com
    [✗] Foundry connectivity (HTTP 000)

ai-diagnose collapses the whole story into three lines: token works, DNS fails, connectivity fails. The fault is unambiguously a bad endpoint; check appconfig and your Bicep parameters.

Fault 2: dns-fail (NXDOMAIN)

A subtler variant of the same failure mode is when the endpoint is structurally wrong (private endpoint misconfigured, hosts file mishap, custom domain expired). ai-dns calls it out the same way:

    $ ai-dns
    Resolving: no-such-host.invalid.example
    ✗ DNS resolution failed for no-such-host.invalid.example

If you need deeper diagnostics (say, you suspect a flaky resolver rather than the hostname itself), install-nettools gives you dig, nslookup, and friends without rebuilding the container.

    $ install-nettools
    $ dig openai.azure.com
    $ nslookup cog-ftirxupt2yjoe.openai.azure.com

Group B: faults that pass ai-test but break your app

Here's the most useful thing we learned building this sample: ai-test can be green while your app is on fire, and that's a signal, not a bug. The ai-* aliases call Foundry directly.
If they're green and your app is red, the platform-to-Foundry path is fine and the divergence is in your app. Time to pivot to appenv, applogs, showpkgs.

Fault 3: bad-creds, a wrong AZURE_CLIENT_ID

This one is the classic user-assigned managed identity mishap: you scoped your code to a user-assigned managed identity, but the GUID in AZURE_CLIENT_ID doesn't actually exist (or wasn't granted RBAC).

    curl -X POST $URL/admin/fault -d '{"mode":"bad-creds"}'
    curl $URL/chat -d '{"prompt":"hi"}'
    # HTTP 502
    # {"detail":"ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token..."}

Now SSH in and try the AI aliases:

    $ source /home/site/diagnostics/fault.env

    $ ai-test
    ✓ Connected | 734ms | Model: gpt-4o-mini | Auth: Managed Identity

    $ ai-access-check
    ✓ Foundry endpoint: https://cog-ftirxupt2yjoe.openai.azure.com/
    ✓ Using auth mode: Managed Identity
    ✓ Access check passed: authorized to call Foundry

Both green. That looks like a contradiction, but it's not. The aliases authenticate using the system-assigned managed identity directly (via IMDS), and they pass. Your Python app uses DefaultAzureCredential, which honors AZURE_CLIENT_ID to pick a user-assigned identity, and that one is broken.

The takeaway: when ai-test is green but /chat is red, the platform's identity is fine. Pivot to appenv to see exactly what env your app process sees, and check AZURE_CLIENT_ID:

    $ appenv | grep AZURE_CLIENT_ID
    AZURE_CLIENT_ID=00000000-0000-0000-0000-000000000000

There's the bug. The aliases didn't fail; they told you the fault isn't in the platform. That's diagnosis by elimination, and it's faster than guessing.

Fault 4: dep-import-error, your code throws

Same pattern.
The app raises an ImportError on /chat, and the AI aliases are green:

    curl -X POST $URL/admin/fault -d '{"mode":"dep-import-error"}'
    curl $URL/chat -d '{"prompt":"hi"}'
    # HTTP 500
    # {"detail":"ImportError: No module named 'tiktoken'..."}

This is where the app-side aliases earn their keep:

    $ showpkgs | head -20
    ──────────────────────────────────────────────────────
      Virtual environment packages (antenv)
    ──────────────────────────────────────────────────────
    Package                                Version
    -------------------------------------- ---------
    annotated-types                        0.7.0
    anyio                                  4.13.0
    azure-core                             1.41.0
    azure-identity                         1.19.0
    azure-monitor-opentelemetry            1.8.8
    ...

No tiktoken in that list. Confirmation in one command, with no need to remember pip list or where the virtualenv lives. deploylogs then tells you what the last deployment actually built:

    $ deploylogs 10
    Latest deployment: b8a64ed4-b6b7-4419-91eb-6d8e4e7ef323
    Log file: /home/site/deployments/b8a64ed4-b6b7-4419-91eb-6d8e4e7ef323/log.log
    2026-05-18T19:10:52.3844297Z,Parsing the build logs,abc3cf97-...
    2026-05-18T19:10:52.5414396Z,Found 0 issue(s),7d11d013-...
    2026-05-18T19:10:52.7913394Z,Build Summary :,...
    2026-05-18T19:10:53.5643089Z,Deployment successful. deployer = Push-Deployer ...

Build was clean. The package just isn't in requirements.txt. Two aliases, one minute, root cause.

Fault 5: port-mismatch, uvicorn binds the wrong port

A real-world bug: someone sets WEBSITES_PORT=9999 in app settings to expose a different port, but the app still binds to 8000.

    curl -X POST $URL/admin/fault -d '{"mode":"port-mismatch"}'

The aliases tell you exactly which port everything sees:

    $ checkport
    Checking if app is listening on port 8000...
    ✓ App is listening on port 8000

    $ appcurl /health
    Testing app at localhost:8000 ...
    HTTP Status: 200
    Time: 0.002417s

    $ appconfig
    PORT
      Value: 8000
      Note: The port your Python app should listen on. Default is 8000.

The app is healthy from inside the container.
The mismatch is between what the platform tries to forward to and what uvicorn is bound to. This is the kind of fault where curling the public URL fails but appcurl /health succeeds, and the contrast is itself the diagnosis.

Fault 6: latency-spike, the alias bench is fast, your app is slow

The app injects 4 seconds of asyncio.sleep before each Foundry call. /chat is now ~4.5 seconds. ai-latency:

    $ ai-latency
    Running 5 requests to gpt-4o-mini...
    Request 1: 715ms ✓
    Request 2: 588ms ✓
    Request 3: 578ms ✓
    Request 4: 669ms ✓
    Request 5: 643ms ✓
    Results (5/5 successful): Avg: 638ms | Min: 578ms | Max: 715ms

Foundry, from this instance, averages 638ms. If your app is taking about 4.5 seconds end-to-end and ai-latency says the model is sub-second, the slowness is in your code, not in Foundry and not in the network. Time to look at App Insights end-to-end transactions, or at any pre-call work (retrieval, vector lookup, your own sleep).

What this changes about the debugging workflow

Before these aliases, the SSH playbook for a Python AI app went something like: open SSH, dig around /home/site/wwwroot/antenv, grep applicationHost.config for ports, write a curl by hand against the AOAI endpoint with a manually-fetched managed identity token, hope you got the API version right.

Now it's ai-diagnose. If that's red, you know exactly which layer. If it's green, you know the fault is in your code or your settings, and appenv, appconfig, showpkgs, applogs walk you the rest of the way.

Three patterns we'd lean on going forward:

Start with apphelp and ai-diagnose every time. Don't try to remember the right command; let the aliases tell you.

Treat ai-test being green as a signal, not a finish line. If /chat is red and ai-test is green, the platform path is fine; pivot to app-side aliases.

Use source /home/site/diagnostics/fault.env as a pattern. Any time you want your SSH shell to see what the app process sees, write env to a file and source it.
It's a small thing that removes a huge class of "but it worked when I tested it" confusions.

We want feedback

The aliases are GA today on Python images and we have ideas for where they go next: Node, .NET, more ai-* checks (Foundry agents, vector indexes), tighter integration with azd diagnose. If you have a Python app on App Service and you want a specific alias added, tell us by dropping a comment on this post.

Try the sample

    git clone https://github.com/seligj95/app-service-ssh-diagnostics-python
    cd app-service-ssh-diagnostics-python
    azd auth login
    azd up

Four minutes later you'll have the whole thing live. Then curl -X POST $URL/admin/fault -d '{"mode":"<pick one>"}', SSH in, and walk through any of the six faults above. The README has the full alias-to-fault map.