Introducing Azure Container Apps Sandboxes: Secure Infrastructure for Agentic Workloads
Today we are announcing the public preview of Azure Container Apps Sandboxes - a new first-class resource type that gives you fast, secure, ephemeral compute environments with built-in suspend and resume. This is the underlying infrastructure on which products like Cloud sandboxes in GitHub Copilot, Foundry Hosted Agents, and Azure Container Apps Express are built, and you can now build your own solutions on the same infrastructure.

Azure Container Apps Sandboxes unlocks two major opportunities:

• For platform developers and ISVs, sandboxes give you the same isolated compute fabric that powers many Microsoft products. You get the building blocks to create your own multi-tenant platform on proven, enterprise-scale infrastructure.
• For AI agents, sandboxes become a self-configurable tool that lets agents extend their own capabilities on the fly. An agent can spin up a fresh sandbox in milliseconds and use it to execute untrusted code, compile source, test HTTP requests against a live app, launch a browser session, or handle any other task that needs quick, scalable infrastructure.

On one side, Sandboxes empower humans to build platforms; on the other, they empower agents to build their own capabilities. Both get enterprise-grade isolation, instant startup, and snapshot-based persistence out of the box.

We'll walk through the resource model, the sandbox lifecycle, and the features that set Sandboxes apart - snapshots, lifecycle policies, network egress controls, volumes, and managed identities - and show you how to get started with the portal and CLI.

What Are Container Apps Sandboxes?

Container Apps Sandboxes are secure, isolated compute environments that start in sub-second time, scale to thousands, and incur no compute charges when idle. Each sandbox runs in its own hardware-isolated microVM boundary - fully separated from the host, the platform, and every other sandbox.
You bring your own Open Container Initiative (OCI) image, and Sandboxes handle the rest: provisioning from prewarmed pools, strong multi-tenant isolation, and snapshot-based suspend/resume that preserves full memory and disk state across sessions.

There are many ways Sandboxes can help you build your next project. Here are a few:

• Your own build and test systems - wire a Sandbox into your CI/CD flow to run builds while your laptop stays cool.
• Agents that can run anything safely - an agent spawns a sandbox, executes work inside it, and returns the output with no agent host privileges required.
• Agent swarms - decompose a research question, spawn N sandbox workers in parallel (each pinned to its own image and egress policy), and synthesize the result.

Early access customers are already seeing significant benefits from Azure Container Apps Sandboxes.

"With Azure Container Apps sandboxes, SitecoreAI can safely enable agents to take real action. The combination of multi-tenant isolation, rapid scale-out, and full automation allows Sitecore to run long-lived, autonomous agents that securely execute code, manage workflows, and interact with enterprise systems within secure, governed environments. With this foundation, we can build agents that do real work: assembling content, personalizing experiences, and optimizing campaigns in production. Agents that operate continuously, learn from results, and improve over time, so our customers get better outcomes without giving up control."
- Mo Cherif, VP of AI and Innovation, Sitecore

"We got early access to Azure Container Apps Sandboxes, and got the first prototype integrated with Atlas AI in hours, and it's already shaping a new Atlas AI capability that we plan to launch in preview in Q3. It gives every Atlas AI agent a safe, sandboxed workspace (file system, terminal, code execution) on a customer's live data in Cognite Data Fusion.
The value: Industrial process, reliability, and production engineers spend days and weeks on questions like "which wells are underperforming and why?" These questions are tractable but expensive, so they are asked rarely and decisions are made on gut feel. With this, an agent pulls the data, runs the analysis, cross-references maintenance and inspection records, and returns a cited draft in minutes. Sandboxes make it practical: Aligned feature set, per-customer isolation, pause/resume across multi-day investigations, scale-to-zero economics."
- Kelvin Sundli, Product manager, Atlas AI, Cognite

Resource Model: Sandbox Groups and Sandboxes

The top-level ARM resource is Microsoft.App/SandboxGroups. A Sandbox Group is the management boundary for a collection of sandboxes that share configuration - think of it like a Container Apps Environment, but purpose-built for sandboxes.

When you create a Sandbox Group, you specify:

• Subscription, Resource Group, and Region
• Sandbox defaults (optional): default CPU, memory, disk, max sandbox count, and default idle timeout
• Networking: optionally deploy into a custom VNet with a dedicated subnet for private networking
• Identity: system-assigned or user-assigned Entra identity

Individual sandboxes are created within a Sandbox Group. Each sandbox has its own source (disk image or snapshot), resource tier, lifecycle policy, network egress policy, environment variables, ports, volumes, and connections.
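To make the resource model concrete, here is a minimal sketch that builds the ARM resource ID of a Sandbox Group using the general Azure resource ID layout. Only the Microsoft.App/SandboxGroups type comes from the text above; the function name and the sample subscription, resource group, and group name are hypothetical, and the resource type of individual sandboxes is not stated here, so it is left out.

```python
def sandbox_group_resource_id(subscription_id: str, resource_group: str, group_name: str) -> str:
    """Build an ARM resource ID for a Sandbox Group (illustrative helper, not an SDK call).

    Follows the general ARM pattern:
    /subscriptions/{id}/resourceGroups/{rg}/providers/{namespace}/{type}/{name}
    """
    for label, value in (
        ("subscription_id", subscription_id),
        ("resource_group", resource_group),
        ("group_name", group_name),
    ):
        # Each value must be a single, non-empty path segment.
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty path segment")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.App/SandboxGroups/{group_name}"
    )


# Hypothetical values, for illustration only.
example_id = sandbox_group_resource_id(
    "00000000-0000-0000-0000-000000000000", "my-rg", "my-sandbox-group"
)
print(example_id)
```

An ID of this shape is what ARM-level tooling such as role assignments or templates would typically reference for the group.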
Sandbox Lifecycle

Sandboxes have a well-defined lifecycle with the following states:

| State | Description |
|---|---|
| Creating | Provisioning the sandbox from a disk image or snapshot |
| Running | Actively executing - backed by a live microVM |
| Idle | System-suspended after inactivity; can auto-resume on the next request |
| Suspended | Full state (memory + disk) preserved as a snapshot; no compute costs |
| Resuming | Restoring from a suspended or idle state - sub-second for most workloads |
| Stopped | User-initiated stop; can be resumed |
| Stopping | Graceful shutdown in progress |
| Deleting | Teardown in progress |

The key insight here is the distinction between Idle and Suspended. When a sandbox goes idle (e.g., no traffic for a configured timeout), the system can automatically suspend it and capture a snapshot. When a new request arrives, the sandbox resumes transparently. This gives you scale-to-zero economics with stateful compute - something that wasn't possible before without significant custom engineering.

Disk Images: Bring Your Own Container

Sandboxes boot from Disk Images - Open Container Initiative (OCI) images converted into an optimized root filesystem format. You point to any OCI image (public or private registry), and the platform builds a bootable disk image from it.

You can start with public, pre-built images maintained by the platform (for example, Ubuntu base images), or bring your own private images. For private registries, you can authenticate with username/token or use a user-assigned managed identity for Azure Container Registry (ACR) - integrated with Azure as you would expect.

Snapshots: Full-State Persistence

Snapshots capture the complete state of a running sandbox - memory, disk, and all running processes. When you resume a sandbox from a snapshot, every process, open file handle, and in-memory data structure is restored exactly as it was. A snapshot captures the full state of a running sandbox: memory pages, disk, processes.
There are two ways to create a snapshot: automatically on suspend, or manually on demand. Snapshots are great for three things:

• Checkpointing mid-task so a long-running agent can resume exactly where it left off
• Cloning an environment that's already warm - dependencies installed, caches populated, services running
• Shipping a "ready-to-go" state that resumes in sub-second time instead of cold-booting

Snapshots are free during the preview, after which they will be stored in Azure Blob Storage and billed at standard rates. Each snapshot records the source sandbox, resource allocation (CPU, memory, disk), and container metadata - so what you get back is exactly what you snapshotted.

Resource Tiers

Every sandbox is assigned to a resource tier that determines its CPU, memory, and disk allocation:

| Tier | CPU | Memory | Disk |
|---|---|---|---|
| XS | 0.25 vCPU | 0.5 GB | 5 GB |
| S | 0.5 vCPU | 1 GB | 10 GB |
| M (default) | 1 vCPU | 2 GB | 20 GB |
| L | 2 vCPU | 4 GB | 40 GB |
| XL | 4 vCPU | 8 GB | 80 GB |

When creating a sandbox from a snapshot, the resource tier is inherited from the snapshot and cannot be changed - this ensures the restored environment has the exact resources it was running with when the snapshot was taken.

Lifecycle Policies: Auto-Suspend and Auto-Delete

Every sandbox can be configured with lifecycle policies that automate state transitions and cleanup:

Auto-Suspend

• Idle timeout: How long a sandbox can sit idle before being suspended (configurable: 1m, 2m, 5m, 10m, 30m, 60m)
• Suspend mode:
  • Disk + Memory (default): Full snapshot including memory state - resume picks up exactly where you left off, with all processes and in-memory data intact.
  • Disk: Only the disk is preserved; the VM restarts fresh on resume. Useful when you only need file persistence, not process continuity.

Auto-Delete

• Automatically deletes sandboxes after a configurable number of days of inactivity
• Prevents accumulation of abandoned sandboxes that consume snapshot storage

These lifecycle policies are what make Sandboxes economically viable at scale.
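The tier table and the idle-timeout options above can be sketched as a small lookup, which is handy when a platform has to pick a tier from a workload's requirements. This is an illustrative model only: the Tier class, smallest_tier, and validate_idle_timeout are hypothetical names, not part of the aca CLI or the Python SDK, and the numbers are copied from the table and the timeout list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    vcpu: float
    memory_gb: float
    disk_gb: int


# Values copied from the resource tier table, ordered smallest first.
TIERS = (
    Tier("XS", 0.25, 0.5, 5),
    Tier("S", 0.5, 1, 10),
    Tier("M", 1, 2, 20),  # default tier
    Tier("L", 2, 4, 40),
    Tier("XL", 4, 8, 80),
)

# Idle timeouts listed as configurable, expressed in minutes.
ALLOWED_IDLE_TIMEOUT_MINUTES = (1, 2, 5, 10, 30, 60)


def smallest_tier(vcpu: float, memory_gb: float, disk_gb: float) -> Tier:
    """Return the smallest tier that satisfies all three requirements."""
    for tier in TIERS:
        if tier.vcpu >= vcpu and tier.memory_gb >= memory_gb and tier.disk_gb >= disk_gb:
            return tier
    raise ValueError("requirements exceed the largest tier (XL)")


def validate_idle_timeout(minutes: int) -> int:
    """Accept only the idle timeouts listed as configurable."""
    if minutes not in ALLOWED_IDLE_TIMEOUT_MINUTES:
        raise ValueError(f"idle timeout must be one of {ALLOWED_IDLE_TIMEOUT_MINUTES} minutes")
    return minutes


# A workload needing 1 vCPU, 3 GB of memory, and 10 GB of disk does not fit M (2 GB memory).
print(smallest_tier(1, 3, 10).name)
```

Because a sandbox created from a snapshot inherits its tier and cannot change it, a check like this belongs before the first snapshot is taken rather than at resume time.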
A platform serving thousands of tenants can configure aggressive idle timeouts (say, 1 minute) with the Disk + Memory suspend mode, and each tenant's sandbox disappears from the billing meter almost immediately - but resumes in sub-second time the moment they return.

Network Egress Policy

For scenarios involving untrusted code - AI agents executing LLM-generated scripts, multi-tenant SaaS with user-submitted workloads - controlling outbound network access is critical. Sandboxes provide a per-sandbox Network Egress Policy:

• Default action: Allow or Deny all outbound traffic
• Host rules: Domain-pattern rules (e.g., *.github.com → Allow) to permit specific destinations
• Custom CIDR rules: Network-level rules for IP ranges (e.g., 10.0.0.0/8 → Deny)
• Skip egress proxy: Option to bypass the egress proxy entirely when custom VNet routing handles policy enforcement

This means you can run a sandbox in a deny-by-default posture and allowlist only the specific endpoints it needs (your API server, a package registry, etc.) - without setting up NSGs or firewall appliances.

Managed Volumes: Persistent and Shared Storage

Sandboxes support two types of mountable volumes, both managed by Microsoft:

| Volume Type | Backed By | Best For |
|---|---|---|
| Managed Azure Blob | Azure Blob Storage | Shared data across sandboxes, file uploads/downloads, persistent artifacts |
| Managed Data Disk | Azure Disk Storage | High-performance storage for databases, build caches, large working sets - only available to one sandbox at a time |

Blob volumes come with a built-in file explorer in the portal - you can browse, upload, download, create folders, and drag-and-drop files directly. Data Disk volumes provide dedicated block storage with configurable sizes.

Secrets and Identity

Secrets

Sandbox Groups support key-value secrets scoped to the group. Secrets can be created, edited, and referenced by sandboxes within the group.
These secrets can be used in egress policies to modify requests with transform or header-injection rules, without exposing the secrets to code running inside the sandbox.

Managed Identity

Sandbox Groups support both system-assigned and user-assigned managed identities, with full RBAC role assignment management. This means your sandboxes can authenticate to Azure services (Key Vault, Storage, Cosmos DB, etc.) without managing credentials - the same identity model you use everywhere else in Azure.

MCP Connectors and Triggers

ACA Sandboxes now supports managed connectors through the Model Context Protocol (MCP), giving sandboxes access to external APIs - including Microsoft 365, Salesforce, ServiceNow, GitHub, and 1,400+ other systems - without managing credentials directly. Attach a Connector Gateway to your sandbox group, and every sandbox in the group can call external APIs through a standardized MCP interface at runtime.

Pair connectors with triggers to build event-driven automation: route an Outlook email to a sandbox that triages it with an AI agent, or react to a SharePoint file upload by extracting and processing the document, all without writing glue code. Triggers can fire a shell command inside a sandbox or invoke an HTTP endpoint the sandbox exposes, so your automation fits naturally around your workload.

The integration is built on the new Connector Namespace service (az connector-namespace), the same runtime behind Logic Apps and Power Platform connectors, now available as a programmable layer for sandboxes. See the end-to-end samples for runnable, azd up-deployable examples covering email triage and document automation scenarios.

The Portal Experience

Azure Container Apps Sandboxes are only available in the new Azure Container Apps portal, which provides a rich, IDE-like experience for working with sandboxes.
Creating a Sandbox

The portal offers multiple creation paths:

• Standard Sandbox - full configuration control over source, resources, lifecycle, networking, and volumes
• GitHub Copilot Sandbox - a preset with Copilot CLI ready to go; GitHub credentials can be wired in through the Access Token before the sandbox is created
• Claude Sandbox - Claude CLI pre-installed, ready for agentic coding inside the sandbox

Using Coding Agents (Copilot CLI / Claude Code)

If you live inside Copilot CLI or Claude Code, you don't need to learn a new CLI. Install the azure-sandbox skill once and your agent picks up the right skills:

    # GitHub Copilot CLI
    # Add as a plugin marketplace
    /plugin marketplace add microsoft/azure-container-apps
    # Install all skills
    /plugin install sandboxes@Azure-Container-Apps

    # Claude Code
    claude plugin add microsoft/azure-container-apps

The skill runs prerequisite checks silently (az --version, az account show, node --version, aca --version), prompts only if something's missing, and maps natural-language asks to the right aca commands. Bundled runbooks cover Copilot CLI BYOK (bring your own Azure OpenAI key), the deploy-a-web-app walkthrough, and shell setup.

Sandbox Detail Page

Once your sandbox is running, the detail page gives you immediate access to the sandbox terminal and additional details, such as:

• Network Audit - real-time egress traffic log showing allowed and denied requests
• Monitor - live CPU, memory, disk, and network utilization charts
• Connectors - attached connections with an "Add" action
• Volumes - mounted volumes with an "Add" action
• Log Stream - streaming container logs
• Processes - running process list inside the sandbox
• Files - file explorer to browse the sandbox filesystem

The toolbar actions let you manage the state of the sandbox - Resume or Stop.
In the Ellipsis menu (⁝) you can find additional settings to manage the network Egress Policy and ingress (Add port), take a Snapshot of the sandbox, Commit (save disk state as a new disk image), set a Lifecycle Policy, or permanently Delete the sandbox. Finally, you can see additional Details in a side panel.

Getting Started with the CLI and Python SDK

All sandbox and sandbox-group operations go through the aca CLI. There are no az containerapp sandbox commands - az is only used for az login, az account show, and resource-group management.

Install (CLI)

    # Mac, Linux
    curl -fsSL https://aka.ms/aca-cli-install | sh

    # Windows
    irm https://aka.ms/aca-cli-install-ps | iex

Run aca --help to get started.

Install (Python SDK)

    pip install azure-containerapps-sandbox

For more details, a quick start, and examples for the ACA CLI and Python SDK, please go to https://sandboxes.azure.com

Evolution from Dynamic Sessions

If you've used Azure Container Apps Dynamic Sessions, Sandboxes are the next evolution of that capability. Everything Sessions can do, Sandboxes can do - and significantly more:

| Capability | Dynamic Sessions | Sandboxes |
|---|---|---|
| Sub-second startup | ✓ | ✓ |
| Strong isolation | ✓ | ✓ |
| Custom container images | ✓ | ✓ |
| Custom VNet integration | ✓ (Partial) | ✓ |
| Suspend/resume with Memory and Disk snapshots | - | ✓ |
| Lifecycle policies (auto-suspend, auto-delete) | - | ✓ |
| Network egress policy (per-sandbox) | - | ✓ |
| Persistent managed volumes (Blob, Data Disk) | - | ✓ |
| Managed identity (system + user-assigned) | - | ✓ |
| Secrets management | - | ✓ |
| Configurable resource tiers | - | ✓ |
| Direct access to sandbox in Portal experience | - | ✓ |

We will continue to support Dynamic Sessions, but all new investment goes into Sandboxes. If you're building new workloads on isolated ephemeral compute, start with Sandboxes.

How It All Fits Together

ACA Sandboxes is a platform primitive. It's the foundation on which multiple Microsoft products are already built - including ACA Express, Cloud sandboxes in GitHub Copilot, and Foundry Hosted Agents.
When you build on Sandboxes, you're building on the same infrastructure that powers Microsoft's own portfolio. This is the evolution of what we shared with Project Legion in 2024. Legion described the internal infrastructure; Sandboxes exposes it as a customer-facing primitive that you can use directly.

What's Next

• Deeper Azure integrations - first-class connectivity with Azure networking, identity, storage, and AI services
• Enhanced SDK and CLI - richer programmatic experiences for managing sandboxes at scale
• More Microsoft services built on Sandboxes - this is just the beginning

Get Started Today

• Portal: https://sandboxes.azure.com/
• Documentation: Azure Container Apps Sandboxes
• Pricing: Azure Container Apps Pricing (per-second vCPU/memory billing, scale-to-zero, snapshots at Blob Storage rates)

We'd love to hear your feedback. You can ask questions or file issues on the Azure Container Apps GitHub (prefix with [Sandbox] for Sandboxes-specific issues).

What's new in Azure Container Apps at Build'26
Azure Container Apps (ACA) is a fully managed serverless container platform that enables developers to build and deploy microservices and modern applications without requiring container expertise or infrastructure management. ACA provides built-in autoscaling (including scale to zero), per-second billing, advanced networking, built-in observability, and simplified developer experiences across multiple programming languages and frameworks.

The world of application development is shifting rapidly. Agentic AI is fundamentally changing the requirements of cloud platforms - more code is being written by AI, more apps are being deployed by agents, and more deployment stacks are being assembled autonomously. Platforms are aligning to two concurrent demands: hosting intelligent agents as first-class workloads, and giving those same agents access to empty, secure compute pools as tools they can invoke on demand. At the same time, the proliferation of AI-generated code means that platforms must offer strong isolation for untrusted workloads, instant provisioning for rapid iteration, and production-grade defaults that make the right thing the easy thing - for both humans and agents.

Azure Container Apps is purpose-built for this new reality. Whether you're a developer shipping a web app in minutes or an agent spinning up ephemeral sandboxes for code execution, ACA provides the serverless foundation that meets both audiences where they are.

Customers across industries are betting on ACA as the compute foundation for their AI and cloud-native workloads:

• Replit runs its agent-driven software creation platform on Azure, enabling enterprises like Hexaware to securely build and deploy AI-generated applications at scale with seamless procurement through Azure Marketplace.
• LayerX built its Ai Workforce document processing platform on Azure Container Apps, Azure OpenAI, Azure AI Search, and Cosmos DB - helping clients like Mitsui & Co. save 570 hours annually by automating manual document tasks.
• SJR built GX Manager with Microsoft Foundry to automate website personalization at scale - delivering production-grade, data-grounded content in seconds instead of hours of manual curation.
• August AI powers an AI health companion serving over 3.5 million customers on Azure infrastructure, scoring 100% on the U.S. Medical Licensing Examination and delivering potentially life-saving medical support.
• Photon Education created Classwise on Azure OpenAI and Foundry with Defender for Cloud security, enabling teachers to prepare lessons faster and engage students more effectively in inclusive learning environments.
• Microsoft Foundry Agent Service is built directly on Azure Container Apps, serving over 20,000 customers with a dedicated agent runtime that handles fast startup, tool execution, long-running operations, and enterprise-grade isolation at scale.

Following the features announced at Ignite'25 and our continued momentum through early 2026, we're excited to share what's new at Build'26. This release deepens our commitment to the agentic era with new primitives for secure ephemeral compute, the fastest path from container to production, a reimagined portal experience, and continued investment in security, observability, and developer productivity.

Azure Container Apps Sandboxes (Public Preview)

Teams building agentic applications, multi-tenant platforms, development environments, and CI/CD systems have often had to stitch together custom infrastructure to run untrusted code safely, preserve state across sessions, and handle bursty demand without paying for idle capacity. Azure Container Apps Sandboxes addresses that challenge with a new first-class resource type that provides fast, secure, ephemeral compute environments with built-in suspend and resume capabilities. Each sandbox runs in its own hardware-isolated microVM boundary, supports standard OCI container images, and starts in sub-second time.
Sandboxes can preserve memory, disk state, and preloaded libraries in a snapshot, so workloads resume quickly from the same point without incurring a cold-start reload penalty.

Why Sandboxes are perfect for agents

Agents can safely run AI-generated code in isolated environments with instant startup. Agents also accumulate context, intermediate results, and working state during long-running tasks. With sandbox snapshots, agents get persistent, isolated workspaces that survive across task boundaries - they can suspend and resume as needed, preserving full execution context including memory and disk.

Key capabilities

• Sub-second startup - provision and execute immediately
• Hardware-isolated microVMs - strong security boundary for untrusted code
• Snapshot and resume - full state preservation (memory + disk) across sessions
• OCI container image support - bring any container
• Scale to zero, scale to thousands - consumption pricing with per-second billing

This is the underlying infrastructure on which products like Cloud Sandbox in GitHub Copilot, Foundry Hosted Agents, and Azure Container Apps Express are built. ACA Sandboxes joins the Container Apps family alongside Apps, Jobs, Functions, and Dynamic Sessions as a foundational building block for the next generation of cloud and AI application workloads.

Learn more about Azure Container Apps Sandboxes at https://aka.ms/aca/sandboxes

Azure Container Apps Express (Public Preview)

We recently launched Azure Container Apps Express in public preview - the simplest and fastest way to launch and scale powerful applications on Azure, from zero to hyperscale, without infrastructure decisions. It represents the first Azure compute platform purpose-built for agent and developer use alike.

Express is based on years of experience running Azure Container Apps at scale. We've learned that most developers working on web apps, APIs, and agents want to deploy quickly, have automatic scaling, and avoid dealing with complex infrastructure.
Express provides these capabilities - it sets up your environment in seconds, handles any amount of traffic, and removes complicated settings. This helps teams move from writing code to having a production-ready app in minutes, not hours.

What makes Express different

• Instant provisioning - your app is running in seconds, not minutes
• Sub-second cold starts - fast enough for interactive UIs and on-demand agent endpoints
• Scale to and from zero - automatic, no configuration required
• Per-second billing - pay only for what you use, no environment provisioning fee
• Production-ready defaults - autoscaling, managed identity, secrets management, custom domains, container registry integration, revision management, and built-in observability

Purpose-built for custom agents

Agents need to spin up application endpoints on demand - fast, reliably, and without pre-provisioning infrastructure. Express is purpose-built for this pattern: it provisions in seconds, scales from zero instantly when an agent triggers a workload, and scales back down when the task is complete. Whether an agent is deploying a tool-use endpoint, standing up a temporary API for a multi-step workflow, or launching a web UI for human-in-the-loop review, Express gives it a production-grade, internet-reachable application with zero operational overhead. It's the fastest path from "an agent decided to deploy something" to "it's live and serving traffic."

Learn more about Azure Container Apps Express at https://aka.ms/aca/express/launch-blog

New Azure Container Apps Portal

You open the Azure portal and want to deploy a Container App. Ten minutes later you're three blades deep, toggling settings you don't understand, wondering which workload profile is best before you even have an app.

We built a different portal. One where deploying a container app takes less time than reading this paragraph. One where creating an Azure Container App is a single click. And one where experimental features ship weekly, not quarterly.
Smart defaults, advanced when you need it

Developers care about outcomes - where their app is running and how to reach it - not starting with a configuration form. The new portal offers three creation modes to keep setup simple:

• Simple "one-click create" - auto-generates a unique name and provisions your app. Provide the container image and egress settings. That's it - no environment type selection, networking decisions, or container registry configuration.
• Advanced create - unlocks everything: custom VNets with subnet selection, managed identity for registry auth, lifecycle policies, egress controls, environment variables, custom scale rules, and more. It's a toggle at the top of the same form, not a separate workflow.
• Express App (Preview) - the new kind of ACA application that provisions and starts almost instantly.

Observe quickly, act faster

The app overview page surfaces critical information at a glance - including a unified Log Stream that brings app and system logs together in one place. Getting to the root cause now takes fewer clicks, and next steps are always one click away.

Faster releases, direct feedback loop

Azure Container Apps Express (Preview) and Azure Container Apps Sandboxes (Preview) are currently available only in this new portal. We ship weekly - often more. Upcoming Portal Features in settings give you an easy way to opt in to early access features and share feedback directly.

Security: Defender for Cloud Serverless Containers Posture and Confidential Compute

Security remains a top priority as enterprises run more sensitive and regulated workloads on Azure Container Apps. At Build'26, we're announcing two key security milestones.
Public Preview: Defender for Cloud Serverless Containers Posture on Azure Container Apps

Customers can now bring Azure Container Apps environments into Microsoft Defender for Cloud's Serverless Containers Posture experience, helping security teams extend posture management across more of their container estate from a single workflow. This makes it easier to gain visibility into Container Apps resources and assess risks across areas such as identity, networking, and container or image configuration.

With this capability, teams can more consistently evaluate risk across container environments and use attack path analysis to identify potential exposure faster. The result is a more unified security posture, less manual effort, and stronger confidence when securing Container Apps deployments.

Serverless Containers Posture is available as part of the Defender CSPM plan. Learn more at the Defender for Cloud documentation.

General Availability: Confidential Compute for Azure Container Apps

Confidential Compute in Azure Container Apps is now generally available, providing hardware-backed Trusted Execution Environments (TEEs) through workload profiles. This extends protection to data in use - in addition to data at rest and in transit - enabling teams to run higher-trust workloads with stronger isolation for sensitive data.

With confidential computing now GA, Azure Container Apps becomes more viable for regulated, financial, healthcare, and other high-trust scenarios where organizations need hardware-enforced isolation that protects in-memory data, including from the underlying infrastructure.

There is no extra charge for confidential compute workload profiles. Learn more at the Azure Confidential Computing documentation.

Observability: HTTP Traffic Logs and OpenTelemetry Destinations

Knowing what's happening inside your application is essential to running production workloads with confidence.
At Build'26, we're announcing two enhancements that give teams deeper visibility and more flexibility in where they send telemetry.

Monitor HTTP traffic in Azure Container Apps

Azure Container Apps now adds a dedicated Azure Monitor diagnostic setting category - ContainerAppHTTPLogs - that exposes detailed HTTP access logs for incoming traffic. This capability is designed for high-volume request data, enabling teams to troubleshoot ingress and request-flow issues with much greater precision.

With HTTP traffic logs, you can now investigate:

• Failed requests and error codes
• Latency patterns and outliers
• Retries and WebSocket disconnects
• Routing behavior and backend connectivity

The result is faster issue resolution, less operational friction, and stronger confidence in running high-traffic, business-critical applications.

Standard Azure Monitor log volume charges apply. Learn more at Azure Monitor pricing.

Additional OpenTelemetry Destinations: New Relic, Dynatrace, Elastic

Azure Container Apps enhances its managed OpenTelemetry (OTel) capabilities by expanding support for third-party observability platforms. This update introduces additional endpoint options for commonly used monitoring tools - New Relic, Dynatrace, and Elastic - extending the existing managed OpenTelemetry experience.

Teams can now use a more consistent OpenTelemetry-based pipeline across Azure Monitor, Datadog, New Relic, Dynatrace, Elastic, and any OTLP-compatible endpoint, with less configuration overhead and more flexibility to route logs, metrics, and traces where they need them - without deploying or managing their own collectors.

No extra charge applies. Learn more at the OpenTelemetry agents documentation.

Additional Enhancements and Ecosystem Updates

Beyond the headline announcements, Azure Container Apps continues to evolve with a steady cadence of improvements across the platform.
Override Scale Rules in Azure Functions on Azure Container Apps

Azure Functions on Container Apps has traditionally used platform-managed scaling, where triggers are automatically translated into KEDA scale rules. With the new allowScalingRuleOverride property, customers can now choose to override platform-managed scaling and define their own custom KEDA scaling rules.

This enhancement is especially useful for scenarios where automatically generated KEDA rules lead to unintended scaling behavior, where workloads require custom thresholds or concurrency tuning, or where teams need standardized scaling policies across services. It works with any of the 60+ KEDA scalers - Service Bus, Kafka, PostgreSQL, HTTP concurrency, Cron, and more.

Heroku Migration to Azure Container Apps

With Heroku entering maintenance mode, Azure Container Apps is a natural landing zone for Heroku workloads. New guidance and tooling make the migration path straightforward - from understanding why ACA is the right next step to a practical migration guide for hands-on implementation.

Dapr v1.16 Platform Upgrade

Azure Container Apps completed a staged platform upgrade to Dapr v1.16.4, bringing modernized actor scheduling, improved scalability for reminders, and updated TLS/security internals. The upgrade is fully platform-managed, with minimal customer action required for most workloads.

Running AI Models on ACA Serverless GPUs

The community continues to push the boundaries of what's possible with serverless GPUs on ACA. Recent highlights include running Gemma 4 with Ollama for fully private, self-hosted inference, and deploying ComfyUI for text-to-image and text-to-video workloads - all with scale-to-zero and per-second billing.

Hosting Remote MCP Servers on ACA

Azure Container Apps is emerging as the preferred platform for hosting Model Context Protocol (MCP) servers.
With serverless scaling, idle billing, HTTP/1.1 and HTTP/2 support, and managed identity integration, ACA provides a production-ready environment for exposing tools and APIs to AI agents. Multiple tutorials and guides are now available for deploying MCP servers on ACA, including integration with Azure API Management. App Modernization with GitHub Copilot GitHub Copilot App Modernization can dramatically reduce the time required to modernize legacy applications and deploy them to ACA. A recent walkthrough demonstrated upgrading a classic ASP.NET MVC app on .NET Framework to .NET 10 and deploying it to Azure Container Apps in hours - with managed identity and Key Vault integration enabled by default. Azure Skills Repository for Container Apps The new Azure Skills repository includes comprehensive skills specifically for Azure Container Apps - covering troubleshooting, best practices, architecture patterns, security, deployment, and integration. These skills are designed to be used by AI agents and developer tools like GitHub Copilot CLI, providing rich context for building, deploying, and operating ACA workloads. It's another example of how the ACA ecosystem is evolving to be agent-native. Docker Compose for Agents Docker Compose for Agents on Container Apps (public preview) brings the familiar Compose workflow to agentic applications. Declare models, agents, and MCP tools in a single compose.yaml file and deploy unchanged from laptop to cloud - supporting LangGraph, Vercel AI SDK, Spring AI, CrewAI, and other frameworks. Learn more at the Compose for Agents documentation. What's Next Azure Container Apps is redefining how developers and agents build, deploy, and operate intelligent applications. With Sandboxes for secure ephemeral compute, Express for instant provisioning, a reimagined portal for streamlined management, and continued investment in security and observability - ACA provides the ideal foundation for the agentic era. 
The features announced at Build'26 deepen our commitment to making Azure Container Apps the platform where both humans and AI agents can ship production workloads with confidence, speed, and minimal operational overhead.

Also, if you're at Build, come see us at the following sessions:

- Breakout 221: Idea to production-ready agent in seconds on AI-native runtime
- Demo 312: Multi-agents in action with 3 AI agents, 3 frameworks, tools & models
- Lab 580: Build and deploy reasoning agents with NVIDIA Nemotron and Foundry
- Lightning Talk 453: Building an End-to-End Enterprise AI Platform on Azure

Or come visit us at the Azure Application Services booth #44. Visit our GitHub page for feedback, feature requests, or questions. Check out our roadmap to see what we're working on next. We look forward to hearing from you!

Better Deployment Errors in az webapp deploy
Deployment failures can be difficult to interpret, especially when the error returned by the deployment API does not clearly explain what went wrong or what to do next. To make this easier, we have added a new switch to az webapp deploy for App Service for Linux:

    --enriched-errors true

When enabled, deployment failures show context-enriched diagnostics directly in the CLI output. This includes an error code, deployment context, the raw error, suggested fixes, and a Copilot-ready prompt that you can use for additional guidance. By default, this option is disabled.

How to use it

Add --enriched-errors true to your deployment command:

    az webapp deploy \
      --resource-group <resource-group-name> \
      --name <app-name> \
      --src-path <path-to-package> \
      --enriched-errors true

What you get

With enriched errors enabled, failed deployments can include details such as:

    Error Code : ArtifactStackMismatch
    Stage : Deployment
    Runtime : DOTNETCORE|9.0
    Deploy Type : WarDeploy
    Kudu Status : 400

    Raw Error:
    Artifact type = 'War' cannot be deployed to stack = 'DOTNETCORE'.

    Suggested Fixes:
    - Ensure the artifact type matches the app's runtime stack
    - Check the current linuxFxVersion
    - Update the runtime stack if needed

This makes it easier to understand whether the failure is caused by an artifact/runtime mismatch, an invalid deployment path, missing required parameters, or a configuration conflict such as WEBSITE_RUN_FROM_PACKAGE. The screenshots below show a couple more examples of enriched deployment failures.

Use with GitHub Copilot

The enriched output also includes a prompt that you can paste into GitHub Copilot along with the full error details:

    Why did my Linux App Service deployment fail and how do I fix it?

This can help you get more specific guidance based on your deployment configuration and the failure details.

Summary

The new --enriched-errors switch gives you clearer and more actionable deployment failure information directly in the Azure CLI.
Try it out the next time you are deploying to Azure App Service for Linux.

Introducing Azure Container Apps Express!
Three years ago, a 15-second cold start was industry-leading. Today, developers and AI agents expect sub-second. The speed bar has moved, and the tooling needs to move with it. After running Azure Container Apps for years, we've learned something important: for most developers, the ACA environment is an unnecessary construct. It adds provisioning time, configuration surface, and cognitive overhead — when all you really want is to run your app with scaling, networking, and operations handled for you. At the same time, a new class of workloads has emerged. Agent-first platforms — systems where AI agents deploy endpoints on demand, spin up tool-use APIs, and tear them down when work is done — demand an even more radical focus on speed and simplicity. Every second of provisioning delay is wasted agent productivity. Today, we're launching Azure Container Apps Express in Public Preview — the fastest, simplest way to go from a container image to an internet-reachable app on Azure, ready for many production-style workloads. What Is ACA Express? ACA Express removes the infrastructure decisions. There's no environment to provision, no networking to configure, no scaling rules to write. You bring a container image, Express handles everything else. Behind the scenes, Express runs your container on pre-provisioned capacity with sensible defaults baked in — so you skip environment setup without giving up ACA's serverless model. There's more coming in this space soon — keep watching. 
Here's what that means in practice: Instant provisioning — your app is running in seconds, not minutes Sub-second cold starts — fast enough for interactive UIs and on-demand agent endpoints Scale to and from zero — automatic, no configuration required (full scaling controls coming soon) Per-second billing — pay only for what you use Production-ready defaults — ingress, secrets, environment variables, and observability are built in Express is purpose-built for two audiences: developers who want to ship fast (SaaS apps, APIs, web dashboards, prototypes) and agents that deploy on demand (MCP servers, tool-use endpoints, multi-step workflow APIs, human-in-the-loop UIs). If you've ever waited for an ACA environment to provision, only to realize you didn't need half of the configuration options it asked you for — Express is your answer. What You Can Do Today Note: West Central US is currently the only available region. We will expand to new regions through the coming days. Express is in Public Preview starting today. It's a deliberate early ship — there's a meaningful feature gap compared to the existing Azure Container Apps offering, and we're filling it fast. New capabilities are landing on a rapid cadence throughout the preview, and by Microsoft Build in June, Express should be close to feature-complete. For the current list of supported features, known gaps, and what's on the way, see the Express documentation. We'd rather put valuable technology in your hands early and iterate with you than wait behind closed doors for perfection. Who Is Express For? 
| Scenario | Why Express |
|---|---|
| SaaS apps and APIs | Deploy and scale without infrastructure planning |
| AI app frontends | Chat UIs and copilot frontends that scale with usage spikes |
| MCP servers | Expose API endpoints for AI agents in seconds |
| Agent workflows | Spin up endpoints on demand, tear down when done |
| Prototypes and startups | Go from idea to production in minutes |
| Web dashboards | Internal tools with instant availability |

Get Started

Express is available now in Public Preview. Try it:

- Azure Container Apps Express overview: concepts, capabilities, and the current feature support matrix
- Deploy your first app with the Azure CLI: step-by-step quickstart
- New Azure Container Apps Portal: create and manage Express apps alongside your existing Container Apps resources

Have questions? Check the Azure Container Apps Express FAQ for answers to common questions about pricing, limits, regions, and the road to GA.

We're building Express in the open and we want to hear from you. Tell us what features matter most, what works, and what doesn't. Reach out on the Azure Container Apps GitHub or in the comments below.

You Can Build a Framework-Agnostic AI Gateway on Azure App Service — Here's How
The agent infrastructure conversation moved this year. In October 2025, AWS shipped Amazon Bedrock AgentCore — a managed agent runtime with per-session microVM isolation, built-in long-term memory, native MCP support, and an opinionated policy engine. A few months earlier, Cloudflare shipped its Agents SDK on top of Durable Objects, betting that edge-native stateful agents are the future. Both bets are real, both are interesting, and both arrive as closed, proprietary runtimes. So: what's Azure's answer? It's a question I've heard a couple times from architects in the last six months. The honest answer is that Azure already has the pieces. They don't ship as one product called AgentRuntime, and that's actually the point. Azure's pitch is composable: App Service + API Management + MCP, three services you already have access to, glued together with open standards. This post walks through a runnable sample of that composition. One App Service hosting both an agent (built with the Microsoft Agent Framework) and the stateless MCP server it calls, fronted by Azure API Management with the AI Gateway policy set — semantic caching, token rate limiting, per-subscription token emission for chargeback. One azd up deploys the lot. Repo: app-service-ai-gateway-mcp-apim-python. The headline claim is in the title. The point I actually want to make is the one underneath it: the framework is replaceable, the gateway is the contribution. Swap the Agent Framework module for Pydantic AI or LangGraph and the rest of the architecture is unchanged. That's what "run anything" means, made literal. 
The composable stack

                    ┌────────────────────────────────────────────────┐
                    │              Azure API Management              │
      MCP / Agent ──┤  AI Gateway policies:                          │
      client        │   • llm-token-limit                            │
                    │   • llm-semantic-cache-lookup / store          │
                    │   • llm-emit-token-metric                      │
                    │   • rate-limit-by-key (MCP API)                │
                    └─────────────┬───────────────────┬──────────────┘
                                  │                   │
                 ┌────────────────▼──┐   ┌────────────▼──────────────┐
                 │  Azure OpenAI     │   │  Azure App Service        │
                 │   • chat model    │   │  FastAPI app:             │
                 │   • embedding     │   │   • /mcp (stateless)      │
                 │     model         │   │   • /agent/chat           │
                 └───────────────────┘   │  Managed identity →       │
                                         │  APIM (via subscription)  │
                                         └────────────┬──────────────┘
                                                      ▼
                                          Application Insights
                                (cloud_RoleInstance, APIM token metrics)

Three observations that drive everything else:

- APIM is the only thing that talks to Azure OpenAI. The App Service agent doesn't have an AOAI key. It has an APIM subscription key. Every LLM call passes through the gateway, picks up the policies, and gets logged with consistent dimensions. That's where the governance part lives.
- The agent runtime is App Service. Linux, Python, FastAPI. Any language. Any framework. Pick your tool. We use Microsoft Agent Framework because it just GA'd and the API is clean, but the agent module is the easiest thing in the stack to swap.
- The MCP server is co-located with the agent. Same App Service, different route. The agent calls its own tools either in-process (fast path) or back out through APIM (so MCP traffic gets rate-limited and observed too). That choice is one environment variable.

What the sample actually does

The FastAPI app exposes three routes that matter:

/mcp - a stateless HTTP MCP server (protocol revision 2025-11-25), implementing four tools: whoami, echo, lookup_fact, and summarize_app_service_doc. Any MCP client (Claude, VS Code, your own agent runtime) can connect.
/agent/chat - a Microsoft Agent Framework agent that uses those same MCP tools as its tool set, and calls AOAI through APIM.
/health and / - the boring but essential supporting cast (health check for App Service probes, status page showing the serving instance ID).

Here's the agent definition. The key line is the endpoint:

    import os

    from agent_framework.openai import OpenAIChatCompletionClient

    client = OpenAIChatCompletionClient(
        azure_endpoint=os.environ["APIM_GATEWAY_URL"],  # ← APIM, not AOAI
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
        api_version="2024-10-21",
        api_key=os.environ["APIM_SUBSCRIPTION_KEY"],
        default_headers={"Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"]},
    )

    agent = client.as_agent(
        name="AppServiceExpert",
        instructions=SYSTEM_INSTRUCTIONS,
        tools=build_tools(),
    )

That's it. The agent has no idea APIM exists. It thinks it's talking to AOAI. APIM is doing every interesting thing (auth, caching, throttling, metric emission) without the agent code knowing or caring.

The policy that does the heavy lifting

The AOAI API in APIM has one policy attached at the API scope. The full XML is in infra/apim/policies/aoai-policy.xml; here are the bones of it:

    <policies>
      <inbound>
        <base />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com"
                                         output-token-variable-name="aoai-token" />
        <set-header name="Authorization" exists-action="override">
          <value>@("Bearer " + (string)context.Variables["aoai-token"])</value>
        </set-header>
        <azure-openai-token-limit counter-key="@(context.Subscription?.Id ?? "anonymous")"
                                  tokens-per-minute="50000"
                                  estimate-prompt-tokens="true" />
        <azure-openai-semantic-cache-lookup score-threshold="0.85"
                                            embeddings-backend-id="aoai-embeddings-backend"
                                            embeddings-backend-auth="system-assigned">
          <vary-by>@(context.Subscription?.Id ?? "anonymous")</vary-by>
        </azure-openai-semantic-cache-lookup>
        <set-backend-service backend-id="aoai-backend" />
        <azure-openai-emit-token-metric namespace="ai-gateway">
          <dimension name="Subscription ID" value="@(context.Subscription?.Id ?? "anonymous")" />
          <dimension name="API ID" value="@(context.Api.Id)" />
          <dimension name="Operation ID" value="@(context.Operation.Id)" />
          <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
      </inbound>
      <outbound>
        <base />
        <azure-openai-semantic-cache-store duration="3600" />
      </outbound>
    </policies>

Four things are happening here that would otherwise be your problem:

- Auth to AOAI. APIM's managed identity holds the Cognitive Services OpenAI User role on the AOAI account. No keys.
- Token rate limiting. Each APIM subscription gets a tokens-per-minute budget. One runaway team can't starve everyone else.
- Semantic caching. The inbound policy embeds the prompt using the embedding deployment, queries the Redis-backed APIM cache for a vector match above the 0.85 threshold, and short-circuits the AOAI call on a hit. The outbound azure-openai-semantic-cache-store writes successful misses back.
- Per-call metric emission. Every call pushes PromptTokens, CompletionTokens, and TotalTokens to Application Insights as custom metrics tagged with the APIM subscription, the API, the operation, and the client IP. That's your chargeback dashboard, ready to query.

The whole thing is XML. None of it is in your agent code.

Deploying it

    azd auth login
    azd up

azd up provisions a P0v3 App Service Plan with the web app and a staging slot, an AOAI account with gpt-4o-mini + text-embedding-3-small deployments, an APIM Developer SKU service with the two APIs and the policy XML wired up, an Azure Cache for Redis Basic C0 as the semantic-cache store, and a Log Analytics workspace + Application Insights. The postprovision hook fetches the APIM subscription key for the AI Gateway product and writes it into the App Service's APIM_SUBSCRIPTION_KEY setting (and the staging slot's, so slot swaps are clean).

Be patient. Developer SKU APIM takes 30–45 minutes the first time.
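The semantic caching step described in the policy walkthrough can be sketched in a few lines of Python. This is an illustrative model only, not how APIM implements it: the SemanticCache class, its method names, and the toy three-dimensional vectors are hypothetical, and a real gateway would obtain embeddings from the embedding deployment. The 0.85 default mirrors the score-threshold in the policy, and the partition key plays the role of vary-by.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Hypothetical in-memory stand-in for the gateway's semantic cache."""

    def __init__(self, score_threshold: float = 0.85) -> None:
        self.score_threshold = score_threshold
        # partition key (e.g. subscription ID) -> list of (embedding, response)
        self._entries: dict[str, list[tuple[list[float], str]]] = {}

    def lookup(self, partition: str, embedding: list[float]) -> Optional[str]:
        """Return the best cached response at or above the threshold, else None."""
        best_score, best_response = 0.0, None
        for cached_embedding, response in self._entries.get(partition, []):
            score = cosine_similarity(embedding, cached_embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.score_threshold else None

    def store(self, partition: str, embedding: list[float], response: str) -> None:
        """Record a response so later similar prompts can reuse it."""
        self._entries.setdefault(partition, []).append((embedding, response))


cache = SemanticCache()
cache.store("team-a", [1.0, 0.0, 0.0], "cached answer")

near_hit = cache.lookup("team-a", [0.9, 0.1, 0.0])    # similar prompt, same partition
far_miss = cache.lookup("team-a", [0.0, 1.0, 0.0])    # unrelated prompt
other_team = cache.lookup("team-b", [1.0, 0.0, 0.0])  # same prompt, different partition
```

In this toy run only the first lookup returns the cached answer: the unrelated vector scores below the threshold, and the second subscription never sees the first one's entries.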
If you want to prototype faster, the sample supports Consumption SKU as a one-line flip:

    azd env set APIM_SKU Consumption
    azd provision

Consumption provisions in about a minute and is great for sketching. Verify your specific policies are supported there before you ship it.

Governing it like a grown-up

The toy version of this post stops at "look, semantic cache works." The version your platform engineering lead wants to see goes further.

Per-team chargeback. The token-emit policy tags every call with the APIM subscription ID. Issue one subscription per team, hand it over with a quota, and your monthly chargeback report is a KQL query:

    customMetrics
    | where timestamp > startofmonth(now())
    | where name == "TotalTokens"
    | summarize Tokens=sum(valueSum) by Team=tostring(customDimensions["Subscription ID"])
    | extend USD = Tokens * 0.00015 / 1000  // gpt-4o-mini blended rate
    | order by USD desc

Content safety as a policy plug-in. Add an llm-content-safety block to the inbound policy and point it at an Azure AI Content Safety resource, and every prompt and response gets moderated before reaching agents or end users. The sample doesn't deploy Content Safety by default (to keep the demo cost-free), but the README has the one-line bicep + one-block policy delta.

Circuit breaker + multi-region failover. Add a second AOAI backend in a different region and an APIM backend pool, give the pool a circuit-breaker rule, and your agents inherit failover with zero code changes.

Rate-limit MCP traffic too. The MCP API has its own policy with rate-limit-by-key, so a runaway agent can't pin the MCP server with a hot loop.

None of these are gymnastics. They're one policy block each. The pattern is the same every time: write policy at the gateway, leave the agent code alone.

Proving it works

After azd up finishes, two checks.
First, hit the agent endpoint:

    curl -sS -X POST "$(azd env get-value WEB_URI)/agent/chat" \
      -H 'Content-Type: application/json' \
      -d '{"message": "How does App Service horizontally scale an MCP server?"}' | jq

You should see a reply that cites the instance ID (the agent calls whoami and summarize_app_service_doc to ground its answer) and a tool_calls array showing the agent's reasoning trace.

Second, run the k6 load test:

    export BASE_URL="$(azd env get-value WEB_URI)"
    export APIM_SUBSCRIPTION_KEY="$(azd env get-value APIM_SUBSCRIPTION_KEY)"
    k6 run loadtest/k6-gateway.js

The script hits /agent/chat with a small pool of semantically-similar prompts. After a 30-second warmup, the steady phase should report a cache-hit ratio above 30%:

    APIM AI Gateway — k6 summary
    ─────────────────────────────
    Cache hits : 412
    Cache misses : 88
    Hit ratio : 82.4%

Cross-check in App Insights:

    ApiManagementGatewayLogs
    | where TimeGenerated > ago(15m)
    | where ApiId == "aoai"
    | extend cache = tostring(parse_json(ResponseHeaders)["x-llm-cache-status"])
    | summarize count() by cache, bin(TimeGenerated, 1m)
    | render columnchart

A solid bar of hits next to a smaller bar of misses is the gateway earning its keep.

"Run anything" - the proof

Here's the part where I cash the check the title wrote. The agent module is the easiest thing in this stack to replace. Three changes to ship the same demo on Pydantic AI:

    # requirements.txt
    - agent-framework-core==1.5.0
    - agent-framework-openai==1.5.0
    + pydantic-ai==0.4.0

    # agent/agent.py
    import os

    from pydantic_ai import Agent
    from pydantic_ai.models.openai import OpenAIModel

    def build_agent():
        model = OpenAIModel(
            "gpt-4o-mini",
            base_url=f"{os.environ['APIM_GATEWAY_URL']}/openai/deployments/gpt-4o-mini",
            api_key=os.environ["APIM_SUBSCRIPTION_KEY"],
        )
        return Agent(model, system_prompt=SYSTEM_INSTRUCTIONS, tools=build_tools())

That's it. build_tools() returns the same list of async callables (Pydantic AI accepts plain Python functions as tools, same as Agent Framework).
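For illustration, a framework-neutral tool list of the kind build_tools() is described as returning might look like the sketch below. The function bodies are assumptions, not the sample's code (the real implementations live in the repo); only the tool names whoami and echo and the WEBSITE_INSTANCE_ID variable come from the post.

```python
import asyncio
import os
from typing import Any, Awaitable, Callable


async def whoami() -> dict[str, Any]:
    """Report which App Service instance served the call (assumed shape)."""
    return {"instance_id": os.environ.get("WEBSITE_INSTANCE_ID", "local")}


async def echo(text: str) -> dict[str, Any]:
    """Return the input unchanged (assumed shape)."""
    return {"echo": text}


def build_tools() -> list[Callable[..., Awaitable[dict[str, Any]]]]:
    """Plain async functions with no framework imports, so any runtime can wrap them."""
    return [whoami, echo]


tools = build_tools()
tool_names = [tool.__name__ for tool in tools]
echo_result = asyncio.run(echo("hello"))
```

Because nothing in this module imports an agent framework, swapping the runtime touches only the code that consumes the list.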
LangGraph works the same way — wire build_tools() into a ToolNode , point ChatOpenAI at the APIM gateway URL, done. Every APIM policy still fires. Every token metric still emits. Every cache hit still hits. The gateway is the boundary; the runtime above it is fungible. What AgentCore gets right I want to land this without spin. AgentCore's per-session microVM isolation is genuinely interesting — it's a stronger sandboxing story than running multiple agents in shared App Service workers, and it matters for multi-tenant SaaS where agents execute arbitrary user code or call third-party tools you don't fully trust. The managed long-term memory primitive is also a real convenience; Azure has the building blocks (Cosmos DB, AI Search, Cognitive Search) but they aren't pre-wired into a single "agent memory" API the way AgentCore's are. Where the App Service + APIM + MCP composition genuinely wins: Open standards. MCP is a public protocol with implementations across the industry. AgentCore's tool layer is AWS-native. No new runtime to learn. App Service is the same App Service. Your existing CI/CD, your existing security review, your existing monitoring all apply. Bring your own framework. Pydantic AI, LangGraph, Agent Framework, Semantic Kernel, AutoGen, CrewAI — they all work, because the App Service doesn't care what's running inside the container. Existing enterprise footprint. VNet integration, private endpoints, managed identity, deployment slots, sidecars, Easy Auth. None of it is new for App Service. You inherit a decade of platform work. The right framing isn't "Azure's answer to AgentCore." It's that Azure is making a different bet: that enterprises will value the composability of services they already trust over the convenience of a new proprietary runtime. For some, that bet is probably correct. For a few — multi-tenant agent marketplaces, untrusted code execution — AgentCore's isolation model is a better fit. Pick the one that matches your threat model. 
What's next

If you ship the sample and want to compare notes, the repo is at app-service-ai-gateway-mcp-apim-python.

You Can Scale MCP Servers Behind a Load Balancer on App Service — Here's How
Most MCP servers in the wild are single-instance processes. That's fine when they're driving a local Claude or VS Code session — but it's the wrong shape for a production agent fleet that has to absorb traffic spikes, ride through deploys, and survive instance failures. The good news: the MCP spec already grew up. The 2025-06-18 revision formalizes stateless HTTP transport (and the current 2025-11-25 revision keeps it), which means a single request carries everything the server needs to answer. No long-lived connection, no in-process session table, no sticky-session hacks to keep a client glued to one box. That tiny protocol change unlocks something big: you can stick an MCP server behind App Service's built-in load balancer and scale it like any other web API. This post walks through how, with a runnable sample. Sample: seligj95/app-service-mcp-stateless-scale-python. One azd up and you have a stateless FastAPI MCP server running on three App Service instances behind the platform load balancer, with a staging slot, Application Insights, and a k6 script that visualizes load distribution from the client side. Why "stateless" is the whole story Earlier MCP transports leaned on persistent connections — SSE channels and WebSocket-style sessions where the server held per-client state in memory (open tools, subscriptions, partial streams). That model is great for a local IDE talking to a local process. It's hostile to load balancing, because routing a follow-up request to a different instance breaks the session. The stateless HTTP transport flips that. Each request is a complete JSON-RPC envelope ( initialize , tools/list , tools/call ), every response is self-contained, and the server is allowed to forget the client between requests. Any instance can serve any call. That is the property a load balancer needs. 
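To make the stateless property concrete, here is a minimal sketch of a handler that answers each JSON-RPC envelope from the request alone. It is not the sample's code: the FACTS dictionary, the key argument name, and the single-tool listing are hypothetical. The -32601 code is the standard JSON-RPC "method not found" error.

```python
import json
from typing import Any

# Hypothetical static data; nothing here is keyed by client or session.
FACTS = {"mcp": "Model Context Protocol"}


def handle(envelope: dict[str, Any]) -> dict[str, Any]:
    """Answer one JSON-RPC request using only what the request carries."""
    method = envelope.get("method", "")
    msg_id = envelope.get("id")

    if method == "tools/list":
        result: dict[str, Any] = {"tools": [{"name": "lookup_fact"}]}
    elif method == "tools/call":
        arguments = envelope.get("params", {}).get("arguments", {})
        fact = FACTS.get(arguments.get("key", ""), "unknown")
        result = {"content": [{"type": "text", "text": json.dumps({"fact": fact})}]}
    else:
        return {
            "jsonrpc": "2.0",
            "id": msg_id,
            "error": {"code": -32601, "message": "Method not found"},
        }
    return {"jsonrpc": "2.0", "id": msg_id, "result": result}


call = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "lookup_fact", "arguments": {"key": "mcp"}},
}

# No prior initialize call, no session lookup: two independent invocations
# (think: two different instances) produce the same reply.
reply_a = handle(call)
reply_b = handle(call)
```

Since the function reads no per-client memory, it does not matter which copy of it a load balancer picks for a given request.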
In the sample, every tool is a pure function of its arguments — whoami reports the serving instance, lookup_fact reads a static dictionary, compute_primes runs a sieve. None of them touches per-client memory. That's not a constraint of the protocol; it's a discipline you adopt to keep statelessness intact. Why App Service, and not Functions or AKS Functions and AKS are a couple of the many great options for MCP server hosting depending on what the MCP server is used for. The use case we are discussing here is a scaled MCP server, i.e. an MCP server that must reach a large and broad audience. Here are a few defaults that make App Service a solid option for this scenario: Always On. Reasoning tools call into LLMs and external APIs; latencies routinely sit in the multi-second range. Functions caps a single execution at ten minutes by default (and aggressively scales workers to zero between bursts, which kills warm caches). App Service keeps the process resident. Horizontal scale is one parameter. Pick a Premium SKU, set the plan's capacity to N, and you have N instances behind a managed load balancer. No VMSS to declare, no ingress controller to wire up, no Service to reconcile. Deployment slots. Swap a warmed-up staging slot into production for zero-downtime deploys. Critical when your "API" is an LLM tool surface that an agent is actively driving. Easy Auth. OAuth 2.1 in front of the MCP endpoint without writing the flow yourself — turn on the App Service authentication blade and point it at Entra ID. The sample leaves this off so the deploy is one command, but the wiring is a checkbox away. The TL;DR: it's PaaS that already knows how to run a stateful long-lived process at horizontal scale, which is exactly the shape of a scaled MCP server. The FastAPI MCP server, end-to-end stateless The whole transport is one POST handler. 
The full source is in main.py, but here are the load-bearing pieces:

    @app.post("/mcp")
    async def mcp_endpoint(request: Request):
        body = await request.json()
        method = body.get("method", "")
        msg_id = body.get("id")

        if method == "initialize":
            return {"jsonrpc": "2.0", "id": msg_id, "result": _server_info()}

        if method == "tools/list":
            return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": [...]}}

        if method == "tools/call":
            params = body.get("params", {})
            result = await MCP_TOOLS[params["name"]]["function"](**params.get("arguments", {}))
            return {
                "jsonrpc": "2.0",
                "id": msg_id,
                "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
            }

There is no session table. There is no client_id cookie. There is no AsyncIterator held open between requests. initialize, tools/list, and tools/call all return in a single round trip, which is the shape App Service's load balancer expects.

The most useful debugging tool in the sample is whoami:

    async def tool_whoami() -> Dict[str, Any]:
        return {
            "instance_id": os.environ.get("WEBSITE_INSTANCE_ID", "local"),
            "hostname": socket.gethostname(),
            ...
        }

WEBSITE_INSTANCE_ID is unique per App Service worker. Call whoami a few times from your MCP client and the value rotates - that's the load balancer working. If it doesn't rotate, something is pinning your traffic (almost always the ARR Affinity cookie; we'll get there).
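The "call whoami a few times" check can be scripted. The sketch below uses only the standard library and assumes the response shape shown in the handler excerpt (tool result serialized as JSON text inside content); call_whoami and tally_instances are hypothetical helper names, and the offline demo at the bottom uses a fake fetcher, not a real deployment.

```python
import json
import urllib.request
from collections import Counter
from typing import Callable


def call_whoami(base_url: str) -> str:
    """POST one tools/call for whoami and return the instance_id it reports."""
    envelope = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {"name": "whoami", "arguments": {}},
    }
    request = urllib.request.Request(
        f"{base_url}/mcp",
        data=json.dumps(envelope).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    # Assumed shape: the tool result is JSON text inside the first content item.
    payload = json.loads(body["result"]["content"][0]["text"])
    return payload["instance_id"]


def tally_instances(fetch: Callable[[], str], calls: int) -> Counter:
    """Call fetch() the given number of times and count the instance IDs seen."""
    return Counter(fetch() for _ in range(calls))


# Offline demonstration with a fake fetcher standing in for three instances.
fake_ids = iter(["a", "b", "c", "a", "b", "c"])
demo_tally = tally_instances(lambda: next(fake_ids), calls=6)
```

Against a deployed app you would pass something like `lambda: call_whoami("https://<your-app>.azurewebsites.net")`. Note that urllib does not keep cookies by default, so this client will not replay an affinity cookie even if the server sets one; a browser or a session-based HTTP client may behave differently.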
The Bicep that actually makes it scale

The infra is a P0v3 plan with capacity: 3, a web app with affinity disabled, and a staging slot on the same plan:

    resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
      name: name
      sku: {
        name: 'P0v3'
        capacity: instanceCount // 3 by default
      }
      properties: {
        reserved: true
      }
    }

    resource web 'Microsoft.Web/sites@2024-04-01' = {
      name: name
      properties: {
        serverFarmId: appServicePlanId
        httpsOnly: true
        clientAffinityEnabled: false // ← the one line that matters
        siteConfig: {
          linuxFxVersion: 'PYTHON|3.11'
          alwaysOn: true
          healthCheckPath: '/health'
          appCommandLine: 'python -m uvicorn main:app --host 0.0.0.0 --port 8000'
        }
      }
    }

    resource staging 'Microsoft.Web/sites/slots@2024-04-01' = {
      parent: web
      name: 'staging'
      properties: { /* same shape — separate hostname, same plan */ }
    }

The single most important line in that template is clientAffinityEnabled: false. App Service defaults to on, which sets the ARRAffinity cookie and pins every subsequent request from a given client to the instance that handled the first one. That default exists because legacy ASP.NET apps used in-process session state. Stateless MCP does not. Leaving affinity on silently undoes everything we just built.

Premium v3 (P0v3) is the floor for two reasons: it gives Always On and unlocks deployment slots. Below that tier you don't get either.

Application Insights without writing telemetry code

The sample drops one line of bootstrap into main.py:

    from azure.monitor.opentelemetry import configure_azure_monitor

    if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
        configure_azure_monitor(logger_name="mcp")

The Azure Monitor OpenTelemetry distro auto-instruments FastAPI and outbound HTTP. Every request span App Service emits is tagged with cloud_RoleInstance, which Application Insights populates from WEBSITE_INSTANCE_ID. That makes the question "is traffic actually spreading across my instances?"
a one-liner in Logs:

    requests
    | where timestamp > ago(15m)
    | where name contains "/mcp"
    | summarize count() by cloud_RoleInstance
    | order by count_ desc

If you see three roughly-equal rows, you're done. If you see one row, your client is sending ARRAffinity cookies - turn affinity off and redeploy.

Deploy

    azd auth login
    azd up

That provisions the resource group, plan, web app, staging slot, Log Analytics workspace, and Application Insights resource, then deploys the Python app via Oryx. The output prints both WEB_URI and WEB_STAGING_URI. Open the production URI - the home page renders the instance ID that served it. Refresh. The ID changes.

To swap the staging slot into production with no downtime:

    az webapp deployment slot swap \
      --resource-group <rg> --name <app> \
      --slot staging --target-slot production

App Service warms the staging instances, redirects traffic, and the old production becomes the new staging: the classic blue-green pattern, but free.

Prove it scales

The sample ships a k6 script that hammers /mcp with tools/call requests and tags every response with the instance_id the server returned:

    BASE_URL=https://<your-app>.azurewebsites.net \
      k6 run --summary-export=summary.json loadtest/k6-mcp.js
    jq '.metrics.mcp_instance_hits.values' summary.json

The output groups hits per instance tag. On a three-instance plan with a 60-second steady load you should see something close to:

    {
      "count": 1842,
      "instance0d3e2f...": 614,
      "instance7a91bc...": 612,
      "instance19f0c4...": 616
    }

Roughly 33% on each box: the App Service load balancer round-robining new connections, with no help from the application.

What I'd do next

The sample is intentionally a starting point. Two extensions are the obvious next moves:

Add Easy Auth. Turn on App Service authentication, pick Entra ID, require auth on /mcp. The token surfaces as headers; your tool handlers can use it to identify the calling agent without you owning any of the OAuth machinery.

Autoscale on CPU.
instanceCount: 3 is a starting point. Wire up Microsoft.Insights/autoscalesettings against the plan and let it scale 3 → 10 on the prime-counting tool. The architecture already supports it - that's the whole point of stateless.

Try it

- Sample repo: github.com/seligj95/app-service-mcp-stateless-scale-python
- MCP spec: modelcontextprotocol.io/specification/2025-11-25
- App Service docs: learn.microsoft.com/azure/app-service/overview

If you ship something with it, I'd love to hear how it held up.

[Retired] Strapi on App Service: Quick start
In this quick start guide, you will learn how to create and deploy your first Strapi site on Azure App Service Linux, using Azure Database for MySQL or PostgreSQL, along with other necessary Azure resources. This guide utilizes an ARM template to install the required resources for hosting your Strapi application.

[Retired] Strapi on App Service: Overview
Looking to self-host Strapi? Deploy Strapi on Azure App Service to gain greater customization control, global region availability, and seamless integration with other Azure services. Hosting Strapi on Azure App Service simplifies infrastructure management while ensuring high availability, security, and performance. Whether you're searching for Strapi hosting, Strapi deployment, or where to host Strapi, Azure App Service provides the ideal solution for deploying Strapi efficiently and securely.

[Retired] Strapi on App Service: FAQ
This comprehensive guide is designed to answer all your questions about hosting and deploying Strapi on Azure App Service. Whether you're looking to understand the best platforms for Strapi integration, how to run Strapi in different modes, or how to deploy Strapi on various hosting services, this FAQ has you covered.

Debugging Python apps on App Service with the new SSH helper aliases
You shipped a Python app to App Service. It worked in the demo. It works locally. In production, /chat is returning 502s — but /health is green, the deployment succeeded, the logs are quiet, and your laptop can't reproduce it. What you actually need is a shell on the running container so you can poke at DNS, env vars, installed packages, the listening port, and the AI endpoint your app is calling. The platform has had SSH for a while, but the playbook of "open SSH, then remember which 14 commands to run" was tribal knowledge. We just shipped a set of SSH helper aliases that turn that tribal knowledge into one-word commands. apphelp shows you everything; appconfig , showpkgs , and appcurl cover the app side; ai-test , ai-diagnose , ai-curl , ai-latency , ai-dns , and ai-access-check cover the Azure AI Foundry side. This post is a hands-on tour. We built a deliberately fragile FastAPI sample with six different fault modes, deployed it, broke it, and SSH'd in to watch the aliases drive each one to root cause. Every transcript below is real output from the deployed sample. 📦 Sample repo: seligj95/app-service-ssh-diagnostics-python — azd up and you have a fault-injectable Python + Foundry app live in your subscription in about 4 minutes. The sample, in one breath FastAPI app, Python 3.14, App Service Linux on P0v3 — uses the new Oryx FastAPI auto-detection so no custom startup command is needed Calls Azure OpenAI (gpt-4o-mini) via managed identity — no keys POST /admin/fault toggles one of seven modes: off , bad-creds , wrong-endpoint , dns-fail , port-mismatch , dep-import-error , latency-spike GET / is a landing page with a built-in cheat sheet of the SSH aliases The endpoints are intentionally boring. The point is to give the aliases something realistic to chew on. A quick note on Azure OpenAI vs. AI Foundry. This sample provisions an Azure OpenAI account ( kind: OpenAI ). 
The new ai-* aliases speak the OpenAI chat-completions API ( /openai/deployments/<model>/chat/completions ), which is identical on Azure OpenAI and on Azure AI Foundry projects — both expose *.openai.azure.com endpoints, both accept managed-identity bearer tokens, both speak the same schema. The aliases work against either; the env-var name AZURE_AI_FOUNDRY_ENDPOINT is just the alias contract. Drop a Foundry endpoint into it and the same walkthrough applies. Shout-out to the new FastAPI auto-detect on Python 3.14. This sample also benefits from another recent App Service change: on Python 3.14+, App Service automatically detects FastAPI apps and starts them with gunicorn -k uvicorn_worker.UvicornWorker — no custom startup command needed. Our Bicep ships an empty appCommandLine and lets Oryx do the right thing. The whole sample is a nice tour of recent App Service Python improvements landing together. Step zero: apphelp After azd up finishes, the first thing to do over SSH is: az webapp ssh -g rg-ssh-diag-demo -n app-web-<token> Then inside the container: $ apphelp apphelp prints every alias the image ships with, grouped by category. You don't need to memorize anything — when you forget what checkport does, you run apphelp and it's right there. We'll lean on most of these: App info: showpkgs , appconfig , appenv Logs: applogs , deploylogs , logfiles Reachability: appcurl , checkport , gohome , gosrc AI/Foundry: ai-test , ai-dns , ai-access-check , ai-curl , ai-latency , ai-diagnose Network tools: install-nettools The healthy baseline Before breaking anything, run ai-diagnose . This is the one-shot "is my AI path healthy?" check, and it's the alias we reach for most: $ ai-diagnose ──────────────────────────────────────────────────────────────── AI Foundry Diagnostics ──────────────────────────────────────────────────────────────── [✓] Managed identity token [✓] DNS resolution (d8f9grasb7ewc7h8.ai-gateway.eastus2-01.azure-api.net. 
- public) [✓] Foundry connectivity (761ms) ──────────────────────────────────────────────────────────────── Three green checks tell you three different things: the managed identity is issuing tokens, the Foundry hostname resolves, and the endpoint responded in a reasonable time. If any of these are red, you already know which layer the fault is in. For more detail, the individual aliases are worth knowing: $ ai-test ✓ Connected | 1009ms | Model: gpt-4o-mini | Auth: Managed Identity $ ai-access-check ✓ Foundry endpoint: https://cog-ftirxupt2yjoe.openai.azure.com/ ✓ Model: gpt-4o-mini ✓ Using auth mode: Managed Identity ✓ Access check passed: authorized to call Foundry $ ai-latency Running 5 requests to gpt-4o-mini... Request 1: 679ms ✓ Request 2: 826ms ✓ Request 3: 758ms ✓ Request 4: 641ms ✓ Request 5: 664ms ✓ Results (5/5 successful): Avg: 713ms | Min: 641ms | Max: 826ms And the app side: $ checkport ✓ App is listening on port 8000 $ appcurl /health HTTP Status: 200 Time: 0.002417s Size: 5423 bytes That's our "everything is fine" reference. Now let's break things. One trick: applying a fault inside the SSH shell A subtle thing trips people up the first time. POST /admin/fault mutates the app process's environment — but your SSH shell is a separate process. It inherited the container's env when you opened the session, so ai-test will still see the healthy values. 
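The underlying behavior is ordinary process isolation, and it can be reproduced locally. The sketch below is only an analogy: it uses a parent and a child process rather than the app process and an SSH shell, and DEMO_ENDPOINT is a made-up variable name, not one the sample uses. It shows that a process changing its own environment has no effect on another process that already holds its own copy.

```python
import os
import subprocess
import sys

# DEMO_ENDPOINT is a hypothetical variable name used only for this illustration.
os.environ["DEMO_ENDPOINT"] = "https://healthy.example"

# The child starts with a copy of our environment, then "applies a fault"
# by overwriting the variable in its own copy.
child_code = (
    "import os\n"
    "os.environ['DEMO_ENDPOINT'] = 'https://broken.example'\n"
    "print(os.environ['DEMO_ENDPOINT'])\n"
)
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)

child_view = result.stdout.strip()         # value seen by the process that mutated it
parent_view = os.environ["DEMO_ENDPOINT"]  # value still seen by the other process

print(f"child:  {child_view}")
print(f"parent: {parent_view}")
```

The two printed values differ, which is the same reason a shell opened before (or independently of) the fault toggle keeps reporting healthy settings.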
The sample handles this by also writing a small file to the persistent share:

# app/faults.py
import shlex
from pathlib import Path

# _snapshot_unlocked() is defined elsewhere in the sample and is not shown here.

def _write_env_file() -> None:
    """Write fault env to /home/site/diagnostics/fault.env so SSH can `source` it."""
    diag = Path("/home/site/diagnostics")
    diag.mkdir(parents=True, exist_ok=True)
    snap = _snapshot_unlocked()
    lines = [f"# Active fault: {snap['mode']}", ""]
    for k, v in snap["env"].items():
        # Nested double quotes inside an f-string require Python 3.12+.
        lines.append(f"export {k}={shlex.quote(v) if v else "''"}")
    (diag / "fault.env").write_text("\n".join(lines) + "\n")

After toggling a fault, run this once in your SSH session:

source /home/site/diagnostics/fault.env

Now the aliases see the same env the broken app sees. This pattern (flip a flag from outside, source the change inside) is worth stealing for your own debugging workflows.

Group A: faults the AI aliases catch directly

Some faults are in the path between App Service and Foundry: wrong endpoint, broken DNS, network. The ai-* aliases reproduce the failure end-to-end, and they tell you exactly which layer.

Fault 1: wrong-endpoint - a typo in the AOAI endpoint

The most common AI-side incident: someone fat-fingers an app setting. The endpoint resolves to something (it's still *.openai.azure.com) but it's not your resource.
curl -X POST $URL/admin/fault -H 'content-type: application/json' \ -d '{"mode":"wrong-endpoint"}' curl $URL/chat -H 'content-type: application/json' \ -d '{"prompt":"hi"}' # HTTP 502 # {"detail":"APIConnectionError: Connection error."} SSH in, source the fault env, run the AI aliases: $ source /home/site/diagnostics/fault.env $ ai-dns Resolving: this-resource-does-not-exist.openai.azure.com ✗ DNS resolution failed for this-resource-does-not-exist.openai.azure.com $ ai-curl Request: POST https://this-resource-does-not-exist.openai.azure.com//openai/deployments/gpt-4o-mini/chat/completions?api-version=2024-02-01 Authorization: Bearer [hidden] Content-Type: application/json curl: (6) Could not resolve host: this-resource-does-not-exist.openai.azure.com $ ai-diagnose [✓] Managed identity token [✗] DNS resolution failed for this-resource-does-not-exist.openai.azure.com [✗] Foundry connectivity (HTTP 000) ai-diagnose collapses the whole story into three lines: token works, DNS fails, connectivity fails. The fault is unambiguously a bad endpoint — check appconfig and your Bicep parameters. Fault 2: dns-fail — NXDOMAIN A subtler variant of the same failure mode is when the endpoint is structurally wrong (private endpoint misconfigured, hosts file mishap, custom domain expired). ai-dns calls it out the same way: $ ai-dns Resolving: no-such-host.invalid.example ✗ DNS resolution failed for no-such-host.invalid.example If you need deeper diagnostics — say, you suspect a flaky resolver rather than the hostname itself — install-nettools gives you dig , nslookup , and friends without rebuilding the container. $ install-nettools $ dig openai.azure.com $ nslookup cog-ftirxupt2yjoe.openai.azure.com Group B: faults that pass ai-test but break your app Here's the most useful thing we learned building this sample: ai-test can be green while your app is on fire, and that's a signal, not a bug. The ai-* aliases call Foundry directly. 
If they're green and your app is red, the platform-to-Foundry path is fine — the divergence is in your app. Time to pivot to appenv , applogs , showpkgs . Fault 3: bad-creds — wrong AZURE_CLIENT_ID This one is the classic user-assigned managed identity mishap: you scoped your code to a user-assigned managed identity, but the GUID in AZURE_CLIENT_ID doesn't actually exist (or wasn't granted RBAC). curl -X POST $URL/admin/fault -d '{"mode":"bad-creds"}' curl $URL/chat -d '{"prompt":"hi"}' # HTTP 502 # {"detail":"ClientAuthenticationError: DefaultAzureCredential failed to retrieve a token..."} Now SSH in and try the AI aliases: $ source /home/site/diagnostics/fault.env $ ai-test ✓ Connected | 734ms | Model: gpt-4o-mini | Auth: Managed Identity $ ai-access-check ✓ Foundry endpoint: https://cog-ftirxupt2yjoe.openai.azure.com/ ✓ Using auth mode: Managed Identity ✓ Access check passed: authorized to call Foundry Both green. That looks like a contradiction, but it's not. The aliases authenticate using the system-assigned managed identity directly (via IMDS), and they pass. Your Python app uses DefaultAzureCredential , which honors AZURE_CLIENT_ID to pick a user-assigned identity — and that one is broken. The takeaway: when ai-test is green but /chat is red, the platform's identity is fine. Pivot to appenv to see exactly what env your app process sees, and check AZURE_CLIENT_ID : $ appenv | grep AZURE_CLIENT_ID AZURE_CLIENT_ID=00000000-0000-0000-0000-000000000000 There's the bug. The aliases didn't fail — they told you the fault isn't in the platform. That's diagnosis by elimination, and it's faster than guessing. Fault 4: dep-import-error — your code throws Same pattern. 
The app raises an ImportError on /chat , the AI aliases are green: curl -X POST $URL/admin/fault -d '{"mode":"dep-import-error"}' curl $URL/chat -d '{"prompt":"hi"}' # HTTP 500 # {"detail":"ImportError: No module named 'tiktoken'..."} This is where the app-side aliases earn their keep: $ showpkgs | head -20 ────────────────────────────────────────────────────── Virtual environment packages (antenv) ────────────────────────────────────────────────────── Package Version -------------------------------------- --------- annotated-types 0.7.0 anyio 4.13.0 azure-core 1.41.0 azure-identity 1.19.0 azure-monitor-opentelemetry 1.8.8 ... No tiktoken in that list. Confirmation in one command — no need to remember pip list or where the virtualenv lives. deploylogs then tells you what the last deployment actually built: $ deploylogs 10 Latest deployment: b8a64ed4-b6b7-4419-91eb-6d8e4e7ef323 Log file: /home/site/deployments/b8a64ed4-b6b7-4419-91eb-6d8e4e7ef323/log.log 2026-05-18T19:10:52.3844297Z,Parsing the build logs,abc3cf97-... 2026-05-18T19:10:52.5414396Z,Found 0 issue(s),7d11d013-... 2026-05-18T19:10:52.7913394Z,Build Summary :,... 2026-05-18T19:10:53.5643089Z,Deployment successful. deployer = Push-Deployer ... Build was clean. The package just isn't in requirements.txt . Two aliases, one minute, root cause. Fault 5: port-mismatch — uvicorn binds the wrong port A real-world bug: someone sets WEBSITES_PORT=9999 in app settings to expose a different port, but the app still binds to 8000. curl -X POST $URL/admin/fault -d '{"mode":"port-mismatch"}' The aliases tell you exactly which port everything sees: $ checkport Checking if app is listening on port 8000... ✓ App is listening on port 8000 $ appcurl /health Testing app at localhost:8000 ... HTTP Status: 200 Time: 0.002417s $ appconfig PORT Value: 8000 Note: The port your Python app should listen on. Default is 8000. The app is healthy from inside the container. 
The mismatch is between what the platform tries to forward to and what uvicorn is bound to. This is the kind of fault where curling the public URL fails but appcurl /health succeeds — and the contrast is itself the diagnosis. Fault 6: latency-spike — the alias bench is fast, your app is slow The app injects 4 seconds of asyncio.sleep before each Foundry call. /chat is now ~4.5 seconds. ai-latency : $ ai-latency Running 5 requests to gpt-4o-mini... Request 1: 715ms ✓ Request 2: 588ms ✓ Request 3: 578ms ✓ Request 4: 669ms ✓ Request 5: 643ms ✓ Results (5/5 successful): Avg: 638ms | Min: 578ms | Max: 715ms Foundry, from this instance, averages 638ms. If your app is taking 5 seconds end-to-end and ai-latency says the model is sub-second, the slowness is in your code — not in Foundry, not in the network. Time to look at App Insights end-to-end transactions, or at any pre-call work (retrieval, vector lookup, your own sleep). What this changes about the debugging workflow Before these aliases, the SSH playbook for a Python AI app went something like: open SSH, dig around /home/site/wwwroot/antenv , grep applicationHost.config for ports, write a curl by hand against the AOAI endpoint with a manually-fetched managed identity token, hope you got the API version right. Now it's ai-diagnose . If that's red, you know exactly which layer. If it's green, you know the fault is in your code or your settings, and appenv , appconfig , showpkgs , applogs walk you the rest of the way. Three patterns we'd lean on going forward: Start with apphelp and ai-diagnose every time. Don't try to remember the right command — let the aliases tell you. Treat ai-test being green as a signal, not a finish line. If /chat is red and ai-test is green, the platform path is fine; pivot to app-side aliases. Use source /home/site/diagnostics/fault.env as a pattern. Any time you want your SSH shell to see what the app process sees, write env to a file and source it. 
It's a small thing that removes a huge class of "but it worked when I tested it" confusions.

We want feedback

The aliases are GA today on Python images and we have ideas for where they go next: Node, .NET, more ai-* checks (Foundry agents, vector indexes), tighter integration with azd diagnose. If you have a Python app on App Service and you want a specific alias added, tell us by dropping a comment on this post.

Try the sample

git clone https://github.com/seligj95/app-service-ssh-diagnostics-python
cd app-service-ssh-diagnostics-python
azd auth login
azd up

Four minutes later you'll have the whole thing live. Then curl -X POST $URL/admin/fault -d '{"mode":"<pick one>"}', SSH in, and walk through any of the six faults above. The README has the full alias-to-fault map.
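
If you would rather drive the toggle-then-probe loop from a script than from curl, something along these lines might work. It is a minimal sketch, not part of the sample repo: the helper names (build_json_post, fault_request, chat_request, send) are invented here, and only the /admin/fault and /chat paths and their {"mode": ...} and {"prompt": ...} payloads come from the walkthrough above. It sends a JSON content type, as the Fault 1 curl example does.

```python
import json
import urllib.error
import urllib.request


def build_json_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against the deployed app."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fault_request(base_url: str, mode: str) -> urllib.request.Request:
    """Request that toggles a fault mode, e.g. 'bad-creds' or 'off'."""
    return build_json_post(base_url, "/admin/fault", {"mode": mode})


def chat_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Request that exercises the /chat endpoint with a prompt."""
    return build_json_post(base_url, "/chat", {"prompt": prompt})


def send(request: urllib.request.Request, timeout: float = 30.0) -> tuple[int, str]:
    """Send a request and return (status, body), including for 4xx/5xx replies."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; the body still carries the detail.
        return err.code, err.read().decode("utf-8", "replace")


# Example usage (needs a deployed app; base_url is your own site's URL):
#   send(fault_request(base_url, "wrong-endpoint"))
#   status, body = send(chat_request(base_url, "hi"))
#   send(fault_request(base_url, "off"))
```

Keeping request construction separate from sending makes it easy to check what would go over the wire before pointing the script at a live app.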