azure app service
Debug App Startup Faster on Azure App Service for Linux with Startup Logs
When an app fails to start on Azure App Service for Linux, one of the first things you need is visibility into what happened during startup. This can include container initialization, runtime setup, startup command execution, application output, and warmup probe results. To make this easier, we have added new Azure CLI commands that let you list and view App Service startup logs directly from the command line.

List available startup logs

You can list startup logs for an app using:

    az webapp log startup list \
      --name <app-name> \
      --resource-group <resource-group>

The output shows whether the startup attempt succeeded or failed, along with the instance name and log file size. This helps you quickly identify the right log file, especially when there are multiple startup attempts across different instances.

Show startup log content

To view the latest startup log, run:

    az webapp log startup show \
      --name <app-name> \
      --resource-group <resource-group>

You can also view a specific log file by name:

    az webapp log startup show \
      --name <app-name> \
      --resource-group <resource-group> \
      --log-file-name <log-file-name>

The log content includes startup events from the platform and the application. For example, you can see the container image being pulled, the startup script being generated, the app command being run, and the warmup probe result. In a successful startup, the log shows that the site startup probe succeeded and the site started successfully.

Failure logs are prioritized by default

When you run az webapp log startup show without specifying a log file name, the command automatically prefers failure logs from the most recent date. This helps reduce the time spent looking for the right log when debugging startup failures. Instead of manually searching through multiple files, you can run one command and immediately see the most relevant failure details.
For example, if the app fails because the worker process does not start within the allotted time, the log shows the timeout details and the platform actions taken during startup cancellation.

Better hints for common startup failures

The command also includes improved handling for common failure scenarios, including runtime startup failures and container startup timeouts. For example, if the app starts but does not respond on the expected port, the startup log may show application output such as:

    listening on 3000 (wrong port)

while the platform is expecting the app to respond on a different port. This makes it much easier to understand why the warmup probe failed.

Slot support

The startup log commands also support deployment slots. To list startup logs for a slot:

    az webapp log startup list \
      --name <app-name> \
      --resource-group <resource-group> \
      --slot <slot-name>

To show startup logs for a slot:

    az webapp log startup show \
      --name <app-name> \
      --resource-group <resource-group> \
      --slot <slot-name>

This is useful when debugging slot-specific startup issues before swapping traffic to production.

Summary

The new az webapp log startup commands make it easier to inspect startup behavior for Azure App Service for Linux apps directly from Azure CLI. These commands are currently in preview. Try them out the next time you need to understand why your App Service Linux app did or did not start successfully.

What's new in Azure App Service at #MSBuild 2026
At Microsoft Build 2026, Azure App Service introduced a powerful set of updates designed to help organizations accelerate their journey into AI, without increasing complexity or cost. These innovations focus on one clear business outcome: enabling teams to build, deploy, and scale AI-powered applications and agents faster, more securely, and with greater operational efficiency.

A key highlight is the new Easy AI experience, which allows existing web apps to become AI-ready with no rearchitecting required. With capabilities like built-in Model Context Protocol (MCP), developers can instantly expose app functionality as agent-ready endpoints, enabling AI agents to interact with business logic securely and seamlessly. This dramatically reduces development time, allowing teams to move from idea to intelligent application in a fraction of the usual effort.

Security and compliance are also strengthened with the general availability of Isolated v4 for Azure App Service Environments, delivering improved performance for customers that need single-tenant isolation and strong data residency guarantees. For enterprises operating in regulated industries, this ensures AI applications meet strict governance requirements without sacrificing scalability or speed.

For modernization scenarios, Managed Instance on Azure App Service simplifies the migration of legacy applications, including those with OS-level dependencies. Faster restarts, enhanced diagnostics, and AI-assisted migration workflows help organizations modernize existing systems cost-effectively, avoiding expensive rewrites while unlocking AI capabilities. Recent updates include an AI-assisted approach to migrating legacy IIS applications using a multi-agent workflow powered by MCP. Managed Instance is supported on both Premium v4 and Isolated v4, laying the foundation for a modern compute infrastructure across the board.
Operational efficiency is further enhanced through platform and CLI improvements designed for the "agent era." From structured deployment diagnostics to optimized Python pipelines delivering faster deployments, these updates reduce friction and infrastructure overhead, lowering total cost of ownership.

Together, these innovations position Azure App Service as a future-ready platform where businesses can rapidly build intelligent, agent-driven applications securely, efficiently, and at scale.

Learn more in the full announcement: Deep dive into Azure App Service Build 2026 updates

Designing for High Availability: The Operational Reference for Running a Geo-Replicated ACR
By Johnson Shi, Zoey (Zhuyu) Li, Huangli Wu

Introduction

Three of the most common questions we hear from enterprise teams running geo-replicated Azure Container Registries (ACR) are:

- "How do I control which region serves my traffic?" When my AKS clusters are spread across regions, can I pin each one to its co-located replica, or am I stuck with however the global endpoint routes?
- "What happens during a regional incident: is failover automatic or do I have to act?" If the registry in one region degrades, does the global endpoint reroute on its own, or do I need to manually disable the affected replica?
- "What happens after the region recovers: does traffic return on its own?" Is there a cooldown, a quarantine, or any manual step before failback?

We answer those head-on, then go deeper on the operational details that come up when you actually run a geo-replicated registry: authentication across endpoint switches, throttling under load concentration, eventual-consistency failure modes, home region outage scope, webhooks, and private endpoint interaction.

We draw on the official geo-replication docs, the global endpoint health-aware failover blog, the regional endpoints engineering design implementation, the regional endpoints public preview and private preview announcements, and the ACR reference for various registry endpoints. This post also draws on notes from the ACR product team on roadmap items that aren't yet documented elsewhere.

Key Takeaways

- Health-aware failover is automatic. When the registry in a region degrades, the global endpoint reroutes away from it on the order of minutes, evaluated per-registry. No customer action required.
- Failback is automatic too. Once health-aware failover marks a region healthy again, the global endpoint resumes routing to it. There is no cooldown period.
- Health-aware failover applies only to global endpoint operations. It does not apply to regional endpoints (you're talking to one replica, period) or to dedicated data endpoints (the redirect is per-region).
- Health-aware failover is not triggered by throttling. It responds to regional ACR service health and Azure infrastructure health, not HTTP 429 responses. Use regional endpoints to manage per-replica throttling.
- Regional endpoints (Step 2a) give you explicit per-region URLs for workloads that need affinity, capacity planning, push/pull consistency, troubleshooting, or client-side failover. Use myregistry.<region>.geo.azurecr.io. Regional endpoints are available on Premium SKU registries.
- For workloads that don't need pinning, do nothing (Step 2b). The global endpoint plus health-aware failover handles routing automatically.
- Re-authenticate when switching endpoints. Each global or regional endpoint is its own authenticated surface; re-auth via az acr login, SDK auth, or the Kubernetes ACR credential provider on endpoint change.
- Don't run a long-lived DNS cache for the global endpoint. ACR purges DNS server-side on disable and during failover; a long-lived client cache works against that.
- For production workloads, enable dedicated data endpoints for security and DNS predictability on layer downloads.
- ACR is working on bounded staleness consistency for cross-replica eventual-consistency failure modes; see the FAQ.

Background

What is ACR geo-replication?

Geo-replication is a Premium SKU feature that turns a single ACR registry into a multi-region, multi-write service. Every geo-replica in every region is writable (you can push, pull, and delete from any of them), and content syncs asynchronously between replicas under an eventual consistency model. Per-push replication time scales with the size and number of images being pushed. Similarly, when creating a new geo-replica, the time to populate the new geo-replica scales with the total size of the registry.
A geo-replicated registry exposes a global endpoint at myregistry.azurecr.io. Behind that endpoint, ACR uses an internal traffic manager to direct each request to the replica with the best network performance profile for the caller: usually the closest replica, but not always. When clients are equidistant from multiple replicas, or when the closest replica is experiencing Azure infrastructure degradation, requests may be routed elsewhere.

A geo-replicated registry also exposes a regional endpoint at myregistry.<region>.geo.azurecr.io, which lets clients pin API requests to a specific geo-replica instead of using the global endpoint, whose routing among geo-replicas is Azure-managed.

Zone redundancy is always enabled for geo-replicas in regions where Azure has multiple availability zones. In those regions, ACR automatically spreads replica data across multiple availability zones within each region to protect against zonal outages.

Endpoints and data endpoints: what goes where

A common point of confusion: when you push or pull, not every request goes to the same place. The registry endpoints (global endpoint and regional endpoints), as well as the data endpoint, do different jobs. Your choice of data endpoint configuration has real consequences for security and resilience.

Two kinds of traffic flow during a typical pull:

- Registry API traffic: authentication, manifest reads/writes, tag resolution, referrers, repository operations, blob location lookups, listing, metadata. This is everything except the actual layer (blob) bytes. All these API requests go to the global endpoint (myregistry.azurecr.io) or, if you've pinned your clients to a specific geo-replica for these API calls, to that geo-replica's regional endpoint (myregistry.<region>.geo.azurecr.io). Behind the scenes, the global endpoint internally proxies these requests to a specific geo-replica.
- Layer (blob) downloads: when the client asks for a blob, the registry doesn't serve the bytes itself. It returns an HTTP 307 redirect to a regional data endpoint (a separate endpoint from the global endpoint or regional endpoints), and the client follows the redirect to download the layer from that region.

Where that 307 sends you depends on whether you've enabled the registry's dedicated data endpoints feature:

Configuration | Layer downloads redirect to
--- | ---
Default (no dedicated data endpoints) | *.blob.core.windows.net (the underlying Azure storage account)
Dedicated data endpoints enabled | myregistry.<region>.data.azurecr.io for the region you were routed to
Private endpoints enabled | myregistry.<region>.data.azurecr.io for the region you were routed to

Regional by design. Dedicated data endpoints always land you on a specific geo-replica's data endpoint; there is no "global data endpoint." With the global endpoint as your registry endpoint, the 307 redirect picks the data endpoint for whichever region the global endpoint chose to serve you. With a regional endpoint pinned to a specific region, the 307 always redirects you to that same region's data endpoint, never cross-region.

Why dedicated data endpoints matter. Dedicated data endpoints are a Premium SKU feature that exists primarily to address security and firewall scoping. By default, layer downloads redirect to *.blob.core.windows.net, a wildcard storage FQDN. Firewall rules to allow that wildcard either let all Azure storage accounts through or none of them, which raises data exfiltration concerns and isn't tightly scoped to your registry. Dedicated data endpoints replace the wildcard with a fully qualified domain in your registry's own domain (myregistry.<region>.data.azurecr.io) so firewall rules can be scoped tightly to your specific registry, in your specific regions.

That same design choice can also make layer downloads more predictable during routing changes.
With dedicated data endpoints, the data endpoint FQDN is known ahead of time and lives in the registry's domain: one predictable hostname per region, configured once. Without them, the layer download has to resolve a wildcard storage FQDN that points to whichever storage account the registry happens to have provisioned, which is a separate DNS resolution path with its own routing behavior and its own caching profile. Dedicated data endpoints simplify the DNS picture by aligning the data path with the registry path and keeping the entire pull experience inside one set of predictable, scoped FQDNs.

For any geo-replicated registry where security and high availability matter, enable dedicated data endpoints.

Note: Health-aware failover applies only to operations against the global endpoint, not to regional endpoints or dedicated data endpoints. It directs traffic away from a geo-replica only when an Azure region is experiencing significant infrastructure degradation. At this stage, it does not redirect traffic to another geo-replica when a client's data plane API requests are throttled. See the relevant section below for the full scope of when health-aware failover does and does not trigger.

The three traffic control tools

ACR geo-replication gives you three complementary tools for controlling where traffic lands. Each one solves a different class of problem, and customers most often run into trouble when they reach for the wrong one.
We name them up front and use these names throughout the post:

Tool | Who controls it | What it does | Use cases
--- | --- | --- | ---
Health-aware failover | Platform (automatic) | Reroutes the global endpoint away from a region whose registry can't reliably serve requests | Regional incidents, automatic recovery
Replica enable/disable for global routing | Customer (manual) | Excludes a specific replica from global endpoint routing without deleting it; data continues syncing | DR rehearsals, planned maintenance, quarantining a replica without losing it
Regional endpoints | Customer (per request) | Dedicated per-region URLs (myregistry.<region>.geo.azurecr.io) that bypass the internal traffic manager entirely | Pinning AKS clusters to co-located replicas, push/pull consistency, capacity planning, troubleshooting, client-side failover

Health-aware failover and replica enable/disable both act on the global endpoint. Regional endpoints are a separate URL surface that coexists with the global endpoint; enabling them does not disable the global endpoint myregistry.azurecr.io. You can use both simultaneously and choose per workload.

The behavior in question

When the registry in one region experiences a real degradation, there are three possible answers to "what happens?":

(A) Nothing automatic. The customer must manually disable the affected region's endpoint to stop traffic from being routed there.
(B) The system detects the regional front-door failure and reroutes within seconds.
(C) A per-registry health evaluation detects the degradation and reroutes the global endpoint within minutes, with no customer action. After the region recovers, routing resumes automatically.

The answer today is (C). Before health-aware failover, customers were stuck closer to (A): the system could see whether the regional reverse proxy responded, but not whether the registry could actually serve real pull and push traffic end to end. Health-aware failover closes that gap.
We walk through all three tools in the next section, in order: setting up geo-replication, using regional endpoints to pin specific workloads, keeping the global endpoint for everything else, the manual replica disable mechanism, re-enabling participation in global routing, and what to expect when health-aware failover triggers.

Walkthrough

The following steps assume an existing Premium SKU registry and the Azure CLI logged in. We use myregistry as the registry name, myrg as the resource group, and eastus as the home region. Substitute <your-registry>, <your-rg>, and <your-region> for your environment.

Prerequisites

- A Premium SKU ACR registry (geo-replication requires Premium)
- Azure CLI (az) installed and logged in
- For regional endpoints (Step 2a): Azure CLI 2.86.0 or later. All regional endpoints commands (--regional-endpoints, az acr show-endpoints, az acr login --endpoint) are available natively in Azure CLI 2.86.0+. If you previously installed the acrregionalendpoint private preview CLI extension, uninstall it with az extension remove --name acrregionalendpoint to prevent conflicts with the built-in CLI commands.

Step 1: Add a West US replica to a registry that lives in East US

Geo-replication requires the Premium SKU. The create call below fails on Basic or Standard.

    # Confirm the registry is Premium
    az acr show --name myregistry --resource-group myrg \
      --query sku.name --output tsv
    # Premium

    # Create a West US geo-replica
    az acr replication create --registry myregistry --location westus

    # Confirm both replicas are present
    az acr replication list --registry myregistry --output table

    NAME    LOCATION    PROVISIONING STATE    STATUS    REGION ENDPOINT ENABLED
    ------  ----------  --------------------  --------  -----------------------
    eastus  eastus      Succeeded             online    True
    westus  westus      Succeeded             online    True

Pushes and pulls continue working through the existing replica throughout initial sync.
Because the registry is multi-region, multi-write, the existing replica keeps serving traffic while the new replica catches up in the background. Initial replica seeding time is a function of registry size (the total number and cumulative size of images already in the registry that need to be replicated to the new replica), not the size of any single image.

Step 2a: Pin workloads to specific regions using regional endpoints

Use regional endpoints when a workload needs explicit per-region control. The five common cases:

- Regional affinity: an AKS cluster in East US should pull from the East US replica, every time, without ever hopping to a more distant replica because of a network performance fluctuation.
- Predictable routing: workloads that need to know exactly which replica will serve them, for benchmarking, capacity planning, or in-region traffic SLAs.
- Push/pull consistency: pinning both ends of a publish-then-deploy flow to the same replica eliminates eventual-consistency races.
- Troubleshooting: reproducing an issue on a specific replica requires sending traffic to that specific replica.
- Client-side failover: customers with their own health checks and business rules want to implement failover on their own terms, on signals only they can see.

Enable regional endpoints on the registry:

    az acr update -n myregistry -g myrg --regional-endpoints enabled

When enabled, ACR automatically creates per-region login server URLs for every existing geo-replica. No per-region configuration is needed.

Note: Regional endpoints can be enabled on any Premium SKU registry, even without geo-replication. A registry without geo-replication has a single geo-replica in the home region, which gets one regional endpoint URL. However, the feature is most useful when your registry has at least two geo-replicas, where you can pin different workloads to different replicas for routing control and capacity distribution.
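The regional-affinity case can be sketched in a few lines of Python. This is only an illustration of the decision, not an Azure SDK call: choose_login_server is a hypothetical helper, and the set of replica regions is assumed to be something you maintain yourself (for example, from the output of az acr replication list). It returns the regional endpoint when a replica is co-located with the cluster and falls back to the global endpoint otherwise.

```python
def choose_login_server(registry, cluster_region, replica_regions):
    """Return the co-located regional endpoint, else the global endpoint.

    Hypothetical helper: it only formats hostnames and does not verify
    that regional endpoints are enabled on the registry.
    """
    if cluster_region in replica_regions:
        return f"{registry}.{cluster_region}.geo.azurecr.io"
    return f"{registry}.azurecr.io"


# Assumed replica layout for the example.
replicas = {"eastus", "westeurope"}

# A cluster with a co-located replica is pinned to it.
print(choose_login_server("myregistry", "eastus", replicas))

# A cluster without one keeps using the global endpoint.
print(choose_login_server("myregistry", "southeastasia", replicas))
```

Falling back to the global endpoint keeps unpinned clusters on Azure-managed routing rather than pinning them to a distant replica.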
Push to a specific region using its regional endpoint:

    # Log in to the West US regional endpoint
    az acr login --name myregistry --endpoint westus

    # Tag and push using the regional endpoint URL
    docker tag myapp:v1 myregistry.westus.geo.azurecr.io/myapp:v1
    docker push myregistry.westus.geo.azurecr.io/myapp:v1

Pin AKS deployments to their co-located replica by using regional endpoint URLs in the deployment manifest. The example below shows two clusters in different regions; each cluster references the regional endpoint for its own region's replica (assuming replicas exist in both eastus and westeurope):

    # East US-based AKS cluster pulls from the East US replica
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-eastus
    spec:
      template:
        spec:
          containers:
          - name: myapp
            image: myregistry.eastus.geo.azurecr.io/myapp:v1
    ---
    # West Europe-based AKS cluster pulls from the West Europe replica
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-westeurope
    spec:
      template:
        spec:
          containers:
          - name: myapp
            image: myregistry.westeurope.geo.azurecr.io/myapp:v1

This eliminates cross-region pulls when global routing would otherwise prefer a different replica for a given client, and it gives you a per-region traffic profile you can plan capacity against.

Regional endpoint operational tips

View all endpoints. Use az acr show-endpoints to see all endpoint URLs for your registry: global, regional (if enabled), and dedicated data endpoints (if enabled):

    az acr show-endpoints --name myregistry --resource-group myrg

Import from a specific geo-replica. When importing images between registries, you can use a regional endpoint to import from a specific geo-replica of the source registry. This is useful when you want predictable network paths or need to import from a replica in a specific region:

    az acr import \
      --name mydownstreamregistry \
      --source myupstreamregistry.westeurope.geo.azurecr.io/myapp:v1 \
      --image myapp:v1

Firewall rules for regional endpoints.
If you use firewall rules, allow access to the following endpoints for each geo-replica that clients connect to:

Endpoint | Purpose
--- | ---
myregistry.<region>.geo.azurecr.io | Regional endpoint for registry operations
myregistry.azurecr.io | Global endpoint (if also used)
myregistry.<region>.data.azurecr.io | Layer downloads (if using private endpoints or dedicated data endpoints)
*.blob.core.windows.net | Layer downloads (if not using private endpoints or dedicated data endpoints)

For the full list of endpoint types and FQDN patterns, see the ACR reference for various registry endpoints.

DNS-based routing without changing manifests. If you don't want to maintain different deployment manifests per region, you can keep all manifests pointing to the global endpoint (myregistry.azurecr.io) and use software-defined networking or a regional traffic manager to resolve the global endpoint to the appropriate regional endpoint based on the originating region's traffic. This achieves the same co-location goals as regional endpoints (predictable routing and reduced latency) without embedding region-specific URLs in your deployment manifests.

Step 2b: Keep using the global endpoint for everything else

For workloads that don't need explicit pinning, do nothing. The global endpoint at myregistry.azurecr.io continues to work exactly as before, and the global endpoint plus health-aware failover gives you intelligent routing across replicas without configuration. ACR picks the best replica for each client based on network performance and reroutes during regional incidents.

Regional endpoints coexist with the global endpoint; enabling them does not disable myregistry.azurecr.io. You can use both simultaneously and choose per workload, mixing pinned workloads (Step 2a) with workloads that ride the global endpoint (Step 2b) in the same registry.
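For teams that generate firewall rules from configuration, the endpoint patterns listed in the firewall guidance for regional endpoints can be expanded mechanically. The Python sketch below is a minimal illustration under stated assumptions: firewall_allowlist and its parameters are hypothetical names, and the function only formats the FQDN patterns quoted in this post; it does not query Azure or apply any rule.

```python
def firewall_allowlist(registry, regions, use_global=True, dedicated_data=True):
    """Build the list of FQDNs that clients need to reach.

    dedicated_data=True covers both dedicated data endpoints and private
    endpoints, which redirect layer downloads to the same per-region host.
    """
    hosts = []
    if use_global:
        hosts.append(f"{registry}.azurecr.io")
    for region in regions:
        # Regional endpoint for registry API operations.
        hosts.append(f"{registry}.{region}.geo.azurecr.io")
        if dedicated_data:
            # Per-region data endpoint for layer downloads.
            hosts.append(f"{registry}.{region}.data.azurecr.io")
    if not dedicated_data:
        # Default behavior: layer downloads redirect to the wildcard
        # storage domain, which cannot be scoped to a single registry.
        hosts.append("*.blob.core.windows.net")
    return hosts


for host in firewall_allowlist("myregistry", ["eastus", "westus"]):
    print(host)
```

Listing only the regions that clients actually connect to keeps the allowlist as narrow as the dedicated data endpoints feature allows.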
Step 3: Take a replica out of global endpoint routing

Use this when you need to keep a replica alive but stop it from serving global-endpoint traffic: for DR rehearsals, planned maintenance, or troubleshooting an isolated replica.

    # Exclude the West US replica from global endpoint routing
    az acr replication update --registry myregistry --name westus \
      --global-endpoint-routing false

Confirm the change:

    az acr replication list --registry myregistry --output table

    NAME    LOCATION    PROVISIONING STATE    STATUS    REGION ENDPOINT ENABLED
    ------  ----------  --------------------  --------  -----------------------
    eastus  eastus      Succeeded             online    True
    westus  westus      Succeeded             online    False

Requests to myregistry.azurecr.io no longer route to West US. The replica still receives replicated content, and continues to replicate its own content out to other replicas, and storage quota and per-replica costs continue to accrue. If regional endpoints are enabled, the West US regional endpoint URL also continues to work; --global-endpoint-routing controls only the replica's participation in global endpoint routing.

A note on naming. The CLI flag --global-endpoint-routing (on az acr replication update) and the regional endpoints feature (enabled via az acr update --regional-endpoints enabled) are two different things despite the similar names. --global-endpoint-routing controls whether a replica participates in global endpoint routing. The regional endpoints feature creates per-region URLs (myregistry.<region>.geo.azurecr.io) that bypass the global endpoint entirely. They are independent controls.

In Azure CLI 2.86.0 and later, the old --region-endpoint-enabled flag has been renamed to --global-endpoint-routing. The old flag name is deprecated and will be removed in Azure CLI 2.87.0 (June 2026). If you have existing scripts or automation that use --region-endpoint-enabled, update them to use --global-endpoint-routing.
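If the deprecated flag is scattered across many scripts, the rename can be applied mechanically. The following Python sketch is one possible approach, not an official migration tool: migrate_flag is a hypothetical helper, and it assumes the flag's true/false values carry over unchanged, as the rename described above implies. It rewrites text only; reading and writing files is left to the caller.

```python
import re

OLD_FLAG = "--region-endpoint-enabled"
NEW_FLAG = "--global-endpoint-routing"

# Match the old flag only as a whole token, so a longer flag that merely
# starts with the same characters is left untouched.
_PATTERN = re.compile(re.escape(OLD_FLAG) + r"(?![\w-])")


def migrate_flag(script_text):
    """Return (updated_text, replacement_count) for one script's contents."""
    return _PATTERN.subn(NEW_FLAG, script_text)


old = (
    "az acr replication update --registry myregistry --name westus "
    "--region-endpoint-enabled false"
)
new, count = migrate_flag(old)
print(count)
print(new)
```

Returning the replacement count makes it easy to report which scripts were changed and to fail a CI check while any remain.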
CLI flags quick reference:

Flag | Scope | Purpose
--- | --- | ---
--regional-endpoints | Registry-level (az acr create or az acr update) | Enables dedicated regional endpoint URLs (myregistry.<region>.geo.azurecr.io) for all geo-replicas.
--global-endpoint-routing | Per-geo-replica (az acr replication create or az acr replication update) | Controls whether the global endpoint routes traffic to a specific geo-replica. Set to false to temporarily exclude a geo-replica from global routing.
--data-endpoint-enabled | Registry-level (az acr create or az acr update) | Enables dedicated data endpoints (myregistry.<region>.data.azurecr.io) for layer blob downloads. Auto-enabled when at least one private endpoint is configured.

This bidirectional sync during disable is intentional. When you re-enable the replica, every image pushed to the registry while the replica was disabled, from any region, is already present, so the replica can serve traffic immediately with no catch-up window. If we stopped syncing on disable, re-enabling would leave the replica with stale data and force a long catch-up before it could safely serve pulls.

Step 4: Re-enable the replica to participate in global endpoint routing

Re-enable the replica:

    az acr replication update --registry myregistry --name westus \
      --global-endpoint-routing true

    NAME    LOCATION    PROVISIONING STATE    STATUS    REGION ENDPOINT ENABLED
    ------  ----------  --------------------  --------  -----------------------
    eastus  eastus      Succeeded             online    True
    westus  westus      Succeeded             online    True

There is no cooldown. The global endpoint resumes routing requests to the West US replica as soon as the change takes effect on ACR's side. Because data continued syncing while the replica was disabled (Step 3), the replica is immediately ready to serve pulls, with no catch-up window.

Note on DNS during disable/enable.
When you take a replica out of global routing, ACR purges its own DNS records for that replica from the global endpoint on a fast path; there is no waiting on a published TTL on ACR's side. If clients run their own DNS cache for the global endpoint, however, those clients will keep resolving to the disabled replica until the client cache expires. We can't control client-side caches. The recommendation: do not run a long-lived DNS cache for the global endpoint. A short-lived DNS pin for the duration of a single push (covered in the DNS and Client-Side Considerations section) is fine and even helpful, but a long-lived DNS cache will make --global-endpoint-routing false look broken from the client's perspective.

Step 5: What to expect when health-aware failover triggers

Health-aware failover is automatic. ACR evaluates registry health on a per-registry basis, and when a registry in a region can't reliably serve requests, the global endpoint reroutes that registry's traffic to a healthy replica. There is no customer-invocable trigger; that's the point.

End-to-end timing is on the order of minutes: fast enough to catch real regional degradation, slow enough to ride out transient errors that resolve on their own. DNS TTL may add additional propagation delay before all clients switch to the new region.

Scope of health-aware failover. Health-aware failover applies only to operations against the global endpoint: the registry API calls (auth, get manifest, get tag, get referrers, get blob location). It evaluates health when those API calls come in; it does not trigger mid-operation. Two important consequences:

- Regional endpoints are not in scope. When you talk to a regional endpoint like myregistry.westus.geo.azurecr.io, you're talking to that one replica. There is no automatic reroute. If you've pinned a workload to a regional endpoint and that region degrades, you implement client-side failover by switching the workload to a different regional endpoint.
- Dedicated data endpoints are not in scope. Once a registry endpoint has redirected you to a dedicated data endpoint, you stay on that region's data endpoint for the duration of the layer download. There is no automatic reroute of an in-flight blob download. The region targeted by the redirect is decided up front by whichever registry endpoint served the blob-location call: the global endpoint chooses based on its per-registry health evaluation, and a regional endpoint always targets its own region.

The signals you can use to confirm a failover is in progress:

    # Check replication status
    az acr replication list --registry myregistry --output table

You can also check Resource Health for the registry in the Azure portal: navigate to your registry and select Resource health under the Help section to see platform-side degradation signals. You'll typically see:

- Increased pull latency as traffic shifts to a more distant replica
- Resource Health flagging known issues in the affected region
- Replication status indicating which replicas are online

After the region recovers, the per-registry health evaluation marks it healthy again and the global endpoint resumes routing: automatic, no cooldown, no customer action. Note that health is evaluated per registry, not per region: if a degradation affects only a subset of registries in a region, only those registries are rerouted, and other registries in the same region continue to be served locally with no unnecessary latency penalty.

Not triggered by throttling. Health-aware failover is DNS-based and responds to regional ACR service health and Azure infrastructure health. It does not reroute traffic based on HTTP 429 (throttling) responses. If a geo-replica is throttling your requests but the region's infrastructure is healthy, the global endpoint continues routing you to that geo-replica. To manage throttling, use regional endpoints to spread workloads across multiple geo-replicas for better capacity distribution.
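Because regional endpoints never reroute automatically, client-side failover for pinned workloads comes down to walking an ordered list of regions and taking the first one that passes your own health check. The Python sketch below illustrates that loop under stated assumptions: pick_regional_endpoint and the is_healthy callback are hypothetical, and what counts as "healthy" (a probe, an error-rate threshold, a run of HTTP 429 responses) is a signal you define yourself.

```python
from typing import Callable, Iterable, Optional


def pick_regional_endpoint(
    registry: str,
    preferred_regions: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first regional endpoint whose health check passes.

    Regions are tried in the order given, so list the co-located replica
    first. Returns None when no endpoint passes the check.
    """
    for region in preferred_regions:
        endpoint = f"{registry}.{region}.geo.azurecr.io"
        if is_healthy(endpoint):
            return endpoint
    return None


# Stub health check for the example: pretend East US is degraded.
degraded = {"myregistry.eastus.geo.azurecr.io"}
chosen = pick_regional_endpoint(
    "myregistry", ["eastus", "westus"], lambda endpoint: endpoint not in degraded
)
print(chosen)
```

Each regional endpoint is its own authenticated surface, so a workload that switches to the returned endpoint also needs to re-authenticate against it.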
Note on long-running pushes during a failover. A multi-layer push that spans a failover boundary can land layers and the manifest on different replicas, which is exactly the failure mode that DNS bouncing produces during a single push. ACR is actively tightening health-aware failover behavior to minimize cross-replica scatter during these scenarios, and the recommendation today remains: pin pushes to a single replica via a regional endpoint when push/pull consistency matters.

Common Questions

Q1. Performance impact during initial replica creation on a live registry

Because ACR is multi-region, multi-write, the existing replica continues serving pull and push traffic throughout the period when a new replica is being seeded. Replication is asynchronous and content propagates in the background; the time to populate a new geo-replica scales with the size of the registry (the cumulative number and total size of images already in the registry), not with any single image. The docs do not publish a quantified degradation percentage or a throttling window for this period, and they do not promise zero performance impact. The safe operating assumption for a live production registry is that existing replicas continue serving traffic normally, with the new replica catching up in the background.

Q2. Restricted/updating state during initial sync

There is no "restricted" state for the registry during normal replica creation. Writes, control-plane operations, and pushes/pulls against existing replicas continue normally. The only time configuration changes are unavailable is during a home region outage; see the relevant FAQ item later on for the full data-plane-versus-control-plane breakdown.

Q3. Cooldown periods and non-straightforward failback scenarios

There is no cooldown before failback, manual or automatic. Re-enabling a replica's participation in global endpoint routing takes effect immediately on ACR's side.
Health-aware failover returns traffic to a region as soon as its per-registry health evaluation passes again. The failback case that is not seamless: if a recently pushed image has not yet replicated to the failover region, a pull from that region may not find the image until replication catches up. This is a function of eventual consistency, not failback timing, and it's part of a broader class of issues we cover in Q4.

Q4. Common pull and push failure modes during the eventual-consistency window

DNS bouncing during a single push is one well-known problem, but it isn't the only one. The eventual-consistency window between geo-replicas surfaces in several recurring failure modes worth knowing about:

- Push-then-immediate-pull-cross-region. Pushing myapp:v1 to one region and immediately pulling it from a different region can fail with manifest unknown until replication catches up. This shows up most painfully in CI/CD pipelines where one CI runner pushes an image and thousands of pods across other regions all try to pull from their local geo-replicas at the same time. Today, customers work around this with indeterminate sleeps before scheduling expensive compute, or with retry logic, or by waiting on a replication-complete signal, none of which is a clean planning story.
- Tag overwrite races. Pushing myapp:v1, then re-pushing myapp:v1 shortly after with a fix (same tag, different digest), can leave different replicas resolving the same tag to different digests during the eventual-consistency window.
- Delete propagation. Deleting a tag or repository in one region takes some time to propagate to other replicas. Pulls from regions where the delete hasn't yet propagated can return the supposedly-deleted content.
- Mid-push failover scatter. A multi-layer push that spans a health-aware failover boundary or a DNS bouncing event can land layers on one replica and the manifest on another, surfacing as manifest validation errors or blob unknown on subsequent pulls.
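The retry-logic workaround for the push-then-immediate-pull case can be sketched as a small exponential backoff loop. This is a minimal Python illustration under stated assumptions, not an ACR client: ManifestUnknown is a hypothetical exception standing in for whatever "manifest unknown" error your pull tooling raises, pull is any callable you supply, and the attempt count and delays are placeholder values that remain an application-level decision.

```python
import time


class ManifestUnknown(Exception):
    """Hypothetical error raised when a replica does not have the manifest yet."""


def pull_with_backoff(pull, attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call pull() until it succeeds, doubling the wait after each failure.

    Only ManifestUnknown is retried; any other exception propagates at once.
    The last failure is re-raised when the attempts are exhausted.
    """
    for attempt in range(attempts):
        try:
            return pull()
        except ManifestUnknown:
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** attempt))
```

Passing sleep as a parameter keeps the function easy to test without real waiting. Retrying only the replication-lag error avoids masking authentication or network failures that a retry would not fix.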
What ACR is doing about this. We're working on bounded staleness consistency for pushed images across all geo-replicas worldwide, which addresses these four failure modes directly. This will be covered in an upcoming blog post. If you're hitting eventual-consistency brittleness today and want to talk through your scenario, reach out to us on the Azure Container Registry GitHub repository; we want the customer signal to land in the design.

Mitigations available today:

- Pin pushes to a single replica via a regional endpoint. Every sub-request in the push (login, blob uploads, manifest upload) goes to the same replica, eliminating the DNS bouncing and mid-push scatter classes entirely.
- Use a short-lived client-side DNS cache like dnsmasq scoped to the duration of a single push, only when you're not using regional endpoints. Do not run a long-lived DNS cache for the global endpoint; it interferes with --global-endpoint-routing false and with health-aware failover routing.
- Build retry logic into pulls that immediately follow a cross-region push. Either retry with backoff or check replication status with ACR webhooks before pulling. ACR can detect and notify you when an image or tag pushed to one geo-replica (say geo-replica A) becomes available for pull in another (geo-replica B), once background replication to geo-replica B has succeeded.
- Design publish steps to be idempotent so retries triggered by mid-push failover are safe.

Q5. Auth behavior across endpoint switches

For safety, treat the global endpoint and each regional endpoint as its own authenticated surface. All registry APIs except the actual blob downloads (auth, manifests, tag resolution, referrers) flow through whichever endpoint you've chosen. If you switch from the global endpoint to a regional endpoint, or from one regional endpoint to another, re-authenticate.
That means az acr login, fresh SDK auth, or (for AKS) letting the Kubernetes ACR credential provider handle re-auth, which it does automatically when the endpoint changes.

Q6. Throttling under failover and pinning

Throttling limits on registry API operations are per-replica, not per-registry. This has two operational implications:

- During health-aware failover, traffic that was spread across replicas can shift heavily onto whichever replicas remain in the global endpoint's routing pool. Plan capacity so that traffic spreads across two or three healthy replicas during a failover scenario rather than concentrating onto one. The global endpoint's routing already does this for you when multiple healthy replicas exist, but registries with only two regions configured can hit per-replica limits more easily during a failover. To mitigate, use regional endpoints to spread workloads across multiple geo-replicas and plan per-replica capacity.
- When pinning via regional endpoints (Step 2a), you concentrate traffic on whichever replica you've pinned to. If you've pinned all your AKS clusters to a single regional endpoint, you may hit that replica's throttling limits at peak. Mitigations: pin different workloads to different regional endpoints across multiple regions for better topology mapping and capacity distribution, or use the global endpoint (Step 2b) for workloads where you don't need explicit pinning so ACR's routing can spread load. We're also working on improving the throttling metrics surfaced during health-aware failover events.

Note: Health-aware failover does not reroute traffic based on HTTP 429 (throttling). If you're experiencing throttling but the region's infrastructure is healthy, the global endpoint continues routing you there. Use regional endpoints to explicitly spread load across replicas for capacity planning.

Q7. Home region outage scope

Geo-replication provides high availability for the data plane.
During a home region outage, the control plane is unavailable, which means you can't create or delete replicas, modify network rules, or change replication settings until the home region recovers. ACR Tasks are also bound to the home region and don't run while it's unavailable. The data plane keeps working:

- The global endpoint continues routing pulls and pushes to healthy replicas.
- Regional endpoints continue working: you talk directly to specific replicas, and your client-side logic decides which region to use.
- Authentication, manifests, blob downloads, and webhooks continue functioning through any healthy replica.

The home region of a registry is fixed at creation and cannot be changed afterward. Microsoft's registry relocation guidance describes a redeployment procedure (creating a new registry in a different region), not an in-place change to an existing registry's home region.

Note: If your registry uses a customer-managed key, review the key vault failover and redundancy guidance for maximum resilience. Key vault availability directly affects the registry's ability to encrypt and decrypt data.

Q8. Webhooks during failover

Webhooks fire from the replica that received the push. Because ACR also replicates content to other geo-replicas, webhooks fire from each geo-replica as the image syncs to it, so a single push results in webhook events from the receiving replica plus an event from each replica as replication completes. During a failover where pushes are routed to a different region, webhooks from those pushes fire from the new region; once the original region recovers and replication catches up, webhook events fire from there too. Webhook consumers should be designed to handle multiple events per pushed image and deduplicate as needed.

Q9.
Private endpoints with regional endpoints and dedicated data endpoints

When a private endpoint is created against a registry, the private endpoint covers all of the registry's endpoint surfaces: the global endpoint, every regional endpoint (if regional endpoints are enabled), and every regional dedicated data endpoint. A single private endpoint in one VNet can reach the global endpoint (which routes you to a suitable replica), any regional endpoint in the same or a different region, and any region's dedicated data endpoint for blob downloads.

The trade-off is private IP allocation: each endpoint surface consumes IPs in the VNet. With many replicas plus regional endpoints plus dedicated data endpoints all enabled, private endpoint creation can fail if the VNet runs out of available private IPs.

IP address consumption per feature:

| Configuration | IPs consumed per VNet |
| --- | --- |
| Initial private endpoint (global endpoint + home region dedicated data endpoint) | 2 |
| Each geo-replication region added | +1 (regional dedicated data endpoint) |
| Regional endpoints enabled | +1 per geo-replica |

Example: A registry with 3 geo-replicas and regional endpoints enabled consumes 7 private IPs per VNet: 1 (global) + 3 (data) + 3 (regional). Without regional endpoints, the same registry requires 4 private IPs: 1 (global) + 3 (data).

Subnet sizing: Use at minimum a /27 (32 addresses) subnet for PE subnets on geo-replicated registries, and /24 where possible.

To check how many private IPs are already consumed on a subnet:

    az network vnet subnet show \
      --name <subnet-name> \
      --vnet-name <vnet-name> \
      --resource-group <resource-group> \
      --query "{addressPrefix:addressPrefix, usedIPs:length(ipConfigurations || \`[]\`)}" \
      --output table

See the ACR private endpoints documentation for the full IP-allocation math and sizing guidance.

Q10.
Geo-replica creation stuck for private endpoint-enabled registries

When creating a geo-replica for a registry that has private endpoints configured, the replica provisioning can get stuck in a Creating state if the identity performing the operation doesn't have sufficient permissions to create private endpoint networking resources.

Solution: Manually delete the geo-replica that got stuck in the provisioning state. Ensure the identity has the permission Microsoft.Network/privateEndpoints/privateLinkServiceProxies/write before creating the geo-replica again.

Also verify that every PE subnet connected to the registry has free IP capacity. If any PE subnet across any connected VNet does not have enough free IPs, the replication provisioning fails and rolls back. The replica appears briefly in a Creating state and then is removed. The resulting error does not identify which subnet or VNet is exhausted.

Q11. Metrics, logs, and alerts for the three phases

We map each phase to the signals available in the Monitoring Guidance section below. The headline: Resource Health (in the Azure portal) and az acr replication list give you the platform-side signals; Azure Monitor platform metrics are collected automatically, and resource logs require Diagnostic Settings to be enabled on the customer side.

Behavior summary

| Scenario | Automatic? | Customer Action Required | Notes |
| --- | --- | --- | --- |
| Registry in a region degrades | Yes | None | Health-aware failover; per-registry; minutes-scale; global endpoint operations only |
| Region recovers after a degradation event | Yes | None | No cooldown |
| Pin AKS clusters to co-located replicas | No | Use regional endpoint URLs in deployment manifests (Step 2a) | Coexists with global endpoint |
| No pinning needed for most workloads | Yes | None; keep using myregistry.azurecr.io (Step 2b) | Global endpoint plus health-aware failover |
| Push/pull from the same replica (consistency) | No | Use a regional endpoint for both push and pull | Eliminates DNS bouncing and mid-push scatter |
| Capacity planning per region | No | Spread workloads across multiple regional endpoints | Per-replica throttling; avoid concentrating on one replica |
| DR rehearsal: take a replica out of global routing | No | az acr replication update --global-endpoint-routing false | Data continues syncing both directions; costs continue accruing |
| Re-enable replica participation in global routing | No | az acr replication update --global-endpoint-routing true | No cooldown; replica is immediately ready |
| Switch a workload between endpoints | No | Re-auth (az acr login, SDK auth, or Kubernetes ACR credential provider) | Each endpoint is its own authenticated surface |
| Initial replica seeding on a live registry | N/A | None | Existing replica continues serving traffic; seeding time scales with registry size |
| Long-running push during a failover | No | Retry; design publishes to be idempotent | Pin via regional endpoint to avoid mid-push scatter; ACR is tightening this behavior |
| Pull of a recently pushed image from a different region | No | Wait for replication, retry with backoff, or check replication status | Eventual consistency; bounded staleness consistency in development |
| Home region outage | Data plane: yes; control plane: no | Use global or regional endpoints for data plane operations | Control plane (replica config, network rules) requires home region |

DNS and Client-Side Considerations

DNS bouncing during a single
push is the most common geo-replication push problem in customer threads, and it warrants a section of its own.

The failure mode. A docker push is a sequence of HTTP requests: blob uploads for each layer, then a manifest upload that references those layers by digest. If the Linux DNS resolver on the client doesn't cache myregistry.azurecr.io consistently for the duration of the push, individual sub-requests can resolve to different replicas. Because replication is eventually consistent, the manifest can land on a replica that doesn't yet have the layers it references, and the manifest validation fails.

The two mitigations:

- Regional endpoints pin the push to a single replica end-to-end. Every sub-request (login, blob uploads, manifest upload) goes to the same replica. This is the cleanest fix and the one we recommend for any pipeline where push/pull consistency matters.
- A short-lived client-side DNS cache like dnsmasq scoped to the duration of a single push. For Linux VMs in Azure, follow the DNS name resolution options guidance. The pin should last the push and no longer. For other clients performing pushes, you can customize your stack's DNS resolver to have a similar short-lived DNS cache to pin the global endpoint's resolved DNS for only the duration of an image push operation.

A note on long-lived DNS caching for the global endpoint. Don't run a long-lived DNS cache for myregistry.azurecr.io. ACR purges its own DNS records on the server side when a replica is taken out of global routing (Step 3) and during health-aware failover; a long-lived client-side cache will keep clients pointed at the old region after our purge, which makes both the manual disable mechanism and health-aware failover look broken from the client's perspective.

Retry behavior:

- In-flight pushes during a failover may fail. Design publish steps to be idempotent so retries are safe.
- Pipelines that push in one region and immediately pull from a different region should retry with backoff or check replication status; eventual consistency means the pull may race ahead of replication. ACR is working on bounded staleness consistency that addresses this directly by proxying, on ACR infrastructure, an image pull request from a geo-replica that does not have the image to another geo-replica that does; see the relevant FAQ item.

Note: Specific retry counts, back-off intervals, and push timeout values are application-layer decisions. The platform behavior is documented; the retry policy belongs to your client.

Monitoring Guidance

We map the three phases to the signals available from each source. Where a signal requires customer-side configuration, we flag it.

Phase A: Initial replication (after creating a new replica)

- az acr replication list and az acr replication show: confirm the new replica reaches provisioningState: Succeeded and status: online, and view per-replica status.
- Azure Monitor platform metrics: push count, pull count, and other registry metrics are collected automatically and visible in the Azure portal under Metrics. No customer configuration is needed to view platform metrics. To export metrics or enable resource logs (detailed operation logs), configure Diagnostic Settings on the registry.

Phase B: Failover (planned via replica disable, or automatic via health-aware failover)

- Per-replica regionEndpointEnabled state via az acr replication list: confirms whether a manual disable took effect, i.e. which replicas are currently eligible for global endpoint routing. Note: this flag reflects the manual setting for a geo-replica's global endpoint routing eligibility; it does not indicate whether health-aware failover has actively rerouted traffic away from a replica.
- Resource Health for the registry (in the Azure portal under Help > Resource health): surfaces platform-side degradation signals during incidents. ACR does not yet expose a definitive "this region is currently serving your traffic" signal; Resource Health and client-side latency changes are the best available indicators.
- Pull latency from clients: increased latency from a more distant replica is the client-observable signal that traffic has rerouted.
- Azure Monitor platform metrics: visible per-region in the Azure portal Metrics blade. To export metrics or query them programmatically, enable Diagnostic Settings.

Phase C: Failback (replica returns to global routing)

- az acr replication list: confirms regionEndpointEnabled: True (manual) or online status across all replicas (automatic).
- Pull latency normalizing as clients reach the recovered replica again.
- Resource Health clearing for the registry (visible in the Azure portal).

Note: The health-aware failover blog calls out ongoing work to surface richer signals, including notifications for when routing changes and which region is currently serving your traffic. The signals listed above are what's available today.

Pricing Considerations

- Storage billing vs. storage quota: Storage is billed per geo-replica. A 1 GiB image replicated to 5 geo-replicas is charged as 5 GiB of storage (1 GiB × 5 geo-replicas). However, storage quota (the tier's maximum storage limit) counts the image only once: the same 1 GiB image counts as 1 GiB toward your tier's maximum, not 5 GiB.
- Data transfer: Geo-replication can reduce costs by enabling in-region image pushes and pulls, which avoids cross-region data transfer charges during these push or pull operations. However, cross-region data transfer charges still apply when ACR replicates pushed content to other geo-replicas as part of eventual consistency.
- Disabled replicas still cost: When you take a replica out of global routing with --global-endpoint-routing false, storage and per-replica costs continue accruing because data continues syncing bidirectionally.

For more information, see ACR pricing.

Cleanup

Run these commands to undo the walkthrough setup. Order matters: disable regional endpoints before deleting replicas, since regional endpoint URLs depend on which replicas exist.

    # Disable regional endpoints if you enabled them in Step 2a
    az acr update -n myregistry -g myrg --regional-endpoints disabled

    # Re-enable any replicas you disabled in Step 3 (no-op if already enabled)
    az acr replication update --registry myregistry --name westus \
      --global-endpoint-routing true

    # Delete the West US replica created in Step 1
    az acr replication delete --registry myregistry --name westus

    # Confirm only the home region replica remains
    az acr replication list --registry myregistry --output table

Note: Replica deletion is a control-plane operation that requires the home region to be available. During a home region outage, replica configuration cannot be modified.

Summary Table

| Question | Answer |
| --- | --- |
| When should I use regional endpoints vs the global endpoint? | Use regional endpoints (Step 2a) for workloads that need affinity, predictable routing, push/pull consistency, troubleshooting, or client-side failover. Use the global endpoint (Step 2b) for everything else and let health-aware failover handle routing. |
| What should I enable for secure, resilient layer downloads? | Enable dedicated data endpoints. They scope firewall rules tightly to your registry and replace wildcard storage DNS with predictable per-region FQDNs. |
| How do I avoid DNS-bouncing manifest validation failures on push? | Pin pushes to a single replica via a regional endpoint. A short-lived client-side dnsmasq for the push duration is also fine if you're not using regional endpoints. |
| Should I run a long-lived DNS cache for the global endpoint? | No. ACR purges DNS server-side on disable and during failover; client-side caching works against that. |
| Do I need to re-auth when switching endpoints? | Yes. Each global or regional endpoint is its own authenticated surface. az acr login, SDK auth, or the Kubernetes ACR credential provider handles the re-auth. |
| What happens during a home region outage? | Data plane keeps working through any replica via the global endpoint or regional endpoints. Control plane operations (replica configuration, network rules) are unavailable until the home region recovers. The home region is fixed at registry creation. |
| What's ACR doing about eventual-consistency pain? | Bounded staleness consistency for cross-replica pushed images is in development and will be covered in an upcoming blog post. Reach out via GitHub if you want to share your scenario. |

For the full automation matrix (what's automatic, what requires customer action, and what to expect for each scenario), see the behavior summary above.

If you have further questions about ACR geo-replication routing, pinning, capacity planning, eventual consistency, or failover behavior, reach out to us on the Azure Container Registry GitHub repository or file feedback through the Azure portal.

Now in preview: built-in MCP for Azure App Service
At Build 2026 last week, we announced the public preview of built-in MCP for Azure App Service. It does one thing, and it does it with almost no effort on your part: it turns a REST API you already host on App Service into a Model Context Protocol (MCP) server, so AI agents and assistants can call your API as a set of tools. No MCP code to write. No second service to deploy.

What it does

You give App Service an OpenAPI 3.x specification (JSON or YAML) describing the operations you want to expose. The platform reads that spec and generates one MCP tool per operation, then serves the MCP endpoint over streamable HTTP at a path you choose (the default is /mcp). From there, App Service handles the parts that are tedious to build yourself:

- MCP protocol negotiation
- Tool discovery, so clients can list the operations your spec exposes
- Hot reload of the spec when it changes
- Client cancellation

Any MCP-compatible client can connect, including GitHub Copilot Chat, Cursor, Windsurf, and Claude Desktop.

Why it matters

Most teams already have the API the agent ecosystem wants to call. What they don't have is the time to wrap it in a bespoke MCP server, keep that server in sync with the API, and operate it. Built-in MCP removes that work entirely. If your REST API runs on App Service and you can produce an OpenAPI spec for it (and most web frameworks generate one for you), you're a configuration change away from an agent-ready endpoint.

Built-in or custom?

Built-in MCP is the fastest path when your tools map cleanly to REST operations. If you need behavior that doesn't (multi-step workflows, in-memory aggregation, MCP resources or prompts, or more than one MCP server on a single app), a custom MCP server built with an MCP SDK and deployed as your application code is still the right choice. The two approaches complement each other, and you can read more about choosing between them in the docs.
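To make the "one MCP tool per operation" mapping from "What it does" more concrete, here is a small Python sketch that walks an OpenAPI document and lists one tool entry per operation. It only illustrates the idea and is not how App Service implements the feature: the tool naming rule (operationId, falling back to method plus path) and the shape of each entry are assumptions, and the sample spec is made up.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_tools(spec: dict) -> list[dict]:
    """Return one tool description per OpenAPI operation (illustrative only)."""
    tools = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters"
            tools.append({
                # Assumed naming rule: operationId if present, else "<method> <path>".
                "name": operation.get("operationId") or f"{method.lower()} {path}",
                "method": method.upper(),
                "path": path,
                "description": operation.get("summary", ""),
            })
    return tools


if __name__ == "__main__":
    # Hypothetical spec fragment for demonstration.
    sample = {
        "openapi": "3.0.3",
        "paths": {
            "/orders": {
                "get": {"operationId": "listOrders", "summary": "List orders"},
                "post": {"operationId": "createOrder", "summary": "Create an order"},
            }
        },
    }
    for tool in list_tools(sample):
        print(tool["name"], tool["method"], tool["path"])
```

Seen this way, the spec is the unit of control: operations you leave out of the spec you hand to App Service are simply never exposed as tools.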
Security

Built-in MCP works with App Service Authentication, so MCP requests go through the same identity checks as every other route on your app, using Microsoft Entra or any OpenID Connect provider you've configured. When App Service Authentication is enabled, the platform also publishes OAuth protected-resource metadata so MCP clients can complete the OAuth flow automatically. As always, your application code is responsible for validating the bearer token on each request, and you should avoid exposing an MCP server publicly without authentication, since every published tool becomes callable once a client connects.

Getting started

Built-in MCP is configured through the aiIntegration property on your App Service app, and the preview supports three configuration paths: the Azure portal, the Azure CLI (az rest), and Bicep. It runs on dedicated pricing tiers, Basic or higher; it isn't supported on Free, Shared, Consumption, or Flex Consumption plans.

This is a preview, and we'd love your feedback as you try it. To enable built-in MCP on your own app and connect an MCP client, head to the docs:

- Configure App Service built-in MCP (preview)
- Use App Service as a Model Context Protocol (MCP) server

Regional Endpoints for Azure Container Registry Geo-Replication - Now in Public Preview
By Johnson Shi, Zoey (Zhuyu) Li, Huangli Wu

What's new

Regional endpoints for geo-replicated Azure Container Registries are now in public preview. See the feature's official MS Learn documentation.

If you've been following since the private preview announcement, here's what changed:

- No feature flag registration. No subscription enrollment is needed, so all Azure subscriptions and customers can now use this feature.
- No CLI extension. Regional endpoints commands are built into Azure CLI 2.86.0+ natively. If you installed the private preview acrregionalendpoint extension, uninstall it to avoid conflicts.
- Native CLI and portal support. With Azure CLI 2.86.0+, enable regional endpoints for all geo-replicas of a registry with az acr create --regional-endpoints enabled or az acr update --regional-endpoints enabled. The Azure portal also supports configuring regional endpoints natively.
- CLI flag rename for configuring a geo-replica's global endpoint routing (an existing separate feature). The existing flag --region-endpoint-enabled (on az acr replication create/update) has been renamed to --global-endpoint-routing.

Key clarifications:

- --global-endpoint-routing (formerly --region-endpoint-enabled, on az acr replication create / az acr replication update) controls whether a specific geo-replica participates in global endpoint routing. This is an existing feature that is different from the new registry-level --regional-endpoints feature discussed in this post.
- --regional-endpoints (on az acr create / az acr update) enables or disables the regional endpoints feature at the registry level for all geo-replicas. This is the feature discussed in this post.

See the endpoint reference for the full breakdown of the various registry endpoints (global endpoints, regional endpoints, and data endpoints).

Regional endpoints are available on Premium SKU registries in all Azure public cloud regions.

What are regional endpoints?
Regional endpoints give you dedicated, per-region login server URLs for each geo-replica with the following URL pattern:

    myregistry.eastus.geo.azurecr.io
    myregistry.westeurope.geo.azurecr.io

Regional endpoints coexist with the registry's global endpoint (myregistry.azurecr.io). Enabling regional endpoints doesn't disable the registry's global endpoint, which remains backed by Azure-managed routing. You can choose per workload:

- Use the global endpoint for automatic Azure-managed routing with health-aware failover, where Azure routes your requests to the geo-replica with the best network performance profile for the client.
- Use a regional endpoint when you need explicit control over routing to a specific geo-replica.

Other resources:

- For the full background on why regional endpoints exist and the problems they solve, see the private preview blog post.
- For the complete operational deep dive (health-aware failover, throttling considerations, storage quota and pricing, eventual consistency, home region outage behavior, DNS propagation, private endpoint interaction, capacity planning, and monitoring guidance), see How ACR geo-replication handles failover, failback, and traffic redirection.
- For the behind-the-scenes engineering implementation (architectural overview and the engineering system design of the feature), see Determinism over magic: the engineering design behind Azure Container Registry Regional Endpoints.

Getting started

Enable regional endpoints on an existing registry:

    az acr update -n myregistry -g myrg --regional-endpoints enabled

View all registry endpoint URLs, including the registry global endpoint, geo-replica regional endpoints, and data endpoints:

    az acr show-endpoints --name myregistry --resource-group myrg

Using regional endpoints

Authenticate to a specific regional endpoint:

    az acr login --name myregistry --endpoint eastus

Push to a specific geo-replica.
Images and tags pushed to a geo-replica via regional endpoints still propagate to all other geo-replicas under eventual consistency.

    docker tag myapp:v1 myregistry.eastus.geo.azurecr.io/myapp:v1
    docker push myregistry.eastus.geo.azurecr.io/myapp:v1

Pull an image:

    docker pull myregistry.eastus.geo.azurecr.io/myapp:v1

You can specify regional endpoints directly in Kubernetes deployment manifests if you need to pin workloads to specific regions. This ensures clusters in specific regions always pull from their colocated replica, providing predictable routing and reduced latency. By using different regional endpoints in each cluster's manifests, you can guarantee that each cluster pulls from its local replica instead of relying on Azure-managed routing.

East US cluster deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-eastus
    spec:
      selector:            # required by apps/v1; the label value is illustrative
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: myregistry.eastus.geo.azurecr.io/myapp:v1

West Europe cluster deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp-westeurope
    spec:
      selector:            # required by apps/v1; the label value is illustrative
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: myapp
            image: myregistry.westeurope.geo.azurecr.io/myapp:v1

When to use regional endpoints

| Scenario | What to do |
| --- | --- |
| Most workloads | Keep using the global endpoint (myregistry.azurecr.io). Health-aware failover handles routing automatically. |
| Pin AKS clusters to co-located replicas | Use regional endpoint URLs in deployment manifests. |
| CI/CD push-then-pull consistency | Pin pushes to a regional endpoint to avoid eventual-consistency races. |
| Client-side failover | Switch between regional endpoints based on your own health checks. |
| Capacity planning | Spread workloads across multiple regional endpoints to avoid per-replica throttling. |
| Troubleshooting | Target a specific geo-replica to reproduce or isolate an issue. |
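The client-side failover row in the table above leaves the health check to you. As a rough Python sketch of what "switch between regional endpoints based on your own health checks" could look like, the function below walks a preference-ordered list of regions and returns the first regional endpoint your check accepts. The is_healthy callable and the region order are assumptions you would supply; nothing here is an ACR API, and only the host name pattern comes from this post.

```python
from typing import Callable, Iterable


def pick_endpoint(
    registry: str,
    preferred_regions: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> str:
    """Return the first regional endpoint, in preference order, that passes is_healthy.

    is_healthy is a caller-supplied (hypothetical) probe, for example a timed
    request against the endpoint from the workload's own network location.
    """
    for region in preferred_regions:
        host = f"{registry}.{region}.geo.azurecr.io"
        if is_healthy(host):
            return host
    raise RuntimeError("no healthy regional endpoint found")


if __name__ == "__main__":
    # Pretend the East US endpoint is failing our own probe.
    unhealthy = {"myregistry.eastus.geo.azurecr.io"}
    chosen = pick_endpoint("myregistry", ["eastus", "westeurope"], lambda h: h not in unhealthy)
    print(chosen)
```

After switching, log in to the newly chosen endpoint (for example with az acr login --name myregistry --endpoint <region>) before pushing or pulling through it.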

What changed from private preview

Private preview | Public preview
Feature flag registration required (az feature register) | No registration needed
Subscription private preview enrollment and propagation wait | Immediately available to all Azure subscriptions for all Premium SKU registries in all Azure public cloud regions
Separate CLI extension (acrregionalendpoint) | Built into Azure CLI 2.86.0+ natively
No registry-level CLI flag | az acr update --regional-endpoints enabled enables regional endpoints for all geo-replicas
--region-endpoint-enabled flag for controlling a geo-replica's global endpoint routing via az acr replication update | Flag for controlling a geo-replica's global endpoint routing renamed to --global-endpoint-routing
No portal support | Native Azure portal support for enabling regional endpoints for new registries (during creation) and for existing registries
Private preview docs in Azure/acr | Full documentation on MS Learn

Enabling regional endpoints in the Azure portal

You can enable regional endpoints directly from the Azure portal for both new registries (during creation) and existing registries.

If you were in the private preview

1. Uninstall the CLI extension. The private preview CLI extension conflicts with the built-in commands in Azure CLI 2.86.0+. Remove it:

    az extension remove --name acrregionalendpoint

Verify it's gone:

    az extension list --query "[?name=='acrregionalendpoint']" -o table

2. Ensure you're running Azure CLI 2.86.0 or later. Regional endpoints commands are available natively starting in Azure CLI 2.86.0. Check your version:

    az version

3. Update scripts that use --region-endpoint-enabled for controlling global endpoint routing for a geo-replica. The old flag name for controlling a geo-replica's global endpoint routing configuration is deprecated and will be removed in Azure CLI 2.87.0 (June 2026).
Update to --global-endpoint-routing:

    # Old (deprecated)
    az acr replication update --registry myregistry --name westus \
      --region-endpoint-enabled false

    # New
    az acr replication update --registry myregistry --name westus \
      --global-endpoint-routing false

Why the rename? The old flag name --region-endpoint-enabled was confusing: it sounded like it controlled the regional endpoints feature, but it actually controlled whether a geo-replica participates in global endpoint routing. The new name --global-endpoint-routing says exactly what it does. For a full breakdown of all three CLI flags and how they relate, see the endpoint reference.

Learn more

- Full documentation: Geo-replication in Azure Container Registry - Regional endpoints: prerequisites, CLI commands, network considerations, private endpoint integration, and troubleshooting.
- Operational deep dive: How ACR geo-replication handles failover, failback, and traffic redirection: health-aware failover, throttling, eventual consistency, DNS considerations, monitoring, pricing, and a full walkthrough.
- Behind-the-scenes engineering implementation: Determinism over magic: the engineering design behind Azure Container Registry Regional Endpoints: architectural details and the engineering system design behind the feature.
- Endpoint reference: Azure Container Registry endpoint reference: all endpoint types, URL formats, and CLI flags in one place.
- Private endpoints: Connect privately to a registry using private endpoints: IP allocation math, subnet sizing, and NIC queries for registries with regional endpoints.
- Firewall rules: Configure firewall access rules: which FQDNs to allow for regional endpoints.

Feedback

We'd love to hear how you're using regional endpoints and what we can improve.
Reach out via:

- Azure Container Registry GitHub repository: issues, feature requests, and discussion
- Azure portal feedback: use the feedback button in the Azure portal on your registry's page

Regional endpoints are on the path to GA. Your feedback directly shapes the feature's direction.

Better Deployment Errors in az webapp deploy

Deployment failures can be difficult to interpret, especially when the error returned by the deployment API does not clearly explain what went wrong or what to do next. To make this easier, we have added a new switch to az webapp deploy for App Service for Linux:

    --enriched-errors true

When enabled, deployment failures show context-enriched diagnostics directly in the CLI output. This includes an error code, deployment context, the raw error, suggested fixes, and a Copilot-ready prompt that you can use for additional guidance. By default, this option is disabled.

How to use it

Add --enriched-errors true to your deployment command:

    az webapp deploy \
      --resource-group <resource-group-name> \
      --name <app-name> \
      --src-path <path-to-package> \
      --enriched-errors true

What you get

With enriched errors enabled, failed deployments can include details such as:

    Error Code : ArtifactStackMismatch
    Stage : Deployment
    Runtime : DOTNETCORE|9.0
    Deploy Type : WarDeploy
    Kudu Status : 400

    Raw Error:
    Artifact type = 'War' cannot be deployed to stack = 'DOTNETCORE'.

    Suggested Fixes:
    - Ensure the artifact type matches the app's runtime stack
    - Check the current linuxFxVersion
    - Update the runtime stack if needed

This makes it easier to understand whether the failure is caused by an artifact/runtime mismatch, an invalid deployment path, missing required parameters, or a configuration conflict such as WEBSITE_RUN_FROM_PACKAGE. The screenshots below show a couple more examples of enriched deployment failures.

Use with GitHub Copilot

The enriched output also includes a prompt that you can paste into GitHub Copilot along with the full error details:

    Why did my Linux App Service deployment fail and how do I fix it?

This can help you get more specific guidance based on your deployment configuration and the failure details.

Summary

The new --enriched-errors switch gives you clearer and more actionable deployment failure information directly in the Azure CLI.
Try it out the next time you are deploying to Azure App Service for Linux.

Understand What's Happening with Your App Service for Linux Website Using Site Status

When your website is not starting or is behaving unexpectedly, it can be difficult to quickly understand what state the application is in and what might be causing the issue. To make this easier, Azure App Service for Linux now includes Site Status. Site Status provides runtime information for your website, including the current state of the app and detailed error information when issues are detected.

What Site Status shows

Site Status gives you a view into the current runtime state of your App Service for Linux website. You can see the site runtime status from the web app's Properties experience. If the platform detects an issue with the site, the runtime status will show Issues Detected. Selecting Issues Detected opens a detailed view where you can see the current status of the site, the last known error, and additional troubleshooting details. If your app is scaled out across multiple instances, Site Status shows this information for each instance hosting your app.

This makes it easier to answer questions such as:

- Is my website still starting?
- Did my website start successfully?
- Is the site stopped or blocked?
- Is the app recycling to apply changes?
- What was the last known runtime error?
- Is this likely a transient issue, or does the error point to a configuration problem?

Site Status values

Site Status reports one of the following platform-defined runtime states for your website:

Status | Description
Starting | The site is initializing the container and all necessary components.
Started | The site successfully initialized all necessary components and is running.
Stopping | The container and site components are being torn down.
Stopped | The site is no longer running and will not receive requests.
Updating | The site is recycling, either overlapped or non-overlapped, to apply the provided changes.
Blocked | The site attempted to start multiple times and is temporarily blocked from another attempt to reduce instance load.
Unknown | A platform-side issue is preventing status assessment.

These statuses provide a quick summary of the current or last known runtime state of your website.

View detailed issue information

When Site Status detects an issue, you can select Issues Detected to view more detailed runtime information. The details page shows information such as:

Field | What it tells you
Status | The current runtime state of the website.
Last error | A short error category or failure type.
Last error info | Additional troubleshooting details about the issue.
Last error occurrence | When the error was last observed.
Actions | Available repair actions.

For example, the screenshot above shows a site that failed because the configured storage could not be mounted. The message points you toward the likely root cause. In this case, the issue is probably not with the site process itself. Restarting the app or replacing the instance may not resolve the problem. You would likely need to review the storage account, file share, firewall, networking, private endpoint, or authentication configuration.

Repair actions

From the issue details view, you can select Repair to take an action for the affected instance. Available repair actions include:

Action | Description
Restart | Restarts the site on the selected instance.
Replace instance | Moves the site away from the current instance and replaces it with another instance.

These actions can be useful when your website has run into a transient runtime issue, or when the underlying instance is in a bad state. However, repair actions are not a substitute for fixing configuration issues. For example, if the site cannot access a configured storage account because of network or authentication settings, restarting or replacing the instance is unlikely to fix the issue. The underlying configuration must be corrected first.

Site Status vs. Health Check

Site Status and Health Check are both useful for understanding and improving the reliability of App Service for Linux websites, but they serve different purposes.

Site Status helps you understand the current runtime state of your website. It provides platform-defined status values and detailed error information to help you troubleshoot startup, runtime, and configuration-related issues.

Health Check helps determine whether an instance should continue receiving traffic. It pings a customer-configured endpoint and uses the HTTP response to identify unhealthy instances, redirect traffic, and replace instances when needed.

Health Check | Site Status
Pings a customer-configured endpoint. | Uses platform-side runtime checks.
Reports the HTTP status returned by your configured endpoint. | Reports a platform-defined runtime status for the site.
Helps determine whether an instance should receive traffic. | Helps explain what is happening with the website at runtime.
Requires a health check path to be configured. | Does not require a customer-configured health endpoint.

Summary

Site Status gives you a clearer view into the runtime state of your App Service for Linux website. By surfacing platform-defined site status values and detailed runtime information, it helps you understand what is happening with your application as it starts, runs, updates, or stops. We are continuously improving App Service for Linux to provide better visibility, more actionable information, and a smoother experience for running your applications in Azure.

You Can Build a Framework-Agnostic AI Gateway on Azure App Service - Here's How

The agent infrastructure conversation moved this year. In October 2025, AWS shipped Amazon Bedrock AgentCore, a managed agent runtime with per-session microVM isolation, built-in long-term memory, native MCP support, and an opinionated policy engine. A few months earlier, Cloudflare shipped its Agents SDK on top of Durable Objects, betting that edge-native stateful agents are the future. Both bets are real, both are interesting, and both arrive as closed, proprietary runtimes.

So: what's Azure's answer? It's a question I've heard a couple times from architects in the last six months. The honest answer is that Azure already has the pieces. They don't ship as one product called AgentRuntime, and that's actually the point. Azure's pitch is composable: App Service + API Management + MCP, three services you already have access to, glued together with open standards.

This post walks through a runnable sample of that composition. One App Service hosting both an agent (built with the Microsoft Agent Framework) and the stateless MCP server it calls, fronted by Azure API Management with the AI Gateway policy set: semantic caching, token rate limiting, per-subscription token emission for chargeback. One azd up deploys the lot.

Repo: app-service-ai-gateway-mcp-apim-python.

The headline claim is in the title. The point I actually want to make is the one underneath it: the framework is replaceable, the gateway is the contribution. Swap the Agent Framework module for Pydantic AI or LangGraph and the rest of the architecture is unchanged. That's what "run anything" means, made literal.

The composable stack

                 +--------------------------------------------------+
                 |              Azure API Management                |
  MCP / Agent ---+  AI Gateway policies:                            |
    client       |    - llm-token-limit                             |
                 |    - llm-semantic-cache-lookup / store           |
                 |    - llm-emit-token-metric                       |
                 |    - rate-limit-by-key (MCP API)                 |
                 +--------------+-------------------+---------------+
                                |                   |
               +----------------v--+   +------------v--------------+
               |  Azure OpenAI     |   |  Azure App Service        |
               |  - chat model     |   |  FastAPI app:             |
               |  - embedding      |   |    - /mcp (stateless)     |
               |    model          |   |    - /agent/chat          |
               +-------------------+   |  Managed identity         |
                                       |    -> APIM                |
                                       |       (via subscription)  |
                                       +------------+--------------+
                                                    v
                                          Application Insights
                                (cloud_RoleInstance, APIM token metrics)

Three observations that drive everything else:

- APIM is the only thing that talks to Azure OpenAI. The App Service agent doesn't have an AOAI key. It has an APIM subscription key. Every LLM call passes through the gateway, picks up the policies, and gets logged with consistent dimensions. That's where the governance part lives.
- The agent runtime is App Service. Linux, Python, FastAPI. Any language. Any framework. Pick your tool. We use Microsoft Agent Framework because it just GA'd and the API is clean, but the agent module is the easiest thing in the stack to swap.
- The MCP server is co-located with the agent. Same App Service, different route. The agent calls its own tools either in-process (fast path) or back out through APIM (so MCP traffic gets rate-limited and observed too). That choice is one environment variable.

What the sample actually does

The FastAPI app exposes three routes that matter:

- /mcp: a stateless HTTP MCP server (protocol revision 2025-11-25), implementing four tools: whoami, echo, lookup_fact, and summarize_app_service_doc. Any MCP client (Claude, VS Code, your own agent runtime) can connect.
- /agent/chat: a Microsoft Agent Framework agent that uses those same MCP tools as its tool set, and calls AOAI through APIM.
- /health and /: the boring but essential supporting cast (health check for App Service probes, status page showing the serving instance ID).

Here's the agent definition. The key line is the endpoint:

    import os

    from agent_framework.openai import OpenAIChatCompletionClient

    client = OpenAIChatCompletionClient(
        azure_endpoint=os.environ["APIM_GATEWAY_URL"],  # APIM, not AOAI
        model=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT"],
        api_version="2024-10-21",
        api_key=os.environ["APIM_SUBSCRIPTION_KEY"],
        default_headers={"Ocp-Apim-Subscription-Key": os.environ["APIM_SUBSCRIPTION_KEY"]},
    )

    agent = client.as_agent(
        name="AppServiceExpert",
        instructions=SYSTEM_INSTRUCTIONS,
        tools=build_tools(),
    )

That's it. The agent has no idea APIM exists. It thinks it's talking to AOAI. APIM is doing every interesting thing (auth, caching, throttling, metric emission) without the agent code knowing or caring.

The policy that does the heavy lifting

The AOAI API in APIM has one policy attached at the API scope. The full XML is in infra/apim/policies/aoai-policy.xml; here's the bones of it:

    <policies>
      <inbound>
        <base />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="aoai-token" />
        <set-header name="Authorization" exists-action="override">
          <value>@("Bearer " + (string)context.Variables["aoai-token"])</value>
        </set-header>
        <azure-openai-token-limit counter-key="@(context.Subscription?.Id ?? "anonymous")" tokens-per-minute="50000" estimate-prompt-tokens="true" />
        <azure-openai-semantic-cache-lookup score-threshold="0.85" embeddings-backend-id="aoai-embeddings-backend" embeddings-backend-auth="system-assigned">
          <vary-by>@(context.Subscription?.Id ?? "anonymous")</vary-by>
        </azure-openai-semantic-cache-lookup>
        <set-backend-service backend-id="aoai-backend" />
        <azure-openai-emit-token-metric namespace="ai-gateway">
          <dimension name="Subscription ID" value="@(context.Subscription?.Id ?? "anonymous")" />
          <dimension name="API ID" value="@(context.Api.Id)" />
          <dimension name="Operation ID" value="@(context.Operation.Id)" />
          <dimension name="Client IP" value="@(context.Request.IpAddress)" />
        </azure-openai-emit-token-metric>
      </inbound>
      <outbound>
        <base />
        <azure-openai-semantic-cache-store duration="3600" />
      </outbound>
    </policies>

Four things are happening here that would otherwise be your problem:

- Auth to AOAI. APIM's managed identity holds the Cognitive Services OpenAI User role on the AOAI account. No keys.
- Token rate limiting. Each APIM subscription gets a tokens-per-minute budget. One runaway team can't starve everyone else.
- Semantic caching. The inbound policy embeds the prompt using the embedding deployment, queries the Redis-backed APIM cache for a vector match above the 0.85 threshold, and short-circuits the AOAI call on a hit. The outbound azure-openai-semantic-cache-store writes successful misses back.
- Per-call metric emission. Every call pushes PromptTokens, CompletionTokens, and TotalTokens to Application Insights as custom metrics tagged with the APIM subscription, the API, the operation, and the client IP. That's your chargeback dashboard, ready to query.

The whole thing is XML. None of it is in your agent code.

Deploying it

    azd auth login
    azd up

azd up provisions a P0v3 App Service Plan with the web app and a staging slot, an AOAI account with gpt-4o-mini + text-embedding-3-small deployments, an APIM Developer SKU service with the two APIs and the policy XML wired up, an Azure Cache for Redis Basic C0 as the semantic-cache store, and a Log Analytics workspace + Application Insights. The postprovision hook fetches the APIM subscription key for the AI Gateway product and writes it into the App Service's APIM_SUBSCRIPTION_KEY setting (and the staging slot's, so slot swaps are clean).

Be patient. Developer SKU APIM takes 30-45 minutes the first time.
If you want to prototype faster, the sample supports Consumption SKU as a one-line flip:

    azd env set APIM_SKU Consumption
    azd provision

Consumption provisions in about a minute and is great for sketching. Verify your specific policies are supported there before you ship it.

Governing it like a grown-up

The toy version of this post stops at "look, semantic cache works." The version your platform engineering lead wants to see goes further.

Per-team chargeback. The token-emit policy tags every call with the APIM subscription ID. Issue one subscription per team, hand it over with a quota, and your monthly chargeback report is a KQL query:

    customMetrics
    | where timestamp > startofmonth(now())
    | where name == "TotalTokens"
    | summarize Tokens=sum(valueSum) by Team=tostring(customDimensions["Subscription ID"])
    | extend USD = Tokens * 0.00015 / 1000 // gpt-4o-mini blended rate
    | order by USD desc

Content safety as a policy plug-in. Add an llm-content-safety block to the inbound policy and point it at an Azure AI Content Safety resource, and every prompt and response gets moderated before reaching agents or end users. The sample doesn't deploy Content Safety by default (to keep the demo's cost down), but the README has the one-line bicep + one-block policy delta.

Circuit breaker + multi-region failover. Add a second AOAI backend in a different region and an APIM backend pool, give the pool a circuit-breaker rule, and your agents inherit failover with zero code changes.

Rate-limit MCP traffic too. The MCP API has its own policy with rate-limit-by-key, so a runaway agent can't pin the MCP server with a hot loop.

None of these are gymnastics. They're one policy block each. The pattern is the same every time: write policy at the gateway, leave the agent code alone.

Proving it works

After azd up finishes, two checks.
First, hit the agent endpoint:

    curl -sS -X POST "$(azd env get-value WEB_URI)/agent/chat" \
      -H 'Content-Type: application/json' \
      -d '{"message": "How does App Service horizontally scale an MCP server?"}' | jq

You should see a reply that cites the instance ID (the agent calls whoami and summarize_app_service_doc to ground its answer) and a tool_calls array showing the agent's reasoning trace.

Second, run the k6 load test:

    export BASE_URL="$(azd env get-value WEB_URI)"
    export APIM_SUBSCRIPTION_KEY="$(azd env get-value APIM_SUBSCRIPTION_KEY)"
    k6 run loadtest/k6-gateway.js

The script hits /agent/chat with a small pool of semantically-similar prompts. After a 30-second warmup, the steady phase should report a cache-hit ratio above 30%:

    APIM AI Gateway - k6 summary
    ----------------------------
    Cache hits   : 412
    Cache misses : 88
    Hit ratio    : 82.4%

Cross-check in App Insights:

    ApiManagementGatewayLogs
    | where TimeGenerated > ago(15m)
    | where ApiId == "aoai"
    | extend cache = tostring(parse_json(ResponseHeaders)["x-llm-cache-status"])
    | summarize count() by cache, bin(TimeGenerated, 1m)
    | render columnchart

A solid bar of hits next to a smaller bar of misses is the gateway earning its keep.

"Run anything": the proof

Here's the part where I cash the check the title wrote. The agent module is the easiest thing in this stack to replace. Three changes to ship the same demo on Pydantic AI:

    # requirements.txt
    - agent-framework-core==1.5.0
    - agent-framework-openai==1.5.0
    + pydantic-ai==0.4.0

    # agent/agent.py
    import os

    from pydantic_ai import Agent
    from pydantic_ai.models.openai import OpenAIModel

    def build_agent():
        model = OpenAIModel(
            "gpt-4o-mini",
            base_url=f"{os.environ['APIM_GATEWAY_URL']}/openai/deployments/gpt-4o-mini",
            api_key=os.environ["APIM_SUBSCRIPTION_KEY"],
        )
        return Agent(model, system_prompt=SYSTEM_INSTRUCTIONS, tools=build_tools())

That's it. build_tools() returns the same list of async callables (Pydantic AI accepts plain Python functions as tools, same as Agent Framework).
LangGraph works the same way: wire build_tools() into a ToolNode, point ChatOpenAI at the APIM gateway URL, done. Every APIM policy still fires. Every token metric still emits. Every cache hit still hits. The gateway is the boundary; the runtime above it is fungible.

What AgentCore gets right

I want to land this without spin. AgentCore's per-session microVM isolation is genuinely interesting. It's a stronger sandboxing story than running multiple agents in shared App Service workers, and it matters for multi-tenant SaaS where agents execute arbitrary user code or call third-party tools you don't fully trust. The managed long-term memory primitive is also a real convenience; Azure has the building blocks (Cosmos DB, Azure AI Search, formerly Cognitive Search) but they aren't pre-wired into a single "agent memory" API the way AgentCore's are.

Where the App Service + APIM + MCP composition genuinely wins:

- Open standards. MCP is a public protocol with implementations across the industry. AgentCore's tool layer is AWS-native.
- No new runtime to learn. App Service is the same App Service. Your existing CI/CD, your existing security review, your existing monitoring all apply.
- Bring your own framework. Pydantic AI, LangGraph, Agent Framework, Semantic Kernel, AutoGen, CrewAI: they all work, because the App Service doesn't care what's running inside the container.
- Existing enterprise footprint. VNet integration, private endpoints, managed identity, deployment slots, sidecars, Easy Auth. None of it is new for App Service. You inherit a decade of platform work.

The right framing isn't "Azure's answer to AgentCore." It's that Azure is making a different bet: that enterprises will value the composability of services they already trust over the convenience of a new proprietary runtime. For some, that bet is probably correct. For a few (multi-tenant agent marketplaces, untrusted code execution) AgentCore's isolation model is a better fit. Pick the one that matches your threat model.

What's next

If you ship the sample and want to compare notes, the repo is at app-service-ai-gateway-mcp-apim-python.

You Can Scale MCP Servers Behind a Load Balancer on App Service - Here's How

Most MCP servers in the wild are single-instance processes. That's fine when they're driving a local Claude or VS Code session, but it's the wrong shape for a production agent fleet that has to absorb traffic spikes, ride through deploys, and survive instance failures.

The good news: the MCP spec already grew up. The 2025-06-18 revision formalizes stateless HTTP transport (and the current 2025-11-25 revision keeps it), which means a single request carries everything the server needs to answer. No long-lived connection, no in-process session table, no sticky-session hacks to keep a client glued to one box.

That tiny protocol change unlocks something big: you can stick an MCP server behind App Service's built-in load balancer and scale it like any other web API. This post walks through how, with a runnable sample.

Sample: seligj95/app-service-mcp-stateless-scale-python. One azd up and you have a stateless FastAPI MCP server running on three App Service instances behind the platform load balancer, with a staging slot, Application Insights, and a k6 script that visualizes load distribution from the client side.

Why "stateless" is the whole story

Earlier MCP transports leaned on persistent connections: SSE channels and WebSocket-style sessions where the server held per-client state in memory (open tools, subscriptions, partial streams). That model is great for a local IDE talking to a local process. It's hostile to load balancing, because routing a follow-up request to a different instance breaks the session.

The stateless HTTP transport flips that. Each request is a complete JSON-RPC envelope (initialize, tools/list, tools/call), every response is self-contained, and the server is allowed to forget the client between requests. Any instance can serve any call. That is the property a load balancer needs.
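
To make the "complete JSON-RPC envelope" idea concrete, here is a minimal client-side sketch. The helper name make_envelope is an illustrative assumption, not part of the sample or of any MCP SDK; only the three method names and the JSON-RPC 2.0 shape come from the text above. A real initialize request also carries protocol-version and capability parameters, which are omitted here.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # request IDs only need to be unique per client


def make_envelope(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a self-contained JSON-RPC 2.0 request for a stateless MCP call."""
    envelope: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        envelope["params"] = params
    return envelope


if __name__ == "__main__":
    requests = [
        make_envelope("initialize"),
        make_envelope("tools/list"),
        # "whoami" is one of the sample's tools; it takes no arguments.
        make_envelope("tools/call", {"name": "whoami", "arguments": {}}),
    ]
    for req in requests:
        print(json.dumps(req))
```

Nothing in an envelope refers to an earlier request or to a server-side session, which is why each one can be sent to whichever instance the load balancer picks.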

In the sample, every tool is a pure function of its arguments: whoami reports the serving instance, lookup_fact reads a static dictionary, compute_primes runs a sieve. None of them touches per-client memory. That's not a constraint of the protocol; it's a discipline you adopt to keep statelessness intact.

Why App Service, and not Functions or AKS

Functions and AKS are a couple of the many great options for MCP server hosting, depending on what the MCP server is used for. The use case we are discussing here is a scaled MCP server, i.e. an MCP server that must reach a large and broad audience. Here are a few defaults that make App Service a solid option for this scenario:

- Always On. Reasoning tools call into LLMs and external APIs; latencies routinely sit in the multi-second range. On the Consumption plan, Functions caps a single execution at ten minutes (five by default) and scales workers to zero between bursts, which kills warm caches. App Service keeps the process resident.
- Horizontal scale is one parameter. Pick a Premium SKU, set the plan's capacity to N, and you have N instances behind a managed load balancer. No VMSS to declare, no ingress controller to wire up, no Service to reconcile.
- Deployment slots. Swap a warmed-up staging slot into production for zero-downtime deploys. Critical when your "API" is an LLM tool surface that an agent is actively driving.
- Easy Auth. OAuth 2.1 in front of the MCP endpoint without writing the flow yourself: turn on the App Service authentication blade and point it at Entra ID. The sample leaves this off so the deploy is one command, but the wiring is a checkbox away.

The TL;DR: it's PaaS that already knows how to run a long-lived process at horizontal scale, which is exactly the shape of a scaled MCP server.

The FastAPI MCP server, end-to-end stateless

The whole transport is one POST handler.
The full source is in main.py, but here are the load-bearing pieces:

    import json
    import os
    import socket
    from typing import Any, Dict

    from fastapi import Request

    # app, MCP_TOOLS, and _server_info() are defined elsewhere in main.py

    @app.post("/mcp")
    async def mcp_endpoint(request: Request):
        body = await request.json()
        method = body.get("method", "")
        msg_id = body.get("id")

        if method == "initialize":
            return {"jsonrpc": "2.0", "id": msg_id, "result": _server_info()}

        if method == "tools/list":
            return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": [...]}}

        if method == "tools/call":
            params = body.get("params", {})
            result = await MCP_TOOLS[params["name"]]["function"](**params.get("arguments", {}))
            return {
                "jsonrpc": "2.0",
                "id": msg_id,
                "result": {"content": [{"type": "text", "text": json.dumps(result)}]},
            }

There is no session table. There is no client_id cookie. There is no AsyncIterator held open between requests. initialize, tools/list, and tools/call all return in a single round trip, which is the shape App Service's load balancer expects.

The most useful debugging tool in the sample is whoami:

    async def tool_whoami() -> Dict[str, Any]:
        return {
            "instance_id": os.environ.get("WEBSITE_INSTANCE_ID", "local"),
            "hostname": socket.gethostname(),
            # ... (remaining fields elided)
        }

WEBSITE_INSTANCE_ID is unique per App Service worker. Call whoami a few times from your MCP client and the value rotates: that's the load balancer working. If it doesn't rotate, something is pinning your traffic (almost always the ARR Affinity cookie; we'll get there).

The Bicep that actually makes it scale

The infra is a P0v3 plan with capacity: 3, a web app with affinity disabled, and a staging slot on the same plan:

    resource appServicePlan 'Microsoft.Web/serverfarms@2024-04-01' = {
      name: name
      sku: {
        name: 'P0v3'
        capacity: instanceCount // 3 by default
      }
      properties: {
        reserved: true
      }
    }

    resource web 'Microsoft.Web/sites@2024-04-01' = {
      name: name
      properties: {
        serverFarmId: appServicePlanId
        httpsOnly: true
        clientAffinityEnabled: false // the one line that matters
        siteConfig: {
          linuxFxVersion: 'PYTHON|3.11'
          alwaysOn: true
          healthCheckPath: '/health'
          appCommandLine: 'python -m uvicorn main:app --host 0.0.0.0 --port 8000'
        }
      }
    }

    resource staging 'Microsoft.Web/sites/slots@2024-04-01' = {
      parent: web
      name: 'staging'
      properties: { /* same shape: separate hostname, same plan */ }
    }

The single most important line in that template is clientAffinityEnabled: false. App Service defaults to on, which sets the ARRAffinity cookie and pins every subsequent request from a given client to the instance that handled the first one. That default exists because legacy ASP.NET apps used in-process session state. Stateless MCP does not. Leaving affinity on silently undoes everything we just built.

The sample uses Premium v3 (P0v3) for two reasons: it gives you Always On and deployment slots. The Free and Shared tiers offer neither.

Application Insights without writing telemetry code

The sample drops one line of bootstrap into main.py:

    import os

    from azure.monitor.opentelemetry import configure_azure_monitor

    if os.environ.get("APPLICATIONINSIGHTS_CONNECTION_STRING"):
        configure_azure_monitor(logger_name="mcp")

The Azure Monitor OpenTelemetry distro auto-instruments FastAPI and outbound HTTP. Every request span App Service emits is tagged with cloud_RoleInstance, which Application Insights populates from WEBSITE_INSTANCE_ID. That makes the question "is traffic actually spreading across my instances?"
a one-liner in Logs:

    requests
    | where timestamp > ago(15m)
    | where name contains "/mcp"
    | summarize count() by cloud_RoleInstance
    | order by count_ desc

If you see three roughly-equal rows, you're done. If you see one row, your client is sending ARRAffinity cookies: turn affinity off and redeploy.

Deploy

    azd auth login
    azd up

That provisions the resource group, plan, web app, staging slot, Log Analytics workspace, and Application Insights resource, then deploys the Python app via Oryx. The output prints both WEB_URI and WEB_STAGING_URI. Open the production URI. The home page renders the instance ID that served it. Refresh. The ID changes.

To swap the staging slot into production with no downtime:

    az webapp deployment slot swap \
      --resource-group <rg> --name <app> \
      --slot staging --target-slot production

App Service warms the staging instances, redirects traffic, and the old production becomes the new staging: the classic blue-green pattern, but free.

Prove it scales

The sample ships a k6 script that hammers /mcp with tools/call requests and tags every response with the instance_id the server returned:

    BASE_URL=https://<your-app>.azurewebsites.net \
      k6 run --summary-export=summary.json loadtest/k6-mcp.js
    jq '.metrics.mcp_instance_hits.values' summary.json

The output groups hits per instance tag. On a three-instance plan with a 60-second steady load you should see something close to:

    {
      "count": 1842,
      "instance0d3e2f...": 614,
      "instance7a91bc...": 612,
      "instance19f0c4...": 616
    }

Roughly 33% on each box: the App Service load balancer round-robining new connections, with no help from the application.

What I'd do next

The sample is intentionally a starting point. Two extensions are the obvious next moves:

- Add Easy Auth. Turn on App Service authentication, pick Entra ID, require auth on /mcp. The token surfaces as headers; your tool handlers can use it to identify the calling agent without you owning any of the OAuth machinery.
- Autoscale on CPU.
instanceCount: 3 is a starting point. Wire up Microsoft.Insights/autoscalesettings against the plan and let it scale from 3 to 10 instances on the prime-counting tool. The architecture already supports it; that's the whole point of stateless.

Try it

Sample repo: github.com/seligj95/app-service-mcp-stateless-scale-python
MCP spec: modelcontextprotocol.io/specification/2025-11-25
App Service docs: learn.microsoft.com/azure/app-service/overview

If you ship something with it, I'd love to hear how it held up.

Announcing Public Preview of Argo CD extension in AKS Azure Portal Experience
We are excited to announce the public preview of Argo CD in the Azure Portal for Azure Kubernetes Service. As GitOps becomes the standard for deploying and operating applications at scale, customers need a way to adopt GitOps with simpler onboarding, secure defaults, and integrated workflows. With Argo CD now available directly in the Portal, teams can enable and manage GitOps without the complexity of manual setup.

Bringing GitOps into the AKS experience

Argo CD is widely used across Kubernetes environments, but setup often requires manual configuration across identity, networking, and registry integrations. With the Azure Portal experience, customers can:

- Enable Argo CD directly from the AKS cluster
- Configure identity, access, ingress, and registry integration in a guided flow
- Manage and monitor GitOps workflows through the Argo CD UI

This reduces onboarding friction and helps you reach your first successful GitOps deployment faster.

Trusted identity and secure access

The Argo CD experience integrates with Microsoft Entra ID to provide a secure, enterprise-ready foundation:

- Secure authentication using Workload Identity federation to Azure Container Registry (ACR) and Azure DevOps, removing long-lived credentials and hard-coded secrets
- Single Sign-On (SSO) using existing Azure identities

Enterprise-grade hardening and security

This preview includes built-in improvements to strengthen security posture:

- Images built on Azure Linux for reduced CVEs and improved baseline security
- Optional automatic patch updates to stay current while maintaining control over change management

Parity with upstream Argo CD

Argo CD in AKS remains aligned with the upstream open-source project, supporting:

- High availability (HA) configurations for production workloads
- Hub-and-spoke architectures for multi-cluster GitOps
- Application and ApplicationSet for scalable deployment across fleets

Getting Started

We invite you to explore the Argo CD experience in the Azure Portal and share feedback.
To get started, go to your AKS cluster in the Azure Portal, navigate to the GitOps experience, and select Enable Argo CD. Follow the guided setup to configure identity, access, ingress, and registry integration with secure defaults. Once enabled, you can monitor your deployment and view application health and sync status from the Argo CD UI linked in the GitOps blade.

For customers who prefer automation and scripting, the Argo CD extension is also available through the Azure CLI in public preview.

NOTE: You can choose between Flux and Argo CD as your GitOps solution based on your needs. The Argo CD option is available during the initial GitOps setup experience, while existing Flux users will continue to see their current configuration.