azure container apps
226 TopicsRegional Endpoints for Azure Container Registry Geo-Replication — Now in Public Preview
By Johnson Shi, Zoey (Zhuyu) Li, Huangli Wu What's new Regional endpoints for geo-replicated Azure Container Registries are now in public preview. See the feature's official MS Learn documentation. If you've been following since the private preview announcement, here's what changed: No feature flag registration. No subscription enrollment so all Azure subscriptions and customers can now use this feature. No CLI extension. Regional endpoints commands are built into Azure CLI 2.86.0+ natively. If you installed the private preview acrregionalendpoint extension, uninstall it to avoid conflicts. Native CLI and portal support. With Azure CLI 2.86.0+, enable regional endpoints for all geo-replicas of a registry with az acr create --regional-endpoints enabled or az acr update --regional-endpoints enabled . The Azure portal also supports configuring regional endpoints natively. CLI flag rename for configuring a geo-replica's global endpoint routing (an existing separate feature). The existing flag --region-endpoint-enabled (on az acr replication create/update ) has been renamed to --global-endpoint-routing . Key clarifications: "--global-endpoint-routing" (formerly "--region-endpoint-enabled" on "az acr replication create / az acr replication update") — controls whether a specific geo-replica participates in global endpoint routing. This is an existing feature that is different from the new registry-level "--regional-endpoints" feature being discussed in this post. "--regional-endpoints" (on az "acr create / az acr update") — enables or disables the regional endpoints feature at the registry level for all geo-replicas. This is the feature discussed in this post. See the endpoint reference for the full breakdown of the various registry endpoints (global endpoints, regional endpoints, and data endpoints). Regional endpoints are available on Premium SKU registries in all Azure public cloud regions. What are regional endpoints? Regional endpoints give you dedicated, per-region login server URLs for each geo-replica with the following URL pattern: myregistry.eastus.geo.azurecr.io myregistry.westeurope.geo.azurecr.io Regional endpoints coexist with the registry's global endpoint ( myregistry.azurecr.io ) — enabling regional endpoints doesn't disable a registry's global endpoint that is backed by Azure-managed routing. You can choose per workload: You can use the global endpoint with automatic Azure-managed routing with health-aware failover, where Azure will route your requests to the geo-replica with the best network performance profile to the client. You can use a regional endpoint when you need explicit control or routing to a specific geo-replica. Other resources: For the full background on why regional endpoints exist and the problems they solve, see the private preview blog post. For the complete operational deep dive — health-aware failover, throttling considerations, storage quota and pricing, eventual consistency, home region outage behavior, DNS propagation, private endpoint interaction, capacity planning, and monitoring guidance — see How ACR geo-replication handles failover, failback, and traffic redirection. For the behind-the-scenes engineering implementation — architectural overview and the engineering system design of the feature — see Determinism over magic: the engineering design behind Azure Container Registry Regional Endpoints. Getting started Enable regional endpoints on an existing registry: az acr update -n myregistry -g myrg --regional-endpoints enabled View all registry endpoint URLs, including the registry global endpoint, geo-replica regional endpoints, and data endpoints: az acr show-endpoints --name myregistry --resource-group myrg Using regional endpoints Authenticate to a specific regional endpoint: az acr login --name myregistry --endpoint eastus Push to a specific geo-replica. Images and tags pushed to a geo-replica via regional endpoints still propagate to all other geo-replicas under eventual consistency. docker tag myapp:v1 myregistry.eastus.geo.azurecr.io/myapp:v1 docker push myregistry.eastus.geo.azurecr.io/myapp:v1 Pull an image: docker pull myregistry.eastus.geo.azurecr.io/myapp:v1 You can specify regional endpoints directly in Kubernetes deployment manifests if you need to pin workloads to specific regions. This ensures clusters in specific regions always pull from their colocated replica, providing predictable routing and reduced latency. By using different regional endpoints in each cluster's manifests, you can choose to guarantee that each cluster pulls from its local replica instead of relying on Azure-managed routing. East US cluster deployment: apiVersion: apps/v1 kind: Deployment metadata: name: myapp-eastus spec: template: spec: containers: - name: myapp image: myregistry.eastus.geo.azurecr.io/myapp:v1 West Europe cluster deployment: apiVersion: apps/v1 kind: Deployment metadata: name: myapp-westeurope spec: template: spec: containers: - name: myapp image: myregistry.westeurope.geo.azurecr.io/myapp:v1 When to use regional endpoints Scenario What to do Most workloads Keep using the global endpoint ( myregistry.azurecr.io ). Health-aware failover handles routing automatically. Pin AKS clusters to co-located replicas Use regional endpoint URLs in deployment manifests. CI/CD push-then-pull consistency Pin pushes to a regional endpoint to avoid eventual-consistency races. Client-side failover Switch between regional endpoints based on your own health checks. Capacity planning Spread workloads across multiple regional endpoints to avoid per-replica throttling. Troubleshooting Target a specific geo-replica to reproduce or isolate an issue. What changed from private preview Private preview Public preview Feature flag registration required ( az feature register ) No registration needed Subscription private preview enrollment and propagation wait Immediately available to all Azure subscriptions for all Premium SKU registries in all Azure public cloud regions. Separate CLI extension ( acrregionalendpoint ) Built into Azure CLI 2.86.0+ natively No registry-level CLI flag az acr update --regional-endpoints enabled enables regional endpoints for all geo-replicas --region-endpoint-enabled flag for controlling a geo-replica's global endpoint routing via az acr replication update Flag for controlling a geo-replica's global endpoint routing renamed to --global-endpoint-routing No portal support Native Azure portal support for enabling regional endpoints for new registries (during creation) and for existing registries Private preview docs in Azure/acr Full documentation on MS Learn Enabling regional endpoints in the Azure portal You can enable regional endpoints directly from the Azure portal for both new registries (during creation), as well as existing registries: If you were in the private preview 1. Uninstall the CLI extension. The private preview CLI extension conflicts with the built-in commands in Azure CLI 2.86.0+. Remove it: az extension remove --name acrregionalendpoint Verify it's gone: az extension list --query "[?name=='acrregionalendpoint']" -o table 2. Ensure you're running Azure CLI 2.86.0 or later. Regional endpoints commands are available natively starting in Azure CLI 2.86.0. Check your version: az version 3. Update scripts that use --region-endpoint-enabled for controlling global endpoint routing for a geo-replica. The old flag name for controlling a geo-replica's global endpoint routing configuration is deprecated and will be removed in Azure CLI 2.87.0 (June 2026). Update to --global-endpoint-routing : # Old (deprecated) az acr replication update --registry myregistry --name westus \ --region-endpoint-enabled false # New az acr replication update --registry myregistry --name westus \ --global-endpoint-routing false Why the rename? The old flag name --region-endpoint-enabled was confusing — it sounded like it controlled the regional endpoints feature, but it actually controlled whether a geo-replica participates in global endpoint routing. The new name --global-endpoint-routing says exactly what it does. For a full breakdown of all three CLI flags and how they relate, see the endpoint reference. Learn more Full documentation: Geo-replication in Azure Container Registry — Regional endpoints — prerequisites, CLI commands, network considerations, private endpoint integration, and troubleshooting. Operational deep dive: How ACR geo-replication handles failover, failback, and traffic redirection — health-aware failover, throttling, eventual consistency, DNS considerations, monitoring, pricing, and a full walkthrough. Behind-the-scenes engineering implementation: Determinism over magic: the engineering design behind Azure Container Registry Regional Endpoints — architectural details and the engineering system design behind the feature. Endpoint reference: Azure Container Registry endpoint reference — all endpoint types, URL formats, and CLI flags in one place. Private endpoints: Connect privately to a registry using private endpoints — IP allocation math, subnet sizing, and NIC queries for registries with regional endpoints. Firewall rules: Configure firewall access rules — which FQDNs to allow for regional endpoints. Feedback We'd love to hear how you're using regional endpoints and what we can improve. Reach out via: Azure Container Registry GitHub repository — issues, feature requests, and discussion Azure portal feedback — use the feedback button in the Azure portal on your registry's page Regional endpoints are on the path to GA. Your feedback directly shapes the feature's direction.89Views1like1CommentVNet integration for Azure SRE Agent (preview)
For many production systems, the logs, databases, private endpoints, repositories, and runbooks an SRE Agent needs to do its job are behind network boundaries your security team already governs. VNet integration for Azure SRE Agent, now in preview, puts the agent's outbound traffic under those same controls - your virtual network, your NSG rules, your private DNS - so it reaches only what your network allows. The principle is one your security team already applies to every other workload: a component's network access shouldn't depend on the component behaving correctly. Identity governs what the agent can reach. Permissions and hooks shape what it does within reach. The network sits beneath both: it blocks any request to a destination you haven't allowed no matter what the agent decides. Why egress control matters Two reasons. First, the agent reads sensitive things by design. Inspecting logs, code, configuration, and internal systems is the whole point during an incident, which means you have to decide where that data can go. Open egress gives that data a path out of your network - a risk you wouldn't accept for any other production-adjacent workload. Second, it reasons over text it didn't write - logs, issue descriptions, tool output — which is how prompt injection gets in. Handling that is partly model safety, and Azure SRE Agent runs under Microsoft's Responsible AI standard with safety work from OpenAI and Anthropic. Network controls add another layer: an instruction that tries to reach a destination you haven't allowed can't run, because the network blocks it. For example, an agent investigating an outage might query Log Analytics, read deployment configuration, and call an internal runbook - all private resources. With VNet integration, those calls follow the routes, DNS, and firewall rules your workloads already use. A request to an external endpoint you haven't allowed fails at the network boundary. It doesn't depend on the model recognizing the risk and refusing; the network stops it either way. Choose an egress mode Azure SRE Agent has three egress modes, and you don't have to start at the strongest. Unrestricted - all outbound traffic allowed Limited - deny all outbound, allow an explicit list of hosts. Gives you host-level control without setting up a full VNet Azure VNet - outbound traffic goes through a delegated subnet in your network, with your NSG rules and private DNS applied. The recommended mode for production and regulated workloads. How Azure VNet mode works Outbound traffic takes one of two paths, and every call takes exactly one. Your VNet. Everything not placed on the managed path goes through a delegated subnet in your own network, where your NSG rules, private DNS, and firewall all apply. The agent is just another workload on that subnet, so it can reach what the subnet can reach: databases behind private endpoints, internal services, monitoring stores, and key vaults -the parts of production that aren't reachable from the public internet. The resources that matter most during an incident are usually the private ones. If your network connects to on-premises over ExpressRoute or VPN, the agent can reach those systems too, as long as your existing routes and rules allow it. The managed infra path. Some destinations go through Azure SRE Agent's managed infrastructure network instead - platform services the agent needs, plus optional categories you turn on: package registries, code repositories, and remote MCP servers. This path skips your VNet, so your NSG rules and Firewall Policies don't apply to it. Treat it as a deliberate exception, used only where you need it. Why public services start on the managed path Public services are hard to allow by IP address. GitHub, PyPI, npm, NuGet, apt, and the container registries run on large, changing IP ranges, and they don't map to a single Azure service tag. If your NSG filters by IP and port, keeping those lists up to date is constant work, and when a list falls behind, the agent can't pull a package or read a repository - and an investigation stalls on a networking problem that has nothing to do with the incident. Each category has a toggle: package registries (PyPI, npm, NuGet, apt), code repositories (GitHub, GitHub Enterprise, Azure DevOps), remote MCP servers, and a list of additional hostnames. Starting with these on the managed path keeps the agent working reliably without maintaining an IP allowlist. For build-time dependencies, that's usually fine. If you want this traffic inspected too, the next step is name-based (FQDN) egress filtering in your own network. Once your firewall can allow github.com and pypi.org by name, you can move these categories off the managed path and route them through your VNet instead Configure it Two decisions: the subnet, and what (if anything) uses the bypass. Navigate to Settings > Workspace Configuration > Network Choose Azure VNet as the egress mode. Select a subnet that is /28 or larger and delegated to `Microsoft.App/environments`. Decide which categories, if any, use the bypass. Restrict who can change the egress mode and bypass toggles. These settings widen or narrow the agent's reach, so govern them like any production network control. Test the outbound behavior before using the agent with production data. A reasonable setup for most enterprises during preview: use Azure VNet mode, keep package registries and code repositories on the bypass if you need reliable access to them, and route everything else through your VNet. Stricter environments can turn those categories off and rely on their own name-based firewall rules. What it doesn't cover yet VNet integration is in preview, with two limitations to know. It covers outbound traffic only - reaching the agent privately from inside your network isn't part of this preview. And connector traffic still routes over the public internet; the governance and credential isolation in Connectors V2 still apply. Use VNet integration for outbound control of the agent workspace, and combine it with identity, RBAC, tool permissions, hooks, and connector governance for a complete set of controls. Where it fits VNet integration doesn't replace identity, RBAC, tool permissions, or connector governance. It controls where traffic can go. The agent still needs the right identity and permissions to access a resource in the first place. Identity is the foundation: your RBAC assignments decide what the agent can reach. Permissions and hooks shape what it does within reach: allow/ask/deny rules control what runs, and hooks let you inspect or change a tool call before it runs. VNet integration sits underneath, controlling where traffic can go no matter what the agent tries to do. You want the agent to be capable. You also want a boundary that holds whether or not it is. Get started Create an SRE Agent - https://aka.ms/sreagent Documentation - https://aka.ms/sreagent/newdocs Recipes - https://aka.ms/sreagent/recipes Build 2026 Announcement - https://aka.ms/Build26/blog/SREAgent504Views1like0CommentsManaged Connectors for SRE Agent (preview)- Govern what your agent can do
Giving an agent access to a tool is the easy part. The harder question is what it's allowed to do with that access. "Can the agent copy a file in OneDrive?" mostly answers itself. "Can it copy any file, to any destination, over one that's already there?" is the one that decides whether the integration has a governance layer. Managed Connectors is built around that second question. It expands the catalog of tools the agent can reach - OneDrive, SharePoint, Google Drive, GitLab, Power BI, Microsoft Security Copilot, with more being added regularly - and pairs it with a governance model that keeps the policy for those tools outside the agent's control. This is part of the Azure SRE Agent announcements at Build 2026 What's new Managed Connectors is the next generation of our connector experience. It significantly expands the catalog of third-party and first-party SaaS integrations available to SRE Agent and surfaces each one to the agent as a curated set of operations through the Model Context Protocol (MCP) - the same standard the agent already uses for every other tool source. Governance: the agent gets capability, you keep control The governance model is the headline of this release, so it's worth being concrete about it. When you add a connector, you walk through a short wizard - Set up connector, Configure tools, Review & Save - and the "Configure tools" step is where the policy is set. Three things make it different from "just wire the API up to the LLM": You choose what's exposed - it isn't automatic. A connector might offer 40+ operations; in the wizard you pick the ones the agent can use. The rest aren't shown to the model, so it can't call them. Parameter policy lives outside the agent. For each selected operation you can mark parameters as user-defined (pinned to a value you specify) or agent-defined (the agent fills it in). On the Microsoft Planner “Create a task” tool, for example, you can choose the group ID from a list of your joined groups – this means that the agent provides the task details but can’t assign it to any arbitrary group, because that isn’t a parameter it sees when invoking the tool. Per-tool approval is built in. Each operation has an Allow/Ask toggle integrated directly into the creation and edit wizards. "Ask" routes the call through the agent runtime human-in-the-loop approval flow before it executes. On that same Microsoft Planner connector, you might leave read-only tools like “List tasks” or "Get plan details” on Allow, but flip “Delete a task” to Ask so a human must confirm before anything is removed. This is enforced on the agent's runtime; it is not a prompt instruction the model can be talked out of following. Credential Isolation No long-lived secrets in the agent. No API keys, no client secrets, no certificates, no OAuth tokens. All service credentials are encrypted at rest and stored outside of the agent’s trust boundary Automatic token refreshed. Once you consent, the internal connector resource keeps your tokens valid. You won't be asked to re-authenticate unless your service itself requires it. You consent once, in your own browser, with your own service. SRE Agent never proxies your password or the sign-in flow. Per-connection authorization. Each connection is bound to the specific SRE Agent instance you set up on and cannot be used by external threat actors. How it fits together All of this is stored and evaluated outside the agent loop. Each configured connector becomes an MCP server that the SRE Agent runtime registers as a tool source, the same standard wire format the agent uses for everything else, so adoption on the model side is trivial. Each layer does one job, and the trust boundary between "what the model decided" and "what was actually sent" is explicit and inspectable: the agent never sees the operations you didn't select, never sees the parameter slots you pinned, and cannot bypass approval on operations you marked Ask. How to try it Open the SRE Agent portal and go to Builder > Connectors. Pick a connector from the catalog with the “Preview” label and go through the creation wizard steps. At the “Set up connector” step, choose how the connector authenticates. Start with “OAuth” if you just want to sign-in and see it working against your own account. At “Configure tools”, select the operations you want to expose, pin any parameters that shouldn't be agent-controlled, and mark sensitive operations as “Ask.” Review & Save. The connector is registered with the runtime and immediately available to your agent. You can enable/disable specific tools or connectors in the “Capabilities” section. Edit connector – after creating the new connector, at any point you can go back and authenticate it with a different account, add or remove operations, update tool parameters and configure approval policies Resources Create new SRE Agent — https://aka.ms/sreagent SRE Agent Documentation — https://aka.ms/sreagent/newdocs SRE Agent recipes — https://aka.ms/sreagent/recipes Build 2026 SRE Agent announcements - https://aka.ms/Build26/blog/SREAgent222Views1like0CommentsShaping what Azure SRE Agent does: Tool Permissions and Hooks
When an AI agent runs against production, the first question every security team asks is "What can it do, who decided it could, and what stops it from doing something it should not." Azure SRE Agent reached general availability in March. Since then, teams inside Microsoft and customers running it against real production workloads have asked for the same thing: finer-grained controls over what the agent can do on its own and a clear answer to who governs each call that reaches a tool. Today at Build 2026, we are releasing global tool access policies as one of a set of new governance controls. This post covers how they work. Tool access policies give security and platform teams a single place to define which tools the agent can invoke, under what conditions, and what requires human approval before it runs. Underneath those policies sits the identity the agent runs as the bedrock that every other control layer depends on. It is defense in depth applied to agent behavior: layers of control, each one holding on its own, so that governing the agent is something you can read, audit, and reason about as you scale it across production. Identity is the bedrock: managed identity today, agent identity next Start here, because nothing else matters if you skip it. The identity the SRE Agent runs as, and the Azure RBAC role assignments on that identity, are the most powerful boundary the agent works inside of. If your role assignments do not grant the agent access to a resource, none of the controls below come into play, because the agent cannot reach the resource to begin with. Network rules, tool permissions, hooks, and connector contracts all sit on top of an RBAC story that you write. The features in this post add layers above that floor. They do not replace it. Today the SRE Agent operates as a managed identity, and your RBAC role assignments on that identity govern what it can do. This is the bedrock, and it is the same model your other Azure workloads already use. You assign roles, you scope them, and the agent inherits exactly what you granted and nothing more. Everything that follows assumes the bedrock is in place. With identity settled, the next question is the obvious one: where is the agent allowed to send its traffic? Permissions: govern what the agent does with a tool Identity decides what the agent can reach. Permissions decide what the agent does with the access it has, down to the individual tool. Two levels cover the range: a point-and-click grid for the common cases, and hooks when a decision needs your own code. The grid is the easy mode. Every tool the agent can use, built-in tools along with MCP servers, services, and custom tools, shows up in one searchable list with two switches. On/Off sets whether the tool is available at all; turn it off and the agent cannot use it. Allow/Ask sets what happens when it is on: Allow lets the agent run the tool automatically, Ask requires a human to approve every time, except in Autonomous mode. Select tools in bulk to flip a whole category at once, filter by category or permission, and use the Advanced permissions tab when you want rules that apply at global, per-agent, or per-thread scope instead of tool by tool. Defaults stay put until you touch them, and the engine is fail-closed: if a rule cannot be evaluated, the call is blocked rather than allowed. That covers most of what teams need. Underneath those switches are three rules, allow, ask, and deny, and the Advanced tab is where you set them by scope. Global rules apply to every agent and thread, Agent rules to one custom agent, Thread rules to a single conversation. Deny is the hard one: it blocks the tool outright no matter the run mode, and a deny at a higher scope always wins, so an Allow at thread scope cannot reopen something denied globally. That split is deliberate. A platform team sets the Global guardrails that should never be crossed and the Asks that always need a human, and service teams add their own Allow rules at Agent scope for routine work, without being able to override the guardrails above them. Platform team, Global scope: deny: bash(az * delete *) - never delete, on any agent or thread deny: bash(kubectl delete *) ask: bash(az webapp restart *) - always confirm, even in Autonomous allow: bash(az monitor *) - auto-approve monitoring queries Service team, Agent scope: allow: bash(kubectl get *) - routine read-only work allow: bash(kubectl describe *) Two details make this safe to lean on. Rules match the canonicalized tool invocation rather than the raw text, so enforcement holds no matter how the command was assembled. And fail-closed has a softer edge than a hard stop: a cached last-known-good policy covers transient failures, so a blip in the policy store blocks the call rather than silently widening access. You can find these under Capabilities > Tools The layer worth spending time on is hooks. Allow and Ask answer "should this tool run." Hooks answer "should this specific call run, given exactly what it is about to do." A hook fires before the agent runs a tool and receives the actual call, parameters and all. Your code then decides the outcome and can reshape it: rewrite parameters before they are sent, inject extra context into the pipeline as a user message so the agent reconsiders before its next step, block the call outright, or redirect the agent toward a safer path. Because your code sees the real parameters, the decision can depend on anything you can express in code: which resource the call targets, whether a value falls outside an allowed range, the time of day, the result of an external policy lookup. This is where you write the rule the grid cannot. Two kinds of hook, mixable on the same agent. Command hooks are a script you write; reach for these when code is enough. Prompt hooks put a separate LLM in the loop as a judge that evaluates the call in context; reach for these when the decision needs reasoning rather than a fixed rule. A real example from our own internal test agent: when the agent tries to list files through the shell with ls or dir, a hook blocks the call. The agent absorbs the signal, reconsiders, and reaches for the ListDir tool instead. The hook did not argue with a human. It shaped what happened next. As with the grid, configure nothing and the agent behaves exactly as it does today. Both are additive. Authoring one is a short form. You name the hook, pick the event (Pre Tool Use, so it runs before the call), and set a tool matcher, either picked from the tool menu or written as a regex like (FetchWebpage|SearchMemory) with anchors and lookaheads when you need them, so the hook fires only on the calls you care about. You set a timeout and a fail mode (Block, so a hook that errors or hangs stops the call rather than waving it through), and you write the body in Bash or Python. A command hook reads the call as JSON on stdin, the event name, the tool name, its parameters, and the call id, and answers on stdout. Print nothing and exit zero to allow. Return a block decision with a reason to stop the call, and that reason is what the agent reads back. You can also substitute: run a cheaper or safer version yourself, block the real call, and hand your own output back as the result, so the agent never runs the expensive or risky original. #!/bin/bash input=$(cat) tool=$(echo "$input" | jq -r '.tool_name') # Block one tool, with a reason the agent will read if [ "$tool" = "ExampleToolName" ]; then echo '{"decision":"block","reason":"Blocked ExampleToolName by hook policy."}' exit 0 fi # Otherwise allow: print nothing and exit 0 exit 0 You can find these under Builder > Hooks Each layer holds on its own The layers stack. Identity is the floor: your RBAC assignments decide what the agent can reach at all. Permissions, the grid and hooks together, decide what it does with a tool. You author each layer, each one holds whether or not the layer above it behaves as expected, and all of it configures through the same ARM and Bicep surface your platform team already uses, reproducible the way the rest of your Azure estate is. The upgrade path is additive and non-breaking. Existing agents keep working. Turn on each control when you are ready, in the order your governance requires. There is more coming. We run Azure SRE Agent inside Microsoft on our own production workloads, so we feel the same gaps you do, and the next round is shaped by what we hear from teams running it in production today. Which control is doing the most for you, and which one are you still waiting on? Let us know and thank you! Getting started Create new SRE Agent — https://aka.ms/sreagent SRE Agent Documentation — https://aka.ms/sreagent/newdocs SRE Agent recipes — https://aka.ms/sreagent/recipes Build 2026 Announcement - https://aka.ms/Build26/blog/SREAgent203Views0likes0CommentsPrivate Plugins with Azure SRE Agent
SRE's and platform teams are building operational skills specific to their infrastructure: investigation runbooks, compliance checks, cost analysis playbooks, deployment verification procedures. The next step is making that work reusable across every agent in the organization without exposing it publicly. Today, SRE Agent supports plugin marketplaces hosted in private GitHub repositories, including GitHub Enterprise. This is part of the Azure SRE Agent announcements at Build 2026. You can now point SRE Agent at a private repo when adding a marketplace or installing a plugin. Authentication is handled per-marketplace, and supports OAuth, GitHub PATs, and GitHub Apps for GHE tenants. From one agent to an organization’s plugin catalog Most teams start with a single SRE Agent connected to their services. The agent learns their infrastructure, runs their runbooks, and handles their incidents. It works well. Then adoption grows. A second team stands up their own agent. Then a third. Platform engineering wants every agent to run the same compliance checks. Security needs approval hooks enforced consistently. FinOps has cost governance skills that should be standard across the organization. Suddenly the question isn’t “how do I set up my agent,” it’s “how do we share operational knowledge across all of them.” Without a distribution model, teams end up copying skill files between agents manually. A platform team writes a runbook, shares it over email or a wiki link, and each service team pastes it into their agent individually. When the runbook improves, some agents get updated, some don’t. There’s no version tracking, no central catalog, and no way to know which agent is running which version of which skill. Private marketplace support solves this. How Private Plugin marketplace meet enterprise needs A platform team publishes once, every agent installs. Codify best practices as plugins in a private GitHub repo. Service teams add that repo as a marketplace in their agents and install what they need. Compliance checks, cost governance thresholds, incident playbooks, deployment verification procedures all distributed through versioned plugins. Each team retains ownership. Security controls which plugins enforce approval hooks. FinOps locks cost thresholds into parameter values. Platform engineering governs infrastructure investigation patterns. The marketplace is the distribution layer for organizational standards. Versions are pinned, updates are explicit. Each installation locks to the commit at install time. A merged PR upstream does not change any agent’s behavior. Teams promote new versions on their own schedule: validate in dev, promote to staging, then production. Different agents can run different versions simultaneously. Reuse across environments and tools. The same plugin works across dev, staging, and production agents, and can be reused by local coding agents and other services that support plugins. One source of truth, not separate copies per environment. Accessing Private Plugin marketplaces Private repo support adds authentication to the SRE Agent's plugin workflow so your agent can clone and install from repos that require credentials. Authentication is configured once per marketplace. Every plugin within it inherits the credentials. Auth method When to use Setup OAuth github.com repos your agent can already access Uses your existing GitHub connection. One click. Personal access token Private repos in other orgs on github.com Per-marketplace PAT. Scoped to just that marketplace. GitHub App GitHub Enterprise (*.ghe.com) BYO App with private key in Azure Key Vault. Short-lived tokens minted at runtime. Getting started In SRE Agent, navigate to Builder > Plugins, then click Add Marketplace and enter the URL of the private marketplace you want to connect to. Then click Connect to GitHub to complete the OAuth sign-in. Click Add and you will see the plugins available from your connected marketplace. Click on the plugin to install and in the detail view you can browse the skills packaged with the plugin. click Install to install this plugin. You can now see the skills imported from plugins from Capabilities > Skills > Custom Skills The bottom line Private repo support turns the Plugin Marketplace from a public skill catalog into your organization’s internal distribution platform for operational automation. Your team writes the plugins. Your agents install them. Your GitHub permissions control who has access. Try it yourself: create a private repo with a marketplace.json and a few skills, add it as a marketplace in your agent, and install a plugin. Resources SRE Agent documentation — https://aka.ms/sreagent/newdocs SRE Agent overview — https://aka.ms/sreagent/newdocsoverview Plugin Marketplace capability page — https://aka.ms/sreagent/newdocs/capabilities/plugin-marketplace Build 2026 SRE Agent announcements - https://aka.ms/Build26/blog/SREAgent174Views0likes0CommentsIntroducing On-demand Sandboxes for Azure Durable Task Scheduler (Private Preview)
Maybe it needs a native toolchain. Maybe it runs untrusted customer or LLM-generated code. Maybe it needs Python from a .NET orchestrator, or bursty compute that should scale to zero when the work is done. Today, we're thrilled to announce On-demand Sandboxes for Azure Durable Task Scheduler, now available in private preview. On-demand Sandboxes lets you move those individual workflow steps to managed, isolated compute while your orchestrator stays exactly where it is. Tell DTS which steps should run in isolation, provide a container image with the step code, and DTS handles provisioning, scaling, and teardown. No infrastructure to manage, no idle costs, no orchestrator changes. Sign up for On-demand Sandboxes Private Preview Today → Availability: On-demand Sandboxes targets the standalone Durable Task SDKs used outside the Azure Functions host — for apps running on Azure Container Apps, Azure Kubernetes Service, App Service, or anywhere else you self-host. The private preview supports the .NET and Python Durable Task SDKs, with additional language SDKs and Azure Functions support coming soon. What is Azure Durable Task Scheduler? The Durable Task Scheduler is a fully managed backend for durable execution on Azure. It can serve as the backend for a Durable Function App using the Durable Functions extension, or as the backend for an app leveraging the Durable Task SDKs in other compute environments, such as Azure Container Apps, Azure Kubernetes Service, or Azure App Service. For a deeper introduction, see the Durable Task Scheduler overview or the full Durable Task documentation. Why On-demand Sandboxes? Most activities belong in-process. They're fast, simple, and co-located with your orchestrator. But sometimes you hit a step that doesn't fit: it needs a native binary, a different language runtime, per-invocation isolation, or bursty compute you don't want to keep warm. On-demand Sandboxes gives you a way to handle those exceptions without spinning up dedicated infrastructure or managing scaling policies in Azure Kubernetes Service or Azure Container Apps. Activity-level granularity. Move individual steps to managed compute, not your whole app. Per-activity or per-invocation isolation. Each execution runs in a clean, microVM-backed sandbox. Ideal for untrusted code, customer plugins, or LLM-generated logic. Cross-runtime flexibility. Run a Python inference step from a .NET orchestrator. No compromise on either side. Scale-to-zero. Pay for CPU and memory per second of execution, not infrastructure that waits. No orchestrator changes. Your orchestration code and hosting model don't change at all. Here are a few scenarios where On-demand Sandboxes shines: Native toolchains. Package ffmpeg, LibreOffice, or Pandoc in a container without dragging them into your main app. CPU-heavy preprocessing. OCR, layout extraction, or image processing can scale independently of the rest of your workflow. Cross-runtime workflows. A .NET orchestrator dispatches a Python inference step. No compromises. Sandboxed code execution. Run customer plugins or LLM-generated code with a clean boundary on every invocation. Multi-tenant isolation. Tenant-specific steps get dedicated boundaries while everything else stays in-process. Bursty event-driven workloads. Steps that spike hard but rarely may not justify always-on infrastructure. Sub-second cold starts mean you get capacity when you need it without paying to keep it warm. How it works On-demand Sandboxes uses a two-part model: a worker profile in your orchestrator app that tells DTS which activities to offload, and a worker image that contains those activity implementations. Your orchestrator still calls activities the same way it always has; the decision to run one activity in a sandbox lives in the profile configuration. 1. Declare a sandbox worker profile In the app that hosts your orchestrator, define a sandbox worker profile. The profile gives DTS the container image, resource shape, concurrency setting, and activity names that should run in a sandbox: using Microsoft.DurableTask.Worker.AzureManaged.Sandbox; [SandboxWorkerProfile("code-executor")] internal sealed class CodeSandboxWorkerProfile : ISandboxWorkerProfile { public void Configure(SandboxOptions options) { options.ContainerImage = Environment.GetEnvironmentVariable("DTS_SANDBOX_IMAGE") ?? throw new InvalidOperationException("DTS_SANDBOX_IMAGE is required."); options.Cpu = "1000m"; options.Memory = "2048Mi"; options.MaxConcurrentActivities = 1; options.AddActivity(TaskNames.ExecuteCode); } } Then enable on-demand sandbox discovery when you configure the Durable Task worker in the main app: workerBuilder.AddTasks(tasks => tasks.AddAllGeneratedTasks()); workerBuilder.UseDurableTaskScheduler(options => { options.EndpointAddress = Environment.GetEnvironmentVariable("DTS_ENDPOINT"); options.TaskHubName = Environment.GetEnvironmentVariable("DTS_TASK_HUB"); options.Credential = credential; }); workerBuilder.EnableSandboxes(); Here's what the profile configuration does: SandboxWorkerProfile: a friendly profile id for this sandbox setup. It groups the activity, image, and resource settings for monitoring and reuse across deployments. ContainerImage: the container image (from your registry) that contains the activity implementations. Cpu / Memory: the resource shape for each worker instance. Sized per your activity's needs. MaxConcurrentActivities: how many activities a single worker instance can process concurrently. AddActivity: the specific activity to offload. Only activities added to a sandbox worker profile execute in DTS-managed isolated compute; everything else stays in-process. The orchestrator call site doesn't change: ExecuteCodeOutput execution = await context.CallActivityAsync<ExecuteCodeOutput>( TaskNames.ExecuteCode, new ExecuteCodeInput(pythonCode, input.CsvData)); ExecuteCode is not registered in the main app's in-process activity list. When the orchestrator calls it, DTS uses the codegen profile to route the work to the sandbox image. 2. Build the worker image The worker image is a container you own. In most apps, this worker lives in a separate project from the orchestrator host so it can have its own entry point, dependencies, and container image. It registers the activity implementations it can run and opts in to managed execution with UseSandboxWorker(): builder.Services.AddDurableTaskWorker(workerBuilder => { workerBuilder.AddTasks(tasks => { tasks.AddActivity<ExecuteCodeActivity>(); }); workerBuilder.UseSandboxWorker(); }); UseSandboxWorker() is the key line. It signals that this worker runs in DTS-managed compute. The sandbox worker does not need to configure the DTS endpoint, task hub, profile id, or credentials; DTS injects the runtime settings when it starts the container. The activity implementations themselves are standard Durable Task activities. There's nothing special about the activity code: it can call a runtime with different dependencies, such as Python and pandas, while running in an isolated container instead of in your main app's process. Package the image like any containerized service, including whatever runtimes and native tools the activity needs. Push it to your container registry (e.g., Azure Container Registry) and reference the image in the worker profile's ContainerImage option. View logs in the DTS dashboard Once your sandbox activities are running, you can view their execution logs directly in the Durable Task Scheduler dashboard. The dashboard shows real-time output from your managed workers, including stdout, stderr, and activity lifecycle events. This gives you full visibility into what's happening inside the sandbox without needing to configure external log sinks or set up your own observability pipeline. Demo Get started On-demand Sandboxes is in private preview. To get access, sign up here. We'll enable the feature on your scheduler and help you get your first sandbox activity running. Once you're in, the workflow is straightforward: declare a sandbox worker profile in your orchestrator app, build and push a worker image, and DTS takes care of the rest. Sign up for On-demand Sandboxes Private Preview Today → Documentation: Durable Task Scheduler overview Samples: Azure-Samples/Durable-Task-Scheduler Pricing: Azure Durable Task Scheduler pricing Questions, feedback, or ideas? Open an issue in the Durable-Task-Scheduler GitHub repo. We'd love to hear from you.262Views0likes0CommentsIntroducing Azure Container Apps Sandboxes: Secure Infrastructure for Agentic Workloads
Today we are announcing the public preview of Azure Container Apps Sandboxes - a new first-class resource type that gives you fast, secure, ephemeral compute environments with built-in suspend and resume. This is the underlying infrastructure on which products like Cloud sandboxes in GitHub Copilot, Foundry Hosted Agents, and Azure Container Apps Express are built, you now have the opportunity to build your solutions leveraging this infrastructure. Azure Container Apps Sandboxes unlocks two massive opportunities. For platform developers and ISVs, sandboxes give you the same isolated compute fabric that powers many Microsoft products. You get the building blocks to create your own multi-tenant platform on proven, enterprise-scale infrastructure. For AI agents, sandboxes become a self-configurable tool that lets agents extend their own capabilities on the fly. An agent can spin up a fresh sandbox in milliseconds and use it to execute untrusted code, compile source, test HTTP requests against a live app, launch a browser session, or tackle whatever needs a quick and scalable infrastructure. On one side it empowers humans to build platforms, on the other it empowers agents to build their own capabilities. Both get enterprise-grade isolation, instant startup, and snapshot-based persistence out of the box. We'll walk through the resource model, sandbox lifecycle, the features that set Sandboxes apart - like snapshots, lifecycle policies, network egress controls, volumes, and managed identities - and show you how to get started with the portal and CLI. What Are Container Apps Sandboxes? Container Apps Sandboxes are secure, isolated compute environments that start in sub-second time, scale to thousands, and cost nothing when idle. Each sandbox runs in its own hardware-isolated microVM boundary - fully separated from the host, the platform, and every other sandbox. You bring your own Open Container Initiative (OCI) image, and Sandboxes handle the rest: provisioning from prewarmed pools, strong multi-tenant isolation, and snapshot-based suspend/resume that preserves full memory and disk state across sessions. There are many ways Sandboxes can help you build your next project - here are a few: Your own build & test systems - wire a Sandbox into your CI/CD flow to run builds while your laptop stays cool. Agents that can run anything safely - an agent spawns a sandbox, executes work inside it, and returns the output with no agent host privileges required. Agent swarms - decompose a research question, spawn N sandbox workers in parallel (each pinned to its own image and egress policy), and synthesize the result. Early access customers are already unlocking significant benefits by leveraging Azure Container Apps Sandboxes. "With Azure Container Apps sandboxes, SitecoreAI can safely enable agents to take real action. The combination of multi-tenant isolation, rapid scale-out, and full automation allows Sitecore to run long-lived, autonomous agents that securely execute code, manage workflows, and interact with enterprise systems within secure, governed environments. With this foundation, we can build agents that do real work: assembling content, personalizing experiences, and optimizing campaigns in production. Agents that operate continuously, learn from results, and improve over time, so our customers get better outcomes without giving up control." - Mo Cherif, VP of AI and Innovation, Sitecore "We got early access to Azure Container Apps Sandboxes, and got the first prototype integrated with Atlas AI in hours, and it's already shaping a new Atlas AI capability that we plan to launch in preview in Q3. It gives every Atlas AI agent a safe, sandboxed workspace (file system, terminal, code execution) on a customer's live data in Cognite Data Fusion. The value: Industrial process, reliability, and production engineers spend days and weeks on questions like "which wells are underperforming and why?" These questions are tractable but expensive, so they are asked rarely and decisions are made on gut feel. With this, an agent pulls the data, runs the analysis, cross-references maintenance and inspection records, and returns a cited draft in minutes. Sandboxes make it practical: Aligned feature set, per-customer isolation, pause/resume across multi-day investigations, scale-to-zero economics." - Kelvin Sundli, Product manager, Atlas AI, Cognite Resource Model: Sandbox Groups and Sandboxes The top-level ARM resource is Microsoft.App/SandboxGroups. A Sandbox Group is the management boundary for a collection of sandboxes that share configuration - think of it like a Container Apps Environment, but purpose-built for sandboxes. When you create a Sandbox Group, you specify: Subscription, Resource Group, and Region Sandbox defaults (optional): default CPU, memory, disk, max sandbox count, and default idle timeout Networking: optionally deploy into a custom VNet with a dedicated subnet for private networking Identity: System or user assigned Entra identity. Individual sandboxes are created within a Sandbox Group. Each sandbox has its own source (disk image or snapshot), resource tier, lifecycle policy, network egress policy, environment variables, ports, volumes, and connections. Sandbox Lifecycle Sandboxes have a well-defined lifecycle with the following states: State Description Creating Provisioning the sandbox from a disk image or snapshot Running Actively executing - backed by a live microVM Idle System-suspended after inactivity; can auto-resume on the next request Suspended Full state (memory + disk) preserved as a snapshot; no compute costs Resuming Restoring from a suspended or idle state - sub-second for most workloads Stopped User-initiated stop; can be resumed Stopping Graceful shutdown in progress Deleting Teardown in progress The key insight here is the distinction between Idle and Suspended. When a sandbox goes idle (e.g., no traffic for a configured timeout), the system can automatically suspend it and capture a snapshot. When a new request arrives, the sandbox resumes transparently. This gives you scale-to-zero economics with stateful compute - something that wasn't possible before without significant custom engineering. Disk Images: Bring Your Own Container Sandboxes boot from Disk Images - Open Container Initiative (OCI) images converted into an optimized root filesystem format. You point to any OCI image (public or private registry), and the platform builds a bootable disk image from it. You can start with public, pre-built images maintained by the platform (for example, Ubuntu base images), or bring your own private images. For private registries, you can authenticate with username/token or use a user-assigned managed identity for Azure Container Registry (ACR) – integrated with Azure as you expect. Snapshots: Full-State Persistence Snapshots capture the complete state of a running sandbox - memory, disk, and all running processes. When you resume a sandbox from a snapshot, every process, open file handle, and in-memory data structure is restored exactly as it was. A snapshot captures the full state of a running sandbox: memory pages, disk, processes. Two ways to make one - automatically on suspend, or manually on demand. Three things they're great for: Checkpointing mid-task so a long-running agent can resume exactly where it left off Cloning an environment that's already warm - dependencies installed, caches populated, services running Shipping a "ready-to-go" state that resumes in sub-second instead of cold-booting Snapshots are free during the preview, after which they will be stored as Azure Blob Storage at standard rates. Each snapshot records the source sandbox, resource allocation (CPU, memory, disk), and container metadata - so what you get back is exactly what you snapshotted. Resource Tiers Every sandbox is assigned to a resource tier that determines its CPU, memory, and disk allocation: Tier CPU Memory Disk XS 0.25 vCPU 0.5 GB 5 GB S 0.5 vCPU 1 GB 10 GB M (default) 1vCPU 2 GB 20 GB L 2 vCPU 4 GB 40 GB XL 4 vCPU 8 GB 80 GB When creating a sandbox from a snapshot, the resource tier is inherited from the snapshot and cannot be changed - this ensures the restored environment has the exact resources it was running with when the snapshot was taken. Lifecycle Policies: Auto-Suspend and Auto-Delete Every sandbox can be configured with lifecycle policies that automate state transitions and cleanup: Auto-Suspend Idle timeout: How long a sandbox can sit idle before being suspended (configurable: 1m, 2m, 5m, 10m, 30m, 60m) Suspend mode: Disk + Memory (default): Full snapshot including memory state - resume picks up exactly where you left off, with all processes and in-memory data intact. Disk: Only the disk is preserved; the VM restarts fresh on resume. Useful when you only need file persistence, not process continuity. Auto-Delete Automatically delete sandboxes after a configurable number of days of inactivity Prevents accumulation of abandoned sandboxes that consume snapshot storage These lifecycle policies are what make Sandboxes economically viable at scale. A platform serving thousands of tenants can configure aggressive idle timeouts (say, 60 seconds) with Memory suspend mode, and each tenant's sandbox disappears from the billing meter almost immediately - but resumes in sub-second time the moment they return. Network Egress Policy For scenarios involving untrusted code - AI agents executing LLM-generated scripts, multi-tenant SaaS with user-submitted workloads - controlling outbound network access is critical. Sandboxes provide a per-sandbox Network Egress Policy: Default action: Allow or Deny all outbound traffic Host rules: Domain-pattern rules (e.g., *.github.com → Allow) to permit specific destinations Custom CIDR rules: Network-level rules for IP ranges (e.g., 10.0.0.0/8 → Deny) Skip egress proxy: Option to bypass the egress proxy entirely when custom VNet routing handles policy enforcement This means you can run a sandbox in a deny-by-default posture and allowlist only the specific endpoints it needs (your API server, a package registry, etc.) - without setting up NSGs or firewall appliances. Managed Volumes: Persistent and Shared Storage Sandboxes support two types of mountable volumes, both managed by Microsoft: Volume Type Backed By Best For Managed Azure Blob Azure Blob Storage Shared data across sandboxes, file uploads/downloads, persistent artifacts Managed Data Disk Azure Disk Storage High-performance storage for databases, build caches, large working sets - only available to one sandbox at a time Blob volumes come with a built-in file explorer in the portal - you can browse, upload, download, create folders, and drag-and-drop files directly. Data Disk volumes provide dedicated block storage with configurable sizes. Secrets and Identity Secrets Sandbox Groups support key-value secrets scoped to the group. Secrets can be created, edited, and referenced by sandboxes within the group. These secrets can be used in egress policies to modify requests with transform or header-injection rules, without exposing the secrets to code running inside the sandbox. Managed Identity Sandbox Groups support both system-assigned and user-assigned managed identities, with full RBAC role assignment management. This means your sandboxes can authenticate to Azure services (Key Vault, Storage, Cosmos DB, etc.) without managing credentials - the same identity model you use everywhere else in Azure. MCP Connectors and Triggers ACA Sandboxes now supports managed connectors through the Model Context Protocol (MCP), giving sandboxes access to external APIs - including Microsoft 365, Salesforce, ServiceNow, GitHub, and 1,400+ other systems - without managing credentials directly. Attach a Connector Gateway to your sandbox group, and every sandbox in the group can call external APIs through a standardized MCP interface at runtime. Pair connectors with triggers to build event-driven automation: route an Outlook email to a sandbox that triages it with an AI agent, or react to a SharePoint file upload by extracting and processing the document all without writing glue code. Triggers can fire a shell command inside a sandbox or invoke an HTTP endpoint the sandbox exposes, so your automation shapes fit naturally around your workload. The integration is built on the new Connector Namespace service (az connector-namespace), the same runtime behind Logic Apps and Power Platform connectors, now available as a programmable layer for sandboxes. See the end-to-end samples for runnable azd up-deployable examples covering email triage and document automation scenarios. The Portal Experience Azure Container Apps Sandboxes are only available in the new Azure Container Apps portal that provides a rich, IDE-like experience for working with sandboxes. Creating a Sandbox The portal offers multiple creation paths: Standard Sandbox - full configuration control over source, resources, lifecycle, networking, and volumes GitHub Copilot Sandbox - preset, Copilot CLI ready to go, GitHub credentials can be wired through the Access Token before the sandbox is created Claude Sandbox - Claude CLI pre-installed, ready for agentic coding inside the sandbox Using Coding Agents (Copilot CLI / Claude Code) If you live inside Copilot CLI or Claude Code, you don't need to learn a new CLI. Install the azure-sandbox skill once and your agent picks up the right skills: # GitHub Copilot CLI # Add as a plugin marketplace /plugin marketplace add microsoft/azure-container-apps # Install all skills /plugin install sandboxes@Azure-Container-Apps # Claude Code claude plugin add microsoft/azure-container-apps The skill runs prerequisite checks silently (az --version, az account show, node --version, aca --version), prompts only if something's missing, and maps natural-language asks to the right aca commands. Bundled runbooks cover Copilot CLI BYOK (bring your own Azure OpenAI key), the deploy-a-web-app walkthrough, and shell setup. Sandbox Detail Page Once your sandbox is running, the detail page gives you immediate access to the sandbox terminal and additional details, such as - Network Audit - real-time egress traffic log showing allowed and denied requests Monitor - live CPU, memory, disk, and network utilization charts Connectors - attached connections with an "Add" action Volumes - mounted volumes with an "Add" action Log Stream - streaming container logs Processes - running process list inside the sandbox Files - file explorer to browse the sandbox filesystem The toolbar actions let you manage the state of the sandbox - Resume or Stop. In the Ellipsis menu (⁝) you can find additional settings to manage network Egress Policy and ingress (Add port), take a Snapshot of the sandbox, Commit (save disk state as a new disk image), set Lifecycle Policy or permanently Delete the sandbox. Finally, you can see additional Details in a side panel. Getting Started with the CLI and Python SDK All sandbox and sandbox-group operations go through the aca CLI. There are no az containerapp sandbox commands, - az is only used for az login, az account show, and resource-group management. Install (CLI) # Mac, Linux curl -fsSL https://aka.ms/aca-cli-install | sh # Windows irm https://aka.ms/aca-cli-install-ps | iex Run aca --help to get started. Install (Python SDK) pip install azure-containerapps-sandbox For more details, quick start and examples on ACA CLI and Python SDK, please go to https://sandboxes.azure.com Evolution from Dynamic Sessions If you've used Azure Container Apps Dynamic Sessions, Sandboxes are the next evolution of that capability. Everything Sessions can do, Sandboxes can do - and significantly more: Capability Dynamic Sessions Sandboxes Sub-second startup ✓ ✓ Strong isolation ✓ ✓ Custom container images ✓ ✓ Custom VNet integration ✓ (Partial) ✓ Suspend/resume with Memory and Disk snapshots - ✓ Lifecycle policies (auto-suspend, auto-delete) - ✓ Network egress policy (per-sandbox) - ✓ Persistent managed volumes (Blob, Data Disk) - ✓ Managed identity (system + user-assigned) - ✓ Secrets management - ✓ Configurable resource tiers - ✓ Direct access to sandbox in Portal experience - ✓ We will continue to support Dynamic Sessions, but all new investment goes into Sandboxes. If you're building new workloads on isolated ephemeral compute, start with Sandboxes. How It All Fits Together ACA Sandboxes is a platform primitive. It's the foundation on which multiple Microsoft products are already built - including ACA Express, Cloud sandboxes in GitHub Copilot, and Foundry Hosted Agents. When you build on Sandboxes, you're building on the same infrastructure that powers Microsoft's own portfolio. This is the evolution of what we shared with Project Legion in 2024. Legion described the internal infrastructure; Sandboxes exposes it as a customer-facing primitive that you can use directly. What's Next • Deeper Azure integrations - first-class connectivity with Azure networking, identity, storage, and AI services • Enhanced SDK and CLI - richer programmatic experiences for managing sandboxes at scale • More Microsoft services built on Sandboxes - this is just the beginning Get Started Today • Portal: https://sandboxes.azure.com/ • Documentation: Azure Container Apps Sandboxes • Pricing: Azure Container Apps Pricing (per-second vCPU/memory billing, scale-to-zero, snapshots at Blob Storage rates) We'd love to hear your feedback. You can ask questions, or file issues on the Azure Container Apps GitHub (prefix with [Sandbox] for Sandboxes-specific issues).1.8KViews1like0CommentsWhat's new in Azure Container Apps at Build'26
Azure Container Apps (ACA) is a fully managed serverless container platform that enables developers to build and deploy microservices and modern applications without requiring container expertise or needing infrastructure management. ACA provides built-in autoscaling (including scale to zero), per-second billing, advanced networking, built-in observability, and simplified developer experiences across multiple programming languages and frameworks. The world of application development is shifting rapidly. Agentic AI is fundamentally changing the requirements of cloud platforms - more code is being written by AI, more apps are being deployed by agents, and more deployment stacks are being assembled autonomously. Platforms are aligning to two concurrent demands: hosting intelligent agents as first-class workloads, and giving those same agents access to empty, secure compute pools as tools they can invoke on demand. At the same time, the proliferation of AI-generated code means that platforms must offer strong isolation for untrusted workloads, instant provisioning for rapid iteration, and production-grade defaults that make the right thing the easy thing - for both humans and agents. Azure Container Apps is purpose-built for this new reality. Whether you're a developer shipping a web app in minutes or an agent spinning up ephemeral sandboxes for code execution, ACA provides the serverless foundation that meets both audiences where they are. Customers across industries are betting on ACA as the compute foundation for their AI and cloud-native workloads: Replit runs its agent-driven software creation platform on Azure, enabling enterprises like Hexaware to securely build and deploy AI-generated applications at scale with seamless procurement through Azure Marketplace. LayerX built its Ai Workforce document processing platform on Azure Container Apps, Azure OpenAI, Azure AI Search, and Cosmos DB - helping clients like Mitsui & Co. save 570 hours annually by automating manual document tasks. SJR built GX Manager with Microsoft Foundry to automate website personalization at scale - delivering production-grade, data-grounded content in seconds instead of hours of manual curation. August AI powers an AI health companion serving over 3.5 million customers on Azure infrastructure, scoring 100% on the U.S. Medical Licensing Examination and delivering potentially life-saving medical support. Photon Education created Classwise on Azure OpenAI and Foundry with Defender for Cloud security, enabling teachers to prepare lessons faster and engage students more effectively in inclusive learning environments. Microsoft Foundry Agent Service is built directly on Azure Container Apps, serving over 20,000 customers with a dedicated agent runtime that handles fast startup, tool execution, long-running operations, and enterprise-grade isolation at scale. Following the features announced at Ignite'25 and our continued momentum through early 2026, we're excited to share what's new at Build'26. This release deepens our commitment to the agentic era with new primitives for secure ephemeral compute, the fastest path from container to production, a reimagined portal experience, and continued investment in security, observability, and developer productivity. Azure Container Apps Sandboxes (Public Preview) Teams building agentic applications, multi-tenant platforms, development environments, and CI/CD systems have often had to stitch together custom infrastructure to run untrusted code safely, preserve state across sessions, and handle bursty demand without paying for idle capacity. Azure Container Apps Sandboxes addresses that challenge with a new first-class resource type that provides fast, secure, ephemeral compute environments with built-in suspend and resume capabilities. Each sandbox runs in its own hardware-isolated microVM boundary, supports standard OCI container images, and starts in sub-second time. Sandboxes can preserve memory, disk state, and preloaded libraries in a snapshot, so workloads resume quickly from the same point without incurring a cold-start reload penalty. Why Sandboxes are perfect for agents Agents can safely run AI-generated code in isolated environments with instant startup. Agents also accumulate context, intermediate results, and working state during long-running tasks. With sandbox snapshots, agents get persistent, isolated workspaces that survive across task boundaries - they can suspend and resume as needed, preserving full execution context including memory and disk. Key capabilities Sub-second startup - provision and execute immediately Hardware-isolated microVMs - strong security boundary for untrusted code Snapshot and resume - full state preservation (memory + disk) across sessions OCI container image support - bring any container Scale to zero, scale to thousands - consumption pricing with per-second billing This is the underlying infrastructure on which products like Cloud Sandbox in GitHub Copilot, Foundry Hosted Agents, and Azure Container Apps Express are built. ACA Sandboxes joins the Container Apps family alongside Apps, Jobs, Functions, and Dynamic Sessions as a foundational building block for the next generation of cloud and AI application workloads. Learn more about Azure Container Apps Sandboxes at https://aka.ms/aca/sandboxes Azure Container Apps Express (Public Preview) We recently launched Azure Container Apps Express in public preview - the simplest and fastest way to launch and scale powerful applications on Azure, from zero to hyperscale, without infrastructure decisions. It represents the first Azure compute platform purpose-built for agent and developer use alike. Express is based on years of experience running Azure Container Apps at scale. We've learned that most developers working on web apps, APIs, and agents want to deploy quickly, have automatic scaling, and avoid dealing with complex infrastructure. Express provides these capabilities - it sets up your environment in seconds, handles any amount of traffic, and removes complicated settings. This helps teams move from writing code to having a production-ready app in minutes, not hours. What makes Express different Instant provisioning - your app is running in seconds, not minutes Sub-second cold starts - fast enough for interactive UIs and on-demand agent endpoints Scale to and from zero - automatic, no configuration required Per-second billing - pay only for what you use, no environment provisioning fee Production-ready defaults - autoscaling, managed identity, secrets management, custom domains, container registry integration, revision management, and built-in observability Purpose-built for custom agents Agents need to spin up application endpoints on demand - fast, reliably, and without pre-provisioning infrastructure. Express is purpose-built for this pattern: it provisions in seconds, scales from zero instantly when an agent triggers a workload, and scales back down when the task is complete. Whether an agent is deploying a tool-use endpoint, standing up a temporary API for a multi-step workflow, or launching a web UI for human-in-the-loop review, Express gives it a production-grade, internet-reachable application with zero operational overhead. It's the fastest path from "an agent decided to deploy something" to "it's live and serving traffic." Learn more about Azure Container Apps Express at https://aka.ms/aca/express/launch-blog New Azure Container Apps Portal You open the Azure portal and want to deploy a Container App. Ten minutes later you're three blades deep, toggling settings you don't understand, wondering which workload profile is best before you even have an app. We built a different portal. One where deploying a container app takes less time than reading this paragraph. One where creating an Azure Container App is a single click. And one where experimental features ship weekly, not quarterly. Smart defaults, advanced when you need Developers care about outcomes - where their app is running and how to reach it - not starting with a configuration form. The new portal offers three creation modes to keep setup simple: Simple "one-click create" - auto-generates a unique name and provisions your app. Provide the container image and egress settings. That's it - no environment type selection, networking decisions, or container registry configuration. Advanced create - unlocks everything: custom VNets with subnet selection, managed identity for registry auth, lifecycle policies, egress controls, environment variables, custom scale rules, and more. It's a toggle at the top of the same form, not a separate workflow. Express App (Preview) - the new kind of ACA application that provisions and starts almost instantly. Observe quickly, act faster The app overview page surfaces critical information at a glance - including a unified Log Stream that brings app and system logs together in one place. Getting to the root cause now takes fewer clicks, and next steps are always one click away. Faster releases, direct feedback loop Azure Container Apps Express (Preview) and Azure Container Apps Sandboxes (Preview) are currently available only in this new portal. We ship weekly - often more. Upcoming Portal Features in settings give you an easy way to opt in to early access features and share feedback directly. Security: Defender for Cloud Serverless Containers Posture and Confidential Compute Security remains a top priority as enterprises run more sensitive and regulated workloads on Azure Container Apps. At Build'26, we're announcing two key security milestones. Public Preview: Defender for Cloud Serverless Containers Posture on Azure Container Apps Customers can now bring Azure Container Apps environments into Microsoft Defender for Cloud's Serverless Containers Posture experience, helping security teams extend posture management across more of their container estate from a single workflow. This makes it easier to gain visibility into Container Apps resources and assess risks across areas such as identity, networking, and container or image configuration. With this capability, teams can more consistently evaluate risk across container environments and use attack path analysis to identify potential exposure faster. The result is a more unified security posture, less manual effort, and stronger confidence when securing Container Apps deployments. Serverless Containers Posture is available as part of the Defender CSPM plan. Learn more at the Defender for Cloud documentation. General Availability: Confidential Compute for Azure Container Apps Confidential Compute in Azure Container Apps is now generally available, providing hardware-backed Trusted Execution Environments (TEEs) through workload profiles. This extends protection to data in use - in addition to data at rest and in transit - enabling teams to run higher-trust workloads with stronger isolation for sensitive data. With confidential computing now GA, Azure Container Apps becomes more viable for regulated, financial, healthcare, and other high-trust scenarios where organizations need hardware-enforced isolation that protects in-memory data, including from the underlying infrastructure. There is no extra charge for confidential compute workload profiles. Learn more at the Azure Confidential Computing documentation. Observability: HTTP Traffic Logs and OpenTelemetry Destinations Knowing what's happening inside your application is essential to running production workloads with confidence. At Build'26, we're announcing two enhancements that give teams deeper visibility and more flexibility in where they send telemetry. Monitor HTTP traffic in Azure Container Apps Azure Container Apps now adds a dedicated Azure Monitor diagnostic setting category - ContainerAppHTTPLogs - that exposes detailed HTTP access logs for incoming traffic. This capability is designed for high-volume request data, enabling teams to troubleshoot ingress and request-flow issues with much greater precision. With HTTP traffic logs, you can now investigate: Failed requests and error codes Latency patterns and outliers Retries and WebSocket disconnects Routing behavior and backend connectivity The result is faster issue resolution, less operational friction, and stronger confidence in running high-traffic, business-critical applications. Standard Azure Monitor log volume charges apply. Learn more at Azure Monitor pricing. Additional OpenTelemetry Destinations: New Relic, Dynatrace, Elastic Azure Container Apps enhances its managed OpenTelemetry (OTel) capabilities by expanding support for third-party observability platforms. This update introduces additional endpoint options for commonly used monitoring tools - New Relic, Dynatrace, and Elastic - extending the existing managed OpenTelemetry experience. Teams can now use a more consistent OpenTelemetry-based pipeline across Azure Monitor, Datadog, New Relic, Dynatrace, Elastic, and any OTLP-compatible endpoint, with less configuration overhead and more flexibility to route logs, metrics, and traces where they need them - without deploying or managing their own collectors. No extra charge applies. Learn more at the OpenTelemetry agents documentation. Additional Enhancements and Ecosystem Updates Beyond the headline announcements, Azure Container Apps continues to evolve with a steady cadence of improvements across the platform. Override Scale Rules in Azure Functions on Azure Container Apps Azure Functions on Container Apps has traditionally used platform-managed scaling, where triggers are automatically translated into KEDA scale rules. With the new allowScalingRuleOverride property, customers can now choose to override platform-managed scaling and define their own custom KEDA scaling rules. This enhancement is especially useful for scenarios where automatically generated KEDA rules lead to unintended scaling behavior, where workloads require custom thresholds or concurrency tuning, or where teams need standardized scaling policies across services. It works with any of the 60+ KEDA scalers - Service Bus, Kafka, PostgreSQL, HTTP concurrency, Cron, and more. Heroku Migration to Azure Container Apps With Heroku entering maintenance mode, Azure Container Apps is a natural landing zone for Heroku workloads. New guidance and tooling makes the migration path straightforward - from understanding why ACA is the right next step to a practical migration guide for hands-on implementation. Dapr v1.16 Platform Upgrade Azure Container Apps completed a staged platform upgrade to Dapr v1.16.4, bringing modernized actor scheduling, improved scalability for reminders, and updated TLS/security internals. The upgrade is fully platform-managed, with minimal customer action required for most workloads. Running AI Models on ACA Serverless GPUs The community continues to push the boundaries of what's possible with serverless GPUs on ACA. Recent highlights include running Gemma 4 with Ollama for fully private, self-hosted inference, and deploying ComfyUI for text-to-image and text-to-video workloads - all with scale-to-zero and per-second billing. Hosting Remote MCP Servers on ACA Azure Container Apps is emerging as the preferred platform for hosting Model Context Protocol (MCP) servers. With serverless scaling, idle billing, HTTP/1.1 and HTTP/2 support, and managed identity integration, ACA provides a production-ready environment for exposing tools and APIs to AI agents. Multiple tutorials and guides are now available for deploying MCP servers on ACA, including integration with Azure API Management. App Modernization with GitHub Copilot GitHub Copilot App Modernization can dramatically reduce the time required to modernize legacy applications and deploy them to ACA. A recent walkthrough demonstrated upgrading a classic ASP.NET MVC app on .NET Framework to .NET 10 and deploying it to Azure Container Apps in hours - with managed identity and Key Vault integration enabled by default. Azure Skills Repository for Container Apps The new Azure Skills repository includes comprehensive skills specifically for Azure Container Apps - covering troubleshooting, best practices, architecture patterns, security, deployment, and integration. These skills are designed to be used by AI agents and developer tools like GitHub Copilot CLI, providing rich context for building, deploying, and operating ACA workloads. It's another example of how the ACA ecosystem is evolving to be agent-native. Docker Compose for Agents Docker Compose for Agents on Container Apps (public preview) brings the familiar Compose workflow to agentic applications. Declare models, agents, and MCP tools in a single compose.yaml file and deploy unchanged from laptop to cloud - supporting LangGraph, Vercel AI SDK, Spring AI, CrewAI, and other frameworks. Learn more at the Compose for Agents documentation. What's Next Azure Container Apps is redefining how developers and agents build, deploy, and operate intelligent applications. With Sandboxes for secure ephemeral compute, Express for instant provisioning, a reimagined portal for streamlined management, and continued investment in security and observability - ACA provides the ideal foundation for the agentic era. The features announced at Build'26 deepen our commitment to making Azure Container Apps the platform where both humans and AI agents can ship production workloads with confidence, speed, and minimal operational overhead. Also, if you're at Build, come see us at the following sessions: Breakout 221: Idea to production-ready agent in seconds on AI-native runtime Demo 312: Multi-agents in action with 3 AI agents, 3 frameworks, tools & models Lab 580: Build and deploy reasoning agents with NVIDIA Nemotron and Foundry Lightning Talk 453: Building an End‑to‑End Enterprise AI Platform on Azure Or come visit us at the Azure Application Services booth #44. Visit our GitHub page for feedback, feature requests, or questions. Check out our roadmap to see what we're working on next. We look forward to hearing from you!408Views1like0CommentsIntroducing Azure Container Apps Express!
Three years ago, a 15-second cold start was industry-leading. Today, developers and AI agents expect sub-second. The speed bar has moved, and the tooling needs to move with it. After running Azure Container Apps for years, we've learned something important: for most developers, the ACA environment is an unnecessary construct. It adds provisioning time, configuration surface, and cognitive overhead — when all you really want is to run your app with scaling, networking, and operations handled for you. At the same time, a new class of workloads has emerged. Agent-first platforms — systems where AI agents deploy endpoints on demand, spin up tool-use APIs, and tear them down when work is done — demand an even more radical focus on speed and simplicity. Every second of provisioning delay is wasted agent productivity. Today, we're launching Azure Container Apps Express in Public Preview — the fastest, simplest way to go from a container image to an internet-reachable app on Azure, ready for many production-style workloads. What Is ACA Express? ACA Express removes the infrastructure decisions. There's no environment to provision, no networking to configure, no scaling rules to write. You bring a container image, Express handles everything else. Behind the scenes, Express runs your container on pre-provisioned capacity with sensible defaults baked in — so you skip environment setup without giving up ACA's serverless model. There's more coming in this space soon — keep watching. Here's what that means in practice: Instant provisioning — your app is running in seconds, not minutes Sub-second cold starts — fast enough for interactive UIs and on-demand agent endpoints Scale to and from zero — automatic, no configuration required (full scaling controls coming soon) Per-second billing — pay only for what you use Production-ready defaults — ingress, secrets, environment variables, and observability are built in Express is purpose-built for two audiences: developers who want to ship fast (SaaS apps, APIs, web dashboards, prototypes) and agents that deploy on demand (MCP servers, tool-use endpoints, multi-step workflow APIs, human-in-the-loop UIs). If you've ever waited for an ACA environment to provision, only to realize you didn't need half of the configuration options it asked you for — Express is your answer. What You Can Do Today Note: West Central US is currently the only available region. We will expand to new regions through the coming days. Express is in Public Preview starting today. It's a deliberate early ship — there's a meaningful feature gap compared to the existing Azure Container Apps offering, and we're filling it fast. New capabilities are landing on a rapid cadence throughout the preview, and by Microsoft Build in June, Express should be close to feature-complete. For the current list of supported features, known gaps, and what's on the way, see the Express documentation. We'd rather put valuable technology in your hands early and iterate with you than wait behind closed doors for perfection. Who Is Express For? Scenario Why Express SaaS apps and APIs Deploy and scale without infrastructure planning AI app frontends Chat UIs and copilot frontends that scale with usage spikes MCP servers Expose API endpoints for AI agents in seconds Agent workflows Spin up endpoints on demand, tear down when done Prototypes and startups Go from idea to production in minutes Web dashboards Internal tools with instant availability Get Started Express is available now in Public Preview. Try it: Azure Container Apps Express overview — concepts, capabilities, and the current feature support matrix Deploy your first app with the Azure CLI — step-by-step quickstart New Azure Container Apps Portal — create and manage Express apps alongside your existing Container Apps resources Have questions? Check the Azure Container Apps Express FAQ for answers to common questions about pricing, limits, regions, and the road to GA. We're building Express in the open and we want to hear from you. Tell us what features matter most, what works, and what doesn't — reach out on the Azure Container Apps GitHub or in the comments below.14KViews7likes6CommentsEven simpler to Safely Execute AI-generated Code with Azure Container Apps Dynamic Sessions
AI agents are writing code. The question is: where does that code run? If it runs in your process, a single hallucinated import os; os.remove('/') can ruin your day. Azure Container Apps dynamic sessions solve this with on-demand sandboxed environments - Hyper-V isolated, fully managed, and ready in milliseconds. Thanks to your feedback, Dynamic Sessions are now easier to use with AI via MCP. Agents can quickly start a session interpreter and safely run code - all using a built-in MCP endpoint. Additionally - new starter samples show how to invoke dynamic sessions from Microsoft Agent Framework with code interpreter and with a custom container for even more versatility. What Are Dynamic Sessions? A session pool maintains a reservoir of pre-warmed, isolated sandboxes. When your app needs one, it’s allocated instantly via REST API. When idle, it’s destroyed automatically after provided session cool down period. What you get: Strong isolation - Each session runs in its own Hyper-V sandbox - enterprise-grade security Millisecond startup -Pre-warmed pool eliminates cold starts Fully managed - No infra to maintain - automatic lifecycle, cleanup, scaling Simple access - Single HTTP endpoint, session identified by a unique ID Scalable - Hundreds to thousands of concurrent sessions Two Session Types 1. Code Interpreter — Run Untrusted Code Safely Code interpreter sessions accept inline code, run it in a Hyper-V sandbox, and return the output. Sessions support network egress and persistent file systems within the session lifetime. Three runtimes are available: Python - Ships with popular libraries pre-installed (NumPy, pandas, matplotlib, etc.). Ideal for AI-generated data analysis, math computation, and chart generation. Node.js - Comes with common npm packages. Great for server-side JavaScript execution, data transformation, and scripting. Shell - A full Linux shell environment where agents can run arbitrary commands, install packages, start processes, manage files, and chain multi-step workflows. Unlike Python/Node.js interpreters, shell sessions expose a complete OS - ideal for agent-driven DevOps, build/test environments, CLI tool execution, and multi-process pipelines. 2. Custom Containers — Bring Your Own Runtime Custom container sessions let you run your own container image in the same isolated, on-demand model. Define your image, and Container Apps handles the pooling, scaling, and lifecycle. Typical use cases are hosting proprietary runtimes, custom code interpreters, and specialized tool chains. This sample (Azure Samples) dives deeper into Customer Containers with Microsoft agent Framework orchestration. MCP Support for Dynamic Sessions Dynamic sessions also support Model Context Protocol (MCP) on both shell and Python session types. This turns a session pool into a remote MCP server that AI agents can connect to - enabling tool execution, file system access, and shell commands in a secure, ephemeral environment. With an MCP-enabled shell session, an Azure Foundry agent can spin up a Flask app, run system commands, or install packages - all in an isolated container that vanishes when done. The MCP server is enabled with a single property on the session pool (isMCPServerEnabled: true), and the resulting endpoint + API key can be plugged directly into Azure Foundry as a connected tool. For a step-by-step walkthrough, see How to add an MCP tool to your Azure Foundry agent using dynamic sessions. Deep Dive: Building an AI Travel Agent with Code Interpreter Sessions Let’s walk through a sample implementation - a travel planning agent that uses dynamic sessions for both static code execution (weather research) and LLM-generated code execution (charting). Full source: github.com/jkalis-MS/AIAgent-ACA-DynamicSession Architecture Travel Agent Architecture Component Purpose Microsoft Agent Framework Agent runtime with middleware, telemetry, and DevUI Azure OpenAI (GPT-4o) LLM for conversation and code generation ACA Session Pools Sandboxed Python code interpreter Azure Container Apps Hosts the agent in a container Application Insights Observability for agent spans The agent implements with two variants switchable in the Agent Framework DevUI - tools in ACA Dynamic Session (sandbox) and tools running locally (no isolation) - making the security value immediately visible. Scenario A: Static Code in a Sandbox - Weather Research The agent sends pre-written Python code to the session pool to fetch live weather data. The code runs with network egress enabled, calls the Open-Meteo API, and returns formatted results - all without touching the host process. import requests from azure.identity import DefaultAzureCredential credential = DefaultAzureCredential() token = credential.get_token("https://dynamicsessions.io/.default") response = requests.post( f"{pool_endpoint}/code/execute?api-version=2024-02-02-preview&identifier=weather-session-1", headers={"Authorization": f"Bearer {token.token}"}, json={"properties": { "codeInputType": "inline", "executionType": "synchronous", "code": weather_code, # Python that calls Open-Meteo API }}, ) result = response.json()["properties"]["stdout"] Scenario B: LLM-Generated Code in a Sandbox - Dynamic Charting This is where it gets interesting. The user asks “plot a chart comparing Miami and Tokyo weather.” The agent: Fetches weather data Asks Azure OpenAI to generate matplotlib code using a tightly-scoped system prompt Safety-checks the generated code for forbidden imports (subprocess, os.system, etc.) Wraps the code with data injection and sends it to the sandbox Downloads the resulting PNG from the sandbox’s /mnt/data/ directory from openai import AzureOpenAI # 1. LLM generates chart code client = AzureOpenAI(azure_endpoint=endpoint, api_key=key, api_version="2024-12-01-preview") generated_code = client.chat.completions.create( model="gpt-4o", messages=[{"role": "system", "content": CODE_GEN_PROMPT}, {"role": "user", "content": f"Weather data: {weather_json}"}], temperature=0.2, ).choices[0].message.content # 2. Execute in sandbox requests.post( f"{pool_endpoint}/code/execute?api-version=2024-02-02-preview&identifier=chart-session-1", headers={"Authorization": f"Bearer {token.token}"}, json={"properties": { "codeInputType": "inline", "executionType": "synchronous", "code": f"import json, matplotlib\nmatplotlib.use('Agg')\nimport matplotlib.pyplot as plt\nweather_data = json.loads('{weather_json}')\n{generated_code}", }}, ) # 3. Download the chart img = requests.get( f"{pool_endpoint}/files/content/chart.png?api-version=2024-02-02-preview&identifier=chart-session-1", headers={"Authorization": f"Bearer {token.token}"}, ).content The result is a dark-themed dual-subplot chart comparing maximal and minimal temperature forecast chart example rendered by the Chart Weather tool in Dynamic Session: Authentication The agent uses DefaultAzureCredential locally and ManagedIdentityCredential when deployed. Tokens are cached and refreshed automatically: from azure.identity import DefaultAzureCredential token = DefaultAzureCredential().get_token("https://dynamicsessions.io/.default") auth_header = f"Bearer {token.token}" # Uses ManagedIdentityCredential automatically when deployed to Container Apps Observability The agent uses Application Insights for end-to-end tracing. The Microsoft Agent Framework exposes OpenTelemetry spans for invoke_agent, chat, and execute tool - wired to Azure Monitor with custom exporters: from azure.monitor.opentelemetry import configure_azure_monitor from agent_framework.observability import create_resource, enable_instrumentation # Configure Azure Monitor first configure_azure_monitor( connection_string="InstrumentationKey=...", resource=create_resource(), # Uses OTEL_SERVICE_NAME, etc. enable_live_metrics=True, ) # Then activate Agent Framework's telemetry code paths, optional if ENABLE_INSTRUMENTATION and/or ENABLE_SENSITIVE_DATA are set in env vars enable_instrumentation(enable_sensitive_data=False) This gives you traces for every agent invocation, tool execution (including sandbox timing), and LLM call - visible in the Application Insights transaction search and end-to-end transaction view in the new Agents blade in Application Insights. You can also open a detailed dashboard by clicking Explore in Grafana. Session pools emit their own metrics and logs for monitoring sandbox utilization and performance. Combined with the agent-level Application Insights traces, you can get full visibility from the user prompt → agent → LLM → sandbox execution → response across both your application and the infrastructure running untrusted code. Deploy with One Command The project includes full Bicep infrastructure-as-code. A single azd up provisions Azure OpenAI, Container Apps, Session Pool (with egress enabled), Container Registry, Application Insights, and all role assignments. azd auth login azd up Next Steps Dynamic sessions documentation – Microsoft Learn MCP + Shell sessions tutorial - How to add an MCP tool to your Foundry agent Custom container sessions sample - github.com/Azure-Samples/dynamic-sessions-custom-container AI Agent + Dynamic Sessions - github.com/jkalis-MS/AIAgent-ACA-DynamicSession1.2KViews0likes0Comments