azure container apps
Introducing Azure Container Apps Express!
Three years ago, a 15-second cold start was industry-leading. Today, developers and AI agents expect sub-second. The speed bar has moved, and the tooling needs to move with it. After running Azure Container Apps for years, we've learned something important: for most developers, the ACA environment is an unnecessary construct. It adds provisioning time, configuration surface, and cognitive overhead — when all you really want is to run your app with scaling, networking, and operations handled for you. At the same time, a new class of workloads has emerged. Agent-first platforms — systems where AI agents deploy endpoints on demand, spin up tool-use APIs, and tear them down when work is done — demand an even more radical focus on speed and simplicity. Every second of provisioning delay is wasted agent productivity. Today, we're launching Azure Container Apps Express in Public Preview — the fastest, simplest way to go from a container image to an internet-reachable app on Azure, ready for many production-style workloads. What Is ACA Express? ACA Express removes the infrastructure decisions. There's no environment to provision, no networking to configure, no scaling rules to write. You bring a container image, Express handles everything else. Behind the scenes, Express runs your container on pre-provisioned capacity with sensible defaults baked in — so you skip environment setup without giving up ACA's serverless model. There's more coming in this space soon — keep watching. 
Here's what that means in practice:
- Instant provisioning: your app is running in seconds, not minutes
- Sub-second cold starts: fast enough for interactive UIs and on-demand agent endpoints
- Scale to and from zero: automatic, no configuration required (full scaling controls coming soon)
- Per-second billing: pay only for what you use
- Production-ready defaults: ingress, secrets, environment variables, and observability are built in

Express is purpose-built for two audiences: developers who want to ship fast (SaaS apps, APIs, web dashboards, prototypes) and agents that deploy on demand (MCP servers, tool-use endpoints, multi-step workflow APIs, human-in-the-loop UIs). If you've ever waited for an ACA environment to provision, only to realize you didn't need half of the configuration options it asked you for, Express is your answer.

What You Can Do Today

Note: West Central US is currently the only available region. We will expand to new regions over the coming days.

Express is in Public Preview starting today. It's a deliberate early ship: there's a meaningful feature gap compared to the existing Azure Container Apps offering, and we're filling it fast. New capabilities are landing on a rapid cadence throughout the preview, and by Microsoft Build in June, Express should be close to feature-complete. For the current list of supported features, known gaps, and what's on the way, see the Express documentation. We'd rather put valuable technology in your hands early and iterate with you than wait behind closed doors for perfection.

Who Is Express For?
Scenario | Why Express
SaaS apps and APIs | Deploy and scale without infrastructure planning
AI app frontends | Chat UIs and copilot frontends that scale with usage spikes
MCP servers | Expose API endpoints for AI agents in seconds
Agent workflows | Spin up endpoints on demand, tear down when done
Prototypes and startups | Go from idea to production in minutes
Web dashboards | Internal tools with instant availability

Get Started

Express is available now in Public Preview. Try it:
- Azure Container Apps Express overview: concepts, capabilities, and the current feature support matrix
- Deploy your first app with the Azure CLI: step-by-step quickstart
- New Azure Container Apps Portal: create and manage Express apps alongside your existing Container Apps resources

Have questions? Check the Azure Container Apps Express FAQ for answers to common questions about pricing, limits, regions, and the road to GA.

We're building Express in the open and we want to hear from you. Tell us what features matter most, what works, and what doesn't. Reach out on the Azure Container Apps GitHub or in the comments below.

Even simpler to Safely Execute AI-generated Code with Azure Container Apps Dynamic Sessions
AI agents are writing code. The question is: where does that code run? If it runs in your process, a single hallucinated import os; os.remove('/') can ruin your day. Azure Container Apps dynamic sessions solve this with on-demand sandboxed environments that are Hyper-V isolated, fully managed, and ready in milliseconds.

Thanks to your feedback, Dynamic Sessions are now easier to use with AI via MCP. Agents can quickly start a session interpreter and safely run code, all through a built-in MCP endpoint. In addition, new starter samples show how to invoke dynamic sessions from Microsoft Agent Framework, both with the code interpreter and with a custom container for even more versatility.

What Are Dynamic Sessions?

A session pool maintains a reservoir of pre-warmed, isolated sandboxes. When your app needs one, it is allocated instantly via a REST API. When a session goes idle, it is destroyed automatically after the configured session cooldown period.

What you get:
- Strong isolation - Each session runs in its own Hyper-V sandbox for enterprise-grade security
- Millisecond startup - The pre-warmed pool eliminates cold starts
- Fully managed - No infrastructure to maintain; lifecycle, cleanup, and scaling are automatic
- Simple access - A single HTTP endpoint, with each session identified by a unique ID
- Scalable - Hundreds to thousands of concurrent sessions

Two Session Types

1. Code Interpreter: Run Untrusted Code Safely

Code interpreter sessions accept inline code, run it in a Hyper-V sandbox, and return the output. Sessions support network egress and a file system that persists for the lifetime of the session. Three runtimes are available:
- Python - Ships with popular libraries pre-installed (NumPy, pandas, matplotlib, etc.). Ideal for AI-generated data analysis, math computation, and chart generation.
- Node.js - Comes with common npm packages. Great for server-side JavaScript execution, data transformation, and scripting.
- Shell - A full Linux shell environment where agents can run arbitrary commands, install packages, start processes, manage files, and chain multi-step workflows. Unlike the Python and Node.js interpreters, shell sessions expose a complete OS, which makes them ideal for agent-driven DevOps, build/test environments, CLI tool execution, and multi-process pipelines.

2. Custom Containers: Bring Your Own Runtime

Custom container sessions let you run your own container image in the same isolated, on-demand model. Define your image, and Container Apps handles the pooling, scaling, and lifecycle. Typical use cases are hosting proprietary runtimes, custom code interpreters, and specialized tool chains. This sample (Azure Samples) dives deeper into custom containers with Microsoft Agent Framework orchestration.

MCP Support for Dynamic Sessions

Dynamic sessions also support Model Context Protocol (MCP) on both shell and Python session types. This turns a session pool into a remote MCP server that AI agents can connect to, enabling tool execution, file system access, and shell commands in a secure, ephemeral environment. With an MCP-enabled shell session, an Azure Foundry agent can spin up a Flask app, run system commands, or install packages, all in an isolated container that vanishes when done. The MCP server is enabled with a single property on the session pool (isMCPServerEnabled: true), and the resulting endpoint and API key can be plugged directly into Azure Foundry as a connected tool. For a step-by-step walkthrough, see How to add an MCP tool to your Azure Foundry agent using dynamic sessions.

Deep Dive: Building an AI Travel Agent with Code Interpreter Sessions

Let's walk through a sample implementation: a travel planning agent that uses dynamic sessions for both static code execution (weather research) and LLM-generated code execution (charting).
Full source: github.com/jkalis-MS/AIAgent-ACA-DynamicSession

Architecture

[Figure: Travel Agent Architecture]

Component | Purpose
Microsoft Agent Framework | Agent runtime with middleware, telemetry, and DevUI
Azure OpenAI (GPT-4o) | LLM for conversation and code generation
ACA Session Pools | Sandboxed Python code interpreter
Azure Container Apps | Hosts the agent in a container
Application Insights | Observability for agent spans

The agent is implemented in two variants that are switchable in the Agent Framework DevUI: tools running in an ACA dynamic session (sandbox) and tools running locally (no isolation). Comparing the two makes the security value immediately visible.

Scenario A: Static Code in a Sandbox - Weather Research

The agent sends pre-written Python code to the session pool to fetch live weather data. The code runs with network egress enabled, calls the Open-Meteo API, and returns formatted results, all without touching the host process.

    import requests
    from azure.identity import DefaultAzureCredential

    # pool_endpoint (the session pool management endpoint) and weather_code
    # (the pre-written Python source) are defined elsewhere in the agent.
    credential = DefaultAzureCredential()
    token = credential.get_token("https://dynamicsessions.io/.default")

    response = requests.post(
        f"{pool_endpoint}/code/execute?api-version=2024-02-02-preview&identifier=weather-session-1",
        headers={"Authorization": f"Bearer {token.token}"},
        json={"properties": {
            "codeInputType": "inline",
            "executionType": "synchronous",
            "code": weather_code,  # Python that calls Open-Meteo API
        }},
    )
    response.raise_for_status()  # fail loudly on auth or pool errors
    result = response.json()["properties"]["stdout"]

Scenario B: LLM-Generated Code in a Sandbox - Dynamic Charting

This is where it gets interesting. The user asks "plot a chart comparing Miami and Tokyo weather." The agent:
1. Fetches weather data
2. Asks Azure OpenAI to generate matplotlib code using a tightly-scoped system prompt
3. Safety-checks the generated code for forbidden imports (subprocess, os.system, etc.)
4. Wraps the code with data injection and sends it to the sandbox
5. Downloads the resulting PNG from the sandbox's /mnt/data/ directory

    from openai import AzureOpenAI

    # Reuses requests, token, and pool_endpoint from Scenario A.
    # 1. LLM generates chart code
    client = AzureOpenAI(azure_endpoint=endpoint, api_key=key, api_version="2024-12-01-preview")
    generated_code = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": CODE_GEN_PROMPT},
                  {"role": "user", "content": f"Weather data: {weather_json}"}],
        temperature=0.2,
    ).choices[0].message.content

    # 2. Execute in sandbox
    # weather_json is embedded with !r so that quotes or backslashes in the
    # data cannot break out of the string literal in the wrapped script.
    requests.post(
        f"{pool_endpoint}/code/execute?api-version=2024-02-02-preview&identifier=chart-session-1",
        headers={"Authorization": f"Bearer {token.token}"},
        json={"properties": {
            "codeInputType": "inline",
            "executionType": "synchronous",
            "code": (
                "import json, matplotlib\n"
                "matplotlib.use('Agg')\n"
                "import matplotlib.pyplot as plt\n"
                f"weather_data = json.loads({weather_json!r})\n"
                f"{generated_code}"
            ),
        }},
    )

    # 3. Download the chart
    img = requests.get(
        f"{pool_endpoint}/files/content/chart.png?api-version=2024-02-02-preview&identifier=chart-session-1",
        headers={"Authorization": f"Bearer {token.token}"},
    ).content

The result is a dark-themed dual-subplot chart comparing maximum and minimum temperature forecasts, rendered by the Chart Weather tool in a dynamic session.

Authentication

The agent uses DefaultAzureCredential locally and ManagedIdentityCredential when deployed. Tokens are cached and refreshed automatically:

    from azure.identity import DefaultAzureCredential

    token = DefaultAzureCredential().get_token("https://dynamicsessions.io/.default")
    auth_header = f"Bearer {token.token}"
    # Uses ManagedIdentityCredential automatically when deployed to Container Apps

Observability

The agent uses Application Insights for end-to-end tracing.
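Returning briefly to step 3 of Scenario B (safety-checking generated code for forbidden imports): the post does not show how the sample implements that check, so the following is only a minimal sketch of one possible approach, using Python's standard ast module. The deny-lists and the function name find_violations are hypothetical, chosen for illustration. A static check like this is a cheap first filter; the Hyper-V sandbox remains the actual isolation boundary.

```python
import ast

# Hypothetical deny-lists for illustration only; the sample's real lists
# are not shown in the post.
FORBIDDEN_MODULES = {"subprocess", "socket", "shutil"}
FORBIDDEN_CALLS = {("os", "system"), ("os", "popen")}


def find_violations(source: str) -> list[str]:
    """Return the reasons why `source` should be rejected (empty if none)."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Code that does not parse is rejected outright.
        return [f"syntax error: {exc.msg}"]

    violations = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # Handles `import subprocess` and `import subprocess as sp`.
            for alias in node.names:
                if alias.name.split(".")[0] in FORBIDDEN_MODULES:
                    violations.append(f"import of {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            # Handles `from subprocess import run`.
            if (node.module or "").split(".")[0] in FORBIDDEN_MODULES:
                violations.append(f"import from {node.module}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # Handles direct calls such as `os.system(...)`.
            target = node.func.value
            if isinstance(target, ast.Name) and (target.id, node.func.attr) in FORBIDDEN_CALLS:
                violations.append(f"call to {target.id}.{node.func.attr}")
    return violations
```

An agent could call find_violations(generated_code) and only send the code to the session pool when the returned list is empty. Parsing the code instead of searching for substrings avoids false positives on comments and string literals, although aliased or dynamically constructed calls can still slip past a check this simple.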
The Microsoft Agent Framework exposes OpenTelemetry spans for invoke_agent, chat, and execute_tool, wired to Azure Monitor with custom exporters:

    from azure.monitor.opentelemetry import configure_azure_monitor
    from agent_framework.observability import create_resource, enable_instrumentation

    # Configure Azure Monitor first
    configure_azure_monitor(
        connection_string="InstrumentationKey=...",
        resource=create_resource(),  # Uses OTEL_SERVICE_NAME, etc.
        enable_live_metrics=True,
    )

    # Then activate Agent Framework's telemetry code paths; optional if
    # ENABLE_INSTRUMENTATION and/or ENABLE_SENSITIVE_DATA are set in env vars
    enable_instrumentation(enable_sensitive_data=False)

This gives you traces for every agent invocation, tool execution (including sandbox timing), and LLM call. They are visible in the Application Insights transaction search and in the end-to-end transaction view in the new Agents blade in Application Insights. You can also open a detailed dashboard by clicking Explore in Grafana.

Session pools emit their own metrics and logs for monitoring sandbox utilization and performance. Combined with the agent-level Application Insights traces, you get full visibility from the user prompt → agent → LLM → sandbox execution → response, across both your application and the infrastructure running untrusted code.

Deploy with One Command

The project includes full Bicep infrastructure-as-code. A single azd up provisions Azure OpenAI, Container Apps, Session Pool (with egress enabled), Container Registry, Application Insights, and all role assignments.
    azd auth login
    azd up

Next Steps
- Dynamic sessions documentation - Microsoft Learn
- MCP + Shell sessions tutorial - How to add an MCP tool to your Foundry agent
- Custom container sessions sample - github.com/Azure-Samples/dynamic-sessions-custom-container
- AI Agent + Dynamic Sessions - github.com/jkalis-MS/AIAgent-ACA-DynamicSession

From "Maybe Next Quarter" to "Running Before Lunch" on Container Apps - Modernizing Legacy .NET App
In early 2025, we wanted to modernize Jon Galloway's MVC Music Store, a classic ASP.NET MVC 5 app running on .NET Framework 4.8 with Entity Framework 6. The goal was straightforward: address vulnerabilities, enable managed identity, and deploy to Azure Container Apps and Azure SQL. No more plaintext connection strings. No more passwords in config files.

We hit a wall immediately. Entity Framework on .NET Framework did not support Azure.Identity or DefaultAzureCredential. We could not just add a NuGet package and call it done. We needed EF Core, which means modern .NET, and that in turn meant rewriting the data layer, the identity system, the startup pipeline, and the views. The engineering team estimated one week of dedicated developer work. As a product manager without extensive .NET modernization experience, I wasn't able to complete it quickly on my own, so the project was placed in the backlog.

This was before GitHub Copilot "Agent" mode. GitHub Copilot app modernization (a specialized agent with skills for modernization) already existed, but it only offered assessment: it could tell you what needed to change, but it couldn't make the end-to-end changes for you.

Fast-forward one year. The full modernization agent is available. I sat down with the same app and the same goal. A few hours later, it was running on .NET 10 on Azure Container Apps with managed identity, Key Vault integration, and zero plaintext credentials. Thank you, GitHub Copilot app modernization! And while we were at it, GitHub Copilot helped modernize the experience as well, built more tests, and generated more synthetic data for testing.

Why Azure Container Apps?

Azure Container Apps is an ideal deployment target for this modernized MVC Music Store application because it provides a serverless, fully managed container hosting environment. It abstracts away infrastructure management while natively supporting the key security and operational features this project required.
It pairs naturally with infrastructure-as-code deployments, and its per-second billing on a consumption plan keeps costs minimal for a lightweight web app like this. It eliminates the overhead of managing Kubernetes clusters while still giving you the container portability that modern .NET apps benefit from. That is why I asked Copilot to modernize to Azure Container Apps. Here's how it went.

Phase 1: Assessment

GitHub Copilot app modernization started by analyzing the codebase and producing a detailed assessment:
- Framework gap analysis - .NET Framework 4.8 → .NET 10, identifying every breaking change
- Dependency inventory - Entity Framework 6 (not EF Core), MVC 5 references, System.Web dependencies
- Security findings - plaintext SQL connection strings in Web.config, no managed identity support
- API surface changes - Global.asax → Program.cs minimal hosting, System.Web.Mvc → Microsoft.AspNetCore.Mvc

The assessment is not a generic checklist. It reads your code (your controllers, your DbContext, your views) and maps a concrete modernization path. For this app, the key finding was clear: EF 6 on .NET Framework cannot support DefaultAzureCredential. The entire data layer needs to move to EF Core on modern .NET to unlock passwordless authentication.

Phase 2: Code & Dependency Modernization

This is where last year's experience ended and this year's began.
The agent performed the actual modernization:

Project structure:
- .csproj converted from legacy XML format to SDK-style targeting net10.0
- Global.asax replaced with Program.cs using minimal hosting
- packages.config → NuGet PackageReference entries

Data layer (the hard part):
- Entity Framework 6 → EF Core with Microsoft.EntityFrameworkCore.SqlServer
- DbContext rewritten with OnModelCreating fluent configuration
- System.Data.Entity → Microsoft.EntityFrameworkCore namespace throughout
- EF Core migrations generated from scratch
- Database seeding moved to a proper DbSeeder pattern with MigrateAsync()

Identity:
- ASP.NET Membership → ASP.NET Core Identity with ApplicationUser, ApplicationDbContext
- Cookie authentication configured through ConfigureApplicationCookie

Security (the whole trigger for this modernization):
- Azure.Identity + DefaultAzureCredential integrated in Program.cs
- Azure Key Vault configuration provider added via Azure.Extensions.AspNetCore.Configuration.Secrets
- Connection strings use Authentication=Active Directory Default, with no passwords anywhere
- Application Insights wired through OpenTelemetry

Views:
- Razor views updated from MVC 5 helpers to ASP.NET Core Tag Helpers and conventions
- _Layout.cshtml and all partials migrated

The code changes touched every layer of the application. This is not a find-and-replace. It's a structural rewrite that maintains functional equivalence.

Phase 3: Local Testing

After modernization, the app builds, runs locally, and connects to a local SQL Server (or SQL in a container). EF Core migrations apply cleanly, the seed data loads, and you can browse albums, add to cart, and check out. The identity system works. The Key Vault integration gracefully skips when KeyVaultName isn't configured, meaning local dev and Azure use the same Program.cs with zero code branches.
Phase 4: azd up and Deployment to Azure

The agent also generates the deployment infrastructure:
- azure.yaml - AZD service definition pointing to the Dockerfile, targeting Azure Container Apps
- Dockerfile - Multi-stage build using mcr.microsoft.com/dotnet/sdk:10.0 and aspnet:10.0
- infra/main.bicep - Full IaC including:
  - Azure Container Apps with system + user-assigned managed identity
  - Azure SQL Server with Azure AD-only authentication (no SQL auth)
  - Azure Key Vault with RBAC, Secrets Officer role for the managed identity
  - Container Registry with ACR Pull role assignment
  - Application Insights + Log Analytics
  - All connection strings injected as Container App secrets, using Active Directory Default, not passwords

One command, azd up, provisions everything, builds the container, pushes to ACR, and deploys to Container Apps. The app starts, runs MigrateAsync() on first boot, seeds the database, and serves traffic. Managed identity handles all auth to SQL and Key Vault. No credentials stored anywhere.

What Changed in a Year

Capability | Early 2025 | Now
Assessment | Available | Available
Automated code modernization | Semi-manual | ✅ Full modernization agent
Infrastructure generation | Semi-manual | ✅ Bicep + AZD generated
Time to complete | Weeks | ✅ Hours

The technology didn't just improve incrementally. The gap between "assessment" and "done" collapsed. A year ago, knowing what to do and being able to do it were very different things. Now they're the same step.

Who This Is For

If you have a .NET Framework app sitting on a backlog because "the modernization is too expensive", revisit that assumption. The process changed. GitHub Copilot app modernization helps you rewrite your data layer, generates your infrastructure, and gets you to azd up. It can help you generate tests to increase your code coverage. If you have feature requests, or if you want to further optimize the code for scale, bring your requirements, logs, or profile traces: you can take care of all of that during the modernization process.
MVC Music Store went from .NET Framework 4.8 with Entity Framework 6 and plaintext SQL credentials to .NET 10 on Azure Container Apps with managed identity, Key Vault, and zero secrets in code. In an afternoon. That backlog item might be a lunch break now 😊. Really. Find your legacy apps and try it yourself.

Next steps
- Modernize your .NET or Java apps with GitHub Copilot app modernization - https://aka.ms/ghcp-appmod
- Open your legacy application in Visual Studio or Visual Studio Code to start the process
- Deploy to Azure Container Apps - https://aka.ms/aca/start

Custom KEDA Scale Rules for Azure Functions on Azure Container Apps
Announcing Custom KEDA Scale Rules for Azure Functions on Container Apps

We're excited to announce a new capability for Azure Functions on Container Apps: custom KEDA scale rule overrides. You can now bypass the platform's auto-generated scaling rules and define your own KEDA configuration, giving you full control over how your function apps scale.

Why We Built This

Azure Functions on Container Apps automatically generates KEDA scale rules from your function triggers. This works great for most workloads, but we heard from customers that the platform defaults don't always match their scaling needs:
- Message-driven workloads that need a higher threshold before scaling out, to control costs and avoid over-provisioning.
- CPU-intensive functions where each replica should process fewer messages concurrently, independent of the scaling threshold.
- Advanced scenarios requiring KEDA scalers that the platform doesn't auto-generate from trigger metadata.

A common pattern we saw: teams needed to decouple when to add replicas from how much work each replica handles. The default threshold and maxConcurrentCalls in host.json are tightly coupled, so adjusting one affects the other. With custom scale rules, these concerns are fully independent.

What's New: `allowScalingRuleOverride`

A new property on the Container Apps scale configuration:

    {
      "properties": {
        "template": {
          "scale": {
            "allowScalingRuleOverride": true,
            "rules": [
              {
                "name": "my-servicebus-rule",
                "custom": {
                  "type": "azure-servicebus",
                  "metadata": {
                    "queueName": "orders-queue",
                    "messageCount": "50",
                    "connectionFromEnv": "ServiceBusConnection"
                  }
                }
              }
            ]
          }
        }
      }
    }

When set to `true`:
- Your custom rules override the platform's auto-generated scaling configuration.
- Your custom rules become the sole source of scaling decisions.
- You control the threshold: set the exact values that match your workload.
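To make the "when to add replicas" side of that decoupling concrete, here is a small worked sketch. It assumes the usual behavior of KEDA queue-length scalers, where the metric is treated as an average-value target, so the desired replica count is roughly the queue length divided by messageCount, rounded up and clamped to the configured bounds. It deliberately ignores polling intervals, stabilization windows, and activation thresholds, and the function name desired_replicas is hypothetical.

```python
import math


def desired_replicas(queue_length: int, message_count: int,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate replica count for a queue-length scale rule.

    message_count plays the role of the rule's `messageCount` metadata:
    the number of queued messages each replica is expected to absorb.
    """
    if message_count <= 0:
        raise ValueError("message_count must be positive")
    raw = math.ceil(queue_length / message_count)
    # Clamp to the minReplicas/maxReplicas bounds of the scale configuration.
    return max(min_replicas, min(raw, max_replicas))
```

With messageCount set to 50, a backlog of 120 messages works out to 3 replicas, whereas a threshold of 5 would push the same backlog to the maxReplicas cap. How many messages each replica processes at once is a separate setting (maxConcurrentCalls in host.json), which is exactly the independence the override is meant to provide.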
This works with any KEDA scaler: Service Bus, Azure Queue, Kafka, PostgreSQL, Cron, HTTP concurrency, or any of the 60+ scalers KEDA supports (https://keda.sh/docs/scalers/).

Getting Started

See the full documentation for step-by-step instructions: https://learn.microsoft.com/en-us/azure/container-apps/functions-scale-rule-override

Enable the Override

Use the Container Apps REST API to PATCH your function app.

API endpoint:

    https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroup}/providers/Microsoft.App/containerApps/{appName}?api-version=2026-03-02-preview

PATCH body:

    {
      "properties": {
        "template": {
          "scale": {
            "allowScalingRuleOverride": true,
            "minReplicas": 0,
            "maxReplicas": 10,
            "rules": [
              {
                "name": "controlled-sb-rule",
                "custom": {
                  "type": "azure-servicebus",
                  "metadata": {
                    "queueName": "my-queue",
                    "messageCount": "50",
                    "connectionFromEnv": "ServiceBusConnection"
                  }
                }
              }
            ]
          }
        }
      }
    }

Important Details
- Functions-only: This property applies to Container Apps with kind=functionapp. Non-Functions apps will receive an AllowScalingRuleOverrideNotApplicable error.
- You own the rules: When override is enabled, Azure won't auto-generate trigger-based rules. You're responsible for providing a valid KEDA scaler configuration.
- Any KEDA scaler: Not limited to Service Bus or Queue Storage. HTTP concurrency, Kafka topic lag, PostgreSQL queries, Cron, or anything else KEDA supports.
- Combine multiple rules: Provide a queue rule AND an HTTP concurrency rule in a single PATCH for multi-signal scaling.
- Safe revert: To switch back to platform-managed scaling, set allowScalingRuleOverride: false with "rules": []. The API rejects the request if custom rules are still present, protecting you from accidental deletion.
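The enable and revert requests described above can be assembled programmatically. The sketch below only builds the URL and the JSON bodies; it does not send anything. The helper names (management_url, custom_rule, override_patch, revert_patch) are hypothetical, while the property names, the API version, and the revert shape are taken from the text above. Actually sending the request would additionally need an authenticated PATCH against Azure Resource Manager, which is left out here.

```python
import json

API_VERSION = "2026-03-02-preview"  # API version quoted in the article


def management_url(subscription_id: str, resource_group: str, app_name: str) -> str:
    """Build the Container Apps resource URL that the PATCH is sent to."""
    return (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.App/containerApps/{app_name}"
        f"?api-version={API_VERSION}"
    )


def custom_rule(name: str, scaler_type: str, metadata: dict) -> dict:
    """Wrap KEDA scaler metadata in a custom scale rule.

    Metadata values are stringified, matching the "50" style used in the
    article's JSON examples.
    """
    return {
        "name": name,
        "custom": {
            "type": scaler_type,
            "metadata": {key: str(value) for key, value in metadata.items()},
        },
    }


def override_patch(rules: list, min_replicas: int = 0, max_replicas: int = 10) -> dict:
    """PATCH body that turns the override on with the given rules."""
    if not rules:
        raise ValueError("provide at least one rule when enabling the override")
    return {"properties": {"template": {"scale": {
        "allowScalingRuleOverride": True,
        "minReplicas": min_replicas,
        "maxReplicas": max_replicas,
        "rules": list(rules),
    }}}}


def revert_patch() -> dict:
    """PATCH body that returns the app to platform-managed scaling."""
    return {"properties": {"template": {"scale": {
        "allowScalingRuleOverride": False,
        "rules": [],
    }}}}


if __name__ == "__main__":
    queue_rule = custom_rule("controlled-sb-rule", "azure-servicebus", {
        "queueName": "my-queue",
        "messageCount": 50,
        "connectionFromEnv": "ServiceBusConnection",
    })
    print(json.dumps(override_patch([queue_rule]), indent=2))
```

Passing more than one rule to override_patch gives the multi-signal configuration mentioned in the list above. Keeping revert_patch separate makes it harder to send allowScalingRuleOverride: false with rules still attached, which the API would reject.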
Learn More
- Documentation: Override auto-generated KEDA scale rules - https://learn.microsoft.com/en-us/azure/container-apps/functions-scale-rule-override
- Azure Functions KEDA scaling mappings - https://learn.microsoft.com/en-us/azure/container-apps/functions-keda-mappings
- Scale an app in Azure Container Apps - https://learn.microsoft.com/en-us/azure/container-apps/scale-app
- KEDA Scalers Reference - https://keda.sh/docs/scalers/

We're excited to see how you use custom scale rules to optimize your workloads. Try it out today and share your feedback. We're listening.

Running Foundry Agent Service on Azure Container Apps
Microsoft’s Customer Zero blog series gives an insider view of how Microsoft builds and operates Microsoft using our trusted, enterprise-grade agentic platform. Learn best practices from our engineering teams with real-world lessons, architectural patterns, and operational strategies for pressure-tested solutions in building, operating, and scaling AI apps and agent fleets across the organization. Challenge: Scaling agents to production changes the requirements As teams move from experimenting with AI agents to running them in production, the questions they ask begin to change. Early prototypes often focus on whether an agent can reason to generate useful output. But once agents are placed into real systems where they continuously need to serve users and respond to events, new concerns quickly take center stage: reliability, scale, observability, security, and long‑running operations. A common misconception at this stage is to think of an agent as a simple chatbot wrapped around an API. In practice, an AI agent is something very different. It is a service that listens, thinks, and acts, ingesting unstructured inputs, reasoning over context, and producing outputs that may span multiple phases. Treating agents as services means teams often need more than they initially expect: dependable compute, strong security, and real-time visibility to run agents safely and effectively at scale. When we kick off an agent loop, we provide input that informs the context it recalls for the task, the data it connects to, the tools it calls, and the reasoning steps it outlines for itself to generate an output. Agent needs are different from traditional services in hosting, scaling, identity, security, and observability; it’s a product with a probabilistic nature that requires secure, auditable access to many resources at the same lightspeed performance that users expect from any software. 
This isn’t the first time that the software industry needed to evolve its thinking around infrastructure. When modern application architectures began shifting from monolithic apps toward microservices, existing infrastructure wasn’t built with that model in mind. As systems were reconstructed into independent services, teams quickly discovered they needed new runtime architecture that properly accommodated microservice needs. The modern app era brought new levels of performance, reliability, and scalability of apps, but it also warranted that we rebuild app infrastructure with container orchestration and new operational patterns in mind. AI agents represent a similar inflection. Infrastructure designed for request‑response applications or stateless workloads wasn’t built with long‑running, tool‑calling, AI‑driven workflows in mind. As the builders of Foundry Agent Service, we were very aware that traditional architectures wouldn’t hold up to the bursty agentic workflows that needed to aggregate data across sources, connect to several simultaneous tools, and reason through execution plans for the output that we needed. Rather than building new infrastructure from scratch, the choice for building on Azure Container Apps was clear. With over a million Apps hosted on Azure Container Apps, it was the tried-and-true solution we needed to keep our team focused on building agent intelligence and behavior instead of the plumbing underneath. Solution: Building Foundry Agent Service on a resilient agent runtime foundation Foundry Agent Service is Microsoft’s fully managed platform for building, deploying, and scaling AI agents as production services. Builders start by choosing their preferred framework or immediately building an agent inside Foundry, while Foundry Agent Service handles the operational complexity required to run agents at scale. Let’s use the example of a sales agent in Foundry Agent Service. 
You might have a salesperson who prompts a sales agent with “Help me prepare for my upcoming meeting with customer Contoso.” The agent is going to kick off several processes across data and tools to generate the best answer: Work IQ to understand Teams conversations with Contoso, Fabric IQ for current product usage and forecast trends, Foundry IQ to do an AI search over internal sales materials, and even GitHub Copilot SDK to generate and execute code that can draft PowerPoint and Word artifacts for the meeting. And this is just one agent; more than 20,000 customers rely on Foundry Agent Service. At the core of Foundry Agent Service is a dedicated agent runtime through Azure Container Apps that explicitly meets our demands for production agents. Agent runtime through flexible cloud infrastructure allows builders to focus on making powerful agent experiences without worrying about under-the-hood compute and configurations. This runtime is built around five foundational pillars: Fast startup and resume. Agents are event‑driven and often bursty. Responsiveness depends on the ability to start or resume execution quickly when events arrive. Built‑in agent tool execution. Agents must securely execute tool calls like APIs, workflows, and services as part of their reasoning process, without fragile glue code or ad‑hoc orchestration. State persistence and restore. Many agent workflows are long‑running and multi‑phase. The runtime must allow agents to reason, pause, and resume with safely preserved state. Strong isolation per agent task. As agents execute code and tools dynamically, isolation is critical to prevent data leakage and contain blast radius. Secure by default. Identity, access, and execution controls are enforced at the runtime layer rather than bolted on after the fact. Together, these pillars define what it means to run AI agents as first‑class production services. 
Impact: How Azure Container Apps powers agent runtime Building and operating agent infrastructure from scratch introduces unnecessary complexity and risk. Azure Container Apps has been pressure‑tested at Microsoft scale, proving to be a powerful, serverless foundation for running AI workloads and aligns naturally with the needs of agent runtime. It provides serverless, event‑driven scaling with fast startup and scale‑to‑zero, which is critical for agents with unpredictable execution patterns. Execution is secure by default, with built‑in identity, isolation, and security boundaries enforced at the platform layer. Azure Container Apps natively supports running MCP servers and executing full agent workflows, while Container Apps jobs enable on‑demand tool execution for discrete units of work without custom orchestration. For scenarios involving AI‑generated or untrusted code, dynamic sessions allow execution in isolated sandboxes, keeping blast radius contained. Azure Container Apps also supports running model inference directly within the container boundary, helping preserve data residency and reduce unnecessary data movement. Learnings for your agent runtime foundation Make infrastructure flexible with serverless architecture. AI systems move too fast to create infrastructure from scratch. With bursty, unpredictable agent workloads, sub‑second startup times and serverless scaling are critical. Simplify heavy lifting. Developers should focus on agent behavior, tool invocation, and workflow design instead of infrastructure plumbing. Using trusted cloud infrastructure, pain points like making sure agents run in isolated sandboxes, properly applying security policy to agent IDs, and ensuring secure connections to virtual networks are already solved. When you simplify the operational overhead, you make it easier for developers to focus on meaningful innovation. Invest in visibility and monitoring. 
Strong observability enables faster iteration, safer evolution, and continuous self‑correction for both humans and agents as systems adapt over time.

Want to learn more?
Learn about building and hosting agents with Foundry Agent Service
Discover agent runtime through Azure Container Apps
Read about best practices for managing agents

Why Does Azure App Service Return HTTP 404?
When an application deployed to Azure App Service suddenly starts returning HTTP 404 – Not Found, it can be confusing, especially when:
- The deployment completed successfully
- The App Service shows as Running
- No obvious errors appear in the portal

This behaviour is more common than it appears and is often linked to routing, configuration, or platform-specific behaviour. In this article, I’ll walk through real-world reasons why Azure App Service can return HTTP 404 errors. The goal is to help you systematically isolate the root cause, whether it’s application-level, configuration-related, or platform-specific.

What Does HTTP 404 Mean in Azure App Service?
An HTTP 404 response from Azure App Service means that the incoming request successfully reached Azure App Service, but neither the platform nor the application could locate the requested resource. This distinction is important. Unlike connectivity or DNS issues, a 404 confirms that:
- DNS resolution worked
- The request hit the App Service front end
- The failure happened after request routing

1. Incorrect Application URL or Route
This is the most common cause of 404 errors.
Typical scenarios
- Accessing the root URL (https://<app>.azurewebsites.net) for a Web API that exposes only API routes
- Missing route prefixes such as /api or /v1, or missing controller/action name segments
- Case sensitivity mismatches on Linux App Service
Example
https://myapp.azurewebsites.net returns 404, but https://myapp.azurewebsites.net/weatherforecast works as expected.
✅ Tip: Always validate your routing locally and confirm the exact same path is being accessed in Azure.

2. Application Appears Running, but Startup Failed Partially
It is possible for an App Service to show Running even when the application failed to initialize fully.
Common causes
- Missing or incorrect environment variables
- Invalid connection strings
- Exceptions thrown during Program.cs / Startup.cs
- Dependency initialization failures at startup

In such scenarios, the app may start the host process but fail to register routes, resulting in 404 responses instead of 500 errors.

✅ Where to check
- Application logs
- Deployment logs
- Kudu → LogFiles

3. Static Files Not Found or Not Being Served
For applications hosting static content (HTML, JavaScript, images, JSON files), a 404 can occur even when files exist.
Common reasons
- Files not deployed to the expected directory (wwwroot, /home/site/wwwroot)
- Missing or unsupported MIME type configuration (commonly seen with .json)
- Static file middleware not enabled in ASP.NET Core applications
✅ Quick validation: Deploy a simple test.html to wwwroot and try accessing it directly.

4. Windows vs Linux App Service Differences
Behaviour can differ significantly between Windows App Service and Linux App Service.
Common pitfalls on Linux
- Case-sensitive file paths (Index.html ≠ index.html)
- Missing or incorrect startup command
- Differences in request routing handled by Nginx
✅ Tip: If the app works on Windows App Service but fails on Linux, always recheck file casing and startup configuration first.

5. Custom Domain and Networking Configuration Issues
In some cases, requests reach the App Service but fail due to domain or network constraints.
Possible causes
- Incorrect custom domain binding
✅ Isolation step: Always test using the default *.azurewebsites.net hostname to determine whether the issue is domain-specific.

6. Health Checks or Monitoring Probes Targeting Invalid Paths
Seeing periodic 404 entries in logs, every few minutes, is often a sign of misconfigured probes.
Typical scenarios
- App Service Health Check configured with a non-existent endpoint
- External monitoring tools probing /health or other paths that do not exist
✅ Fix: Ensure the health check path maps to a valid endpoint implemented by the application.
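The health check fix above can be illustrated with a small sketch. It assumes a Python WSGI application; the /health path, the route table, and the function names are illustrative assumptions, not anything App Service requires. The point is only that a probe path returns 200 when the application implements exactly that path, and falls through to 404 otherwise (including when only the casing differs).

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical route table: path -> plain-text response body.
# "/health" is an assumed path; use whatever path you configure
# in the App Service Health Check setting.
ROUTES = {
    "/health": b"OK",
    "/weatherforecast": b"Mild",
}


def app(environ, start_response):
    """Minimal WSGI app: known routes return 200, everything else 404."""
    path = environ.get("PATH_INFO", "/")
    body = ROUTES.get(path)  # dict lookup is case-sensitive, like Linux paths
    if body is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def probe(path):
    """Call the app in-process and return the HTTP status line for a path."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    # Consume the response iterable so the app runs to completion.
    b"".join(app(environ, start_response))
    return captured["status"]
```

Calling probe with the configured health check path before deployment is a cheap way to catch a probe that targets /healthz or /Health when the application only implements /health.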
7. Missing or Corrupted Deployment Artifacts
Even when deployments report success, application files may not be where the runtime expects them.
Commonly observed with
- Zip deployments
- WEBSITE_RUN_FROM_PACKAGE misconfigurations
- Partial or interrupted deployments
✅ Verify using Kudu: Browse /home/site/wwwroot and check that the files are present.

Quick Troubleshooting Checklist
If your Azure App Service is returning HTTP 404:
- Verify the exact URL and route
- Test hostingstart.html or a static file (for example, /hostingstart.html)
- Review startup and application logs
- Inspect deployed artifacts via Kudu
- Validate Windows vs Linux behaviour differences
- Review networking, authentication, and health check settings

8. Application Gateway in Front of App Service
If you have Application Gateway in front of App Service, check the rewrite rules to confirm that requests are being forwarded to the correct path.

Final Thoughts
HTTP 404 errors on Azure App Service are rarely random. In most cases, they point to:
- Routing mismatches
- Startup or configuration failures
- Platform-specific behavior differences
By breaking the investigation into platform → configuration → application, you can systematically narrow down the root cause and resolve the issue. Happy debugging 🚀

Announcing general availability for the Azure SRE Agent
Today, we’re excited to announce the General Availability (GA) of Azure SRE Agent, your AI‑powered operations teammate that helps organizations improve uptime, reduce incident impact, and cut operational toil by accelerating diagnosis and automating response workflows.

Gemma 4 on Azure Container Apps Serverless GPU
Every prompt you send to a hosted AI service leaves your tenant. Your code, your architecture decisions, your proprietary logic — all of it crosses a network boundary you don't control. For teams building in regulated industries or handling sensitive IP, that's not a philosophical concern. It's a compliance blocker. What if you could spin up a fully private AI coding agent — running on your own GPU, in your own Azure subscription — with a single command? That's exactly what this template does. One azd up , 15 minutes, and you have Google's Gemma 4 running on Azure Container Apps serverless GPU with an OpenAI-compatible API, protected by auth, and ready to power OpenCode as your terminal-based coding agent. No data leaves your environment. No third-party model provider sees your code. Full control. Why Self-Hosted AI on ACA? Azure Container Apps serverless GPU gives you on-demand GPU compute without managing VMs, Kubernetes clusters, or GPU drivers. You get a container, a GPU, and an HTTPS endpoint — Azure handles the rest. Here's what makes this approach different from calling a hosted model API: Complete data privacy — your code and prompts never leave your Azure subscription. No PII exposure, no data leakage, no third-party processing. For teams navigating HIPAA, SOC 2, or internal IP policies, this is the simplest path to compliant AI-assisted development. Predictable costs — you pay for GPU compute time, not per-token. Run as many prompts as you want against your deployed model. No rate limits — the GPU is yours. No throttling, no queue, no waiting for capacity. Model flexibility — swap models in minutes. Start with the 4B parameter Gemma 4 for fast iteration, scale up to 26B for complex reasoning tasks. This isn't a tradeoff between convenience and privacy. ACA serverless GPU makes self-hosted AI as easy to deploy as any SaaS endpoint — but the data stays yours. 
What You're Building
The template deploys two containers into an Azure Container Apps environment:
- Ollama + Gemma 4: running on a serverless GPU (NVIDIA T4 or A100), serving an OpenAI-compatible API
- Nginx auth proxy: a lightweight reverse proxy that adds basic authentication and exposes the endpoint over HTTPS

The Ollama container pulls the Gemma 4 model on first start, so there's nothing to pre-build or upload. The nginx proxy runs on the free Consumption profile; only the Ollama container needs GPU. After deployment, you get a single HTTPS endpoint that works with curl, any OpenAI-compatible SDK, or OpenCode, a terminal-based AI coding agent that turns the whole thing into a private GitHub Copilot alternative.

Step 1: Deploy with azd up
You need the Azure CLI and Azure Developer CLI (azd) installed.
    git clone https://github.com/simonjj/gemma4-on-aca.git
    cd gemma4-on-aca
    azd up
The setup walks you through three choices:
- GPU selection: T4 (16 GB VRAM) for smaller models, or A100 (80 GB VRAM) for the full Gemma 4 lineup.
- Model selection: depends on your GPU choice. The defaults are tuned for the best quality-to-speed ratio on each GPU tier.
- Proxy password: protects your endpoint with basic auth.

Region availability: Serverless GPUs are available in regions including australiaeast, brazilsouth, canadacentral, eastus, italynorth, swedencentral, uksouth, westus, and westus3. Pick one of these when prompted for location.

That's it. Provisioning takes about 10 minutes, mostly waiting for the ACA environment to create and the model to download.

Choose Your Model
Gemma 4 ships in four sizes.
The right choice depends on your GPU and workload:

Model | Params | Architecture | Context | Modalities | Disk Size
gemma4:e2b | ~2B | Dense | 128K | Text, Image, Audio | ~7 GB
gemma4:e4b | ~4B | Dense | 128K | Text, Image, Audio | ~10 GB
gemma4:26b | 26B | MoE (4B active) | 256K | Text, Image | ~18 GB
gemma4:31b | 31B | Dense | 256K | Text, Image | ~20 GB

Real-World Performance on ACA
We benchmarked every model on both GPU tiers using Ollama v0.20 with Q4_K_M quantization and 32K context in Sweden Central:

Model | GPU | Tokens/sec | TTFT | Notes
gemma4:e2b | T4 | ~81 | ~15ms | Fastest on T4
gemma4:e4b | T4 | ~51 | ~17ms | Default T4 choice, best quality/speed
gemma4:e2b | A100 | ~184 | ~9ms | Ultra-fast
gemma4:e4b | A100 | ~129 | ~12ms | Great for lighter workloads
gemma4:26b | A100 | ~113 | ~14ms | Default A100 choice, strong reasoning
gemma4:31b | A100 | ~40 | ~30ms | Highest quality, slower

51 tokens/second on a T4 with the 4B model is fast enough for interactive coding assistance. The 26B model on A100 delivers 113 tokens/second with noticeably better reasoning, which is ideal for complex refactoring, architecture questions, and multi-file changes. The 26B and 31B models require A100 because they don't fit in T4's 16 GB VRAM.

Step 2: Verify Your Endpoint
After azd up completes, the post-provision hook prints your endpoint URL. Test it:
    curl -u admin:<YOUR_PASSWORD> \
      https://<YOUR_PROXY_ENDPOINT>/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{ "model": "gemma4:e4b", "messages": [{"role": "user", "content": "Hello!"}] }'
You should get a JSON response with Gemma 4's reply. The endpoint is OpenAI-compatible, so it works with any tool or SDK that speaks the OpenAI API format.

Step 3: Connect OpenCode
Here's where it gets powerful. OpenCode is a terminal-based AI coding agent: think GitHub Copilot, but running in your terminal and pointing at whatever model backend you choose. The azd up post-provision hook automatically generates an opencode.json in your project directory with the correct endpoint and credentials.
If you need to create it manually: { "$schema": "https://opencode.ai/config.json", "provider": { "gemma4-aca": { "npm": "@ai-sdk/openai-compatible", "name": "Gemma 4 on ACA", "options": { "baseURL": "https://<YOUR_PROXY_ENDPOINT>/v1", "headers": { "Authorization": "Basic <BASE64_OF_admin:YOUR_PASSWORD>" } }, "models": { "gemma4:e4b": { "name": "Gemma 4 e4b (4B)" } } } } } Generate the Base64 value: echo -n "admin:YOUR_PASSWORD" | base64 Now run it: opencode run -m "gemma4-aca/gemma4:e4b" "Write a binary search in Rust" That command sends your prompt to Gemma 4 running on your ACA GPU, and streams the response back to your terminal. Every token is generated on your infrastructure. Nothing leaves your subscription. For interactive sessions, launch the TUI: opencode Select your model with /models , pick Gemma 4, and start coding. OpenCode supports file editing, code generation, refactoring, and multi-turn conversations — all powered by your private Gemma 4 instance. The Privacy Case This matters most for teams that can't send code to external APIs: HIPAA-regulated healthcare apps — patient data in code, schema definitions, and test fixtures stays in your Azure subscription Financial services — proprietary trading algorithms and risk models never leave your network boundary Defense and government — classified or CUI-adjacent codebases get AI assistance without external data processing agreements Startups with sensitive IP — your secret sauce stays secret, even while you use AI to build faster With ACA serverless GPU, you're not running a VM or managing a Kubernetes cluster to get this privacy. It's a managed container with a GPU attached. Azure handles the infrastructure, you own the data boundary. Clean Up When you're done: azd down This tears down all Azure resources. Since ACA serverless GPU bills only while your containers are running, you can also scale to zero replicas to pause costs without destroying the environment. 
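The curl call from Step 2 and the Base64 header from Step 3 can also be reproduced from Python. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the endpoint and password are whatever azd up printed for your deployment. It assumes the proxy accepts HTTP Basic auth and that the response follows the OpenAI chat completions shape, as described above.

```python
import base64
import json
import urllib.request


def basic_auth_header(user, password):
    """Return the HTTP Basic Authorization header value for user:password."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_chat_request(base_url, user, password, model, prompt):
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(user, password),
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body, where the text sits under
    choices[0].message.content.
    """
    with urllib.request.urlopen(request) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

To use it, build a request with your proxy endpoint, the admin user, your proxy password, and a model name such as gemma4:e4b, then pass it to send_chat_request. Keeping request construction separate from sending makes it easy to inspect the URL and headers before anything leaves your machine.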
Get Started
📖 gemma4-on-aca on GitHub: clone it, run azd up, and you're live
🤖 OpenCode: the terminal AI agent that connects to your Gemma 4 endpoint
📌 Gemma 4 docs: model architecture and capabilities
📌 ACA serverless GPU: GPU regions and workload profile details

Azure Functions Ignite 2025 Update
Azure Functions is redefining event-driven applications and high-scale APIs in 2025, accelerating innovation for developers building the next generation of intelligent, resilient, and scalable workloads. This year, our focus has been on empowering AI and agentic scenarios: remote MCP server hosting, bulletproofing agents with Durable Functions, and first-class support for critical technologies like OpenTelemetry, .NET 10 and Aspire. With major advances in serverless Flex Consumption, enhanced performance, security, and deployment fundamentals across Elastic Premium and Flex, Azure Functions is the platform of choice for building modern, enterprise-grade solutions. Remote MCP Model Context Protocol (MCP) has taken the world by storm, offering an agent a mechanism to discover and work deeply with the capabilities and context of tools. When you want to expose MCP/tools to your enterprise or the world securely, we recommend you think deeply about building remote MCP servers that are designed to run securely at scale. Azure Functions is uniquely optimized to run your MCP servers at scale, offering serverless and highly scalable features of Flex Consumption plan, plus two flexible programming model options discussed below. All come together using the hardened Functions service plus new authentication modes for Entra and OAuth using Built-in authentication. Remote MCP Triggers and Bindings Extension GA Back in April, we shared a new extension that allows you to author MCP servers using functions with the MCP tool trigger. That MCP extension is now generally available, with support for C#(.NET), Java, JavaScript (Node.js), Python, and Typescript (Node.js). The MCP tool trigger allows you to focus on what matters most: the logic of the tool you want to expose to agents. Functions will take care of all the protocol and server logistics, with the ability to scale out to support as many sessions as you want to throw at it. 
    [Function(nameof(GetSnippet))]
    public object GetSnippet(
        [McpToolTrigger(GetSnippetToolName, GetSnippetToolDescription)] ToolInvocationContext context,
        [BlobInput(BlobPath)] string snippetContent)
    {
        return snippetContent;
    }

New: Self-hosted MCP Server (Preview)
If you’ve built servers with official MCP SDKs and want to run them as remote cloud‑scale servers without re‑writing any code, this public preview is for you. You can now self‑host your MCP server on Azure Functions: keep your existing Python, TypeScript, .NET, or Java code and get rapid 0 to N scaling, built-in server authentication and authorization, consumption-based billing, and more from the underlying Azure Functions service. This feature complements the Azure Functions MCP extension for building MCP servers using the Functions programming model (triggers & bindings). Pick the path that fits your scenario: build with the extension or with standard MCP SDKs. Either way you benefit from the same scalable, secure, and serverless platform.

Use the official MCP SDKs:

    # `mcp` is the server object created earlier with the MCP SDK;
    # NWS_API_BASE, make_nws_request and format_alert are defined elsewhere in the sample.
    @mcp.tool()
    async def get_alerts(state: str) -> str:
        """Get weather alerts for a US state.

        Args:
            state: Two-letter US state code (e.g. CA, NY)
        """
        url = f"{NWS_API_BASE}/alerts/active/area/{state}"
        data = await make_nws_request(url)

        if not data or "features" not in data:
            return "Unable to fetch alerts or no alerts found."

        if not data["features"]:
            return "No active alerts for this state."

        alerts = [format_alert(feature) for feature in data["features"]]
        return "\n---\n".join(alerts)

Use Azure Functions Flex Consumption Plan's serverless compute using Custom Handlers in host.json:

    {
      "version": "2.0",
      "configurationProfile": "mcp-custom-handler",
      "customHandler": {
        "description": {
          "defaultExecutablePath": "python",
          "arguments": ["weather.py"]
        },
        "http": {
          "DefaultAuthorizationLevel": "anonymous"
        },
        "port": "8000"
      }
    }

Learn more about MCPTrigger and self-hosted MCP servers at https://aka.ms/remote-mcp

Built-in MCP server authorization (Preview)
The built-in authentication and authorization feature can now be used for MCP server authorization, using a new preview option. You can quickly define identity-based access control for your MCP servers with Microsoft Entra ID or other OpenID Connect providers. Learn more at https://aka.ms/functions-mcp-server-authorization.

Better together with Foundry agents
Microsoft Foundry is the starting point for building intelligent agents, and Azure Functions is the natural next step for extending those agents with remote MCP tools. Running your tools on Functions gives you clean separation of concerns, reuse across multiple agents, and strong security isolation. And with built-in authorization, Functions enables enterprise-ready authentication patterns, from calling downstream services with the agent’s identity to operating on behalf of end users with their delegated permissions. Build your first remote MCP server and connect it to your Foundry agent at https://aka.ms/foundry-functions-mcp-tutorial.

Agents
Microsoft Agent Framework 2.0 (Public Preview Refresh)
We’re excited about the preview refresh 2.0 release of Microsoft Agent Framework that builds on battle-hardened work from Semantic Kernel and AutoGen. Agent Framework is an outstanding solution for building multi-agent orchestrations that are both simple and powerful.
Azure Functions is a strong fit to host Agent Framework with the service’s extreme scale, serverless billing, and enterprise grade features like VNET networking and built-in auth. Durable Task Extension for Microsoft Agent Framework (Preview) The durable task extension for Microsoft Agent Framework transforms how you build production-ready, resilient and scalable AI agents by bringing the proven durable execution (survives crashes and restarts) and distributed execution (runs across multiple instances) capabilities of Azure Durable Functions directly into the Microsoft Agent Framework. Combined with Azure Functions for hosting and event-driven execution, you can now deploy stateful, resilient AI agents that automatically handle session management, failure recovery, and scaling, freeing you to focus entirely on your agent logic. Key features of the durable task extension include: Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture. 
Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard Create a durable agent: endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini") # Create an AI agent following the standard Microsoft Agent Framework pattern agent = AzureOpenAIChatClient( endpoint=endpoint, deployment_name=deployment_name, credential=AzureCliCredential() ).create_agent( instructions="""You are a professional content writer who creates engaging, well-structured documents for any given topic. When given a topic, you will: 1. Research the topic using the web search tool 2. Generate an outline for the document 3. Write a compelling document with proper formatting 4. Include relevant examples and citations""", name="DocumentPublisher", tools=[ AIFunctionFactory.Create(search_web), AIFunctionFactory.Create(generate_outline) ] ) # Configure the function app to host the agent with durable session management app = AgentFunctionApp(agents=[agent]) app.run() Durable Task Scheduler dashboard for agent and agent workflow observability and debugging For more information on the durable task extension for Agent Framework, see the announcement: https://aka.ms/durable-extension-for-af-blog. Flex Consumption Updates As you know, Flex Consumption means serverless without compromise. 
It combines elastic scale and pay‑for‑what‑you‑use pricing with the controls you expect: per‑instance concurrency, longer executions, VNet/private networking, and Always Ready instances to minimize cold starts. Since launching GA at Ignite 2024 last year, Flex Consumption has had tremendous growth with over 1.5 billion function executions per day and nearly 40 thousand apps. Here’s what’s new for Ignite 2025:
- 512 MB instance size (GA). Right‑size lighter workloads and scale farther within the default quota.
- Availability Zones (GA). Distribute instances across zones.
- Rolling updates (Public Preview). Unlock zero-downtime deployments of code or config by setting a single configuration. See below for more information.
- Even more improvements, including new diagnostic settings to route logs/metrics, Key Vault and App Configuration references, new regions, and Custom Handler support.

To get started, review Flex Consumption samples, or dive into the documentation to see how Flex can support your workloads.

Migrating to Azure Functions Flex Consumption
Migrating to Flex Consumption is simple with our step-by-step guides and agentic tools. Move your Azure Functions apps or AWS Lambda workloads, update your code and configuration, and take advantage of new automation tools. With Linux Consumption retiring, now is the time to switch. For more information, see:
Migrate Consumption plan apps to the Flex Consumption plan
Migrate AWS Lambda workloads to Azure Functions

Durable Functions
Durable Functions introduces powerful new features to help you build resilient, production-ready workflows:
Distributed Tracing: lets you track requests across components and systems, giving you deep visibility into orchestration and activities with support for App Insights and OpenTelemetry.
Extended Sessions support in .NET isolated: improves performance by caching orchestrations in memory, ideal for fast sequential activities and large fan-out/fan-in patterns.
Orchestration versioning (public preview): enables zero-downtime deployments and backward compatibility, so you can safely roll out changes without disrupting in-flight workflows Durable Task Scheduler Updates Durable Task Scheduler Dedicated SKU (GA): Now generally available, the Dedicated SKU offers advanced orchestration for complex workflows and intelligent apps. It provides predictable pricing for steady workloads, automatic checkpointing, state protection, and advanced monitoring for resilient, reliable execution. Durable Task Scheduler Consumption SKU (Public Preview): The new Consumption SKU brings serverless, pay-as-you-go orchestration to dynamic and variable workloads. It delivers the same orchestration capabilities with flexible billing, making it easy to scale intelligent applications as needed. For more information see: https://aka.ms/dts-ga-blog OpenTelemetry support in GA Azure Functions OpenTelemetry is now generally available, bringing unified, production-ready observability to serverless applications. Developers can now export logs, traces, and metrics using open standards—enabling consistent monitoring and troubleshooting across every workload. Key capabilities include: Unified observability: Standardize logs, traces, and metrics across all your serverless workloads for consistent monitoring and troubleshooting. Vendor-neutral telemetry: Integrate seamlessly with Azure Monitor or any OpenTelemetry-compliant backend, ensuring flexibility and choice. Broad language support: Works with .NET (isolated), Java, JavaScript, Python, PowerShell, and TypeScript. Start using OpenTelemetry in Azure Functions today to unlock standards-based observability for your apps. For step-by-step guidance on enabling OpenTelemetry and configuring exporters for your preferred backend, see the documentation. Deployment with Rolling Updates (Preview) Achieving zero-downtime deployments has never been easier. 
The Flex Consumption plan now offers rolling updates as a site update strategy. Set a single property, and all future code deployments and configuration changes will be released with zero-downtime. Instead of restarting all instances at once, the platform now drains existing instances in batches while scaling out the latest version to match real-time demand. This ensures uninterrupted in-flight executions and resilient throughput across your HTTP, non-HTTP, and Durable workloads – even during intensive scale-out scenarios. Rolling updates are now in public preview. Learn more at https://aka.ms/functions/rolling-updates. Secure Identity and Networking Everywhere By Design Security and trust are paramount. Azure Functions incorporates proven best practices by design, with full support for managed identity—eliminating secrets and simplifying secure authentication and authorization. Flex Consumption and other plans offer enterprise-grade networking features like VNETs, private endpoints, and NAT gateways for deep protection. The Azure Portal streamlines secure function creation, and updated scenarios and samples showcase these identity and networking capabilities in action. Built-in authentication (discussed above) enables inbound client traffic to use identity as well. Check out our updated Functions Scenarios page with quickstarts or our secure samples gallery to see these identity and networking best practices in action. .NET 10 Azure Functions now supports .NET 10, bringing in a great suite of new features and performance benefits for your code. .NET 10 is supported on the isolated worker model, and it’s available for all plan types except Linux Consumption. As a reminder, support ends for the legacy in-process model on November 10, 2026, and the in-process model is not being updated with .NET 10. To stay supported and take advantage of the latest features, migrate to the isolated worker model. 
Aspire
Aspire is an opinionated stack that simplifies development of distributed applications in the cloud. The Azure Functions integration for Aspire enables you to develop, debug, and orchestrate an Azure Functions .NET project as part of an Aspire solution. Aspire publish deploys your functions directly to Azure Functions on Azure Container Apps. Aspire 13 includes an updated preview version of the Functions integration that acts as a release candidate with go-live support. The package will be moved to GA quality with Aspire 13.1.

Java 25, Node.js 24
Azure Functions now supports Java 25 and Node.js 24 in preview. You can now develop functions using these versions locally and deploy them to Azure Functions plans. Learn how to upgrade your apps to these versions here

In Summary
Ready to build what’s next? Update your Azure Functions Core Tools today and explore the latest samples and quickstarts to unlock new capabilities for your scenarios. The guided quickstarts run and deploy in under 5 minutes and incorporate best practices, from architecture to security to deployment. We’ve made it easier than ever to scaffold, deploy, and scale real-world solutions with confidence. The future of intelligent, scalable, and secure applications starts now. Jump in and see what you can create!

Bring Your Own Model (BYOM) for Azure AI Applications using Azure Machine Learning
Modern AI-powered applications running on Azure increasingly require flexibility in model choice. While managed model catalogs accelerate time to value, real-world enterprise applications often need to:
- Host open‑source or fine‑tuned models
- Deploy domain‑specific or regulated models inside a tenant boundary
- Maintain tight control over runtime environments and versions
- Integrate AI inference into existing application architectures

This is where Bring Your Own Model (BYOM) becomes a core architectural capability, not just an AI feature. In this post, we’ll walk through a production-ready BYOM pattern for Azure applications, using:
- Azure Machine Learning as the model lifecycle and inference platform
- Azure-hosted applications (and optionally Microsoft Foundry) as the orchestration layer

The focus is on building scalable, governable AI-powered apps on Azure, not platform lock‑in. We use SmolLM‑135M as a reference model. The same pattern applies to any open‑source or proprietary model.

Reference Architecture: Azure BYOM for AI Applications
At a high level, the responsibilities are clearly separated:

Azure Layer | Responsibility
Azure Application Layer | API, app logic, orchestration, agent logic
Azure Machine Learning | Model registration, environments, scalable inference
Azure Identity & Networking | Authentication, RBAC, private endpoints

Key principle: Applications orchestrate. Azure ML executes the model. This keeps AI workloads modular, auditable, and production-safe.
BYOM Workflow Overview Provision Azure Machine Learning Create Azure ML compute Author code in an Azure ML notebook Download and package the model Register the model Define a reproducible inference environment Implement scoring logic Deploy a managed online endpoint Use the endpoint from Microsoft Foundry Step 1: Provision Azure Machine Learning An Azure ML workspace is the governance boundary for BYOM: Model versioning and lineage Environment definitions Secure endpoint hosting Auditability Choose region carefully for latency, data residency, and networking. Step 2: Create Azure ML Compute (Compute Instance) Create a Compute Instance in Azure ML Studio. Why this matters: Managed Jupyter environment Identity integrated (no secrets in notebooks) Ideal for model packaging and testing - Enable auto‑shutdown for cost control - CPU is sufficient for most development workflows Step 3: Create an Azure ML Notebook Open Azure ML Studio → Notebooks Create a new Python notebook Select the Python SDK v2 kernel This notebook will handle the entire BYOM lifecycle. Step 4: Connect to the Azure ML Workspace # Import Azure ML SDK client from azure.ai.ml import MLClient # Import identity library for secure authentication from azure.identity import DefaultAzureCredential # Define workspace details subscription_id = "<SUBSCRIPTION_ID>" resource_group = "<RESOURCE_GROUP>" workspace_name = "<WORKSPACE_NAME>" # Create MLClient using Microsoft Entra ID # No keys or secrets are embedded in code ml_client = MLClient( DefaultAzureCredential(), subscription_id, resource_group, workspace_name ) The code above uses enterprise identity and aligns with zero‑trust practices. 
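If you would rather not hard-code the workspace details from Step 4 in a notebook, one option is to read them from environment variables and fail fast when one is missing. The sketch below uses only the standard library. The AML_* variable names and the helper names are hypothetical conventions chosen for illustration; the Azure ML SDK does not read them on its own.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceSettings:
    """The three values the Step 4 snippet passes to MLClient."""
    subscription_id: str
    resource_group: str
    workspace_name: str


def load_workspace_settings(env=os.environ):
    """Read workspace details from a mapping, failing fast if any is missing.

    The AML_* names are hypothetical; pick whatever convention your team uses.
    """
    names = {
        "subscription_id": "AML_SUBSCRIPTION_ID",
        "resource_group": "AML_RESOURCE_GROUP",
        "workspace_name": "AML_WORKSPACE_NAME",
    }
    # Treat unset and empty values the same way.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return WorkspaceSettings(**{field: env[var] for field, var in names.items()})
```

The returned fields can then be passed to MLClient in the same order as in the Step 4 snippet: the credential first, followed by subscription_id, resource_group, and workspace_name.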
Step 5: Download and Package Model Artifacts

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import os

    # Hugging Face model identifier
    model_id = "HuggingFaceTB/SmolLM-135M"

    # Local directory where model artifacts will be stored
    model_dir = "smollm_135m"
    os.makedirs(model_dir, exist_ok=True)

    # Download model weights
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Download tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Save artifacts locally
    model.save_pretrained(model_dir)
    tokenizer.save_pretrained(model_dir)

🔹 Open‑source or proprietary models follow the same packaging pattern
🔹 Azure ML treats all registered models identically

Step 6: Register the Model in Azure ML

Register the packaged artifacts as a custom model asset. Registration:

- Enables version tracking
- Supports rolling upgrades
- Integrates with CI/CD pipelines

This is the foundation for repeatable inference deployments.

    from azure.ai.ml.entities import Model

    # Create a model asset in Azure ML
    registered_model = Model(
        path=model_dir,
        name="SmolLM-135M",
        description="BYOM model for Microsoft Foundry extensibility",
        type="custom_model"
    )

    # Register (or update) the model
    ml_client.models.create_or_update(registered_model)

Step 7: Define a Reproducible Inference Environment

    name: dev-hf-base
    channels:
      - conda-forge
    dependencies:
      - python=3.12
      - numpy=2.3.1
      - pip=25.1.1
      - scipy=1.16.1
      - pip:
          - azureml-inference-server-http==1.4.1
          - inference-schema[numpy-support]
          - accelerate==1.10.0
          - einops==0.8.1
          # Verify this pin before building: torch 2.0.0 predates official
          # Python 3.12 support, so it may not install alongside python=3.12
          - torch==2.0.0
          - transformers==4.55.2

⚠️ Environment management is the hardest part of BYOM
✅ Treat environment changes like code changes

BYOM Inference Patterns

The same model can expose multiple behaviors.
Pattern 1: Text Generation Endpoint

This is the most common pattern for AI-powered applications:

- REST-based text generation
- Stateless inference
- Horizontal scaling through Azure ML managed endpoints

Ideal for:

- Copilots
- Chat APIs
- Summarization or content generation services

Scoring Script (score.py)

    import os
    import json
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    def init():
        """
        Called once when the container starts.
        Loads the model and tokenizer into memory.
        """
        global model, tokenizer

        # Azure ML injects model path at runtime.
        # Note: when a model is registered from a folder, the artifacts may sit
        # in a subfolder named after it (here, smollm_135m); adjust if loading fails.
        model_dir = os.getenv("AZUREML_MODEL_DIR")

        tokenizer = AutoTokenizer.from_pretrained(model_dir)
        model = AutoModelForCausalLM.from_pretrained(model_dir)
        model.eval()

    def run(raw_data):
        """
        Called for each inference request.
        Expects JSON input with a 'prompt' field.
        """
        data = json.loads(raw_data)
        prompt = data.get("prompt", "")

        # Tokenize input text
        inputs = tokenizer(prompt, return_tensors="pt")

        # Generate text without tracking gradients
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=100)

        # Decode output tokens into text
        response_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

        return {"response": response_text}

Example Request

    {
      "prompt": "Summarize the BYOM pattern in one sentence."
    }

Example Response

    {
      "response": "Bring Your Own Model (BYOM) allows organizations to extend Microsoft Foundry with custom models hosted on Azure Machine Learning while maintaining enterprise governance and scalability."
    }

Pattern 2: Predictive / Token Rank Analysis

The same model can expose non-generative behaviors, such as:

- Token likelihood analysis
- Ranking or scoring
- Model introspection services

This enables AI-backed analytics capabilities, not just chat.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    class PredictiveAnalysisModel:
        """
        Computes the rank of each token based on the model's next-token probability distribution.
""" def init(self, model, tokenizer): self.model = model self.tokenizer = tokenizer self.model.eval() def analyze(self, text): tokens = self.tokenizer.tokenize(text) token_ids = self.tokenizer.convert_tokens_to_ids(tokens) # Start with BOS token input_sequence = [self.tokenizer.bos_token_id, *token_ids] results = [] for i in range(len(token_ids)): context = input_sequence[: i + 1] model_input = torch.tensor([context]) with torch.no_grad(): outputs = self.model(model_input) logits = outputs.logits[0, -1] sorted_indices = torch.argsort(logits, descending=True) actual_token = token_ids[i] rank = (sorted_indices == actual_token).nonzero(as_tuple=True)[0].item() results.append({ "token": tokens[i], "rank": rank }) return results @classmethod def from_disk(cls, model_path): model = AutoModelForCausalLM.from_pretrained(model_path) tokenizer = AutoTokenizer.from_pretrained(model_path) return cls(model, tokenizer) Scoring Script (score.py) import os from predictive_analysis import PredictiveAnalysisModel def init(): """ Loads predictive analysis model from disk. """ global model model_dir = os.getenv("AZUREML_MODEL_DIR") model = PredictiveAnalysisModel.from_disk(model_dir) def run(text: str): """ Accepts raw text input and returns token ranks. """ return { "token_ranks": model.analyze(text) } Example Request { "text": "This is a test." } Example Response { "token_ranks": [ { "token": "This", "rank": 518 }, { "token": " is", "rank": 2 }, { "token": " a", "rank": 0 }, { "token": " test", "rank": 33 }, { "token": ".", "rank": 77 } ] } Consuming the BYOM Endpoint from Azure Applications Azure ML endpoints are external inference services consumed by apps. 
Option A: Application-Controlled Invocation

- The app calls the Azure ML endpoint directly
- IAM, networking, and retries are controlled by the app
- Recommended for most production systems

    import os
    import requests

    # Endpoint URL and key are read from the environment, not hard-coded
    AML_ENDPOINT = os.environ["AML_ENDPOINT"]
    AML_KEY = os.environ["AML_KEY"]

    headers = {
        "Authorization": f"Bearer {AML_KEY}",
        "Content-Type": "application/json"
    }

    payload = {
        "prompt": "Summarize BYOM in one sentence."
    }

    # A timeout keeps the app from hanging if the endpoint is slow to respond
    response = requests.post(AML_ENDPOINT, json=payload, headers=headers, timeout=30)
    # Raise on HTTP error responses instead of parsing an error body as a result
    response.raise_for_status()
    print(response.json())

Option B: Tool-Based Invocation

- Expose the ML endpoint as an OpenAPI interface
- Allow higher-level orchestration layers (such as agents) to invoke it dynamically

Both patterns integrate cleanly with Azure App Service, Container Apps, Functions, and Kubernetes-based apps.

Operational Considerations

- Dependency management is ongoing work
- Model upgrades require redeployment
- Private networking must be planned early
- Use managed Foundry models where possible
- Use BYOM when business or regulatory needs require it

Security and Governance by Default

BYOM on Azure ML integrates natively with Azure platform controls:

- Entra ID & managed identity
- RBAC-based permissions
- Private networking and VNET isolation
- Centralized logging and diagnostics

This makes BYOM suitable for regulated industries and production‑critical AI workloads.

When Should You Use BYOM?

BYOM is the right choice when:

- You need model choice independence
- You want to deploy open‑source or proprietary LLMs
- You require enterprise‑grade controls
- You are building AI APIs, agents, or copilots at scale

For experimentation, higher‑level tooling may be faster. For production, BYOM provides the control and durability enterprises require.

Conclusion

Azure applications increasingly depend on AI, but models should not dictate architecture.
With Azure Machine Learning as the execution layer and Azure Apps as the orchestration layer, organizations can:

- Combine managed and custom models
- Enforce security and compliance
- Scale AI workloads reliably
- Avoid platform and vendor lock-in

Bring Your Own Model (BYOM) is no longer a niche requirement. It is a foundational pattern for enterprise AI platforms. Azure Machine Learning enables BYOM across open‑source models, fine‑tuned variants, and proprietary LLMs, allowing organizations to innovate without being locked into a single model provider.

You build the application. Azure delivers the platform. You own the model. That is the essence of BYOM on Azure.