Apps on Azure Blog

Unlocking new AI workloads in Azure Container Apps

Cary_Chai
May 19, 2025

The rapid rise of AI has unlocked powerful new scenarios, from AI-powered chatbots and image generation to advanced agents. However, running AI models at scale presents real challenges: managing compute-intensive workloads, navigating model deployment complexity, and executing untrusted AI-generated code safely.

Azure Container Apps addresses these challenges by offering a fully managed, flexible, serverless container platform designed for modern, cloud-native applications. It now includes the general availability (GA) of dedicated GPUs, improved integration for deploying Azure AI Foundry models to Azure Container Apps, and a private preview of GPU-powered dynamic sessions.

Foundry Model Integration

Azure AI Foundry Models offers a wide collection of ready-to-deploy models. Traditionally, these models are deployed within AI Foundry either as serverless APIs with pay-as-you-go billing (available for some of the most popular models) or as managed compute, which uses pay-per-GPU pricing but supports deployment of all Foundry models. Azure Container Apps now provides a third option, in public preview: you can deploy Foundry models directly to Azure Container Apps when creating a container app. You can deploy a model with the following CLI command:

 

az containerapp up \
  --name <CONTAINER_APP_NAME> \
  --location <LOCATION> \
  --resource-group <RESOURCE_GROUP_NAME> \
  --model-registry <MODEL_REGISTRY_NAME> \
  --model-name <MODEL_NAME> \
  --model-version <MODEL_VERSION>

During the preview, the CLI command supports a limited subset of Foundry models. The list of supported models can be found here.
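
Once the command completes, the model is served from the container app's ingress endpoint. As a small follow-up sketch (the exact inference path and request format depend on the model you deployed), you can look up the app's fully qualified domain name with a standard CLI query:

# Look up the FQDN of the deployed container app hosting the model
az containerapp show \
  --name <CONTAINER_APP_NAME> \
  --resource-group <RESOURCE_GROUP_NAME> \
  --query properties.configuration.ingress.fqdn \
  --output tsv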

General Availability of Dedicated GPUs

Azure Container Apps dedicated GPUs reduce the management overhead of developing and deploying your AI applications. With built-in support for key platform components, such as the latest CUDA drivers, and turnkey enablement of networking and security features, you can focus on your AI application code instead of managing infrastructure.

Dedicated GPUs complement the generally available serverless GPU feature, which provides on-demand, flexible GPU compute with automatic scaling, optimized cold starts, per-second billing, reduced operational overhead, and full data governance. Serverless GPUs are ideal for bursty workloads that run for only part of the day, while dedicated GPUs are better suited for GPU workloads that require continuous availability, with little idle time and no need to scale to zero.

This feature supports NC A100 v4 series GPUs, which are optimized for AI workloads. GPU workload profiles are offered as part of the Dedicated plan and are supported only in environments that use the workload profiles type.
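
To illustrate how dedicated GPU compute is attached to an app, here is a minimal sketch assuming an existing workload profiles environment. The profile name gpu-a100 is arbitrary and NC24-A100 is used as an illustrative profile type; check the Azure Container Apps documentation for the exact GPU profile types available in your region.

# Add a dedicated GPU workload profile (NC A100 v4 series) to the environment
az containerapp env workload-profile add \
  --resource-group <RESOURCE_GROUP_NAME> \
  --name <ENVIRONMENT_NAME> \
  --workload-profile-name gpu-a100 \
  --workload-profile-type NC24-A100 \
  --min-nodes 1 \
  --max-nodes 1

# Run the container app on the GPU workload profile
az containerapp create \
  --name <CONTAINER_APP_NAME> \
  --resource-group <RESOURCE_GROUP_NAME> \
  --environment <ENVIRONMENT_NAME> \
  --image <YOUR_GPU_IMAGE> \
  --workload-profile-name gpu-a100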

Private Preview of GPU Dynamic Sessions

Dynamic sessions are a great feature in Azure Container Apps for running untrusted code at scale within compute sandboxes protected by industry-standard Hyper-V isolation. Today, there are three ways of running dynamic sessions: the Python code interpreter, the Node.js code interpreter, and custom container sessions. In private preview, we are also providing a GPU-powered Python code interpreter so your sessions can better handle AI workloads and run untrusted, AI-generated code. Microsoft Dev Box also integrates with serverless GPUs in dynamic sessions through the Dev Box GPU Shell feature.
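
For reference, this is a sketch of how a Python code interpreter session pool is created with the CLI today; the GPU-powered interpreter is in private preview, so its container type and parameters may differ from the generally available PythonLTS type shown here.

# Create a Python code interpreter session pool (Hyper-V isolated sandboxes)
az containerapp sessionpool create \
  --name <SESSION_POOL_NAME> \
  --resource-group <RESOURCE_GROUP_NAME> \
  --location <LOCATION> \
  --container-type PythonLTS \
  --max-sessions 10 \
  --cooldown-period 300 \
  --network-status EgressDisabled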

To sign up for the private preview, visit the sign-up link.

Catch us at Microsoft Build 2025

For a full list of Azure Container Apps updates, see What's new in Azure Container Apps at Build'25 | Microsoft Community Hub. We'll be showcasing these features, and how Azure Container Apps can unlock your AI scenarios, in session BRK190 on Wednesday, May 21. We'll be joined by our partners at NVIDIA, who will demonstrate how they are using serverless GPUs.

Wrapping up

As always, we invite you to visit our GitHub page for feedback, feature requests, or questions about Azure Container Apps, where you can open a new issue or up-vote existing ones. If you’re curious about what we’re working on next, check out our roadmap. We look forward to hearing from you!
