OpenAI's gpt-oss models on Azure Container Apps serverless GPUs
This is great, but since we can already deploy the 120B model serverlessly, what would be really cool is access to the model via the Responses API so we get the Assistants vector store and search, code interpreter, MCP support, etc. When is this likely to happen?
- Cary_Chai (Microsoft), Aug 21, 2025
Hi powerofzero, Azure Container Apps provides GPUs to run your containerized applications, and this post shows the simplest path to run gpt-oss via an Ollama container on Azure Container Apps. If you'd rather not use Ollama, or you need functionality that the Ollama-deployed model doesn't provide, you can deploy your own containerized version of gpt-oss today and build your own Responses API server around it. If you still want to use Ollama but also need some of the Responses API functionality, solutions like Hugging Face's responses.js should be able to act as a proxy, mapping Responses API calls onto chat completions for an Ollama-hosted gpt-oss. All of these options can be deployed on Azure Container Apps today; just swap out the container and ingress port details in this post.
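
For a concrete sense of the proxy approach, here is a minimal TypeScript sketch of a Responses-style shim that forwards requests to Ollama's OpenAI-compatible chat completions endpoint. This is not the actual responses.js code; the port numbers, the `gpt-oss:120b` model tag, and the simplified response shape are all illustrative assumptions.

```typescript
// Minimal sketch of a Responses API shim in front of Ollama.
// Assumptions (not from the post): Ollama listens on port 11434 and
// exposes its OpenAI-compatible /v1/chat/completions endpoint, and the
// model is pulled under the tag "gpt-oss:120b". Requires Node 18+.
import express from "express";

const app = express();
app.use(express.json());

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

// Accept a Responses-style request, translate it into a chat
// completions call, and map the result back.
app.post("/v1/responses", async (req, res) => {
  const { model, input, instructions } = req.body;

  // The Responses API accepts `input` as either a plain string or a
  // list of messages; normalize both shapes into chat messages.
  const messages = [
    ...(instructions ? [{ role: "system", content: instructions }] : []),
    ...(typeof input === "string"
      ? [{ role: "user", content: input }]
      : input),
  ];

  const upstream = await fetch(`${OLLAMA_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  });
  const completion = await upstream.json();

  // Return a minimal Responses-shaped payload built from the first
  // chat completion choice.
  res.json({
    id: completion.id,
    model: completion.model,
    output_text: completion.choices?.[0]?.message?.content ?? "",
  });
});

app.listen(8080, () => console.log("Responses shim listening on :8080"));
```

A client could then point an OpenAI SDK's `baseURL` at this shim (or at a deployed responses.js container) and call the Responses endpoint as usual, with the container image and ingress port in this post's steps swapped for the proxy's.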