OpenAI's gpt-oss models on Azure Container Apps serverless GPUs
This is great, but since we can already deploy the 120B model serverlessly, what would be really cool is access to the model via the Responses API so we get the Assistants vector store and search, code interpreter, MCP support, etc. When is this likely to happen?
- Cary_Chai (Microsoft), Aug 21, 2025
Hi powerofzero, Azure Container Apps provides GPUs to run your containerized applications, and this post shows the simplest path to run gpt-oss via an Ollama container on Azure Container Apps. If you'd rather not use Ollama, or you need functionality that the Ollama-deployed model doesn't provide, you can deploy your own containerized version of gpt-oss today and build your own Responses API server around it. If you still want to use Ollama but also need some of the Responses API functionality, solutions like Hugging Face's responses.js should be able to act as a proxy, mapping Responses API calls onto chat completions for an Ollama-hosted gpt-oss. All of these options can be deployed on Azure Container Apps today; just swap out the container and ingress port details in this post.
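
For a concrete sense of the proxy approach, here is a minimal TypeScript sketch of a Responses-style shim that forwards requests to Ollama's OpenAI-compatible chat completions endpoint. This is not the actual responses.js code; the port numbers, the `gpt-oss:120b` model tag, and the simplified response shape are all illustrative assumptions.

```typescript
// Minimal sketch of a Responses API shim in front of Ollama.
// Assumptions (not from the post): Ollama listens on port 11434 and
// exposes its OpenAI-compatible /v1/chat/completions endpoint, and the
// model is pulled under the tag "gpt-oss:120b". Requires Node 18+.
import express from "express";

const app = express();
app.use(express.json());

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

// Accept a Responses-style request, translate it into a chat
// completions call, and map the result back.
app.post("/v1/responses", async (req, res) => {
  const { model, input, instructions } = req.body;

  // The Responses API accepts `input` as either a plain string or a
  // list of messages; normalize both shapes into chat messages.
  const messages = [
    ...(instructions ? [{ role: "system", content: instructions }] : []),
    ...(typeof input === "string"
      ? [{ role: "user", content: input }]
      : input),
  ];

  const upstream = await fetch(`${OLLAMA_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  });
  const completion = await upstream.json();

  // Return a minimal Responses-shaped payload built from the first
  // chat completion choice.
  res.json({
    id: completion.id,
    model: completion.model,
    output_text: completion.choices?.[0]?.message?.content ?? "",
  });
});

app.listen(8080, () => console.log("Responses shim listening on :8080"));
```

A client could then point an OpenAI SDK's `baseURL` at this shim (or at a deployed responses.js container) and call the Responses endpoint as usual, with the container image and ingress port in this post's steps swapped for the proxy's.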