OpenAI recently introduced gpt-oss, a family of open-weight language models that deliver strong real-world performance at low cost. Released under the flexible Apache 2.0 license, these models outperform similarly sized open models on reasoning tasks, demonstrate strong tool-use capabilities, and are optimized for efficient deployment on consumer hardware; see the announcement: https://openai.com/index/introducing-gpt-oss/.
It’s an excellent choice for scenarios where you want the security and efficiency of a smaller model running on your application instance — while still getting impressive reasoning capabilities.
By hosting it on Azure App Service, you can take advantage of enterprise-grade features without worrying about managing infrastructure:
- Built-in autoscaling
- Integration with VNet
- Enterprise-grade security and compliance
- Easy CI/CD integration
- Choice of deployment methods
In this post, we’ll walk through a complete sample that uses gpt-oss-20b as a sidecar container running alongside a Python Flask app on Azure App Service.
All the source code and Bicep templates are available here:
📂 Azure-Samples/appservice-ai-samples/gpt-oss-20b-sample
Architecture of our sample at a glance
- Web app (Flask) runs as a code-based App Service.
- Model runs in a sidecar container (Ollama) in the same App Service.
- The Flask app calls the model over localhost:11434.
- Bicep provisions the Web App and an Azure Container Registry (ACR). You push your model image to ACR and attach it as a sidecar in the Portal.
1. Wrapping gpt-oss-20b in a Container
Code location:
/gpt-oss-20b-sample/ollama-image in the sample repo: https://github.com/Azure-Samples/appservice-ai-samples/tree/main/gpt-oss-20b-sample/ollama-image.
What this image does (at a glance)
- Starts the Ollama server
- Pulls the gpt-oss:20b model on first run
- Exposes port 11434 for the Flask app to call locally
Dockerfile:

```dockerfile
FROM ollama/ollama
EXPOSE 11434
COPY startup.sh /
RUN chmod +x /startup.sh
ENTRYPOINT ["/startup.sh"]
```
startup.sh:

```bash
#!/bin/sh
# Start Ollama in the background so we can pull the model
ollama serve &
sleep 5

# Pull the gpt-oss:20b model (slow on first run; cached afterwards)
ollama pull gpt-oss:20b

# Restart Ollama and run it in the foreground so the container stays alive
pkill -f "ollama"
ollama serve
```
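Once the container is up, you can sanity-check that the model was pulled by querying Ollama's /api/tags endpoint, which lists the locally available models. A minimal sketch of that check (the helper name is ours; the response shape follows Ollama's API, here shown with a trimmed sample payload):

```python
def model_present(tags_json: dict, model_name: str) -> bool:
    """Return True if model_name appears in an Ollama /api/tags response."""
    return any(m.get("name") == model_name for m in tags_json.get("models", []))

# Trimmed example of what GET http://localhost:11434/api/tags returns
sample = {"models": [{"name": "gpt-oss:20b"}]}
print(model_present(sample, "gpt-oss:20b"))  # → True
```

In practice you would fetch the JSON from http://localhost:11434/api/tags (for example with requests) and pass it to this helper.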
Build the image
Choose one of the two common paths:
A. Build locally with Docker
From the ollama-image folder:
```bash
# 1) (optional) pick a registry/image name up front
ACR_NAME=<your-acr-name>   # e.g., myacr123
IMAGE=ollama-gpt-oss:20b

# 2) build locally
docker build -t $IMAGE .
```
If you’re new to building images, see Docker’s build docs for options and examples.
B. Build in Azure (no local Docker required) with ACR Tasks
Run a cloud build directly from the repo or your working directory:
```bash
ACR_NAME=<your-acr-name>
az acr build \
  --registry $ACR_NAME \
  --image ollama-gpt-oss:20b \
  ./gpt-oss-20b-sample/ollama-image
```
ACR Tasks builds the image in Azure and pushes it straight into your registry.
Push the image to Azure Container Registry (ACR)
If you built locally, tag and push to your ACR:
```bash
# Log in to ACR (Azure CLI recommended)
az acr login --name $ACR_NAME

# Tag and push (note: the registry FQDN must be all lowercase)
docker tag ollama-gpt-oss:20b $ACR_NAME.azurecr.io/ollama-gpt-oss:20b
docker push $ACR_NAME.azurecr.io/ollama-gpt-oss:20b
```
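If you script this step, it is easy to break the push by mixing case in the registry name, since the login server FQDN must be all lowercase even when the registry was created with capitals. A tiny illustrative helper (not part of the sample) that builds the fully qualified reference:

```python
def acr_image_ref(acr_name: str, repository: str, tag: str) -> str:
    """Build a fully qualified ACR image reference.

    The registry FQDN must be all lowercase; the repository name must
    already be lowercase per Docker naming rules.
    """
    return f"{acr_name.lower()}.azurecr.io/{repository}:{tag}"

print(acr_image_ref("MyACR123", "ollama-gpt-oss", "20b"))
# → myacr123.azurecr.io/ollama-gpt-oss:20b
```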
Full “push/pull with Docker CLI” quickstart is here if you need it.
2. The Flask Application
Our main app is a simple Python Flask service that connects to the model running in the sidecar.
Since the sidecar shares the same network namespace as the main app, we can call it at http://localhost:11434.
```python
import json

import requests
from flask import Flask, Response, request

app = Flask(__name__)

OLLAMA_HOST = "http://localhost:11434"
MODEL_NAME = "gpt-oss:20b"

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    prompt = data.get("prompt", "")

    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True
    }

    def generate():
        # Ollama streams one JSON object per line; relay each content chunk
        with requests.post(f"{OLLAMA_HOST}/api/chat", json=payload, stream=True) as r:
            for line in r.iter_lines(decode_unicode=True):
                if line:
                    event = json.loads(line)
                    if "message" in event:
                        yield event["message"]["content"]

    return Response(generate(), mimetype="text/plain")
```

This lets your app stream responses back to the browser in real time, giving a chat-like experience.
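The NDJSON parsing that `generate()` performs can be isolated into a pure function, which makes the streaming logic easy to reason about and test. The function name here is ours; the event shape matches Ollama's /api/chat streaming format:

```python
import json
from typing import Iterable, Iterator

def extract_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the content chunks from Ollama-style NDJSON chat events."""
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        event = json.loads(line)
        if "message" in event:
            yield event["message"]["content"]

# Simulated stream of NDJSON lines as Ollama would send them
stream = [
    '{"message": {"role": "assistant", "content": "Hello"}}',
    '',
    '{"message": {"role": "assistant", "content": ", world"}}',
    '{"done": true}',
]
print("".join(extract_content(stream)))  # → Hello, world
```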
3. Deploying to Azure App Service
Code location:
/gpt-oss-20b-sample/flask-app in the sample repo: https://github.com/Azure-Samples/appservice-ai-samples/tree/main/gpt-oss-20b-sample/flask-app
You can deploy the Flask app using your preferred method — VS Code, GitHub Actions, az webapp up, or via Bicep.
We’ve included a Bicep template that sets up:
- An Azure Container Registry for your sidecar image
- An Azure Web App running on Premium V4 for best performance and cost efficiency
🔗 Azure App Service Premium V4 now in Public Preview
If you want to use the azd template, clone the repo and run these commands from the sample folder:

```bash
azd init
azd up
```
Open the Web App in Azure Portal and add a sidecar:
- How-to: https://learn.microsoft.com/azure/app-service/configure-sidecar
- Choose your ACR image (the one you built in Step 1) and set the port to 11434
First-startup note: the sidecar pulls the gpt-oss:20b model on first run, so cold start will take time. Subsequent restarts are faster because the model layers are already cached.
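Because of that cold start, it can help to wait for the sidecar before routing traffic to it. A minimal retry loop, sketched with an injectable probe so the timing logic stays testable (the function, its defaults, and the probe suggestion are assumptions, not part of the sample):

```python
import time
from typing import Callable

def wait_for_sidecar(probe: Callable[[], bool],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0,
                     sleep=time.sleep,
                     clock=time.monotonic) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if probe():
            return True
        sleep(interval_s)
    return False

# In the Flask app, `probe` could issue a GET to
# http://localhost:11434/api/tags and return True once it succeeds.
```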
Try it out: open your site and you'll find a chat UI backed by gpt-oss-20b, running locally as a sidecar on Azure App Service.
Conclusion
With GPT-OSS-20B running as a sidecar on Azure App Service, you get the best of both worlds — the flexibility of open-source models and the reliability, scalability, and security of a fully managed platform. This setup makes it easy to integrate AI capabilities into your applications without having to provision or manage custom infrastructure.
Whether you’re building a lightweight chat experience, prototyping a new AI-powered feature, or experimenting with domain-specific fine-tuning, this approach provides a robust foundation. You can scale your application based on demand, swap out models as needed, and take advantage of the full Azure ecosystem for networking, observability, and deployment automation.
Next Steps & Resources
Here are some useful resources to help you go further:
- 📂 Sample Code & Templates – gpt-oss-20b Sample Repository
- 📖 About GPT-OSS – Introducing gpt-oss (OpenAI blog)
- 🛠 Deploying Sidecars – Configure Sidecars in Azure App Service
- 🚀 Premium V4 Plan – Azure App Service Premium V4 announcement
- 📦 Pushing Images to ACR – Push and pull container images in Azure Container Registry
- 💡 Advanced AI Patterns – Build RAG solutions with Azure AI Search