As AI workloads grow in complexity and scale, deploying large language models like GPT-OSS efficiently becomes critical. In this post, we'll walk through how to deploy GPT-OSS using an Azure Machine Learning (Azure ML) online endpoint on managed compute (NV-A10 and NC-H100 GPU VM series), using a streamlined, script-driven approach.
❓ Why Azure ML Online Endpoints?
Azure ML online endpoints provide a fully managed, scalable, and secure way to serve models like GPT-OSS. They support production-grade features like blue-green deployments for safe rollouts and traffic mirroring to test new versions without impacting live traffic. With built-in autoscaling, authentication, monitoring, and seamless REST API integration, they’re ideal for deploying large models on managed compute with minimal operational overhead.
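The blue-green and traffic-mirroring features mentioned above map to a couple of CLI calls. A minimal sketch, assuming the endpoint already has two deployments named `blue` (live) and `green` (candidate); the endpoint name is a placeholder:

```shell
ENDPOINT="gptoss-endpoint-h100"

# Shift a weighted share of live traffic to the candidate deployment.
route_traffic() {
  # $1 = blue %, $2 = green %
  az ml online-endpoint update --name "$ENDPOINT" \
    --traffic "blue=$1 green=$2"
}

# Example rollout: send 10% of traffic to green, then cut over fully.
# route_traffic 90 10
# route_traffic 0 100

# Mirror 10% of live traffic to green without returning its responses to callers:
# az ml online-endpoint update --name "$ENDPOINT" --mirror-traffic "green=10"
```

Mirroring lets you observe the candidate deployment under real load before shifting any user-facing traffic to it.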
🧰 What You’ll Need
Before diving in, make sure you have the following:
- Azure CLI installed and authenticated
- An Azure ML workspace set up
- Contributor or Owner permissions on your Azure subscription
- Sufficient GPU quota in your Azure ML workspace region for the target VM series (NV-A10 or NC-H100)
```shell
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Log in to Azure
az login
```
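Before continuing, a quick sanity check (a sketch) that the command-line tools used in this walkthrough are on your PATH:

```shell
# Preflight check: confirm the CLI tools used below are available.
for tool in az git curl; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: MISSING"
  fi
done
```

Note that the `az ml` commands used throughout this post also require the Azure ML CLI extension; if it isn't already installed, add it with `az extension add -n ml`.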
📁 Clone the code
```shell
git clone https://github.com/maljazaery/AzureML_LLM_Endpoint_Deployment_Script.git
```
The repo includes one environment definition and two sample deployment configs:
- AML_env/gpt_oss/: Dockerfile and environment setup for GPT-OSS
- configs/gpt_oss/config_a10.conf: Sample config for NV-A10 GPU
- configs/gpt_oss/config_h100.conf: Sample config for NC-H100 GPU
⚙️ Configuration
Create a config.conf file tailored to your Azure environment (see the configs folder for examples):
```shell
# Azure subscription & workspace settings
AZ_SUBSCRIPTION_ID="your-subscription-id"
AZ_RESOURCE_GROUP="your-resource-group"
AZ_ML_WORKSPACE="your-workspace-name"

# Endpoint and deployment settings
AZ_ENDPOINT_NAME="gptoss-endpoint-h100"
AZ_INSTANCE_TYPE="Standard_NC40ads_H100_v5"

# ... other settings
```
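Since the deploy script is driven entirely by this file, it can help to validate it before launching a deployment. A small sketch; the required-variable list below is an assumption based on the sample settings shown above:

```shell
# Validate that a config file defines the settings the deploy script relies on.
# NOTE: the list of required variables is an assumption based on the sample config.
validate_config() {
  . "$1"
  for var in AZ_SUBSCRIPTION_ID AZ_RESOURCE_GROUP AZ_ML_WORKSPACE \
             AZ_ENDPOINT_NAME AZ_INSTANCE_TYPE; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "Missing required setting: $var" >&2
      return 1
    fi
  done
  echo "Config OK: $1"
}

# Usage: validate_config config.conf && ./deploy-main.sh config.conf
```

Catching a typo in a subscription ID or instance type here is much cheaper than waiting for a deployment to fail partway through.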
🚦 Deployment Options
Option 1: Full Automated Deployment
```shell
chmod +x deploy-main.sh
./deploy-main.sh config.conf
```
Option 2: Step-by-Step Deployment
```shell
# Create environment only
./deploy-main.sh --env-only config.conf

# Create endpoint and deployment
./deploy-main.sh --endpoint-only config.conf
```
🧪 Testing the Endpoint
Using curl
```shell
# Fetch the endpoint's authentication keys
az ml online-endpoint get-credentials --name your-endpoint-name
```

```shell
curl -X POST "https://your-endpoint.region.inference.ml.azure.com" \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
```
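A handy variant (a sketch): keep the request body in a file, validate it locally, and reuse it with the CLI. The field names mirror the curl example above.

```shell
# Write the same chat payload used above to a file.
cat > request.json <<'EOF'
{
  "model": "gpt-oss-20b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "max_tokens": 100
}
EOF

# Validate the JSON locally before sending anything over the wire.
python3 -m json.tool request.json >/dev/null && echo "request.json is valid JSON"

# Then invoke the endpoint through the CLI (no need to manage keys manually):
# az ml online-endpoint invoke --name your-endpoint-name --request-file request.json
```

The `az ml online-endpoint invoke` route uses your Azure credentials, which is convenient for quick smoke tests before wiring up key-based clients.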
Using OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.region.inference.ml.azure.com",
    api_key="your-key",
)

result = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(result.choices[0].message)
```
📊 Monitoring & Management
```shell
# List endpoints
az ml online-endpoint list

# Show endpoint details
az ml online-endpoint show --name your-endpoint-name

# View deployment logs
az ml online-deployment get-logs --name current --endpoint-name your-endpoint-name --lines 100 --follow
```
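For scripting on top of these commands, the global `--query` (JMESPath) and `-o tsv` flags are useful. A sketch that polls a deployment's provisioning state (the deployment and endpoint names are placeholders):

```shell
# Return the provisioning state of a deployment as plain text.
deployment_state() {
  # $1 = deployment name, $2 = endpoint name
  az ml online-deployment show --name "$1" \
    --endpoint-name "$2" --query provisioning_state -o tsv
}

# Example: wait for a terminal state before proceeding.
# while true; do
#   s=$(deployment_state current your-endpoint-name)
#   echo "state: $s"
#   case "$s" in Succeeded|Failed) break ;; esac
#   sleep 30
# done
```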
Updated Sep 11, 2025
Version 2.0
maljazaery
Microsoft
Azure AI Foundry Blog