Azure AI Foundry Blog

Deploying GPT-OSS as Azure ML Online Endpoint

maljazaery
Microsoft
Sep 10, 2025
As AI workloads grow in complexity and scale, deploying large language models like GPT-OSS efficiently becomes critical. In this post, we’ll walk through how to deploy GPT-OSS to an Azure Machine Learning (Azure ML) Online Endpoint on managed compute (NV-A10 and NC-H100 GPUs), using a streamlined, script-driven approach.

❓Why Azure ML Online Endpoints?


Azure ML online endpoints provide a fully managed, scalable, and secure way to serve models like GPT-OSS. They support production-grade features like blue-green deployments for safe rollouts and traffic mirroring to test new versions without impacting live traffic. With built-in autoscaling, authentication, monitoring, and seamless REST API integration, they’re ideal for deploying large models on managed compute with minimal operational overhead.
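To make the blue-green rollout concrete, the sketch below generates the traffic splits you might pass to `az ml online-endpoint update --traffic` at each canary step. The deployment names ("blue", "green") and step sizes are illustrative assumptions, not part of this repo:

```python
# Sketch: staged blue-green rollout for an Azure ML online endpoint.
# Deployment names and step percentages are illustrative assumptions.

def rollout_steps(steps=(10, 50, 100)):
    """Yield az CLI --traffic strings that shift load from blue to green."""
    for green_pct in steps:
        yield f"blue={100 - green_pct} green={green_pct}"

for split in rollout_steps():
    # Each split would be applied with:
    #   az ml online-endpoint update --name <endpoint> --traffic "<split>"
    print(split)
```

Between steps you would watch endpoint metrics and roll back (set green to 0) if errors spike.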

🧰 What You’ll Need

Before diving in, make sure you have the following:

  • Azure CLI installed and authenticated
  • An Azure ML workspace set up
  • Contributor or Owner permissions on your Azure subscription
  • Sufficient GPU quota in your Azure ML workspace region for the target instance type
# Install Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# Login to Azure
az login

📁 Clone the code

git clone https://github.com/maljazaery/AzureML_LLM_Endpoint_Deployment_Script.git

The repo includes one environment definition and two sample deployment configs:

  • AML_env/gpt_oss/: Dockerfile and environment setup for GPT-OSS
  • configs/gpt_oss/config_a10.conf: Sample config for NV-A10 GPU
  • configs/gpt_oss/config_h100.conf: Sample config for NC-H100 GPU

⚙️ Configuration

Create a config.conf file tailored to your Azure environment (see the configs folder for examples):

# Azure subscription & workspace settings
AZ_SUBSCRIPTION_ID="your-subscription-id"
AZ_RESOURCE_GROUP="your-resource-group"
AZ_ML_WORKSPACE="your-workspace-name"

# Endpoint and deployment settings
AZ_ENDPOINT_NAME="gptoss-endpoint-h100"
AZ_INSTANCE_TYPE="Standard_NC40ads_H100_v5"
# ... other settings
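Because the config is plain KEY="value" shell syntax, a quick sanity check before deploying can catch typos early. This small helper is my own addition, not part of the repo, and the required-key list is an assumption based on the sample above:

```python
# Sketch: validate a config.conf before running the deploy script.
# The REQUIRED list is an assumption based on the sample config shown above.
import re

REQUIRED = ["AZ_SUBSCRIPTION_ID", "AZ_RESOURCE_GROUP", "AZ_ML_WORKSPACE",
            "AZ_ENDPOINT_NAME", "AZ_INSTANCE_TYPE"]

def parse_conf(text):
    """Parse KEY="value" lines into a dict, skipping comments and blanks."""
    conf = {}
    for line in text.splitlines():
        m = re.match(r'\s*([A-Z_][A-Z0-9_]*)\s*=\s*"?([^"#]*)"?\s*$', line)
        if m:
            conf[m.group(1)] = m.group(2).strip()
    return conf

def missing_keys(conf):
    """Return required keys that are absent or empty."""
    return [k for k in REQUIRED if not conf.get(k)]
```

Running this against your config.conf before `deploy-main.sh` avoids a failed deployment halfway through.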

 

🚦 Deployment Options

Option 1: Full Automated Deployment

chmod +x deploy-main.sh

./deploy-main.sh config.conf

 

Option 2: Step-by-Step Deployment

# Create environment only
./deploy-main.sh --env-only config.conf

# Create endpoint and deployment
./deploy-main.sh --endpoint-only config.conf

 

🧪 Testing the Endpoint

Using curl

# Retrieve the endpoint auth key
az ml online-endpoint get-credentials --name your-endpoint-name

curl -X POST "https://your-endpoint.region.inference.ml.azure.com/v1/chat/completions" \
  -H "Authorization: Bearer <your-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 100
  }'
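The same request can be issued from Python with the standard library alone. The helper below builds the headers and body exactly as the curl call does; the endpoint URL and key remain placeholders you supply:

```python
# Sketch: build the chat-completions request sent by the curl example above.
import json

def build_request(key, model="gpt-oss-20b", prompt="Hello!", max_tokens=100):
    """Return (headers, body) matching the curl example."""
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    })
    return headers, body

# The pair can then be POSTed to the endpoint with any HTTP client,
# e.g. urllib.request or the requests library.
```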

 

Using OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="https://your-endpoint.region.inference.ml.azure.com/v1",
    api_key="your-key"
)

result = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(result.choices[0].message.content)

 

📊 Monitoring & Management

# List endpoints
az ml online-endpoint list

# Show endpoint details
az ml online-endpoint show --name your-endpoint-name

# View deployment logs
az ml online-deployment get-logs --name current --endpoint-name your-endpoint-name --lines 100 --follow
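For scripted health checks, the JSON emitted by `az ml online-endpoint show -o json` can be inspected directly. The field names below (`provisioning_state`, `traffic`) follow typical CLI v2 output, but verify them against your CLI version:

```python
# Sketch: extract status fields from `az ml online-endpoint show -o json`.
# Field names are assumptions based on typical Azure ML CLI v2 output.
import json

def endpoint_status(show_json):
    """Return (provisioning_state, traffic dict) from the CLI's JSON output."""
    data = json.loads(show_json)
    return data.get("provisioning_state"), data.get("traffic", {})

# Usage: pipe the CLI output into a script built on this helper, e.g.
#   az ml online-endpoint show --name gptoss-endpoint-h100 -o json | python check.py
```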

 

Updated Sep 11, 2025
Version 2.0