Deploying OpenAI’s First Open-Source Model on Azure AKS with KAITO

maljazaery

Microsoft

Aug 15, 2025

In this tutorial, we’ll walk through deploying the openai/gpt-oss-20B model on Azure’s Standard_NV36ads_A10_v5 GPU instances using vLLM for fast inference — all running on AKS via KAITO. By the end, you’ll have a public endpoint where you can send API requests in OpenAI’s format.

Special Thanks Thanks to Andrew Thomas, Kurt Niebuhr, and Sachi Desai for their invaluable support, insightful discussions, and for providing the compute resources that made testing this deployment ...

Updated Aug 15, 2025

Version 5.0

maljazaery

Microsoft

Joined May 07, 2024

View Profile

Microsoft Foundry Blog

Follow this blog board to get notified when there's new activity

Blog Post

Deploying OpenAI’s First Open-Source Model on Azure AKS with KAITO

In this tutorial, we’ll walk through deploying the openai/gpt-oss-20B model on Azure’s Standard_NV36ads_A10_v5 GPU instances using vLLM for fast inference — all running on AKS via KAITO. By the end, you’ll have a public endpoint where you can send API requests in OpenAI’s format.