In this tutorial, we’ll walk through deploying the openai/gpt-oss-20B model on Azure’s Standard_NV36ads_A10_v5 GPU instances using vLLM for fast inference — all running on AKS via KAITO. By the end, you’ll have a public endpoint where you can send API requests in OpenAI’s format.
Updated Aug 15, 2025
Version 5.0