Introducing Enhanced Azure OpenAI Distillation and Fine-Tuning Capabilities

Former Employee

Jan 30, 2025

As we continue to push the boundaries of AI capabilities, we are excited to announce significant updates to our Azure OpenAI Service, specifically focused on enhancing our distillation and fine-tuning features. Following our recent announcement on the public preview of distillation in Azure OpenAI Service, we're releasing a compare experience for evaluation and expanding our model and region coverage for stored completions -- making distillation easier than ever! We are also providing more deployment types in our evaluation offering. These updates aim to provide more robust, flexible, and efficient AI solutions to meet diverse business needs.

Overview of Distillation in Azure OpenAI Service

Azure OpenAI Service distillation involves three main components:

Stored Completions: Easily generate datasets for distillation by capturing and storing input-output pairs from models like GPT-4o through our API. This allows you to build datasets with your production data for evaluating and fine-tuning models.
Evaluation: Create and run custom evaluations to measure model performance on specific tasks. Azure OpenAI Evaluation provides an integrated way to measure performance, using data from Stored Completions or existing datasets. Azure OpenAI Evaluation can also be utilized on its own to assess model performance for your specific use cases.
Fine-tuning: Stored Completions and Azure OpenAI Evaluation are fully integrated with Azure OpenAI fine-tuning. Use datasets created with Stored Completions in your fine-tuning jobs and run evaluations on fine-tuned models using Azure OpenAI Evaluations.

Together, these steps create a comprehensive distillation process: collecting live traffic from Azure OpenAI endpoints, filtering and subsetting that traffic in the Stored Completions UI, exporting it to the Evaluation UI for quality scoring, and finally fine-tuning from the collected data or a subset based on evaluation scoring.

New regions and new models for Stored Completions

We are expanding our Stored Completions feature to more regions. In addition to Sweden Central, we have now enabled North Central US and East US2 regions as well. We have also added more models to choose from for the data capture part. In addition to GPT-40-0806, we now support o1-preview, o1-mini, GPT-4o-mini, and GPT-4o-0513 models in the Sweden Central region. Our documentation will provide the latest updates on supported models and regions.

Supported Regions and Models

	Models
Regions	GPT-40-0806	GPT-4o-0513	GPT-4o-mini	o1-preview	o1-mini
Sweden Central	X	X	X	X	X
North Central US	X
East US2	X

Enhanced Evaluation UI

Businesses often need to assess the performance of distilled models and compare them with base teacher models in production to determine the best model for their use case. The comprehensive evaluation feature simplifies this process, accelerating experimentation and decision-making.

Imagine you're a financial institution using AI models to summarize large volumes of financial reports in real-time. You rely on GPT-4o-0806 in production but are exploring whether GPT-4o-mini can maintain high accuracy while reducing processing time and compute costs. With the run and compare feature in Evaluation, you can set up multiple test criteria—such as Rouge, BlEU and semantic similarity—and compare all models side by side in a unified interface.

Once the group evaluation run is completed, a comprehensive report view provides side-by-side comparisons of all model runs. You can add new evaluations over time, track improvements, and use the data tab for a granular view of how each model performs across different transaction types. If you find that GPT-4o-mini delivers nearly identical accuracy while cutting inference costs, you can confidently transition to the smaller model, optimizing both security and operational efficiency.

By making model evaluations faster, more structured, and data-driven, this feature empowers businesses to continuously refine their AI strategy and maximize value.

Finally, we are thrilled to announce that we have now integrated more supported deployment types into our evaluation experience. The supported deployment types include: Standard, Global standard, Data zone standard, Provisioned, Global provisioned and Data zone provisioned.

These new features in Azure OpenAI Service fine-tuning demonstrate our commitment to providing robust, flexible, and efficient AI solutions. We invite you to explore these new features and take your AI projects to the next level.