Azure AI Foundry Blog

o4-mini Reinforcement Fine-tuning (RFT) Now Generally Available on Azure AI Foundry

NandiniMuralidharan
Sep 10, 2025

At Build 2025, we announced Reinforcement Fine-tuning for o4-mini. Today, we are happy to promote RFT with o4-mini to General Availability on Azure AI Foundry. Customers can meaningfully fine-tune o4-mini starting with just 100 samples and a wide variety of graders! We have also introduced Custom Python Code as a grader, which can be included as part of the fine-tuning job in both the UI and SDK for predictable tasks. Learn more about how to use the custom code grader here.
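
For predictable, checkable tasks, a custom code grader can be as simple as a single scoring function. Here is a minimal sketch, assuming the grade(sample, item) entry point and the output_text / reference_answer field names; treat these as illustrative and confirm the exact contract in the custom code grader docs linked above.

```python
# Minimal custom Python code grader sketch. The grade(sample, item) entry
# point and the "output_text" / "reference_answer" field names are
# assumptions to verify against the custom code grader documentation.
def grade(sample: dict, item: dict) -> float:
    """Score one sampled model response against its dataset row."""
    model_answer = str(sample.get("output_text", "")).strip().lower()
    reference = str(item.get("reference_answer", "")).strip().lower()

    # Exact match earns full credit; containing the reference earns partial.
    if model_answer == reference:
        return 1.0
    if reference and reference in model_answer:
        return 0.5
    return 0.0
```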


Based on our learnings over the last few months, we are sharing some key suggestions in this blog for choosing the right fine-tuning technique.

SFT vs. RFT: Choosing the Right Fine-Tuning Technique

When fine-tuning a large language model, it's crucial to select the method that best aligns with your project's goals. The two primary approaches are Supervised Fine-Tuning (SFT) and Reinforcement Fine-Tuning (RFT). While both methods aim to improve a model's performance, they do so through fundamentally different mechanisms.

Supervised Fine-Tuning (SFT)

SFT involves training a model on a dataset of pre-defined prompt-and-answer pairs. The model learns to replicate the specific patterns and outputs found in this data.

  • How it Works: You provide the model with a fixed set of examples, and it learns to produce the desired output for a given input. It's a "learn by example" approach.
  • Best Use Cases: SFT is ideal for tasks requiring a specific output format or simple classification. This includes making the model adhere to a custom coding style or generating responses that fit a particular structure like concise summary creation.
  • Data Requirements: This method is data-intensive, as it requires manually labeled outputs to build the training dataset, typically thousands of examples. The data format is typically chat-completion pairs, sketched below.
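
To make that concrete, here is a hypothetical sketch of SFT training rows in the chat-completion JSONL format, written out with Python; the ticket-summarization content and file name are illustrative.

```python
import json

# Two illustrative SFT rows: each line pairs a prompt with the exact
# assistant answer the model should learn to imitate.
rows = [
    {
        "messages": [
            {"role": "system", "content": "Summarize support tickets in one sentence."},
            {"role": "user", "content": "Customer reports the mobile app crashes on login after the 2.3 update."},
            {"role": "assistant", "content": "App crashes at login since v2.3; likely a regression in the update."},
        ]
    },
    {
        "messages": [
            {"role": "system", "content": "Summarize support tickets in one sentence."},
            {"role": "user", "content": "User cannot reset their password; the reset email never arrives."},
            {"role": "assistant", "content": "Password-reset emails are not being delivered to the user."},
        ]
    },
]

# SFT datasets are uploaded as JSONL: one JSON object per line.
with open("sft_training.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```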

Reinforcement Fine-Tuning (RFT)

RFT takes a different approach, moving away from fixed answers and toward a system of continuous feedback. Instead of providing the correct answer, you provide a "grader": a rubric or set of rules that scores the model's responses on a metric such as accuracy or compliance.

  • How it Works: The model explores various possible solutions, and a grader evaluates each response, providing a score. The model then learns and improves based on these scores, effectively learning to reason and achieve better outcomes. We support a variety of graders; choose the right one for your task.
  • Best Use Cases: RFT excels in tasks that require reasoning and adherence to complex rules, such as legal reasoning, medical workflows, or ensuring policy compliance. 
  • Customer Story: DraftWise, a legal tech startup, used reinforcement fine-tuning (RFT) in Azure AI Foundry Models to enhance the performance of reasoning models tailored for contract generation and review. 
  • Data Requirements: RFT is more data-efficient than SFT: you only need a small number of examples (tens to hundreds) to begin. The focus is on creating a robust grading system rather than manually labeling every possible output. The data format is called out here and sketched below.
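
By contrast, a hypothetical RFT row carries no assistant answer to imitate: each line holds the prompt plus whatever reference fields your grader reads. The reference_answer field below is an illustrative assumption matching the grader sketch earlier; confirm the exact schema in the linked data-format docs.

```python
import json

# Illustrative RFT rows: no assistant turn. The model samples its own
# answers during training, and the grader scores each one using the
# extra fields on the row (here, a hypothetical "reference_answer").
rows = [
    {
        "messages": [
            {"role": "user", "content": "Under the contract excerpt above, who bears liability for late delivery?"}
        ],
        "reference_answer": "the supplier",
    },
    {
        "messages": [
            {"role": "user", "content": "Does clause 4.2 permit early termination without penalty?"}
        ],
        "reference_answer": "no",
    },
]

with open("rft_training.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```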

The Core Difference

The fundamental distinction lies in the feedback mechanism. SFT learns from fixed, static examples, while RFT learns from a dynamic, continuous feedback signal provided by a grader. This allows RFT to pull more signal from a single example, making it a powerful tool for complex, reasoning-based tasks.

Analogy šŸ˜Š:
Another way of thinking about it: in SFT, you give a child the questions and the answers together; the child makes its own connections and is then tested during evaluation. In RFT, you give the child only the questions and score each response as 1, 0, or something in between, depending on how close it is to the answer; over time, the child makes its own connections (reinforcing the learning based on scores) and eventually learns.

Getting Started with RFT with o4-mini

Use Reinforcement Fine-tuning with o4-mini to build reasoning engines that learn from experience and evolve over time. It is now available in Azure AI Foundry, with regional availability in East US2 and Sweden Central.
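
As one possible starting point, here is a hedged sketch of launching an RFT job with the OpenAI Python SDK against an Azure OpenAI resource. The endpoint, API version, and the reinforcement method payload (including the string-check grader shown) are assumptions to verify against the Microsoft Learn docs linked below.

```python
# Hypothetical sketch: create an o4-mini RFT job. Endpoint, API version,
# and the "reinforcement" payload shape are assumptions; confirm them in
# the Azure OpenAI fine-tuning documentation before use.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2025-04-01-preview",                         # assumed
)

# Upload the JSONL training data sketched earlier.
training_file = client.files.create(
    file=open("rft_training.jsonl", "rb"), purpose="fine-tune"
)

job = client.fine_tuning.jobs.create(
    model="o4-mini",
    training_file=training_file.id,
    method={
        "type": "reinforcement",
        "reinforcement": {
            # A built-in string-check grader as one illustrative option;
            # a custom Python code grader could be configured here instead.
            "grader": {
                "type": "string_check",
                "name": "exact_match",
                "input": "{{sample.output_text}}",
                "reference": "{{item.reference_answer}}",
                "operation": "eq",
            },
        },
    },
)
print("Started RFT job:", job.id)
```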

In this demo video, Liam Cavanaugh, Data & AI Specialist, dives into the applications for RFT with o4-mini. 

Azure AI Foundry is your foundation for enterprise-grade AI tuning. This progress in fine-tuning unlocks new capabilities in model customization, helping you build intelligent systems that think and respond in ways that reflect your business DNA.


Happy Fine-tuning!

Learn More 

šŸ‘©ā€šŸ’» Get Started with Azure OpenAI o4-mini RFT with Microsoft Learn Docs 

🧠 Try out this Custom Code grader working example

ā–¶ļø RSVP for the next Model Monday LIVE on YouTube or On-Demand  

šŸ‘‹ Continue the conversation on Discord 


