
AI - Azure AI services Blog

Customize the Phi-3.5 family of models with LoRA fine-tuning in Azure

martincai
Microsoft
Oct 31, 2024

The Phi model collection represents the latest advancement in Microsoft's series of Small Language Models (SLMs). Back in August 2024, we welcomed the latest additions, Phi-3.5-mini and Phi-3.5-MoE, a Mixture-of-Experts (MoE) model:

  • Phi-3.5-mini: This 3.8B parameter model enhances multi-lingual support, reasoning capability, and offers an extended context length of 128K tokens.
  • Phi-3.5-MoE: Featuring 16 experts and 6.6B active parameters, this model delivers high performance, reduced latency, multi-lingual support, and robust safety measures, surpassing the capabilities of larger models while maintaining the efficiency of the Phi models.

 

From Generalist to Custom SLMs

The results from our benchmarks underscore the remarkable efficiency and capability of Phi-3.5-mini and Phi-3.5-MoE. Even so, the models can be customized further to your unique needs, matching the performance of larger models on a given task.

There are three powerful techniques that can be used to customize language models for your organization's specific needs:

  1. Prompt Engineering
  2. Retrieval Augmented Generation (RAG)
  3. Fine-tuning

Let's delve into each of these techniques.

 

Prompt Engineering is about providing clear instructions directly within the prompt, often in the system prompt, to guide the model's responses. This method falls under the category of "giving more information in the prompt" and can be particularly useful for shaping the model's behavior and output format.
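As a minimal illustration, prompt engineering often amounts to prepending a carefully written system message to every request. The sketch below uses the common OpenAI-style chat-message structure; the organization, wording, and output-format constraint are invented for the example:

```python
def build_prompt(user_question: str) -> list[dict]:
    """Assemble a chat request that steers the model via a system prompt.

    The system message constrains both behavior (tone, scope) and output
    format -- no model weights are changed, only the input.
    """
    system_prompt = (
        "You are a support assistant for Contoso's billing team. "  # hypothetical org
        "Answer in at most two sentences and always reply in JSON "
        'of the form {"answer": "..."}.'
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_question},
    ]

messages = build_prompt("How do I update my payment method?")
print(messages[0]["role"])  # system
```

The same system message is reused across requests, which is what makes the behavior consistent without any training.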

 

Next, we have Retrieval Augmented Generation, or RAG. This technique is employed when you want to incorporate organizational data and knowledge into the model's responses. RAG allows you to provide the model with reliable sources for answers through additional documents. It retrieves the relevant information and augments the prompt, enhancing the model's ability to generate informed and contextually accurate responses. RAG also belongs to the category of "giving more information in the prompt".
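A toy end-to-end sketch of the retrieve-then-augment step follows. The documents and the naive keyword-overlap scorer are invented for illustration; production RAG systems typically use vector search over embeddings instead:

```python
import re

# Hypothetical organizational documents the model should answer from.
DOCS = [
    "Contoso expense policy: meals are reimbursed up to $50 per day.",
    "Contoso travel policy: book flights at least 14 days in advance.",
    "Contoso IT policy: rotate passwords every 90 days.",
]

def tokens(text: str) -> set[str]:
    """Lowercase and split text into a set of word tokens."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q = tokens(query)
    scored = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return scored[:k]

def augment(query: str, docs: list[str]) -> str:
    """Prepend the retrieved context so the model answers from reliable sources."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = augment("What is the limit for reimbursed meals per day?", DOCS)
print(prompt)
```

Only the most relevant document ends up in the prompt, which keeps the context small while grounding the answer.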

 

Fine-tuning is the process of customizing a model with labeled training data, which often yields better task performance at lower computational cost. In fact, fine-tuning a smaller model with the right training data can make it outperform a larger model on a specific task. Low-Rank Adaptation (LoRA) fine-tuning in particular is an excellent approach for adapting language models to specific use cases, for several reasons. First, LoRA dramatically reduces the number of trainable parameters, making the fine-tuning process more efficient and saving both time and cost. The lower resource demand also allows for quicker iterations, making it easier to experiment with fine-tuning tasks. Finally, LoRA leaves the original model weights largely unchanged, which preserves the general capabilities of the pre-trained model while adapting it to the task at hand.
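The parameter savings are easy to see with a back-of-the-envelope calculation. For one weight matrix, LoRA freezes the matrix and trains only two low-rank factors whose product forms the update, so the trainable count drops from d_out·d_in to rank·(d_in + d_out). The dimensions below are illustrative:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> tuple[int, int, float]:
    """Compare full fine-tuning vs. LoRA for one weight matrix W (d_out x d_in).

    LoRA freezes W and trains the update delta_W = B @ A, where
    A is (rank x d_in) and B is (d_out x rank).
    """
    full = d_out * d_in                    # every entry of W is trainable
    lora = rank * d_in + d_out * rank      # only the two low-rank factors
    return full, lora, lora / full

full, lora, ratio = lora_trainable_params(d_in=3072, d_out=3072, rank=16)
print(f"full: {full:,}  lora: {lora:,}  ratio: {ratio:.2%}")
# full: 9,437,184  lora: 98,304  ratio: 1.04%
```

At rank 16, roughly 1% of the matrix's parameters are trained, which is where the time and cost savings come from; the same ratio applies to every adapted matrix in the model.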

 

LoRA fine-tuning for Phi-3.5 models

Today, we are proud to announce the availability of LoRA fine-tuning for the Phi-3.5-mini and Phi-3.5-MoE models in Azure AI, starting November 1, 2024.

 

Serverless fine-tuning for the Phi-3.5-mini and Phi-3.5-MoE models enables developers to quickly and easily customize the models for cloud scenarios without having to manage compute. The fine-tuning experience is available in Azure AI Studio, and it adopts a pay-as-you-go approach, ensuring you only pay for the actual training time your fine-tuning requires.
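A fine-tuning job consumes labeled examples, and chat models are commonly trained on JSONL files where each line holds a messages array. The records below are invented, and the exact schema the service accepts should be checked against the Azure AI documentation; this sketch only shows preparing and validating such a file:

```python
import json

# Hypothetical training examples in the common chat-completions JSONL format.
examples = [
    {"messages": [
        {"role": "system", "content": "You classify support tickets."},
        {"role": "user", "content": "My invoice total looks wrong."},
        {"role": "assistant", "content": "billing"},
    ]},
    {"messages": [
        {"role": "system", "content": "You classify support tickets."},
        {"role": "user", "content": "The app crashes on startup."},
        {"role": "assistant", "content": "technical"},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: each line must parse back to a standalone JSON object.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
print(len(rows))  # 2
```

Validating the file locally before uploading catches malformed lines early, since a single bad record can fail the whole job.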

 

Once fine-tuned, the models can be deployed in Azure for inference, with the option to enable Content Safety. Deploying your fine-tuned model is a streamlined process with our pay-as-you-go service. The billing for fine-tuned model deployments is based on the number of input and output tokens used, along with a nominal hosting charge for maintaining the fine-tuned model. Once deployed, you can integrate your fine-tuned model with leading LLM tools like prompt flow, LangChain, and Semantic Kernel, enhancing your AI capabilities effortlessly.
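Deployed serverless endpoints are typically called over a chat-completions REST route. The endpoint URL and API key below are placeholders, and the route and payload shape follow the common OpenAI-style convention, so verify them against your deployment's details; the sketch builds the request without sending it:

```python
import json

# Placeholder endpoint and key for a deployed fine-tuned model.
ENDPOINT = "https://my-phi35-deployment.example.models.ai.azure.com"
API_KEY = "<your-api-key>"

def chat_request(user_content: str, max_tokens: int = 256) -> tuple[str, dict, bytes]:
    """Build (url, headers, body) for a chat-completions call.

    Actually sending the request is left to the caller (e.g. urllib or
    requests), so this sketch stays side-effect free.
    """
    url = f"{ENDPOINT}/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    }
    payload = {
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,
    }
    return url, headers, json.dumps(payload).encode("utf-8")

url, headers, body = chat_request("Summarize our return policy.")
```

Because the endpoint speaks this widely used protocol, the same request shape is what tools like LangChain and Semantic Kernel produce under the hood.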

 

Fine-tuning with the Managed Compute option is also available. You may use the Azure Machine Learning Studio user interface or follow our notebook example to create your custom model. Using the notebook has the advantage of greater flexibility in the configurations used by the fine-tuning job. You also have the option to download the fine-tuned model and then deploy it on Azure managed compute, on your own on-premises infrastructure, or on edge devices. The fine-tuned model is licensed under the MIT license.

 

Closing remark

The Phi-3.5 family of models represents a significant advancement in the realm of SLMs. With the introduction of Phi-3.5-mini and Phi-3.5-MoE, we have pushed the boundaries of performance, efficiency, and customization. The availability of LoRA fine-tuning in Azure AI further empowers developers to tailor these models to their specific needs, ensuring optimal performance for a wide range of applications. As we continue to innovate and refine our models, we remain committed to providing cutting-edge solutions that drive progress and enhance user experiences. Thank you for joining us on this journey, and we look forward to seeing the incredible things you will achieve with the Phi-3.5 models.

Updated Oct 25, 2024
Version 1.0