Blog Post

Educator Developer Blog
15 MIN READ

Fine-Tuning and Deploying Phi-3.5 Model with Azure and AI Toolkit

Sharda_Kaur
Brass Contributor
Jan 17, 2025

What is Phi-3.5?

Phi-3.5 is a state-of-the-art small language model with strong multilingual capabilities. It is designed to handle multiple languages with high proficiency, making it a versatile tool for Natural Language Processing (NLP) tasks across different linguistic backgrounds.

Key Features of Phi-3.5

The core features of the Phi-3.5 model include:

  1. Multilingual Capabilities: The model supports a wide variety of languages, including major world languages such as English, Spanish, Chinese, and French. It can, for example, translate a sentence or document from one language to another without losing context or meaning.
  2. Fine-Tuning Ability: The model can be fine-tuned for specific use cases. In a customer support setting, for instance, Phi-3.5 can be fine-tuned to understand the nuances of the different languages customers use across the globe, improving response accuracy.
  3. High Performance in NLP Tasks: Phi-3.5 is optimized for tasks like text classification, machine translation, and summarization. It handles large-scale datasets well and produces coherent, contextually correct language output.

Applications in Real-World Scenarios

Here are a few real-world applications where the Phi-3.5 model can be utilized:

  1. Customer Support Chatbots: For companies with global customer bases, the model’s multilingual support can enhance chatbot capabilities, allowing for real-time responses in a customer’s native language, no matter where they are located.
  2. Content Creation for Global Markets: Businesses can use Phi-3.5 to automatically generate or translate content for different regions. For example, marketing copy can be adapted to fit cultural and linguistic nuances in multiple languages.
  3. Document Summarization Across Languages: The model can summarize long documents or articles written in one language and then translate the summary into another, improving access to information for non-native speakers.

 

Why Choose Phi-3.5 for Your Project?

Several qualities make Phi-3.5 a strong choice:

  • Versatility: It’s not limited to just one or two languages but performs well across many.
  • Customization: The ability to fine-tune it for particular use cases or industries makes it highly adaptable.
  • Ease of Deployment: With tools like Azure AI Foundry and the AI Toolkit, deploying Phi-3.5 in the cloud or locally is accessible even for smaller teams.

Objective of This Blog

Small Language Models (SLMs) are at the forefront of advancements in Natural Language Processing, offering fine-tuned, high-performance solutions for specific tasks and languages. Among these, the Phi-3.5 model has emerged as a powerful tool, excelling in its multilingual capabilities. Whether you're working with English, Spanish, Mandarin, or any other major world language, Phi-3.5 offers robust, reliable language processing that adapts to various real-world applications. This makes it an ideal choice for businesses looking to deploy multilingual chatbots, automate content generation, or translate customer interactions in real time. Moreover, its fine-tuning ability allows for customization, making Phi-3.5 versatile across industries and tasks.

Customization and Fine-Tuning for Different Applications

The Phi-3.5 model is not limited to general language understanding tasks. It can be fine-tuned for specific applications, industries, and languages, allowing users to tailor its performance to meet their needs.

  1. Customizable for Industry-Specific Use Cases: With fine-tuning, the model can be trained further on domain-specific data to handle particular use cases like legal document translation, medical records analysis, or technical support. Example: A healthcare company can fine-tune Phi-3.5 to understand medical terminology in multiple languages, enabling it to assist in processing patient records or generating multilingual health reports.
  2. Adapting for Specialized Tasks: You can train Phi-3.5 to perform specialized tasks like sentiment analysis, text summarization, or named entity recognition in specific languages. Fine-tuning helps enhance the model's ability to handle unique text formats or requirements. Example: A marketing team can fine-tune the model to analyze customer feedback in different languages to identify trends or sentiment across various regions. The model can quickly classify feedback as positive, negative, or neutral, even in less widely spoken languages like Arabic or Korean.

Applications in Real-World Scenarios

To illustrate the versatility of Phi-3.5, here are some real-world applications where this model excels, demonstrating its multilingual capabilities and customization potential:

Case Study 1: Multilingual Customer Support Chatbots

Many global companies rely on chatbots to handle customer queries in real-time. With Phi-3.5’s multilingual abilities, businesses can deploy a single model that understands and responds in multiple languages, cutting down on the need to create language-specific chatbots.

  • Example: A global airline can use Phi-3.5 to power its customer service bot. Passengers from different countries can inquire about their flight status or baggage policies in their native languages—whether it's Japanese, Hindi, or Portuguese—and the model responds accurately in the appropriate language.
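The routing logic behind such a bot can be sketched in a few lines. This toy example uses a hard-coded response table and a naive language check as illustrative stand-ins: a real deployment would call the deployed Phi-3.5 endpoint and a proper language-detection step.

```python
# Toy sketch of multilingual routing for a support bot. RESPONSES and
# detect_language are hypothetical stand-ins, not part of Phi-3.5 itself.
RESPONSES = {
    "en": "Your flight departs at 14:30 from gate B12.",
    "ja": "ご搭乗便は14時30分にB12ゲートから出発します。",
    "pt": "Seu voo parte às 14h30 do portão B12.",
}

def detect_language(text: str) -> str:
    # Hypothetical stand-in for a real language-detection step
    if any("\u3040" <= ch <= "\u30ff" for ch in text):  # kana -> Japanese
        return "ja"
    if "voo" in text.lower():                           # crude Portuguese cue
        return "pt"
    return "en"

def reply(query: str) -> str:
    # A real system would prompt the model here instead of a lookup table
    return RESPONSES[detect_language(query)]
```

In production the lookup table would be replaced by a call to the model endpoint, with the detected language passed through in the prompt.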

Case Study 2: Multilingual Content Generation

Phi-3.5 is also useful for businesses that need to generate content in different languages. For example, marketing campaigns often require creating region-specific ads or blog posts in multiple languages. Phi-3.5 can help automate this process by generating localized content that is not just translated but adapted to fit the cultural context of the target audience. Example: An international cosmetics brand can use Phi-3.5 to automatically generate product descriptions for different regions. Instead of merely translating a product description from English to Spanish, the model can tailor the description to fit cultural expectations, using language that resonates with Spanish-speaking audiences.

Case Study 3: Document Translation and Summarization

Phi-3.5 can be used to translate or summarize complex documents across languages. Its ability to preserve meaning and context across languages makes it ideal for industries where accuracy is crucial, such as legal or academic fields. Example: A legal firm working on cross-border cases can use Phi-3.5 to translate contracts or legal briefs from German to English, ensuring the context and legal terminology are accurately preserved. It can also summarize lengthy documents in multiple languages, saving time for legal teams.

 

 

Fine-Tuning Phi-3.5 Model

Fine-tuning a language model like Phi-3.5 is a crucial step in adapting it to perform specific tasks or cater to specific domains. This section will walk through what fine-tuning is, its importance in NLP, and how to fine-tune the Phi-3.5 model using Azure Model Catalog for different languages and tasks. We'll also explore a code example and best practices for evaluating and validating the fine-tuned model.

What is Fine-Tuning?

Fine-tuning refers to the process of taking a pre-trained model and adapting it to a specific task or dataset by training it further on domain-specific data. In the context of NLP, fine-tuning is often required to ensure that the language model understands the nuances of a particular language, industry-specific terminology, or a specific use case.

Why Fine-Tuning is Necessary

Pre-trained Large Language Models (LLMs) are trained on diverse datasets and can handle various tasks like text summarization, generation, and question answering. However, they may not perform optimally in specialized domains without fine-tuning. The goal of fine-tuning is to enhance the model's performance on specific tasks by leveraging its prior knowledge while adapting it to new contexts.

Challenges of Fine-Tuning

  • Resource Intensiveness: Fine-tuning large models can be computationally expensive, requiring significant hardware resources.
  • Storage Costs: Each fine-tuned model can be large, leading to increased storage needs when deploying multiple models for different tasks.

LoRA and QLoRA

To address these challenges, techniques like LoRA (Low-rank Adaptation) and QLoRA (Quantized Low-rank Adaptation) have emerged. Both methods aim to make the fine-tuning process more efficient:

 

  1. LoRA: This technique reduces the number of trainable parameters by introducing low-rank matrices into the model while keeping the original model weights frozen. This approach minimizes memory usage and speeds up the fine-tuning process.
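To make the saving concrete, here is a quick back-of-the-envelope calculation for a single weight matrix (the dimensions are illustrative, not Phi-3.5's actual layer sizes):

```python
# For a frozen d x k weight matrix, LoRA trains only two low-rank factors:
# A (d x r) and B (r x k), so the trainable parameter count is d*r + r*k.

def lora_trainable_params(d: int, k: int, r: int) -> int:
    return d * r + r * k

d = k = 4096                      # a typical transformer hidden size
full = d * k                      # parameters touched by full fine-tuning
lora = lora_trainable_params(d, k, r=16)

print(full)            # 16777216
print(lora)            # 131072
print(full // lora)    # 128 -> ~128x fewer trainable parameters
```

Applied across every adapted layer, this is why LoRA fits on hardware that full fine-tuning cannot.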

 

  2. QLoRA: An enhancement of LoRA, QLoRA incorporates quantization techniques to further reduce memory requirements and increase the efficiency of the fine-tuning process. It allows large models to be fine-tuned on consumer hardware without the extensive resource demands typically associated with full fine-tuning.

The snippet below shows a typical LoRA setup using the Hugging Face peft library (illustrated here with a generic bert-base-uncased classifier; the same pattern applies to other models):
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments
from peft import get_peft_model, LoraConfig

# Load a pre-trained model
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

# Configure LoRA: only the low-rank adapter matrices are trained
lora_config = LoraConfig(
    r=16,                               # rank of the adapter matrices
    lora_alpha=32,                      # scaling factor
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt
    task_type="SEQ_CLS",
)

# Wrap the model with LoRA; the original weights stay frozen
model = get_peft_model(model, lora_config)

# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
)

# Create a Trainer (train_dataset and eval_dataset are assumed to be
# prepared, tokenized datasets)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Start fine-tuning
trainer.train()

 

This code outlines how to set up a model for fine-tuning using LoRA, which can significantly reduce the resource requirements while still adapting the model effectively to specific tasks.

In summary, fine-tuning with methods like LoRA and QLoRA is essential for optimizing pre-trained models for specific applications in NLP, making it feasible to deploy these powerful models in various domains efficiently.

Why is Fine-Tuning Important in NLP?

  • Task-Specific Performance: Fine-tuning helps improve performance for tasks like text classification, machine translation, or sentiment analysis in specific domains (e.g., legal, healthcare).
  • Language-Specific Adaptation: Since models like Phi-3.5 are trained on general datasets, fine-tuning helps them handle industry-specific jargon or linguistic quirks.
  • Efficient Resource Utilization: Instead of training a model from scratch, fine-tuning leverages pre-trained knowledge, saving computational resources and time.

 

Steps to Fine-Tune Phi-3.5 in Azure AI Foundry

Fine-tuning the Phi-3.5 model in Azure AI Foundry involves several key steps. Azure provides a user-friendly interface to streamline model customization, allowing you to quickly configure, train, and deploy models.

Step 1: Setting Up the Environment in Azure AI Foundry

  1. Access Azure AI Foundry: Log in to Azure AI Foundry. If you don’t have an account, you can create one and set up a workspace.
  2. Create a New Experiment: Once in Azure AI Foundry, create a new training experiment. Choose the Phi-3.5 model from the pre-trained models provided in the Azure model catalog.
  3. Set Up the Data for Fine-Tuning: Upload your custom dataset for fine-tuning. Ensure the dataset is in a compatible format (e.g., CSV, JSON). 

    For instance, if you are fine-tuning the model for a customer service chatbot, you could upload customer queries in different languages.
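As a concrete illustration, a small multilingual dataset could be written out in JSON Lines format like this (the field names and file name are assumptions; check the exact schema your training pipeline expects):

```python
import json

# Hypothetical customer-support examples; in practice this would be a much
# larger, curated dataset.
examples = [
    {"prompt": "Where is my order?", "response": "Your order is on the way.", "lang": "en"},
    {"prompt": "¿Dónde está mi pedido?", "response": "Su pedido está en camino.", "lang": "es"},
]

# Write one JSON object per line (JSONL), keeping non-ASCII text readable
with open("customer_queries.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

# Read it back to confirm the format round-trips
with open("customer_queries.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
```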

Step 2: Configure Fine-Tuning Settings

  1. Select the Training Dataset: Select the dataset you uploaded and link it to the Phi-3.5 model.
  2. Configure the Hyperparameters: Set up training hyperparameters such as the number of epochs, learning rate, and batch size. You may need to experiment with these settings to achieve optimal performance.
  3. Choose the Task Type: Specify the task you are fine-tuning for, such as text classification, translation, or summarization. This helps Azure AI Foundry understand how to optimize the model during fine-tuning.
  4. Fine-Tuning for Specific Languages: If you are fine-tuning for a specific language or for multilingual tasks, ensure that the dataset is labeled appropriately and contains enough examples in the target language(s). This allows Phi-3.5 to learn language-specific features effectively.

Step 3: Train the Model

  1. Launch the Training Process: Once the configuration is complete, launch the training process in Azure AI Foundry. Depending on the size of your dataset and the complexity of the model, this could take some time.
  2. Monitor Training Progress: Use Azure AI Foundry’s built-in monitoring tools to track performance metrics such as loss, accuracy, and F1 score. You can view the model’s progress during training to ensure that it is learning effectively.
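For reference, the F1 score tracked above is the harmonic mean of precision and recall; here is a minimal sketch of the computation from raw prediction counts:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from true/false positive/negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 80 correct positives, 20 false alarms, 10 missed cases
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```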

 

Code Example: Fine-Tuning Phi-3.5 for a Specific Use Case

Here's a code snippet for fine-tuning the Phi-3.5 model using Python. The example targets a customer support chatbot in multiple languages. Note that the SDK surface shown below is a simplified sketch; consult the current Azure AI Foundry documentation for the exact client library and method names.

 

# NOTE: illustrative sketch only — the module and method names below
# (azure.ai.Foundry, Model.load, fine_tune) are simplified stand-ins for
# the actual Azure AI Foundry client API.
from azure.ai import Foundry
from azure.ai.model import Model

# Initialize Azure AI Foundry
foundry = Foundry()

# Load the Phi-3.5 model
model = Model.load("phi-3.5")

# Set up the training dataset
training_data = foundry.load_dataset("customer_queries_dataset")

# Fine-tune the model
model.fine_tune(training_data, epochs=5, learning_rate=0.001)

# Save the fine-tuned model
model.save("fine_tuned_phi_3.5")

 

Best Practices for Evaluating and Validating Fine-Tuned Models

Once the model is fine-tuned, it's essential to evaluate and validate its performance before deploying it in production.

  1. Split Data for Validation: Always split your dataset into training and validation sets. This ensures that the model is evaluated on unseen data to prevent overfitting.
  2. Evaluate Key Metrics: Measure performance using key metrics such as:
    • Accuracy: The proportion of correct predictions.
    • F1 Score: The harmonic mean of precision and recall.
    • Confusion Matrix: Helps visualize true vs. false predictions for classification tasks.
  3. Cross-Language Validation: If the model is fine-tuned for multiple languages, test its performance across all supported languages to ensure consistency and accuracy.
  4. Test in Production-Like Environments: Before full deployment, test the fine-tuned model in a production-like environment to catch any potential issues.
  5. Continuous Monitoring and Re-Fine-Tuning: Once deployed, continuously monitor the model's performance and re-fine-tune it periodically as new data becomes available.

Deploying Phi-3.5 Model

After fine-tuning the Phi-3.5 model, the next crucial step is deploying it to make it accessible for real-world applications. This section will cover two key deployment strategies: deploying in Azure for cloud-based scaling and reliability, and deploying locally with AI Toolkit for simpler offline usage. Each deployment strategy offers its own advantages depending on the use case.

Deploying in Azure

Azure provides a powerful environment for deploying machine learning models at scale, enabling organizations to deploy models like Phi-3.5 with high availability, scalability, and robust security features. Azure AI Foundry simplifies the entire deployment pipeline.

  1. Set Up Azure AI Foundry Workspace:
  • Log in to Azure AI Foundry and navigate to the workspace where the Phi-3.5 model was fine-tuned.
  • Go to the Deployments section and create a new deployment environment for the model.
  2. Choose Compute Resources:
  • Compute Target: Select a compute target suitable for your deployment. For large-scale usage, it’s advisable to choose a GPU-based compute instance. Example: Choose an Azure Kubernetes Service (AKS) cluster for handling large-scale requests efficiently.
  • Configure Scaling Options: Azure allows you to set up auto-scaling based on traffic. This ensures that the model can handle surges in demand without affecting performance.
  3. Model Deployment Configuration:
  • Create an Inference Pipeline: In Azure AI Foundry, set up an inference pipeline for your model.
  • Specify the Model: Link the fine-tuned Phi-3.5 model to the deployment pipeline.
  • Deploy the Model: Select the option to deploy the model to the chosen compute resource.
  4. Test the Deployment:
  • Once the model is deployed, test the endpoint by sending sample requests to verify the predictions.

Configuration Steps (Compute, Resources, Scaling)

During deployment, Azure AI Foundry allows you to configure essential aspects like compute type, resource allocation, and scaling options.

  1. Compute Type: Choose between CPU or GPU clusters depending on the computational intensity of the model.
  2. Resource Allocation: Define the minimum and maximum resources to be allocated for the deployment.
  • For real-time applications, use Azure Kubernetes Service (AKS) for high availability.
  • For batch inference, Azure Container Instances (ACI) is suitable.
  3. Auto-Scaling: Set up automatic scaling of the compute instances based on the number of requests. For example, configure the deployment to start with 1 node and scale to 10 nodes during peak usage.

Cost Comparison: Phi-3.5 vs. Larger Language Models

When comparing the costs of using Phi-3.5 with larger language models (LLMs), several factors come into play, including computational resources, pricing structures, and performance efficiency. Here’s a breakdown:

Cost Efficiency of Phi-3.5:

  1. Designed as a Small Language Model (SLM), Phi-3.5 is optimized for lower computational costs.
  2. It offers competitive performance at a fraction of the cost of larger models, making it suitable for budget-conscious projects.
  3. The smaller size (3.8 billion parameters) allows for reduced resource consumption during both training and inference.
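A rough back-of-the-envelope illustration of why the smaller parameter count matters for serving costs (weights only; real memory use also includes activations and the KV cache):

```python
params = 3.8e9                 # Phi-3.5-mini parameter count
gb = 1024 ** 3

bytes_fp16 = params * 2        # 2 bytes per weight at 16-bit precision
bytes_int4 = params * 0.5      # 0.5 bytes per weight when 4-bit quantized

print(round(bytes_fp16 / gb, 1))  # ~7.1 GB of weights
print(round(bytes_int4 / gb, 1))  # ~1.8 GB of weights
```

By comparison, a model with tens of billions of parameters needs an order of magnitude more memory, which translates directly into larger and more expensive compute instances.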

Larger Language Models (e.g., GPT-3.5):

  1. Typically require more computational resources, leading to higher operational costs.
  2. Larger models may incur additional costs for storage and processing power, especially in cloud environments.

Performance vs. Cost

Performance Parity:

  1. Phi-3.5 has been shown to achieve performance parity with larger models on various benchmarks, including language comprehension and reasoning tasks.
  2. This means that for many applications, Phi-3.5 can deliver similar results to larger models without the associated costs.

Use Case Suitability:

  1. For simpler tasks or applications that do not require extensive factual knowledge, Phi-3.5 is often the more cost-effective choice.
  2. Larger models may still be preferred for complex tasks requiring deep contextual understanding or extensive factual recall.

Pricing Structure

Azure Pricing:

  1. Phi-3.5 is available through Azure with a pay-as-you-go billing model, allowing users to scale costs based on usage.
  2. Pricing details for Phi-3.5 can be found on the Azure pricing page, where users can customize options based on their needs.

Code Example: API Setup and Endpoints for Live Interaction

Below is a Python code snippet demonstrating how to interact with a deployed Phi-3.5 model via an API in Azure:

import requests

# Define the API endpoint and your API key
api_url = "https://<your-azure-endpoint>/predict"
api_key = "YOUR_API_KEY"

# Prepare the input data
input_data = {
    "text": "What are the benefits of renewable energy?"
}

# Make the API request
response = requests.post(api_url, json=input_data, headers={"Authorization": f"Bearer {api_key}"})

# Print the model's response
if response.status_code == 200:
    print("Model Response:", response.json())
else:
    print("Error:", response.status_code, response.text)
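Production clients usually add retries for transient failures. Below is a small, framework-agnostic retry wrapper (an illustrative pattern, not an Azure SDK feature); the HTTP call is injected as a parameter so the logic is easy to test:

```python
import time

def post_with_retry(post, url, payload, retries=3, backoff=0.5):
    """Retry `post(url, json=payload)` with exponential backoff until HTTP 200.

    `post` is any callable with the requests.post signature (e.g. requests.post).
    Returns the last response if all attempts fail.
    """
    response = None
    for attempt in range(retries):
        response = post(url, json=payload)
        if response.status_code == 200:
            return response
        if attempt < retries - 1:
            time.sleep(backoff * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return response
```

For example, `post_with_retry(requests.post, api_url, input_data)` could replace the single `requests.post` call in the snippet above.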

Deploying Locally with AI Toolkit

For developers who prefer to run models on their local machines, the AI Toolkit provides a convenient solution. The AI Toolkit is a lightweight platform that simplifies local deployment of AI models, allowing for offline usage, experimentation, and rapid prototyping. Deploying the Phi-3.5 model locally using the AI Toolkit is straightforward and can be used for personal projects, testing, or scenarios where cloud access is limited.

Introduction to AI Toolkit

The AI Toolkit is an easy-to-use platform for deploying language models locally without relying on cloud infrastructure. It supports a range of AI models and enables developers to work in a low-latency environment.

Advantages of deploying locally with AI Toolkit:

  1. Offline Capability: No need for continuous internet access.
  2. Quick Experimentation: Rapid prototyping and testing without the delays of cloud deployments.

Setup Guide: Installing and Running Phi-3.5 Locally Using AI Toolkit

  1. Install AI Toolkit: Go to the AI Toolkit website and download the platform for your operating system (Linux, macOS, or Windows), then run the appropriate installation command in your terminal.
  2. Download the Phi-3.5 Model: Once AI Toolkit is installed, download the Phi-3.5 model locally through the toolkit's model catalog.
  3. Run the Model Locally: After downloading the model, start a local session. This launches a local server on your machine where the model is available for interaction.

Code Example: Using Phi-3.5 Locally in a Project

Below is a Python code example demonstrating how to send a query to the locally deployed Phi-3.5 model running on the AI Toolkit.

import requests

# Define the local endpoint
local_url = "http://localhost:8000/predict"

# Prepare the input data
input_data = {
    "text": "What are the benefits of renewable energy?"
}

# Make the API request
response = requests.post(local_url, json=input_data)

# Print the model's response
if response.status_code == 200:
    print("Model Response:", response.json())
else:
    print("Error:", response.status_code, response.text)

 

Comparing Language Capabilities

Test Results: How Phi-3.5 Handles Different Languages

The Phi-3.5 model demonstrates robust multilingual capabilities, effectively processing and generating text in various languages. Below are comparative examples showcasing its performance in English, Spanish, and Mandarin:

English Example:

Input: "What are the benefits of renewable energy?"

Output: "Renewable energy sources, such as solar and wind, reduce greenhouse gas emissions and promote sustainability."

Spanish Example:

Input: "¿Cuáles son los beneficios de la energía renovable?"

Output: "Las fuentes de energía renovable, como la solar y la eólica, reducen las emisiones de gases de efecto invernadero y promueven la sostenibilidad."

Mandarin Example:

Input: "可再生能源的好处是什么?"

Output: "可再生能源,如太阳能和风能,减少温室气体排放,促进可持续发展。"

Performance Benchmarking and Evaluation Across Different Languages

Benchmarking Phi-3.5 across different languages involves evaluating its accuracy, fluency, and contextual understanding. For instance, using BLEU scores and human evaluations, the model can be assessed on its translation quality and coherence in various languages.
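For intuition, here is a deliberately simplified BLEU-1 (clipped unigram precision, with no brevity penalty); real evaluations typically use an established implementation such as sacreBLEU:

```python
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision: each candidate word is credited at most
    as many times as it appears in the reference."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    clipped = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return clipped / sum(cand_counts.values())

score = bleu1(
    "renewable energy reduces emissions",
    "renewable energy sources reduce emissions",
)
print(score)  # 0.75 (3 of 4 candidate words appear in the reference)
```

Full BLEU also averages higher-order n-gram precisions and applies a brevity penalty, which is why library implementations should be preferred for reported numbers.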

Real-World Use Case: Multilingual Customer Service Chatbot

A practical application of Phi-3.5's multilingual capabilities is in developing a customer service chatbot that can interact with users in their preferred language. For instance, the chatbot could provide support in English, Spanish, and Mandarin, ensuring a wider reach and better user experience.

Optimizing and Validating Phi-3.5 Model

Model Performance Metrics

To validate the model's performance in different scenarios, consider the following metrics:

  1. Accuracy: Measure how often the model's outputs are correct or align with expected results.
  2. Fluency: Assess the naturalness and readability of the generated text.
  3. Contextual Understanding: Evaluate how well the model understands and responds to context-specific queries.
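The first of these metrics can be broken down per language to catch uneven quality across locales; a minimal sketch (the label values are illustrative):

```python
from collections import defaultdict

def per_language_accuracy(records):
    """records: iterable of (lang, predicted, expected) tuples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for lang, predicted, expected in records:
        totals[lang] += 1
        hits[lang] += int(predicted == expected)
    return {lang: hits[lang] / totals[lang] for lang in totals}

acc = per_language_accuracy([
    ("en", "positive", "positive"),
    ("en", "negative", "positive"),
    ("es", "positive", "positive"),
])
print(acc)  # {'en': 0.5, 'es': 1.0}
```

A large gap between languages here is a signal to add more training examples for the weaker language and re-fine-tune.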

Tools to Use in Azure and AI Toolkit for Evaluation

  1. Azure Cognitive Services: Utilize tools like Text Analytics and Translator to evaluate performance.
  2. AI Toolkit: Use local testing environments to quickly iterate and validate model outputs.

Conclusion

In summary, Phi-3.5 exhibits impressive multilingual capabilities, effective deployment options, and robust performance metrics. Its ability to handle various languages makes it a versatile tool for natural language processing applications. Phi-3.5 stands out for its adaptability and performance in multilingual contexts, making it an excellent choice for future NLP projects, especially those requiring diverse language support.

We encourage readers to experiment with the Phi-3.5 model using Azure AI Foundry or the AI Toolkit, explore fine-tuning techniques for their specific use cases, and share their findings with the community. For more information on optimized fine-tuning techniques, check out the Ignite Fine-Tuning Workshop.

References

  1. Customize the Phi-3.5 family of models with LoRA fine-tuning in Azure
  2. Fine-Tuning with Azure AI Foundry and Microsoft Olive: Hands-on Labs and Workshop
  3. Customize a model with fine-tuning: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/fine-tuning?tabs=azure-openai%2Cturbo%2Cpython-new&pivots=programming-language-studio
  4. Microsoft AI Toolkit: AI Toolkit for VSCode

Updated Jan 16, 2025
Version 1.0
  • ventura
    Copper Contributor

    Thanks, is this solution the same for other language models?

    • Sharda_Kaur
      Brass Contributor

      Yes, the solution for fine-tuning and deploying the Phi 3.5 model can be applied to other language models, but there may be some differences in steps and requirements. Always check the specific documentation for the model you are using to ensure compatibility.