Exploring Microsoft's Phi-3 Family of Small Language Models (SLMs) with Azure AI
Published May 09 2024 12:00 PM

Microsoft's Phi-3 family of small language models (SLMs) has been gaining a lot of attention recently, and for good reason. These SLMs are powerful yet lightweight and efficient, making them perfect for applications with limited computational resources.

In this blog post, we will explore how to interact with Microsoft's Phi-3 models using Azure AI services and the Model catalog. We'll dive into the process of deploying and integrating these models into real-world applications, as well as practical exercises to solidify your understanding of the technology.

We'll also create our own chatbot interface powered by Phi-3 using Gradio. This will allow you to interact with the model in a user-friendly way, helping you gain confidence in deploying and integrating AI into applications.

If you're interested in learning more about Microsoft's Phi-3 models and how to use them with Azure AI services, keep reading!

Step 1: Set Up Your Azure Account

Before you dive into using Phi-3, you’ll need to set up an Azure account if you don’t already have one. Visit the Azure website and follow the sign-up instructions. Students can get Azure for Students, which includes $100 of credit, by registering at http://aka.ms/azure4student

Step 2: Access the Azure AI Model Catalog

Once your account is set up, navigate to the Azure AI Model Catalog, where you’ll find the Phi-3 models listed. You can also browse more than 1,500 frontier and open models from providers including Hugging Face, Meta, Mistral, Cohere, and many more.



Step 3: Deploy Large Language Models with Azure AI Studio to a Managed Online Endpoint


Deploying a large language model (LLM) makes it available for use in a website, an application, or other production environments. This typically involves hosting the model on a server or in the cloud, and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference for chat, copilot, or another generative AI application.
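
To make the idea of "an API for users to interact with the model" more concrete, here is a minimal sketch of how a client might build an HTTP request for a deployed endpoint using only the Python standard library. The URL and key are placeholders, the helper name build_scoring_request is hypothetical, and the sketch assumes key-based authentication with a Bearer header. The payload layout mirrors the input_data format used later in this post. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_url, api_key, prompt):
    """Build (but do not send) a POST request for a deployed chat model.

    Assumptions: the endpoint accepts JSON and a Bearer key; the payload
    follows the input_data layout used elsewhere in this post.
    """
    payload = {
        "input_data": {
            "input_string": [{"role": "user", "content": prompt}],
            "parameters": {"temperature": 0.7, "max_new_tokens": 500},
        }
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder values for illustration only
request = build_scoring_request(
    "https://example.invalid/score",
    "placeholder-key",
    "What is a small language model?",
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) would require a real endpoint URL and key, which you can copy from the deployment's Consume page.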

Deploy Open Models to Azure AI Studio


Follow the steps below to deploy an open model such as distilbert-base-cased to a real-time endpoint in Azure AI Studio.

  1. Choose a model you want to deploy from the Azure AI Studio model catalog. Alternatively, you can initiate deployment by selecting + Create from your project > Deployments.

  2. Select Deploy to project on the model card details page.

  3. Choose the project you want to deploy the model to.

  4. Select Deploy.

  5. You land on the deployment details page. Select Consume to obtain code samples that can be used to consume the deployed model in your application.

You can also use the Azure AI Generative SDK to deploy an open model. In this example, you deploy a Phi-3-mini model.


#Import the libraries
from azure.ai.resources.client import AIClient
from azure.ai.resources.entities.deployment import Deployment
from azure.ai.resources.entities.models import PromptflowModel
from azure.identity import DefaultAzureCredential


Credential info can be found under your project settings on Azure AI Studio. You can go to Settings by selecting the gear icon on the bottom of the left navigation UI.


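credential = DefaultAzureCredential()
#Replace the placeholder values with the details from your project settings
client = AIClient(
    credential=credential,
    subscription_id="<YOUR_SUBSCRIPTION_ID>",
    resource_group_name="<YOUR_RESOURCE_GROUP_NAME>",
    project_name="<YOUR_PROJECT_NAME>",
)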


Define the model and the deployment. The model_id can be found on the model card on Azure AI Studio model catalog.


#Select the model you want; simply comment out the model you do not need
#Azure Phi-3-mini-4k
model_id = "azureml://registries/azureml/models/Phi-3-mini-4k-instruct/versions/5"
#Azure Phi-3-mini-128k
#model_id = "azureml://registries/azureml/models/Phi-3-mini-128k-instruct/versions/5"
deployment_name = "Enter Your Deployment Name"

deployment = Deployment(
    name=deployment_name,
    model=model_id,
)


Deploy the model. You can deploy to a real-time endpoint directly from here. Optionally, you can use the Azure AI Generative SDK to deploy any model from the model catalog.

client.deployments.create_or_update(deployment)




Delete the deployment endpoint

Deleting a deployment and its associated endpoint isn't supported via the Azure AI SDK. To delete a deployment in Azure AI Studio, select the Delete button on the top panel of the deployment details page.

Quota considerations

Deploying and inferencing with real-time endpoints consumes Virtual Machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Studio, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.
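
The quota check is simple arithmetic: the cores of every instance you deploy are added to what you already use and compared against the limit. The sketch below illustrates this under stated assumptions: the function name fits_in_quota and the quota figure are hypothetical, and the only hard fact used is that a Standard_NC6s_v3 instance provides 6 vCPUs.

```python
def fits_in_quota(cores_per_instance, instance_count, quota_limit, cores_in_use=0):
    """Return True if the requested instances fit within the remaining core quota."""
    requested = cores_per_instance * instance_count
    return cores_in_use + requested <= quota_limit


NC6S_V3_CORES = 6        # Standard_NC6s_v3 provides 6 vCPUs
HYPOTHETICAL_QUOTA = 6   # illustrative limit; check your own subscription's quota

one_instance_ok = fits_in_quota(NC6S_V3_CORES, 1, HYPOTHETICAL_QUOTA)
two_instances_ok = fits_in_quota(NC6S_V3_CORES, 2, HYPOTHETICAL_QUOTA)
print(one_instance_ok, two_instances_ok)
```

Your actual limits are listed in the quota view for your subscription and region, so treat the numbers above as placeholders.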

Recommended virtual machine SKUs within Azure AI Studio to run Phi-3

Smallest VM $
Standard_NC6s_v3  *This is the max size for Azure for Students*

Largest $$$

NOTE: If you're a student, you can run a single VM with a maximum of 6 cores, so please select Standard_NC6s_v3. Phi-3-mini-instruct is small enough to run on a local device, so smaller VMs will work for this demo and help you save credit and costs.


3. Setting up your Python Environment

Make sure you have the following prerequisites:

  • An Azure Machine Learning workspace
  • A requirements.txt file listing the Python libraries that need to be installed
  • Note: You can install the libraries into your existing environment by using pip install:


pip install -r requirements.txt


  • An instance of the Phi-3 model deployed to an Online Endpoint
    Note: You can immediately start using the Phi-3 model through an Azure Managed Online Endpoint. Azure Managed Online Endpoints allow you to deploy your models as a web service easily.

4. Getting Started on your code 


#Importing Required Libraries
#MLClient is the main class that we use to interact with Azure AI.
#DefaultAzureCredential and InteractiveBrowserCredential are used for authentication.
#The os library is used to access environment variables.

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
import os


Next, we will set up the credentials to authenticate with Azure. We first try to use the DefaultAzureCredential. If that fails (for example, if we are running the code on a machine that is not logged into Azure), we fall back to using InteractiveBrowserCredential, which will prompt the user to log in.


#Setting Up Credentials
#Here, we are setting up the credentials to authenticate with Azure.
#We first try to use the DefaultAzureCredential. If that fails, we fall back to using
#InteractiveBrowserCredential, which will prompt us to log in.

try:
    credential = DefaultAzureCredential()
    #Request a token so that a missing login fails here rather than later
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    credential = InteractiveBrowserCredential()


Finally, we create an MLClient for our Azure AI workspace. We use environment variables to get the subscription ID, resource group name, and workspace name.


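#Creating an MLClient for Workspace
#In this cell, we create an MLClient for our Azure AI workspace.
#We use environment variables to get the subscription ID,
#resource group name, and workspace name.
#The variable names below are examples; use the names set in your environment.

workspace_ml_client = MLClient(
    credential,
    subscription_id=os.getenv("SUBSCRIPTION_ID"),
    resource_group_name=os.getenv("RESOURCE_GROUP"),
    workspace_name=os.getenv("WORKSPACE_NAME"),
)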
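
Because the client depends on environment variables, it can help to check that they are set before connecting. The following is a minimal sketch of such a check using only the standard library. The variable names SUBSCRIPTION_ID, RESOURCE_GROUP, and WORKSPACE_NAME and the helper read_workspace_settings are assumptions for illustration, not names required by Azure.

```python
import os

# Hypothetical variable names; use whatever names your environment defines
REQUIRED_VARS = ("SUBSCRIPTION_ID", "RESOURCE_GROUP", "WORKSPACE_NAME")


def read_workspace_settings(environ=os.environ):
    """Return the workspace settings, or raise if any variable is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


# Demonstration with an in-memory mapping instead of the real environment
demo_env = {
    "SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
    "RESOURCE_GROUP": "demo-rg",
    "WORKSPACE_NAME": "demo-workspace",
}
settings = read_workspace_settings(demo_env)
print(sorted(settings))
```

Calling read_workspace_settings() with no argument reads the real environment, and the resulting values can then be passed to the client constructor.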


Loading and using other data sets with Phi-3


#Loading & using Dataset
#Experimenting with a dataset
#Step 1. Import the necessary Python libraries; we recommend pandas

import pandas as pd
from datasets import load_dataset

#Step 2. Load your dataset
#Example: we are using the ultrachat_200k dataset from Hugging Face
#and selecting the test_sft split.
#Note: To use another dataset, simply replace the dataset name and the test_sft split below

dataset = load_dataset("HuggingFaceH4/ultrachat_200k")["test_sft"]

#Step 3. Convert the dataset into a pandas DataFrame
#Cleaning the data: to make the data cleaner, we drop the columns we do not need.
#Example: we drop the 'prompt_id' and 'messages' columns, which are not needed for our current task.

df = pd.DataFrame(dataset).drop(columns=["prompt_id", "messages"])

#Step 4. Display a random sample of rows from the DataFrame
#Note: This gives us a quick look at the data we'll be working with.
#Example: In this case we are choosing 5 rows; simply replace 5 with the number of rows required.

df.sample(5)


Next, we want to test our model with a random sample. This ensures that we have a diverse range of topics to test our model with and that the testing process is as unbiased as possible.


#Creating a Random Sample
#Import Python Libraries
import random 
import json
#Random sample from our dataset for our test case for Phi-3. 
#Step 1. First, we sample 5 random examples from the DataFrame and convert them to a list. 

examples = df.sample(5).values.tolist()

#Step2. We convert the examples to a JSON string. 

examples_json = json.dumps(examples, indent=2)

#Step 3. Selecting a random index from the examples. 
#We use the random.randint function
#This returns a random integer within the specified range.

i = random.randint(0, len(examples) - 1)

#We use this random index to select an example from our list.
sample = examples[i]
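
Random selection makes each run different, which can make a test hard to repeat. As a small sketch of one way to keep the selection random but reproducible, the helper below uses a seeded random.Random instance. The function name pick_example and the toy examples list are hypothetical and stand in for the examples list built above.

```python
import random


def pick_example(examples, seed=None):
    """Pick one example at random; passing a seed makes the choice repeatable."""
    rng = random.Random(seed)
    index = rng.randint(0, len(examples) - 1)
    return index, examples[index]


# Toy data standing in for the rows sampled from the DataFrame
toy_examples = [["prompt A"], ["prompt B"], ["prompt C"], ["prompt D"], ["prompt E"]]

first_index, first_pick = pick_example(toy_examples, seed=42)
second_index, second_pick = pick_example(toy_examples, seed=42)
print(first_index == second_index, first_pick == second_pick)
```

Leaving seed as None keeps the original behaviour, where every run may select a different example.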


Getting a response to the inputted prompt (user question)


#Getting the Phi-3 model to generate a response to a user's question.
#Import the Python libraries
import json
import tempfile

#Step 1. Define the input data.
#This includes the user's message (their prompt/question).
#We also need some additional parameters for the Phi-3 model.
#temperature, top_p, and do_sample control the randomness of the model's output;
#max_new_tokens limits the length of the response.

messages = {
    "input_data": {
        "input_string": [{"role": "user", "content": "This is the users input question or prompt?"}],
        "parameters": {"temperature": 0.7, "top_p": 0.9, "do_sample": True, "max_new_tokens": 500},
    }
}

#Step 2. We write the input data to a temporary file.
#The invoke method of the workspace_ml_client.online_endpoints object
#requires a file as input.

with tempfile.NamedTemporaryFile(suffix=".json", delete=False, mode='w') as temp:
    json.dump(messages, temp)
    temp_file_name = temp.name

#Step 3. Invoke the Phi-3 model and get the response.
#The invoke method sends the input data to the model and returns the model's output.
#You will find the endpoint_name and deployment_name details available under
#Build > Components > Deployments > then, select the deployment you created.

response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name="Replace this with your endpoint name",
    deployment_name="Replace this with your deployment name",
    request_file=temp_file_name,
)

#Step 4. We get the response from the model, parse it, and add it to the input data.
#This allows us to build up the conversation history.

response_json = json.loads(response)["output"]
response_dict = {'content': response_json, 'role': 'assistant'}
messages["input_data"]["input_string"].append(response_dict)

#Step 5. Display the updated data.
#This includes the user's message and the model's response.

print(json.dumps(messages["input_data"]["input_string"], indent=2))
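
The parsing step can be tried without a deployed endpoint. The sketch below uses a hand-written JSON string in place of a real reply and assumes, as the code in this post does, that the reply is a JSON object with an "output" field. The helper name append_reply and the sample texts are hypothetical.

```python
import json


def append_reply(messages, raw_response):
    """Parse the endpoint's JSON reply and append it to the conversation history."""
    reply_text = json.loads(raw_response)["output"]
    messages["input_data"]["input_string"].append(
        {"role": "assistant", "content": reply_text}
    )
    return reply_text


messages = {
    "input_data": {
        "input_string": [{"role": "user", "content": "What is Phi-3?"}],
        "parameters": {"temperature": 0.7, "top_p": 0.9, "do_sample": True, "max_new_tokens": 500},
    }
}

# Stand-in for the string a real endpoint call would return
fake_response = json.dumps({"output": "Phi-3 is a family of small language models."})

reply = append_reply(messages, fake_response)
print(json.dumps(messages["input_data"]["input_string"], indent=2))
```

After the call, the history holds the user's question followed by the assistant's reply, ready to be sent back with the next question.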


Now we want to test the model so we need to use the sample data we created.


#Step 1. Import the libraries; we reuse the examples we created previously
import json
import tempfile
import random

#Step 2. Use a random sample from the examples
i = random.randint(0, len(examples) - 1)
sample = examples[i]

#Step 3. Define the input data
messages = {
    "input_data": {
        "input_string": [{"role": "user", "content": sample[0]}],
        "parameters": {"temperature": 0.7, "top_p": 0.9, "do_sample": True, "max_new_tokens": 500},
    }
}

#Step 4. Write the input data to a temporary file
with tempfile.NamedTemporaryFile(suffix=".json", delete=False, mode="w") as temp:
    json.dump(messages, temp)
    temp_file_name = temp.name

#Step 5. Invoke the Phi-3 model and get the response
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name="Replace with your endpoint",
    deployment_name="Replace with your deployment",
    request_file=temp_file_name,
)

#Step 6. Parse the response and add it to the input data
response_json = json.loads(response)["output"]
response_dict = {'content': response_json, 'role': 'assistant'}
messages["input_data"]["input_string"].append(response_dict)

#Step 7. Display the updated input data
print(json.dumps(messages["input_data"]["input_string"], indent=2))
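
Because the temporary files are created with delete=False, they remain on disk after each call. One possible way to tidy up, shown here as a sketch rather than part of the original walkthrough, is a small context manager that writes the request file and removes it when the block ends. The name request_file is hypothetical.

```python
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def request_file(payload):
    """Write payload to a temporary JSON file and delete the file afterwards."""
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False, mode="w") as temp:
        json.dump(payload, temp)
        path = temp.name
    try:
        yield path
    finally:
        os.remove(path)


payload = {"input_data": {"input_string": [{"role": "user", "content": "Hello"}]}}

with request_file(payload) as path:
    # Inside the block the file exists and could be passed to the endpoint call
    with open(path) as handle:
        round_trip = json.load(handle)

file_removed = not os.path.exists(path)
print(round_trip == payload, file_removed)
```

Inside the with block, path plays the same role as temp_file_name in the code above.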


Now we want to create a UI to experiment with our model, chat inputs, and responses. This process creates a user-friendly chat interface for the Phi-3 model.


#Building the Chat Interface
#Import the Python Libraries
import gradio as gr
import json
import tempfile

#Step 1. Using Gradio to create a UI.
#Define a function, predict, that takes a message/input
#plus the history of previous messages as input.
#This function prepares the input data for the Phi-3 model,
#invokes the model, and processes the model's response.

def predict(message, history):
    messages = {
        "input_data": {
            "input_string": [],
            "parameters": {"temperature": 0.6, "top_p": 0.9, "do_sample": True, "max_new_tokens": 500},
        }
    }

    #Step 2. Add the conversation history, then the new user message
    for user, assistant in history:
        messages["input_data"]["input_string"].append({"content": user, "role": "user"})
        messages["input_data"]["input_string"].append({"content": assistant, "role": "assistant"})
    messages["input_data"]["input_string"].append({"content": message, "role": "user"})

    #Step 3. Write the data to a temp file
    with tempfile.NamedTemporaryFile(suffix=".json", delete=False, mode="w") as temp:
        json.dump(messages, temp)
        temp_file_name = temp.name

    #Step 4. Invoke the Phi-3 model to get a response
    response = workspace_ml_client.online_endpoints.invoke(
        endpoint_name="Replace with your endpoint",
        deployment_name="Replace with your deployment",
        request_file=temp_file_name,
    )

    #Step 5. Parse the response and return it to the chat interface
    response_json = json.loads(response)["output"]
    return response_json

#Step 6. Create a Gradio interface for it.
#This interface includes a textbox for the user to enter their message

gr.ChatInterface(
    fn=predict,
    textbox=gr.Textbox(
        value="Ask any question?",
        placeholder="Ask me anything...",
    ),
    title="Phi-3: This is a response example!",
).launch()
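
The history handling inside predict can be checked on its own, without Gradio or a deployed model. The sketch below pulls that logic into a standalone function and assumes the history arrives as (user, assistant) pairs, as in the loop above. The name build_payload and the sample conversation are hypothetical.

```python
def build_payload(message, history, temperature=0.6, top_p=0.9, max_new_tokens=500):
    """Turn the chat history plus the new message into the request payload."""
    turns = []
    for user_text, assistant_text in history:
        turns.append({"content": user_text, "role": "user"})
        turns.append({"content": assistant_text, "role": "assistant"})
    # The new message always goes last so the model answers it
    turns.append({"content": message, "role": "user"})
    return {
        "input_data": {
            "input_string": turns,
            "parameters": {
                "temperature": temperature,
                "top_p": top_p,
                "do_sample": True,
                "max_new_tokens": max_new_tokens,
            },
        }
    }


sample_history = [("Hi", "Hello! How can I help?")]
payload = build_payload("What is Phi-3?", sample_history)
roles = [turn["role"] for turn in payload["input_data"]["input_string"]]
print(roles)
```

Keeping the payload construction separate like this makes it easy to test different parameter values before wiring them into the chat interface.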


For technical students, exploring Microsoft's Phi-3 family of small language models (SLMs) and their integration into applications via the Azure AI Model catalog shows that powerful AI can be achieved with lighter and more efficient models. This blog post aimed to illustrate this by showcasing the benefits of using Phi-3 models, including step-by-step guidance for deploying and integrating AI into applications, as well as practical exercises like our Gradio-powered chatbot.

Don't stop here. Continue your exploration of AI with Azure AI and keep learning and building. We would love for you to share what you're building and to tell us if you'd like to see more content. Consider sharing this tutorial with colleagues or through your professional network to help grow the field of AI for all. I look forward to seeing what you create with Azure AI. You can check out our new Phi-3 CookBook for getting started with Phi-3, and learn more about what you can do in Azure AI Studio.

Last update: May 09 2024 11:49 PM