
Educator Developer Blog

Welcome to the new Phi-4 models - Microsoft Phi-4-mini & Phi-4-multimodal

kinfey
Microsoft
Feb 27, 2025

Microsoft has officially released the Phi-4 series models. Building on the previously launched Phi-4 (14B) model with advanced reasoning capabilities, Microsoft has now introduced Phi-4-mini-instruct (3.8B) and Phi-4-multimodal (5.6B).

These new Phi-4 mini and multimodal models are now available on Hugging Face, Azure AI Foundry Model Catalog, GitHub Models, and Ollama.

Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and it now supports the long-awaited function calling feature. Phi-4-multimodal is a fully multimodal model capable of vision, audio, and text understanding, with multilingual support, strong reasoning, coding, and more.

These models can also be deployed on edge devices, enabling IoT applications to integrate generative AI even in environments with limited computing power and network access.

Now, let’s dive into the new Phi-4-mini and Phi-4-multimodal together!

Function calling

This is a highly anticipated feature within the community. With function calling, Phi-4-mini and Phi-4-multimodal can extend their text-processing capabilities by integrating search engines, connecting various tools, and more.

For example, Phi-4-mini can retrieve Premier League match information through a function call, showcasing its ability to interact with external data sources seamlessly, as sketched below.

🔗 Sample code for function calling: https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/FunctionCallingBasic/README.md
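Below is a minimal Python sketch of the idea using Hugging Face transformers. The tool-definition pattern (a JSON tool description wrapped in <|tool|> ... <|/tool|> tags inside the system message) follows the Phi Cookbook sample linked above; the get_match_results function and its schema are hypothetical placeholders for your own tool.

```python
# Minimal sketch: function calling with Phi-4-mini-instruct via transformers.
# get_match_results is a hypothetical tool that your application implements.
import json
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Phi-4-mini-instruct",
                torch_dtype="auto", device_map="auto")

# Describe the tool the model is allowed to call.
tools = [{
    "name": "get_match_results",
    "description": "Get the latest Premier League match results for a team.",
    "parameters": {
        "team": {"description": "Team name, e.g. 'Arsenal'", "type": "str"},
    },
}]

messages = [
    {"role": "system",
     "content": "You are a helpful assistant with these tools."
                f"<|tool|>{json.dumps(tools)}<|/tool|>"},
    {"role": "user", "content": "How did Arsenal do in their latest match?"},
]

out = pipe(messages, max_new_tokens=256, do_sample=False)
# The model is expected to reply with a JSON tool call such as
# [{"name": "get_match_results", "arguments": {"team": "Arsenal"}}],
# which your code executes and feeds back as a follow-up message.
print(out[0]["generated_text"][-1]["content"])
```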

Quantized model deployment

We can deploy the quantized model on edge devices. By combining Microsoft Olive with the ONNX Runtime GenAI SDK, we can deploy Phi-4-mini on Windows, iPhone, Android, and other devices.

For example, Phi-4-mini can run locally on an iPhone 12 Pro.
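Here is a minimal sketch using the ONNX Runtime GenAI Python package (onnxruntime-genai) to run an INT4 ONNX build of Phi-4-mini. The local model folder path is a placeholder for an export produced with Olive or downloaded from Hugging Face, and API details may vary slightly between package versions.

```python
# Minimal sketch: run a quantized Phi-4-mini ONNX build with onnxruntime-genai.
# "./phi-4-mini-instruct-onnx-int4" is a placeholder path to a local INT4 export.
import onnxruntime_genai as og

model = og.Model("./phi-4-mini-instruct-onnx-int4")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

# Phi-4-mini chat format: <|user|> ... <|end|><|assistant|>
prompt = "<|user|>Explain what a small language model is in one sentence.<|end|><|assistant|>"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))

# Stream the reply token by token until generation finishes.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```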

Multimodal SLM - Phi-4-multimodal

Phi-4-multimodal is a fully multimodal model that supports text, visual, and voice inputs. In visual scenarios, the model can even generate code directly from images, streamlining the development process.

🔗 Sample code for Phi-4-multimodal

https://github.com/microsoft/PhiCookBook/tree/main/md/02.Application/04.Vision/Phi4/CreateFrontend
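To complement the sample above, here is a minimal sketch of image-to-code prompting with Phi-4-multimodal-instruct through Hugging Face transformers. The <|user|>, <|image_1|>, <|end|>, and <|assistant|> tags follow the chat format documented on the model card; the screenshot URL is a placeholder.

```python
# Minimal sketch: generate front-end code from a screenshot with
# Phi-4-multimodal-instruct. The image URL is a placeholder.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_id = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
)
generation_config = GenerationConfig.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/ui-mockup.png", stream=True).raw)
prompt = "<|user|><|image_1|>Generate the HTML and CSS for this page layout.<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=1024, generation_config=generation_config
)
# Decode only the newly generated tokens (skip the prompt).
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```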

The integration of audio input gives Phi-4-multimodal even broader capabilities, such as speech transcription and audio-based question answering.
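For example, the following sketch (with the same caveats as the vision example) transcribes a local recording; the <|audio_1|> tag and the audios=[(array, sampling_rate)] argument follow the model card, and speech.wav is a placeholder file.

```python
# Minimal sketch: speech transcription with Phi-4-multimodal-instruct.
# speech.wav is a placeholder for a local recording.
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

model_id = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
)
generation_config = GenerationConfig.from_pretrained(model_id)

audio, samplerate = sf.read("speech.wav")
prompt = "<|user|><|audio_1|>Transcribe this audio to text.<|end|><|assistant|>"

inputs = processor(
    text=prompt, audios=[(audio, samplerate)], return_tensors="pt"
).to(model.device)
output_ids = model.generate(
    **inputs, max_new_tokens=256, generation_config=generation_config
)
# Decode only the newly generated tokens (skip the prompt).
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```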

Advanced Reasoning

When Phi-4 (14B) was released, its strong reasoning capabilities were a major highlight. Now, even with a reduced parameter size, both Phi-4-mini and Phi-4-multimodal retain this capability. We can test their reasoning abilities by combining them with image inputs. For example, by uploading an image, Phi-4-multimodal can generate structured project code based on both the image content and provided prompts.

🔗 Advanced Reasoning Code Sample: 
https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/02.Code/Phi4/GenProjectCode/README.md

Despite their compact size, Phi-4-mini and Phi-4-multimodal achieve performance levels comparable to some LLMs. They can be deployed on edge devices, enabling PCs, mobile devices, and IoT systems to leverage enhanced generative AI capabilities.

We will continue to expand the available examples in the Phi Cookbook, aiming to make it your go-to guide for Phi-4. Stay tuned for more updates!

Resources

🔗 Phi Cookbook: https://github.com/microsoft/PhiCookBook
