Microsoft has officially released the Phi-4 series models. Building on the previously launched Phi-4 (14B) model with advanced reasoning capabilities, Microsoft has now introduced Phi-4-mini-instruct (3.8B) and Phi-4-multimodal (5.6B).
These new Phi-4 mini and multimodal models are now available on Hugging Face, Azure AI Foundry Model Catalog, GitHub Models, and Ollama.
Phi-4-mini brings significant enhancements in multilingual support, reasoning, and mathematics, and the long-awaited function calling feature is now supported. Phi-4-multimodal is a fully multimodal model capable of vision, audio, text, multilingual understanding, strong reasoning, coding, and more.
These models can also be deployed on edge devices, enabling IoT applications to integrate generative AI even in environments with limited computing power and network access.
Now, let’s dive into the new Phi-4-mini and Phi-4-multimodal together!
Function calling
This is a highly anticipated feature within the community. With function calling, Phi-4-mini and Phi-4-multimodal can extend their text-processing capabilities by integrating search engines, connecting various tools, and more.
In the sample linked below, Phi-4-mini retrieves Premier League match information through a connected tool, showing how the model can interact with external data sources seamlessly.
🔗Sample Code for function calling https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/07.FunctionCalling/Phi4/FunctionCallingBasic/README.md
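To give a feel for the flow, here is a minimal sketch of function calling with Phi-4-mini via Hugging Face transformers. The tool name `get_match_result` and the plain-JSON system prompt are illustrative assumptions; the linked sample documents the exact tool-definition format the model expects.

```python
# Minimal sketch of function calling with Phi-4-mini (illustrative, not the official sample).
import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical tool the model can ask our application to call.
tools = [{
    "name": "get_match_result",
    "description": "Look up the latest Premier League result for a given team.",
    "parameters": {"team": {"type": "string", "description": "Team name"}},
}]

messages = [
    # The tool list is passed through the system prompt here as plain JSON;
    # the linked sample shows the exact tool-definition format Phi-4-mini expects.
    {"role": "system", "content": "You can call these tools: " + json.dumps(tools)},
    {"role": "user", "content": "What was Arsenal's last Premier League result?"},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
# The model should answer with a tool call such as
# {"name": "get_match_result", "arguments": {"team": "Arsenal"}};
# our application executes it and feeds the result back as a follow-up message.
```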
Quantized model deployment
We can deploy the quantized model on edge devices. By combining Microsoft Olive with the ONNX Runtime GenAI library, we can deploy Phi-4-mini on Windows, iPhone, Android, and other devices.
For example, the quantized model runs locally on an iPhone 12 Pro.
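For orientation, here is a minimal sketch of local inference with the onnxruntime-genai Python package. The model folder path is a placeholder for a quantized Phi-4-mini ONNX build produced with Olive, and API details can vary slightly between package releases.

```python
# Minimal sketch of on-device inference with a quantized Phi-4-mini ONNX model.
import onnxruntime_genai as og

# Placeholder path to a local int4 ONNX build of Phi-4-mini.
model = og.Model("./phi-4-mini-instruct-onnx/cpu-int4")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

prompt = "<|user|>Explain what Microsoft Olive does in one sentence.<|end|><|assistant|>"
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
# Older onnxruntime-genai releases pass the prompt via params.input_ids instead.
generator.append_tokens(tokenizer.encode(prompt))

# Stream tokens to the console as they are generated.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```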
Multimodal SLM - Phi-4-multimodal
Phi-4-multimodal is a fully multimodal model that supports text, visual, and voice inputs. Given an image as visual context, it can even generate code directly, streamlining the development process.
🔗Sample code for Phi-4-multimodal
https://github.com/microsoft/PhiCookBook/tree/main/md/02.Application/04.Vision/Phi4/CreateFrontend
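As a rough sketch of how an image is passed to the model: the screenshot file name below is a placeholder, and the image placeholder token and processor call follow the pattern used in the model card and cookbook samples, so check the linked sample for the exact prompt format.

```python
# Minimal sketch: generate frontend code from a UI screenshot with Phi-4-multimodal.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
)

# Placeholder local image of a page layout or UI mock-up.
image = Image.open("ui_mockup.png")

# Image placeholder token plus an instruction, mirroring the cookbook sample.
prompt = "<|user|><|image_1|>Generate HTML and CSS for this page layout.<|end|><|assistant|>"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=1024)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```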
Audio support gives Phi-4-multimodal even broader capabilities. Here are some examples, with a minimal code sketch after the list:
- Audio Extraction Sample https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/08.Multimodel/Phi4/TechJournalist/phi_4_mm_audio_text_publish_news.ipynb
- Voice Interaction Sample https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/05.Audio/Phi4/Siri/demo.ipynb
- Audio Translation Sample https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/05.Audio/Phi4/Translate/demo.ipynb
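As with images, audio is passed to the processor alongside a prompt. The sketch below is illustrative: the recording file name is a placeholder, and the audio placeholder token and `audios` argument follow the pattern in the linked samples.

```python
# Minimal sketch: transcribe and translate a local recording with Phi-4-multimodal.
import soundfile as sf
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "microsoft/Phi-4-multimodal-instruct"
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype="auto", device_map="cuda"
)

# Placeholder local recording; soundfile returns the samples and the sample rate.
audio, sample_rate = sf.read("interview_clip.wav")

# Audio placeholder token plus an instruction, mirroring the cookbook samples.
prompt = "<|user|><|audio_1|>Transcribe this audio, then translate it to English.<|end|><|assistant|>"
inputs = processor(text=prompt, audios=[(audio, sample_rate)], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0])
```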
Advanced Reasoning
When Phi-4 (14B) was released, its strong reasoning capabilities were a major highlight. Now, even with a reduced parameter size, both Phi-4-mini and Phi-4-multimodal retain this capability. We can test their reasoning abilities by combining them with image inputs. For example, by uploading an image, Phi-4-multimodal can generate structured project code based on both the image content and provided prompts.
🔗 Advanced Reasoning Code Sample:
https://github.com/microsoft/PhiCookBook/blob/main/md/02.Application/02.Code/Phi4/GenProjectCode/README.md
Despite their compact size, Phi-4-mini and Phi-4-multimodal achieve performance levels comparable to some LLMs. They can be deployed on edge devices, enabling PCs, mobile devices, and IoT systems to leverage enhanced generative AI capabilities.
We will continue to expand the available examples in the Phi Cookbook, aiming to make it your go-to guide for Phi-4. Stay tuned for more updates!
Resources
- Microsoft Phi Cookbook https://aka.ms/Phicookbook
- Microsoft Phi-4-multimodal Tech Report https://aka.ms/phi-4-multimodal/techreport
- Microsoft Phi-4 Paper https://arxiv.org/abs/2412.08905