phi-3
7 Topics

Introducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions.

Today, we are excited to announce that Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and on Hugging Face. This model brings high-fidelity vision to the reasoning-focused Phi-4 family, extending small language models (SLMs) beyond perception into structured, multi-step visual reasoning for agents, analytical tools, and scientific workflows.

What’s new?
The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi-4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi-4-Reasoning-Vision-15B brings these threads together, pairing high-resolution visual perception with selective, task-aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception-focused scenarios, which makes it well suited for interactive, real-world applications.

Key capabilities
- Reasoning behavior is explicitly enabled via prompting: developers can enable or disable reasoning to balance latency and accuracy at runtime.
- Optimized for vision reasoning and can be used for:
  - diagram-based math
  - document, chart, and table understanding
  - GUI interpretation and grounding for agent scenarios to interpret screens and actions
  - computer-use agent scenarios
  - general image chat and question answering

Benchmarks
The following results summarize Phi-4-reasoning-vision-15B performance across a set of established multimodal reasoning, mathematics, and computer use benchmarks. All results come from internal evaluations.

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force no think) | Phi-4-mm-instruct | Kimi-VL-A3B-Instruct | gemma-3-12b-it | Qwen3-VL-8B-Instruct-4K | Qwen3-VL-8B-Instruct-32K | Qwen3-VL-32B-Instruct-4K | Qwen3-VL-32B-Instruct-32K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI2D_TEST | 84.8 | 84.7 | 68.6 | 84.6 | 80.4 | 82.7 | 83 | 84.8 | 85 |
| ChartQA_TEST | 83.3 | 76.5 | 23.5 | 87 | 39 | 83.1 | 83.2 | 84.3 | 84 |
| HallusionBench | 64.4 | 63.1 | 56 | 65.2 | 65.3 | 73.5 | 74.1 | 74.4 | 74.9 |
| MathVerse_MINI | 44.9 | 43.8 | 32.4 | 41.7 | 29.8 | 54.5 | 57.4 | 64.2 | 64.2 |
| MathVision_MINI | 36.2 | 34.2 | 20 | 28.3 | 31.9 | 45.7 | 50 | 54.3 | 60.5 |
| MathVista_MINI | 75.2 | 68.7 | 50.5 | 67.1 | 57.4 | 77.1 | 76.4 | 82.5 | 81.8 |
| MMMU_VAL | 54.3 | 52 | 42.3 | 52 | 50 | 60.7 | 64.6 | 68.6 | 70.6 |
| MMStar | 64.5 | 63.3 | 45.9 | 60 | 59.4 | 68.9 | 69.9 | 73.7 | 74.3 |
| OCRBench | 76 | 75.6 | 62.6 | 86.5 | 75.3 | 89.2 | 90 | 88.5 | 88.5 |
| ScreenSpot_v2 | 88.2 | 88.3 | 28.5 | 89.8 | 3.5 | 91.5 | 91.5 | 93.7 | 93.9 |

Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force thinking) | Kimi-VL-A3B-Thinking | gemma-3-12b-it | Qwen3-VL-8B-Thinking-4K | Qwen3-VL-8B-Thinking-40K | Qwen3-VL-32B-Thinking-4K | Qwen3-VL-32B-Thinking-40K |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AI2D_TEST | 84.8 | 79.7 | 81.2 | 80.4 | 83.5 | 83.9 | 86.9 | 87.2 |
| ChartQA_TEST | 83.3 | 82.9 | 73.3 | 39 | 78 | 78.6 | 78.5 | 79.1 |
| HallusionBench | 64.4 | 63.9 | 70.6 | 65.3 | 71.6 | 73 | 76.4 | 76.6 |
| MathVerse_MINI | 44.9 | 53.1 | 61 | 29.8 | 67.3 | 73.3 | 78.3 | 78.2 |
| MathVision_MINI | 36.2 | 36.2 | 50.3 | 31.9 | 43.1 | 50.7 | 60.9 | 58.6 |
| MathVista_MINI | 75.2 | 74.1 | 78.6 | 57.4 | 77.7 | 79.5 | 83.9 | 83.8 |
| MMMU_VAL | 54.3 | 55 | 60.2 | 50 | 59.3 | 65.3 | 72 | 72.2 |
| MMStar | 64.5 | 63.9 | 69.6 | 59.4 | 69.3 | 72.3 | 75.5 | 75.7 |
| OCRBench | 76 | 73.7 | 79.9 | 75.3 | 81.2 | 82 | 83.7 | 85 |
| ScreenSpot_v2 | 88.2 | 88.1 | 81.8 | 3.5 | 93.3 | 92.7 | 83.1 | 83.1 |

Table 2: Accuracy comparisons relative to popular open-weight, thinking models

All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub.

Suggested use cases and applications
Phi-4-Reasoning-Vision-15B supports applications that require both high-fidelity visual perception and structured inference. Two representative scenarios include scientific and mathematical reasoning over visual inputs, and computer-using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low-latency reasoning suitable for interactive systems.

Computer use agents in retail scenarios
For computer use agents, Phi-4-Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content (products, prices, filters, promotions, buttons, and cart state) and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low-latency inference make it well suited for CUA workflows and agentic applications.

Visual reasoning for education
Another practical use of visual reasoning models is education. A developer could build a K-12 tutoring app with Phi-4-Reasoning-Vision-15B where students upload photos of worksheets, charts, or diagrams to get guided help rather than answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly.
Over time, the app can adapt by serving new examples matched to the student’s learning level, turning visual problem-solving into a personalized learning experience.

Microsoft Responsible AI principles
At Microsoft, our mission to empower people and organizations remains constant, especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft’s Responsible AI Principles. These safety-focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model’s safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card.

Getting started
Start using Phi-4-Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices.
- Deploy the new model on Microsoft Foundry
- Learn more about the Phi family on Foundry Labs and in the Phi Cookbook
- Connect to the Microsoft Developer Community on Discord
- Read the technical paper on Microsoft Research
- Read more use cases on the Educators Developer blog

Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning
Today we are introducing Phi-4, our 14B parameter state-of-the-art small language model (SLM) that excels at complex reasoning in areas such as math, in addition to conventional language processing. Phi-4 is the latest member of our Phi family of small language models and demonstrates what’s possible as we continue to probe the boundaries of SLMs. Phi-4 is available on Azure AI Foundry and on Hugging Face.

Phi-4 Benchmarks
Phi-4 outperforms comparable and larger models on math-related reasoning due to advancements throughout the processes, including the use of high-quality synthetic datasets, curation of high-quality organic data, and post-training innovations. Phi-4 continues to push the frontier of size vs quality. Phi-4 is particularly good at math problems; for example, here are the benchmarks for Phi-4 on math competition problems:

[Figure: Phi-4 performance on math competition problems]

To see more benchmarks, read the newest technical paper released on arXiv.

Enabling AI innovation safely and responsibly
Building AI solutions responsibly is at the core of AI development at Microsoft. We have made our robust responsible AI capabilities available to customers building with Phi models, including Phi-3.5-mini optimized for Windows Copilot+ PCs. Azure AI Foundry provides users with a robust set of capabilities to help organizations measure, mitigate, and manage AI risks across the AI development lifecycle for traditional machine learning and generative AI applications. Azure AI evaluations in AI Foundry enable developers to iteratively assess the quality and safety of models and applications using built-in and custom metrics to inform mitigations. Additionally, Phi users can use Azure AI Content Safety features such as prompt shields, protected material detection, and groundedness detection.
These capabilities can be leveraged as content filters with any language model included in our model catalog, and developers can integrate these capabilities into their application easily through a single API. Once in production, developers can monitor their application for quality and safety, adversarial prompt attacks, and data integrity, making timely interventions with the help of real-time alerts.

Phi-4 in action
One example of the mathematical reasoning Phi-4 is capable of is demonstrated in this problem.

Start Exploring
Phi-4 is currently available on Azure AI Foundry and Hugging Face; take a look today.

Capacity's AI Answer Engine® leveraged Phi to deliver better results for their customers, faster
Capacity, an all-in-one Support Automation Platform, provides organizations with the ultimate Answer Engine®. They needed a way to help unify diverse datasets across tens of millions of search results and billions of interactions and make information more easily accessible and understandable for their customers. By leveraging Phi, Microsoft’s family of powerful small language models offering groundbreaking performance at low cost and low latency, Capacity provides the enterprise with an effective AI knowledge management solution that democratizes knowledge across large teams securely and in a way that maximizes value to the customer. With Phi, Capacity’s Answer Engine® improved results quality and scale, so customers save both time and money by more quickly finding the rich information they invested in to do their best work.

What was the challenge?
Enterprise employees struggle to find the data they need when searching through isolated, untagged content, leading to frustration and wasted time. To address this, Capacity’s Answer Engine® retrieves information across diverse enterprise systems, repositories, and sources, instantly delivering the exact answers needed to inform work and make faster decisions. At the same time, AI can only go so far to unify and enrich this data. Capacity addressed the challenge by leveraging Phi through the Azure Serverless API to experiment with the effectiveness of a language-model-based tagging infrastructure. They applied prompt engineering, adherence workflows, and at-scale testing to better prepare Answers for search and create a more universal Answer Engine®.

Why did Capacity choose Phi?
Capacity chose Phi-3.5-mini for its speed, cost-effectiveness, and deployment flexibility. With Azure Models as a Service (MaaS), Capacity was able to use the Phi family models without having to provision GPUs or manage back-end operations, saving their team time, effort, and cost.
They used prompt engineering and metadata tagging to optimize search results, ultimately improving development speed and query processing efficiency. Additionally, the favorable MIT open-source licensing of the Phi family provided a strong long-term strategy for their private cloud deployment, vectorization, and query routing activities.

"From our initial experiments, what truly impressed us about Phi was its remarkable accuracy and the ease of deployment, even before customization. Since then, we've been able to enhance both accuracy and reliability, all while maintaining the cost-effectiveness and scalability we valued from the start."
~ Steve Frederickson, Head of Product, Answer Engine

How did they solve for it?
To achieve their goal, Capacity implemented Phi-3-mini and Phi-3.5-mini through Models as a Service, using both 4K and 128K context variants with some prompt engineering. This allowed them to accelerate development on their AI-powered Answer Engine and help their enterprise customers deliver the right information to their end users quickly and accurately.

When presenting an Answer to their customer’s end user, Capacity wanted their AI Answer Engine to instantly present the full Answer along with all the content metadata around it, so the end user could feel confident in their search results. To accomplish this, Capacity engineers split the tasks for Phi into preprocessing and real-time flows. In preprocessing, they generated metadata such as title summaries for answers, keyword tags for search, and other information for the index. This pre-work was done offline and ahead of time. Depending on the tagging task required for each Answer, they calculated the needed token size and then routed the query to the appropriate Phi model. At query time, Phi models pre-process the query to retrieve the most relevant content.
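The token-size routing described above can be sketched roughly as follows. This is a minimal illustration, not Capacity's implementation: the deployment names (`phi-mini-4k`, `phi-mini-128k`), the context limits, the four-characters-per-token estimate, and the reserved output budget are all assumptions made for the example. A real system would count tokens with the model's own tokenizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    """A deployed model and the context window it accepts."""
    name: str
    context_tokens: int


# Hypothetical deployment names and assumed context limits.
SHORT_CONTEXT = ModelVariant("phi-mini-4k", 4_096)
LONG_CONTEXT = ModelVariant("phi-mini-128k", 131_072)


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about four characters per token.

    This heuristic is an assumption; use the model's tokenizer for real counts.
    """
    return max(1, len(text) // 4)


def choose_variant(prompt: str, document: str,
                   reserved_output_tokens: int = 256) -> ModelVariant:
    """Pick the smallest variant whose context fits prompt, document, and output."""
    needed = (estimate_tokens(prompt)
              + estimate_tokens(document)
              + reserved_output_tokens)
    if needed <= SHORT_CONTEXT.context_tokens:
        return SHORT_CONTEXT
    if needed <= LONG_CONTEXT.context_tokens:
        return LONG_CONTEXT
    raise ValueError(f"Input needs about {needed} tokens; split the document first.")


if __name__ == "__main__":
    tagging_prompt = "Generate keyword tags for the following answer."
    for doc in ("A short FAQ entry.", "x" * 40_000):
        variant = choose_variant(tagging_prompt, doc)
        print(len(doc), "chars ->", variant.name)
```

Preferring the smaller variant whenever the input fits keeps the cheaper, faster path as the default and reserves the long-context variant for the Answers that actually need it.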
The split tasks for Phi enabled repeatable performance, keeping the responsive query times users expect while enhancing results with new functionality and increased retrieval relevance. At the same time, Phi produced the same or better qualitative results for preprocessing at a 4.2x cost savings compared to the competing workflow. The considerable cost savings on preprocessing allow Capacity to scale to ever-growing datasets, while the increased retrieval relevance fosters sustained growth and enhances user satisfaction.

After integrating Phi, Capacity observed significant improvements in both performance and customer satisfaction. The AI-powered solutions provided faster and more accurate information retrieval, which reduced the time users spent searching for information. Additionally, the seamless integration of datasets with the Phi-3.5-mini model as a service significantly empowered Capacity to address a wide range of use cases with enhanced efficacy, ultimately elevating the user experience.

Steve Frederickson, Capacity's Head of Product, Answer Engine, noted, "Integrating our datasets with the Phi-3.5-mini model was effortless. We have found new opportunity in its speed, and the enriched customer experience of GenAI enables us to resolve customer issues more effectively, delivering a superior user experience."

Capacity also shared some valuable tips for other organizations looking to implement similar AI solutions. They recommended designing the system to optimize for query performance and retrieval accuracy, including adding metadata and keyword tags to optimize search efficiency. They also emphasized the importance of choosing the right AI model based on capability and scalability, to balance speed and cost-effectiveness.

The next step
Implementing Phi has revolutionized Capacity’s approach to knowledge management, providing their enterprise customers with efficient and accurate information retrieval solutions.
Their success highlights the potential of the Phi model family to transform enterprise operations and improve user experiences. Looking ahead, Capacity plans to explore additional state-of-the-art models such as Phi-4-multimodal and Phi-4-mini for more complex reasoning tasks like multilingual support and image understanding scenarios. They also aim to fine-tune their solutions to enhance their knowledge graph and improve interoperability among different institutional knowledge bases. By continuously innovating and leveraging advanced AI technology, the Capacity Answer Engine® is well-positioned to remain at the forefront of knowledge management solutions, helping organizations do their best work with the complexities of information retrieval and discovery.

Learn more about the Phi family of models here:
- About Phi
- Learn about the latest updates
- Download the models

New Hugging Face Models on Azure AI: Phi-3 Variants from the Community
The Azure AI Model Catalog offers over 1.78K models, including foundation models from core partners and nearly 1.6K open-source models from the Hugging Face community. Read this post to learn about the latest models added to the Hugging Face Collection this month!