phi
3 Topics

Introducing Phi-4-Reasoning-Vision to Microsoft Foundry
Vision reasoning models unlock a critical capability for developers: the ability to move beyond passive perception toward systems that can understand, reason over, and act on visual information. Instead of treating images, diagrams, documents, or UI screens as unstructured inputs, vision reasoning models enable developers to build applications that can interpret visual structure, connect it with textual context, and perform multi-step reasoning to reach actionable conclusions.

Today, we are excited to announce that Phi-4-Reasoning-Vision-15B is available in Microsoft Foundry and on Hugging Face. This model brings high-fidelity vision to the reasoning-focused Phi-4 family, extending small language models (SLMs) beyond perception into structured, multi-step visual reasoning for agents, analytical tools, and scientific workflows.

What's new?

The Phi model family has advanced toward combining efficient visual understanding with strong reasoning in small language models. Earlier Phi-4 models demonstrated reliable perception and grounding across images and text, while later iterations introduced structured reasoning to improve performance on complex tasks. Phi-4-reasoning-vision-15B brings these threads together, pairing high-resolution visual perception with selective, task-aware reasoning. As a result, the model can reason deeply when needed while remaining fast and efficient for perception-focused scenarios, making it well suited for interactive, real-world applications.

Key capabilities

- Reasoning behavior is explicitly enabled via prompting: developers can enable or disable reasoning to balance latency and accuracy at runtime.
- Optimized for vision reasoning, with uses that include:
  - Diagram-based math
  - Document, chart, and table understanding
  - GUI interpretation and grounding for agent scenarios to interpret screens and actions
  - Computer-use agent scenarios
  - General image chat and question answering

Benchmarks

The following results summarize Phi-4-reasoning-vision-15B performance across a set of established multimodal reasoning, mathematics, and computer use benchmarks. All results come from internal evaluations.

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force no think) | Phi-4-mm-instruct | Kimi-VL-A3B-Instruct | gemma-3-12b-it | Qwen3-VL-8B-Instruct-4K | Qwen3-VL-8B-Instruct-32K | Qwen3-VL-32B-Instruct-4K | Qwen3-VL-32B-Instruct-32K |
|---|---|---|---|---|---|---|---|---|---|
| AI2D_TEST | 84.8 | 84.7 | 68.6 | 84.6 | 80.4 | 82.7 | 83 | 84.8 | 85 |
| ChartQA_TEST | 83.3 | 76.5 | 23.5 | 87 | 39 | 83.1 | 83.2 | 84.3 | 84 |
| HallusionBench | 64.4 | 63.1 | 56 | 65.2 | 65.3 | 73.5 | 74.1 | 74.4 | 74.9 |
| MathVerse_MINI | 44.9 | 43.8 | 32.4 | 41.7 | 29.8 | 54.5 | 57.4 | 64.2 | 64.2 |
| MathVision_MINI | 36.2 | 34.2 | 20 | 28.3 | 31.9 | 45.7 | 50 | 54.3 | 60.5 |
| MathVista_MINI | 75.2 | 68.7 | 50.5 | 67.1 | 57.4 | 77.1 | 76.4 | 82.5 | 81.8 |
| MMMU_VAL | 54.3 | 52 | 42.3 | 52 | 50 | 60.7 | 64.6 | 68.6 | 70.6 |
| MMStar | 64.5 | 63.3 | 45.9 | 60 | 59.4 | 68.9 | 69.9 | 73.7 | 74.3 |
| OCRBench | 76 | 75.6 | 62.6 | 86.5 | 75.3 | 89.2 | 90 | 88.5 | 88.5 |
| ScreenSpot_v2 | 88.2 | 88.3 | 28.5 | 89.8 | 3.5 | 91.5 | 91.5 | 93.7 | 93.9 |

Table 1: Accuracy comparisons relative to popular open-weight, non-thinking models

| Benchmark | Phi-4-reasoning-vision-15B | Phi-4-reasoning-vision-15B (force thinking) | Kimi-VL-A3B-Thinking | gemma-3-12b-it | Qwen3-VL-8B-Thinking-4K | Qwen3-VL-8B-Thinking-40K | Qwen3-VL-32B-Thinking-4K | Qwen3-VL-32B-Thinking-40K |
|---|---|---|---|---|---|---|---|---|
| AI2D_TEST | 84.8 | 79.7 | 81.2 | 80.4 | 83.5 | 83.9 | 86.9 | 87.2 |
| ChartQA_TEST | 83.3 | 82.9 | 73.3 | 39 | 78 | 78.6 | 78.5 | 79.1 |
| HallusionBench | 64.4 | 63.9 | 70.6 | 65.3 | 71.6 | 73 | 76.4 | 76.6 |
| MathVerse_MINI | 44.9 | 53.1 | 61 | 29.8 | 67.3 | 73.3 | 78.3 | 78.2 |
| MathVision_MINI | 36.2 | 36.2 | 50.3 | 31.9 | 43.1 | 50.7 | 60.9 | 58.6 |
| MathVista_MINI | 75.2 | 74.1 | 78.6 | 57.4 | 77.7 | 79.5 | 83.9 | 83.8 |
| MMMU_VAL | 54.3 | 55 | 60.2 | 50 | 59.3 | 65.3 | 72 | 72.2 |
| MMStar | 64.5 | 63.9 | 69.6 | 59.4 | 69.3 | 72.3 | 75.5 | 75.7 |
| OCRBench | 76 | 73.7 | 79.9 | 75.3 | 81.2 | 82 | 83.7 | 85 |
| ScreenSpot_v2 | 88.2 | 88.1 | 81.8 | 3.5 | 93.3 | 92.7 | 83.1 | 83.1 |

Table 2: Accuracy comparisons relative to popular open-weight, thinking models

All results were obtained using a consistent evaluation setup and prompts across models; numbers are provided for comparison and analysis rather than as leaderboard claims. For more information regarding benchmarks and evaluations, please read the technical paper on the Microsoft Research hub.

Suggested use cases and applications

Phi-4-Reasoning-Vision-15B supports applications that require both high-fidelity visual perception and structured inference. Two representative scenarios include scientific and mathematical reasoning over visual inputs, and computer-using agents (CUAs) that operate directly on graphical user interfaces. In both cases, the model provides grounded visual understanding paired with controllable, low-latency reasoning suitable for interactive systems.

Computer use agents in retail scenarios

For computer use agents, Phi-4-Reasoning-Vision-15B provides the perception and grounding layer required to understand and act within live ecommerce interfaces. For example, in an online shopping experience, the model interprets screen content (products, prices, filters, promotions, buttons, and cart state) and produces grounded observations that agentic models like Fara-7B can use to select actions. Its compact size and low-latency inference make it well suited for CUA workflows and agentic applications.

Visual reasoning for education

Another practical use of visual reasoning models is education. A developer could build a K-12 tutoring app with Phi-4-Reasoning-Vision-15B where students upload photos of worksheets, charts, or diagrams to get guided help rather than answers. The model can understand the visual content, identify where the student went wrong, and explain the correct steps clearly.
Over time, the app can adapt by serving new examples matched to the student's learning level, turning visual problem-solving into a personalized learning experience.

Microsoft Responsible AI principles

At Microsoft, our mission to empower people and organizations remains constant, especially in the age of AI, where the potential for human achievement is greater than ever. We recognize that trust is foundational to AI adoption, and earning that trust requires a commitment to transparency, safety, and accountability. As with other Phi models, Phi-4-Reasoning-Vision-15B was developed with safety as a core consideration throughout training and evaluation. The model was trained on a mixture of public safety datasets and internally generated examples designed to elicit behaviors the model should appropriately refuse, in alignment with Microsoft's Responsible AI Principles. These safety-focused training signals help the model recognize and decline requests that fall outside intended or acceptable use. Additional details on the model's safety considerations, evaluation approach, and known limitations are provided in the accompanying technical blog and model card.

Getting started

Start using Phi-4-Reasoning-Vision-15B in Microsoft Foundry today. Microsoft Foundry provides a unified environment for model discovery, evaluation, and deployment, making it straightforward to move from initial experimentation to production use while applying appropriate safety and governance practices.

- Deploy the new model on Microsoft Foundry.
- Learn more about the Phi family on Foundry Labs and in the Phi Cookbook.
- Connect to the Microsoft Developer Community on Discord.
- Read the technical paper on Microsoft Research.
- Read more use cases on the Educators Developer blog.

Transforming Android Development: Unveiling MediaTek's latest chipset with Microsoft's Phi models
Imagine running advanced AI applications, like intelligent copilots and Retrieval-Augmented Generation (RAG), directly on Android devices, completely offline. With the rapid evolution of Neural Processing Units (NPUs), this is no longer a future vision; it's happening now.

Optimized AI at the Edge: Phi-4-mini on MediaTek

Thanks to MediaTek's conversion and quantization tools, Microsoft's Phi-4-mini and Phi-4-mini-reasoning models are now optimized for MediaTek NPUs. This collaboration empowers developers to build fast, responsive, and privacy-preserving AI experiences on Android without needing cloud connectivity. MediaTek's flagship Dimensity 9400 and 9400+ platforms, with Dimensity GenAI Toolkit 2.0, deliver excellent performance with the Phi-4-mini (3.8B) model: prefill speed exceeds 800 tokens/sec and decode speed exceeds 21 tokens/sec.

Unlock Enhanced Performance: Introducing MediaTek's NeuroPilot SDK

The MediaTek NeuroPilot SDK is a robust software development toolkit designed to accelerate AI application development and deployment across MediaTek's hardware ecosystem. It provides developers with advanced optimization tools and cross-platform compatibility, enabling efficient implementation of neural networks while balancing performance, power efficiency, and resource utilization.

Comprehensive toolchain and documentation support

The NeuroPilot platform offers a complete toolchain, including SDKs, APIs, and documentation, for model quantization/conversion, compilation, and integration. Developers can leverage these tools to optimize neural networks, significantly improving on-device performance while reducing power consumption and memory usage. MediaTek's Dimensity GenAI Toolkit 2.0 now supports the Phi-4 series and provides best practices. Users can convert and quantize Phi-4-mini models in just a few steps, enabling seamless deployment on Dimensity series platforms.
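To put the throughput figures quoted earlier in this post (prefill above 800 tokens/sec, decode above 21 tokens/sec for Phi-4-mini on Dimensity 9400 and 9400+) into perspective, here is a rough back-of-the-envelope latency estimate. It is only a sketch: it assumes both rates stay constant for the whole request, and the prompt and output token counts are hypothetical values chosen for illustration, not measurements from MediaTek or Microsoft.

```python
# Rough latency estimate from the quoted Phi-4-mini throughput figures.
# The rates are the lower bounds stated in the post; treating them as
# constant is a simplifying assumption, so the results are upper bounds.

PREFILL_TOKENS_PER_SEC = 800.0  # quoted as ">800 tokens/sec"
DECODE_TOKENS_PER_SEC = 21.0    # quoted as ">21 tokens/sec"


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> dict:
    """Estimate prefill, decode, and total time for a single request."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_s = prompt_tokens / PREFILL_TOKENS_PER_SEC
    decode_s = output_tokens / DECODE_TOKENS_PER_SEC
    return {
        "prefill_s": prefill_s,
        "decode_s": decode_s,
        "total_s": prefill_s + decode_s,
    }


if __name__ == "__main__":
    # Hypothetical request: a 1,600-token prompt and a 210-token answer.
    estimate = estimate_latency_seconds(prompt_tokens=1600, output_tokens=210)
    for name, seconds in estimate.items():
        print(f"{name}: {seconds:.1f}")
```

For this hypothetical request the estimate is at most about 2 seconds of prefill and 10 seconds of decode. Decode dominates, which suggests that capping the answer length matters more for perceived responsiveness than trimming the prompt.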
A key advantage is that developers do not require specialized hardware expertise to rapidly prototype and deploy customized AI solutions.

One-time coding, cross-platform deployment

The MediaTek NeuroPilot SDK supports all AI-capable MediaTek hardware, empowering developers to adopt a "code once, deploy everywhere" strategy across smartphones, tablets, automotive, smart home devices, IoT products, and future platforms. This aligns with MediaTek's corporate philosophy of bringing AI to everyone. This unified approach streamlines development, reduces costs, and accelerates time-to-market. The SDK integrates with Android and Linux ecosystems, providing complete compiler suites, analyzers, and application libraries to ensure compatibility and optimize performance.

Demo 1: Deploying Phi-4-mini-reasoning with NeuroPilot SDK

In this demo, developers are shown how to use the NeuroPilot SDK to deploy the Phi-4-mini-reasoning model on edge devices. The SDK enables efficient conversion and optimization, making it possible to bring advanced reasoning capabilities to smartphones and other local hardware. The Phi-4-mini-reasoning model brings logical and problem-solving capabilities to the edge. With MediaTek's advanced conversion tools, this new model can be transformed for MediaTek's DLA, enabling a new class of intelligent applications on mobile devices. Bringing reasoning capabilities to the edge allows developers to build faster, more responsive AI experiences without relying on cloud access.

Demo 2: Deploying Phi-4-mini with NeuroPilot SDK

This video demonstrates how to convert and run the Phi-4-mini model using the NeuroPilot SDK. With a focus on instruction-following tasks, this deployment empowers developers to build responsive, embedded AI assistants that understand and execute user commands locally. Whether it's productivity tools or context-sensitive automation, Phi-4-mini brings natural interaction and reliability directly to the device.
Imagine the possibilities: Real world scenarios

Intelligent information access with on-device RAG

Picture this: your application intelligently accesses and reasons over on-device documents, like PDFs or internal knowledge bases, using an advanced embedding model paired with the MediaTek-optimized Phi-4-mini. This enables developers to create:

- Personalized Assistants: Apps that understand user context from their own documents.
- Offline Knowledge Hubs: Providing instant access to relevant information without needing cloud connectivity.
- Enhanced Productivity Tools: Smart summarization, Q&A, and content generation based on local data.

Demo 3: Private RAG chatbot on device

People are on their mobile devices every day: saving new documents, sending messages, taking notes, and more. With how much we're able to store on our phones and laptops, it can get hard to find specific files or pieces of information when we need them most. What if you could implement a personal assistant that understands your question and fetches exactly what you're looking for, without you needing to dig through your device? This demo showcases a Retrieval-Augmented Generation (RAG) implementation of the Phi model embedded directly on a smartphone. The chatbot allows users to ask natural language questions and instantly retrieve relevant information from local files. Because the model runs on-device, there's no need for a cloud connection, so your data stays private while you still get intelligent, context-aware results. With a RAG-based Phi-4-mini solution, a search of your device parses through every document to help you find the exact one you are looking for.

Stay ahead of the curve

If you're eager to explore the Phi-4 family of models on edge devices and master building next-gen apps with MediaTek's powerful NPU, don't miss the key sessions at Microsoft Build and Computex Taipei happening this week. This is your chance to get direct insights from the experts.
- Microsoft Build 2025: Uncover the latest on Azure AI Foundry on May 20 during “Unveiling Latest Innovations in Azure AI Foundry Model Catalog”.
- If you are attending in person on May 20, catch the second lab, “Fine-Tune End-to-End Distillation Models with Azure AI Foundry Models”.
- Learn about Phi on Windows devices on May 20 in “Enable seamless deployment across Intel Copilot+ AI PCs and Azure”.
- Computex 2025: MediaTek Booth (M0806) on May 20-23. See MediaTek's AI vision and hardware innovations firsthand.

Resources

- Explore the Phi-4 Model Family on Azure AI Foundry and Hugging Face.
- Get access to the Phi Cookbook: your practical guide and code repository for building with Phi models.
- Learn more about MediaTek NeuroPilot.
- Connect with the MediaTek Developer Application.

Capacity's AI Answer Engine® leveraged Phi to deliver better results for their customers, faster
Capacity, an all-in-one Support Automation Platform, provides organizations with the ultimate Answer Engine®. They needed a way to help unify diverse datasets across tens of millions of search results and billions of interactions and make information more easily accessible and understandable for their customers. By leveraging Phi, Microsoft's family of powerful small language models offering groundbreaking performance at low cost and low latency, Capacity provides the enterprise with an effective AI knowledge management solution that democratizes knowledge on large teams securely and in a way that maximizes value to the customer. With Phi, Capacity's Answer Engine® improved results quality and scale, so customers save both time and money by more quickly finding the rich information they invested in to do their best work.

What was the challenge?

Enterprise employees struggle to find the data they need when searching through isolated, untagged content, leading to frustration and wasted time. To address this, Capacity's Answer Engine® retrieves information across diverse enterprise systems, repositories, and sources, instantly delivering the exact answers needed to inform work and make faster decisions. At the same time, AI can only go so far to unify and enrich this data. Capacity addressed the challenge by leveraging Phi through the Azure Serverless API to experiment with the effectiveness of a language-model-based tagging infrastructure. They applied prompt engineering, adherence workflows, and at-scale testing to better prepare Answers for search and create a more universal Answer Engine®.

Why did Capacity choose Phi?

Capacity chose Phi-3.5-mini for its speed, cost-effectiveness, and deployment flexibility. With Azure Models as a Service (MaaS), Capacity was able to use the Phi family models without having to provision GPUs or manage back-end operations, saving their team time, effort, and cost.
They used prompt engineering and metadata tagging to optimize search results, ultimately improving development speed and query processing efficiency. Additionally, the favorable MIT open-source licensing of the Phi family provided a strong long-term strategy for their private cloud deployment, vectorization, and query routing activities.

"From our initial experiments, what truly impressed us about Phi was its remarkable accuracy and the ease of deployment, even before customization. Since then, we've been able to enhance both accuracy and reliability, all while maintaining the cost-effectiveness and scalability we valued from the start."
~ Steve Frederickson, Head of Product, Answer Engine

How did they solve for it?

To achieve their goal, Capacity implemented Phi-3-mini and Phi-3.5-mini through Models as a Service, using both the 4K and 128K variants with some prompt engineering. This allowed them to accelerate development on their AI-powered Answer Engine and help their enterprise customers deliver the right information to their end users quickly and accurately. When presenting an Answer to their customer's end user, Capacity wanted their AI Answer Engine to instantly present the full Answer along with all the content metadata around it, so the end user could feel confident in their search results.

To accomplish this, Capacity engineers split the tasks for Phi into preprocessing and real-time flows. In preprocessing, they generated metadata such as title summaries for answers, keyword tags for search, and other information for the index. This pre-work was done offline and ahead of time. Depending on the tagging task required for each Answer, they calculated the needed token size and then routed the request to the appropriate Phi model. At query time, Phi models pre-process the query to retrieve the most relevant content.
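The token-based routing described above could be sketched roughly as follows. This is a minimal illustration, not Capacity's actual implementation: the deployment names, the characters-per-token heuristic, the output budget, and the exact context limits (taken here as 4,096 and 131,072 tokens for the 4K and 128K variants) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    name: str            # hypothetical deployment name
    context_tokens: int  # assumed context window size


# Ordered from smallest to largest context so the cheapest fit is chosen first.
VARIANTS = (
    ModelVariant("phi-mini-4k", 4_096),
    ModelVariant("phi-mini-128k", 131_072),
)


def estimate_tokens(text: str) -> int:
    """Crude estimate of about 4 characters per token (an assumption).

    A real system would count tokens with the model's own tokenizer.
    """
    return max(1, len(text) // 4)


def route(prompt: str, max_output_tokens: int = 256) -> ModelVariant:
    """Pick the smallest variant whose context fits prompt plus output budget."""
    needed = estimate_tokens(prompt) + max_output_tokens
    for variant in VARIANTS:
        if needed <= variant.context_tokens:
            return variant
    raise ValueError(f"request needs ~{needed} tokens; no variant is large enough")


if __name__ == "__main__":
    short_task = "Generate three keyword tags for this Answer: reset your password."
    long_task = "Summarize this document: " + "lorem ipsum " * 5_000
    print(route(short_task).name)
    print(route(long_task).name)
```

Choosing the smallest variant that fits keeps short tagging tasks on the small-context model and reserves the long-context model for Answers that genuinely need it, which is one plausible way to get the cost and latency balance the case study describes.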
The split tasks for Phi enabled repeatable performance, keeping the responsive query times users expect while enhancing results with new functionality and increased retrieval relevance. At the same time, Phi produced the same or better qualitative results for preprocessing at a 4.2x cost savings compared to the competing workflow. The considerable cost savings on preprocessing allow Capacity to scale to ever-growing datasets, while the increased retrieval relevance fosters sustained growth and enhances user satisfaction.

After integrating Phi, Capacity observed significant improvements in both performance and customer satisfaction. The AI-powered solutions provided faster and more accurate information retrieval, which reduced the time users spent searching for information. Additionally, the seamless integration of datasets with the Phi-3.5-mini model as a service empowered Capacity to address a wide range of use cases with enhanced efficacy, ultimately elevating the user experience. Steve Frederickson, Capacity's Head of Product, Answer Engine, noted, “Integrating our datasets with the Phi-3.5-mini model was effortless. We have found new opportunity in its speed, and the enriched customer experience of GenAI enables us to resolve customer issues more effectively, delivering a superior user experience.”

Capacity also shared some valuable tips for other organizations looking to implement similar AI solutions. They recommended designing the system to optimize for query performance and retrieval accuracy, including adding metadata and keyword tags to optimize search efficiency. They also emphasized the importance of choosing the right AI model based on capability and scalability, to balance speed and cost-effectiveness.

The next step

Implementing Phi has revolutionized Capacity's approach to knowledge management, providing their enterprise customers with efficient and accurate information retrieval solutions.
Their success highlights the potential of the Phi model family to transform enterprise operations and improve user experiences. Looking ahead, Capacity plans to explore additional state-of-the-art models such as Phi-4-multimodal and Phi-4-mini for more complex reasoning tasks like multilingual support and image understanding scenarios. They also aim to fine-tune their solutions to enhance their knowledge graph and improve interoperability among different institutional knowledge bases. By continuously innovating and leveraging advanced AI technology, the Capacity Answer Engine® is well-positioned to remain at the forefront of knowledge management solutions, helping organizations do their best work amid the complexities of information retrieval and discovery.

Learn more about the Phi family of models here:

- About Phi
- Learn about the latest updates
- Download the models