Fine Tuning
Phi-4: Small Language Models That Pack a Punch
What Are Small Language Models, and Why Should You Care?

If you've been following AI development, you probably remember "bigger is better" being the mantra for years. GPT-3 weighed in at 175 billion parameters, GPT-4 is larger still, and everyone seemed to be in an arms race to build the biggest model possible. But here's the thing: bigger models are expensive to run, slow to respond, and often overkill for what you actually need.

Small Language Models (SLMs) flip this script. These are models with fewer parameters (typically 1-15 billion) that are trained thoughtfully on high-quality data. The result is models that can run on your laptop, respond almost instantly, and still handle complex reasoning tasks. That translates directly into speed, privacy, and cost-effectiveness.

Microsoft has been exploring this space for a while. It started with Phi-1, which showed that small models trained on carefully curated "textbook-like" data could punch way above their weight class. Then came Phi-2 and Phi-3, each iteration getting better at reasoning and problem-solving.

Now we have Phi-4, and it's honestly impressive. At 14 billion parameters, it outperforms models five times its size on math and reasoning tasks. Microsoft trained it on 9.8 trillion tokens over three weeks, using a mix of synthetic data (generated by larger models like GPT-4o) and high-quality web content. The key innovation isn't just throwing more data at the model: the team was incredibly selective about what to include, focusing on teaching reasoning patterns rather than memorizing facts.

The Phi family has also expanded recently. There's Phi-4-mini at 3.8 billion parameters for even lighter deployments, and Phi-4-multimodal at 5.6 billion parameters that can handle text, images, and audio all at once. Pretty cool if you're building something that needs to understand screenshots or transcribe audio.

How Well Does It Actually Perform?

Let's talk numbers, because that's where Phi-4 really shines. On MMLU (a broad test of knowledge across 57 subjects), Phi-4 scores 84.8%. That's better than Phi-3's 77.9% and competitive with models like GPT-4o-mini. On MATH (competition-level math problems), it hits 56.1%, significantly higher than Phi-3's 42.5%. For code generation on HumanEval, it achieves 82.6%.

| Model | Parameters | MMLU | MATH | HumanEval |
|---|---|---|---|---|
| Phi-3-medium | 14B | 77.9% | 42.5% | 62.5% |
| Phi-4 | 14B | 84.8% | 56.1% | 82.6% |
| Llama 3.3 | 70B | 86.0% | ~51% | ~73% |
| GPT-4o-mini | Unknown | ~82% | 52.2% | 87.2% |

Microsoft tested Phi-4 on the November 2024 AMC-10 and AMC-12 math competitions. These are tests that over 150,000 high school students take each year, and the questions appeared after all of Phi-4's training data was collected. Phi-4 beat not just similar-sized models but also much larger ones, which suggests it has actually learned to reason rather than memorize benchmark answers.

The model also does well on GPQA (graduate-level science questions) and even outperforms its teacher model GPT-4o on certain reasoning tasks. That's remarkable for a 14-billion-parameter model.

If you're wondering about practical performance, Phi-4 runs about 2-4x faster than comparable larger models and uses significantly less memory. You can run it on a single GPU, or even on newer AI-capable laptops with NPUs. That makes it practical for real-time applications where latency matters.

Try Phi-4 Yourself

You can start experimenting with Phi-4 right now without any complicated setup.

Azure AI Foundry

Microsoft's Azure AI Foundry is probably the quickest way to get started. Once you're logged in:

1. Go to the Model Catalog and search for "Phi-4"
2. Click "Use this Model"
3. Select an active subscription in the subsequent pop-up and confirm
4. Deploy and start chatting or testing prompts

The playground lets you adjust parameters like temperature and see how the model responds. You can test it on math problems, coding questions, or reasoning tasks without writing any code. There's also a code view that shows you how to integrate the deployed model into your own applications, along the lines of the sketch below.
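If you want to call your deployment from code, here's a minimal sketch using the azure-ai-inference Python package. The endpoint URL, API key, and deployment name below are placeholders, not real values; copy the actual ones from your deployment's overview page, since the endpoint format can vary by deployment type.

```python
# pip install azure-ai-inference
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key: copy the real values from your deployment page.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    model="Phi-4",  # assumed deployment name; use whatever you named yours
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="What's the derivative of x²?"),
    ],
    temperature=0.2,  # low temperature suits math and reasoning tasks
)
print(response.choices[0].message.content)
```

If the playground's code view shows a different snippet for your particular deployment type, prefer that one.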
Hugging Face (for open-source enthusiasts)

If you prefer to work with open-source tools, the model weights are available on Hugging Face. You can run it locally or use their hosted inference API:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/phi-4")

messages = [
    {"role": "user", "content": "What's the derivative of x²?"},
]
print(pipe(messages))
```

Keep in mind that running the full 14-billion-parameter model locally takes serious hardware: in 16-bit precision the weights alone need roughly 28 GB of memory.

Other Options

The Phi Cookbook on GitHub has tons of examples for different use cases like RAG (retrieval-augmented generation), function calling, and multimodal inputs. If you want to run it locally with minimal setup, you can use Ollama (ollama pull phi4) or LM Studio, which provides a nice GUI. The Azure AI Foundry Labs also has experimental features where you can test Phi-4-multimodal with audio and image inputs.

What's Next?

Phi-4 is surprisingly capable for its size, and it's practical enough to run almost anywhere. Whether you're building a chatbot, working on educational software, or just experimenting with AI, it's worth checking out.

We might explore local deployment in more detail later, including how to build multi-agent systems where several SLMs work together, and maybe even look at fine-tuning Phi-4 for specific tasks. But for now, give it a try and see what you can build with it. The model weights are MIT licensed, so you're free to use them commercially. Microsoft's made it pretty easy to get started, so there's really no reason not to experiment.

Resources:

- Azure AI Foundry
- Phi-4 on Hugging Face
- Phi Cookbook
- Phi-4 Technical Report

Model Mondays S2:E5 – Fine Tuning & Distillation with Dave Voutila

This post was generated with AI help and human revision & review. To learn more about our motivation and workflows, please refer to this document on our Model Mondays website.

About Model Mondays

Model Mondays is a weekly series designed to help you build your Azure AI Foundry Model IQ, one week at a time. Here's what to expect:

- 5-Minute Highlights – Quick updates on Azure AI models and tools (Mondays)
- 15-Minute Spotlight – A deeper look at a key model, protocol, or feature (Mondays)
- 30-Minute AMA – Friday Q&A with experts from Monday's episode

Whether you're just starting out or already working with models, this series is your chance to grow with the community. Quick links to explore:

- Register for Model Mondays
- Watch Past Episodes
- Join the AMA on July 18
- Visit the Discussion Forum

Spotlight Topic: Fine Tuning & Distillation

What is this topic and why is it important?

Fine-tuning allows you to adapt a general-purpose, pre-trained model to your specific data or task, boosting accuracy and relevance. Distillation helps you take a large, high-performing model and extract its knowledge into a smaller one. This means you can run AI on smaller devices or scale at lower cost without losing much performance. Together, these techniques are key for customizing and deploying real-world AI solutions effectively.
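To make the distillation idea concrete, here's a minimal sketch of the classic soft-label distillation loss in PyTorch. This illustrates only the underlying technique, not the pipeline Azure AI Foundry's tooling runs for you; the tensor shapes, temperature, and mixing weight are arbitrary illustrative choices.

```python
# Minimal knowledge-distillation sketch (PyTorch), illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft loss (match the teacher's softened output distribution)
    with a hard loss (match the ground-truth labels)."""
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T*T keeps the gradient magnitude comparable to the hard loss.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a batch of 4 examples with a 10-way output.
teacher_logits = torch.randn(4, 10)  # produced by the frozen teacher model
student_logits = torch.randn(4, 10, requires_grad=True)
labels = torch.randint(0, 10, (4,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow into the student only
```

The temperature T is the interesting knob: raising it softens the teacher's distribution so the student also learns how the teacher ranks the wrong answers, which carries far more signal than the single correct label alone.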
Key Takeaway

You don't need to start from scratch! Dave Voutila showed how Azure AI Foundry makes it easy to fine-tune existing models and use distillation techniques without needing deep ML expertise. These tools let you iterate faster, test ideas, and deploy solutions at scale, all with efficiency in mind.

How Can I Get Started?

Here are a few practical links:

- Fine-tune models in Azure OpenAI Foundry
- Distillation Tooling
- Join the community AMA

What's New in Azure AI Foundry?

Here are some of the latest updates:

- Streamlined fine-tuning workflows: making it easier for developers to adapt models without complex setup
- Improved distillation pipelines: helping create compact, high-performing versions of larger models
- More robust documentation and examples: great for newcomers exploring use cases
- Optimized deployment options: especially useful for edge and resource-constrained environments

My A-Ha Moment

Before this episode, the terms "fine-tuning" and "distillation" sounded intimidating. But Dave explained them in such a clear, practical way that I realized it's all about enhancing what already exists. I learned that I don't have to build AI from scratch. Using Azure AI Foundry, I can tune a model to my own needs and even shrink it for performance. That gave me the confidence to try building on top of existing models without fear. My a-ha moment? Realizing that responsible innovation is totally doable, even for students like me!

Coming Up Next Week

Next episode, we go deeper into research and innovation with SeokJin Han and Saumil Shrivastava. They'll talk about the MCP Server and the Magentic-UI project, which is shaping the future of human-in-the-loop AI. Don't miss it!

Join the Community

You're not alone on this journey. Connect with other developers and learn together:

- Join our Discord
- Check out AMA Recaps

About Me

I'm Sharda Kaur, a Gold Microsoft Learn Student Ambassador passionate about AI and cloud. I enjoy sharing what I learn to help others grow. You can find me on LinkedIn, GitHub, Dev.to, and the Tech Community.

Thanks for reading! I'll be back next week with another episode recap from Model Mondays!