Open Source
## Running Phi-4 Locally with Microsoft Foundry Local: A Step-by-Step Guide
In our previous post, we explored how Phi-4 represents a new frontier in AI efficiency, delivering performance comparable to models 5x its size while being small enough to run on your laptop. Today, we're taking the next step: getting Phi-4 up and running locally on your machine using Microsoft Foundry Local.

Whether you're a developer building AI-powered applications, an educator exploring AI capabilities, or simply curious about running state-of-the-art models without relying on cloud APIs, this guide will walk you through the entire process. Microsoft Foundry Local brings the power of Azure AI Foundry to your local device without requiring an Azure subscription, making local AI development more accessible than ever.

### Why Run Phi-4 Locally?

Before we dive into the setup, let's quickly recap why running models locally matters:

- **Privacy and Control**: Your data never leaves your machine. This is crucial for sensitive applications in healthcare, finance, or education where data privacy is paramount.
- **Cost Efficiency**: No API costs, no rate limits. Once you have the model downloaded, inference is completely free.
- **Speed and Reliability**: No network latency or dependency on external services. Your AI applications work even when you're offline.
- **Learning and Experimentation**: Full control over model parameters, prompts, and fine-tuning opportunities without restrictions.

With Phi-4's compact size, these benefits are now accessible to anyone with a modern laptop—no expensive GPU required.

### What You'll Need

Before we begin, make sure you have:

- **Operating System**: Windows 10/11, macOS (Intel or Apple Silicon), or Linux
- **RAM**: Minimum 16GB (32GB recommended for optimal performance)
- **Storage**: At least 5–10GB of free disk space
- **Processor**: Any modern CPU (GPU optional but provides faster inference)

Note: Phi-4 works remarkably well even on consumer hardware 😀.

### Step 1: Installing Microsoft Foundry Local

Microsoft Foundry Local is designed to make running AI models locally as simple as possible. It handles model downloads, manages memory efficiently, provides OpenAI-compatible APIs, and automatically optimizes for your hardware.

**For Windows users**, open PowerShell or Command Prompt and run:

```
winget install Microsoft.FoundryLocal
```

**For macOS users (Apple Silicon)**, open Terminal and run:

```
brew install microsoft/foundrylocal/foundrylocal
```

**Verify the installation**: open your terminal and run the command below. It should print the Microsoft Foundry Local version, confirming the installation:

```
foundry --version
```
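If you also want to confirm that the background service is healthy, the CLI exposes service-management subcommands. The exact set varies by release, so treat the command below as a sketch and check `foundry --help` if your version differs:

```
foundry service status
```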
### Step 2: Downloading Phi-4-Mini

For this tutorial, we'll use Phi-4-mini, the lightweight 3.8 billion parameter version that's perfect for learning and experimentation. Open your terminal and run:

```
foundry model run phi-4-mini
```

You should see the download begin, followed by an interactive session once the model is loaded.

**Available Phi Models on Foundry Local**

While we're using phi-4-mini for this guide, Foundry Local offers several Phi model variants and other open-source models optimized for different hardware and use cases:

| Model | Hardware | Type | Size | Best For |
|---|---|---|---|---|
| phi-4-mini | GPU | chat-completion | 3.72 GB | Learning, fast responses, resource-constrained environments with GPU |
| phi-4-mini | CPU | chat-completion | 4.80 GB | Learning, fast responses, CPU-only systems |
| phi-4-mini-reasoning | GPU | chat-completion | 3.15 GB | Reasoning tasks with GPU acceleration |
| phi-4-mini-reasoning | CPU | chat-completion | 4.52 GB | Mathematical proofs, logic puzzles with lower resource requirements |
| phi-4 | GPU | chat-completion | 8.37 GB | Maximum reasoning performance, complex tasks with GPU |
| phi-4 | CPU | chat-completion | 10.16 GB | Maximum reasoning performance, CPU-only systems |
| phi-3.5-mini | GPU | chat-completion | 2.16 GB | Most lightweight option with GPU support |
| phi-3.5-mini | CPU | chat-completion | 2.53 GB | Most lightweight option, CPU-optimized |
| phi-3-mini-128k | GPU | chat-completion | 2.13 GB | Extended context (128k tokens), GPU-optimized |
| phi-3-mini-128k | CPU | chat-completion | 2.54 GB | Extended context (128k tokens), CPU-optimized |
| phi-3-mini-4k | GPU | chat-completion | 2.13 GB | Standard context (4k tokens), GPU-optimized |
| phi-3-mini-4k | CPU | chat-completion | 2.53 GB | Standard context (4k tokens), CPU-optimized |

Note: Foundry Local automatically selects the best variant for your hardware. If you have an NVIDIA GPU, it will use the GPU-optimized version; otherwise, it will use the CPU-optimized version. Run the command below to see the full list of models:

```
foundry model list
```

### Step 3: Test It Out

Once the download completes, an interactive session will begin. Let's test Phi-4-mini's capabilities with a few different prompts.

**Example 1: Explanation**

Phi-4-mini provides a thorough, well-structured explanation! It starts with the basic definition, explains the process in biological systems, and gives real-world examples (plant cells, human blood cells). The response is detailed yet accessible.

**Example 2: Mathematical Problem Solving**

An excellent step-by-step solution! Phi-4-mini breaks down the problem methodically:

1. Distributes on the left side
2. Isolates the variable terms
3. Simplifies progressively
4. Arrives at the final answer: x = 11

The model shows its work clearly, making it easy to follow the logic, which is ideal for educational purposes.

**Example 3: Code Generation**

The model provides a concise Python function using string slicing ([::-1]), the most Pythonic approach to reversing a string. It includes clear documentation with a docstring explaining the function's purpose, provides example usage demonstrating the output, and even explains how the slicing notation works under the hood: the [::-1] slice means "start at the end of the string and end at position 0, moving with step -1 (negative one), which means one step backwards." This showcases the model's ability to generate production-ready code with proper documentation while being educational about Python idioms; a reconstruction of this kind of response is sketched below.
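The exact output varies from run to run, but a minimal reconstruction of the kind of function the model produced looks like this (the function name and example input are illustrative):

```python
def reverse_string(text: str) -> str:
    """Return the input string reversed."""
    # [::-1] slices the whole string with step -1, i.e. it walks
    # backwards from the last character to the first.
    return text[::-1]


print(reverse_string("Foundry"))  # -> "yrdnuoF"
```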
To exit the interactive session, type `/bye`.

### Step 4: Extending Phi-4 with Real-Time Tools

**Understanding Phi-4's knowledge cutoff.** Like all language models, Phi-4 has a knowledge cutoff date from its training data (typically several months old). This means it won't know about very recent events, current prices, or breaking news. For example, if you ask "Who won the 2024 NBA championship?" it might not have the answer.

The good news is that there's a powerful workaround. While Phi-4 is incredibly capable on its own, connecting it to external tools like web search, databases, or APIs transforms it from a static knowledge base into a dynamic reasoning engine. This is where Microsoft Foundry's REST API comes in: it provides a simple API that lets you integrate Phi-4 into Python applications and connect it to real-time data sources. Here's a practical example: building a web-enhanced AI assistant.

### Web-Enhanced AI Assistant

This simple application combines Phi-4's reasoning with real-time web search, allowing it to answer current questions accurately.

Prerequisites:

```
pip install foundry-local-sdk requests ddgs
```

Create `phi4_web_assistant.py`:

```python
import requests
from foundry_local import FoundryLocalManager
from ddgs import DDGS
import json


def search_web(query):
    """Search the web and return top results"""
    try:
        results = list(DDGS().text(query, max_results=3))
        if not results:
            return "No search results found."
        search_summary = "\n\n".join([
            f"[Source {i+1}] {r['title']}\n{r['body'][:500]}"
            for i, r in enumerate(results)
        ])
        return search_summary
    except Exception as e:
        return f"Search failed: {e}"


def ask_phi4(endpoint, model_id, prompt):
    """Send a prompt to Phi-4 and stream the response"""
    response = requests.post(
        f"{endpoint}/chat/completions",
        json={
            "model": model_id,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True
        },
        stream=True,
        timeout=180
    )
    full_response = ""
    for line in response.iter_lines():
        if line:
            line_text = line.decode('utf-8')
            if line_text.startswith('data: '):
                line_text = line_text[6:]  # Remove 'data: ' prefix
            if line_text.strip() == '[DONE]':
                break
            try:
                data = json.loads(line_text)
                if 'choices' in data and len(data['choices']) > 0:
                    delta = data['choices'][0].get('delta', {})
                    if 'content' in delta:
                        chunk = delta['content']
                        print(chunk, end="", flush=True)
                        full_response += chunk
            except json.JSONDecodeError:
                continue
    print()
    return full_response


def web_enhanced_query(question):
    """Combine web search with Phi-4 reasoning"""
    # By using an alias, the most suitable model will be downloaded
    # to your device automatically
    alias = "phi-4-mini"

    # Create a FoundryLocalManager instance. This will start the Foundry
    # Local service if it is not already running and load the specified model.
    manager = FoundryLocalManager(alias)
    model_info = manager.get_model_info(alias)

    print("🔍 Searching the web...\n")
    search_results = search_web(question)

    prompt = f"""Here are recent search results:

{search_results}

Question: {question}

Using only the information above, give a clear answer with specific details."""

    print("🤖 Phi-4 Answer:\n")
    return ask_phi4(manager.endpoint, model_info.id, prompt)


if __name__ == "__main__":
    # Try different questions
    question = "Who won the 2024 NBA championship?"
    # question = "What is the latest iPhone model released in 2024?"
    # question = "What is the current price of Bitcoin?"

    print(f"Question: {question}\n")
    print("=" * 60 + "\n")
    web_enhanced_query(question)
    print("\n" + "=" * 60)
```

**Run it:**

```
python phi4_web_assistant.py
```
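As an aside, because Foundry Local exposes an OpenAI-compatible API, you aren't limited to raw `requests` calls. Here's a minimal non-streaming sketch using the `openai` Python client; it assumes the `openai` package is installed and relies on the manager's `endpoint` and `api_key` properties as described in the Foundry Local SDK docs:

```python
from openai import OpenAI
from foundry_local import FoundryLocalManager

alias = "phi-4-mini"
manager = FoundryLocalManager(alias)  # starts the service and loads the model

# The endpoint is the same one used by the requests-based example above;
# the SDK supplies a local API key for the OpenAI-compatible server.
client = OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "Explain osmosis in two sentences."}],
)
print(response.choices[0].message.content)
```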
print(f"Question: {question}\n") print("=" * 60 + "\n") web_enhanced_query(question) print("\n" + "=" * 60) Run It: python phi4_web_assistant.py What Makes This Powerful By connecting Phi-4 to external tools, you create an intelligent system that: Accesses Real-Time Information: Get news, weather, sports scores, and breaking developments Verifies Facts: Cross-reference information with multiple sources Extends Capabilities: Connect to databases, APIs, file systems, or any other tool Enables Complex Applications: Build research assistants, customer support bots, educational tutors, and personal assistants This same pattern can be applied to connect Phi-4 to: Databases: Query your company's internal data APIs: Weather services, stock prices, translation services File Systems: Analyze documents and spreadsheets IoT Devices: Control smart home systems The possibilities are endless when you combine local AI reasoning with real-world data access. Troubleshooting Common Issues Service not running: Make sure Foundry Local is properly installed and the service is running. Try restarting with foundry --version to verify installation. Model downloads slowly: Check your internet connection and ensure you have enough disk space (5-10GB per model). Out of memory: Close other applications or try using a smaller model variant like phi-3.5-mini instead of the full phi-4. Connection issues: Verify that no other services are using the same ports. Foundry Local typically runs on http://localhost:5272. Model not found: Run foundry model list to see available models, then use foundry model run <model-name> to download and run a specific model. Your Next Steps with Foundry Local Congratulations! You now have Phi-4 running locally through Microsoft Foundry Local and understand how to extend it with external tools like web search. This combination of local AI reasoning with real-time data access opens up countless possibilities for building intelligent applications. Coming in Future Posts In the coming weeks, we'll explore advanced topics using Hugging Face: Fine-tuning Phi models on your own data for domain-specific applications Phi-4-multimodal: Analyze images, process audio, and combine multiple data types Advanced deployment patterns: RAG systems and multi-agent orchestration Resources to Explore EdgeAI for Beginners Course: Comprehensive 36-45 hour course covering Edge AI fundamentals, optimization, and production deployment Phi-4 Technical Report: Deep dive into architecture and benchmarks Phi Cookbook on GitHub: Practical examples and recipes Foundry Local Documentation: Complete technical documentation and API reference Module 08: Foundry Local Toolkit: 10 comprehensive samples including RAG applications and multi-agent systems Keep experimenting with Foundry Local, and stay tuned as we unlock the full potential of Edge AI! What will you build with Phi-4? Share your ideas and projects in the comments below!September 2025 Recap: What’s New with Azure Database for PostgreSQL
## September 2025 Recap: What’s New with Azure Database for PostgreSQL

September was a big month for Azure Postgres! From the public preview of PostgreSQL 18 (launched the same day as the community release!) to the GA of Azure Confidential Computing and Near Zero Downtime scaling for HA, this update is packed with new capabilities that make PostgreSQL on Azure more secure, performant, and developer-friendly.

💡 Here’s a quick peek at what’s inside:

- **PostgreSQL 18 (Preview)** – early access to the latest community release on Azure
- **Near Zero Downtime Scaling (GA)** – compute scaling in under 30 seconds for HA servers
- **Azure Confidential Computing (GA)** – hardware-backed data-in-use protection
- **PostgreSQL Discovery & Assessment in Azure Migrate (Preview)** – plan your migration smarter
- **LlamaIndex Integration** – build AI apps and vector search using Azure Postgres
- **VS Code Extension Enhancements** – new Server Dashboard + Copilot Chat integration

Catch all the highlights and hands-on guides in the full recap 👉

#PostgreSQL #AzureDatabase #AzurePostgres #CloudDatabases #AI #OpenSource #Microsoft
## Surface Joins the Open Device Partnership (ODP)

Surface joins the Open Device Partnership (ODP) to drive innovation in secure, reliable, and future-ready device firmware—leveraging Rust, modern EC implementations, and unified OS-EC interfaces to benefit customers and the broader tech ecosystem.
## PostgreSQL and the Power of Community

PGConf NYC 2025 is the premier event for the global PostgreSQL community, and Microsoft is proud to be a Platinum sponsor this year. The conference will also feature a keynote from Claire Giordano, Principal PM for PostgreSQL at Microsoft, who will share our vision for Postgres along with lessons from ten PostgreSQL hacker journeys.
## Architecting Secure PostgreSQL on Azure: Insights from Mercedes-Benz

Authors: Johannes Schuetzner, Software Engineer at Mercedes-Benz & Nacho Alonso Portillo, Principal Program Manager at Microsoft

When you think of Mercedes-Benz, you think of innovation, precision, and trust. But behind every iconic vehicle and digital experience is a relentless drive for security and operational excellence. At Mercedes-Benz R&D in Sindelfingen, Germany, Johannes Schuetzner and the team faced a challenge familiar to many PostgreSQL users: how to build a secure, scalable, and flexible database architecture in the cloud—without sacrificing agility or developer productivity. This article shares insights from Mercedes-Benz about how Azure Database for PostgreSQL can be leveraged to enhance your security posture, streamline access management, and empower teams to innovate with confidence.

### The Challenge: Security Without Compromise

"OK, let's stop intrusions in their tracks," Schuetzner began his POSETTE talk, setting the tone for a deep dive into network security and access management. Many organizations need to protect sensitive data, ensure compliance, and enable secure collaboration across distributed teams. The typical priorities are clear:

- Encrypt data in transit and at rest
- Implement row-level security for granular access
- Integrate with Microsoft Defender for Cloud for threat protection
- Focus on network security and access management—where configuration can make the biggest impact

### Building a Secure Network: Private vs. Public Access

Mercedes-Benz explored two fundamental ways to set up their network for Azure Database for PostgreSQL: private access and public access. "With private access, your PostgreSQL server is integrated in a virtual network. With public access, it is accessible by everybody on the public internet," explained Schuetzner.

Public access:

- Public endpoint, resolvable via DNS
- Firewall rules control allowed IP ranges
- Vulnerable to external attacks; traffic travels over the public internet

Private access:

- Server injected into an Azure VNET
- Traffic travels securely over the Azure backbone
- Requires a delegated subnet and private DNS
- VNET peering enables cross-region connectivity

"One big benefit of private access is that the network traffic travels over the Azure backbone, so not the public internet," said Schuetzner. This ensures that sensitive data remains protected, even as applications scale across regions. An Azure VNET is restricted to a single Azure region, though, and peering VNETs across regions can add complexity.

### Embracing Flexibility: The Power of Private Endpoints

Last year, Azure introduced private endpoints for PostgreSQL, a significant milestone in Mercedes-Benz's database connectivity strategy. A private endpoint adds a network interface to the server that can also be reached from other Azure regions, allowing resources in the VNET associated with the private endpoint to connect to the Postgres server while the traffic travels securely over the Azure backbone.

Private endpoints allow Mercedes-Benz to:

- Dynamically enable and disable public access during migrations
- Flexibly provision multiple endpoints for different VNETs and regions
- Keep explicit control over which network access is allowed
- Benefit from built-in protection against data exfiltration
- Automate setup with Terraform and infrastructure-as-code

This flexibility can be crucial for supporting large architectures and migration scenarios, all while maintaining robust security.

### Passwordless Authentication: Simplicity Meets Security

Managing database passwords is a pain point for every developer. Mercedes-Benz embraced Microsoft Entra authentication (formerly Azure Active Directory) to enable passwordless connections. Passwordless connections do not rely on traditional passwords but on more secure Entra authentication methods; they require less administrative effort and reduce the risk of security breaches. Benefits include:

- Uniform user management across Azure resources
- Group-based access control
- Passwordless authentication for applications and CI/CD pipelines

For developers, this means less manual overhead and fewer risks of password leaks. "Once you have set it up, then Azure takes good care of all the details. You don't have to manage your passwords anymore; they also cannot be leaked accidentally, because you don't have a password," Schuetzner emphasized.
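To make the pattern concrete (this is an illustration, not Mercedes-Benz's actual code), here is a minimal sketch of a passwordless connection from Python. It assumes the `azure-identity` and `psycopg2` packages, a server with Entra authentication enabled, and placeholder server and principal names:

```python
from azure.identity import DefaultAzureCredential
import psycopg2

# Acquire an Entra access token for Azure Database for PostgreSQL.
# DefaultAzureCredential works for managed identities, CI/CD service
# principals, and local developer logins alike.
credential = DefaultAzureCredential()
token = credential.get_token("https://ossrdbms-aad.database.windows.net/.default")

# The short-lived token is passed in place of a password; no secret is stored.
conn = psycopg2.connect(
    host="myserver.postgres.database.azure.com",  # placeholder server name
    dbname="postgres",
    user="my-entra-identity",                     # placeholder Entra principal
    password=token.token,
    sslmode="require",
)
print(conn.closed == 0)  # True while the connection is open
```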
### Principle of Least Privilege: Granular Authorization

Mercedes-Benz applies the principle of least privilege, ensuring applications have only the permissions they need—nothing more. By correlating managed identities with specific roles in PostgreSQL, teams can grant only the necessary Data Manipulation Language (DML) permissions (SELECT, INSERT, UPDATE) while restricting Data Definition Language (DDL) operations. This approach minimizes risk and simplifies compliance; a sketch of what this can look like in SQL follows below.
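For illustration only (the role and schema names below are hypothetical, not taken from the talk), granting an application role DML rights while withholding DDL might look like this:

```sql
-- Hypothetical application role tied to a managed identity
GRANT USAGE ON SCHEMA app TO app_role;

-- DML only: the application can read and modify rows...
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA app TO app_role;

-- ...but gets no DDL: it cannot create new objects in the schema
REVOKE CREATE ON SCHEMA app FROM app_role;
```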
### Operational Excellence: Automation and Troubleshooting

Automation is key to Mercedes-Benz's success. Using Terraform integrated into CI/CD pipelines, the team can provision identities, configure endpoints, and manage permissions—all as code. For troubleshooting, tools like Azure Bastion enable secure, temporary access to the database for diagnostics, without exposing sensitive endpoints.

### The Impact: Security, Agility, and Developer Empowerment

By leveraging Azure Database for PostgreSQL, Mercedes-Benz can achieve:

- Stronger security through private networking and passwordless authentication
- A flexible, scalable architecture for global operations
- Streamlined access management and compliance
- Developers who are empowered to focus on innovation, not infrastructure

Schuetzner concluded, "Private endpoints provide a new network opportunity for Postgres on Azure. There are additional costs, but it's more flexible and more dynamic. Azure takes good care of all the details, so you don't have to manage your passwords anymore. It's basically the ultimate solution for password management."

Mercedes-Benz's story shows that with the right tools and mindset, you can build secure and scalable solutions on Azure Database for PostgreSQL. For more details, refer to the full POSETTE session.

## Model Mondays S2E13: Open Source Models (Hugging Face)

### 1. Weekly Highlights

Here are the key updates we covered in the Season 2 finale:

- **O1 Mini Reinforcement Fine-Tuning (GA)**: Fine-tune models with as few as ~100 samples using built-in Python code graders.
- **Azure Live Interpreter API (Preview)**: Real-time speech-to-speech translation supporting 76 input languages and 143 locales with near human-level latency.
- **Agent Factory – Part 5**: Connecting agents using open standards like MCP (Model Context Protocol) and A2A (Agent-to-Agent protocol).
- **Ask Ralph by Ralph Lauren**: A retail example of agentic AI for conversational styling assistance, built on Azure OpenAI and Foundry's agentic toolset.
- **VS Code August Release**: Brings auto-model selection, stronger safety guards for sensitive edits, and improved agent workflows through new agents.md support.

### 2. Spotlight – Open Source Models in Azure AI Foundry

Guest: Jeff Boudier, VP of Product at Hugging Face

Jeff showcased the deep integration between the Hugging Face community and Azure AI Foundry, where developers can access over 10,000 open-source models across multiple modalities—LLMs, speech recognition, computer vision, and even specialized domains like protein modeling and robotics.

Demo highlights:

- Discover models through Azure AI Foundry's task-based catalog filters.
- Deploy directly from the Hugging Face Hub to Azure with one-click deployment.
- Explore use cases such as multilingual speech recognition and vision-language-action models for robotics.

Jeff also highlighted notable models, including:

- SmoLM3 – a 3B-parameter model with hybrid reasoning capabilities
- Qwen 3 Coder – a mixture-of-experts model optimized for coding tasks
- Parakeet ASR – multilingual speech recognition
- The Microsoft Research protein-modeling collection
- MAGMA – a vision-language-action model for robotics

Integration extends beyond deployment to programmatic access through the Azure CLI and Python SDKs, plus local development via new VS Code extensions.

### 3. Customer Story – DraftWise (BUILD 2025 Segment)

The finale featured a customer spotlight on DraftWise, where CEO James Ding shared how the company accelerates contract drafting with Azure AI Foundry.

- **Problem**: Legal contract drafting is time-consuming and error-prone.
- **Solution**: DraftWise uses Azure AI Foundry to fine-tune Hugging Face language models on legal data, generating contract drafts and redline suggestions.
- **Impact**: Faster drafting cycles and higher consistency; easy model management and deployment with Foundry's secure workflows; transparent evaluation for legal compliance.

### 4. Community Story – Hugging Face & Microsoft

The episode also celebrated the ongoing collaboration between Hugging Face and Microsoft and the impact of open-source AI on the global developer ecosystem. Community benefits include:

- Access to state-of-the-art models without licensing barriers
- Transparent performance through public leaderboards and benchmarks
- Rapid innovation as improvements and bug fixes spread quickly
- Education and empowerment via tutorials, docs, and active forums
- Responsible AI practices encouraged through community oversight

### 5. Key Takeaways

- **Open source AI is here to stay**: Azure AI Foundry and Hugging Face make deploying, fine-tuning, and benchmarking open models easier than ever.
- **Community drives innovation**: Collaboration accelerates progress, improves transparency, and makes AI accessible to everyone.
- **Responsible AI and transparency**: Open-source models come with clear documentation, licensing, and community-driven best practices.
Easy Deployment & Customization: Azure AI Foundry lets you deploy, automate, and customize open models from a single, unified platform. Learn, Build, Share: The open-model ecosystem is a great place for students, developers, and researchers to learn, build, and share their work. Sharda's Tips: How I Wrote This Blog For this final recap, I focused on capturing the energy of the open source AI movement and the practical impact of Hugging Face and Azure AI Foundry collaboration. I watched the livestream, took notes on the demos and interviews, and linked directly to official resources for models, docs, and community sites. Here’s my Copilot prompt for this episode: "Generate a technical blog post for Model Mondays S2E13 based on the transcript and episode details. Focus on open source models, Hugging Face, Azure AI Foundry, and community workflows. Include practical links and actionable insights for developers and students! Learn & Connect Explore Open Models in Azure AI Foundry Hugging Face Leaderboard Responsible AI in Azure Machine Learning Llama-3 by Meta Hugging Face Community Azure AI Documentation About Model Mondays Model Mondays is your weekly Azure AI learning series: 5-Minute Highlights: Latest AI news and product updates 15-Minute Spotlight: Demos and deep dives with product teams 30-Minute AMA Fridays: Ask anything in Discord or the forum Start building: Watch Past Replays Register For AMA Recap Past AMAs Join The Community Don’t build alone! The Azure AI Developer Community is here for real-time chats, events, and support: Join the Discord Explore the Forum About Me I'm Sharda, a Gold Microsoft Learn Student Ambassador focused on cloud and AI. Find me on GitHub, Dev.to, Tech Community, and LinkedIn. In this blog series, I share takeaways from each week’s Model Mondays livestream.193Views0likes0CommentsAugust 2025 Recap: Azure Database for PostgreSQL
## August 2025 Recap: Azure Database for PostgreSQL

Here’s what’s new this month to help you build smarter and scale securely:

- **Advisor performance tuning (GA)**: New insights on index scans, logging, stats, and connections
- **Entra ID group login (Preview)**: Lets users sign in with their own credentials instead of a shared group identity
- **New region – Austria East**: Lower latency and data residency options for Central Europe
- **LangChain & LangGraph support**: Use Azure PostgreSQL as a vector store for AI agents (a minimal sketch follows at the end of this recap)
- **Active-active replication guide**: Step-by-step walkthrough using pglogical

Full details in the monthly recap: https://techcommunity.microsoft.com/blog/adforpostgresql/august-2025-recap-azure-database-for-postgresql/4450527
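The LangChain item above is a teaser; as a rough illustration of what "Azure PostgreSQL as a vector store" looks like in code, here is a minimal sketch. It assumes the `langchain-postgres` and `langchain-openai` packages, the pgvector extension enabled on the server, and placeholder credentials and names throughout:

```python
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector

# Placeholder connection string for an Azure Database for PostgreSQL server
# (user, password, server, and database names are hypothetical).
connection = "postgresql+psycopg://pgadmin:password@myserver.postgres.database.azure.com:5432/vectordb"

# PGVector stores embeddings in a pgvector-backed table on the server.
store = PGVector(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    collection_name="docs",
    connection=connection,
    use_jsonb=True,
)

store.add_documents([Document(page_content="Azure Postgres supports pgvector.")])
print(store.similarity_search("What does Azure Postgres support?", k=1))
```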