phi-3

35 Topics

Edge AI for Beginners : Getting Started with Foundry Local
In Module 08 of the EdgeAI for Beginners course, Microsoft introduces Foundry Local a toolkit that helps you deploy and test Small Language Models (SLMs) completely offline. In this blog, I’ll share how I installed Foundry Local, ran the Phi-3.5-mini model on my windows laptop, and what I learned through the process. What Is Foundry Local? Foundry Local allows developers to run AI models locally on their own hardware. It supports text generation, summarization, and code completion — all without sending data to the cloud. Unlike cloud-based systems, everything happens on your computer, so your data never leaves your device. Prerequisites Before starting, make sure you have: Windows 10 or 11 Python 3.10 or newer Git Internet connection (for the first-time model download) Foundry Local installed Step 1 — Verify Installation After installing Foundry Local, open Command Prompt and type: foundry --version If you see a version number, Foundry Local is installed correctly. Step 2 — Start the Service Start the Foundry Local service using: foundry service start You should see a confirmation message that the service is running. Step 3 — List Available Models To view the models supported by your system, run: foundry model list You’ll get a list of locally available SLMs. Here’s what I saw on my machine: Note: Model availability depends on your device’s hardware. For most laptops, phi-3.5-mini works smoothly on CPU. Step 4 — Run the Phi-3.5 Model Now let’s start chatting with the model: foundry model run phi-3.5-mini-instruct-generic-cpu:1 Once it loads, you’ll enter an interactive chat mode. Try a simple prompt: Hello! What can you do? The model replies instantly — right from your laptop, no cloud needed. To exit, type: /exit How It Works Foundry Local loads the model weights from your device and performs inference locally.This means text generation happens using your CPU (or GPU, if available). The result: complete privacy, no internet dependency, and instant responses. Benefits for Students For students beginning their journey in AI, Foundry Local offers several key advantages: No need for high-end GPUs or expensive cloud subscriptions. Easy setup for experimenting with multiple models. Perfect for class assignments, AI workshops, and offline learning sessions. Promotes a deeper understanding of model behavior by allowing step-by-step local interaction. These factors make Foundry Local a practical choice for learning environments, especially in universities and research institutions where accessibility and affordability are important. Why Use Foundry Local Running models locally offers several practical benefits compared to using AI Foundry in the cloud. With Foundry Local, you do not need an internet connection, and all computations happen on your personal machine. This makes it faster for small models and more private since your data never leaves your device. In contrast, AI Foundry runs entirely on the cloud, requiring internet access and charging based on usage. For students and developers, Foundry Local is ideal for quick experiments, offline testing, and understanding how models behave in real-time. On the other hand, AI Foundry is better suited for large-scale or production-level scenarios where models need to be deployed at scale. In summary, Foundry Local provides a flexible and affordable environment for hands-on learning, especially when working with smaller models such as Phi-3, Qwen2.5, or TinyLlama. It allows you to experiment freely, learn efficiently, and better understand the fundamentals of Edge AI development. Optional: Restart Later Next time you open your laptop, you don’t have to reinstall anything. Just run these two commands again: foundry service start foundry model run phi-3.5-mini-instruct-generic-cpu:1 What I Learned Following the EdgeAI for Beginners Study Guide helped me understand: How edge AI applications work How small models like Phi 3.5 can run on a local machine How to test prompts and build chat apps with zero cloud usage Conclusion Running the Phi-3.5-mini model locally with Foundry Localgave me hands-on insight into edge AI. It’s an easy, private, and cost-free way to explore generative AI development. If you’re new to Edge AI, start with the EdgeAI for Beginners course and follow its Study Guide to get comfortable with local inference and small language models. Resources: EdgeAI for Beginners GitHub Repo Foundry Local Official Site Phi Model Link
Sharda_Kaur
Oct 30, 2025 Place Educator Developer Blog
296Views
1like
0Comments
Transform Your AI Applications with Local LLM Deployment
Introduction Are you tired of watching your AI application costs spiral out of control every time your user base grows? As AI Engineers and Developers, we've all felt the pain of cloud-dependent LLM deployments. Every API call adds up, latency becomes a bottleneck in real-time applications, and sensitive data must leave your infrastructure to get processed. Meanwhile, your users demand faster responses, better privacy, and more reliable service. What if there was a way to run powerful language models directly on your users' devices or your local infrastructure? Enter the world of Edge AI deployment with Microsoft's Foundry Local a game-changing approach that brings enterprise-grade LLM capabilities to local hardware while maintaining full OpenAI API compatibility. The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides AI Engineers and Developers with comprehensive, hands-on training to master local LLM deployment. This isn't just another theoretical course, it's a practical guide that will transform how you think about AI infrastructure, combining cutting-edge local deployment techniques with production-ready implementation patterns. In this post, we'll explore why Edge AI deployment represents the future of AI applications, dive deep into Foundry Local's capabilities across multiple frameworks, and show you exactly how to implement local LLM solutions that deliver both technical excellence and significant business value. Why Edge AI Deployment Changes Everything for Developers The shift from cloud-dependent to edge-deployed AI represents more than just a technical evolution, it's a fundamental reimagining of how we build intelligent applications. As AI Engineers, we're witnessing a transformation that addresses the most pressing challenges in modern AI deployment while opening up entirely new possibilities for innovation. Consider the current state of cloud-based LLM deployment. Every user interaction requires a round-trip to external servers, introducing latency that can kill user experience in real-time applications. Costs scale linearly (or worse) with usage, making successful applications expensive to operate. Sensitive data must traverse networks and live temporarily in external systems, creating compliance nightmares for enterprise applications. Edge AI deployment fundamentally changes this equation. By running models locally, we achieve several critical advantages: Data Sovereignty and Privacy Protection: Your sensitive data never leaves your infrastructure. For healthcare applications processing patient records, financial services handling transactions, or enterprise tools managing proprietary information, this represents a quantum leap in security posture. You maintain complete control over data flow, meeting even the strictest compliance requirements without architectural compromises. Real-Time Performance at Scale: Local inference eliminates network latency entirely. Instead of 200-500ms round-trips to cloud APIs, you get sub-10ms response times. This enables entirely new categories of applications—real-time code completion, interactive AI tutoring systems, voice assistants that respond instantly, and IoT devices that make intelligent decisions without connectivity. Predictable Cost Structure: Transform variable API costs into fixed infrastructure investments. Instead of paying per-token for potentially unlimited usage, you invest in local hardware that serves unlimited requests. This makes ROI calculations straightforward and removes the fear of viral success destroying your margins. Offline Capabilities and Resilience: Local deployment means your AI features work even when connectivity fails. Mobile applications can provide intelligent features in areas with poor network coverage. Critical systems maintain AI capabilities during network outages. Edge devices in remote locations operate autonomously. The technical implications extend beyond these obvious benefits. Local deployment enables new architectural patterns: AI-powered applications that work entirely client-side, edge computing nodes that make intelligent routing decisions, and distributed systems where intelligence lives close to data sources. Foundry Local: Multi-Framework Edge AI Deployment Made Simple Microsoft's Foundry Local https://www.foundrylocal.ai represents a breakthrough in local AI deployment, designed specifically for developers who need production-ready edge AI solutions. Unlike single-framework tools, Foundry Local provides a unified platform that works seamlessly across multiple programming languages and deployment scenarios while maintaining full compatibility with existing OpenAI-based workflows. The platform's approach to multi-framework support means you're not locked into a single technology stack. Whether you're building TypeScript applications, Python ML pipelines, Rust systems programming projects, or .NET enterprise applications, Foundry Local provides native SDKs and consistent APIs that integrate naturally with your existing codebase. Enterprise-Grade Model Catalog: Foundry Local comes with a curated selection of production-ready models optimized for edge deployment. The `phi-3.5-mini` model delivers impressive performance in a compact footprint, perfect for resource-constrained environments. For applications requiring more sophisticated reasoning, `qwen2.5-0.5b` provides enhanced capabilities while maintaining efficiency. When you need maximum capability and have sufficient hardware resources, `gpt-oss-20b` offers state-of-the-art performance with full local control. Intelligent Hardware Optimization: One of Foundry Local's most powerful features is its automatic hardware detection and optimization. The platform automatically identifies your available compute resources, NVIDIA CUDA GPUs, AMD GPUs, Intel NPUs, Qualcomm Snapdragon NPUs, or CPU-only environments and downloads the most appropriate model variant. This means the same application code delivers optimal performance across diverse hardware configurations without manual intervention. ONNX Runtime Acceleration: Under the hood, Foundry Local leverages Microsoft's ONNX Runtime for maximum performance. This provides significant advantages over generic inference engines, delivering optimized execution paths for different hardware architectures while maintaining model accuracy and compatibility. OpenAI SDK Compatibility: Perhaps most importantly for developers, Foundry Local maintains complete API compatibility with the OpenAI SDK. This means existing applications can migrate to local inference by changing only the endpoint configuration—no rewriting of application logic, no learning new APIs, no disruption to existing workflows. The platform handles the complex aspects of local AI deployment automatically: model downloading, hardware-specific optimization, memory management, and inference scheduling. This allows developers to focus on building intelligent applications rather than managing AI infrastructure. Framework-Agnostic Benefits: Foundry Local's multi-framework approach delivers consistent benefits regardless of your technology choices. Whether you're working in a Node.js microservices architecture, a Python data science environment, a Rust embedded system, or a C# enterprise application, you get the same advantages: reduced latency, eliminated API costs, enhanced privacy, and offline capabilities. This universal compatibility means teams can adopt edge AI deployment incrementally, starting with pilot projects in their preferred language and expanding across their technology stack as they see results. The learning curve is minimal because the API patterns remain familiar while the underlying infrastructure transforms to local deployment. Implementing Edge AI: From Code to Production Moving from cloud APIs to local AI deployment requires understanding the implementation patterns that make edge AI both powerful and practical. Let's explore how Foundry Local's SDKs enable seamless integration across different development environments, with real-world code examples that you can adapt for your production systems. Python Implementation for Data Science and ML Pipelines Python developers will find Foundry Local's integration particularly natural, especially in data science and machine learning contexts where local processing is often preferred for security and performance reasons. import openai from foundry_local import FoundryLocalManager # Initialize with automatic hardware optimization alias = "phi-3.5-mini" manager = FoundryLocalManager(alias) This simple initialization handles a remarkable amount of complexity automatically. The `FoundryLocalManager` detects your hardware configuration, downloads the most appropriate model variant for your system, and starts the local inference service. Behind the scenes, it's making intelligent decisions about memory allocation, selecting optimal execution providers, and preparing the model for efficient inference. # Configure OpenAI client for local deployment client = openai.OpenAI( base_url=manager.endpoint, api_key=manager.api_key # Not required for local, but maintains API compatibility ) # Production-ready inference with streaming def analyze_document(content: str): stream = client.chat.completions.create( model=manager.get_model_info(alias).id, messages=[{ "role": "system", "content": "You are an expert document analyzer. Provide structured analysis." }, { "role": "user", "content": f"Analyze this document: {content}" }], stream=True, temperature=0.7 ) result = "" for chunk in stream: if chunk.choices[0].delta.content: content_piece = chunk.choices[0].delta.content result += content_piece yield content_piece # Enable real-time UI updates return result Key implementation benefits here: • Automatic model management: The `FoundryLocalManager` handles model lifecycle, memory optimization, and hardware-specific acceleration without manual configuration. • Streaming interface compatibility: Maintains the familiar OpenAI streaming API while processing locally, enabling real-time user interfaces with zero latency overhead. • Production error handling: The manager includes built-in retry logic, graceful degradation, and resource management for reliable production deployment. JavaScript/TypeScript Implementation for Web Applications JavaScript and TypeScript developers can integrate local AI capabilities directly into web applications, enabling entirely new categories of client-side intelligent features. import { OpenAI } from "openai"; import { FoundryLocalManager } from "foundry-local-sdk"; class LocalAIService { constructor() { this.foundryManager = null; this.openaiClient = null; this.isInitialized = false; } async initialize(modelAlias = "phi-3.5-mini") { this.foundryManager = new FoundryLocalManager(); const modelInfo = await this.foundryManager.init(modelAlias); this.openaiClient = new OpenAI({ baseURL: this.foundryManager.endpoint, apiKey: this.foundryManager.apiKey, }); this.isInitialized = true; return modelInfo; } The initialization pattern establishes local AI capabilities with full error handling and resource management. This enables web applications to provide AI features without external API dependencies. async generateCodeCompletion(codeContext, userPrompt) { if (!this.isInitialized) { throw new Error("LocalAI service not initialized"); } try { const completion = await this.openaiClient.chat.completions.create({ model: this.foundryManager.getModelInfo().id, messages: [ { role: "system", content: "You are a code completion assistant. Provide accurate, efficient code suggestions." }, { role: "user", content: `Context: ${codeContext}\n\nComplete: ${userPrompt}` } ], max_tokens: 150, temperature: 0.2 }); return completion.choices[0].message.content; } catch (error) { console.error("Local AI completion failed:", error); throw new Error("Code completion unavailable"); } } } Implementation advantages for web applications • Zero-dependency AI features: Applications work entirely offline once models are downloaded, enabling AI capabilities in disconnected environments. • Instant response times: Eliminate network latency for real-time features like code completion, content generation, or intelligent search. • Client-side privacy: Sensitive code or content never leaves the user's device, meeting strict security requirements for enterprise development tools. Cross-Platform Production Deployment Patterns Both Python and JavaScript implementations share common production deployment patterns that make Foundry Local particularly suitable for enterprise applications: Automatic Hardware Optimization: The platform automatically detects and utilizes available acceleration hardware. On systems with NVIDIA GPUs, it leverages CUDA acceleration. On newer Intel systems, it uses NPU acceleration. On ARM-based systems like Apple Silicon or Qualcomm Snapdragon, it optimizes for those architectures. This means the same application code delivers optimal performance across diverse deployment environments. Graceful Resource Management: Foundry Local includes sophisticated memory management and resource allocation. Models are loaded efficiently, memory is recycled properly, and concurrent requests are handled intelligently to maintain system stability under load. Production Monitoring Integration: The platform provides comprehensive metrics and logging that integrate naturally with existing monitoring systems, enabling production observability for AI workloads running at the edge. These implementation patterns demonstrate how Foundry Local transforms edge AI from an experimental concept into a practical, production-ready deployment strategy that works consistently across different technology stacks and hardware environments. Measuring Success: Technical Performance and Business Impact The transition to edge AI deployment delivers measurable improvements across both technical and business metrics. Understanding these impacts helps justify the architectural shift and demonstrates the concrete value of local LLM deployment in production environments. Technical Performance Gains Latency Elimination: The most immediately visible benefit is the dramatic reduction in response times. Cloud API calls typically require 200-800ms round-trips, depending on geographic location and network conditions. Local inference with Foundry Local reduces this to sub-10ms response times—a 95-99% improvement that fundamentally changes user experience possibilities. Consider a code completion feature: cloud-based completion feels sluggish and interrupts developer flow, while local completion provides instant suggestions that enhance productivity. The same applies to real-time chat applications, interactive AI tutoring systems, and any application where response latency directly impacts usability. Automatic Hardware Utilization: Foundry Local's intelligent hardware detection and optimization delivers significant performance improvements without manual configuration. On systems with NVIDIA RTX 4000 series GPUs, inference speeds can be 10-50x faster than CPU-only processing. On newer Intel systems with NPUs, the platform automatically leverages neural processing units for efficient AI workloads. Apple Silicon systems benefit from Metal Performance Shaders optimization, delivering excellent performance per watt. ONNX Runtime Optimization: Microsoft's ONNX Runtime provides substantial performance advantages over generic inference engines. In benchmark testing, ONNX Runtime consistently delivers 2-5x performance improvements compared to standard PyTorch or TensorFlow inference, while maintaining full model accuracy and compatibility. Scalability Characteristics: Local deployment transforms scaling economics entirely. Instead of linear cost scaling with usage, you get horizontal scaling through hardware deployment. A single modern GPU can handle hundreds of concurrent inference requests, making per-request costs approach zero for high-volume applications. Business Impact Analysis Cost Structure Transformation: The financial implications of local deployment are profound. Consider an application processing 1 million tokens daily through OpenAI's API—this represents $20-60 in daily costs depending on the model. Over a year, this becomes $7,300-21,900 in recurring expenses. A comparable local deployment might require a $2,000-5,000 hardware investment with no ongoing API costs. For high-volume applications, the savings become dramatic. Applications processing 100 million tokens monthly face $60,000-180,000 annual API costs. Local deployment with appropriate hardware infrastructure could reduce this to electricity and maintenance costs—typically under $10,000 annually for equivalent processing capacity. Enhanced Privacy and Compliance: Local deployment eliminates data sovereignty concerns entirely. Healthcare applications processing patient records, financial services handling transaction data, and enterprise tools managing proprietary information can deploy AI capabilities without data leaving their infrastructure. This simplifies compliance with GDPR, HIPAA, SOX, and other regulatory frameworks while reducing legal and security risks. Operational Resilience: Local deployment provides significant business continuity advantages. Applications continue functioning during network outages, API service disruptions, or third-party provider issues. For mission-critical systems, this resilience can prevent costly downtime and maintain user productivity during external service failures. Development Velocity: Local deployment accelerates development cycles by eliminating API rate limits, usage quotas, and external dependencies during development and testing. Developers can iterate freely, run comprehensive test suites, and experiment with AI features without cost concerns or rate limiting delays. Enterprise Adoption Metrics Real-world enterprise deployments demonstrate measurable business value: Local Usage: Foundry Local for internal AI-powered tools, reporting 60-80% reduction in AI-related operational costs while improving developer productivity through instant AI responses in development environments. Manufacturing Applications: Industrial IoT deployments using edge AI for predictive maintenance show 40-60% reduction in unplanned downtime while eliminating cloud connectivity requirements in remote facilities. Financial Services: Trading firms deploying local LLMs for market analysis report sub-millisecond decision latencies while maintaining complete data isolation for competitive advantage and regulatory compliance. ROI Calculation Framework For AI Engineers evaluating edge deployment, consider these quantifiable factors: Direct Cost Savings: Compare monthly API costs against hardware amortization over 24-36 months. Most applications with >$1,000 monthly API costs achieve positive ROI within 12-18 months. Performance Value: Quantify the business impact of reduced latency. For customer-facing applications, each 100ms of latency reduction typically correlates with 1-3% conversion improvement. Risk Mitigation: Calculate the cost of downtime or compliance violations prevented by local deployment. For many enterprise applications, avoiding a single significant outage justifies the infrastructure investment. Development Efficiency: Measure developer productivity improvements from unlimited local AI access during development. Teams report 20-40% faster iteration cycles when AI features can be tested without external dependencies. These metrics demonstrate that edge AI deployment with Foundry Local delivers both immediate technical improvements and substantial long-term business value, making it a strategic investment in AI infrastructure that pays dividends across multiple dimensions. Your Edge AI Journey Starts Here The shift to edge AI represents more than just a technical evolution, it's an opportunity to fundamentally improve your applications while building valuable expertise in an emerging field. Whether you're looking to reduce costs, improve performance, or enhance privacy, the path forward involves both learning new concepts and connecting with a community of practitioners solving similar challenges. Master Edge AI with Comprehensive Training The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides the complete foundation you need to become proficient in local AI deployment. This isn't a superficial overview, it's a comprehensive, hands-on program designed specifically for developers who want to build production-ready edge AI applications. The curriculum takes you through hours of structured learning, progressing from fundamental concepts to advanced deployment scenarios. You'll start by understanding the principles of edge AI and local inference, then dive deep into practical implementation with Foundry Local across multiple programming languages. The program includes working examples and comprehensive sample applications that demonstrate real-world use cases. What sets this curriculum apart is its practical focus. Instead of theoretical discussions, you'll build actual applications: document analysis systems that work offline, real-time code completion tools, intelligent chatbots that protect user privacy, and IoT applications that make decisions locally. Each project teaches both the technical implementation and the architectural thinking needed for successful edge AI deployment. The curriculum covers multi-framework deployment patterns extensively, ensuring you can apply edge AI principles regardless of your preferred development stack. Whether you're working in Python data science environments, JavaScript web applications, C# enterprise systems, or Rust embedded projects, you'll learn the patterns and practices that make edge AI successful. Join a Community of AI Engineers Learning edge AI doesn't happen in isolation, it requires connection with other developers who are solving similar challenges and discovering new possibilities. The Foundry Local Discord community https://aka.ms/foundry-local-discord provides exactly this environment, connecting AI Engineers and Developers from around the world who are implementing local AI solutions. This community serves multiple crucial functions for your development as an edge AI practitioner. You'll find experienced developers sharing implementation patterns they've discovered, debugging complex deployment issues collaboratively, and discussing the architectural decisions that make edge AI successful in production environments. The Discord community includes dedicated channels for different programming languages, specific deployment scenarios, and technical discussions about optimization and performance. Whether you're implementing your first local AI feature or optimizing a complex multi-model deployment, you'll find peers and experts ready to help problem-solve and share insights. Beyond technical support, the community provides valuable career and business insights. Members share their experiences with edge AI adoption in different industries, discuss the business cases that have proven most successful, and collaborate on open-source projects that advance the entire ecosystem. Share Your Experience and Build Expertise One of the most effective ways to solidify your edge AI expertise is by sharing your implementation experiences with the community. As you build applications with Foundry Local and deploy edge AI solutions, documenting your process and sharing your learnings provides value both to others and to your own professional development. Consider sharing your deployment stories, whether they're successes or challenges you've overcome. The community benefits from real-world case studies that show how edge AI performs in different environments and use cases. Your experience implementing local AI in a healthcare application, financial services system, or manufacturing environment provides valuable insights that others can build upon. Technical contributions are equally valuable, whether it's sharing configuration patterns you've discovered, performance optimizations you've implemented, or integration approaches you've developed for specific frameworks or libraries. The edge AI field is evolving rapidly, and practical contributions from working developers drive much of the innovation. Sharing your work also builds your professional reputation as an edge AI expert. As organizations increasingly adopt local AI deployment strategies, developers with proven experience in this area become valuable resources for their teams and the broader industry. The combination of structured learning through the Edge AI curriculum, active participation in the community, and sharing your practical experiences creates a comprehensive path to edge AI expertise that serves both your immediate project needs and your long-term career development as AI deployment patterns continue evolving. Key Takeaways Local LLM deployment transforms application economics: Replace variable API costs with fixed infrastructure investments that scale to unlimited usage, typically achieving ROI within 12-18 months for applications with significant AI workloads. Foundry Local enables multi-framework edge AI: Consistent deployment patterns across Python, JavaScript, C#, and Rust environments with automatic hardware optimization and OpenAI API compatibility. Performance improvements are dramatic and measurable: Sub-10ms response times replace 200-800ms cloud API latency, while automatic hardware acceleration delivers 2-50x performance improvements depending on available compute resources. Privacy and compliance become architectural advantages: Local deployment eliminates data sovereignty concerns, simplifies regulatory compliance, and provides complete control over sensitive information processing. Edge AI expertise is a strategic career investment: As organizations increasingly adopt local AI deployment, developers with hands-on edge AI experience become valuable technical resources with unique skills in an emerging field. Conclusion Edge AI deployment represents the next evolution in intelligent application development, transforming both the technical possibilities and economic models of AI-powered systems. With Foundry Local and the comprehensive Edge AI for Beginners curriculum, you have access to production-ready tools and expert guidance to make this transition successfully. The path forward is clear: start with the Edge AI for Beginners curriculum to build solid foundations, connect with the Foundry Local Discord community to learn from practicing developers, and begin implementing local AI solutions in your projects. Each step builds valuable expertise while delivering immediate improvements to your applications. As cloud costs continue rising and privacy requirements become more stringent, organizations will increasingly rely on developers who can implement local AI solutions effectively. Your early adoption of edge AI deployment patterns positions you at the forefront of this technological shift, with skills that will become increasingly valuable as the industry evolves. The future of AI deployment is local, private, and performance-optimized. Start building that future today. Resources Edge AI for Beginners Curriculum: Comprehensive training with 36-45 hours of hands-on content examples, and production-ready deployment patterns https://aka.ms/edgeai-for-beginners Foundry Local GitHub Repository: Official documentation, samples, and community contributions for local AI deployment https://github.com/microsoft/foundry_local Foundry Local Discord Community: Connect with AI Engineers and Developers implementing edge AI solutions worldwide https://aka.ms/foundry/discord Foundry Local Documentation: Complete technical documentation and API references Foundry Local documentation | Microsoft Learn Foundry Local Model Catalog: Browse available models and deployment options for different hardware configurations Foundry Local Models - Browse AI Models
Lee_Stott
Oct 23, 2025 Place Microsoft Developer Community Blog
635Views
0likes
1Comment
The Future of AI: Exploring Multi-Agent AI Systems
Image generated by Microsoft Designer
Marco_Casalaina
Oct 03, 2025 Place Microsoft Foundry Blog
25KViews
4likes
2Comments
The Future of AI: Unlocking the Power of Azure AI with the Book of AI
Image generated by Microsoft Designer
robchambers
Oct 03, 2025 Place Microsoft Foundry Blog
16KViews
10likes
1Comment
Essential Microsoft Resources for MVPs & the Tech Community from the AI Tour
Unlock the power of Microsoft AI with redeliverable technical presentations, hands-on workshops, and open-source curriculum from the Microsoft AI Tour! Whether you’re a Microsoft MVP, Developer, or IT Professional, these expertly crafted resources empower you to teach, train, and lead AI adoption in your community. Explore top breakout sessions covering GitHub Copilot, Azure AI, Generative AI, and security best practices—designed to simplify AI integration and accelerate digital transformation. Dive into interactive workshops that provide real-world applications of AI technologies. Take it a step further with Microsoft’s Open-Source AI Curriculum, offering beginner-friendly courses on AI, Machine Learning, Data Science, Cybersecurity, and GitHub Copilot—perfect for upskilling teams and fostering innovation. Don’t just learn—lead. Access these resources, host impactful training sessions, and drive AI adoption in your organization. Start sharing today! Explore now: Microsoft AI Tour Resources.
AnthonyBartolo
Oct 02, 2025 Place Microsoft Developer Community Blog
1.3KViews
5likes
2Comments
Join Us for a Technical Deep Dive and Q&A on Foundry Local - LLMs on device
Join us for an Ask Me Anything with the Foundry Local team on October 14th, 2025! Discover how Foundry Local is redefining edge AI with powerful features like on-device inference, enabling you to run models directly on your hardware, cutting costs and keeping your data secure. Whether you're customizing models to fit unique use cases or integrating seamlessly via SDKs, APIs, or CLI, Foundry Local offers scalable pathways to Azure AI Foundry as your needs evolve. It's the perfect solution for environments with limited connectivity, sensitive data requirements, low-latency demands, or early-stage experimentation before cloud deployment. If you're building smarter, leaner, and more private AI workflows, this AMA is your chance to dive deep with the team behind it all. What is Foundry Local? Foundry Local is a set of development tools designed to help you build and evaluate LLM applications on your local machine. It provides a curated collection of production-quality tools, including evaluation and prompt engineering capabilities, that are fully compatible with Azure AI. This allows for a seamless transition of your work from your local environment to the cloud. Don't miss this opportunity to connect with our experts and enhance your understanding of local LLM development. Foundry Local is an on-device AI inference solution offering performance, privacy, customization, and cost advantages. It integrates seamlessly into your existing workflows and applications through an intuitive CLI, SDK, and REST API. Key features On-Device Inference: Run models locally on your own hardware, reducing your costs while keeping all your data on your device. Model Customization: Select from preset models or use your own to meet specific requirements and use cases. Cost Efficiency: Eliminate recurring cloud service costs by using your existing hardware, making AI more accessible. Seamless Integration: Connect with your applications through an SDK, API endpoints, or the CLI, with easy scaling to Azure AI Foundry as your needs grow. How to Join: Register to Join the Azure AI Foundry Discord Community Event 14th Oct 2025 9am Pacific Time UTC−08:00 Unlock Accelerated Local LLM Development Discover how Foundry Local can enhance your development process and explore the possibilities for building robust LLM applications. Whether you're a seasoned AI developer or just getting started, this session is your chance to get hands-on insights into the innovative world of Azure AI Foundry. Event Highlights: An in-depth overview of the Foundry Local CLI and SDK. Interactive demo with step-by-step examples. Best practices for local AI Inference and models Transitioning your local development to cloud solutions or vice-versa Why Attend? Gain expert insights into Foundry Local, and ask questions about using Foundry Local Network with fellow AI professionals and developers in the Azure AI Foundry community. Enhance your AI development skills with practical examples. Stay at the forefront of LLM application development. Speakers Product Manager Foundry Local Maanav Dalal Product Manager |Foundry Local Microsoft Maanav Dalal is a PM on the AI Frameworks team. He's super inquisitive about the ways you use AI in daily life, so be encouraged to strike up a conversation with him about that. LinkedIn Profile
Lee_Stott
Sep 25, 2025 Place Microsoft Developer Community Blog
247Views
0likes
0Comments
Introducing Azure AI Travel Agents: A Flagship MCP-Powered Sample for AI Travel Solutions
We are excited to introduce AI Travel Agents, a sample application with enterprise functionality that demonstrates how developers can coordinate multiple AI agents (written in multiple languages) to explore travel planning scenarios. It's built with LlamaIndex.TS for agent orchestration, Model Context Protocol (MCP) for structured tool interactions, and Azure Container Apps for scalable deployment. TL;DR: Experience the power of MCP and Azure Container Apps with The AI Travel Agents! Try out live demo locally on your computer for free to see real-time agent collaboration in action. Share your feedback on our community forum. We’re already planning enhancements, like new MCP-integrated agents, enabling secure communication between the AI agents and MCP servers and more. NOTE: This example uses mock data and is intended for demonstration purposes rather than production use. The Challenge: Scaling Personalized Travel Planning Travel agencies grapple with complex tasks: analyzing diverse customer needs, recommending destinations, and crafting itineraries, all while integrating real-time data like trending spots or logistics. Traditional systems falter with latency, scalability, and coordination, leading to delays and frustrated clients. The AI Travel Agents tackles these issues with a technical trifecta: LlamaIndex.TS orchestrates six AI agents for efficient task handling. MCP equips agents with travel-specific data and tools. Azure Container Apps ensures scalable, serverless deployment. This architecture delivers operational efficiency and personalized service at scale, transforming chaos into opportunity. LlamaIndex.TS: Orchestrating AI Agents The heart of The AI Travel Agents is LlamaIndex.TS, a powerful agentic framework that orchestrates multiple AI agents to handle travel planning tasks. Built on a Node.js backend, LlamaIndex.TS manages agent interactions in a seamless and intelligent manner: Task Delegation: The Triage Agent analyzes queries and routes them to specialized agents, like the Itinerary Planning Agent, ensuring efficient workflows. Agent Coordination: LlamaIndex.TS maintains context across interactions, enabling coherent responses for complex queries, such as multi-city trip plans. LLM Integration: Connects to Azure OpenAI, GitHub Models or any local LLM using Foundy Local for advanced AI capabilities. LlamaIndex.TS’s modular design supports extensibility, allowing new agents to be added with ease. LlamaIndex.TS is the conductor, ensuring agents work in sync to deliver accurate, timely results. Its lightweight orchestration minimizes latency, making it ideal for real-time applications. MCP: Fueling Agents with Data and Tools The Model Context Protocol (MCP) empowers AI agents by providing travel-specific data and tools, enhancing their functionality. MCP acts as a data and tool hub: Real-Time Data: Supplies up-to-date travel information, such as trending destinations or seasonal events, via the Web Search Agent using Bing Search. Tool Access: Connects agents to external tools, like the .NET-based customer queries analyzer for sentiment analysis, the Python-based itinerary planning for trip schedules or destination recommendation tools written in Java. For example, when the Destination Recommendation Agent needs current travel trends, MCP delivers via the Web Search Agent. This modularity allows new tools to be integrated seamlessly, future-proofing the platform. MCP’s role is to enrich agent capabilities, leaving orchestration to LlamaIndex.TS. Azure Container Apps: Scalability and Resilience Azure Container Apps powers The AI Travel Agents sample application with a serverless, scalable platform for deploying microservices. It ensures the application handles varying workloads with ease: Dynamic Scaling: Automatically adjusts container instances based on demand, managing booking surges without downtime. Polyglot Microservices: Supports .NET (Customer Query), Python (Itinerary Planning), Java (Destination Recommandation) and Node.js services in isolated containers. Observability: Integrates tracing, metrics, and logging enabling real-time monitoring. Serverless Efficiency: Abstracts infrastructure, reducing costs and accelerating deployment. Azure Container Apps' global infrastructure delivers low-latency performance, critical for travel agencies serving clients worldwide. The AI Agents: A Quick Look While MCP and Azure Container Apps are the stars, they support a team of multiple AI agents that drive the application’s functionality. Built and orchestrated with Llamaindex.TS via MCP, these agents collaborate to handle travel planning tasks: Triage Agent: Directs queries to the right agent, leveraging MCP for task delegation. Customer Query Agent: Analyzes customer needs (emotions, intents), using .NET tools. Destination Recommendation Agent: Suggests tailored destinations, using Java. Itinerary Planning Agent: Crafts efficient itineraries, powered by Python. Web Search Agent: Fetches real-time data via Bing Search. These agents rely on MCP’s real-time communication and Azure Container Apps’ scalability to deliver responsive, accurate results. It's worth noting though this sample application uses mock data for demonstration purpose. In real worl scenario, the application would communicate with an MCP server that is plugged in a real production travel API. Key Features and Benefits The AI Travel Agents offers features that showcase the power of MCP and Azure Container Apps: Real-Time Chat: A responsive Angular UI streams agent responses via MCP’s SSE, ensuring fluid interactions. Modular Tools: MCP enables tools like analyze_customer_query to integrate seamlessly, supporting future additions. Scalable Performance: Azure Container Apps ensures the UI, backend and the MCP servers handle high traffic effortlessly. Transparent Debugging: An accordion UI displays agent reasoning providing backend insights. Benefits: Efficiency: LlamaIndex.TS streamlines operations. Personalization: MCP’s data drives tailored recommendations. Scalability: Azure ensures reliability at scale. Thank You to Our Contributors! The AI Travel Agents wouldn’t exist without the incredible work of our contributors. Their expertise in MCP development, Azure deployment, and AI orchestration brought this project to life. A special shoutout to: Pamela Fox – Leading the developement of the Python MCP server. Aaron Powell and Justin Yoo – Leading the developement of the .NET MCP server. Rory Preddy – Leading the developement of the Java MCP server. Lee Stott and Kinfey Lo – Leading the developement of the Local AI Foundry Anthony Chu and Vyom Nagrani – Leading Azure Container Apps roadmap Matt Soucoup and Julien Dubois – Leading the ACA DevRel strategy Wassim Chegham – Architected MCP and backend orchestration. And many more! See the GitHub repository for all contributors. Thank you for your dedication to pushing the boundaries of AI and cloud technology! Try It Out Experience the power of MCP and Azure Container Apps with The AI Travel Agents! Try out live demo locally on your computer for free to see real-time agent collaboration in action. Conclusion Developers can explore today the open-source project on GitHub, with setup and deployment instructions. Share your feedback on our community forum. We’re already planning enhancements, like new MCP-integrated agents, enabling secure communication between the AI agents and MCP servers and more. This is still a work in progress and we also welcome all kind of contributions. Please fork and star the repo to stay tuned for updates! ◾️We would love your feedback and continue the discussion in the Azure AI Foundry Discord aka.ms/foundry/discord On behalf of Microsoft DevRel Team.
wassimchegham
Jun 27, 2025 Place Microsoft Developer Community Blog
4.3KViews
0likes
0Comments
MVP’s Favorite Content from Microsoft Azure MVPs
MVP’s Favorite Content is back! In this special Global Azure edition, we’re excited to share recommended articles from four Azure MVPs from around the world.
RieMoriguchi
May 11, 2025 Place Microsoft MVP Program Blog
363Views
1like
2Comments
Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning
Today we are introducing Phi-4, our 14B parameter state-of-the-art small language model (SLM) that excels at complex reasoning in areas such as math, in addition to conventional language processing. Phi-4 is the latest member of our Phi family of small language models and demonstrates what’s possible as we continue to probe the boundaries of SLMs. Phi-4 is available on Azure AI Foundry and on Hugging Face. Phi-4 Benchmarks Phi-4 outperforms comparable and larger models on math related reasoning due to advancements throughout the processes, including the use of high-quality synthetic datasets, curation of high-quality organic data, and post-training innovations. Phi-4 continues to push the frontier of size vs quality. Phi-4 is particularly good at math problems, for example here are the benchmarks for Phi-4 on math competition problems: Phi-4 performance on math competition problems To see more benchmarks read the newest technical paper released on arxiv. Enabling AI innovation safely and responsibly Building AI solutions responsibly is at the core of AI development at Microsoft. We have made our robust responsible AI capabilities available to customers building with Phi models, including Phi-3.5-mini optimized for Windows Copilot+ PCs. Azure AI Foundry provides users with a robust set of capabilities to help organizations measure, mitigate, and manage AI risks across the AI development lifecycle for traditional machine learning and generative AI applications. Azure AI evaluations in AI Foundry enable developers to iteratively assess the quality and safety of models and applications using built-in and custom metrics to inform mitigations. Additionally, Phi users can use Azure AI Content Safety features such as prompt shields, protected material detection, and groundedness detection. These capabilities can be leveraged as content filters with any language model included in our model catalog and developers can integrate these capabilities into their application easily through a single API. Once in production, developers can monitor their application for quality and safety, adversarial prompt attacks, and data integrity, making timely interventions with the help of real-time alerts. Phi-4 in action One example of the mathematical reasoning Phi-4 is capable of is demonstrated in this problem. Start Exploring Phi-4 is currently available on Azure AI Foundry and Hugging Face, take a look today.
ecekamar
May 01, 2025 Place Microsoft Foundry Blog
235KViews
20likes
22Comments
Week 2 . Microsoft Agents Hack Online Events and Readiness Resources
https://aka.ms/agentshack 2025 is the year of AI agents! But what exactly is an agent, and how can you build one? Whether you're a seasoned developer or just starting out, this FREE three-week virtual hackathon is your chance to dive deep into AI agent development. Register Now: https://aka.ms/agentshack 🔥 Learn from expert-led sessions streamed live on YouTube, covering top frameworks like Semantic Kernel, Autogen, the new Azure AI Agents SDK and the Microsoft 365 Agents SDK. Week 2 Events: April 14th-18th Day/Time Topic Track 4/14 08:00 AM PT Building custom engine agents with Azure AI Foundry and Visual Studio Code Copilots 4/15 07:00 AM PT Your first AI Agent in JS with Azure AI Agent Service JS 4/15 09:00 AM PT Building Agentic Applications with AutoGen v0.4 Python 4/15 12:00 PM PT AI Agents + .NET Aspire C# 4/15 03:00 PM PT Prototyping AI Agents with GitHub Models Python 4/16 04:00 AM PT Multi-agent AI apps with Semantic Kernel and Azure Cosmos DB C# 4/16 06:00 AM PT Building declarative agents with Microsoft Copilot Studio & Teams Toolkit Copilots 4/16 07:00 AM PT Prompting is the New Scripting: Meet GenAIScript JS 4/16 09:00 AM PT Building agents with an army of models from the Azure AI model catalog Python 4/16 12:00 PM PT Multi-Agent API with LangGraph and Azure Cosmos DB Python 4/16 03:00 PM PT Mastering Agentic RAG Python 4/17 06:00 AM PT Build your own agent with OpenAI, .NET, and Copilot Studio C# 4/17 09:00 AM PT Building smarter Python AI agents with code interpreters Python 4/17 12:00 PM PT Building Java AI Agents using LangChain4j and Dynamic Sessions Java 4/17 03:00 PM PT Agentic Voice Mode Unplugged Python
Lee_Stott
Apr 14, 2025 Place Microsoft Developer Community Blog
1.3KViews
0likes
0Comments