Transform Your AI Applications with Local LLM Deployment

Introduction

Are you tired of watching your AI application costs spiral out of control every time your user base grows? As AI Engineers and Developers, we've all felt the pain of cloud-dependent LLM deployments. Every API call adds up, latency becomes a bottleneck in real-time applications, and sensitive data must leave your infrastructure to get processed. Meanwhile, your users demand faster responses, better privacy, and more reliable service.

What if there was a way to run powerful language models directly on your users' devices or your local infrastructure? Enter the world of Edge AI deployment with Microsoft's Foundry Local, a game-changing approach that brings enterprise-grade LLM capabilities to local hardware while maintaining full OpenAI API compatibility.

The Edge AI for Beginners curriculum (https://aka.ms/edgeai-for-beginners) provides AI Engineers and Developers with comprehensive, hands-on training to master local LLM deployment. This isn't just another theoretical course; it's a practical guide that will transform how you think about AI infrastructure, combining cutting-edge local deployment techniques with production-ready implementation patterns.

In this post, we'll explore why Edge AI deployment represents the future of AI applications, dive deep into Foundry Local's capabilities across multiple frameworks, and show you exactly how to implement local LLM solutions that deliver both technical excellence and significant business value.

Why Edge AI Deployment Changes Everything for Developers

The shift from cloud-dependent to edge-deployed AI represents more than just a technical evolution; it's a fundamental reimagining of how we build intelligent applications. As AI Engineers, we're witnessing a transformation that addresses the most pressing challenges in modern AI deployment while opening up entirely new possibilities for innovation.

Consider the current state of cloud-based LLM deployment. Every user interaction requires a round-trip to external servers, introducing latency that can kill user experience in real-time applications. Costs scale linearly (or worse) with usage, making successful applications expensive to operate. Sensitive data must traverse networks and live temporarily in external systems, creating compliance nightmares for enterprise applications.

Edge AI deployment fundamentally changes this equation. By running models locally, we achieve several critical advantages:

Data Sovereignty and Privacy Protection: Your sensitive data never leaves your infrastructure. For healthcare applications processing patient records, financial services handling transactions, or enterprise tools managing proprietary information, this represents a quantum leap in security posture. You maintain complete control over data flow, meeting even the strictest compliance requirements without architectural compromises.

Real-Time Performance at Scale: Local inference eliminates network latency entirely. Instead of 200-500ms round-trips to cloud APIs, you get sub-10ms response times. This enables entirely new categories of applications—real-time code completion, interactive AI tutoring systems, voice assistants that respond instantly, and IoT devices that make intelligent decisions without connectivity.

Predictable Cost Structure: Transform variable API costs into fixed infrastructure investments. Instead of paying per-token for potentially unlimited usage, you invest in local hardware that serves unlimited requests.
This makes ROI calculations straightforward and removes the fear of viral success destroying your margins.

Offline Capabilities and Resilience: Local deployment means your AI features work even when connectivity fails. Mobile applications can provide intelligent features in areas with poor network coverage. Critical systems maintain AI capabilities during network outages. Edge devices in remote locations operate autonomously.

The technical implications extend beyond these obvious benefits. Local deployment enables new architectural patterns: AI-powered applications that work entirely client-side, edge computing nodes that make intelligent routing decisions, and distributed systems where intelligence lives close to data sources.

Foundry Local: Multi-Framework Edge AI Deployment Made Simple

Microsoft's Foundry Local (https://www.foundrylocal.ai) represents a breakthrough in local AI deployment, designed specifically for developers who need production-ready edge AI solutions. Unlike single-framework tools, Foundry Local provides a unified platform that works seamlessly across multiple programming languages and deployment scenarios while maintaining full compatibility with existing OpenAI-based workflows.

The platform's approach to multi-framework support means you're not locked into a single technology stack. Whether you're building TypeScript applications, Python ML pipelines, Rust systems programming projects, or .NET enterprise applications, Foundry Local provides native SDKs and consistent APIs that integrate naturally with your existing codebase.

Enterprise-Grade Model Catalog: Foundry Local comes with a curated selection of production-ready models optimized for edge deployment. The `phi-3.5-mini` model delivers impressive performance in a compact footprint, perfect for resource-constrained environments. For even tighter resource budgets, `qwen2.5-0.5b` offers a smaller-footprint option that trades some capability for efficiency. When you need maximum capability and have sufficient hardware resources, `gpt-oss-20b` offers state-of-the-art performance with full local control.

Intelligent Hardware Optimization: One of Foundry Local's most powerful features is its automatic hardware detection and optimization. The platform automatically identifies your available compute resources (NVIDIA CUDA GPUs, AMD GPUs, Intel NPUs, Qualcomm Snapdragon NPUs, or CPU-only environments) and downloads the most appropriate model variant. This means the same application code delivers optimal performance across diverse hardware configurations without manual intervention.

ONNX Runtime Acceleration: Under the hood, Foundry Local leverages Microsoft's ONNX Runtime for maximum performance. This provides significant advantages over generic inference engines, delivering optimized execution paths for different hardware architectures while maintaining model accuracy and compatibility.

OpenAI SDK Compatibility: Perhaps most importantly for developers, Foundry Local maintains complete API compatibility with the OpenAI SDK. This means existing applications can migrate to local inference by changing only the endpoint configuration—no rewriting of application logic, no learning new APIs, no disruption to existing workflows. A minimal before-and-after sketch of that configuration change appears at the end of this section.

The platform handles the complex aspects of local AI deployment automatically: model downloading, hardware-specific optimization, memory management, and inference scheduling. This allows developers to focus on building intelligent applications rather than managing AI infrastructure.
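To make the "endpoint-only" migration concrete, here is a minimal sketch in Python. It assumes the foundry-local SDK and model alias used later in this post; only the client construction changes, while the calling code stays the same.

# Before: cloud-hosted inference via the OpenAI API.
# client = openai.OpenAI(api_key="sk-...")  # cloud credential, per-token billing

# After: the same application code pointed at a local Foundry Local endpoint.
import openai
from foundry_local import FoundryLocalManager

manager = FoundryLocalManager("phi-3.5-mini")   # downloads, optimizes, and serves the model locally

client = openai.OpenAI(
    base_url=manager.endpoint,   # the only required change: swap the base URL
    api_key=manager.api_key,     # placeholder; no cloud credential is needed locally
)

response = client.chat.completions.create(
    model=manager.get_model_info("phi-3.5-mini").id,
    messages=[{"role": "user", "content": "Summarize edge AI in one sentence."}],
)
print(response.choices[0].message.content)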
Framework-Agnostic Benefits: Foundry Local's multi-framework approach delivers consistent benefits regardless of your technology choices. Whether you're working in a Node.js microservices architecture, a Python data science environment, a Rust embedded system, or a C# enterprise application, you get the same advantages: reduced latency, eliminated API costs, enhanced privacy, and offline capabilities.

This universal compatibility means teams can adopt edge AI deployment incrementally, starting with pilot projects in their preferred language and expanding across their technology stack as they see results. The learning curve is minimal because the API patterns remain familiar while the underlying infrastructure transforms to local deployment.

Implementing Edge AI: From Code to Production

Moving from cloud APIs to local AI deployment requires understanding the implementation patterns that make edge AI both powerful and practical. Let's explore how Foundry Local's SDKs enable seamless integration across different development environments, with real-world code examples that you can adapt for your production systems.

Python Implementation for Data Science and ML Pipelines

Python developers will find Foundry Local's integration particularly natural, especially in data science and machine learning contexts where local processing is often preferred for security and performance reasons.

import openai
from foundry_local import FoundryLocalManager

# Initialize with automatic hardware optimization
alias = "phi-3.5-mini"
manager = FoundryLocalManager(alias)

This simple initialization handles a remarkable amount of complexity automatically. The `FoundryLocalManager` detects your hardware configuration, downloads the most appropriate model variant for your system, and starts the local inference service. Behind the scenes, it's making intelligent decisions about memory allocation, selecting optimal execution providers, and preparing the model for efficient inference.

# Configure OpenAI client for local deployment
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # Not required for local, but maintains API compatibility
)

# Production-ready inference with streaming
def analyze_document(content: str):
    stream = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[{
            "role": "system",
            "content": "You are an expert document analyzer. Provide structured analysis."
        }, {
            "role": "user",
            "content": f"Analyze this document: {content}"
        }],
        stream=True,
        temperature=0.7
    )
    result = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            content_piece = chunk.choices[0].delta.content
            result += content_piece
            yield content_piece  # Enable real-time UI updates
    return result

Key implementation benefits here:

• Automatic model management: The `FoundryLocalManager` handles model lifecycle, memory optimization, and hardware-specific acceleration without manual configuration.
• Streaming interface compatibility: Maintains the familiar OpenAI streaming API while processing locally, enabling real-time user interfaces with zero latency overhead.
• Production error handling: The manager includes built-in retry logic, graceful degradation, and resource management for reliable production deployment.

A short usage sketch for the streaming generator follows.
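Because `analyze_document` is a generator, a caller can render output as it streams from the local model. A minimal usage sketch, with a made-up document string, might look like this:

document_text = "Q3 summary: revenue grew 12% while cloud inference costs tripled."  # hypothetical input

for piece in analyze_document(document_text):
    print(piece, end="", flush=True)  # pieces print as the local model produces them
print()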
JavaScript/TypeScript Implementation for Web Applications

JavaScript and TypeScript developers can integrate local AI capabilities directly into web applications, enabling entirely new categories of client-side intelligent features.

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

class LocalAIService {
  constructor() {
    this.foundryManager = null;
    this.openaiClient = null;
    this.isInitialized = false;
  }

  async initialize(modelAlias = "phi-3.5-mini") {
    this.foundryManager = new FoundryLocalManager();
    const modelInfo = await this.foundryManager.init(modelAlias);
    this.openaiClient = new OpenAI({
      baseURL: this.foundryManager.endpoint,
      apiKey: this.foundryManager.apiKey,
    });
    this.isInitialized = true;
    return modelInfo;
  }

The initialization pattern establishes local AI capabilities with full error handling and resource management. This enables web applications to provide AI features without external API dependencies.

  async generateCodeCompletion(codeContext, userPrompt) {
    if (!this.isInitialized) {
      throw new Error("LocalAI service not initialized");
    }
    try {
      const completion = await this.openaiClient.chat.completions.create({
        model: this.foundryManager.getModelInfo().id,
        messages: [
          {
            role: "system",
            content: "You are a code completion assistant. Provide accurate, efficient code suggestions."
          },
          {
            role: "user",
            content: `Context: ${codeContext}\n\nComplete: ${userPrompt}`
          }
        ],
        max_tokens: 150,
        temperature: 0.2
      });
      return completion.choices[0].message.content;
    } catch (error) {
      console.error("Local AI completion failed:", error);
      throw new Error("Code completion unavailable");
    }
  }
}

Implementation advantages for web applications:

• Zero-dependency AI features: Applications work entirely offline once models are downloaded, enabling AI capabilities in disconnected environments.
• Instant response times: Eliminate network latency for real-time features like code completion, content generation, or intelligent search.
• Client-side privacy: Sensitive code or content never leaves the user's device, meeting strict security requirements for enterprise development tools.

Cross-Platform Production Deployment Patterns

Both Python and JavaScript implementations share common production deployment patterns that make Foundry Local particularly suitable for enterprise applications:

Automatic Hardware Optimization: The platform automatically detects and utilizes available acceleration hardware. On systems with NVIDIA GPUs, it leverages CUDA acceleration. On newer Intel systems, it uses NPU acceleration. On ARM-based systems like Apple Silicon or Qualcomm Snapdragon, it optimizes for those architectures. This means the same application code delivers optimal performance across diverse deployment environments.

Graceful Resource Management: Foundry Local includes sophisticated memory management and resource allocation. Models are loaded efficiently, memory is recycled properly, and concurrent requests are handled intelligently to maintain system stability under load.

Production Monitoring Integration: The platform provides comprehensive metrics and logging that integrate naturally with existing monitoring systems, enabling production observability for AI workloads running at the edge.

These implementation patterns demonstrate how Foundry Local transforms edge AI from an experimental concept into a practical, production-ready deployment strategy that works consistently across different technology stacks and hardware environments.

Measuring Success: Technical Performance and Business Impact

The transition to edge AI deployment delivers measurable improvements across both technical and business metrics.
Understanding these impacts helps justify the architectural shift and demonstrates the concrete value of local LLM deployment in production environments.

Technical Performance Gains

Latency Elimination: The most immediately visible benefit is the dramatic reduction in response times. Cloud API calls typically require 200-800ms round-trips, depending on geographic location and network conditions. Local inference with Foundry Local reduces this to sub-10ms response times—a 95-99% improvement that fundamentally changes user experience possibilities. Consider a code completion feature: cloud-based completion feels sluggish and interrupts developer flow, while local completion provides instant suggestions that enhance productivity. The same applies to real-time chat applications, interactive AI tutoring systems, and any application where response latency directly impacts usability.

Automatic Hardware Utilization: Foundry Local's intelligent hardware detection and optimization delivers significant performance improvements without manual configuration. On systems with NVIDIA RTX 4000 series GPUs, inference speeds can be 10-50x faster than CPU-only processing. On newer Intel systems with NPUs, the platform automatically leverages neural processing units for efficient AI workloads. Apple Silicon systems benefit from Metal Performance Shaders optimization, delivering excellent performance per watt.

ONNX Runtime Optimization: Microsoft's ONNX Runtime provides substantial performance advantages over generic inference engines. In benchmark testing, ONNX Runtime consistently delivers 2-5x performance improvements compared to standard PyTorch or TensorFlow inference, while maintaining full model accuracy and compatibility.

Scalability Characteristics: Local deployment transforms scaling economics entirely. Instead of linear cost scaling with usage, you get horizontal scaling through hardware deployment. A single modern GPU can handle hundreds of concurrent inference requests, making per-request costs approach zero for high-volume applications.

Business Impact Analysis

Cost Structure Transformation: The financial implications of local deployment are profound. Consider an application processing 1 million tokens daily through OpenAI's API—this represents $20-60 in daily costs depending on the model. Over a year, this becomes $7,300-21,900 in recurring expenses. A comparable local deployment might require a $2,000-5,000 hardware investment with no ongoing API costs. For high-volume applications, the savings become dramatic. Applications processing 100 million tokens monthly face $60,000-180,000 annual API costs. Local deployment with appropriate hardware infrastructure could reduce this to electricity and maintenance costs—typically under $10,000 annually for equivalent processing capacity.

Enhanced Privacy and Compliance: Local deployment eliminates data sovereignty concerns entirely. Healthcare applications processing patient records, financial services handling transaction data, and enterprise tools managing proprietary information can deploy AI capabilities without data leaving their infrastructure. This simplifies compliance with GDPR, HIPAA, SOX, and other regulatory frameworks while reducing legal and security risks.

Operational Resilience: Local deployment provides significant business continuity advantages. Applications continue functioning during network outages, API service disruptions, or third-party provider issues.
For mission-critical systems, this resilience can prevent costly downtime and maintain user productivity during external service failures.

Development Velocity: Local deployment accelerates development cycles by eliminating API rate limits, usage quotas, and external dependencies during development and testing. Developers can iterate freely, run comprehensive test suites, and experiment with AI features without cost concerns or rate limiting delays.

Enterprise Adoption Metrics

Real-world enterprise deployments demonstrate measurable business value:

Internal Tooling: Teams running Foundry Local for internal AI-powered tools report a 60-80% reduction in AI-related operational costs while improving developer productivity through instant AI responses in development environments.

Manufacturing Applications: Industrial IoT deployments using edge AI for predictive maintenance show a 40-60% reduction in unplanned downtime while eliminating cloud connectivity requirements in remote facilities.

Financial Services: Trading firms deploying local LLMs for market analysis report sub-millisecond decision latencies while maintaining complete data isolation for competitive advantage and regulatory compliance.

ROI Calculation Framework

For AI Engineers evaluating edge deployment, consider these quantifiable factors (a short worked example follows this list):

Direct Cost Savings: Compare monthly API costs against hardware amortization over 24-36 months. Most applications with >$1,000 monthly API costs achieve positive ROI within 12-18 months.

Performance Value: Quantify the business impact of reduced latency. For customer-facing applications, each 100ms of latency reduction typically correlates with 1-3% conversion improvement.

Risk Mitigation: Calculate the cost of downtime or compliance violations prevented by local deployment. For many enterprise applications, avoiding a single significant outage justifies the infrastructure investment.

Development Efficiency: Measure developer productivity improvements from unlimited local AI access during development. Teams report 20-40% faster iteration cycles when AI features can be tested without external dependencies.
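A back-of-the-envelope break-even calculation, sketched in Python. The dollar figures are illustrative assumptions in the spirit of the ranges above, not measurements from a real deployment:

# Rough ROI sketch: months until a local deployment pays for itself.
monthly_api_cost = 1_200       # assumed current cloud API spend (> $1,000/month)
local_investment = 18_000      # assumed hardware plus integration/engineering effort
monthly_local_cost = 150       # assumed electricity and maintenance

monthly_savings = monthly_api_cost - monthly_local_cost
break_even_months = local_investment / monthly_savings
print(f"Break-even after ~{break_even_months:.0f} months")          # ~17 months

savings_36_months = 36 * monthly_savings - local_investment
print(f"Net savings over 36 months: ~${savings_36_months:,.0f}")    # ~$19,800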
These metrics demonstrate that edge AI deployment with Foundry Local delivers both immediate technical improvements and substantial long-term business value, making it a strategic investment in AI infrastructure that pays dividends across multiple dimensions.

Your Edge AI Journey Starts Here

The shift to edge AI represents more than just a technical evolution; it's an opportunity to fundamentally improve your applications while building valuable expertise in an emerging field. Whether you're looking to reduce costs, improve performance, or enhance privacy, the path forward involves both learning new concepts and connecting with a community of practitioners solving similar challenges.

Master Edge AI with Comprehensive Training

The Edge AI for Beginners curriculum (https://aka.ms/edgeai-for-beginners) provides the complete foundation you need to become proficient in local AI deployment. This isn't a superficial overview; it's a comprehensive, hands-on program designed specifically for developers who want to build production-ready edge AI applications.

The curriculum takes you through hours of structured learning, progressing from fundamental concepts to advanced deployment scenarios. You'll start by understanding the principles of edge AI and local inference, then dive deep into practical implementation with Foundry Local across multiple programming languages. The program includes working examples and comprehensive sample applications that demonstrate real-world use cases.

What sets this curriculum apart is its practical focus. Instead of theoretical discussions, you'll build actual applications: document analysis systems that work offline, real-time code completion tools, intelligent chatbots that protect user privacy, and IoT applications that make decisions locally. Each project teaches both the technical implementation and the architectural thinking needed for successful edge AI deployment.

The curriculum covers multi-framework deployment patterns extensively, ensuring you can apply edge AI principles regardless of your preferred development stack. Whether you're working in Python data science environments, JavaScript web applications, C# enterprise systems, or Rust embedded projects, you'll learn the patterns and practices that make edge AI successful.

Join a Community of AI Engineers

Learning edge AI doesn't happen in isolation; it requires connection with other developers who are solving similar challenges and discovering new possibilities. The Foundry Local Discord community (https://aka.ms/foundry-local-discord) provides exactly this environment, connecting AI Engineers and Developers from around the world who are implementing local AI solutions.

This community serves multiple crucial functions for your development as an edge AI practitioner. You'll find experienced developers sharing implementation patterns they've discovered, debugging complex deployment issues collaboratively, and discussing the architectural decisions that make edge AI successful in production environments.

The Discord community includes dedicated channels for different programming languages, specific deployment scenarios, and technical discussions about optimization and performance. Whether you're implementing your first local AI feature or optimizing a complex multi-model deployment, you'll find peers and experts ready to help problem-solve and share insights.

Beyond technical support, the community provides valuable career and business insights. Members share their experiences with edge AI adoption in different industries, discuss the business cases that have proven most successful, and collaborate on open-source projects that advance the entire ecosystem.

Share Your Experience and Build Expertise

One of the most effective ways to solidify your edge AI expertise is by sharing your implementation experiences with the community. As you build applications with Foundry Local and deploy edge AI solutions, documenting your process and sharing your learnings provides value both to others and to your own professional development.

Consider sharing your deployment stories, whether they're successes or challenges you've overcome. The community benefits from real-world case studies that show how edge AI performs in different environments and use cases. Your experience implementing local AI in a healthcare application, financial services system, or manufacturing environment provides valuable insights that others can build upon.

Technical contributions are equally valuable, whether it's sharing configuration patterns you've discovered, performance optimizations you've implemented, or integration approaches you've developed for specific frameworks or libraries. The edge AI field is evolving rapidly, and practical contributions from working developers drive much of the innovation. Sharing your work also builds your professional reputation as an edge AI expert.
As organizations increasingly adopt local AI deployment strategies, developers with proven experience in this area become valuable resources for their teams and the broader industry. The combination of structured learning through the Edge AI curriculum, active participation in the community, and sharing your practical experiences creates a comprehensive path to edge AI expertise that serves both your immediate project needs and your long-term career development as AI deployment patterns continue evolving.

Key Takeaways

• Local LLM deployment transforms application economics: Replace variable API costs with fixed infrastructure investments that scale to unlimited usage, typically achieving ROI within 12-18 months for applications with significant AI workloads.
• Foundry Local enables multi-framework edge AI: Consistent deployment patterns across Python, JavaScript, C#, and Rust environments with automatic hardware optimization and OpenAI API compatibility.
• Performance improvements are dramatic and measurable: Sub-10ms response times replace 200-800ms cloud API latency, while automatic hardware acceleration delivers 2-50x performance improvements depending on available compute resources.
• Privacy and compliance become architectural advantages: Local deployment eliminates data sovereignty concerns, simplifies regulatory compliance, and provides complete control over sensitive information processing.
• Edge AI expertise is a strategic career investment: As organizations increasingly adopt local AI deployment, developers with hands-on edge AI experience become valuable technical resources with unique skills in an emerging field.

Conclusion

Edge AI deployment represents the next evolution in intelligent application development, transforming both the technical possibilities and economic models of AI-powered systems. With Foundry Local and the comprehensive Edge AI for Beginners curriculum, you have access to production-ready tools and expert guidance to make this transition successfully.

The path forward is clear: start with the Edge AI for Beginners curriculum to build solid foundations, connect with the Foundry Local Discord community to learn from practicing developers, and begin implementing local AI solutions in your projects. Each step builds valuable expertise while delivering immediate improvements to your applications.

As cloud costs continue rising and privacy requirements become more stringent, organizations will increasingly rely on developers who can implement local AI solutions effectively. Your early adoption of edge AI deployment patterns positions you at the forefront of this technological shift, with skills that will become increasingly valuable as the industry evolves.

The future of AI deployment is local, private, and performance-optimized. Start building that future today.
Resources

• Edge AI for Beginners Curriculum (https://aka.ms/edgeai-for-beginners): Comprehensive training with 36-45 hours of hands-on content, examples, and production-ready deployment patterns.
• Foundry Local GitHub Repository (https://github.com/microsoft/foundry_local): Official documentation, samples, and community contributions for local AI deployment.
• Foundry Local Discord Community (https://aka.ms/foundry/discord): Connect with AI Engineers and Developers implementing edge AI solutions worldwide.
• Foundry Local Documentation: Complete technical documentation and API references (Foundry Local documentation | Microsoft Learn).
• Foundry Local Model Catalog: Browse available models and deployment options for different hardware configurations (Foundry Local Models - Browse AI Models).
- Essential Microsoft Resources for MVPs & the Tech Community from the AI Tour

Unlock the power of Microsoft AI with redeliverable technical presentations, hands-on workshops, and open-source curriculum from the Microsoft AI Tour! Whether you're a Microsoft MVP, Developer, or IT Professional, these expertly crafted resources empower you to teach, train, and lead AI adoption in your community. Explore top breakout sessions covering GitHub Copilot, Azure AI, Generative AI, and security best practices—designed to simplify AI integration and accelerate digital transformation. Dive into interactive workshops that provide real-world applications of AI technologies. Take it a step further with Microsoft's Open-Source AI Curriculum, offering beginner-friendly courses on AI, Machine Learning, Data Science, Cybersecurity, and GitHub Copilot—perfect for upskilling teams and fostering innovation. Don't just learn—lead. Access these resources, host impactful training sessions, and drive AI adoption in your organization. Start sharing today! Explore now: Microsoft AI Tour Resources.
- Join Us for a Technical Deep Dive and Q&A on Foundry Local - LLMs on Device

Join us for an Ask Me Anything with the Foundry Local team on October 14th, 2025! Discover how Foundry Local is redefining edge AI with powerful features like on-device inference, enabling you to run models directly on your hardware, cutting costs and keeping your data secure. Whether you're customizing models to fit unique use cases or integrating seamlessly via SDKs, APIs, or the CLI, Foundry Local offers scalable pathways to Azure AI Foundry as your needs evolve. It's the perfect solution for environments with limited connectivity, sensitive data requirements, low-latency demands, or early-stage experimentation before cloud deployment. If you're building smarter, leaner, and more private AI workflows, this AMA is your chance to dive deep with the team behind it all.

What is Foundry Local? Foundry Local is a set of development tools designed to help you build and evaluate LLM applications on your local machine. It provides a curated collection of production-quality tools, including evaluation and prompt engineering capabilities, that are fully compatible with Azure AI. This allows for a seamless transition of your work from your local environment to the cloud. Don't miss this opportunity to connect with our experts and enhance your understanding of local LLM development. Foundry Local is an on-device AI inference solution offering performance, privacy, customization, and cost advantages. It integrates seamlessly into your existing workflows and applications through an intuitive CLI, SDK, and REST API.

Key features:
• On-Device Inference: Run models locally on your own hardware, reducing your costs while keeping all your data on your device.
• Model Customization: Select from preset models or use your own to meet specific requirements and use cases.
• Cost Efficiency: Eliminate recurring cloud service costs by using your existing hardware, making AI more accessible.
• Seamless Integration: Connect with your applications through an SDK, API endpoints, or the CLI, with easy scaling to Azure AI Foundry as your needs grow.

How to Join: Register to join the Azure AI Foundry Discord Community Event, 14th Oct 2025, 9am Pacific Time (UTC−08:00).

Unlock Accelerated Local LLM Development: Discover how Foundry Local can enhance your development process and explore the possibilities for building robust LLM applications. Whether you're a seasoned AI developer or just getting started, this session is your chance to get hands-on insights into the innovative world of Azure AI Foundry.

Event Highlights:
• An in-depth overview of the Foundry Local CLI and SDK.
• Interactive demo with step-by-step examples.
• Best practices for local AI inference and models.
• Transitioning your local development to cloud solutions, or vice-versa.

Why Attend?
• Gain expert insights into Foundry Local and ask questions about using Foundry Local.
• Network with fellow AI professionals and developers in the Azure AI Foundry community.
• Enhance your AI development skills with practical examples.
• Stay at the forefront of LLM application development.

Speakers: Maanav Dalal, Product Manager, Foundry Local, Microsoft. Maanav is a PM on the AI Frameworks team. He's super inquisitive about the ways you use AI in daily life, so be encouraged to strike up a conversation with him about that. LinkedIn Profile
- Introducing Azure AI Travel Agents: A Flagship MCP-Powered Sample for AI Travel Solutions

We are excited to introduce AI Travel Agents, a sample application with enterprise functionality that demonstrates how developers can coordinate multiple AI agents (written in multiple languages) to explore travel planning scenarios. It's built with LlamaIndex.TS for agent orchestration, Model Context Protocol (MCP) for structured tool interactions, and Azure Container Apps for scalable deployment.

TL;DR: Experience the power of MCP and Azure Container Apps with The AI Travel Agents! Try out the live demo locally on your computer for free to see real-time agent collaboration in action. Share your feedback on our community forum. We're already planning enhancements, like new MCP-integrated agents, enabling secure communication between the AI agents and MCP servers, and more. NOTE: This example uses mock data and is intended for demonstration purposes rather than production use.

The Challenge: Scaling Personalized Travel Planning
Travel agencies grapple with complex tasks: analyzing diverse customer needs, recommending destinations, and crafting itineraries, all while integrating real-time data like trending spots or logistics. Traditional systems falter with latency, scalability, and coordination, leading to delays and frustrated clients. The AI Travel Agents tackles these issues with a technical trifecta: LlamaIndex.TS orchestrates six AI agents for efficient task handling. MCP equips agents with travel-specific data and tools. Azure Container Apps ensures scalable, serverless deployment. This architecture delivers operational efficiency and personalized service at scale, transforming chaos into opportunity.

LlamaIndex.TS: Orchestrating AI Agents
The heart of The AI Travel Agents is LlamaIndex.TS, a powerful agentic framework that orchestrates multiple AI agents to handle travel planning tasks. Built on a Node.js backend, LlamaIndex.TS manages agent interactions in a seamless and intelligent manner: Task Delegation: The Triage Agent analyzes queries and routes them to specialized agents, like the Itinerary Planning Agent, ensuring efficient workflows. Agent Coordination: LlamaIndex.TS maintains context across interactions, enabling coherent responses for complex queries, such as multi-city trip plans. LLM Integration: Connects to Azure OpenAI, GitHub Models, or any local LLM using Foundry Local for advanced AI capabilities. LlamaIndex.TS's modular design supports extensibility, allowing new agents to be added with ease. LlamaIndex.TS is the conductor, ensuring agents work in sync to deliver accurate, timely results. Its lightweight orchestration minimizes latency, making it ideal for real-time applications.

MCP: Fueling Agents with Data and Tools
The Model Context Protocol (MCP) empowers AI agents by providing travel-specific data and tools, enhancing their functionality. MCP acts as a data and tool hub: Real-Time Data: Supplies up-to-date travel information, such as trending destinations or seasonal events, via the Web Search Agent using Bing Search. Tool Access: Connects agents to external tools, like the .NET-based customer queries analyzer for sentiment analysis, the Python-based itinerary planning for trip schedules, or destination recommendation tools written in Java. For example, when the Destination Recommendation Agent needs current travel trends, MCP delivers them via the Web Search Agent. This modularity allows new tools to be integrated seamlessly, future-proofing the platform; a minimal sketch of what such an MCP tool server can look like in Python follows.
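The sketch below uses the MCP Python SDK; the tool name and logic are made up for illustration, and the sample's real itinerary-planning server is considerably more involved.

# Minimal MCP tool server sketch (hypothetical tool, mock logic).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("itinerary-planning")

@mcp.tool()
def plan_itinerary(destination: str, days: int) -> str:
    """Return a rough day-by-day plan for a destination (mock data only)."""
    return "\n".join(f"Day {d}: explore {destination}" for d in range(1, days + 1))

if __name__ == "__main__":
    mcp.run()  # serve over stdio so an MCP-aware agent can discover and call the tool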
MCP's role is to enrich agent capabilities, leaving orchestration to LlamaIndex.TS.

Azure Container Apps: Scalability and Resilience
Azure Container Apps powers The AI Travel Agents sample application with a serverless, scalable platform for deploying microservices. It ensures the application handles varying workloads with ease: Dynamic Scaling: Automatically adjusts container instances based on demand, managing booking surges without downtime. Polyglot Microservices: Supports .NET (Customer Query), Python (Itinerary Planning), Java (Destination Recommendation), and Node.js services in isolated containers. Observability: Integrates tracing, metrics, and logging, enabling real-time monitoring. Serverless Efficiency: Abstracts infrastructure, reducing costs and accelerating deployment. Azure Container Apps' global infrastructure delivers low-latency performance, critical for travel agencies serving clients worldwide.

The AI Agents: A Quick Look
While MCP and Azure Container Apps are the stars, they support a team of multiple AI agents that drive the application's functionality. Built and orchestrated with LlamaIndex.TS via MCP, these agents collaborate to handle travel planning tasks: Triage Agent: Directs queries to the right agent, leveraging MCP for task delegation. Customer Query Agent: Analyzes customer needs (emotions, intents), using .NET tools. Destination Recommendation Agent: Suggests tailored destinations, using Java. Itinerary Planning Agent: Crafts efficient itineraries, powered by Python. Web Search Agent: Fetches real-time data via Bing Search. These agents rely on MCP's real-time communication and Azure Container Apps' scalability to deliver responsive, accurate results. It's worth noting, though, that this sample application uses mock data for demonstration purposes. In a real-world scenario, the application would communicate with an MCP server plugged into a real production travel API.

Key Features and Benefits
The AI Travel Agents offers features that showcase the power of MCP and Azure Container Apps: Real-Time Chat: A responsive Angular UI streams agent responses via MCP's SSE, ensuring fluid interactions. Modular Tools: MCP enables tools like analyze_customer_query to integrate seamlessly, supporting future additions. Scalable Performance: Azure Container Apps ensures the UI, backend, and the MCP servers handle high traffic effortlessly. Transparent Debugging: An accordion UI displays agent reasoning, providing backend insights. Benefits: Efficiency: LlamaIndex.TS streamlines operations. Personalization: MCP's data drives tailored recommendations. Scalability: Azure ensures reliability at scale.

Thank You to Our Contributors!
The AI Travel Agents wouldn't exist without the incredible work of our contributors. Their expertise in MCP development, Azure deployment, and AI orchestration brought this project to life. A special shoutout to: Pamela Fox – Leading the development of the Python MCP server. Aaron Powell and Justin Yoo – Leading the development of the .NET MCP server. Rory Preddy – Leading the development of the Java MCP server. Lee Stott and Kinfey Lo – Leading the development of the local AI integration with Foundry Local. Anthony Chu and Vyom Nagrani – Leading the Azure Container Apps roadmap. Matt Soucoup and Julien Dubois – Leading the ACA DevRel strategy. Wassim Chegham – Architected MCP and backend orchestration. And many more! See the GitHub repository for all contributors. Thank you for your dedication to pushing the boundaries of AI and cloud technology!
Try It Out
Experience the power of MCP and Azure Container Apps with The AI Travel Agents! Try out the live demo locally on your computer for free to see real-time agent collaboration in action.

Conclusion
Developers can explore the open-source project on GitHub today, with setup and deployment instructions. Share your feedback on our community forum. We're already planning enhancements, like new MCP-integrated agents, enabling secure communication between the AI agents and MCP servers, and more. This is still a work in progress, and we welcome all kinds of contributions. Please fork and star the repo to stay tuned for updates! ◾️ We would love your feedback; continue the discussion in the Azure AI Foundry Discord: aka.ms/foundry/discord. On behalf of the Microsoft DevRel Team.
- Week 2. Microsoft Agents Hack Online Events and Readiness Resources

https://aka.ms/agentshack

2025 is the year of AI agents! But what exactly is an agent, and how can you build one? Whether you're a seasoned developer or just starting out, this FREE three-week virtual hackathon is your chance to dive deep into AI agent development. Register Now: https://aka.ms/agentshack 🔥 Learn from expert-led sessions streamed live on YouTube, covering top frameworks like Semantic Kernel, Autogen, the new Azure AI Agents SDK and the Microsoft 365 Agents SDK.

Week 2 Events: April 14th-18th (Day/Time | Topic | Track)
4/14 08:00 AM PT | Building custom engine agents with Azure AI Foundry and Visual Studio Code | Copilots
4/15 07:00 AM PT | Your first AI Agent in JS with Azure AI Agent Service | JS
4/15 09:00 AM PT | Building Agentic Applications with AutoGen v0.4 | Python
4/15 12:00 PM PT | AI Agents + .NET Aspire | C#
4/15 03:00 PM PT | Prototyping AI Agents with GitHub Models | Python
4/16 04:00 AM PT | Multi-agent AI apps with Semantic Kernel and Azure Cosmos DB | C#
4/16 06:00 AM PT | Building declarative agents with Microsoft Copilot Studio & Teams Toolkit | Copilots
4/16 07:00 AM PT | Prompting is the New Scripting: Meet GenAIScript | JS
4/16 09:00 AM PT | Building agents with an army of models from the Azure AI model catalog | Python
4/16 12:00 PM PT | Multi-Agent API with LangGraph and Azure Cosmos DB | Python
4/16 03:00 PM PT | Mastering Agentic RAG | Python
4/17 06:00 AM PT | Build your own agent with OpenAI, .NET, and Copilot Studio | C#
4/17 09:00 AM PT | Building smarter Python AI agents with code interpreters | Python
4/17 12:00 PM PT | Building Java AI Agents using LangChain4j and Dynamic Sessions | Java
4/17 03:00 PM PT | Agentic Voice Mode Unplugged | Python
- Getting Started with the AI Dev Gallery

March Update: The Gallery is now available on the Microsoft Store! The AI Dev Gallery is a new open-source project designed to inspire and support developers in integrating on-device AI functionality into their Windows apps. It offers an intuitive UX for exploring and testing interactive AI samples powered by local models. Key features include: Quickly explore and download models from well-known sources on GitHub and HuggingFace. Test different models with interactive samples across more than 25 different scenarios, including text, image, audio, and video use cases. See all relevant code and library references for every sample. Switch between models that run on CPU and GPU depending on your device capabilities. Quickly get started with your own projects by exporting any sample to a fresh Visual Studio project that references the same model cache, preventing duplicate downloads. Part of the motivation behind the Gallery was exposing developers to the host of benefits that come with on-device AI. Some of these benefits include improved data security and privacy, increased control and parameterization, and no dependence on an internet connection or third-party cloud provider.

Requirements
Device Requirements: Minimum OS Version: Windows 10, version 1809 (10.0; Build 17763). Architecture: x64, ARM64. Memory: At least 16 GB is recommended. Disk Space: At least 20 GB of free space is recommended. GPU: 8 GB of VRAM is recommended for running samples on the GPU.

Using the Gallery
The AI Dev Gallery can be navigated in two ways: the Samples View and the Models View.

Navigating Samples
In this view, samples are broken up into categories (Text, Code, Image, etc.) and then into more specific samples, like the Translate Text sample. On clicking a sample, you will be prompted to choose a model to download if you haven't run this sample before. Next to the model you can see the size of the model, whether it will run on CPU or GPU, and the associated license. Pick the model that makes the most sense for your machine. You can also download new models and change the model for a sample later from the sample view; just click the model drop-down at the top of the sample. The last thing you can do from the Sample pane is view the sample code and export the project to Visual Studio. Both buttons are found in the top right corner of the sample.

Navigating Models
If you would rather navigate by models instead of samples, the Gallery also provides the model view. The model view contains a similar navigation menu on the right to navigate between models based on category. Clicking on a model will allow you to see a description of the model, the versions of it that are available to download, and the samples that use the model. Clicking on a sample will take you back over to the samples view, where you can see the model in action.

Deleting and Managing Models
If you need to clear up space or see download details for the models you are using, you can head over to the Settings page to manage your downloads. From here, you can easily see every model you have downloaded and how much space on your drive they are taking up. You can clear your entire cache for a fresh start or delete individual models that you are no longer using. Any deleted model can be redownloaded through either the models or samples view.
Next Steps for the Gallery
The AI Dev Gallery is still a work in progress, and we plan on adding more samples, models, APIs, and features; we are also evaluating adding support for NPUs to take the experience even further. If you have feedback, notice a bug, or have any ideas for features or samples, head over to the issue board and submit an issue. We also have a discussion board for any other topics relevant to the Gallery. The Gallery is an open-source project, and we would love contributions, feedback, and ideation! Happy modeling!
- Getting Started - Generative AI with Phi-3-mini: Running Phi-3-mini on an Intel AI PC

In 2024, with the empowerment of AI, we will enter the era of the AI PC. On May 20, Microsoft also released the concept of the Copilot+ PC, which means that PCs can run SLMs/LLMs more efficiently with the support of an NPU. We can use models from the Phi-3 family, combined with the new AI PC, to build a simple personalized Copilot application for individuals. This content combines Intel's AI PC with Intel's OpenVINO, the NPU Acceleration Library, and Microsoft's DirectML to create a local Copilot.
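As a taste of the OpenVINO path, here is a hedged Python sketch using optimum-intel to run Phi-3-mini locally. The model id and generation settings are assumptions for illustration; the NPU Acceleration Library and DirectML approaches covered in the article use different APIs.

from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the model to OpenVINO IR on first load; an Intel GPU or NPU
# device can be targeted where supported instead of the default CPU.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Write a haiku about running AI locally.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))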
- AI Toolkit for Visual Studio Code: October 2024 Update Highlights

The AI Toolkit's October 2024 update revolutionizes Visual Studio Code with game-changing features for developers, researchers, and enthusiasts. Explore multi-model integration, including GitHub Models, ONNX, and Google Gemini, alongside custom model support. Dive into multi-modal capabilities for richer AI testing and seamless multi-platform compatibility across Windows, macOS, and Linux. Tailored for productivity, the enhanced Model Catalog simplifies choosing the best tools for your projects. Try it now and share feedback to shape the future of AI in VS Code!
- Getting started with Microsoft Phi-3-mini - Try running Phi-3-mini on iPhone with ONNX Runtime

In this article, we explore how to deploy generative AI applications to mobile devices, specifically the iPhone, using ONNX Runtime. We cover the steps to compile ONNX Runtime for iOS and then create an app in Xcode. We also show you how to copy the ONNX quantized INT4 model into the project and add the C++ API to generate text. This is a preliminary exploration of deploying generative AI on mobile devices, but it provides a good starting point for further development.
- Accelerate the development of Generative AI applications with GitHub Models

Introducing GitHub Models, a new feature for over 100 million developers to become AI engineers using top AI models. Access models like Llama 3.1, GPT-4o, GPT-4o mini, Phi 3, and Mistral Large 2 through a built-in playground on GitHub. Test prompts and model settings for free. When ready, seamlessly integrate models into Codespaces and VS Code. For production, Azure AI offers responsible AI, enterprise-grade security, data privacy, and global availability. Models are accessible in over 25 Azure regions with provisioned throughput. Now, building and running your AI application is easier than ever.
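For readers who want to go beyond the playground, here is a hedged sketch of calling a GitHub Models endpoint with the OpenAI SDK. The base URL, model name, and token scope reflect the preview as understood at the time of writing and may have changed; treat them as assumptions and check the GitHub Models documentation.

import os
from openai import OpenAI

# GITHUB_TOKEN is a GitHub personal access token with access to GitHub Models (assumed).
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # assumed GitHub Models endpoint
    api_key=os.environ["GITHUB_TOKEN"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # one of the catalog models mentioned above
    messages=[{"role": "user", "content": "Say hello from GitHub Models."}],
)
print(response.choices[0].message.content)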