ai foundry

31 Topics

AI Toolkit for VS Code October Update
We're thrilled to bring you the October update for the AI Toolkit for Visual Studio Code! This month marks another major milestone with version 0.24.0, introducing groundbreaking GitHub Copilot Tools Integration and additional user experience enhancements that make AI-powered development more seamless than ever. Let's dive into what's new! 👇 🚀 GitHub Copilot Tools Integration We are excited to announce the integration of GitHub Copilot Tools into AI Toolkit for VS Code. This integration empowers developers to build AI-powered applications more efficiently by leveraging Copilot's capabilities enhanced by AI Toolkit. 🤖 AI Agent Code Generation Tool This powerful tool provides best practices, guidance, steps, and code samples on Microsoft Agent Framework for GitHub Copilot to better scaffold AI agent applications. Whether you're building your first agent or scaling complex multi-agent systems, this tool ensures you follow the latest best practices and patterns. 📊 AI Agent Evaluation Planner Tool Building great AI agents requires thorough evaluation. This tool guides users through the complete process of evaluating AI agents, including: Defining evaluation metrics - Establish clear success criteria for your agents Creating evaluation datasets - Generate comprehensive test datasets Analyzing results - Understand your agent's performance and areas for improvement The Evaluation Planner works seamlessly with two specialized sub-tools: 🏃‍♂️ Evaluation Agent Runner Tool This tool runs agents on provided datasets and collects results, making it easy to test your agents at scale across multiple scenarios and use cases. 💻 Evaluation Code Generation Tool Get best practices, guidance, steps, and code samples on Azure AI Foundry Evaluation Framework for GitHub Copilot to better scaffold code for evaluating AI agents. 🎯 Easy Access and Usage You can access these powerful tools in two convenient ways: Direct GitHub Copilot Integration: Simply enter prompts like: Create an AI agent using Microsoft Agent Framework to help users plan a trip to Paris. Evaluate the performance of my AI agent using Azure AI Foundry Evaluation Framework. AI Toolkit Tree View: For quick access, find these tools in the AI Toolkit Tree View UI under the section `Build Agent with GitHub Copilot`. ✨ Additional Enhancements 🎨 Model Playground Improvements The user experience in Model Playground has been significantly enhanced: Resizable Divider: The divider between chat output and model settings is now resizable, allowing you to customize your workspace layout for better usability and productivity. 📚 Model Catalog Updates We've unified and streamlined the model discovery experience: Unified Local Models: The ONNX models section in the Model Catalog has been merged with Foundry Local Models on macOS and Windows platforms, providing a unified experience for discovering and selecting local models. Simplified Navigation: Find all your local model options in one place, making it easier to compare and select the right model for your use case. ## 🌟 Why This Release Matters Version 0.24.0 represents a significant step forward in making AI development more accessible and efficient: Seamless Integration: The deep integration with GitHub Copilot means AI best practices are now available right where you're already working. End-to-End Workflow: From agent creation to evaluation, you now have comprehensive tooling that guides you through the entire AI development lifecycle. Enhanced Productivity: Improved UI elements and unified experiences reduce friction and help you focus on building great AI applications. 🚀 Get Started and Share Your Feedback Ready to experience the future of AI development? Here's how to get started: 📥 Download: Install the AI Toolkit from the Visual Studio Code Marketplace 📖 Learn: Explore our comprehensive AI Toolkit Documentation 🔍 Discover: Check out the complete changelog for v0.24.0 We'd love to hear from you! Whether it's a feature request, bug report, or feedback on your experience, join the conversation and contribute directly on our GitHub repository. 🎯 What's Next? This release sets the foundation for even more exciting developments ahead. The GitHub Copilot Tools Integration opens up new possibilities for AI-assisted development, and we're just getting started. Stay tuned for more updates, and let's continue building the future of AI agent development together! 💡💬 Happy coding, and see you next month! 🚀
junjieli
Oct 24, 2025 Place Microsoft Developer Community Blog
211Views
1like
0Comments
LangChain v1 is now generally available!
Today LangChain v1 officially launches and marks a new era for the popular AI agent library. The new version ushers in a more streamlined, and extensible foundation for building agentic LLM applications. In this post we'll breakdown what’s new, what changed, and what “general availability” means in practice. Join Microsoft Developer Advocates, Marlene Mhangami and Yohan Lasorsa, to see live demos of the new API and find out more about what JavaScript and Python developers need to know about v1. Register for this event here. Why v1? The Motivation Behind the Redesign The number of abstractions in LangChain had grown over the years to include chains, agents, tools, wrappers, prompt helpers and more, which, while powerful, introduced complexity and fragmentation. As model APIs evolve (multimodal inputs, richer structured output, tool-calling semantics), LangChain needed a cleaner, more consistent core to ensure production ready stability. In v1: All existing chains and agent abstractions in the old LangChain are deprecated; they are replaced by a single high-level agent abstraction built on LangGraph internals. LangGraph becomes the foundational runtime for durable, stateful, orchestrated execution. LangChain now emphasizes being the “fast path to agents” that doesn’t hide but builds upon LangGraph. The internal message format has been upgraded to support standard content blocks (e.g. text, reasoning, citations, tool calls) across model providers, decoupling “content” from raw strings. Namespace cleanup: the langchain package now focuses tightly on core abstractions (agents, models, messages, tools), while legacy patterns are moved into langchain-classic (or equivalents). What’s New & Noteworthy for Developers Here are key changes developers should pay attention to: 1. create_agent becomes the default API The create_agent function is now the idiomatic way to spin up agents in v1. It replaces older constructs (e.g. create_react_agent) with a clearer, more modular API. You can also now compose middleware around model calls, tool calls, before/after hooks, error handling, etc. 2. Standard content blocks & normalized message model One of LangChain's greatest stregnth's is it's model agnosticism. Content blocks move to standardize all outputs, so developers know exactly what to expect regardless of the model they are using. Responses from models are no longer opaque strings. Instead, they carry structured `content_blocks` which classify parts of the output (e.g. “text”, “reasoning”, “citation”, “tool_call”). 3. Multimodal and richer model inputs / outputs LangChain continues to support more than just text-based interactions, but in a more comprehensive way in v1. Models can accept and return files, images, video, etc., and the message format reflects this flexibility. This upgrade prepares us well for the next generation of models with mixed modalities (vision, audio, etc.). 4. Middleware hooks Because create_agent is designed as a pluggable pipeline, developers can now inject logic before/after model calls, before tool calls and more. New middleware such as 'human in the loop' and 'summarization' middleware have been added. This is a feature of the new package that I am most excited about it! Even with the simplified agents API, this option provides more room to customize workflows! Developers can try pre-built middleware or make their own. 5. Simplified, leaner namespace Many formerly top-level modules or helper classes have been removed or relocated to langchain-classic (or similarly stamped “legacy”) to declutter the main API surface. A migration guide is available to help projects transition from v0 to v1. While v1 is now the main line, older v0 is still documented and maintained for compatibility. What “General Availability” Means (and Doesn’t) v1 is production-ready, after testing the alpha version. The stable v0 release line remains supported for those unwilling or unable to migrate immediately. Breaking changes in public APIs will be accompanied by version bumps (i.e. minor version increments) and deprecation notices. The roadmap anticipates minor versions every 2–3 months (with patch releases more frequently). Because the field of LLM applications is evolving rapidly, the team expects continued iterations in v1—even in GA mode—with users encouraged to surface feedback, file issues, and adopt the migration path. (This is in line with the philosophy stated in docs.) Developer Callouts & Suggested Steps Some things we recommend for developers to do to get started with v1: Try the new API Now! LangChain Azure AI and Azure OpenAI have migrated to LangChain v1 and are ready to test! Learn more about using LangChain and Azure AI: Python: https://docs.langchain.com/oss/python/integrations/providers/azure_ai JavaScript: https://docs.langchain.com/oss/javascript/integrations/providers/microsoft Join us for a Live Stream on Wednesday 22 October 2025 Join Microsoft Developer Advocates Marlene Mhangami and Yohan Lasorsa for a livestream this Wednesday to see live demos and find out more about what JavaScript and Python developers need to know about v1. Register for this event here.
mmhangami
Oct 23, 2025 Place Microsoft Developer Community Blog
692Views
0likes
0Comments
Transform Your AI Applications with Local LLM Deployment
Introduction Are you tired of watching your AI application costs spiral out of control every time your user base grows? As AI Engineers and Developers, we've all felt the pain of cloud-dependent LLM deployments. Every API call adds up, latency becomes a bottleneck in real-time applications, and sensitive data must leave your infrastructure to get processed. Meanwhile, your users demand faster responses, better privacy, and more reliable service. What if there was a way to run powerful language models directly on your users' devices or your local infrastructure? Enter the world of Edge AI deployment with Microsoft's Foundry Local a game-changing approach that brings enterprise-grade LLM capabilities to local hardware while maintaining full OpenAI API compatibility. The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides AI Engineers and Developers with comprehensive, hands-on training to master local LLM deployment. This isn't just another theoretical course, it's a practical guide that will transform how you think about AI infrastructure, combining cutting-edge local deployment techniques with production-ready implementation patterns. In this post, we'll explore why Edge AI deployment represents the future of AI applications, dive deep into Foundry Local's capabilities across multiple frameworks, and show you exactly how to implement local LLM solutions that deliver both technical excellence and significant business value. Why Edge AI Deployment Changes Everything for Developers The shift from cloud-dependent to edge-deployed AI represents more than just a technical evolution, it's a fundamental reimagining of how we build intelligent applications. As AI Engineers, we're witnessing a transformation that addresses the most pressing challenges in modern AI deployment while opening up entirely new possibilities for innovation. Consider the current state of cloud-based LLM deployment. Every user interaction requires a round-trip to external servers, introducing latency that can kill user experience in real-time applications. Costs scale linearly (or worse) with usage, making successful applications expensive to operate. Sensitive data must traverse networks and live temporarily in external systems, creating compliance nightmares for enterprise applications. Edge AI deployment fundamentally changes this equation. By running models locally, we achieve several critical advantages: Data Sovereignty and Privacy Protection: Your sensitive data never leaves your infrastructure. For healthcare applications processing patient records, financial services handling transactions, or enterprise tools managing proprietary information, this represents a quantum leap in security posture. You maintain complete control over data flow, meeting even the strictest compliance requirements without architectural compromises. Real-Time Performance at Scale: Local inference eliminates network latency entirely. Instead of 200-500ms round-trips to cloud APIs, you get sub-10ms response times. This enables entirely new categories of applications—real-time code completion, interactive AI tutoring systems, voice assistants that respond instantly, and IoT devices that make intelligent decisions without connectivity. Predictable Cost Structure: Transform variable API costs into fixed infrastructure investments. Instead of paying per-token for potentially unlimited usage, you invest in local hardware that serves unlimited requests. This makes ROI calculations straightforward and removes the fear of viral success destroying your margins. Offline Capabilities and Resilience: Local deployment means your AI features work even when connectivity fails. Mobile applications can provide intelligent features in areas with poor network coverage. Critical systems maintain AI capabilities during network outages. Edge devices in remote locations operate autonomously. The technical implications extend beyond these obvious benefits. Local deployment enables new architectural patterns: AI-powered applications that work entirely client-side, edge computing nodes that make intelligent routing decisions, and distributed systems where intelligence lives close to data sources. Foundry Local: Multi-Framework Edge AI Deployment Made Simple Microsoft's Foundry Local https://www.foundrylocal.ai represents a breakthrough in local AI deployment, designed specifically for developers who need production-ready edge AI solutions. Unlike single-framework tools, Foundry Local provides a unified platform that works seamlessly across multiple programming languages and deployment scenarios while maintaining full compatibility with existing OpenAI-based workflows. The platform's approach to multi-framework support means you're not locked into a single technology stack. Whether you're building TypeScript applications, Python ML pipelines, Rust systems programming projects, or .NET enterprise applications, Foundry Local provides native SDKs and consistent APIs that integrate naturally with your existing codebase. Enterprise-Grade Model Catalog: Foundry Local comes with a curated selection of production-ready models optimized for edge deployment. The `phi-3.5-mini` model delivers impressive performance in a compact footprint, perfect for resource-constrained environments. For applications requiring more sophisticated reasoning, `qwen2.5-0.5b` provides enhanced capabilities while maintaining efficiency. When you need maximum capability and have sufficient hardware resources, `gpt-oss-20b` offers state-of-the-art performance with full local control. Intelligent Hardware Optimization: One of Foundry Local's most powerful features is its automatic hardware detection and optimization. The platform automatically identifies your available compute resources, NVIDIA CUDA GPUs, AMD GPUs, Intel NPUs, Qualcomm Snapdragon NPUs, or CPU-only environments and downloads the most appropriate model variant. This means the same application code delivers optimal performance across diverse hardware configurations without manual intervention. ONNX Runtime Acceleration: Under the hood, Foundry Local leverages Microsoft's ONNX Runtime for maximum performance. This provides significant advantages over generic inference engines, delivering optimized execution paths for different hardware architectures while maintaining model accuracy and compatibility. OpenAI SDK Compatibility: Perhaps most importantly for developers, Foundry Local maintains complete API compatibility with the OpenAI SDK. This means existing applications can migrate to local inference by changing only the endpoint configuration—no rewriting of application logic, no learning new APIs, no disruption to existing workflows. The platform handles the complex aspects of local AI deployment automatically: model downloading, hardware-specific optimization, memory management, and inference scheduling. This allows developers to focus on building intelligent applications rather than managing AI infrastructure. Framework-Agnostic Benefits: Foundry Local's multi-framework approach delivers consistent benefits regardless of your technology choices. Whether you're working in a Node.js microservices architecture, a Python data science environment, a Rust embedded system, or a C# enterprise application, you get the same advantages: reduced latency, eliminated API costs, enhanced privacy, and offline capabilities. This universal compatibility means teams can adopt edge AI deployment incrementally, starting with pilot projects in their preferred language and expanding across their technology stack as they see results. The learning curve is minimal because the API patterns remain familiar while the underlying infrastructure transforms to local deployment. Implementing Edge AI: From Code to Production Moving from cloud APIs to local AI deployment requires understanding the implementation patterns that make edge AI both powerful and practical. Let's explore how Foundry Local's SDKs enable seamless integration across different development environments, with real-world code examples that you can adapt for your production systems. Python Implementation for Data Science and ML Pipelines Python developers will find Foundry Local's integration particularly natural, especially in data science and machine learning contexts where local processing is often preferred for security and performance reasons. import openai from foundry_local import FoundryLocalManager # Initialize with automatic hardware optimization alias = "phi-3.5-mini" manager = FoundryLocalManager(alias) This simple initialization handles a remarkable amount of complexity automatically. The `FoundryLocalManager` detects your hardware configuration, downloads the most appropriate model variant for your system, and starts the local inference service. Behind the scenes, it's making intelligent decisions about memory allocation, selecting optimal execution providers, and preparing the model for efficient inference. # Configure OpenAI client for local deployment client = openai.OpenAI( base_url=manager.endpoint, api_key=manager.api_key # Not required for local, but maintains API compatibility ) # Production-ready inference with streaming def analyze_document(content: str): stream = client.chat.completions.create( model=manager.get_model_info(alias).id, messages=[{ "role": "system", "content": "You are an expert document analyzer. Provide structured analysis." }, { "role": "user", "content": f"Analyze this document: {content}" }], stream=True, temperature=0.7 ) result = "" for chunk in stream: if chunk.choices[0].delta.content: content_piece = chunk.choices[0].delta.content result += content_piece yield content_piece # Enable real-time UI updates return result Key implementation benefits here: • Automatic model management: The `FoundryLocalManager` handles model lifecycle, memory optimization, and hardware-specific acceleration without manual configuration. • Streaming interface compatibility: Maintains the familiar OpenAI streaming API while processing locally, enabling real-time user interfaces with zero latency overhead. • Production error handling: The manager includes built-in retry logic, graceful degradation, and resource management for reliable production deployment. JavaScript/TypeScript Implementation for Web Applications JavaScript and TypeScript developers can integrate local AI capabilities directly into web applications, enabling entirely new categories of client-side intelligent features. import { OpenAI } from "openai"; import { FoundryLocalManager } from "foundry-local-sdk"; class LocalAIService { constructor() { this.foundryManager = null; this.openaiClient = null; this.isInitialized = false; } async initialize(modelAlias = "phi-3.5-mini") { this.foundryManager = new FoundryLocalManager(); const modelInfo = await this.foundryManager.init(modelAlias); this.openaiClient = new OpenAI({ baseURL: this.foundryManager.endpoint, apiKey: this.foundryManager.apiKey, }); this.isInitialized = true; return modelInfo; } The initialization pattern establishes local AI capabilities with full error handling and resource management. This enables web applications to provide AI features without external API dependencies. async generateCodeCompletion(codeContext, userPrompt) { if (!this.isInitialized) { throw new Error("LocalAI service not initialized"); } try { const completion = await this.openaiClient.chat.completions.create({ model: this.foundryManager.getModelInfo().id, messages: [ { role: "system", content: "You are a code completion assistant. Provide accurate, efficient code suggestions." }, { role: "user", content: `Context: ${codeContext}\n\nComplete: ${userPrompt}` } ], max_tokens: 150, temperature: 0.2 }); return completion.choices[0].message.content; } catch (error) { console.error("Local AI completion failed:", error); throw new Error("Code completion unavailable"); } } } Implementation advantages for web applications • Zero-dependency AI features: Applications work entirely offline once models are downloaded, enabling AI capabilities in disconnected environments. • Instant response times: Eliminate network latency for real-time features like code completion, content generation, or intelligent search. • Client-side privacy: Sensitive code or content never leaves the user's device, meeting strict security requirements for enterprise development tools. Cross-Platform Production Deployment Patterns Both Python and JavaScript implementations share common production deployment patterns that make Foundry Local particularly suitable for enterprise applications: Automatic Hardware Optimization: The platform automatically detects and utilizes available acceleration hardware. On systems with NVIDIA GPUs, it leverages CUDA acceleration. On newer Intel systems, it uses NPU acceleration. On ARM-based systems like Apple Silicon or Qualcomm Snapdragon, it optimizes for those architectures. This means the same application code delivers optimal performance across diverse deployment environments. Graceful Resource Management: Foundry Local includes sophisticated memory management and resource allocation. Models are loaded efficiently, memory is recycled properly, and concurrent requests are handled intelligently to maintain system stability under load. Production Monitoring Integration: The platform provides comprehensive metrics and logging that integrate naturally with existing monitoring systems, enabling production observability for AI workloads running at the edge. These implementation patterns demonstrate how Foundry Local transforms edge AI from an experimental concept into a practical, production-ready deployment strategy that works consistently across different technology stacks and hardware environments. Measuring Success: Technical Performance and Business Impact The transition to edge AI deployment delivers measurable improvements across both technical and business metrics. Understanding these impacts helps justify the architectural shift and demonstrates the concrete value of local LLM deployment in production environments. Technical Performance Gains Latency Elimination: The most immediately visible benefit is the dramatic reduction in response times. Cloud API calls typically require 200-800ms round-trips, depending on geographic location and network conditions. Local inference with Foundry Local reduces this to sub-10ms response times—a 95-99% improvement that fundamentally changes user experience possibilities. Consider a code completion feature: cloud-based completion feels sluggish and interrupts developer flow, while local completion provides instant suggestions that enhance productivity. The same applies to real-time chat applications, interactive AI tutoring systems, and any application where response latency directly impacts usability. Automatic Hardware Utilization: Foundry Local's intelligent hardware detection and optimization delivers significant performance improvements without manual configuration. On systems with NVIDIA RTX 4000 series GPUs, inference speeds can be 10-50x faster than CPU-only processing. On newer Intel systems with NPUs, the platform automatically leverages neural processing units for efficient AI workloads. Apple Silicon systems benefit from Metal Performance Shaders optimization, delivering excellent performance per watt. ONNX Runtime Optimization: Microsoft's ONNX Runtime provides substantial performance advantages over generic inference engines. In benchmark testing, ONNX Runtime consistently delivers 2-5x performance improvements compared to standard PyTorch or TensorFlow inference, while maintaining full model accuracy and compatibility. Scalability Characteristics: Local deployment transforms scaling economics entirely. Instead of linear cost scaling with usage, you get horizontal scaling through hardware deployment. A single modern GPU can handle hundreds of concurrent inference requests, making per-request costs approach zero for high-volume applications. Business Impact Analysis Cost Structure Transformation: The financial implications of local deployment are profound. Consider an application processing 1 million tokens daily through OpenAI's API—this represents $20-60 in daily costs depending on the model. Over a year, this becomes $7,300-21,900 in recurring expenses. A comparable local deployment might require a $2,000-5,000 hardware investment with no ongoing API costs. For high-volume applications, the savings become dramatic. Applications processing 100 million tokens monthly face $60,000-180,000 annual API costs. Local deployment with appropriate hardware infrastructure could reduce this to electricity and maintenance costs—typically under $10,000 annually for equivalent processing capacity. Enhanced Privacy and Compliance: Local deployment eliminates data sovereignty concerns entirely. Healthcare applications processing patient records, financial services handling transaction data, and enterprise tools managing proprietary information can deploy AI capabilities without data leaving their infrastructure. This simplifies compliance with GDPR, HIPAA, SOX, and other regulatory frameworks while reducing legal and security risks. Operational Resilience: Local deployment provides significant business continuity advantages. Applications continue functioning during network outages, API service disruptions, or third-party provider issues. For mission-critical systems, this resilience can prevent costly downtime and maintain user productivity during external service failures. Development Velocity: Local deployment accelerates development cycles by eliminating API rate limits, usage quotas, and external dependencies during development and testing. Developers can iterate freely, run comprehensive test suites, and experiment with AI features without cost concerns or rate limiting delays. Enterprise Adoption Metrics Real-world enterprise deployments demonstrate measurable business value: Local Usage: Foundry Local for internal AI-powered tools, reporting 60-80% reduction in AI-related operational costs while improving developer productivity through instant AI responses in development environments. Manufacturing Applications: Industrial IoT deployments using edge AI for predictive maintenance show 40-60% reduction in unplanned downtime while eliminating cloud connectivity requirements in remote facilities. Financial Services: Trading firms deploying local LLMs for market analysis report sub-millisecond decision latencies while maintaining complete data isolation for competitive advantage and regulatory compliance. ROI Calculation Framework For AI Engineers evaluating edge deployment, consider these quantifiable factors: Direct Cost Savings: Compare monthly API costs against hardware amortization over 24-36 months. Most applications with >$1,000 monthly API costs achieve positive ROI within 12-18 months. Performance Value: Quantify the business impact of reduced latency. For customer-facing applications, each 100ms of latency reduction typically correlates with 1-3% conversion improvement. Risk Mitigation: Calculate the cost of downtime or compliance violations prevented by local deployment. For many enterprise applications, avoiding a single significant outage justifies the infrastructure investment. Development Efficiency: Measure developer productivity improvements from unlimited local AI access during development. Teams report 20-40% faster iteration cycles when AI features can be tested without external dependencies. These metrics demonstrate that edge AI deployment with Foundry Local delivers both immediate technical improvements and substantial long-term business value, making it a strategic investment in AI infrastructure that pays dividends across multiple dimensions. Your Edge AI Journey Starts Here The shift to edge AI represents more than just a technical evolution, it's an opportunity to fundamentally improve your applications while building valuable expertise in an emerging field. Whether you're looking to reduce costs, improve performance, or enhance privacy, the path forward involves both learning new concepts and connecting with a community of practitioners solving similar challenges. Master Edge AI with Comprehensive Training The Edge AI for Beginners https://aka.ms/edgeai-for-beginners curriculum provides the complete foundation you need to become proficient in local AI deployment. This isn't a superficial overview, it's a comprehensive, hands-on program designed specifically for developers who want to build production-ready edge AI applications. The curriculum takes you through hours of structured learning, progressing from fundamental concepts to advanced deployment scenarios. You'll start by understanding the principles of edge AI and local inference, then dive deep into practical implementation with Foundry Local across multiple programming languages. The program includes working examples and comprehensive sample applications that demonstrate real-world use cases. What sets this curriculum apart is its practical focus. Instead of theoretical discussions, you'll build actual applications: document analysis systems that work offline, real-time code completion tools, intelligent chatbots that protect user privacy, and IoT applications that make decisions locally. Each project teaches both the technical implementation and the architectural thinking needed for successful edge AI deployment. The curriculum covers multi-framework deployment patterns extensively, ensuring you can apply edge AI principles regardless of your preferred development stack. Whether you're working in Python data science environments, JavaScript web applications, C# enterprise systems, or Rust embedded projects, you'll learn the patterns and practices that make edge AI successful. Join a Community of AI Engineers Learning edge AI doesn't happen in isolation, it requires connection with other developers who are solving similar challenges and discovering new possibilities. The Foundry Local Discord community https://aka.ms/foundry-local-discord provides exactly this environment, connecting AI Engineers and Developers from around the world who are implementing local AI solutions. This community serves multiple crucial functions for your development as an edge AI practitioner. You'll find experienced developers sharing implementation patterns they've discovered, debugging complex deployment issues collaboratively, and discussing the architectural decisions that make edge AI successful in production environments. The Discord community includes dedicated channels for different programming languages, specific deployment scenarios, and technical discussions about optimization and performance. Whether you're implementing your first local AI feature or optimizing a complex multi-model deployment, you'll find peers and experts ready to help problem-solve and share insights. Beyond technical support, the community provides valuable career and business insights. Members share their experiences with edge AI adoption in different industries, discuss the business cases that have proven most successful, and collaborate on open-source projects that advance the entire ecosystem. Share Your Experience and Build Expertise One of the most effective ways to solidify your edge AI expertise is by sharing your implementation experiences with the community. As you build applications with Foundry Local and deploy edge AI solutions, documenting your process and sharing your learnings provides value both to others and to your own professional development. Consider sharing your deployment stories, whether they're successes or challenges you've overcome. The community benefits from real-world case studies that show how edge AI performs in different environments and use cases. Your experience implementing local AI in a healthcare application, financial services system, or manufacturing environment provides valuable insights that others can build upon. Technical contributions are equally valuable, whether it's sharing configuration patterns you've discovered, performance optimizations you've implemented, or integration approaches you've developed for specific frameworks or libraries. The edge AI field is evolving rapidly, and practical contributions from working developers drive much of the innovation. Sharing your work also builds your professional reputation as an edge AI expert. As organizations increasingly adopt local AI deployment strategies, developers with proven experience in this area become valuable resources for their teams and the broader industry. The combination of structured learning through the Edge AI curriculum, active participation in the community, and sharing your practical experiences creates a comprehensive path to edge AI expertise that serves both your immediate project needs and your long-term career development as AI deployment patterns continue evolving. Key Takeaways Local LLM deployment transforms application economics: Replace variable API costs with fixed infrastructure investments that scale to unlimited usage, typically achieving ROI within 12-18 months for applications with significant AI workloads. Foundry Local enables multi-framework edge AI: Consistent deployment patterns across Python, JavaScript, C#, and Rust environments with automatic hardware optimization and OpenAI API compatibility. Performance improvements are dramatic and measurable: Sub-10ms response times replace 200-800ms cloud API latency, while automatic hardware acceleration delivers 2-50x performance improvements depending on available compute resources. Privacy and compliance become architectural advantages: Local deployment eliminates data sovereignty concerns, simplifies regulatory compliance, and provides complete control over sensitive information processing. Edge AI expertise is a strategic career investment: As organizations increasingly adopt local AI deployment, developers with hands-on edge AI experience become valuable technical resources with unique skills in an emerging field. Conclusion Edge AI deployment represents the next evolution in intelligent application development, transforming both the technical possibilities and economic models of AI-powered systems. With Foundry Local and the comprehensive Edge AI for Beginners curriculum, you have access to production-ready tools and expert guidance to make this transition successfully. The path forward is clear: start with the Edge AI for Beginners curriculum to build solid foundations, connect with the Foundry Local Discord community to learn from practicing developers, and begin implementing local AI solutions in your projects. Each step builds valuable expertise while delivering immediate improvements to your applications. As cloud costs continue rising and privacy requirements become more stringent, organizations will increasingly rely on developers who can implement local AI solutions effectively. Your early adoption of edge AI deployment patterns positions you at the forefront of this technological shift, with skills that will become increasingly valuable as the industry evolves. The future of AI deployment is local, private, and performance-optimized. Start building that future today. Resources Edge AI for Beginners Curriculum: Comprehensive training with 36-45 hours of hands-on content examples, and production-ready deployment patterns https://aka.ms/edgeai-for-beginners Foundry Local GitHub Repository: Official documentation, samples, and community contributions for local AI deployment https://github.com/microsoft/foundry_local Foundry Local Discord Community: Connect with AI Engineers and Developers implementing edge AI solutions worldwide https://aka.ms/foundry/discord Foundry Local Documentation: Complete technical documentation and API references Foundry Local documentation | Microsoft Learn Foundry Local Model Catalog: Browse available models and deployment options for different hardware configurations Foundry Local Models - Browse AI Models
Lee_Stott
Oct 23, 2025 Place Microsoft Developer Community Blog
224Views
0likes
1Comment
Serverless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔
AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. This new sample gives you: A LangChain.js v1 agent that streams its result, along reasoning + tool steps An MCP server exposing real tools (burger menu + ordering) from a business API A web interface with authentication, sessions history, and a debug panel (for developers) A production-ready multi-service architecture Serverless deployment on Azure in one command ( azd up ) Yes, it’s a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let’s dive in for a quick tour! TL;DR key takeaways New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools Architecture: web app → agent API → MCP server → burger API Runs locally with a single npm start , deploys with azd up Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI Ready to fork, extend, and plug into your own domain / tools What will you learn here? What this sample is about and its high-level architecture What LangChain.js v1 brings to the table for agents How to deploy and run the sample How MCP tools can expose real-world APIs Reference links for everything we use GitHub repo LangChain.js docs Model Context Protocol Azure Developer CLI MCP Inspector Use case You want an AI assistant that can take a natural language request like “Order two spicy burgers and show me my pending orders” and: Understand intent (query menu, then place order) Call the right MCP tools in sequence, calling in turn the necessary APIs Stream progress (LLM tokens + tool steps) Return a clean final answer Swap “burgers” for “inventory”, “bookings”, “support tickets”, or “IoT devices” and you’ve got a reusable pattern! Sample overview Before we play a bit with the sample, let's have a look at the main services implemented here: Service Role Tech Agent Web App ( agent-webapp ) Chat UI + streaming + session history Azure Static Web Apps, Lit web components Agent API ( agent-api ) LangChain.js v1 agent orchestration + auth + history Azure Functions, Node.js Burger MCP Server ( burger-mcp ) Exposes burger API as tools over MCP (Streamable HTTP + SSE) Azure Functions, Express, MCP SDK Burger API ( burger-api ) Business logic: burgers, toppings, orders lifecycle Azure Functions, Cosmos DB Here's a simplified view of how they interact: There are also other supporting components like databases and storage not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here. LangChain.js v1 agent features The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what’s needed to build robust AI applications, with a strong focus on agents. This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward. Quickstart Requirements GitHub account Azure account (free signup, or if you're a student, get free credits here) Azure Developer CLI Deploy and run the sample We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go. Provision and deploy to Azure Open a terminal and run these commands: # Install dependencies npm install # Login to Azure azd auth login # Provision and deploy all resources azd up Follow the prompts to select your Azure subscription and region. If you're unsure of which one to pick, choose East US 2 . The deployment will take about 15 minutes the first time, to create all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI Models). If you're curious about what happens under the hood, you can take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample. Test the MCP server While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how it works. Open another terminal and run: npm start This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp . This may take a few seconds, wait until you see this message in the terminal: 🚀 All services ready 🚀 When these services are running without Azure resources provisioned, they will use in-memory data instead of Cosmos DB so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires a LLM resource. MCP tools The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system: Tool Name Description get_burgers Get a list of all burgers in the menu get_burger_by_id Get a specific burger by its ID get_toppings Get a list of all toppings in the menu get_topping_by_id Get a specific topping by its ID get_topping_categories Get a list of all topping categories get_orders Get a list of all orders in the system get_order_by_id Get a specific order by its ID place_order Place a new order with burgers (requires userId , optional nickname ) delete_order_by_id Cancel an order if it has not yet been started (status must be pending , requires userId ) You can test these tools using the MCP Inspector. Open another terminal and run: npx -y @modelcontextprotocol/inspector Then open the URL printed in the terminal in your browser and connect using these settings: Transport: Streamable HTTP URL: http://localhost:3000/mcp Connection Type: Via Proxy (should be default) Click on Connect, then try listing the tools first, and run get_burgers tool to get the menu info. Test the Agent Web App After the deployment is completed, you can run the command npm run env to print the URLs of the deployed services. Open the Agent Web App URL in your browser (it should look like https://<your-web-app>.azurestaticapps.net ). You'll first be greeted by an authentication page, you can sign in either with your GitHub or Microsoft account and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts, for example try asking: Recommend me an extra spicy burger . As the agent processes your request, you'll see the response streaming in real-time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked: Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post. Wrap it up Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure’s serverless platform. Now it's your turn to dive in the code and extend it for your use cases! 😎 And don't forget to azd down once you're done to avoid any unwanted costs. Going further This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔
sinedied
Oct 22, 2025 Place Microsoft Developer Community Blog
944Views
0likes
0Comments
Building a Multi-Agent System with Azure AI Agent Service: Campus Event Management
Personal Background My name is Peace Silly. I studied French and Spanish at the University of Oxford, where I developed a strong interest in how language is structured and interpreted. That curiosity about syntax and meaning eventually led me to computer science, which I came to see as another language built on logic and structure. In the academic year 2024–2025, I completed the MSc Computer Science at University College London, where I developed this project as part of my Master’s thesis. Project Introduction Can large-scale event management be handled through a simple chat interface? This was the question that guided my Master’s thesis project at UCL. As part of the Industry Exchange Network (IXN) and in collaboration with Microsoft, I set out to explore how conversational interfaces and autonomous AI agents could simplify one of the most underestimated coordination challenges in campus life: managing events across multiple departments, societies, and facilities. At large universities, event management is rarely straightforward. Rooms are shared between academic timetables, student societies, and one-off events. A single lecture theatre might host a departmental seminar in the morning, a society meeting in the afternoon, and a careers talk in the evening, each relying on different systems, staff, and communication chains. Double bookings, last-minute cancellations, and maintenance issues are common, and coordinating changes often means long email threads, manual spreadsheets, and frustrated users. These inefficiencies do more than waste time; they directly affect how a campus functions day to day. When venues are unavailable or notifications fail to reach the right people, even small scheduling errors can ripple across entire departments. A smarter, more adaptive approach was needed, one that could manage complex workflows autonomously while remaining intuitive and human for end users. The result was the Event Management Multi-Agent System, a cloud-based platform where staff and students can query events, book rooms, and reschedule activities simply by chatting. Behind the scenes, a network of Azure-powered AI agents collaborates to handle scheduling, communication, and maintenance in real time, working together to keep the campus running smoothly. The user scenario shown in the figure below exemplifies the vision that guided the development of this multi-agent system. Starting with Microsoft Learning Resources I began my journey with Microsoft’s tutorial Build Your First Agent with Azure AI Foundry which introduced the fundamentals of the Azure AI Agent Service and provided an ideal foundation for experimentation. Within a few weeks, using the Azure Foundry environment, I extended those foundations into a fully functional multi-agent system. Azure Foundry’s visual interface was an invaluable learning space. It allowed me to deploy, test, and adjust model parameters such as temperature, system prompts, and function calling while observing how each change influenced the agents’ reasoning and collaboration. Through these experiments, I developed a strong conceptual understanding of orchestration and coordination before moving to the command line for more complex development later. When development issues inevitably arose, I relied on the Discord support community and the GitHub forum for troubleshooting. These communities were instrumental in addressing configuration issues and providing practical examples, ensuring that each agent performed reliably within the shared-thread framework. This early engagement with Microsoft’s learning materials not only accelerated my technical progress but also shaped how I approached experimentation, debugging, and iteration. It transformed a steep learning curve into a structured, hands-on process that mirrored professional software development practice. A Decentralised Team of AI Agents The system’s intelligence is distributed across three specialised agents, powered by OpenAI’s GPT-4.1 models through Azure OpenAI Service. They each perform a distinct role within the event management workflow: Scheduling Agent – interprets natural language requests, checks room availability, and allocates suitable venues. Communications Agent – notifies stakeholders when events are booked, modified, or cancelled. Maintenance Agent – monitors room readiness, posts fault reports when venues become unavailable, and triggers rescheduling when needed. Each agent operates independently but communicates through a shared thread, a transparent message log that serves as the coordination backbone. This thread acts as a persistent state space where agents post updates, react to changes, and maintain a record of every decision. For example, when a maintenance fault is detected, the Maintenance Agent logs the issue, the Scheduling Agent identifies an alternative venue, and the Communications Agent automatically notifies attendees. These interactions happen autonomously, with each agent responding to the evolving context recorded in the shared thread. Interfaces and Backend The system was designed with both developer-focused and user-facing interfaces, supporting rapid iteration and intuitive interaction. The Terminal Interface Initially, the agents were deployed and tested through a terminal interface, which provided a controlled environment for debugging and verifying logic step by step. This setup allowed quick testing of individual agents and observation of their interactions within the shared thread. The Chat Interface As the project evolved, I introduced a lightweight chat interface to make the system accessible to staff and students. This interface allows users to book rooms, query events, and reschedule activities using plain language. Recognising that some users might still want to see what happens behind the scenes, I added an optional toggle that reveals the intermediate steps of agent reasoning. This transparency feature proved valuable for debugging and for more technical users who wanted to understand how the agents collaborated. When a user interacts with the chat interface, they are effectively communicating with the Scheduling Agent, which acts as the primary entry point. The Scheduling Agent interprets natural-language commands such as “Book the Engineering Auditorium for Friday at 2 PM” or “Reschedule the robotics demo to another room.” It then coordinates with the Maintenance and Communications Agents to complete the process. Behind the scenes, the chat interface connects to a FastAPI backend responsible for core logic and data access. A Flask + HTMX layer handles lightweight rendering and interactivity, while the Azure AI Agent Service manages orchestration and shared-thread coordination. This combination enables seamless agent communication and reliable task execution without exposing any of the underlying complexity to the end user. Automated Notifications and Fault Detection Once an event is scheduled, the Scheduling Agent posts the confirmation to the shared thread. The Communications Agent, which subscribes to thread updates, automatically sends notifications to all relevant stakeholders by email. This ensures that every participant stays informed without any manual follow-up. The Maintenance Agent runs routine availability checks. If a fault is detected, it logs the issue to the shared thread, prompting the Scheduling Agent to find an alternative room. The Communications Agent then notifies attendees of the change, ensuring minimal disruption to ongoing events. Testing and Evaluation The system underwent several layers of testing to validate both functional and non-functional requirements. Unit and Integration Tests Backend reliability was evaluated through unit and integration tests to ensure that room allocation, conflict detection, and database operations behaved as intended. Automated test scripts verified end-to-end workflows for event creation, modification, and cancellation across all agents. Integration results confirmed that the shared-thread orchestration functioned correctly, with all test cases passing consistently. However, coverage analysis revealed that approximately 60% of the codebase was tested, leaving some areas such as Azure service integration and error-handling paths outside automated validation. These trade-offs were deliberate, balancing test depth with project scope and the constraints of mocking live dependencies. Azure AI Evaluation While functional testing confirmed correctness, it did not capture the agents’ reasoning or language quality. To assess this, I used Azure AI Evaluation, which measures conversational performance across metrics such as relevance, coherence, fluency, and groundedness. The results showed high scores in relevance (4.33) and groundedness (4.67), confirming the agents’ ability to generate accurate and context-aware responses. However, slightly lower fluency scores and weaker performance in multi-turn tasks revealed a retrieval–execution gap typical in task-oriented dialogue systems. Limitations and Insights The evaluation also surfaced several key limitations: Synthetic data: All tests were conducted with simulated datasets rather than live campus systems, limiting generalisability. Scalability: A non-functional requirement in the form of horizontal scalability was not tested. The architecture supports scaling conceptually but requires validation under heavier load. Despite these constraints, the testing process confirmed that the system was both technically reliable and linguistically robust, capable of autonomous coordination under normal conditions. The results provided a realistic picture of what worked well and what future iterations should focus on improving. Impact and Future Work This project demonstrates how conversational AI and multi-agent orchestration can streamline real operational processes. By combining Azure AI Agent Services with modular design principles, the system automates scheduling, communication, and maintenance while keeping the user experience simple and intuitive. The architecture also establishes a foundation for future extensions: Predictive maintenance to anticipate venue faults before they occur. Microsoft Teams integration for seamless in-chat scheduling. Scalability testing and real-user trials to validate performance at institutional scale. Beyond its technical results, the project underscores the potential of multi-agent systems in real-world coordination tasks. It illustrates how modularity, transparency, and intelligent orchestration can make everyday workflows more efficient and human-centred. Acknowledgements What began with a simple Microsoft tutorial evolved into a working prototype that reimagines how campuses could manage their daily operations through conversation and collaboration. This was both a challenging and rewarding journey, and I am deeply grateful to Professor Graham Roberts (UCL) and Professor Lee Stott (Microsoft) for their guidance, feedback, and support throughout the project.
PeaceSilly
Oct 20, 2025 Place Educator Developer Blog
122Views
1like
0Comments
AMA: Azure AI Foundry Voice Live API: Build Smarter, Faster Voice Agents
Join us LIVE in the Azure AI Foundry Discord on the 14th October, 2025, 10am PT to learn more about Voice Live API Voice is no longer a novelty, it's the next-gen interface between humans and machines. From automotive assistants to educational tutors, voice-driven agents are reshaping how we interact with technology. But building seamless, real-time voice experiences has often meant stitching together a patchwork of services: STT, GenAI, TTS, avatars, and more. Until now. Introducing Azure AI Foundry Voice Live API Launched into general availability on October 1, 2025, the Azure AI Foundry Voice Live API is a game-changer for developers building voice-enabled agents. It unifies the entire voice stack—speech-to-text, generative AI, text-to-speech, avatars, and conversational enhancements, into a single, streamlined interface. That means: ⚡ Lower latency 🧠 Smarter interactions 🛠️ Simplified development 📈 Scalable deployment Whether you're prototyping a voice bot for customer support or deploying a full-stack assistant in production, Voice Live API accelerates your journey from idea to impact. Ask Me Anything: Deep Dive with the CoreAI Speech Team Join us for a live AMA session where you can engage directly with the engineers behind the API: 🗓️ Date: 14th Oct 2025 🕒 Time: 10am PT 📍 Location: https://aka.ms/foundry/discord See the EVENTS 🎤 Speakers: Qinying Liao, Principal Program Manager, CoreAI Speech Jan Gorgen, Senior Program Manager, CoreAI Speech They’ll walk through real-world use cases, demo the API in action, and answer your toughest questions, from latency optimization to avatar integration. Who Should Attend? This AMA is designed for: AI engineers building multimodal agents Developers integrating voice into enterprise workflows Researchers exploring conversational UX Foundry users looking to scale voice prototypes Why It Matters Voice Live API isn’t just another endpoint, it’s a foundation for building natural, responsive, and production-ready voice agents. With Azure AI Foundry’s orchestration and deployment tools, you can: Skip the glue code Focus on experience design Deploy with confidence across platforms Bring Your Questions Curious about latency benchmarks? Want to know how avatars sync with TTS? Wondering how to integrate with your existing Foundry workflows? This is your chance to ask the team directly.
Lee_Stott
Oct 08, 2025 Place Microsoft Developer Community Blog
166Views
0likes
0Comments
Introducing the Microsoft Agent Framework
Introducing the Microsoft Agent Framework: A Unified Foundation for AI Agents and Workflows The landscape of AI development is evolving rapidly, and Microsoft is at the forefront with the release of the Microsoft Agent Framework an open-source SDK designed to empower developers to build intelligent, multi-agent systems with ease and precision. Whether you're working in .NET or Python, this framework offers a unified, extensible foundation that merges the best of Semantic Kernel and AutoGen, while introducing powerful new capabilities for agent orchestration and workflow design. Introducing Microsoft Agent Framework: The Open-Source Engine for Agentic AI Apps | Azure AI Foundry Blog Introducing Microsoft Agent Framework | Microsoft Azure Blog Why Another Agent Framework? Both Semantic Kernel and AutoGen have pioneered agentic development, Semantic Kernel with its enterprise-grade features and AutoGen with its research-driven abstractions. The Microsoft Agent Framework is the next generation of both, built by the same teams to unify their strengths: AutoGen’s simplicity in multi-agent orchestration. Semantic Kernel’s robustness in thread-based state management, telemetry, and type safety. New capabilities like graph-based workflows, checkpointing, and human-in-the-loop support This convergence means developers no longer have to choose between experimentation and production. The Agent Framework is designed to scale from single-agent prototypes to complex, enterprise-ready systems Core Capabilities AI Agents AI agents are autonomous entities powered by LLMs that can process user inputs, make decisions, call tools and MCP servers, and generate responses. They support providers like Azure OpenAI, OpenAI, and Azure AI, and can be enhanced with: Agent threads for state management. Context providers for memory. Middleware for action interception. MCP clients for tool integration Use cases include customer support, education, code generation, research assistance, and more—especially where tasks are dynamic and underspecified. Workflows Workflows are graph-based orchestrations that connect multiple agents and functions to perform complex, multi-step tasks. They support: Type-based routing Conditional logic Checkpointing Human-in-the-loop interactions Multi-agent orchestration patterns (sequential, concurrent, hand-off, Magentic) Workflows are ideal for structured, long-running processes that require reliability and modularity. Developer Experience The Agent Framework is designed to be intuitive and powerful: Installation: Python: pip install agent-framework .NET: dotnet add package Microsoft.Agents.AI Integration: Works with Foundry SDK, MCP SDK, A2A SDK, and M365 Copilot Agents Samples and Manifests: Explore declarative agent manifests and code samples Learning Resources: Microsoft Learn modules AI Agents for Beginners AI Show demos Azure AI Foundry Discord community Migration and Compatibility If you're currently using Semantic Kernel or AutoGen, migration guides are available to help you transition smoothly. The framework is designed to be backward-compatible where possible, and future updates will continue to support community contributions via the GitHub repository. Important Considerations The Agent Framework is in public preview. Feedback and issues are welcome on the GitHub repository. When integrating with third-party servers or agents, review data sharing practices and compliance boundaries carefully. The Microsoft Agent Framework marks a pivotal moment in AI development, bringing together research innovation and enterprise readiness into a single, open-source foundation. Whether you're building your first agent or orchestrating a fleet of them, this framework gives you the tools to do it safely, scalably, and intelligently. Ready to get started? Download the SDK, explore the documentation, and join the community shaping the future of AI agents.
Lee_Stott
Oct 08, 2025 Place Microsoft Developer Community Blog
1.4KViews
0likes
1Comment
From Cloud to Chip: Building Smarter AI at the Edge with Windows AI PCs
As AI engineers, we’ve spent years optimizing models for the cloud, scaling inference, wrangling latency, and chasing compute across clusters. But the frontier is shifting. With the rise of Windows AI PCs and powerful local accelerators, the edge is no longer a constraint it’s now a canvas. Whether you're deploying vision models to industrial cameras, optimizing speech interfaces for offline assistants, or building privacy-preserving apps for healthcare, Edge AI is where real-world intelligence meets real-time performance. Why Edge AI, Why Now? Edge AI isn’t just about running models locally, it’s about rethinking the entire lifecycle: - Latency: Decisions in milliseconds, not round-trips to the cloud. - Privacy: Sensitive data stays on-device, enabling HIPAA/GDPR compliance. - Resilience: Offline-first apps that don’t break when the network does. - Cost: Reduced cloud compute and bandwidth overhead. With Windows AI PCs powered by Intel and Qualcomm NPUs and tools like ONNX Runtime, DirectML, and Olive, developers can now optimize and deploy models with unprecedented efficiency. What You’ll Learn in Edge AI for Beginners The Edge AI for Beginners curriculum is a hands-on, open-source guide designed for engineers ready to move from theory to deployment. Multi-Language Support This content is available in over 48 languages, so you can read and study in your native language. What You'll Master This course takes you from fundamental concepts to production-ready implementations, covering: Small Language Models (SLMs) optimized for edge deployment Hardware-aware optimization across diverse platforms Real-time inference with privacy-preserving capabilities Production deployment strategies for enterprise applications Why EdgeAI Matters Edge AI represents a paradigm shift that addresses critical modern challenges: Privacy & Security: Process sensitive data locally without cloud exposure Real-time Performance: Eliminate network latency for time-critical applications Cost Efficiency: Reduce bandwidth and cloud computing expenses Resilient Operations: Maintain functionality during network outages Regulatory Compliance: Meet data sovereignty requirements Edge AI Edge AI refers to running AI algorithms and language models locally on hardware, close to where data is generated without relying on cloud resources for inference. It reduces latency, enhances privacy, and enables real-time decision-making. Core Principles: On-device inference: AI models run on edge devices (phones, routers, microcontrollers, industrial PCs) Offline capability: Functions without persistent internet connectivity Low latency: Immediate responses suited for real-time systems Data sovereignty: Keeps sensitive data local, improving security and compliance Small Language Models (SLMs) SLMs like Phi-4, Mistral-7B, Qwen and Gemma are optimized versions of larger LLMs, trained or distilled for: Reduced memory footprint: Efficient use of limited edge device memory Lower compute demand: Optimized for CPU and edge GPU performance Faster startup times: Quick initialization for responsive applications They unlock powerful NLP capabilities while meeting the constraints of: Embedded systems: IoT devices and industrial controllers Mobile devices: Smartphones and tablets with offline capabilities IoT Devices: Sensors and smart devices with limited resources Edge servers: Local processing units with limited GPU resources Personal Computers: Desktop and laptop deployment scenarios Course Modules & Navigation Course duration. 10 hours of content Module Topic Focus Area Key Content Level Duration 📖 00 Introduction to EdgeAI Foundation & Context EdgeAI Overview • Industry Applications • SLM Introduction • Learning Objectives Beginner 1-2 hrs 📚 01 EdgeAI Fundamentals Cloud vs Edge AI comparison EdgeAI Fundamentals • Real World Case Studies • Implementation Guide • Edge Deployment Beginner 3-4 hrs 🧠 02 SLM Model Foundations Model families & architecture Phi Family • Qwen Family • Gemma Family • BitNET • μModel • Phi-Silica Beginner 4-5 hrs 🚀 03 SLM Deployment Practice Local & cloud deployment Advanced Learning • Local Environment • Cloud Deployment Intermediate 4-5 hrs ⚙️ 04 Model Optimization Toolkit Cross-platform optimization Introduction • Llama.cpp • Microsoft Olive • OpenVINO • Apple MLX • Workflow Synthesis Intermediate 5-6 hrs 🔧 05 SLMOps Production Production operations SLMOps Introduction • Model Distillation • Fine-tuning • Production Deployment Advanced 5-6 hrs 🤖 06 AI Agents & Function Calling Agent frameworks & MCP Agent Introduction • Function Calling • Model Context Protocol Advanced 4-5 hrs 💻 07 Platform Implementation Cross-platform samples AI Toolkit • Foundry Local • Windows Development Advanced 3-4 hrs 🏭 08 Foundry Local Toolkit Production-ready samples Sample applications (see details below) Expert 8-10 hrs Each module includes Jupyter notebooks, code samples, and deployment walkthroughs, perfect for engineers who learn by doing. Developer Highlights - 🔧 Olive: Microsoft's optimization toolchain for quantization, pruning, and acceleration. - 🧩 ONNX Runtime: Cross-platform inference engine with support for CPU, GPU, and NPU. - 🎮 DirectML: GPU-accelerated ML API for Windows, ideal for gaming and real-time apps. - 🖥️ Windows AI PCs: Devices with built-in NPUs for low-power, high-performance inference. Local AI: Beyond the Edge Local AI isn’t just about inference, it’s about autonomy. Imagine agents that: - Learn from local context - Adapt to user behavior - Respect privacy by design With tools like Agent Framework, Azure AI Foundry and Windows Copilot Studio, and Foundry Local developers can orchestrate local agents that blend LLMs, sensors, and user preferences, all without cloud dependency. Try It Yourself Ready to get started? Clone the Edge AI for Beginners GitHub repo, run the notebooks, and deploy your first model to a Windows AI PC or IoT devices Whether you're building smart kiosks, offline assistants, or industrial monitors, this curriculum gives you the scaffolding to go from prototype to production.
Lee_Stott
Oct 07, 2025 Place Microsoft Developer Community Blog
258Views
1like
1Comment
Foundry Fridays: Your Front-Row Seat to Azure AI Innovation
🔥 Foundry Fridays: Your Front-Row Seat to Azure AI Innovation Are you ready to go beyond the blog posts and docs and get your questions answered directly by the minds behind Azure AI? Then mark your calendars for Foundry Fridays a weekly Ask Me Anything (AMA) series hosted on the Azure AI Foundry Discord. Every Friday at 1:30 PM ET, the Azure AI team opens the floor to developers, researchers, and enthusiasts for a 30-minute live AMA with the experts building the future of AI at Microsoft. Whether you're curious about model fine-tuning, local inference, agentic workflows, or the latest in open-source tooling—Foundry Fridays is where the real-time insights happen. 🎙️ Why Join Foundry Fridays? Direct Access to Experts: Ask your questions live to Principal PMs, researchers, and engineers from the Azure AI Foundry team. Fresh Topics Weekly: Each session spotlights a new theme from model routing and MCP registries to SAMBA architectures and AI agent security. Community-Driven: These aren’t lectures—they’re conversations. Bring your curiosity, share your feedback, and help shape the future of Azure AI. No Slides, Just Substance: It’s raw, real, and refreshingly unscripted. You’ll hear what’s working, what’s coming, and what’s still being figured out. Episode is hosted by community leaders like Nitya Narasimhan and Lee Stott, who guide the conversation and ensure your questions get the spotlight they deserve you can watch all the Monday Model Series on Demand at https://aka.ms/model-mondays and get ready for Season 3 of Model Mondays every Monday at 1.30pm ET. 📈 Why It Matters Foundry Fridays isn’t just another event it’s a community catalyst. Join our communty hear from experts and share your experiences of using Azure AI Tools and Services. 🚀 How to Join Join the Discord: aka.ms/model-mondays/discord Find the AMA: Head to the Events #community-calls and #model-mondays channel or check the pinned events. Ask Anything: Come with questions, ideas, or just listen in. No registration required. Want a sneak peek at what’s coming? Check the Foundry Fridays schedule or follow the Azure AI Foundry Blog for recaps and resources. 💬 Final Thoughts Whether you're building with Azure AI, exploring open-source models, or just curious about what’s next—Foundry Fridays is your chance to connect, learn, and grow with the community. So grab your headphones, fire up Discord, and let’s build the future of AI—together. 🗓️ Fridays | 1:30 PM ET 📍 Azure AI Foundry Discord 🔗 Join Now
Lee_Stott
Oct 02, 2025 Place Microsoft Developer Community Blog
218Views
0likes
0Comments
AI Foundry - Open API spec tool issue
Hello, I'm trying to invoke my application's API as a tool within the AI Foundry OpenAPI specification tool. However, I keep encountering a 401 Unauthorized error. I'm using a Bearer token for authentication, and it works perfectly when tested via Postman. I'm unsure whether the issue lies in the input/output schema or the connection configuration. Unfortunately, the AI Foundry Traces aren't providing enough detail to pinpoint the exact problem. Additionally, my API and AI Foundry accounts are hosted in different Azure subscriptions and networks. Could this network separation be affecting the connection? I would appreciate any guidance or help to resolve this issue. -Tamizh
Tamizh
Oct 01, 2025 Place Azure AI Foundry Discussions
61Views
0likes
1Comment