azure ai foundry
28 TopicsAI Toolkit Extension Pack for Visual Studio Code: Ignite 2025 Update
Unlock the Latest Agentic App Capabilities The Ignite 2025 update delivers a major leap forward for the AI Toolkit extension pack in VS Code, introducing a unified, end-to-end environment for building, visualizing, and deploying agentic applications to Microsoft Foundry, and the addition of Anthropic’s frontier Claude models in the Model Catalog! This release enables developers to build and debug locally in VS Code, then deploy to the cloud with a single click. Seamlessly switch between VS Code and the Foundry portal for visualization, orchestration, and evaluation, creating a smooth roundtrip workflow that accelerates innovation and delivers a truly unified AI development experience. Download the http://aka.ms/aitoolkit today and start building next-generation agentic apps in VS Code! What Can You Do with the AI Toolkit Extension Pack? Access Anthropic models in the Model Catalog Following the Microsoft, NVIDIA and Anthropic strategic partnerships announcement today, we are excited to share that Anthropic’s frontier Claude models including Claude Sonnet 4.5, Claude Opus 4.1, and Claude Haiku 4.5, are now integrated into the AI Toolkit, providing even more choices and flexibility when building intelligent applications and AI agents. Build AI Agents Using GitHub Copilot Scaffold agent applications using best-practice patterns, tool-calling examples, tracing hooks, and test scaffolds, all powered by Copilot and aligned with the Microsoft Agent Framework. Generate agent code in Python or .NET, giving you flexibility to target your preferred runtime. Build and Customize YAML Workflows Design YAML-based workflows in the Foundry portal, then continue editing and testing directly in VS Code. To customize your YAML-based workflows, instantly convert it to Agent Framework code using GitHub Copilot. Upgrade from declarative design to code-first customization without starting from scratch. Visualize Multi-Agent Workflows Envision your code-based agent workflows with an interactive graph visualizer that reveals each component and how they connect Watch in real-time how each node lights up as you run your agent. Use the visualizer to understand and debug complex agent graphs, making iteration fast and intuitive. Experiment, Debug, and Evaluate Locally Use the Hosted Agents Playground to quickly interact with your agents on your development machine. Leverage local tracing support to debug reasoning steps, tool calls, and latency hotspots—so you can quickly diagnose and fix issues. Define metrics, tasks, and datasets for agent evaluation, then implement metrics using the Foundry Evaluation SDK and orchestrate evaluations runs with the help of Copilot. Seamless Integration Across Environments Jump from Foundry Portal to VS Code Web for a development environment in your preferred code editor setting. Open YAML workflows, playgrounds, and agent templates directly in VS Code for editing and deployment. How to Get Started Install the AI Toolkit extension pack from the VS Code marketplace. Check out documentation. Get started with building workflows with Microsoft Foundry in VS Code 1. Work with Hosted (Pro-code) Agent workflows in VS Code 2. Work with Declarative (Low-code) Agent workflows in VS Code Feedback & Support Try out the extensions and let us know what you think! File issues or feedback on our GitHub repo for Foundry extension and AI Toolkit extension. Your input helps us make continuous improvements.Understanding Small Language Modes
In Part 1, we discussed the differences between Large Language Models (LLMs) and Small Language Models (SLMs), highlighting how SLMs are transforming edge computing. Unlike LLMs, which rely on massive cloud infrastructure, SLMs can run directly on local devices like smartphones and IoT systems, offering faster performance, better privacy, and lower energy use. In this second part, we dive into how SLMs work, exploring their technical foundations and design principles that make them ideal for edge environments.Building Adaptive Multilingual Apps Using TypeScript and Azure AI Translator API
Learn how to integrate Azure AI Translator 2025-05-01-preview into TypeScript apps using hybrid Neural Machine Translation (NMT) + Large Language Model (LLM) translation, tone & gender controls, adaptive reference pairs, and a migration path from v3.On‑Device AI with Windows AI Foundry
From “waiting” to “instant”- without sending data away AI is everywhere, but speed, privacy, and reliability are critical. Users expect instant answers without compromise. On-device AI makes that possible: fast, private and available, even when the network isn’t - empowering apps to deliver seamless experiences. Imagine an intelligent assistant that works in seconds, without sending a text to the cloud. This approach brings speed and data control to the places that need it most; while still letting you tap into cloud power when it makes sense. Windows AI Foundry: A Local Home for Models Windows AI Foundry is a developer toolkit that makes it simple to run AI models directly on Windows devices. It uses ONNX Runtime under the hood and can leverage CPU, GPU (via DirectML), or NPU acceleration, without requiring you to manage those details. The principle is straightforward: Keep the model and the data on the same device. Inference becomes faster, and data stays local by default unless you explicitly choose to use the cloud. Foundry Local Foundry Local is the engine that powers this experience. Think of it as local AI runtime - fast, private, and easy to integrate into an app. Why Adopt On‑Device AI? Faster, more responsive apps: Local inference often reduces perceived latency and improves user experience. Privacy‑first by design: Keep sensitive data on the device; avoid cloud round trips unless the user opts in. Offline capability: An app can provide AI features even without a network connection. Cost control: Reduce cloud compute and data costs for common, high‑volume tasks. This approach is especially useful in regulated industries, field‑work tools, and any app where users expect quick, on‑device responses. Hybrid Pattern for Real Apps On-device AI doesn’t replace the cloud, it complements it. Here’s how: Standalone On‑Device: Quick, private actions like document summarization, local search, and offline assistants. Cloud‑Enhanced (Optional): Large-context models, up-to-date knowledge, or heavy multimodal workloads. Design an app to keep data local by default and surface cloud options transparently with user consent and clear disclosures. Windows AI Foundry supports hybrid workflows: Use Foundry Local for real-time inference. Sync with Azure AI services for model updates, telemetry, and advanced analytics. Implement fallback strategies for resource-intensive scenarios. Application Workflow Code Example 1. Only On-Device: Tries Foundry Local first, falls back to ONNX if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}") return "Error: No local AI available" 2. Hybrid approach: On-device first, cloud as last resort def get_answer(question, context): """ Priority order: 1. Foundry Local (best: advanced + private) 2. ONNX Runtime (good: fast + private) 3. Cloud API (fallback: requires internet, less private) # in case of Hybrid approach, based on real-time scenario """ if foundry_runtime.check_foundry_available(): # Use on-device Foundry Local models try: answer = foundry_runtime.run_inference(question, context) return answer, source="Foundry Local (On-Device)" except Exception as e: logger.warning(f"Foundry failed: {e}, trying ONNX...") if onnx_model.is_loaded(): # Fallback to local BERT ONNX model try: answer = bert_model.get_answer(question, context) return answer, source="BERT ONNX (On-Device)" except Exception as e: logger.warning(f"ONNX failed: {e}, trying cloud...") # Last resort: Cloud API (requires internet) if network_available(): try: import requests response = requests.post( '{BASE_URL_AI_CHAT_COMPLETION}', headers={'Authorization': f'Bearer {API_KEY}'}, json={ 'model': '{MODEL_NAME}', 'messages': [{ 'role': 'user', 'content': f'Context: {context}\n\nQuestion: {question}' }] }, timeout=10 ) answer = response.json()['choices'][0]['message']['content'] return answer, source="Cloud API (Online)" except Exception as e: return "Error: No AI runtime available", source="Failed" else: return "Error: No internet and no local AI available", source="Offline" Demo Project Output: Foundry Local answering context-based questions offline : The Foundry Local engine ran the Phi-4-mini model offline and retrieved context-based data. : The Foundry Local engine ran the Phi-4-mini model offline and mentioned that there is no answer. Practical Use Cases Privacy-First Reading Assistant: Summarize documents locally without sending text to the cloud. Healthcare Apps: Analyze medical data on-device for compliance. Financial Tools: Risk scoring without exposing sensitive financial data. IoT & Edge Devices: Real-time anomaly detection without network dependency. Conclusion On-device AI isn’t just a trend - it’s a shift toward smarter, faster, and more secure applications. With Windows AI Foundry and Foundry Local, developers can deliver experiences that respect user specific data, reduce latency, and work even when connectivity fails. By combining local inference with optional cloud enhancements, you get the best of both worlds: instant performance and scalable intelligence. Whether you’re creating document summarizers, offline assistants, or compliance-ready solutions, this approach ensures your apps stay responsive, reliable, and user-centric. References Get started with Foundry Local - Foundry Local | Microsoft Learn What is Windows AI Foundry? | Microsoft Learn https://devblogs.microsoft.com/foundry/unlock-instant-on-device-ai-with-foundry-local/Understanding Small Language Modes
Small Language Models (SLMs) bring AI from the cloud to your device. Unlike Large Language Models that require massive compute and energy, SLMs run locally, offering speed, privacy, and efficiency. They’re ideal for edge applications like mobile, robotics, and IoT.Announcing Public Preview: AI Toolkit for GitHub Copilot Prompt-First Agent Development
This week at GitHub Universe, we’re announcing the Public Preview of the GitHub Copilot prompt-first agent development in the AI Toolkit for Visual Studio Code. With this release, building powerful AI agents is now simpler and faster - no need to wrestle with complex frameworks or orchestrators. Just start with natural language prompts and let GitHub Copilot guide you from concept to working agent code. Accelerate Agent Development in VS Code The AI Toolkit embeds agent development workflows directly into Visual Studio Code and GitHub Copilot, enabling you to transform ideas into production-ready agents within minutes. This unified experience empowers developers and product teams to: Select the best model for your agent scenario Build and orchestrate agents using Microsoft Agent Framework Trace agent behaviors Evaluate agent response quality Select the best model for your scenario Models are the foundation for building powerful agents. Using the AI Toolkit, you can already explore and experiment with a wide range of local and remote models. Copilot now recommends models tailored to your agent’s needs, helping you make informed choices quickly. Build and orchestrate agents Whether you’re creating a single agent or designing a multi-agent workflow, Copilot leverages the latest Microsoft Agent Framework to generate robust agent code. You can initiate agent creation with simple prompts and visualize workflows for greater clarity and control. Create a single agent using Copilot Create a multi-agent workflow using Copilot and visualize workflow execution Trace agent behaviors As agents become more sophisticated, understanding their actions is crucial. The AI Toolkit enables tracing via Copilot, collecting local traces and displaying detailed agent calls, all within VS Code. Evaluate agent response quality Copilot guides you through structured evaluation, recommending metrics and generating test datasets. Integrate evaluations into your CI/CD pipeline for continuous quality assurance and confident deployments. Get started and share feedback This release marks a significant step toward making AI agent development easier and more accessible in Visual Studio Code. Try out the AI Toolkit for Visual Studio Code, share your thoughts, and file issues and suggest features on our GitHub repo. Thank you for being a part of this journey with us!Serverless MCP Agent with LangChain.js v1 — Burgers, Tools, and Traces 🍔
AI agents that can actually do stuff (not just chat) are the fun part nowadays, but wiring them cleanly into real APIs, keeping things observable, and shipping them to the cloud can get... messy. So we built a fresh end‑to‑end sample to show how to do it right with the brand new LangChain.js v1 and Model Context Protocol (MCP). In case you missed it, MCP is a recent open standard that makes it easy for LLM agents to consume tools and APIs, and LangChain.js, a great framework for building GenAI apps and agents, has first-class support for it. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. This new sample gives you: A LangChain.js v1 agent that streams its result, along reasoning + tool steps An MCP server exposing real tools (burger menu + ordering) from a business API A web interface with authentication, sessions history, and a debug panel (for developers) A production-ready multi-service architecture Serverless deployment on Azure in one command ( azd up ) Yes, it’s a burger ordering system. Who doesn't like burgers? Grab your favorite beverage ☕, and let’s dive in for a quick tour! TL;DR key takeaways New sample: full-stack Node.js AI agent using LangChain.js v1 + MCP tools Architecture: web app → agent API → MCP server → burger API Runs locally with a single npm start , deploys with azd up Uses streaming (NDJSON) with intermediate tool + LLM steps surfaced to the UI Ready to fork, extend, and plug into your own domain / tools What will you learn here? What this sample is about and its high-level architecture What LangChain.js v1 brings to the table for agents How to deploy and run the sample How MCP tools can expose real-world APIs Reference links for everything we use GitHub repo LangChain.js docs Model Context Protocol Azure Developer CLI MCP Inspector Use case You want an AI assistant that can take a natural language request like “Order two spicy burgers and show me my pending orders” and: Understand intent (query menu, then place order) Call the right MCP tools in sequence, calling in turn the necessary APIs Stream progress (LLM tokens + tool steps) Return a clean final answer Swap “burgers” for “inventory”, “bookings”, “support tickets”, or “IoT devices” and you’ve got a reusable pattern! Sample overview Before we play a bit with the sample, let's have a look at the main services implemented here: Service Role Tech Agent Web App ( agent-webapp ) Chat UI + streaming + session history Azure Static Web Apps, Lit web components Agent API ( agent-api ) LangChain.js v1 agent orchestration + auth + history Azure Functions, Node.js Burger MCP Server ( burger-mcp ) Exposes burger API as tools over MCP (Streamable HTTP + SSE) Azure Functions, Express, MCP SDK Burger API ( burger-api ) Business logic: burgers, toppings, orders lifecycle Azure Functions, Cosmos DB Here's a simplified view of how they interact: There are also other supporting components like databases and storage not shown here for clarity. For this quickstart we'll only interact with the Agent Web App and the Burger MCP Server, as they are the main stars of the show here. LangChain.js v1 agent features The recent release of LangChain.js v1 is a huge milestone for the JavaScript AI community! It marks a significant shift from experimental tools to a production-ready framework. The new version doubles down on what’s needed to build robust AI applications, with a strong focus on agents. This includes first-class support for streaming not just the final output, but also intermediate steps like tool calls and agent reasoning. This makes building transparent and interactive agent experiences (like the one in this sample) much more straightforward. Quickstart Requirements GitHub account Azure account (free signup, or if you're a student, get free credits here) Azure Developer CLI Deploy and run the sample We'll use GitHub Codespaces for a quick zero-install setup here, but if you prefer to run it locally, check the README. Click on the following link or open it in a new tab to launch a Codespace: Create Codespace This will open a VS Code environment in your browser with the repo already cloned and all the tools installed and ready to go. Provision and deploy to Azure Open a terminal and run these commands: # Install dependencies npm install # Login to Azure azd auth login # Provision and deploy all resources azd up Follow the prompts to select your Azure subscription and region. If you're unsure of which one to pick, choose East US 2 . The deployment will take about 15 minutes the first time, to create all the necessary resources (Functions, Static Web Apps, Cosmos DB, AI Models). If you're curious about what happens under the hood, you can take a look at the main.bicep file in the infra folder, which defines the infrastructure as code for this sample. Test the MCP server While the deployment is running, you can run the MCP server and API locally (even in Codespaces) to see how it works. Open another terminal and run: npm start This will start all services locally, including the Burger API and the MCP server, which will be available at http://localhost:3000/mcp . This may take a few seconds, wait until you see this message in the terminal: 🚀 All services ready 🚀 When these services are running without Azure resources provisioned, they will use in-memory data instead of Cosmos DB so you can experiment freely with the API and MCP server, though the agent won't be functional as it requires a LLM resource. MCP tools The MCP server exposes the following tools, which the agent can use to interact with the burger ordering system: Tool Name Description get_burgers Get a list of all burgers in the menu get_burger_by_id Get a specific burger by its ID get_toppings Get a list of all toppings in the menu get_topping_by_id Get a specific topping by its ID get_topping_categories Get a list of all topping categories get_orders Get a list of all orders in the system get_order_by_id Get a specific order by its ID place_order Place a new order with burgers (requires userId , optional nickname ) delete_order_by_id Cancel an order if it has not yet been started (status must be pending , requires userId ) You can test these tools using the MCP Inspector. Open another terminal and run: npx -y @modelcontextprotocol/inspector Then open the URL printed in the terminal in your browser and connect using these settings: Transport: Streamable HTTP URL: http://localhost:3000/mcp Connection Type: Via Proxy (should be default) Click on Connect, then try listing the tools first, and run get_burgers tool to get the menu info. Test the Agent Web App After the deployment is completed, you can run the command npm run env to print the URLs of the deployed services. Open the Agent Web App URL in your browser (it should look like https://<your-web-app>.azurestaticapps.net ). You'll first be greeted by an authentication page, you can sign in either with your GitHub or Microsoft account and then you should be able to access the chat interface. From there, you can start asking any question or use one of the suggested prompts, for example try asking: Recommend me an extra spicy burger . As the agent processes your request, you'll see the response streaming in real-time, along with the intermediate steps and tool calls. Once the response is complete, you can also unfold the debug panel to see the full reasoning chain and the tools that were invoked: Tip: Our agent service also sends detailed tracing data using OpenTelemetry. You can explore these either in Azure Monitor for the deployed service, or locally using an OpenTelemetry collector. We'll cover this in more detail in a future post. Wrap it up Congratulations, you just finished spinning up a full-stack serverless AI agent using LangChain.js v1, MCP tools, and Azure’s serverless platform. Now it's your turn to dive in the code and extend it for your use cases! 😎 And don't forget to azd down once you're done to avoid any unwanted costs. Going further This was just a quick introduction to this sample, and you can expect more in-depth posts and tutorials soon. Since we're in the era of AI agents, we've also made sure that this sample can be explored and extended easily with code agents like GitHub Copilot. We even built a custom chat mode to help you discover and understand the codebase faster! Check out the Copilot setup guide in the repo to get started. You can quickly get up speed with the MCP for Beginners course and AI Agents for Beginners course. If you like this sample, don't forget to star the repo ⭐️! You can also join us in the Azure AI community Discord to chat and ask any questions. Happy coding and burger ordering! 🍔