Your Fastest Path from Idea to Production with AI
As we shared in the announcement, Microsoft Foundry Toolkit for Visual Studio Code is now generally available. In this deep dive, we walk through everything that’s in the GA release — from the rebrand and extension consolidation, to model experimentation, agent development, evaluations, and on-device AI for scientists and engineers pushing the boundaries of edge hardware.
Whether you’re exploring your first model, shipping a production agent, or squeezing performance from edge hardware, Foundry Toolkit meets you where you are.
🧪 The Fastest Way to Start Experimenting with AI Models
You’ve heard about a new model and want to try it right now — not after spinning up infrastructure or writing boilerplate API code. That’s exactly what Microsoft Foundry Toolkit is built to deliver.
With a Model Catalog spanning 100+ models — cloud-hosted from GitHub, Microsoft Foundry, OpenAI, Anthropic, and Google, plus local models via ONNX, Foundry Local, or Ollama — you go from curiosity to testing in minutes.
The Model Playground is where experimentation lives: compare two models side by side, attach files for multimodal testing, enable web search, adjust system prompts, and watch streaming responses come in.
When something works, View Code generates ready-to-use snippets in Python, JavaScript, C#, or Java — the exact API call you just tested, translated into your language of choice and ready to paste.
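For a cloud-hosted model, the generated snippet typically reduces to a single chat-completions request. As a rough illustration only (the endpoint URL, API version, and key below are placeholders, not values View Code actually emits), the shape of that call can be sketched with just the standard library:

```python
import json
import urllib.request

# Placeholders -- substitute your own resource, deployment, and key.
ENDPOINT = (
    "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/"
    "YOUR-DEPLOYMENT/chat/completions?api-version=2024-06-01"
)
API_KEY = "YOUR-API-KEY"

def build_chat_request(system_prompt: str, user_prompt: str) -> dict:
    """Assemble the JSON body for a chat-completions call."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.7,
        "stream": False,
    }

def ask(system_prompt: str, user_prompt: str) -> str:
    """POST the request and return the assistant's reply."""
    body = json.dumps(build_chat_request(system_prompt, user_prompt)).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "api-key": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

In practice you would use a provider SDK rather than raw HTTP; the point is that the snippet View Code hands you is the exact request you just tested in the Playground.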
🤖 Building AI Agents: From Prototype to Production
Foundry Toolkit supports the full agent development journey with two distinct paths and a clean bridge between them.
Path A: The Prototyper: No Code Required
Agent Builder is a low-code interface that lets you take an idea, define instructions, attach tools, and start a conversation — all without writing a line of code. It’s the fastest way to validate whether an agent concept actually works. You can:
- Write and refine instructions with the built-in Prompt Optimizer, which analyzes your instructions and suggests improvements
- Connect tools from the Tool Catalog — browse tools from the Foundry public catalog or local MCP servers, configure them with a few clicks, and wire them into your agent
- Configure MCP tool approval — decide whether tool calls need your sign-off or can run automatically
- Switch between agents instantly with the quick switcher, and manage multiple agent drafts without losing work (auto-save has you covered)
- Save to Foundry with a single click and manage your agents from there
The result is a working, testable agent in minutes — perfect for validating use cases or prototyping features before investing in a full codebase.
Path B: The Professional Team: Code-First, Production-Ready
For teams building complex systems — multi-agent workflows, domain-specific orchestration, production deployments — code gives you control. Foundry Toolkit scaffolds production-ready code structures for Microsoft Agent Framework, LangGraph, and other popular orchestration frameworks. You’re not starting from scratch; you’re starting from a solid foundation.
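Whichever framework you scaffold, the heart of an agent is the same loop: the model proposes a tool call, the runtime executes it, and the result is fed back until the model produces an answer. A framework-agnostic sketch of that loop (with a stubbed model standing in for a real LLM, and plain functions standing in for MCP or framework tools):

```python
from typing import Callable

# Tool registry: name -> callable. Real agents would wire in MCP or
# framework-provided tools; these are plain functions for illustration.
TOOLS: dict[str, Callable[[str], str]] = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def stub_model(history: list[dict]) -> dict:
    """Stand-in for an LLM: requests a tool once, then answers."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "get_weather", "arg": "Seattle"}
    last_tool_result = [m for m in history if m["role"] == "tool"][-1]["content"]
    return {"type": "answer", "content": f"Forecast: {last_tool_result}"}

def run_agent(user_message: str, model=stub_model, max_turns: int = 5) -> str:
    """The canonical agent loop: call model, execute tool, feed result back."""
    history = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        action = model(history)
        if action["type"] == "answer":
            return action["content"]
        result = TOOLS[action["tool"]](action["arg"])  # execute the tool call
        history.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not converge")
```

Frameworks like Agent Framework and LangGraph layer state management, delegation, and streaming on top of this loop; the scaffolded projects give you that structure without hand-rolling it.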
Once your agent is running, Agent Inspector turns debugging from guesswork into real engineering:
- Hit F5 to launch your agent with full VS Code debugger support — breakpoints, variable inspection, step-through execution
- Watch streaming responses, tool calls, and workflow graphs render in real time as your agent runs
- Double-click any node in the workflow visualization to jump straight to the source code behind it
- Local tracing captures the full execution span tree across tool calls and delegation chains — no external infrastructure needed
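To make the span-tree idea concrete, here is a minimal in-process tracer (purely illustrative, not Foundry's tracing implementation): nested `span()` calls build a tree that mirrors the agent's tool calls and delegation chain, with timing attached to each node.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    children: list["Span"] = field(default_factory=list)
    elapsed: float = 0.0

class Tracer:
    """Minimal span tracer: nested `span()` calls build an execution tree."""
    def __init__(self):
        self.root = Span("root")
        self._stack = [self.root]

    @contextmanager
    def span(self, name: str):
        node = Span(name)
        self._stack[-1].children.append(node)  # attach under current span
        self._stack.append(node)
        start = time.perf_counter()
        try:
            yield node
        finally:
            node.elapsed = time.perf_counter() - start
            self._stack.pop()

# Usage: the nesting mirrors the agent's delegation chain.
tracer = Tracer()
with tracer.span("agent_run"):
    with tracer.span("tool:web_search"):
        pass  # the tool call would execute here
    with tracer.span("delegate:summarizer"):
        pass  # a sub-agent would run here
```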
When you’re ready to ship, one-click deployment packages your agent and deploys it to a production-grade runtime on Microsoft Foundry Agent Service as a hosted agent. The Hosted Agent Playground lets you test it directly from the VS Code sidebar, keeping the feedback loop tight.
The Bridge: From Prototype to Code, Seamlessly
These paths aren’t silos — they’re stages. When your Agent Builder prototype is ready to grow, export it directly to code with a single click. The generated project includes the agent’s instructions, tool configurations, and scaffolding — giving your engineering team a real starting point rather than a rewrite.
GitHub Copilot with the Microsoft Foundry Skill keeps momentum going once you’re in code. The skill knows the Agent Framework patterns, evaluation APIs, and Foundry deployment model. Ask it to generate an agent, write an evaluation, or scaffold a multi-agent workflow, and it produces code that works with the rest of the toolkit.
🎯 Evaluations: Quality Built Into the Workflow
At every stage — prototype or production — integrated evaluations let you measure agent quality without switching tools. Define evaluations using familiar pytest syntax, run them from VS Code Test Explorer alongside your unit tests, and analyze results in a tabular view with Data Wrangler integration. When you need scale, submit the same definitions to run in Microsoft Foundry. Evaluations become versioned, repeatable, and CI-friendly — not one-off scripts you hope to remember.
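The pytest-based shape is roughly as follows; `my_agent` and the string-grounding check are stand-ins for illustration, not the Toolkit's actual evaluator API. Because these are ordinary pytest tests, Test Explorer discovers and runs them alongside your unit tests:

```python
# eval_agent.py -- pytest discovers any test_* function; run with
# `pytest eval_agent.py` or from the VS Code Test Explorer.

def my_agent(question: str) -> str:
    """Stand-in for your agent; in practice this would call your
    deployed or local agent and return its answer."""
    return "The capital of France is Paris."

def test_agent_mentions_expected_fact():
    answer = my_agent("What is the capital of France?")
    # Simple string-grounding check; a real evaluation might instead
    # score relevance or groundedness with a judge model.
    assert "Paris" in answer

def test_agent_answer_is_a_full_sentence():
    answer = my_agent("What is the capital of France?")
    assert answer.endswith(".") and len(answer.split()) >= 4
```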
💻 Unlock AI's Full Capabilities on Edge Devices
AI running on your device — at your pace, without data leaving your machine.
Cloud-hosted AI is convenient — but it's not always the right fit. Local models offer:
- Privacy and Compliance: Your data stays on your machine. No round-trips to a server.
- Cost control: Run as many inferences as you want — no per-token billing.
- Offline capability: Works anywhere, even without internet access.
- Hardware leverage: Modern Windows devices are built for local AI.
That's why we're bringing a complete end-to-end workflow for discovering, running, converting, profiling, and fine-tuning AI models directly on Windows. Whether you're a developer exploring what models can do, an engineer optimizing models for production, or a researcher training domain-specific model adapters, Foundry Toolkit gives you the tools to work with local AI without compromise.
Model Playground: Try Any Local Model, Instantly
As we mentioned at the beginning of this article, the Model Playground is your starting point — not only for cloud models but also for local models. It includes Microsoft's full catalog of models, including the Phi open model family and Phi Silica — Microsoft's local language model optimized for Windows. As you go deeper, the Playground also supports any model you've converted locally through the Conversion workflow — add it to My Resources and try it immediately in the same chat experience.
Model Conversion: From Hugging Face to Hardware-Ready on Windows
Getting a model from a research checkpoint to something that runs efficiently on your specific hardware is non-trivial. Foundry Toolkit's conversion pipeline handles the full transformation for a growing selection of popular Hugging Face models: Hugging Face → Conversion → Quantization → Evaluation → ONNX
The result: a model optimized for Windows ML — Microsoft's unified runtime for local AI on Windows.
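Quantization is the step that most affects on-device footprint: float weights are mapped to 8-bit integers and reconstructed at inference time. A minimal symmetric per-tensor sketch of the idea (purely illustrative, not the Toolkit's actual pipeline, which operates on ONNX graphs):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Reconstruct approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight lies within one quantization step (scale) of the original.
```

The storage win is the point: each weight shrinks from 4 bytes to 1, at the cost of a bounded rounding error per weight, which is why the pipeline follows quantization with an evaluation step.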
All supported hardware targets are aligned with Windows ML's execution provider (EP) ecosystem:
- MIGraphX (AMD)
- NvTensorRtRtx (NVIDIA)
- OpenVINO (Intel)
- QNN (Qualcomm)
- VitisAI (AMD)
Why Windows ML matters for you: Windows ML lets your app automatically acquire and use hardware-specific EPs at runtime — no device-specific code required. Your converted model runs across the full range of supported Windows hardware.
Once your model has been converted successfully, Foundry Toolkit gives you everything you need to validate, share, and ship:
- Benchmark results: Every conversion run is automatically tracked in the History Board — giving you an easy way to validate accuracy, latency, and throughput across model variants before you ship.
- Sample code with Windows ML: Get ready-to-use code showing how to load and run inference on your converted model with the Windows ML runtime — no boilerplate hunting, just copy and go.
- Quick Playground via GitHub Copilot: Ask GitHub Copilot to generate a playground web demo for your converted model. Instantly get an interactive experience to validate behavior before integrating into your application.
- Package as MSIX: Package your converted model into an MSIX installer. Share it with teammates or incorporate it into your application.
Profiling: See Exactly What Your Model Is Doing
Converting a local model is one thing. Understanding how it uses your hardware is another. Foundry Toolkit’s profiling tools give you real-time visibility into CPU, GPU, NPU, and memory consumption — with per-second granularity and a 10-minute rolling window.
Three profiling modes cover different workflows:
- Attach at startup — profile a model from the moment it loads
- Connect to a running process — attach to an already-running inference session
- Profile an ONNX model directly — the Toolkit feeds data to the model and measures performance itself, no application or process needed
For example, when you run a local model in the Playground, you get detailed visibility into what's happening under the hood during inference — far beyond basic resource usage. Windows ML Event Breakdown surfaces how execution time is spent: a single model execution is broken down into phases — such as session initialization versus active inference — so you know whether slowness is a one-time startup cost or a per-request bottleneck.
When you profile any ONNX model directly, operator-level tracing shows exactly which graph nodes and operators are dispatched to the NPU, CPU, or GPU, and how long each one takes. This makes it straightforward to identify which parts of your model are underutilizing available hardware — and where quantization, graph optimization, or EP changes will have the most impact.
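ONNX Runtime, the engine underneath Windows ML, can emit its profile as a Chrome-trace JSON file whose per-node events record a duration and, in their `args`, the assigned execution provider (the field names below follow ONNX Runtime's profiling output, but treat them as assumptions to verify against your version). A small script can then aggregate where the time actually goes:

```python
import json
from collections import defaultdict

def time_by_provider(profile_path: str) -> dict[str, float]:
    """Sum per-node execution time (in ms) for each execution provider
    from an ONNX Runtime profiling JSON file."""
    with open(profile_path) as f:
        events = json.load(f)
    totals: dict[str, float] = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "Node":  # per-operator execution events
            provider = ev.get("args", {}).get("provider", "unknown")
            totals[provider] += ev.get("dur", 0) / 1000.0  # dur is microseconds
    return dict(totals)
```

If most of the time lands on the CPU provider when you expected the NPU, that usually points at operators the NPU's EP does not support falling back to CPU — exactly the situation where quantization or graph changes pay off.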
Fine-Tuning: Make Phi Silica Yours
Generic models are capable; domain-specific models are precise. Foundry Toolkit's fine-tuning workflow lets you train LoRA (Low-Rank Adaptation) adapters for Phi Silica using your own data — no ML infrastructure required.
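The reason LoRA is so lightweight is its parameter count: instead of updating a full d_out x d_in weight matrix, it trains two low-rank factors B (d_out x r) and A (r x d_in), applying the update W + (alpha/r)·B·A, so only r·(d_out + d_in) parameters are trainable. A back-of-the-envelope sketch (the layer dimensions below are made-up examples, not Phi Silica's actual architecture):

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tune vs. a LoRA adapter of given rank."""
    full = d_out * d_in            # update the whole weight matrix
    lora = rank * (d_out + d_in)   # factors B (d_out x r) and A (r x d_in)
    return full, lora

# Hypothetical 4096 x 4096 projection layer with an adapter of rank 16.
full, lora = lora_param_counts(4096, 4096, 16)
reduction = full / lora  # how many times fewer trainable parameters
```

At these example dimensions the adapter trains roughly two orders of magnitude fewer parameters than a full fine-tune, which is what makes a short cloud job and a small downloadable artifact feasible.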
Bring your data, customize your LoRA parameters, and submit a job to the cloud. Foundry Toolkit spins up Azure Container Apps in your own subscription to train the adapter. To validate fine-tuning quality, the workflow tracks training and evaluation loss curves, and cloud inference lets you check the adapter's behavior — so you can confirm learning progress and output quality before shipping.
Once satisfied, download the adapter and incorporate it into your app for use at runtime.
This is the full loop: train in the cloud → run at the edge. Domain adaptation for local AI, without standing up your own training infrastructure.
🚀 One Toolkit for Every Stage
Foundry Toolkit for VS Code GA supports every stage of serious AI development:
- Explore 100+ models without commitment
- Prototype agents in minutes with no code
- Build production agents with real debugging, popular frameworks, and coding agent assistance
- Deploy to Microsoft Foundry with one click and test without leaving VS Code
- Measure quality with evaluations that fit your existing test workflows
- Optimize models for specific hardware and use cases
All of it, inside VS Code. All of it, now generally available. Install Foundry Toolkit from the VS Code Marketplace →
Get Started with Hands-On Labs and Samples:
- https://github.com/Azure-Samples/Foundry_Toolkit_Samples
- https://github.com/microsoft-foundry/Foundry_Toolkit_for_VSCode_Lab
We'd love to hear what you build. Share feedback and file issues on GitHub, and join the broader conversation in the Microsoft Foundry Community.