From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry
Agentic AI is no longer a future concept — it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt.
In this blog post and the accompanying video tutorial, we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that is scalable, testable, and production-ready.
The scenario: a retail agent for sales and inventory insights
To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava.
The objective is to build an AI agent that helps the internal team with:
- Analyzing sales data (e.g., reasoning over a product catalog and identifying top‑selling categories)
- Managing inventory (e.g., detecting products running low on stock and triggering restock actions)
Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit
The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit.
Instead of picking a model arbitrarily, we:
- Describe the business scenario in natural language
- Ask Copilot to perform a comparative analysis between two candidate models
- Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics)
Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other — turning model selection into a transparent, repeatable decision.
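To make the idea of a repeatable decision concrete, here is a minimal sketch of a weighted scoring matrix over the three criteria listed above. It is not what Copilot or AI Toolkit runs internally; the model names, the 1-to-5 scores, and the weights are all made-up assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the three criteria come from the post, the numbers do not.
WEIGHTS = {"reasoning_quality": 0.5, "tool_support": 0.3, "analytics_fit": 0.2}


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # one 1-5 score per criterion (illustrative values)


def weighted_score(candidate: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of a candidate's criterion scores."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from best to worst weighted score."""
    return sorted(candidates, key=weighted_score, reverse=True)


# Placeholder candidates, not real catalog entries.
model_a = Candidate("model-a", {"reasoning_quality": 4, "tool_support": 5, "analytics_fit": 3})
model_b = Candidate("model-b", {"reasoning_quality": 5, "tool_support": 3, "analytics_fit": 4})

for candidate in rank([model_a, model_b]):
    print(f"{candidate.name}: {weighted_score(candidate):.2f}")
```

Writing the criteria and weights down like this means the same comparison can be rerun when a new candidate model appears.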
To go deeper, we explore the AI Toolkit Model Catalog, which lets you:
- Browse hundreds of models
- Filter by hosting platform (GitHub, Microsoft Foundry, local)
- Filter by publisher (open‑source and proprietary)
Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts.
Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI
With the model ready, it’s time to build the agent.
Using the Agent Builder UI, we configure:
- The agent’s identity (name, role, responsibilities)
- Instructions that define tone, behavior, and scope
- The model the agent runs on
- The tools and data sources it can access
For this scenario, we add:
- File search, grounded on uploaded sales logs and a product catalog
- Code interpreter, enabling the agent to compute metrics, generate charts, and write reports
We can then test the agent in the right-side playground by asking business questions like:
“What were the top three selling categories in 2025?”
The response is not generic — it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer.
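For a sense of the kind of computation the code interpreter might run to answer that question, here is a minimal sketch. The rows, field names, and revenue figures are invented stand-ins for the uploaded sales logs, and the real agent-generated code may look quite different.

```python
from collections import defaultdict

# Made-up rows standing in for the uploaded sales logs (hypothetical schema).
SALES = [
    {"date": "2025-01-14", "category": "Paint", "revenue": 1200.0},
    {"date": "2025-03-02", "category": "Power Tools", "revenue": 3100.0},
    {"date": "2025-05-20", "category": "Garden", "revenue": 950.0},
    {"date": "2025-07-08", "category": "Lumber", "revenue": 1500.0},
    {"date": "2025-09-30", "category": "Paint", "revenue": 800.0},
    {"date": "2024-11-11", "category": "Paint", "revenue": 5000.0},  # other year, ignored
]


def top_categories(rows, year, n=3):
    """Sum revenue per category for one year and return the n largest."""
    totals = defaultdict(float)
    for row in rows:
        if row["date"].startswith(f"{year}-"):
            totals[row["category"]] += row["revenue"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


for category, revenue in top_categories(SALES, 2025):
    print(f"{category}: {revenue:.2f}")
```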
The Agent Builder also provides local evaluation and tracing features.
Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code
UI-based prototyping is powerful, but real solutions often require custom logic.
This is where we transition from prototype to production, using a built-in workflow to migrate from the UI prototype to a hosted agent template.
The result is a production-ready scaffold that includes:
- Agent code (built with Microsoft Agent Framework; you can choose Python or C#)
- A YAML-based agent definition
- Container configuration files
From here, we extend the agent with custom functions — for example, to create and manage restock orders.
GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario.
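As an illustration of what such custom functions could look like, here is a minimal sketch in plain Python. The function names, the order fields, and the in-memory store are assumptions rather than code from the demo. How the functions are registered as agent tools depends on the generated template and is not shown here.

```python
from dataclasses import dataclass, field
from datetime import date
from itertools import count

_order_ids = count(1)  # simple in-memory ID generator (illustrative only)


@dataclass
class RestockOrder:
    order_id: int
    sku: str
    quantity: int
    created_on: date = field(default_factory=date.today)
    status: str = "pending"


# Hypothetical in-memory store; a real agent would call an inventory system.
_ORDERS: dict[int, RestockOrder] = {}


def create_restock_order(sku: str, quantity: int) -> dict:
    """Create a restock order for a product SKU and return a short summary."""
    if quantity <= 0:
        raise ValueError("quantity must be a positive integer")
    order = RestockOrder(order_id=next(_order_ids), sku=sku, quantity=quantity)
    _ORDERS[order.order_id] = order
    return {"order_id": order.order_id, "sku": sku, "quantity": quantity, "status": order.status}


def cancel_restock_order(order_id: int) -> dict:
    """Mark an existing restock order as cancelled, if it exists."""
    order = _ORDERS.get(order_id)
    if order is None:
        return {"order_id": order_id, "status": "not_found"}
    order.status = "cancelled"
    return {"order_id": order_id, "status": order.status}
```

Type hints and docstrings are worth keeping on functions like these, since they describe the parameters to whoever (or whatever) ends up calling them.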
Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment
Before deploying, we test the agent locally:
- Ask it to identify products running out of stock
- Trigger a restock action using the custom function
- Debug the full tool‑calling flow end to end
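The steps above can be approximated without a model in the loop. The sketch below simulates the tool-calling flow locally: a tool call arrives as a name plus JSON-encoded arguments and is dispatched to a Python function. The tool-call shape, the stock levels, and the function names are assumptions for illustration, not the format used by the hosted agent template.

```python
import json

# Made-up stock levels; the SKUs are placeholders.
INVENTORY = {"PAINT-001": 4, "DRILL-200": 37, "SCREW-050": 9}
RESTOCK_LOG = []  # records every restock request so a test can inspect it


def find_low_stock(threshold: int = 10) -> list:
    """Return SKUs whose stock is below the threshold, in alphabetical order."""
    return sorted(sku for sku, qty in INVENTORY.items() if qty < threshold)


def restock(sku: str, quantity: int) -> dict:
    """Record a restock request for a known SKU."""
    if sku not in INVENTORY:
        raise KeyError(f"unknown SKU: {sku}")
    RESTOCK_LOG.append((sku, quantity))
    return {"sku": sku, "quantity": quantity, "status": "ordered"}


TOOLS = {"find_low_stock": find_low_stock, "restock": restock}


def dispatch(tool_call: dict):
    """Look up the requested tool and call it with the decoded JSON arguments."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call.get("arguments") or "{}")
    return TOOLS[name](**args)


# Simulate the two calls a model might emit: first detect, then restock.
low = dispatch({"name": "find_low_stock", "arguments": '{"threshold": 10}'})
results = [
    dispatch({"name": "restock", "arguments": json.dumps({"sku": sku, "quantity": 50})})
    for sku in low
]
print(low, results)
```

Setting a breakpoint inside dispatch is a convenient way to inspect the arguments for each call before they reach the function.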
Once validated, we deploy the agent to Microsoft Foundry.
Deploying the agent to the cloud gives us more than compute power: it also provides a set of built-in features for operationalizing the solution and maintaining it in production.
Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry
Production readiness doesn’t stop at deployment.
In the Foundry portal, we explore:
- Evaluation runs, using both real and synthetic datasets
- LLM‑based judges that score responses across multiple metrics, with explanations
- Red teaming, where an adversarial agent probes for unsafe or undesired behavior
- Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet
These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment.
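To show what continuous assessment can mean in practice, here is a minimal sketch that aggregates per-response judge scores and flags metrics that dropped between two runs. The metric names, scores, thresholds, and result shape are invented for illustration and do not reflect the format Foundry produces.

```python
from statistics import mean

# Made-up judge outputs; "relevance" and "groundedness" are placeholder metric names.
RESULTS = [
    {"query": "q1", "scores": {"relevance": 5, "groundedness": 4}},
    {"query": "q2", "scores": {"relevance": 3, "groundedness": 2}},
    {"query": "q3", "scores": {"relevance": 4, "groundedness": 5}},
]


def summarize(results, pass_threshold=3):
    """Compute the mean score and pass rate for each metric across all responses."""
    metrics = sorted({m for r in results for m in r["scores"]})
    summary = {}
    for metric in metrics:
        values = [r["scores"][metric] for r in results if metric in r["scores"]]
        summary[metric] = {
            "mean": mean(values),
            "pass_rate": sum(v >= pass_threshold for v in values) / len(values),
        }
    return summary


def regressed(baseline, current, tolerance=0.25):
    """List metrics whose mean dropped by more than the tolerance since the baseline."""
    return [
        m for m in baseline
        if m in current and baseline[m]["mean"] - current[m]["mean"] > tolerance
    ]


baseline = summarize(RESULTS)
print(baseline)
```

A check like regressed could run after each evaluation so that a drop in quality is caught before users notice it.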
Why this workflow matters
This end-to-end flow demonstrates a key idea:
Agentic AI isn’t just about building agents — it’s about operating them responsibly at scale.
By combining AI Toolkit in VS Code with Microsoft Foundry, you get:
- A smooth developer experience
- Clear separation between experimentation and production
- Built‑in evaluation, safety, and observability
Resources
Demo Sample: GitHub Repo
Foundry tutorials: Inside Microsoft Foundry - YouTube