Microsoft Developer Community Blog

From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry

carlottacaste
Microsoft
Mar 16, 2026

Agentic AI is no longer a future concept — it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt.

In this blog post, and the accompanying video tutorial, we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that’s scalable, testable, and production-ready.

The scenario: a retail agent for sales and inventory insights

To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava.

The objective is to build an AI agent that helps the internal team with:

  • Analyzing sales data (e.g. reasoning over a product catalog, identifying top‑selling categories)
  • Managing inventory (e.g. detecting products running low on stock, triggering restock actions)

Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit

The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit.

Instead of picking a model arbitrarily, we:

  • Describe the business scenario in natural language
  • Ask Copilot to perform a comparative analysis between two candidate models
  • Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics)

Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other — turning model selection into a transparent, repeatable decision.
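
To illustrate what a repeatable decision can look like, here is a minimal Python sketch of scoring two candidates against explicit, weighted criteria. It is not what Copilot or AI Toolkit does internally. The weights, the model names, and the scores are all placeholders invented for this example.

```python
from dataclasses import dataclass

# Hypothetical criteria weights (assumption); they are chosen to sum to 1.0.
WEIGHTS = {
    "reasoning_quality": 0.5,
    "tool_support": 0.3,
    "analytics_suitability": 0.2,
}


@dataclass
class Candidate:
    name: str
    scores: dict[str, float]  # criterion -> score on a 0-5 scale


def weighted_score(candidate: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of a candidate's scores over all criteria."""
    missing = set(weights) - set(candidate.scores)
    if missing:
        raise ValueError(f"{candidate.name} is missing scores for: {sorted(missing)}")
    return sum(weights[c] * candidate.scores[c] for c in weights)


def pick_model(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest weighted score."""
    return max(candidates, key=weighted_score)


# Placeholder models and scores, invented for illustration only.
model_a = Candidate("model-a", {"reasoning_quality": 4, "tool_support": 5, "analytics_suitability": 3})
model_b = Candidate("model-b", {"reasoning_quality": 3, "tool_support": 4, "analytics_suitability": 5})

best = pick_model([model_a, model_b])
print(best.name, round(weighted_score(best), 2))
```

Writing the criteria and weights down means the same comparison can be rerun when a new model appears in the catalog.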

To go deeper, we explore the AI Toolkit Model Catalog, which lets you:

  • Browse hundreds of models
  • Filter by hosting platform (GitHub, Microsoft Foundry, local)
  • Filter by publisher (open‑source and proprietary)

Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts.

Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI

With the model ready, it’s time to build the agent.

Using the Agent Builder UI, we configure:

  • The agent’s identity (name, role, responsibilities)
  • Instructions that define tone, behavior, and scope
  • The model the agent runs on
  • The tools and data sources it can access

For this scenario, we add:

  • File search, grounded on uploaded sales logs and a product catalog
  • Code interpreter, enabling the agent to compute metrics, generate charts, and write reports

We can then test the agent in the right-side playground by asking business questions like:

“What were the top three selling categories in 2025?”

The response is not generic — it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer.
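
For a sense of the kind of computation the code interpreter might run to answer that question, here is a small Python sketch. The sample rows and field names are invented, since the demo's actual sales logs are not shown in this post, and the code the agent really generates may look quite different.

```python
from collections import defaultdict

# Invented sample rows (assumption); the real sales logs are not shown here.
SALES = [
    {"date": "2025-01-14", "category": "Paint", "revenue": 1200.0},
    {"date": "2025-03-02", "category": "Power Tools", "revenue": 3400.0},
    {"date": "2025-05-21", "category": "Garden", "revenue": 800.0},
    {"date": "2025-07-09", "category": "Paint", "revenue": 900.0},
    {"date": "2025-09-30", "category": "Flooring", "revenue": 1500.0},
    {"date": "2024-12-28", "category": "Garden", "revenue": 5000.0},  # outside 2025
]


def top_categories(rows, year, n=3):
    """Return the n categories with the highest total revenue in the given year."""
    totals = defaultdict(float)
    for row in rows:
        # Dates are ISO strings, so the year is the prefix before the first dash.
        if row["date"].startswith(f"{year}-"):
            totals[row["category"]] += row["revenue"]
    # Sort by revenue descending, then by name so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]


for category, revenue in top_categories(SALES, 2025):
    print(f"{category}: {revenue:.2f}")
```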

The Agent Builder also provides local evaluation and tracing features.

Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code

UI-based prototyping is powerful, but real solutions often require custom logic.

This is where we transition from prototype to production, using a built-in workflow to migrate the agent from the UI to a hosted agent template.

The result is a production-ready scaffold that includes:

  • Agent code (built with Microsoft Agent Framework; you can choose Python or C#)
  • A YAML-based agent definition
  • Container configuration files

From here, we extend the agent with custom functions — for example, to create and manage restock orders.

GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario.
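
As a rough idea of what a custom function for restock orders could look like, here is a plain-Python sketch. It deliberately leaves out how the function is registered as a tool with Microsoft Agent Framework, because that depends on the generated template. The class names, fields, and the in-memory store are assumptions made for illustration, not the code from the demo.

```python
from dataclasses import dataclass


@dataclass
class RestockOrder:
    order_id: int
    sku: str
    quantity: int
    status: str = "pending"


class RestockService:
    """In-memory stand-in (assumption) for whatever order system the agent would call."""

    def __init__(self) -> None:
        self._orders: dict[int, RestockOrder] = {}
        self._next_id = 1

    def create_restock_order(self, sku: str, quantity: int) -> RestockOrder:
        """Create a pending restock order after validating the inputs."""
        if not sku.strip():
            raise ValueError("sku must not be empty")
        if quantity <= 0:
            raise ValueError("quantity must be a positive integer")
        order = RestockOrder(self._next_id, sku, quantity)
        self._orders[order.order_id] = order
        self._next_id += 1
        return order

    def cancel_restock_order(self, order_id: int) -> RestockOrder:
        """Mark an existing order as cancelled."""
        if order_id not in self._orders:
            raise KeyError(f"no restock order with id {order_id}")
        order = self._orders[order_id]
        order.status = "cancelled"
        return order

    def list_open_orders(self) -> list[RestockOrder]:
        """Return all orders that are still pending."""
        return [o for o in self._orders.values() if o.status == "pending"]
```

Validating arguments inside the function matters here, since the values are chosen by the model rather than typed by a person.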

Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment

Before deploying, we test the agent locally:

  • Ask it to identify products running out of stock
  • Trigger a restock action using the custom function
  • Debug the full tool‑calling flow end to end
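
The checks above can also be approximated without a model in the loop. The sketch below is a hypothetical harness, not part of AI Toolkit: it feeds hand-written tool calls (JSON with a name and arguments, a format assumed for this example) to stub tools and records a trace to inspect. The inventory data and tool names are invented.

```python
import json

# Invented inventory snapshot (assumption): SKU -> (units in stock, reorder threshold).
INVENTORY = {
    "HAMMER-16OZ": (4, 10),
    "PAINT-WHITE-5L": (35, 20),
    "DRILL-18V": (2, 5),
}


def find_low_stock() -> list[str]:
    """Return SKUs whose stock is below their reorder threshold, sorted by name."""
    return sorted(sku for sku, (stock, threshold) in INVENTORY.items() if stock < threshold)


def restock(sku: str, quantity: int) -> dict:
    """Stub restock tool: returns the order it would have placed."""
    if sku not in INVENTORY:
        raise KeyError(f"unknown SKU: {sku}")
    return {"sku": sku, "quantity": quantity, "status": "pending"}


# Registry mapping tool names to the Python callables that implement them.
TOOLS = {"find_low_stock": find_low_stock, "restock": restock}


def dispatch(tool_call_json: str, trace: list) -> dict:
    """Run one tool call given as JSON and record the outcome in the trace."""
    call = json.loads(tool_call_json)
    name, args = call["name"], call.get("arguments", {})
    try:
        result = {"ok": True, "result": TOOLS[name](**args)}
    except Exception as exc:  # report tool errors instead of crashing the loop
        result = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    trace.append({"tool": name, "arguments": args, **result})
    return result
```

A typical session would call `dispatch('{"name": "find_low_stock"}', trace)` first, then issue a `restock` call for one of the returned SKUs and read back `trace` to confirm the order of calls and their arguments.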

Once validated, we deploy the agent to Microsoft Foundry.

Deploying the agent to the cloud gives us more than compute power: it comes with a set of built-in features for operationalizing the solution and maintaining it in production.

Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry

Production readiness doesn’t stop at deployment.

In the Foundry portal, we explore:

  • Evaluation runs, using both real and synthetic datasets
  • LLM‑based judges that score responses across multiple metrics, with explanations
  • Red teaming, where an adversarial agent probes for unsafe or undesired behavior
  • Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet

These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment.
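
To make the idea of continuous assessment concrete, here is a small Python sketch that averages judge scores per metric and flags regressions against a baseline run. It does not use the Foundry SDK or reproduce its evaluators. The metric names, the 1-5 scale, the tolerance, and the sample scores are all assumptions made for this example.

```python
from statistics import mean

# Invented judge scores on a 1-5 scale (assumption), one dict per evaluated response.
BASELINE_RUN = [
    {"scores": {"groundedness": 5.0, "relevance": 4.0}},
    {"scores": {"groundedness": 4.0, "relevance": 4.0}},
]
CURRENT_RUN = [
    {"scores": {"groundedness": 4.0, "relevance": 4.0}},
    {"scores": {"groundedness": 4.0, "relevance": 4.0}},
]


def summarize(results):
    """Average each metric across all rows that report it."""
    metrics = sorted({m for row in results for m in row["scores"]})
    return {
        m: mean(row["scores"][m] for row in results if m in row["scores"])
        for m in metrics
    }


def find_regressions(baseline, current, tolerance=0.25):
    """Return metrics whose mean dropped by more than `tolerance` versus the baseline."""
    return {
        m: (baseline[m], current[m])
        for m in baseline
        if m in current and baseline[m] - current[m] > tolerance
    }


regressions = find_regressions(summarize(BASELINE_RUN), summarize(CURRENT_RUN))
for metric, (before, after) in regressions.items():
    print(f"{metric} regressed: {before:.2f} -> {after:.2f}")
```

A check like this can run after every evaluation, so a quality drop is caught before users notice it.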

Why this workflow matters

This end-to-end flow demonstrates a key idea:

Agentic AI isn’t just about building agents — it’s about operating them responsibly at scale.

By combining AI Toolkit in VS Code with Microsoft Foundry, you get:

  • A smooth developer experience
  • Clear separation between experimentation and production
  • Built‑in evaluation, safety, and observability

Resources

Demo Sample: GitHub Repo

Foundry tutorials: Inside Microsoft Foundry - YouTube

 

Updated Mar 13, 2026
Version 1.0