Microsoft Developer Community Blog


Build AI Agents End-to-End in VS Code

April_Gittens

Microsoft

May 28, 2025

Everything you need to build, test, and ship an AI agent lives inside VS Code. The Microsoft AI Tools Extension Pack simplifies the entire workflow—from prompt design to cloud deployment.

If you’ve ever thought about building your own AI agent but didn’t know where to begin, you’re not alone. Between navigating APIs, wrangling datasets, and managing cloud deployments, creating even a simple agent can feel overwhelming. That’s where Visual Studio Code steps in—not just as a code editor, but as your AI workshop. With the new Microsoft AI Tools Extension Pack, you get an integrated toolchain that simplifies every step of agent development, from designing prompts and models to deploying them in Azure. I’ll break down what each extension does and show you how they team up to build something useful and real.

Why use VS Code extensions for AI Agent development?

Building an AI agent from scratch can feel like trying to assemble furniture without the manual, or the right tools. You’ve got prompts to test, models to choose from, data to clean, and code to write. Not to mention deployment, evaluation, and versioning. It’s easy to get lost in a dozen tabs and terminals just trying to make everything talk to each other.

That’s where Visual Studio Code steps in, not just as a lightweight code editor, but as a full AI development environment when equipped with the right extensions.

No more jumping between platforms or copy-pasting data across tools. Just one smooth, integrated experience that helps you build smarter, faster.

Let’s explore the extensions that make up the Microsoft AI Tools Extension Pack!

Meet the Extensions

AI Toolkit - Provides a comprehensive set of tools for exploring and creating AI agents with local and remote generative AI models.
Azure AI Foundry - Provides AI model and agent management and deployment capabilities using the Azure AI Foundry service.
GitHub Copilot - An AI pair programming tool that helps you write code faster and smarter.
GitHub Copilot for Azure - Enhances GitHub Copilot with Azure-specific capabilities for cloud development.
Data Wrangler - Simplifies data preparation and transformation for AI development and machine learning workflows.

Building the Agent

Let’s bring everything together with a practical example: building a “Sales Prep Coach”. The idea is simple: before a sales rep walks into a meeting, the agent gives them a concise brief with the prospect’s company info, recent news, CRM notes, and product fit suggestions. It’s the kind of tool that saves time and enhances sales calls!

Here’s how each extension in the Microsoft AI Tools Extension Pack fits into the process:

Step 1: Explore models

Before you start building prompts or writing logic, it helps to know what you’re working with. Not all AI models are created equal—some are better at summarization, others shine in code generation, and some are optimized for speed or cost.

The AI Toolkit and Azure AI Foundry extensions include a Model Catalog that enables you to browse available models, whether they’re hosted in the cloud (like OpenAI or Azure OpenAI) or running locally (like Ollama or Hugging Face-based models). Each model comes with a model card that gives you the details you need, such as supported capabilities, input/output limits, pricing, and ideal use cases.

AI Toolkit Model Catalog

In the context of our Sales Prep Coach agent, we might compare models based on:

Context window: Does the model support long CRM history input?
Latency: Will it generate briefs fast enough to be useful before a meeting?
Multimodal capabilities: Can the model handle both text and images—like parsing a screenshot of a sales dashboard?
Language support: Does the model support multilingual outputs for international sales teams?
Cost efficiency: Does the model offer a good balance of performance and price, especially for frequent usage?

This step helps you make smart choices up front, so you’re not switching models mid-project!
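One way to make this comparison concrete is to write down the model-card details you care about, filter on hard requirements, and then rank what is left. The sketch below is only an illustration of that idea: the model names and every number in it are hypothetical placeholders, not values from any real model card, and it covers only some of the criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class ModelCard:
    """Subset of model-card fields relevant to the Sales Prep Coach (hypothetical)."""
    name: str
    context_window: int        # max input tokens
    latency_s: float           # typical seconds to produce a brief
    multimodal: bool           # accepts images as well as text
    cost_per_1k_tokens: float  # USD, placeholder values


# Placeholder entries: replace with figures read from the actual model cards.
CANDIDATES = [
    ModelCard("model-a", 128_000, 4.0, True, 0.0100),
    ModelCard("model-b", 16_000, 1.5, False, 0.0010),
    ModelCard("model-c", 200_000, 6.0, True, 0.0150),
]


def shortlist(cards, min_context, max_latency_s, need_multimodal):
    """Drop models that miss a hard requirement, then sort cheapest first."""
    ok = [
        c for c in cards
        if c.context_window >= min_context
        and c.latency_s <= max_latency_s
        and (c.multimodal or not need_multimodal)
    ]
    return sorted(ok, key=lambda c: c.cost_per_1k_tokens)


if __name__ == "__main__":
    picks = shortlist(CANDIDATES, min_context=100_000, max_latency_s=5.0, need_multimodal=True)
    for card in picks:
        print(card.name, card.cost_per_1k_tokens)
```

Treating context window, latency, and multimodal support as pass/fail and cost as the tiebreaker is just one possible design; you may prefer a weighted score if no single criterion is a hard requirement.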

Note: Not sure which model catalog to use? If you’re an Azure AI Foundry developer, I’d recommend exploring models within the Azure AI Foundry extension. If you’re open to exploring agent development beyond Azure, consider browsing models within the AI Toolkit.

Step 2: Test the models

Once you’ve explored your model options in the catalog, the next step is to put them to the test.

With the AI Toolkit’s Playground, you can quickly try out different prompts across various models, whether they’re hosted in the cloud or running locally. This is your chance to see how each model behaves in practice: how it responds, what tone it uses, how consistent it is, and whether it follows your instructions well.

AI Toolkit Playground

In our Sales Prep Coach scenario, this means trying out sample prompts like:

“Summarize this CRM entry into 2 bullet points for a sales meeting.”
“What are 3 talking points I should bring up with a lead from the healthcare sector?”
“Create a short company overview for Contoso, based on this bio and funding history.”

With the AI Toolkit Playground, you can test these prompts across different models (like GPT-4o, Claude 3.7 Sonnet, or a local LLM via Ollama) to see which ones generate the most helpful and consistent results. The built-in side-by-side comparison makes it easy to evaluate model output. Depending on your model choice, you may also be able to use tools such as web search.
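If you later want to repeat this kind of comparison in a script, the shape of it is a simple prompt-by-model grid. The sketch below assumes each model is wrapped as a plain function that takes a prompt and returns text; `fake_model` is a hypothetical stand-in that only echoes the prompt, so you would swap in a real client for whichever provider you use.

```python
from typing import Callable, Dict, List

# The three sample prompts from this step.
PROMPTS = [
    "Summarize this CRM entry into 2 bullet points for a sales meeting.",
    "What are 3 talking points I should bring up with a lead from the healthcare sector?",
    "Create a short company overview for Contoso, based on this bio and funding history.",
]


def run_grid(models: Dict[str, Callable[[str], str]], prompts: List[str]) -> List[dict]:
    """Send every prompt to every model and collect the outputs as rows."""
    rows = []
    for prompt in prompts:
        for name, complete in models.items():
            rows.append({"model": name, "prompt": prompt, "output": complete(prompt)})
    return rows


def fake_model(tag: str) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model client; it only echoes the prompt."""
    return lambda prompt: f"[{tag}] response to: {prompt[:30]}"


if __name__ == "__main__":
    models = {"cloud-model": fake_model("cloud"), "local-model": fake_model("local")}
    for row in run_grid(models, PROMPTS):
        print(row["model"], "|", row["output"])
```

Keeping the results as rows of model, prompt, and output makes it easy to read them side by side or to hand them to an evaluator later.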

With the AI Toolkit, you can compare models in the Playground.

Note: The Azure AI Foundry extension leverages the same playground as the AI Toolkit.

Step 3: Clean and format the CRM data

Next, we bring in messy CRM exports—CSV files with notes, timestamps, and client details. With Data Wrangler, we can filter out irrelevant fields, fix formatting issues, and prep the data for downstream use, especially when we want our agent to stay grounded in real customer context.

Data Wrangler

Cleaning this data isn’t just about making it readable; it’s about making it usable by the model. A well-prepared CRM dataset can become a valuable source of grounding information that helps your agent generate more relevant, accurate, and personalized responses.
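To give a feel for what this cleanup involves, here is the same kind of work sketched in plain Python with only the standard library. The sample export, its column names, and the two date formats are all assumptions made up for illustration; your own CRM export will look different.

```python
import csv
import io
from datetime import datetime

# Hypothetical CRM export; the column names are assumptions for illustration.
RAW_CSV = """account,last_contact,notes,internal_id
Contoso , 2025-05-01 ,  Asked about pricing tiers ,A-1
Fabrikam,05/03/2025,,A-2
Contoso , 2025-05-01 ,  Asked about pricing tiers ,A-1
"""

KEEP = ["account", "last_contact", "notes"]   # drop fields the agent does not need
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y"]       # formats seen in this sample export


def normalize_date(value: str) -> str:
    """Return the date as ISO 8601, or an empty string if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def clean(raw: str) -> list:
    """Trim whitespace, normalize dates, drop empty notes and exact duplicates."""
    seen, rows = set(), []
    for record in csv.DictReader(io.StringIO(raw)):
        row = {k: (record.get(k) or "").strip() for k in KEEP}
        row["last_contact"] = normalize_date(row["last_contact"])
        key = tuple(row.values())
        if not row["notes"] or key in seen:
            continue
        seen.add(key)
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in clean(RAW_CSV):
        print(row)
```

Dropping rows with empty notes is a choice that suits a grounding dataset, where a row without text adds nothing for the agent to retrieve; for reporting you would probably keep them.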

This cleaned data might be used for:

Semantic search – Helping the agent find similar leads, past conversations, or related deals.
Contextual memory – Allowing the agent to “remember” historical notes or previous client interactions.
Retrieval-Augmented Generation (RAG) – Dynamically injecting CRM details into the agent’s responses during a live conversation.

In these cases, you’ll typically convert the cleaned CRM entries into embeddings using a model (like text-embedding-ada-002 or a local one), then store those vectors in a vector database, so the agent can retrieve and ground its answers in contextually relevant data.
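The embed, store, and retrieve flow described above can be sketched in a few lines. To keep the example self-contained, it swaps in a toy word-count "embedding" and an in-memory list where a real setup would use an embedding model and a vector database; the class name, the function names, and the CRM notes are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 2) -> list:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


if __name__ == "__main__":
    store = TinyVectorStore()
    # Hypothetical cleaned CRM notes.
    store.add("Contoso asked about pricing tiers for the analytics product")
    store.add("Fabrikam renewal call scheduled, main concern is onboarding time")
    store.add("Contoso security review pending with their IT team")
    context = store.search("What did Contoso say about pricing?", k=1)
    print("Grounding context:", context)
```

The retrieved text is what you would inject into the agent's prompt as grounding context. Word counts only match exact words, which is the main thing a real embedding model improves on.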

By cleaning your data first with Data Wrangler, you’re setting up the foundation for a grounded, trustworthy AI agent that speaks from your own company knowledge, not just generic training data.

Step 4: Prototype the agent

Now that you’ve tested your prompts and chosen a model, it’s time to bring your AI agent to life! This is where you move beyond single responses and start defining how your agent behaves, what tools it can use, and how it handles multi-turn conversations.

Depending on your workflow, you can prototype your agent in two ways:

AI Toolkit using the Agent (Prompt) Builder OR
Azure AI Foundry using the Agent Designer

Both are integrated into VS Code and offer intuitive ways to configure and test your agent’s logic.

AI Toolkit – Agent (Prompt) Builder

The Agent (Prompt) Builder in AI Toolkit gives you a powerful sandbox for crafting your agent locally.

AI Toolkit Agent (Prompt) Builder

You can use it to:

Generate a system prompt: Use AI to help write the agent's system prompt based on a short description—ideal if you're not sure how to phrase instructions.
Multi-turn simulation: Test how the agent responds in back-and-forth conversations, not just single prompts.
Add an MCP server and tools: Integrate tools like retrieving competitor insights, fetching client profiles, or generating follow-up email drafts by connecting to an existing or custom MCP server.
Define the output structure: Clearly specify how you want the agent’s responses to be structured—whether it's text or a JSON schema.
Evaluate output quality: Test agent behavior across various scenarios and manually assess the output.
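To illustrate the output-structure option, here is what a JSON schema for a sales brief might look like, along with a small check you could run on a response. The field names are assumptions for this example, and the checker is a deliberately minimal sketch that looks only at required keys and top-level types rather than implementing full JSON Schema validation.

```python
import json

# Hypothetical JSON Schema for a sales brief; the field names are assumptions.
SALES_BRIEF_SCHEMA = {
    "type": "object",
    "properties": {
        "company_overview": {"type": "string"},
        "crm_summary": {"type": "array", "items": {"type": "string"}},
        "talking_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["company_overview", "crm_summary", "talking_points"],
}

_PY_TYPES = {"string": str, "array": list, "object": dict}


def check_brief(raw_json: str, schema: dict = SALES_BRIEF_SCHEMA) -> list:
    """Return a list of problems; an empty list means the response fits the schema.

    This covers only required keys and top-level types, not full JSON Schema.
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    problems = [f"missing field: {key}" for key in schema["required"] if key not in data]
    for key, spec in schema["properties"].items():
        if key in data and not isinstance(data[key], _PY_TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}: expected {spec['type']}")
    return problems
```

A structured response like this is easier for a frontend to render than free text, since each section arrives as its own field.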

For our Sales Prep Coach, this is where we’d wire up tools like:

A document search for company insights
A summarizer for CRM notes
A personalized briefing opener that adjusts tone based on the time of day or user’s role.

Azure AI Foundry – Agent Designer

For a more cloud-native approach, Azure AI Foundry’s Agent Designer helps you build, configure, and deploy agents directly to Azure.

Azure AI Foundry Agent Designer

With it, you can:

Visually define your agent: Add a model, configure its system message, and attach tools, all from a visual designer.
Use real-world plugins: Add tools such as Grounding with Bing to ground responses in external knowledge, which is useful for keeping sales briefs up to date with live web search.
Deploy to Azure: Once configured, your agent can be deployed as a live endpoint for apps, websites, or API integration.
Test in the Agent Playground: Interact with your deployed agent using real prompts and see how it performs across different use cases.

This setup is ideal for taking your prototype into a production-ready environment.

Note: Not sure which extension to use for building your agent? As I mentioned above, if you’re an Azure AI Foundry developer, I’d recommend creating your agent with Azure AI Foundry. If you’re open to exploring agent development beyond Azure, consider using the AI Toolkit’s Agent (Prompt) Builder.

Step 5: Fine-tune model behavior

Once the model is generating useful responses, the next step is dialing in its behavior, so it doesn’t just answer, but answers like it knows your business. If you want to fine-tune a model to reflect your company’s voice, use specific product terminology, or tailor responses for the sales domain, then the AI Toolkit has you covered!

AI Toolkit Fine-Tuning

With the toolkit, you can fine-tune AI models on Azure, using an Azure Container App to run the fine-tuning job. You can also host an inference endpoint in the cloud. All necessary Azure resources are provisioned from within VS Code.

Once the fine-tuning job has started, you can view both the console logs and the streaming logs for the running job directly in the VS Code Output panel.

Step 6: Evaluate the agent output

Evaluating your agent’s output early helps you catch issues and improve performance before deployment. That’s where the Evaluation feature in the AI Toolkit comes in. It gives you a built-in, no-code-friendly way to measure the quality, accuracy, and consistency of your AI agent’s responses.

AI Toolkit Evaluations

Evaluators

The Evaluation tool supports both standard evaluators and custom ones, and can assess output across dimensions like:

Intent resolution – Did the response address the user’s actual request?
Tool call accuracy – Did the agent use the right tool with the right inputs?
Task adherence – Did it follow the given instructions or constraints?
Relevance, coherence, fluency – Is the response logically sound, on-topic, and well-worded?
Similarity metrics – F1 Score, BLEU, GLEU, and METEOR for more structured evaluation against ground truth.

Each evaluation run takes a query, agent response, and optionally a ground truth answer, tool calls, or tool definitions as input.
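To give a sense of what a similarity metric measures, here is a sketch of token-level F1 between a response and a ground truth answer. This is one common definition (the harmonic mean of precision and recall over shared words) and is an assumption on my part; it is not necessarily the exact formula or tokenization the AI Toolkit uses.

```python
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def f1_score(response: str, ground_truth: str) -> float:
    """Token-overlap F1: harmonic mean of precision and recall over shared words."""
    resp, truth = Counter(tokens(response)), Counter(tokens(ground_truth))
    overlap = sum((resp & truth).values())  # words in common, counting repeats
    if overlap == 0:
        return 0.0
    precision = overlap / sum(resp.values())  # share of the response that is correct
    recall = overlap / sum(truth.values())    # share of the ground truth that is covered
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(f1_score("Contoso raised a Series B", "Contoso raised a Series B in 2024"))
```

Metrics like this reward word overlap, not meaning, which is why they work best alongside the AI-assisted evaluators listed above.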

For our Sales Prep Coach agent, a useful custom evaluator could be Sales Brief Completeness, an evaluator that checks whether the agent’s response includes three key elements:

Company Overview
Recent CRM Activity Summary
Suggested Talking Points

If any of those elements are missing, the evaluator can flag the response and assign a lower score.

You can define custom evaluators in the AI Toolkit by specifying your own scoring logic or prompt-based evaluation criteria, especially helpful for unique business needs that go beyond generic metrics.
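As a rough idea of what the scoring logic for Sales Brief Completeness could look like, here is a minimal sketch. It assumes the brief labels its sections with the three headings named above and simply checks for them as text; the function name and the shape of the result are hypothetical, and how you register such logic in the AI Toolkit is not shown here.

```python
REQUIRED_SECTIONS = [
    "Company Overview",
    "Recent CRM Activity Summary",
    "Suggested Talking Points",
]


def sales_brief_completeness(response: str) -> dict:
    """Score a brief by the fraction of required section headings it contains."""
    text = response.lower()
    missing = [s for s in REQUIRED_SECTIONS if s.lower() not in text]
    score = (len(REQUIRED_SECTIONS) - len(missing)) / len(REQUIRED_SECTIONS)
    return {"score": round(score, 2), "missing": missing, "flagged": bool(missing)}


if __name__ == "__main__":
    brief = "Company Overview: Contoso builds analytics tools.\nSuggested Talking Points: pricing."
    print(sales_brief_completeness(brief))
```

A string match is brittle if the agent rephrases its headings, so a prompt-based evaluator may be a better fit once the output format is less rigid.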

No Dataset? No Problem.

If you don’t already have a dataset of real user queries and ground truth responses, the AI Toolkit lets you generate a synthetic dataset to simulate realistic scenarios. This is especially helpful for early-stage testing or prototyping. Once you have a dataset, you can run evaluations in bulk and visualize the results to quickly spot weak points or regressions.
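To show the shape such a dataset might take, the sketch below builds a handful of queries from templates and writes them as JSON Lines. To be clear, this is a simple template-based stand-in and not how the AI Toolkit generates synthetic data; the `query` and `ground_truth` field names, the companies, and the sectors are all assumptions for illustration.

```python
import itertools
import json

# Hypothetical building blocks; a real run might ask a model to write these instead.
COMPANIES = ["Contoso", "Fabrikam"]
SECTORS = ["healthcare", "retail"]
TEMPLATE = "Prepare a sales brief for {company}, a lead in the {sector} sector."


def synthetic_rows() -> list:
    """Build one query per company/sector pair; ground truth is left for a reviewer."""
    return [
        {"query": TEMPLATE.format(company=c, sector=s), "ground_truth": ""}
        for c, s in itertools.product(COMPANIES, SECTORS)
    ]


def to_jsonl(rows: list) -> str:
    """Serialize rows as JSON Lines, one record per line."""
    return "\n".join(json.dumps(row) for row in rows)


if __name__ == "__main__":
    print(to_jsonl(synthetic_rows()))
```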

Step 7: Build the backend and frontend

Once your agent is ready, it’s time to build the interface and infrastructure around it. Whether you’re developing an API, a dashboard, or a chat UI, GitHub Copilot can help speed up development across both your backend and frontend.

Using GitHub Copilot to create UI for a Django application.

On the backend, GitHub Copilot can:

Scaffold out an API endpoint to serve your Sales Prep Coach agent (e.g., via FastAPI or Express).
Write functions to parse incoming user input, clean it, and forward it to the agent service.
Add logging, rate limiting, or caching logic, all by suggesting boilerplate code based on your function names and docstrings.
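Here is a rough sketch of the backend pieces in that list: input cleaning, caching, and rate limiting around a call to the agent. I've kept it framework-agnostic so it runs with only the standard library; in practice you would call `handle_request` from a FastAPI or Express-style route. `call_agent` is a hypothetical stand-in for the real agent service, and the limits are arbitrary example values.

```python
import time
from collections import deque
from functools import lru_cache
from typing import Optional


class RateLimiter:
    """Allow at most `limit` calls per `window_s` seconds (sliding window)."""

    def __init__(self, limit: int, window_s: float):
        self.limit, self.window_s = limit, window_s
        self.calls = deque()  # timestamps of recent accepted calls

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        if len(self.calls) >= self.limit:
            return False
        self.calls.append(now)
        return True


def clean_input(text: str) -> str:
    """Collapse whitespace and cap the length before forwarding to the agent."""
    return " ".join(text.split())[:2000]


@lru_cache(maxsize=256)
def call_agent(prompt: str) -> str:
    """Hypothetical stand-in for the agent service call; cached per prompt."""
    return f"Brief for: {prompt}"


limiter = RateLimiter(limit=5, window_s=60.0)


def handle_request(user_text: str) -> dict:
    """Framework-agnostic handler: returns a status code and a JSON-ready body."""
    if not limiter.allow():
        return {"status": 429, "body": {"error": "rate limit exceeded"}}
    prompt = clean_input(user_text)
    if not prompt:
        return {"status": 400, "body": {"error": "empty request"}}
    return {"status": 200, "body": {"brief": call_agent(prompt)}}
```

Caching on the cleaned prompt means two requests that differ only in whitespace share one agent call, which helps when several reps prepare for the same meeting.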

On the frontend, GitHub Copilot can:

Generate React or Vue components for a simple chat UI, complete with input handling and dynamic message display.
Help build a Sales Briefing Dashboard that shows real-time summaries, client insights, and suggested talking points.
Suggest formatting logic for displaying data cleanly, such as creating cards for each CRM entry or rendering company overviews in structured layouts.

By leveraging GitHub Copilot, you reduce time spent on repetitive code and boilerplate, so you can focus on connecting your agent to the right logic, UI, and data sources. Whether you’re building a CLI tool or a fully responsive web app, GitHub Copilot acts like a helpful pair programmer, ready to fill in the blanks and keep you moving fast!

Step 8: Deploy to Azure

Once your agent is up and running locally, it’s time to take it to the cloud. The GitHub Copilot for Azure extension enhances GitHub Copilot with Azure-specific capabilities, helping you move from local dev to full deployment without breaking your flow.

Using GitHub Copilot for Azure to learn how to set up RBAC roles for an AI application.

With it, you can:

Generate infrastructure-as-code using Bicep or ARM templates to provision resources like Azure Functions, App Service, Cosmos DB, or OpenAI endpoints.
Deploy your agent backend as an Azure Function or containerized web app—with help generating deployment scripts, GitHub Actions workflows, and environment configurations.
Get inline suggestions tailored for Azure services, whether you're writing authentication logic, managing secrets with Key Vault, or connecting to Azure Storage.
Receive help with Azure DevOps tasks, like writing CI/CD pipelines, configuring environments, or setting up testing workflows.
Ask natural language questions in GitHub Copilot Chat (e.g., “How do I deploy a FastAPI app to Azure App Service?”) and get contextual answers or code snippets right in the editor.

GitHub Copilot for Azure helps you bridge the gap between writing great code and running it reliably in the cloud, with far less guesswork.

Next Steps: Try it yourself!

Ready to build your own AI agent? Here’s how to get started:

Install the Extension Pack
Install the Microsoft AI Tools Extension Pack to unlock a full suite of extensions for AI development—all in one place.
Explore Available Models
Browse the Model Catalog in AI Toolkit to compare models by capability, latency, context window, and more. Use model cards to decide which ones best suit your use case.
Test Prompts in the Playground
Experiment with real prompts using the AI Toolkit playground. Try out different models, tweak instructions, and compare outputs side-by-side before you build anything.
Prototype the Agent
Use the AI Toolkit Agent (Prompt) Builder or Azure AI Foundry Agent Designer to wire up tools, define logic, and configure your agent. Whether local or cloud-native, you’ve got options!
Evaluate Agent Output
Test how your agent performs across real or synthetic tasks using the AI Toolkit’s evaluation tools. Use built-in metrics or create custom evaluators specific to your business goals.
Build the Frontend and Backend with GitHub Copilot
Let GitHub Copilot accelerate your coding process, whether you're scaffolding an API, connecting to tools, or creating a user-friendly interface for your agent. It helps fill in boilerplate and streamline full-stack development right from your editor.
Deploy to Azure
Use GitHub Copilot for Azure to generate infrastructure-as-code, set up cloud services, or build CI/CD workflows. Deploy your agent as a web service or API with minimal friction.

Pro tip: You don’t have to complete every step at once. Start small—test prompts, explore models—and grow your project as your confidence and needs evolve.

Wrapping up

Creating your own AI agent doesn’t have to be overwhelming. With the Microsoft AI Tools Extension Pack, everything you need, from prompt testing and data prep to evaluation and deployment, is right at your fingertips inside VS Code.

Whether you’re just starting with AI or looking to scale smarter, this extension pack gives you the building blocks to move faster, stay organized, and bring your ideas to life with confidence.

Start small. Test a prompt. Build a quick prototype. You’ll be surprised how far you can go in just a few clicks.

Install the extension pack and get building!

And while you're here, check out Mandy's blog "Build Apps and Agents with Visual Studio Code and Azure" to learn more about the latest VS Code & Azure announcements from Microsoft Build 2025 for building agents.

Missed Microsoft Build 2025? Sessions are now available on demand via YouTube! I'd recommend checking out the following to see the extensions in action: