We’re at a moment where generative AI is shifting from single-prompt interactions to agents that can process visuals, store memory, and act. And the best way to understand that shift is to build something yourself!
That’s exactly what we’re doing in my upcoming live stream on Building AI Agents with the AI Toolkit & Microsoft Foundry — a hands-on walkthrough of the full lab experience from Microsoft Ignite 2025!
This session is designed for developers, makers, and anyone curious about how multimodal agents get from idea to working prototype.
What we'll explore
During the stream, I’ll walk through the core concepts and build steps from the lab, including:
- Setting Up Your Environment in Microsoft Foundry
You’ll see how to create and configure your project, connect to models, and prepare your workspace using the AI Toolkit in VS Code. This lab makes it approachable, even if you’re new to Foundry or agent workflows.
- Testing Multimodal Inputs
We’ll explore how the agent processes text and images, how the model interprets such input, and how that insight becomes part of its reasoning loop.
During the stream, I’ll show you what strong visual prompts look like, where people usually get stuck, and how to shape the output you want.
- Designing an Agent System Prompt
We’ll look at how to structure agent behavior and how a well-crafted system prompt becomes the foundation for consistent responses and accurate multimodal reasoning.
This includes grounding, action definitions, and the type of instructions that help an agent combine text, vision, and reasoning capabilities.
- Iterating With the AI Toolkit
This is where things get fun.
We’ll use the AI Toolkit’s playground and debugging tools to observe the agent’s thought process, test different instructions, and evaluate its planning behavior.
You’ll see why tools like trace view, structured output, and function definitions make iteration faster and more predictable.
- Expanding Beyond the Lab
To close, we’ll talk through what it looks like to extend the agent:
- Adding new skills
- Changing how it plans
- Connecting it to additional data
- Turning the prototype into an application
My goal is for you to take away a repeatable workflow, one you can reuse whether you’re building a creative tool, a developer agent, or something entirely new.
The Bigger Picture
Multimodal agents are becoming the new interface layer for apps: they can interpret images, understand context, take actions, and guide users through workflows that feel natural.
If you understand how to prototype them, you understand how AI-powered products will be built in the next few years.
This stream is for anyone who wants to experiment, learn by doing, and make sense of where AI tooling is headed.
Join the live stream
Date: Wednesday December 3, 2025
Time: 10AM – 11AM Pacific
Link: https://aka.ms/AITGHC/Dec3/b
Bring your curiosity…and maybe even your own idea for an agent!