Agent Support
Building a Multi-Agent On-Call Copilot with Microsoft Agent Framework
Four AI agents, one incident payload, structured triage in under 60 seconds, powered by Microsoft Agent Framework and Foundry Hosted Agents.

Tags: Multi-Agent · Microsoft Agent Framework · Foundry Hosted Agents · Python · SRE / Incident Response

When an incident fires at 3 AM, every second the on-call engineer spends piecing together alerts, logs, and metrics is a second not spent fixing the problem. What if an AI system could ingest the raw incident signals and hand you a structured triage, a Slack update, a stakeholder brief, and a draft post-incident report, all in under 10 seconds? That’s exactly what On-Call Copilot does. In this post, we’ll walk through how we built it using the Microsoft Agent Framework, how we deployed it as a Foundry Hosted Agent, and the key design decisions that make multi-agent orchestration practical for production workloads. The full source code is open-source on GitHub. You can deploy your own instance with a single azd up.

Why Multi-Agent? The Problem with Single-Prompt Triage

Early AI incident assistants used a single large prompt: “Here is the incident. Give me root causes, actions, a Slack message, and a post-incident report.” This approach has two fundamental problems:

- Context overload. A real incident may have 800 lines of logs, 10 alert lines, and dense metrics. Asking one model to process everything and produce four distinct output formats in a single turn pushes token limits and degrades quality.
- Conflicting concerns. Triage reasoning and communication drafting are cognitively different tasks. A model optimised for structured JSON analysis often produces stilted Slack messages—and vice versa.

The fix is specialisation: decompose the task into focused agents, give each agent a narrow instruction set, and run them in parallel. This is the core pattern that the Microsoft Agent Framework makes easy.

Architecture: Four Agents Running Concurrently

On-Call Copilot is deployed as a Foundry Hosted Agent—a containerised Python service running on Microsoft Foundry’s managed infrastructure. The core orchestrator uses ConcurrentBuilder from the Microsoft Agent Framework SDK to run four specialist agents in parallel via asyncio.gather().

All four panels populated simultaneously: Triage (red), Summary (blue), Comms (green), PIR (purple).

Architecture: The orchestrator runs four specialist agents concurrently via asyncio.gather(), then merges their JSON fragments into a single response.

All four agents share a single Azure OpenAI Model Router deployment. Rather than hardcoding gpt-4o or gpt-4o-mini, Model Router analyses request complexity and routes automatically. A simple triage prompt costs less; a long post-incident synthesis uses a more capable model. One deployment name, zero model-selection code.

Meet the Four Agents

- 🔍 Triage Agent: Root cause analysis, immediate actions, missing data identification, and runbook alignment. Keys: suspected_root_causes · immediate_actions · missing_information · runbook_alignment
- 📋 Summary Agent: Concise incident narrative: what happened and current status (ONGOING / MITIGATED / RESOLVED). Keys: summary.what_happened · summary.current_status
- 📢 Comms Agent: Audience-appropriate communications: Slack channel update with emoji conventions, plus a non-technical stakeholder brief. Keys: comms.slack_update · comms.stakeholder_update
- 📝 PIR Agent: Post-incident report: chronological timeline, quantified customer impact, and specific prevention actions. Keys: post_incident_report.timeline · post_incident_report.customer_impact · post_incident_report.prevention_actions
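ConcurrentBuilder hides the fan-out/fan-in wiring, but conceptually it reduces to a few lines of asyncio. The sketch below is illustrative only; the agents list and the run() method are stand-ins, not the framework's actual API:

```python
import asyncio

async def run_concurrently(agents, incident_json: str) -> dict:
    """Fan one incident payload out to every agent, then merge the JSON fragments."""
    # Each agent receives the full payload and works in isolation (no shared state).
    results = await asyncio.gather(*(agent.run(incident_json) for agent in agents))
    merged: dict = {}
    for fragment in results:  # each fragment is a dict like {"comms": {...}}
        merged.update(fragment)
    return merged
```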
The Code: Building the Orchestrator

The entry point is remarkably concise. ConcurrentBuilder handles all the async wiring—you just declare the agents and let the framework handle parallelism, error propagation, and response merging.

main.py — Orchestrator

```python
from agent_framework import ConcurrentBuilder
from agent_framework.azure import AzureOpenAIChatClient
from azure.ai.agentserver.agentframework import from_agent_framework
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from app.agents.triage import TRIAGE_INSTRUCTIONS
from app.agents.comms import COMMS_INSTRUCTIONS
from app.agents.pir import PIR_INSTRUCTIONS
from app.agents.summary import SUMMARY_INSTRUCTIONS

_credential = DefaultAzureCredential()
_token_provider = get_bearer_token_provider(
    _credential, "https://cognitiveservices.azure.com/.default"
)

def create_workflow_builder():
    """Create 4 specialist agents and wire them into a ConcurrentBuilder."""
    triage = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=TRIAGE_INSTRUCTIONS,
        name="triage-agent",
    )
    summary = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=SUMMARY_INSTRUCTIONS,
        name="summary-agent",
    )
    comms = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=COMMS_INSTRUCTIONS,
        name="comms-agent",
    )
    pir = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=PIR_INSTRUCTIONS,
        name="pir-agent",
    )
    return ConcurrentBuilder().participants([triage, summary, comms, pir])

def main():
    builder = create_workflow_builder()
    from_agent_framework(builder.build).run()  # starts on port 8088

if __name__ == "__main__":
    main()
```

Key insight: DefaultAzureCredential means there are no API keys anywhere in the codebase. The container uses managed identity in production; local development uses your az login session. The same code runs in both environments without modification.

Agent Instructions: Prompts as Configuration

Each agent receives a tightly scoped system prompt that defines its output schema and guardrails. Here’s the Triage Agent—the most complex of the four:

app/agents/triage.py

```python
TRIAGE_INSTRUCTIONS = """\
You are the **Triage Agent**, an expert Site Reliability Engineer specialising
in root cause analysis and incident response.

## Task
Analyse the incident data and return a single JSON object with ONLY these keys:
{
  "suspected_root_causes": [
    {
      "hypothesis": "string – concise root cause hypothesis",
      "evidence": ["string – supporting evidence from the input"],
      "confidence": 0.0  // 0-1, how confident you are
    }
  ],
  "immediate_actions": [
    {
      "step": "string – concrete action with runnable command if applicable",
      "owner_role": "oncall-eng | dba | infra-eng | platform-eng",
      "priority": "P0 | P1 | P2 | P3"
    }
  ],
  "missing_information": [
    {
      "question": "string – what data is missing",
      "why_it_matters": "string – why this data would help"
    }
  ],
  "runbook_alignment": {
    "matched_steps": ["string – runbook steps that match the situation"],
    "gaps": ["string – gaps or missing runbook coverage"]
  }
}

## Guardrails
1. **No secrets** – redact any credential-like material as [REDACTED].
2. **No hallucination** – if data is insufficient, set confidence to 0 and add entries to missing_information.
3. **Diagnostic suggestions** – when data is sparse, include diagnostic steps in immediate_actions.
4. **Structured output only** – return ONLY valid JSON, no prose.
"""
```
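Because each agent promises a fixed set of top-level keys, the orchestrator can cheaply sanity-check a fragment before merging it. Here is a minimal sketch; the key sets come from the prompts above, but parse_fragment is a hypothetical helper, not part of the repo:

```python
import json

# Top-level keys each agent is instructed to return (from the prompts above).
EXPECTED_KEYS = {
    "triage-agent": {"suspected_root_causes", "immediate_actions",
                     "missing_information", "runbook_alignment"},
    "comms-agent": {"comms"},
}

def parse_fragment(agent_name: str, raw: str) -> dict:
    """Parse an agent's reply and verify it contains only the promised keys."""
    fragment = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    unexpected = set(fragment) - EXPECTED_KEYS[agent_name]
    if unexpected:
        raise ValueError(f"{agent_name} returned unexpected keys: {unexpected}")
    return fragment
```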
""" The Comms Agent follows the same pattern but targets a different audience: app/agents/comms.py COMMS_INSTRUCTIONS = """\ You are the **Comms Agent**, an expert incident communications writer. ## Task Return a single JSON object with ONLY this key: { "comms": { "slack_update": "Slack-formatted message with emoji, severity, status, impact, next steps, and ETA", "stakeholder_update": "Non-technical summary for executives. Focus on business impact and resolution." } } ## Guidelines - Slack: Use :rotating_light: for active SEV1/2, :warning: for degraded, :white_check_mark: for resolved. - Stakeholder: No jargon. Translate to business impact. - Tone: Calm, factual, action-oriented. Never blame individuals. - Structured output only – return ONLY valid JSON, no prose. """ Instructions as config, not code. Agent behaviour is defined entirely by instruction text strings. A non-developer can refine agent behaviour by editing the prompt and redeploying no Python changes needed. The Incident Envelope: What Goes In The agent accepts a single JSON envelope. It can come from a monitoring alert webhook, a PagerDuty payload, or a manual CLI invocation: Incident Input (JSON) { "incident_id": "INC-20260217-002", "title": "DB connection pool exhausted — checkout-api degraded", "severity": "SEV1", "timeframe": { "start": "2026-02-17T14:02:00Z", "end": null }, "alerts": [ { "name": "DatabaseConnectionPoolNearLimit", "description": "Connection pool at 99.7% on orders-db-primary", "timestamp": "2026-02-17T14:03:00Z" } ], "logs": [ { "source": "order-worker", "lines": [ "ERROR: connection timeout after 30s (attempt 3/3)", "WARN: pool exhausted, queueing request (queue_depth=847)" ] } ], "metrics": [ { "name": "db_connection_pool_utilization_pct", "window": "5m", "values_summary": "Jumped from 22% to 99.7% at 14:03Z" } ], "runbook_excerpt": "Step 1: Check DB connection dashboard...", "constraints": { "max_time_minutes": 15, "environment": "production", "region": "swedencentral" } } Declaring the Hosted Agent The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover and route requests to the container: agent.yaml kind: hosted name: oncall-copilot description: | Multi-agent hosted agent that ingests incident signals and runs 4 specialist agents concurrently via Microsoft Agent Framework ConcurrentBuilder. metadata: tags: - Azure AI AgentServer - Microsoft Agent Framework - Multi-Agent - Model Router protocols: - protocol: responses environment_variables: - name: AZURE_OPENAI_ENDPOINT value: ${AZURE_OPENAI_ENDPOINT} - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME value: model-router The protocols: [responses] declaration exposes the agent via the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST no custom API needed. 
Declaring the Hosted Agent

The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover and route requests to the container:

agent.yaml

```yaml
kind: hosted
name: oncall-copilot
description: |
  Multi-agent hosted agent that ingests incident signals and runs 4 specialist
  agents concurrently via Microsoft Agent Framework ConcurrentBuilder.
metadata:
  tags:
    - Azure AI AgentServer
    - Microsoft Agent Framework
    - Multi-Agent
    - Model Router
protocols:
  - protocol: responses
environment_variables:
  - name: AZURE_OPENAI_ENDPOINT
    value: ${AZURE_OPENAI_ENDPOINT}
  - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
    value: model-router
```

The protocols: [responses] declaration exposes the agent via the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST; no custom API needed.

Invoking the Agent

Once deployed, you can invoke the agent with the project’s built-in scripts or directly via curl:

CLI / curl

```bash
# Using the included invoke script
python scripts/invoke.py --demo 2      # multi-signal SEV1 demo
python scripts/invoke.py --scenario 1  # Redis cluster outage

# Or with curl directly
TOKEN=$(az account get-access-token \
  --resource https://ai.azure.com --query accessToken -o tsv)

curl -X POST \
  "$AZURE_AI_PROJECT_ENDPOINT/openai/responses?api-version=2025-05-15-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "<incident JSON here>"}
    ],
    "agent": {
      "type": "agent_reference",
      "name": "oncall-copilot"
    }
  }'
```

The Browser UI

The project includes a zero-dependency browser UI built with plain HTML, CSS, and vanilla JavaScript—no React, no bundler. A Python http.server backend proxies requests to the Foundry endpoint.

The empty state. Quick-load buttons pre-populate the JSON editor with demo incidents or scenario files.

Demo 1 loaded: API Gateway 5xx spike, SEV3. The JSON is fully editable before submitting.

Agent Output Panels

- Triage: Root causes ranked by confidence. Evidence is collapsed under each hypothesis.
- Triage: Immediate actions with P0/P1/P2 priority badges and owner roles.
- Comms: Slack card with emoji substitution and a stakeholder executive summary.
- PIR: Chronological timeline with an ONGOING marker, customer impact in a red-bordered box.

Performance: Parallel Execution Matters

| Incident Type | Complexity | Parallel Latency | Sequential (est.) |
|---|---|---|---|
| Single alert, minimal context (SEV4) | Low | 4–6 s | ~16 s |
| Multi-signal, logs + metrics (SEV2) | Medium | 7–10 s | ~28 s |
| Full SEV1 with long log lines | High | 10–15 s | ~40 s |
| Post-incident synthesis (resolved) | High | 10–14 s | ~38 s |

asyncio.gather() running four independent agents cuts total latency by 3–4× compared to sequential execution. For a SEV1 at 3 AM, that’s the difference between a 10-second AI-powered head start and a 40-second wait.

Five Key Design Decisions

1. Parallel over sequential. Each agent is independent and processes the full incident payload in isolation. ConcurrentBuilder with asyncio.gather() is the right primitive—no inter-agent dependencies, no shared state.
2. JSON-only agent instructions. Every agent returns only valid JSON with a defined schema. The orchestrator merges fragments with merged.update(agent_output). No parsing, no extraction, no post-processing.
3. No hardcoded model names. AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router is the only model reference. Model Router selects the best model at runtime based on prompt complexity. When new models ship, the agent gets better for free.
4. DefaultAzureCredential everywhere. No API keys. No token management code. Managed identity in production, az login in development. Same code, both environments.
5. Instructions as configuration. Each agent’s system prompt is a plain Python string. Behaviour changes are text edits, not code logic. A non-developer can refine prompts and redeploy.

Guardrails: Built into the Prompts

The agent instructions include explicit guardrails that don’t require external filtering:

- No hallucination: When data is insufficient, the agent sets confidence: 0 and populates missing_information rather than inventing facts.
- Secret redaction: Each agent is instructed to redact credential-like patterns as [REDACTED] in its output.
- Mark unknowns: Undeterminable fields use the literal string "UNKNOWN" rather than plausible-sounding guesses.
- Diagnostic suggestions: When signal is sparse, immediate_actions includes diagnostic steps that gather missing information before prescribing a fix.
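Here is what those guardrails look like in practice: a hypothetical triage fragment for an incident with almost no signal. The values are invented for illustration; only the shape follows the Triage Agent schema above.

```json
{
  "suspected_root_causes": [
    { "hypothesis": "UNKNOWN", "evidence": [], "confidence": 0 }
  ],
  "immediate_actions": [
    {
      "step": "Pull the last 15 minutes of checkout-api logs before changing anything",
      "owner_role": "oncall-eng",
      "priority": "P1"
    }
  ],
  "missing_information": [
    {
      "question": "Which deployment or config change preceded the alert?",
      "why_it_matters": "A recent rollout is the most common trigger and would focus the investigation."
    }
  ],
  "runbook_alignment": {
    "matched_steps": [],
    "gaps": ["No runbook excerpt provided"]
  }
}
```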
Model Router: Automatic Model Selection

One of the most powerful aspects of this architecture is Model Router. Instead of choosing between gpt-4o, gpt-4o-mini, or o3-mini per agent, you deploy a single model-router endpoint. Model Router analyses each request’s complexity and routes it to the most cost-effective model that can handle it.

Model Router insights: models selected per request with associated costs.

Model Router telemetry from Microsoft Foundry: request distribution and cost analysis.

This means you get optimal cost-performance without writing any model-selection logic. A simple Summary Agent prompt may route to gpt-4o-mini, while a complex Triage Agent prompt with 800 lines of logs routes to gpt-4o, all automatically.

Deployment: One Command

The repo includes both azure.yaml and agent.yaml, so deployment is a single command:

Deploy to Foundry

```bash
# Deploy everything: infra + container + Model Router + Hosted Agent
azd up
```

This provisions the Foundry project resources, builds the Docker image, pushes to Azure Container Registry, deploys a Model Router instance, and creates the Hosted Agent. For more control, you can use the SDK deploy script:

Manual Docker + SDK deploy

```bash
# Build and push (must be linux/amd64)
docker build --platform linux/amd64 -t oncall-copilot:v1 .
docker tag oncall-copilot:v1 $ACR_IMAGE
docker push $ACR_IMAGE

# Create the hosted agent
python scripts/deploy_sdk.py
```

Getting Started

Quickstart

```bash
# Clone
git clone https://github.com/microsoft-foundry/oncall-copilot
cd oncall-copilot

# Install
python -m venv .venv
source .venv/bin/activate   # .venv\Scripts\activate on Windows
pip install -r requirements.txt

# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/"
export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router"
export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"

# Validate schemas locally (no Azure needed)
MOCK_MODE=true python scripts/validate.py

# Deploy to Foundry
azd up

# Invoke the deployed agent
python scripts/invoke.py --demo 1

# Start the browser UI
python ui/server.py   # → http://localhost:7860
```

Extending: Add Your Own Agent

Adding a fifth agent is straightforward. Follow this pattern:

1. Create app/agents/<name>.py with a *_INSTRUCTIONS constant following the existing pattern.
2. Add the agent’s output keys to app/schemas.py.
3. Register it in main.py:

main.py — Adding a 5th agent

```python
from app.agents.my_new_agent import NEW_INSTRUCTIONS

new_agent = AzureOpenAIChatClient(
    ad_token_provider=_token_provider
).create_agent(
    instructions=NEW_INSTRUCTIONS,
    name="new-agent",
)

workflow = ConcurrentBuilder().participants(
    [triage, summary, comms, pir, new_agent]
)
```

Ideas for extensions: a ticket auto-creation agent that creates Jira or Azure DevOps items from the PIR output, a webhook adapter agent that normalises PagerDuty or Datadog payloads, or a human-in-the-loop agent that surfaces missing_information as an interactive form.
Key Takeaways for AI Engineers

- The multi-agent pattern isn’t just for chatbots. Any task that can be decomposed into independent subtasks with distinct output schemas is a candidate. Incident response, document processing, code review, data pipeline validation—the pattern transfers.
- Microsoft Agent Framework gives you ConcurrentBuilder for parallel execution and AzureOpenAIChatClient for Azure-native auth—you write the prompts, the framework handles the plumbing.
- Foundry Hosted Agents let you deploy containerised agents with managed infrastructure, automatic scaling, and built-in telemetry. No Kubernetes, no custom API gateway.
- Model Router eliminates the model selection problem. One deployment name handles all scenarios with optimal cost-performance tradeoffs.
- Prompt-as-config means your agents are iterable by anyone who can edit text. The feedback loop from “this output could be better” to “deployed improvement” is minutes, not sprints.

Resources

- Microsoft Agent Framework: SDK powering the multi-agent orchestration
- Model Router: Automatic model selection based on prompt complexity
- Foundry Hosted Agents: Deploy containerised agents on managed infrastructure
- ConcurrentBuilder Samples: Official agents-in-workflow sample this project follows
- DefaultAzureCredential: Zero-config auth chain used throughout
- Hosted Agents Concepts: Architecture overview of Foundry Hosted Agents

The On-Call Copilot sample is open source under the MIT licence. Contributions, scenario files, and agent instruction improvements are welcome via pull request.

How can I measure the quality of my agent's responses?
Welcome back to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a real question from the community with simple, practical guidance to help you build smarter agents. Today’s question comes from someone curious about measuring how well their agent responds:

💬 Dear Agent Support
My agent seems to be responding well—but I want to make sure I’m not just guessing. Ideally, I’d like a way to check how accurate or helpful its answers really are. How can I measure the quality of my agent’s responses?

🧠 What are Evaluations?

Evaluations are how we move from “this feels good” to “this performs well.” They’re structured ways of checking how your agent is doing, based on specific goals you care about. At the simplest level, evaluations help answer:

- Did the agent actually answer the question?
- Was the output relevant and grounded in the right info?
- Was it easy to read, or did it ramble?
- Did it use the tool it was supposed to?

That might mean checking if the model pulled the correct file in a retrieval task. Or whether it used the right tool when multiple are registered. Or even something subjective, like whether the tone felt helpful or aligned with your brand.

🎯 Why Do We Do Evaluations?

When you're building an agent, it’s easy to rely on instinct. You run a prompt, glance at the response, and think: “Yeah, that sounds right.” But what happens when you change a system prompt? Or upgrade the model? Or wire in a new tool? Without evaluations, there’s no way to know if things are getting better or quietly breaking. Evaluations help you:

- Catch regressions early: Maybe your new prompt is more detailed, but now the agent rambles. A structured evaluation can spot that before users do.
- Compare options: Trying out two different models? Testing a retrieval-augmented version vs. a base version? Evaluations give you a side-by-side look at which one performs better.
- Build trust in output quality: Whether you're handing this to a client, a customer, or just your future self, evaluations help you say, “Yes, I’ve tested this. Here’s how I know it works.”

They also make debugging faster. If something’s off, a good evaluation setup helps you narrow down where it went wrong: Was the tool call incorrect? Was the retrieved content irrelevant? Did the prompt confuse the model? Ultimately, evaluations turn your agent into a system you can improve with intention, not guesswork.

⏳ When Should I Start Evaluating?

Short answer: sooner than you think! You don’t need a finished agent or a fancy framework to start evaluating. In fact, the earlier you begin, the easier it is to steer things in the right direction. Here’s a simple rule of thumb: if your agent is generating output, you can evaluate it. That could be:

- Manually checking if it answers the user’s question
- Spotting when it picks the wrong tool
- Comparing two prompt versions to see which sounds clearer

Even informal checks can reveal big issues early, before you’ve built too much around a flawed behavior. As your agent matures, you can add more structure:

- Create a small evaluation set with expected outputs
- Define categories you want to score (like fluency, groundedness, relevance)
- Run batch tests when you update a model

Think of it like writing tests for code. You don’t wait until the end; you build them alongside your system. That way, every change gets feedback fast. The key is momentum. Start light, then layer on depth as you go.
You’ll save yourself debugging pain down the line, and build an agent that gets better over time.

📊 AI Toolkit

You don’t need a full evaluation pipeline or scoring rubric on day one. In fact, the best place to begin is with a simple gut check—run a few test prompts and decide whether you like the agent’s response. And if you don’t have a dataset handy, no worries! With the AI Toolkit, you can both generate datasets and keep track of your manual evaluations with the Agent Builder’s Evaluation feature.

Sidebar: If you’re curious about deeper eval workflows, like using AI to assist in judging your agent’s output against a set of evaluators like fluency, relevance, Tool Call, or even custom evaluators, we’ll cover that in a future edition of Agent Support. For now, let’s keep it simple!

Here’s how to do it:

1. Open the Agent Builder from the AI Toolkit panel in Visual Studio Code.
2. Click the + New Agent button and provide a name for your agent.
3. Select a Model for your agent.
4. Within the System Prompt section, enter: You recommend a movie based on the user’s favorite genre.
5. Within the User Prompt section, enter: What’s a good {{genre}} movie to watch?
6. On the right side of the Agent Builder, select the Evaluation tab.
7. Click the Generate Data icon (the first icon above the table).
8. For the Rows of data to generate field, increase the total to 5.
9. Click Generate.

You’re now ready to start evaluating the agent responses!

🧪 Test Before You Build

You can run the rows of data either individually or in bulk. I’d suggest starting with a single run to get an initial feel for how the feature works.

1. When you click Run, the agent’s response will appear in the response column.
2. Review the output. In the Manual Evaluation column, select either thumbs up or thumbs down.
3. You can continue to run the other rows or even add your own row and pass in a value for {{genre}}.

Want to share the evaluation run and results with a colleague? Click the Export icon to save the run as a .jsonl file. You’ve just taken the first step toward building a more structured, reliable process for evaluating your agent’s responses!

🔁 Recap

Here’s a quick rundown of what we covered:

- Evaluations help you measure quality and consistency in your agent’s responses.
- They’re useful for debugging, comparing, and iterating.
- Start early—even rough checks can guide better decisions.
- The AI Toolkit makes it easier to run and track evaluations right inside your workflow.

📺 Want to Go Deeper?

Check out my previous live-stream for AgentHack: Evaluating Agents, where I explore concepts and methodologies for evaluating generative AI applications. Although I focus on leveraging the Azure AI Evaluation SDK, it’s still an invaluable intro to learning more about evaluations.

The Evaluate and Improve the Quality and Safety of your AI Applications lab from Microsoft Build 2025 provides a comprehensive self-guided introduction to getting started with evaluations. You’ll learn what each evaluator means, how to analyze the scores, and why observability matters—plus how to use telemetry data locally or in the cloud to assess and debug your app’s performance!

👉 Explore the lab: https://github.com/microsoft/BUILD25-LAB334/

And for all your general AI and AI agent questions, join us in the Azure AI Foundry Discord! You can find me hanging out there answering your questions about the AI Toolkit. I'm looking forward to chatting with you there!
Whether you’re debugging a tool call, comparing prompt versions, or prepping for production, evaluations are how you turn responses from plausible to dependable.

How do I control how my agent responds?
Welcome to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a question inspired by real conversations in the AI developer community, offering practical advice and tips. This time, we’re talking about one of the most misunderstood ingredients in agent behavior: the system prompt. Let’s dive in!

💬 Dear Agent Support
I’ve written a few different prompts to guide my agent’s responses, but the output still misses the mark—sometimes it’s too vague, other times too detailed. What’s the best way to structure the instructions so the results are more consistent?

Great question! It gets right to the heart of prompt engineering. When the output feels inconsistent, it’s often because the instructions aren’t doing enough to guide the model’s behavior. That’s where prompt engineering can make a difference. By refining how you frame the instructions, you can guide the model toward more reliable, purpose-driven output.

🧠 What Is Prompt Engineering (and Why It Matters for Agents)

Before we can fix the prompt, let’s define the craft. Prompt engineering is the practice of designing clear, structured input instructions that guide a model toward the behavior you want. In agent systems, this usually means writing the system prompt, a behind-the-scenes instruction that sets the tone, context, and boundaries for how the agent should act. While prompt engineering feels new, it’s rooted in decades of interface design, instruction tuning, and human-computer interaction research. The big shift? With large language models (LLMs), language becomes the interface. The better your instructions, the better your outcomes.

🧩 The Anatomy of a Good System Prompt

Think of your system prompt as a blueprint for how the agent should operate. It sets the stage before the conversation starts. A strong system prompt should:

- Define the role: Who is this agent? What’s their tone, expertise, or purpose?
- Clarify the goal: What task should the agent help with? What should it avoid?
- Establish boundaries: Are there any constraints? Should it cite sources? Stay concise?

Here’s a rough template you can build from:

“You are a helpful assistant that specializes in [domain]. Your job is to [task]. Keep responses [format/length/tone]. If you’re unsure, respond with ‘I don’t know’ instead of guessing.”

🛠️ Why Prompts Fail (Even When They Sound Fine)

Common issues we see:

- Too vague (“Be helpful” isn’t helpful.)
- Overloaded with logic (Treating the system prompt like a config file.)
- Conflicting instructions (“Be friendly” + “Use legal terminology precisely.”)

Even well-written prompts can underperform if they’re mismatched with the model or task. That’s why we recommend testing and refining early and often!

✏️ Skip the Struggle—Let the AI Toolkit Write It!

Writing a great system prompt takes practice. And even then, it’s easy to overthink it! If you’re not sure where to start (or just want to speed things up), the AI Toolkit provides a built-in way to generate a system prompt for you. All you have to do is describe what the agent needs to do, and the AI Toolkit will generate a well-defined and detailed system prompt for your agent. Here's how to do it:

1. Open the Agent Builder from the AI Toolkit panel in Visual Studio Code.
2. Click the + New Agent button and provide a name for your agent.
3. Select a Model for your agent.
4. In the Prompts section, click Generate system prompt.
5. In the Generate a prompt window that appears, provide basic details about your task and click Generate.
After the AI Toolkit generates your agent’s system prompt, it’ll appear in the System prompt field. I recommend reviewing the system prompt and modifying any parts that may need revision!

Heads up: System prompts aren’t just behind-the-scenes setup; they’re submitted along with the user prompt every time you send a request. That means they count toward your total token limit, so longer prompts can impact both cost and response length.

🧪 Test Before You Build

Once you’ve written (or generated) a system prompt, don’t skip straight to wiring it into your agent. It’s worth testing how the model responds with the prompt in place first. You can do that right in the Agent Builder. Just submit a test prompt in the User Prompt field, click Run, and the model will generate a response using the system prompt behind the scenes. This gives you a quick read on whether the behavior aligns with your expectations before you start building around it.

🔁 Recap

Here’s a quick rundown of what we covered:

- Prompt engineering helps guide your agent’s behavior through language.
- A good system prompt sets the tone, purpose, and guardrails for the agent.
- Test, tweak, and simplify—especially if responses seem inconsistent or off-target.
- You can use the Generate system prompt feature within the AI Toolkit to quickly generate instructions for your agent.

📺 Want to Go Deeper?

Check out my latest video on how to define your agent’s behavior—it’s part of the Build an Agent Series, where I walk through the building blocks of turning an idea into a working AI agent. The Prompt Engineering Fundamentals chapter from our aka.ms/AITKGenAI curriculum covers all the essentials—prompt structure, common patterns, and ways to test and improve your outputs. It also includes exercises so you can get some hands-on practice.

👉 Explore the full curriculum: aka.ms/AITKGenAI

And for all your general AI and AI agent questions, join us in the Azure AI Foundry Discord! You can find me hanging out there answering your questions about the AI Toolkit. I'm looking forward to chatting with you there! And remember, great agent behavior starts with great instructions—and now you’ve got the tools to write them.

I want to show my agent a picture—Can I?
Welcome to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a question inspired by real conversations in the AI developer community, offering practical advice and tips. To kick things off, we’re tackling a common challenge for anyone experimenting with multimodal agents: working with image input. Let’s dive in!

💬 Dear Agent Support
I’m building an AI agent, and I’d like to include screenshots or product photos as part of the input. But I’m not sure if that’s even possible, or if I need to use a different kind of model altogether. Can I actually upload an image and have the agent process it?

Great question, and one that trips up a lot of people early on! The short answer is: yes, some models can process images—but not all of them. Let’s break that down a bit.

🧠 Understanding Image Input

When we talk about image input or image attachments, we’re talking about the ability to send a non-text file (like a .png, .jpg, or screenshot) into your prompt and have the model analyze or interpret it. That could mean describing what’s in the image, extracting text from it, answering questions about a chart, or giving feedback on a design layout.

🚫 Not All Models Support Image Input

That said, this isn’t something every model can do. Most base language models are trained on text data only; they’re not designed to interpret non-text inputs like images. In most tools and interfaces, the option to upload an image only appears if the selected model supports it, since platforms typically hide or disable features that aren't compatible with a model's capabilities. So, if your current chat interface doesn’t mention anything about vision or image input, it’s likely because the model itself isn’t equipped to handle it.

That’s where multimodal models come in. These are models that have been trained (or extended) to understand both text and images, and sometimes other data types too. Think of them as being fluent in more than one language, except in this case, one of those “languages” is visual.

🔎 How to Find Image-Supporting Models

If you’re trying to figure out which models support images, the AI Toolkit is a great place to start! The extension includes a built-in Model Catalog where you can filter models by Feature—like Image Attachment—so you can skip the guesswork. Here’s how to do it:

1. Open the Model Catalog from the AI Toolkit panel in Visual Studio Code.
2. Click the Feature filter near the search bar.
3. Select Image Attachment.
4. Browse the filtered results to see which models can accept visual input.

Once you've got your filtered list, you can check out the model details or try one in the Playground to test how it handles image-based prompts.

🧪 Test Before You Build

Before you plug a model into your agent and start wiring things together, it’s a good idea to test how the model handles image input on its own. This gives you a quick feel for the model’s behavior and helps you catch any limitations before you're deep into building. You can do this in the Playground, where you can upload an image and pair it with a simple prompt like:

“Describe the contents of this image.” OR “Summarize what’s happening in this screenshot.”

If the model supports image input, you’ll be able to attach a file and get a response based on its visual analysis. If you don’t see the option to upload an image, double-check that the model you’ve selected has image capabilities—this is usually a model issue, not a UI bug.
🔁 Recap

Here’s a quick rundown of what we covered:

- Not all models support image input—you’ll need a multimodal model specifically built to handle visual data.
- Most platforms won’t let you upload an image unless the model supports it, so if you don’t see that option, it’s probably a model limitation.
- You can use the AI Toolkit’s Model Catalog to filter models by capability—just check the box for Image Attachment.
- Test the model in the Playground before integrating it into your agent to make sure it behaves the way you expect.

📺 Want to Go Deeper?

Check out my latest video on how to choose the right model for your agent—it’s part of the Build an Agent Series, where I walk through the building blocks of turning an idea into a working AI agent. And if you’re looking to sharpen your model instincts, don’t miss Model Mondays—a weekly series that helps developers like you build your Model IQ, one spotlight at a time. Whether you’re just starting out or already building AI-powered apps, it’s a great way to stay current and confident in your model choices.

👉 Explore the series and catch the next episode: aka.ms/model-mondays/rsvp

If you're just getting started with building agents, check out our Agents for Beginners curriculum. And for all your general AI and AI agent questions, join us in the Azure AI Foundry Discord! You can find me hanging out there answering your questions about the AI Toolkit. I'm looking forward to chatting with you there! Whatever you're building, the right model is out there—and with the right tools, you'll know exactly how to find it.

Building real-world AI automation with Foundry Local and the Microsoft Agent Framework
A hands-on guide to building real-world AI automation with Foundry Local, the Microsoft Agent Framework, and PyBullet. No cloud subscription, no API keys, no internet required.

Why Developers Should Care About Offline AI

Imagine telling a robot arm to "pick up the cube" and watching it execute the command in a physics simulator, all powered by a language model running on your laptop. No API calls leave your machine. No token costs accumulate. No internet connection is needed. That is what this project delivers, and every piece of it is open source and ready for you to fork, extend, and experiment with.

Most AI demos today lean on cloud endpoints. That works for prototypes, but it introduces latency, ongoing costs, and data privacy concerns. For robotics and industrial automation, those trade-offs are unacceptable. You need inference that runs where the hardware is: on the factory floor, in the lab, or on your development machine. Foundry Local gives you an OpenAI-compatible endpoint running entirely on-device. Pair it with a multi-agent orchestration framework and a physics engine, and you have a complete pipeline that translates natural language into validated, safe robot actions. This post walks through how we built it, why the architecture works, and how you can start experimenting with your own offline AI simulators today.

Architecture

The system uses four specialised agents orchestrated by the Microsoft Agent Framework:

| Agent | What It Does | Speed |
|---|---|---|
| PlannerAgent | Sends user command to Foundry Local LLM → JSON action plan | 4–45 s |
| SafetyAgent | Validates against workspace bounds + schema | < 1 ms |
| ExecutorAgent | Dispatches actions to PyBullet (IK, gripper) | < 2 s |
| NarratorAgent | Template summary (LLM opt-in via env var) | < 1 ms |

```
User (text / voice)
        │
        ▼
 ┌──────────────┐
 │ Orchestrator │
 └──────┬───────┘
        │
   ┌────┴────┐
   ▼         ▼
Planner   Narrator
   │
   ▼
Safety
   │
   ▼
Executor
   │
   ▼
PyBullet
```

Setting Up Foundry Local

```python
from foundry_local import FoundryLocalManager
import openai

manager = FoundryLocalManager("qwen2.5-coder-0.5b")

client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key,
)

resp = client.chat.completions.create(
    model=manager.get_model_info("qwen2.5-coder-0.5b").id,
    messages=[{"role": "user", "content": "pick up the cube"}],
    max_tokens=128,
    stream=True,
)
```

The SDK auto-selects the best hardware backend (CUDA GPU → QNN NPU → CPU). No configuration needed.

How the LLM Drives the Simulator

Understanding the interaction between the language model and the physics simulator is central to the project. The two never communicate directly. Instead, a structured JSON contract forms the bridge between natural language and physical motion.

From Words to JSON

When a user says “pick up the cube”, the PlannerAgent sends the command to the Foundry Local LLM alongside a compact system prompt. The prompt lists every permitted tool and shows the expected JSON format. The LLM responds with a structured plan:

```json
{
  "type": "plan",
  "actions": [
    {"tool": "describe_scene", "args": {}},
    {"tool": "pick", "args": {"object": "cube_1"}}
  ]
}
```

The planner parses this response, validates it against the action schema, and retries once if the JSON is malformed. This constrained output format is what makes small models (0.5B parameters) viable: the response space is narrow enough that even a compact model can produce correct JSON reliably.
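That parse-validate-retry step can be pictured with a short sketch. The tool set, the ask_llm callable, and the retry message below are illustrative assumptions; the project's real schema and retry logic live in src/brain/:

```python
import json

ALLOWED_TOOLS = {"pick", "place", "move_ee", "describe_scene"}  # illustrative subset

def plan_with_retry(ask_llm, command: str, retries: int = 1) -> dict:
    """Ask the LLM for a plan; re-prompt once if the JSON is malformed or off-schema."""
    prompt = command
    for _ in range(retries + 1):
        raw = ask_llm(prompt)
        try:
            plan = json.loads(raw)
            if plan.get("type") == "plan" and all(
                action["tool"] in ALLOWED_TOOLS for action in plan["actions"]
            ):
                return plan
        except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
            pass  # fall through to the retry prompt below
        prompt = f"{command}\nYour last reply was not valid plan JSON. Reply with JSON only."
    raise ValueError("planner failed to produce a valid plan")
```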
From JSON to Motion

Once the SafetyAgent approves the plan, the ExecutorAgent maps each action to concrete PyBullet calls:

- move_ee(target_xyz): The target position in Cartesian coordinates is passed to PyBullet's inverse kinematics solver, which computes the seven joint angles needed to place the end-effector at that position. The robot then interpolates smoothly from its current joint state to the target, stepping the physics simulation at each increment.
- pick(object): This triggers a multi-step grasp sequence. The controller looks up the object's position in the scene, moves the end-effector above the object, descends to grasp height, closes the gripper fingers with a configurable force, and lifts. At every step, PyBullet resolves contact forces and friction so that the object behaves realistically.
- place(target_xyz): The reverse of a pick. The robot carries the grasped object to the target coordinates and opens the gripper, allowing the physics engine to drop the object naturally.
- describe_scene(): Rather than moving the robot, this action queries the simulation state and returns the position, orientation, and name of every object on the table, along with the current end-effector pose.

The Abstraction Boundary

The critical design choice is that the LLM knows nothing about joint angles, inverse kinematics, or physics. It operates purely at the level of high-level tool calls (pick, move_ee). The ActionExecutor translates those tool calls into the low-level API that PyBullet provides. This separation means the LLM prompt stays simple, the safety layer can validate plans without understanding kinematics, and the executor can be swapped out without retraining or re-prompting the model.

Voice Input Pipeline

Voice commands follow three stages:

1. Browser capture: MediaRecorder captures audio, and the client resamples it to 16 kHz mono WAV.
2. Server transcription: Foundry Local Whisper (ONNX, cached after first load) with automatic 30 s chunking.
3. Command execution: transcribed text goes through the same Planner → Safety → Executor pipeline.

The mic button (🎤) only appears when a Whisper model is cached or loaded. Whisper models are filtered out of the LLM dropdown.

Web UI in Action

Pick command · Describe command · Move command · Reset command

Performance: Model Choice Matters

| Model | Params | Inference | Pipeline Total |
|---|---|---|---|
| qwen2.5-coder-0.5b | 0.5 B | ~4 s | ~5 s |
| phi-4-mini | 3.6 B | ~35 s | ~36 s |
| qwen2.5-coder-7b | 7 B | ~45 s | ~46 s |

For interactive robot control, qwen2.5-coder-0.5b is the clear winner: valid JSON for a 7-tool schema in under 5 seconds.

The Simulator in Action

Here is the Panda robot arm performing a pick-and-place sequence in PyBullet. Each frame is rendered by the simulator's built-in camera and streamed to the web UI in real time.

Overview · Reaching · Above the cube · Gripper detail · Front interaction · Side layout

Get Running in Five Minutes

You do not need a GPU, a cloud account, or any prior robotics experience. The entire stack runs on a standard development machine.
```bash
# 1. Install Foundry Local
winget install Microsoft.FoundryLocal   # Windows
brew install foundrylocal               # macOS

# 2. Download models (one-time, cached locally)
foundry model run qwen2.5-coder-0.5b    # Chat brain (~4 s inference)
foundry model run whisper-base          # Voice input (194 MB)

# 3. Clone and set up the project
git clone https://github.com/leestott/robot-simulator-foundrylocal
cd robot-simulator-foundrylocal
.\setup.ps1    # or ./setup.sh on macOS/Linux

# 4. Launch the web UI
python -m src.app --web --no-gui   # → http://localhost:8080
```

Once the server starts, open your browser and try these commands in the chat box:

- "pick up the cube": the robot grasps the blue cube and lifts it
- "describe the scene": returns every object's name and position
- "move to 0.3 0.2 0.5": sends the end-effector to specific coordinates
- "reset": returns the arm to its neutral pose

If you have a microphone connected, hold the mic button and speak your command instead of typing. Voice input uses a local Whisper model, so your audio never leaves the machine.

Experiment and Build Your Own

The project is deliberately simple so that you can modify it quickly. Here are some ideas to get started.

Add a new robot action

The robot currently understands seven tools. Adding an eighth takes four steps:

1. Define the schema in TOOL_SCHEMAS (src/brain/action_schema.py).
2. Write a _do_<tool> handler in src/executor/action_executor.py.
3. Register it in ActionExecutor._dispatch.
4. Add a test in tests/test_executor.py.

For example, you could add a rotate_ee tool that spins the end-effector to a given roll/pitch/yaw without changing position.

Add a new agent

Every agent follows the same pattern: an async run(context) method that reads from and writes to a shared dictionary. Create a new file in src/agents/, register it in orchestrator.py, and the pipeline will call it in sequence. Ideas for new agents:

- VisionAgent: analyse a camera frame to detect objects and update the scene state before planning.
- CostEstimatorAgent: predict how many simulation steps an action plan will take and warn the user if it is expensive.
- ExplanationAgent: generate a step-by-step natural language walkthrough of the plan before execution, allowing the user to approve or reject it.

Swap the LLM

```bash
python -m src.app --web --model phi-4-mini
```

Or use the model dropdown in the web UI; no restart is needed. Try different models and compare accuracy against inference speed. Smaller models are faster but may produce malformed JSON more often. Larger models are more accurate but slower. The retry logic in the planner compensates for occasional failures, so even a small model works well in practice.

Swap the simulator

PyBullet is one option, but the architecture does not depend on it. You could replace the simulation layer with:

- MuJoCo: a high-fidelity physics engine popular in reinforcement learning research.
- Isaac Sim: NVIDIA's GPU-accelerated robotics simulator with photorealistic rendering.
- Gazebo: the standard ROS simulator, useful if you plan to move to real hardware through ROS 2.

The only requirement is that your replacement implements the same interface as PandaRobot and GraspController.

Build something completely different

The pattern at the heart of this project (LLM produces structured JSON, safety layer validates, executor dispatches to a domain-specific engine) is not limited to robotics. You could apply the same architecture to the domains below; a generic sketch of the pattern follows the list.

- Home automation: "turn off the kitchen lights and set the thermostat to 19 degrees" translated into MQTT or Zigbee commands.
- Game AI: natural language control of characters in a game engine, with the safety agent preventing invalid moves.
- CAD automation: voice-driven 3D modelling where the LLM generates geometry commands for OpenSCAD or FreeCAD.
- Lab instrumentation: controlling scientific equipment (pumps, stages, spectrometers) via natural language, with the safety agent enforcing hardware limits.
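Stripped of robotics specifics, the whole pattern fits in one function. Everything here is hypothetical scaffolding: plan comes from your LLM, is_safe is your safety layer, and handlers is your domain engine's dispatch table.

```python
from typing import Any, Callable, Dict, List

def run_command(
    plan: dict,
    is_safe: Callable[[dict], bool],
    handlers: Dict[str, Callable[..., Any]],
) -> List[Any]:
    """Validate a structured plan, then dispatch each action to its handler."""
    if not is_safe(plan):
        raise PermissionError("plan rejected by safety layer")
    results = []
    for action in plan["actions"]:
        handler = handlers[action["tool"]]  # unknown tools raise KeyError
        results.append(handler(**action["args"]))
    return results
```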
From Simulator to Real Robot

One of the most common questions about projects like this is whether it could control a real robot. The answer is yes, and the architecture is designed to make that transition straightforward.

What Stays the Same

The entire upper half of the pipeline is hardware-agnostic:

- The LLM planner generates the same JSON action plans regardless of whether the target is simulated or physical. It has no knowledge of the underlying hardware.
- The safety agent validates workspace bounds and tool schemas. For a real robot, you would tighten the bounds to match the physical workspace and add checks for obstacle clearance using sensor data.
- The orchestrator coordinates agents in the same sequence. No changes are needed.
- The narrator reports what happened. It works with any result data the executor returns.

What Changes

The only component that must be replaced is the executor layer, specifically the PandaRobot class and the GraspController. In simulation, these call PyBullet's inverse kinematics solver and step the physics engine. On a real robot, they would instead call the hardware driver. For a Franka Emika Panda (the same robot modelled in the simulation), the replacement options include:

- libfranka: Franka's C++ real-time control library, which accepts joint position or torque commands at 1 kHz.
- ROS 2 with MoveIt: A robotics middleware stack that provides motion planning, collision avoidance, and hardware abstraction. The move_ee action would become a MoveIt goal, and the framework would handle trajectory planning and execution.
- Franka ROS 2 driver: Combines libfranka with ROS 2 for a drop-in replacement of the simulation controller.

The ActionExecutor._dispatch method maps tool names to handler functions. Replacing _do_move_ee, _do_pick, and _do_place with calls to a real robot driver is the only code change required.

Key Considerations for Real Hardware

- Safety: A simulated robot cannot cause physical harm; a real robot can. The safety agent would need to incorporate real-time collision checking against sensor data (point clouds from depth cameras, for example) rather than relying solely on static workspace bounds.
- Perception: In simulation, object positions are known exactly. On a real robot, you would need a perception system (cameras with object detection or fiducial markers) to locate objects before grasping.
- Calibration: The simulated robot's coordinate frame matches the URDF model perfectly. A real robot requires hand-eye calibration to align camera coordinates with the robot's base frame.
- Latency: Real actuators have physical response times. The executor would need to wait for motion completion signals from the hardware rather than stepping a simulation loop.
- Gripper feedback: In PyBullet, grasp success is determined by contact forces. A real gripper would provide force or torque feedback to confirm whether an object has been securely grasped.

The Simulation as a Development Tool

This is precisely why simulation-first development is valuable. You can iterate on the LLM prompts, agent logic, and command pipeline without risk to hardware. Once the pipeline reliably produces correct action plans in simulation, moving to a real robot is a matter of swapping the lowest layer of the stack.

Key Takeaways for Developers
- On-device AI is production-ready. Foundry Local serves models through a standard OpenAI-compatible API. If your code already uses the OpenAI SDK, switching to local inference is a one-line change to base_url.
- Small models are surprisingly capable. A 0.5B parameter model produces valid JSON action plans in under 5 seconds. For constrained output schemas, you do not need a 70B model.
- Multi-agent pipelines are more reliable than monolithic prompts. Splitting planning, validation, execution, and narration across four agents makes each one simpler to test, debug, and replace.
- Simulation is the safest way to iterate. You can refine LLM prompts, agent logic, and tool schemas without risking real hardware. When the pipeline is reliable, swapping the executor for a real robot driver is the only change needed.
- The pattern generalises beyond robotics. Structured JSON output from an LLM, validated by a safety layer, dispatched to a domain-specific engine: that pattern works for home automation, game AI, CAD, lab equipment, and any other domain where you need safe, structured control.
- You can start building today. The entire project runs on a standard laptop with no GPU, no cloud account, and no API keys. Clone the repository, run the setup script, and you will have a working voice-controlled robot simulator in under five minutes.

Ready to start building? Clone the repository, try the commands, and then start experimenting. Fork it, add your own agents, swap in a different simulator, or apply the pattern to an entirely different domain. The best way to learn how local AI can solve real-world problems is to build something yourself.

Source code: github.com/leestott/robot-simulator-foundrylocal

Built with Foundry Local, Microsoft Agent Framework, PyBullet, and FastAPI.

🚀 AI Toolkit for VS Code — February 2026 Update
February brings a major milestone for AI Toolkit. Version 0.30.0 is packed with new capabilities that make agent development more discoverable, debuggable, and production-ready—from a brand-new Tool Catalog, to an end-to-end Agent Inspector, to treating evaluations as first-class tests.

🔧 New in v0.30.0

🧰 Tool Catalog: One place to discover and manage agent tools

The new Tool Catalog is a centralized hub for discovering, configuring, and integrating tools into your AI agents. Instead of juggling scattered configs and definitions, you now get a unified experience for tool management:

- Browse, search, and filter tools from the public Foundry catalog and local stdio MCP servers
- Configure connection settings for each tool directly in VS Code
- Add tools to agents seamlessly via Agent Builder
- Manage the full tool lifecycle: add, update, or remove tools with confidence

Why it matters: expanding your agent’s capabilities is now a few clicks away—and stays manageable as your agent grows.

🕵️ Agent Inspector: Debug agents like real software

The new Agent Inspector turns agent debugging into a first-class experience inside VS Code. Just press F5 and launch your agent with full debugger support. Key highlights:

- One-click F5 debugging with breakpoints, variable inspection, and step-through execution
- Copilot auto-configuration that scaffolds agent code, endpoints, and debugging setup
- Production-ready code generated using the Hosted Agent SDK, ready for Microsoft Foundry
- Real-time visualization of streaming responses, tool calls, and multi-agent workflows
- Quick code navigation—double-click workflow nodes to jump straight to source
- Unified experience combining chat and workflow visualization in one view

Why it matters: agents are no longer black boxes—you can see exactly what’s happening, when, and why.

🧪 Evaluation as Tests: Treat quality like code

With Evaluation as Tests, agent quality checks now fit naturally into existing developer workflows. What’s new:

- Define evaluations as test cases using familiar pytest syntax and Eval Runner SDK annotations
- Run evaluations directly from VS Code Test Explorer, mixing and matching test cases
- Analyze results in a tabular view with Data Wrangler integration
- Submit evaluation definitions to run at scale in Microsoft Foundry

Why it matters: evaluations are no longer ad-hoc scripts—they’re versioned, repeatable, and CI-friendly.

🔄 Improvements across the Toolkit

🧱 Agent Builder

Agent Builder received a major usability refresh:

- Redesigned layout for better navigation and focus
- Quick switcher to move between agents effortlessly
- Support for authoring, running, and saving Foundry prompt agents
- Add tools to Foundry prompt agents directly from the Tool Catalog or built-in tools
- New Inspire Me feature to help you get started when drafting agent instructions
- Numerous performance and stability improvements

🤖 Model Catalog

- Added support for models using the OpenAI Response API, including gpt-5.2-codex
- General performance and reliability improvements

🧠 Build Agent with GitHub Copilot

- New Workflow entry point to quickly generate multi-agent workflows with Copilot
- Ability to orchestrate workflows by selecting prompt agents from Foundry

🔁 Conversion & Profiling

- Generate interactive playgrounds for history models
- Added Qualcomm GPU recipes
- Show resource usage for Phi Silica directly in Model Playground

✨ Wrapping up

Version 0.30.0 is a big step forward for AI Toolkit.
With better discoverability, real debugging, structured evaluation, and deeper Foundry integration, building AI agents in VS Code now feels much closer to building production software. As always, we’d love your feedback—keep it coming, and happy agent building! 🚀

AI Toolkit for VS Code October Update
We're thrilled to bring you the October update for the AI Toolkit for Visual Studio Code! This month marks another major milestone with version 0.24.0, introducing groundbreaking GitHub Copilot Tools Integration and additional user experience enhancements that make AI-powered development more seamless than ever. Let's dive into what's new! 👇

🚀 GitHub Copilot Tools Integration

We are excited to announce the integration of GitHub Copilot Tools into AI Toolkit for VS Code. This integration empowers developers to build AI-powered applications more efficiently by leveraging Copilot's capabilities enhanced by AI Toolkit.

🤖 AI Agent Code Generation Tool

This powerful tool provides best practices, guidance, steps, and code samples on Microsoft Agent Framework for GitHub Copilot to better scaffold AI agent applications. Whether you're building your first agent or scaling complex multi-agent systems, this tool ensures you follow the latest best practices and patterns.

📊 AI Agent Evaluation Planner Tool

Building great AI agents requires thorough evaluation. This tool guides users through the complete process of evaluating AI agents, including:

- Defining evaluation metrics: establish clear success criteria for your agents
- Creating evaluation datasets: generate comprehensive test datasets
- Analyzing results: understand your agent's performance and areas for improvement

The Evaluation Planner works seamlessly with two specialized sub-tools:

🏃‍♂️ Evaluation Agent Runner Tool

This tool runs agents on provided datasets and collects results, making it easy to test your agents at scale across multiple scenarios and use cases.

💻 Evaluation Code Generation Tool

Get best practices, guidance, steps, and code samples on Azure AI Foundry Evaluation Framework for GitHub Copilot to better scaffold code for evaluating AI agents.

🎯 Easy Access and Usage

You can access these powerful tools in two convenient ways:

1. Direct GitHub Copilot Integration: Simply enter prompts like:
   - Create an AI agent using Microsoft Agent Framework to help users plan a trip to Paris.
   - Evaluate the performance of my AI agent using Azure AI Foundry Evaluation Framework.
2. AI Toolkit Tree View: For quick access, find these tools in the AI Toolkit Tree View UI under the section `Build Agent with GitHub Copilot`.

✨ Additional Enhancements

🎨 Model Playground Improvements

The user experience in Model Playground has been significantly enhanced:

- Resizable Divider: The divider between chat output and model settings is now resizable, allowing you to customize your workspace layout for better usability and productivity.

📚 Model Catalog Updates

We've unified and streamlined the model discovery experience:

- Unified Local Models: The ONNX models section in the Model Catalog has been merged with Foundry Local Models on macOS and Windows platforms, providing a unified experience for discovering and selecting local models.
- Simplified Navigation: Find all your local model options in one place, making it easier to compare and select the right model for your use case.

🌟 Why This Release Matters

Version 0.24.0 represents a significant step forward in making AI development more accessible and efficient:

- Seamless Integration: The deep integration with GitHub Copilot means AI best practices are now available right where you're already working.
- End-to-End Workflow: From agent creation to evaluation, you now have comprehensive tooling that guides you through the entire AI development lifecycle.
Enhanced Productivity: Improved UI elements and unified experiences reduce friction and help you focus on building great AI applications.

🚀 Get Started and Share Your Feedback

Ready to experience the future of AI development? Here's how to get started:

- 📥 Download: Install the AI Toolkit from the Visual Studio Code Marketplace
- 📖 Learn: Explore our comprehensive AI Toolkit Documentation
- 🔍 Discover: Check out the complete changelog for v0.24.0

We'd love to hear from you! Whether it's a feature request, bug report, or feedback on your experience, join the conversation and contribute directly on our GitHub repository.

🎯 What's Next?

This release sets the foundation for even more exciting developments ahead. The GitHub Copilot Tools Integration opens up new possibilities for AI-assisted development, and we're just getting started. Stay tuned for more updates, and let's continue building the future of AI agent development together! 💡💬

Happy coding, and see you next month! 🚀
How do I get my agent to respond in a structured format like JSON?

Welcome back to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a real question from the community with simple, practical guidance to help you build smarter agents.

💬 Dear Agent Support

I’m building an agent that feeds its response into another app—but sometimes the output is messy or unpredictable. Can you help?

This looks like a job for JSON! While agent output often comes back as plain text, there are times when you need something more structured, especially if another system, app, or service needs to reliably parse and use that response.

🧠 What’s the Deal with JSON Output?

If your agent is handing off data to another app, you need the output to be clean, consistent, and easy to parse. That’s where JSON comes in. JSON (JavaScript Object Notation) is a lightweight, widely supported format that plays nicely with almost every modern tool or platform. Whether your agent is powering a dashboard, triggering a workflow, or sending data into a UI component, JSON makes the handoff smooth. You can define exactly what shape the response should take (i.e. keys, values, types) so that other parts of your system know what to expect every single time.

Without it? Things get messy fast! Unstructured text can vary wildly from one response to the next. A missing field, a misaligned format, or even just an unexpected line break can break downstream logic, crash a frontend, or silently cause bugs you won’t catch until much later. Worse, you end up writing brittle post-processing code to clean up the output just to make it usable.

The bottom line: If you want your agent to work well with others, it needs to speak in a structured format. JSON isn’t just a nice-to-have, it’s the language of interoperability.

🧩 When You’d Want Structured Output

Not every agent needs to speak JSON, but if your response is going anywhere beyond the chat window, structured output is your best bet. Let’s say your agent powers a dashboard. You might want to display the response in different UI components—a title, a summary, maybe a set of bullet points. That only works if the response is broken into predictable parts. Same goes for workflows. If you’re passing the output into another service, like Power Automate, an agent framework like Semantic Kernel, or even another agent, it needs to follow a format those systems can recognize.

Structured output also makes logging and debugging easier. When every response follows the same format, you can spot problems faster. Missing fields, weird values, or unexpected data types stand out immediately. It also future-proofs your agent. If you want to add new features later, like saving responses to a database or triggering different actions based on the content, having structured output gives you the flexibility to build on top of what’s already there. If the output is meant to do more than just be read by a human, it should be structured for a machine.

📄 How to Define a JSON Format

Once you know your agent’s output needs to be structured, the next step is telling the model exactly how to structure it. That’s where defining a schema comes in. In this context, a schema is just a blueprint for the shape of your data. It outlines what fields you expect, what type of data each one should hold, and how everything should be organized. Think of it like a form template: the model just needs to fill in the blanks.
Here’s a simple example of a JSON schema for a to-do app:

```json
{
  "task": "string",
  "priority": "high | medium | low",
  "due_date": "YYYY-MM-DD"
}
```

Once you have your format in mind, include it directly in your prompt. But if you’re using the AI Toolkit in Visual Studio Code, you don’t have to do this manually every time!

🔧 Create a JSON schema with the AI Toolkit

The Agent Builder feature supports structured output natively. You can provide a JSON schema alongside your prompt, and the agent will automatically aim to match that format. This takes the guesswork out of prompting. Instead of relying on natural language instructions to shape the output, you’re giving the model a concrete set of instructions to follow. Here’s how to do it:

1. Open the Agent Builder from the AI Toolkit panel in Visual Studio Code.
2. Click the + New Agent button and provide a name for your agent.
3. Select a Model for your agent.
4. Within the System Prompt section, enter: You recommend a movie to watch.
5. Within the User Prompt section, enter: Recommend a science-fiction movie with robots.
6. In the Structured Output section, select json_schema as the output format.
7. Click Prepare schema.
8. In the wizard, select Use an example.
9. For the example, select paper_metadata.
10. Save the file to your desired location. You can name the file: movie_metadata.
11. In the Agent Builder, select movie_metadata to open the file.
12. Using the template provided, modify the schema to format the title, genre, year, and reason. Once done, save the file.

```json
{
  "name": "movie_metadata",
  "strict": true,
  "schema": {
    "type": "object",
    "properties": {
      "title": { "type": "string" },
      "genre": { "type": "array", "items": { "type": "string" } },
      "year": { "type": "string" },
      "reason": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["title", "genre", "year", "reason"],
    "additionalProperties": false
  }
}
```

And just like that, you’ve set up a JSON schema for your agent’s output!

🧪 Test Before You Build

You can submit a prompt with the Agent Builder to validate whether the agent adheres to the JSON schema when returning its response. When you click Run, the agent’s response will appear in the Prompt tab, ideally in JSON format. (If you’d rather run the same check in code, see the validation sketch at the end of this post.)

🔁 Recap

Here’s a quick rundown of what we covered:

- JSON lets you create reliable, machine-friendly agent responses.
- JSON is essential for interoperability between apps, tools, and workflows.
- Without JSON, you risk fragile pipelines, broken features, or confusing bugs.
- The AI Toolkit supports including a JSON schema with your agent’s prompt.

📺 Want to Go Deeper?

Check out the AI Agents for Beginners curriculum, in which we dive a bit more into agentic design patterns, including defining structured outputs. We’ll have a video landing soon in the Build an Agent Series that takes you through a step-by-step look at creating a JSON schema for your agent!
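As promised above, here’s what that validation can look like in code. This is a minimal, illustrative sketch rather than part of the Agent Builder workflow: it assumes the `jsonschema` package is installed (pip install jsonschema), and the hard-coded `agent_reply` string is a stand-in for whatever your agent actually returns.

```python
import json

from jsonschema import ValidationError, validate

# The same movie_metadata schema from above, expressed as a Python dict.
MOVIE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "genre": {"type": "array", "items": {"type": "string"}},
        "year": {"type": "string"},
        "reason": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["title", "genre", "year", "reason"],
    "additionalProperties": False,
}

# Stand-in for your agent's raw output; in practice this comes from the model.
agent_reply = (
    '{"title": "WALL-E", "genre": ["science fiction"], '
    '"year": "2008", "reason": ["robots", "charming story"]}'
)

try:
    parsed = json.loads(agent_reply)  # fails if the reply isn't valid JSON at all
    validate(instance=parsed, schema=MOVIE_SCHEMA)  # fails on missing/mistyped fields
    print("Valid response:", parsed["title"])
except json.JSONDecodeError as err:
    print("Reply was not valid JSON:", err)
except ValidationError as err:
    print("Reply did not match the schema:", err.message)
```

Running a check like this before handing the response to downstream code means the rest of your pipeline can trust the shape of the data, which is exactly the interoperability promise from earlier.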
How do I give my agent access to tools?

Welcome back to Agent Support—a developer advice column for those head-scratching moments when you’re building an AI agent! Each post answers a real question from the community with simple, practical guidance to help you build smarter agents. Today’s question comes from someone trying to move beyond chat-only agents into more capable, action-driven ones:

💬 Dear Agent Support

I want my agent to do more than just respond with text. Ideally, it could look up information, call APIs, or even run code—but I’m not sure where to start. How do I give my agent access to tools?

This is exactly where agents start to get interesting! Giving your agent tools is one of the most powerful ways to expand what it can do. But before we get into the “how,” let’s talk about what tools actually mean in this context—and how Model Context Protocol (MCP) helps you use them in a standardized, agent-friendly way.

🛠️ What Do We Mean by “Tools”?

In agent terms, a tool is any external function or capability your agent can use to complete a task. That might be:

- A web search function
- A weather lookup API
- A calculator
- A database query
- A custom Python script

When you give an agent tools, you’re giving it a way to take action—not just generate text. Think of tools as buttons the agent can press to interact with the outside world.

⚡ Why Give Your Agent Tools?

Without tools, an agent is limited to what it “knows” from its training data and prompt. It can guess, summarize, and predict, but it can’t do. Tools change that! With the right tools, your agent can:

- Pull live data from external systems
- Perform dynamic calculations
- Trigger workflows in real time
- Make decisions based on changing conditions

It’s the difference between an assistant that can answer trivia questions vs. one that can book your travel or manage your calendar.

🧩 So… How Does This Work?

Enter Model Context Protocol (MCP). MCP is a simple, open protocol that helps agents use tools in a consistent way—whether those tools live in your app, your cloud project, or on a server you built yourself. Here’s what MCP does:

- Describes tools in a standardized format that models can understand
- Wraps the function call + input + output into a predictable schema
- Lets agents request tools as needed (with reasoning behind their choices)

This makes it much easier to plug tools into your agent workflow without reinventing the wheel every time!

🔌 How to Connect an Agent to Tools

Wiring tools into your agent might sound complex, but it doesn’t have to be! If you’ve already got an MCP server in mind, there’s a straightforward way within the AI Toolkit to expose it as a tool your agent can use. Here’s how to do it:

1. Open the Agent Builder from the AI Toolkit panel in Visual Studio Code.
2. Click the + New Agent button and provide a name for your agent.
3. Select a Model for your agent.
4. Within the Tools section, click + MCP Server.
5. In the wizard that appears, click + Add Server.
6. From there, you can select one of the MCP servers built by Microsoft, connect to an existing server that’s running, or even create your own using a template!
7. After giving the server a Server ID, you’ll be given the option to select which tools from the server to add for your agent.

Once connected, your agent can call tools dynamically based on the task at hand. (If you’re curious what a minimal custom server looks like under the hood, there’s a sketch at the end of this post.)

🧪 Test Before You Build

Once you’ve connected your agent to an MCP server and added tools, don’t jump straight into full integration. It’s worth taking time to test whether the agent is calling the right tool for the job.
You can do this directly in the Agent Builder: enter a test prompt that should trigger a tool in the User Prompt field, click Run, and observe how the model responds. This gives you a quick read on tool call accuracy. If the agent selects the wrong tool, it’s a sign that your system prompt might need tweaking before you move forward.

However, if the agent calls the correct tool but the output still doesn’t look right, take a step back and check both sides of the interaction. It might be that the system prompt isn’t clearly guiding the agent on how to use or interpret the tool’s response. But it could also be an issue with the tool itself—whether that’s a bug in the logic, unexpected behavior, or a mismatch between input and expected output. Testing both the tool and the prompt in isolation can help you pinpoint where things are going wrong before you move on to full integration.

🔁 Recap

Here’s a quick rundown of what we covered:

- Tools = external functions your agent can use to take action
- MCP = a protocol that helps your agent discover and use those tools reliably
- If the agent calls the wrong tool—or uses the right tool incorrectly—check your system prompt and test the tool logic separately to isolate the issue.

📺 Want to Go Deeper?

Check out my latest video on how to connect your agent to an MCP server—it’s part of the Build an Agent Series, where I walk through the building blocks of turning an idea into a working AI agent. The MCP for Beginners curriculum covers all the essentials—MCP architecture, creating and debugging servers, and best practices for developing, testing, and deploying MCP servers and features in production environments. It also includes several hands-on exercises across .NET, Java, TypeScript, JavaScript and Python.

👉 Explore the full curriculum: aka.ms/AITKmcp

And for all your general AI and AI agent questions, join us in the Azure AI Foundry Discord! You can find me hanging out there answering your questions about the AI Toolkit. I’m looking forward to chatting with you there!

Whether you’re building a productivity agent, a data assistant, or a game bot—tools are how you turn your agent from smart to useful.
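One last thing before you go: here’s the promised sketch of what a homegrown server behind that + Add Server wizard might look like. It’s a minimal, hypothetical example using the official MCP Python SDK (pip install "mcp[cli]"); the demo-weather server name and the get_weather tool are invented for illustration, and a real server would call an actual API instead of returning a canned string.

```python
# A minimal, hypothetical MCP server built with the official MCP Python SDK.
# FastMCP turns a decorated Python function into a tool an agent can discover and call.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a weather report for the given city."""
    # A real implementation would query a weather API; this stub keeps the sketch runnable.
    return f"It's 22°C and sunny in {city}."

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Once a server like this is registered in the Tools section, its tools show up to the agent alongside any others you’ve added, and the same test-before-you-build advice applies.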