Building AI Agents with Microsoft Foundry: A Progressive Lab from Hello World to Self-Hosted
AI agent development has a steep on-ramp. The combination of new SDKs, tool-calling patterns, model selection decisions, retrieval-augmented generation, and deployment concerns means most developers spend more time wiring things together than actually building anything useful. The Microsoft Foundry Agent Lab is a structured, open-source demo series designed to change that: nine self-contained demos, each adding exactly one new concept, all built on the same Microsoft Foundry SDK and a single model deployment. This post walks through what the lab contains, how each demo works under the hood, and the architectural decisions that make it a useful reference for AI engineers building production agents.

Why a Progressive Lab?

Agent frameworks can be overwhelming. A developer who opens a rich example with RAG, tool-calling, streaming, and a custom UI all at once has no clear line of sight to which parts are essential and which are embellishments. The Foundry Agent Lab takes the opposite approach: start with the absolute minimum and introduce one new primitive per demo. By the time you reach Demo 8, you have seen every major capability, not in one monolithic sample but in a layered sequence where each addition is visible and understandable.
| # | Demo | New concept | Tool used | UX |
|---|---|---|---|---|
| 0 | hello-demo | Agent creation, Responses API, conversations | None | Terminal |
| 1 | tools-demo | Function calling, tool-calling loop, live API | FunctionTool | Terminal |
| 2 | desktop-demo | UI decoupling: same agent, different surface | None | Desktop (Tkinter) |
| 3 | websearch-demo | Server-side built-in tools, no client loop | WebSearchTool | Terminal |
| 4 | code-demo | Code execution in sandbox, Gradio web UI | CodeInterpreterTool | Web (Gradio) |
| 5 | rag-demo | Document upload, vector stores, RAG grounding | FileSearchTool | Terminal |
| 6 | mcp-demo | MCP servers, human-in-the-loop approval | MCPTool | Terminal |
| 7 | toolbox-demo | Centralized tool governance, Toolbox versioning | Toolbox | Terminal |
| 8 | hosted-demo | Self-hosted agent with Responses protocol | Custom server | Terminal + Agent Inspector |

The Model Router: One Deployment to Rule Them All

Before diving into the demos, it is worth understanding the one architectural decision that ties the entire lab together: every agent uses model-router as its model deployment.

    MODEL_DEPLOYMENT=model-router

Model Router is a Microsoft Foundry capability that inspects each request at inference time and routes it to the optimal available model, weighing task complexity, cost, and latency. A simple factual question goes to a fast, cheap model; a complex tool-calling chain with code generation gets routed to a frontier model. You write zero routing logic.

The lab's MODEL-ROUTER.md file contains empirical observations from running all nine demos. A sample of what the router selected:

| Demo | Query | Task type | Model selected |
|---|---|---|---|
| hello | "What's the capital of WA state?" | Factual recall | grok-4-1-fast-reasoning |
| hello | "Summarize our conversation" | Summarization | gpt-5.2-chat-2025-12-11 |
| tools | "What's the weather in Seattle?" | Tool-using | gpt-5.4-mini-2026-03-17 |
| code | Data analysis with code generation | Code generation + execution | gpt-5.4-2026-03-05 |
| rag | HR policy document question | Retrieval + synthesis | gpt-5.3-chat-2026-03-03 |

This is the strongest signal in the lab: you do not need to reason about model selection. You declare what your agent needs to do and the router handles the rest; in the lab's runs, it picked a model suited to each task type.

Demo 0: The Minimum Viable Agent

The hello-demo establishes the baseline pattern used by every subsequent demo. Two files: one to register the agent, one to chat with it.

Registering the agent

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import PromptAgentDefinition

    # PROJECT_ENDPOINT, AGENT_NAME and MODEL_DEPLOYMENT are configuration values (e.g. from the demo's .env)
    credential = DefaultAzureCredential()
    project = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential)

    agent = project.agents.create_version(
        agent_name=AGENT_NAME,
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            instructions="You are a helpful, friendly assistant.",
        ),
    )

Authentication uses DefaultAzureCredential, which works with az login locally and with managed identity in production; no API keys appear anywhere in the code.

Chatting with the agent

    # `openai` is the OpenAI-compatible client for the Foundry project (its setup is not shown here)
    # Create a server-side conversation (persists history across turns)
    conversation = openai.conversations.create()

    # Each turn sends only the new user message; the agent sees the full history
    response = openai.responses.create(
        input=user_input,
        conversation=conversation.id,
        extra_body={"agent_reference": {"name": AGENT_NAME, "type": "agent_reference"}},
    )
    print(response.output_text)

The conversation object is server-side. You pass its ID on every turn; the history lives in Foundry, not in a local list. This is the Responses API pattern, distinct from the older Completions and Chat Completions APIs.

Demo 1: Function Tools and the Tool-Calling Loop

Demo 1 adds function calling against a real weather API.
The key insight here is that the model does not execute the function: it requests the execution, your code executes it locally, and then your code feeds the result back.

Declaring a function tool

    from azure.ai.projects.models import FunctionTool, PromptAgentDefinition

    func_tool = FunctionTool(
        name="get_weather",
        description="Get the current weather for a given city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
            "additionalProperties": False,  # strict mode requires a closed schema
        },
        strict=True,
    )

    agent = project.agents.create_version(
        agent_name=AGENT_NAME,
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[func_tool],
            instructions="You are a weather assistant...",
        ),
    )

The tool-calling loop

    import json
    from openai.types.responses.response_input_param import FunctionCallOutput

    # Same agent reference as in Demo 0
    agent_ref = {"agent_reference": {"name": AGENT_NAME, "type": "agent_reference"}}
    response = openai.responses.create(input=user_input, conversation=conversation.id, extra_body=agent_ref)

    # Loop while the model is requesting tool calls
    while any(item.type == "function_call" for item in response.output):
        input_list = []
        for item in response.output:
            if item.type == "function_call":
                args = json.loads(item.arguments)
                result = get_weather(args["city"])  # execute locally; must return a string
                input_list.append(
                    FunctionCallOutput(type="function_call_output", call_id=item.call_id, output=result)
                )
        # Send the results back to the agent
        response = openai.responses.create(input=input_list, conversation=conversation.id, extra_body=agent_ref)

    print(response.output_text)

The strict=True parameter on FunctionTool enforces structured outputs: the model must return arguments that match the declared JSON schema exactly. This eliminates schema-mismatch errors when parsing arguments in production.

Demo 2: UI Is Not Your Agent

Demo 2 runs the exact same agent as Demo 1 but surfaces it in a Tkinter desktop window. The point is pedagogical: your agent definition, conversation management, and tool-calling logic are entirely independent of your UI layer. Swapping from terminal to desktop requires changing only the presentation code; nothing in the agent or conversation path changes.
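The Demo 1 loop and the Demo 2 separation can be exercised together without an Azure subscription. The sketch below is a hypothetical, offline stand-in: `FakeResponsesClient`, `Item`, `Response`, the local `get_weather`, and the dict-shaped tool output are assumptions that mimic only the fields the lab's loop reads (`type`, `arguments`, `call_id`, `output_text`); they are not the Foundry SDK. `run_turn` owns the conversation and the tool loop, while the two surfaces only display what it returns.

```python
import json
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for Responses API objects. They expose only the
# fields the lab's loop reads: item.type, item.arguments, item.call_id,
# response.output and response.output_text.
@dataclass
class Item:
    type: str
    name: str = ""
    arguments: str = "{}"
    call_id: str = ""


@dataclass
class Response:
    output: list
    output_text: str = ""


class FakeResponsesClient:
    """Scripted stand-in for openai.responses: requests one tool call, then answers."""

    def __init__(self) -> None:
        self.inputs: list = []  # every `input` the agent logic sent

    def create(self, input, conversation: str) -> Response:
        self.inputs.append(input)
        if isinstance(input, str):  # a new user turn: ask for the tool
            call = Item("function_call", "get_weather", json.dumps({"city": "Seattle"}), "call_1")
            return Response(output=[call])
        result = json.loads(input[0]["output"])  # tool results came back
        text = f"It is {result['temp_c']} degrees C in {result['city']}."
        return Response(output=[Item("message")], output_text=text)


def get_weather(city: str) -> dict:
    # Hypothetical local tool; the lab calls a live weather API here.
    return {"city": city, "temp_c": 12}


TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_turn(client, conversation_id: str, user_input: str) -> str:
    """Agent logic only: send the turn, run requested tools, return the final text."""
    response = client.create(input=user_input, conversation=conversation_id)
    while any(item.type == "function_call" for item in response.output):
        outputs = []
        for item in response.output:
            if item.type == "function_call":
                result = TOOLS[item.name](**json.loads(item.arguments))
                outputs.append({"type": "function_call_output",
                                "call_id": item.call_id,
                                "output": json.dumps(result)})
        response = client.create(input=outputs, conversation=conversation_id)
    return response.output_text


# Two interchangeable surfaces; neither knows anything about tools or history.
def terminal_surface(client, text: str) -> None:
    print(f"Agent: {run_turn(client, 'conv_1', text)}")


def callback_surface(client, text: str, on_reply: Callable[[str], None]) -> None:
    on_reply(run_turn(client, "conv_1", text))  # e.g. wired to a Tkinter or Gradio handler
```

Swapping `terminal_surface` for `callback_surface` changes nothing in `run_turn`, which is the Demo 2 point. Against the real service, the client would be the lab's `openai.responses` and each tool result would go back as a `function_call_output` item.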
This is a principle worth internalising early: agent logic and UI logic should never be entangled. The lab enforces this separation structurally.

Demo 3: Server-Side Built-In Tools

The web search demo introduces a sharp contrast with Demo 1. With WebSearchTool, the tool-calling loop disappears entirely from client code:

    from azure.ai.projects.models import PromptAgentDefinition, WebSearchTool

    agent = project.agents.create_version(
        agent_name="Search-Agent",
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[WebSearchTool()],
            instructions="You are a research assistant...",
        ),
    )

The agent decides when to search, executes the search server-side, and returns a grounded response with citations. Your client code looks identical to Demo 0: a simple responses.create() call with no tool loop. The distinction matters architecturally:

- Function tools (Demo 1): tool execution happens on your client; you control the code, the API call, and the error handling.
- Built-in tools (Demo 3+): tool execution happens inside Foundry; you get results without managing execution.

Demo 4: Code Interpreter and the Gradio Web UI

Demo 4 attaches CodeInterpreterTool, which gives the agent a sandboxed Python execution environment inside Foundry. The agent can write code, run it, observe the output, and iterate, all server-side. Combined with a Gradio web interface, this demo shows an agent that can perform data analysis, generate charts, and explain results through a browser UI. Model Router is particularly interesting here: the empirical data shows it selects a more capable frontier model (gpt-5.4-2026-03-05) for code-generation tasks, while simpler conversational turns stay on lighter models.

Demo 5: Retrieval-Augmented Generation with FileSearchTool

Demo 5 introduces RAG.
The setup phase uploads a document, creates a vector store, and attaches it to the agent:

    from azure.ai.projects.models import FileSearchTool, PromptAgentDefinition

    # Upload the document and create a vector store
    vector_store = openai.vector_stores.create(name="employee-handbook-store")
    with open("data/employee-handbook.md", "rb") as f:
        openai.vector_stores.files.upload_and_poll(vector_store_id=vector_store.id, file=f)

    # Attach the vector store to the agent
    agent = project.agents.create_version(
        agent_name="RAG-Agent",
        definition=PromptAgentDefinition(
            model=MODEL_DEPLOYMENT,
            tools=[FileSearchTool(vector_store_ids=[vector_store.id])],
            instructions="Answer questions using only the provided documents...",
        ),
    )

At query time, the agent embeds the question, searches the vector store semantically, retrieves matching chunks, and generates an answer grounded in the retrieved content, entirely server-side. The client code remains a plain responses.create() call.

An important detail: the vector store ID is written to a .vector_store_id file during setup and read back during the chat session, so the demo survives process restarts without re-uploading the document. The .gitignore excludes this file from source control.

Demo 6: Model Context Protocol

Demo 6 connects the agent to a GitHub MCP server, giving it access to repository and issue data via the open Model Context Protocol standard. MCP servers expose tools over a standardised wire protocol; the agent discovers and calls them without any client-side function declarations.

The demo also demonstrates human-in-the-loop approval: before any MCP tool call executes, the agent surfaces the proposed action and waits for the user to confirm. This is an important safety pattern for agents that can trigger side effects on external systems.

Demo 7: Toolbox and Centralised Tool Governance

Where Demo 6 connects to a single MCP server directly, Demo 7 uses a Toolbox: a managed Microsoft Foundry resource that bundles multiple tools into a single, versioned, MCP-compatible endpoint.
The Toolbox in this demo exposes both GitHub Issues and GitHub Repos tools, curated into an immutable versioned snapshot. This pattern is significant for production multi-agent systems:

- Centralised governance: one team owns the tool definitions; all agents consume them via a single endpoint.
- Versioned snapshots: promoting a new Toolbox version is explicit; agents pin to a version and upgrade intentionally.
- MCP compatibility: any MCP-capable agent or framework can connect, not just Foundry SDK agents.

    from azure.ai.projects.models import MCPTool

    # TOOLBOX_ENDPOINT and token are obtained elsewhere in the demo
    toolbox_tool = MCPTool(
        server_label="toolbox",
        server_url=TOOLBOX_ENDPOINT,
        allowed_tools=[],  # empty = all tools in the Toolbox version
        headers={"Authorization": f"Bearer {token}"},
    )

Demo 8: Self-Hosted Agent with the Responses Protocol

The final demo departs from the prompt-agent pattern. Instead of registering a declarative agent in Foundry, Demo 8 implements a custom agent server using the Responses protocol. The server exposes a streaming HTTP endpoint; Foundry's Agent Inspector can connect to it and route user turns to it just as it would to a hosted prompt agent.

This demo includes a Dockerfile and an agent.yaml, enabling deployment to Foundry's container hosting service. It uses gpt-4.1-mini directly rather than the model router, because the custom server owns the entire inference path.

When to consider this pattern:

- Your agent requires custom pre- or post-processing logic that cannot be expressed in a system prompt.
- You need to integrate with infrastructure that is not reachable through MCP or built-in tools.
- You want to own the inference call for cost control, A/B testing, or compliance reasons.
- You are building a multi-agent orchestrator that needs to expose itself as an agent to other orchestrators.

Getting Started

The lab requires Python 3.10 or higher, an Azure subscription with a Microsoft Foundry project, and the Azure CLI.

1. Clone and set up the virtual environment

    git clone https://github.com/microsoft-foundry/Foundry-Agent-Lab.git
    cd Foundry-Agent-Lab

    # Create and activate the virtual environment
    python -m venv .venv
    # Windows Command Prompt
    .venv\Scripts\activate.bat
    # Windows PowerShell
    .venv\Scripts\Activate.ps1
    # macOS / Linux
    source .venv/bin/activate

    pip install -r requirements.txt

2. Configure a demo

    copy hello-demo\.env.sample hello-demo\.env
    # Edit hello-demo\.env and set PROJECT_ENDPOINT

Your PROJECT_ENDPOINT is on the Overview page of your Foundry project. It takes the form https://your-resource.services.ai.azure.com/api/projects/your-project.

3. Run the demo

    az login
    0-hello-demo

Each numbered batch file at the root activates the virtual environment, runs create_agent.py, and launches chat.py. Append log to capture the full session transcript:

    0-hello-demo log

Reset between runs

    hello-demo\reset.bat

Every demo includes a reset.bat that deletes the registered agent and any associated resources (vector stores, uploaded files), so demos are fully repeatable.

Architecture Principles Demonstrated

Across the nine demos, the lab illustrates a set of design principles that apply directly to production agent systems.

Keyless authentication throughout
Every demo uses DefaultAzureCredential. No API keys appear anywhere in the code. Locally, az login provides credentials. In production, managed identity takes over automatically: same code, no secrets to rotate.

Server-side conversation state
The Responses API stores conversation history server-side. Your application passes a conversation ID; Foundry maintains the thread. This eliminates the common bug of truncating history due to local list management and makes multi-process or multi-instance deployments straightforward.

Client-side vs server-side tool execution
The lab makes the distinction explicit. Function tools execute in your process: you control the code, the external call, and the error handling.
Built-in tools (WebSearch, CodeInterpreter, FileSearch) execute inside Foundry; you get results without managing execution infrastructure. MCP tools (Demos 6 and 7) fall between these: they execute in a separately deployed server, with the protocol mediating the call.

Progressive tool introduction
Each demo's create_agent.py registers the agent once. The chat.py file handles the conversation loop. These two responsibilities are always separate, making it easy to update agent definitions without modifying conversation logic, and vice versa.

Security Considerations

When building agents for production, keep the following in mind:

- Never commit .env files. The .gitignore excludes them, but verify this before pushing. Use Azure Key Vault or environment variable injection in CI/CD pipelines.
- Use managed identity in production. DefaultAzureCredential automatically picks up managed identity when deployed to Azure, eliminating the need for any stored credentials.
- Apply human-in-the-loop for side-effecting tools. Demo 6 demonstrates this pattern for MCP tool calls. Any agent that can modify external state (create issues, send emails, write files) should surface proposed actions for confirmation.
- Validate tool outputs before use. Treat data returned by external tools (weather APIs, search results, document retrieval) as untrusted input. Prompt injection through tool results is a real attack surface; grounding instructions in your system prompt reduce but do not eliminate this risk.
- Scope Toolbox permissions narrowly. When using a Toolbox (Demo 7), use allowed_tools to restrict which tools the agent can call, rather than granting access to all tools in a Toolbox version.

Key Takeaways

- Start with the minimum. A prompt agent with no tools requires fewer than 30 lines of code using the Foundry SDK. Add tools only when the use case demands them.
- Use model-router unless you have a specific reason not to.
The empirical data in the lab shows the router selects appropriate models across all task types: factual, creative, tool-calling, RAG, and code generation.
- Understand the client/server tool boundary. Function tools give you control; built-in tools give you simplicity. MCP and Toolbox give you governance and interoperability. Choose based on where you need control and where you need scale.
- Conversation state belongs on the server. Do not maintain conversation history in application memory if you can avoid it. The Responses API conversation object is designed for this.
- The hosted-demo pattern is for when you need to own the inference path. For most use cases, a declarative prompt agent is sufficient and far simpler to operate.

Next Steps

- Explore the repo: github.com/microsoft-foundry/Foundry-Agent-Lab
- Microsoft Foundry SDK documentation: learn.microsoft.com/azure/ai-studio/
- Responses API quickstart: Prompt agent quickstart
- Model Router conceptual documentation: Model Router for Microsoft Foundry
- Model Context Protocol: modelcontextprotocol.io
- Azure Identity SDK (DefaultAzureCredential): azure-identity Python SDK

The Foundry Agent Lab is open source under the MIT licence. Contributions, bug reports, and feature requests are welcome through GitHub Issues. See CONTRIBUTING.md for guidelines.

Agents League: The Esports-Inspired Hackathon Where AI Agents Battle for Glory
Ready to put your AI skills to the ultimate test? Agents League is here: a dynamic, esports-inspired developer challenge that brings the thrill of live competition to the world of agentic AI. Whether you're a seasoned AI developer or just getting started, this is your chance to build, compete, and win.

What is Agents League?

Agents League is a hackathon running June 4–14, 2026, as part of AI Skills Fest. Unlike traditional hackathons, Agents League combines live AI coding battles, asynchronous project submissions, and a thriving Discord community, with participants competing for a total prize pool of $55,000 USD. This isn't just about building; it's about showcasing what's possible with agentic AI in a format that's fast, competitive, and globally accessible.

Three Challenge Tracks: Pick One or Compete in All

1. Creative Apps
Build innovative applications using GitHub Copilot for AI-assisted development. Show off your creativity and demonstrate how AI can accelerate app creation from concept to code.

2. Reasoning Agents
Create intelligent agents using Microsoft Foundry that solve complex problems through multi-step reasoning. This track is all about building agents that can think, plan, and execute.

3. Enterprise Agents
Build business-ready knowledge agents integrated with Microsoft 365 Copilot, authored in Copilot Studio. Perfect for developers focused on real-world enterprise solutions.

Live Microsoft Reactor Events: Don't Miss the Battles!

The heart of Agents League beats through live Microsoft Reactor events.
Watch experts go head-to-head in live coding battles, learn cutting-edge techniques, and get inspired for your own submissions:

| Event | What you'll learn |
|---|---|
| Creative Apps Battle | See GitHub Copilot in action as experts build innovative apps live |
| Reasoning Agents Battle | Watch multi-step reasoning agents come to life with Microsoft Foundry |
| Enterprise Agents Battle | Learn to build M365-integrated agents with Copilot Studio |

👉 View the full event series

Key Dates

- Registration Deadline: June 12, 2026, 12:00 PM PT
- Hacking Period: June 4–14, 2026
- Submission Deadline: June 14, 2026, 11:59 PM PT

What You Get

- Live coding battles with expert demonstrations
- Curated technical experiences and on-demand content
- Learning resources on Microsoft Learn and AI Skills Navigator
- Community support through Discord
- GitHub-based submissions for transparent, collaborative judging

Why Participate?

Agents League isn't just another hackathon. It's designed as a streamlined, competitive format that:

✅ Fits into your schedule with focused, time-boxed challenges
✅ Provides real-world product innovation experience
✅ Offers global accessibility: participate from anywhere
✅ Demonstrates the latest capabilities of agentic AI, including new IQ tools
✅ Connects you with a passionate developer community

Ready to Enter the Arena?

Register Now for Agents League

Before you register:

- Review the Hackathon Rules and Regulations for prize categories and judging criteria
- Join the Microsoft Reactor event series for live battles and learning
- Check out the Microsoft Event Code of Conduct

Join the Conversation

Have questions? Want to connect with fellow competitors? Join the Agents League community on Discord and start strategizing with developers from around the world. Whether you're building creative apps, reasoning agents, or enterprise solutions, the arena awaits. May the best agent win! 🏆

Agents League hackathon is open to the public and offered at no cost.
Government employees should check with their employers to ensure participation is permitted in accordance with applicable policies.

Related Links:

- Agents League Hackathon Registration
- Microsoft Reactor Series
- AI Skills Fest

How to Visualize Your Azure AI Workloads Usage for Observability
This article assumes you already have a Microsoft Foundry project and resource deployed. The options referenced here are documented in detail in the linked articles; this post serves as a consolidated, step-by-step guide that brings them together and explains where each option is most useful.

Summary:

| Need | Best option |
|---|---|
| Quick day-over-day visual, minimal setup | Grafana Dashboard (Option 3) |
| Custom growth % calculations | App Insights + KQL in Log Analytics (Option 4) |
| Shareable, interactive report | Azure Workbooks (Option 5) |
| Per-user/per-agent granularity | APIM + App Insights (Option 6) |
| Quick one-off chart, export to Excel | Microsoft Foundry Monitor tab or Azure Monitor Metrics Explorer (Options 1 and 2) |

Option 1: Within the Microsoft Foundry Portal (Quickest, No Setup)

If you have models deployed in Microsoft Foundry and would like to monitor their usage, go to the new Foundry portal → Build → Models → Monitor tab. There you can view metrics such as:

- Estimated cost
- Total token usage
- Input vs. output tokens
- Number of requests

This is the simplest way to monitor both model and agent usage.

For PAYG plans: you can also view your total allocated quota (and figure out which tier you are on) on the Quota Management screen (new Foundry portal → Operate → Quota tab). This screen shows your total allocated quota per model for a given subscription + region + deployment type (Global, Data Zone, or Regional). For example, in the image below, for gpt-4o I am allocated 7M TPM in total in my subscription, but my deployment is assigned only 150K TPM of that 7M. This means my requests will be throttled once they exceed 150K TPM. To avoid throttling, I would need to increase the TPM assigned to that deployment (up to the 7M quota).

NOTE: You are charged for usage, so if you allow more capacity, you use more, and you pay more.

Option 2: Azure Monitor Metrics Explorer

This is already built into the Azure portal and gives you time-series charts out of the box.
1. Go to Azure portal → your Azure OpenAI / Foundry resource → Monitoring → Metrics.
2. Select a metric like AzureOpenAIRequests or TokenTransaction.
3. Set Aggregation to Sum (total) or Max, and Time granularity to 1 day.
4. Split by ModelDeploymentName to see per-model trends.
5. Adjust the time range (e.g., last 30 days); you'll see day-over-day bars/lines.

Tip: You can pin these charts to an Azure Dashboard for a persistent view, or click Share → Download to Excel to get the raw data for your own analysis.

Option 3: Azure Managed Grafana (Best Pre-Built Dashboard)

This is the best option for a polished, real-time, day-over-day dashboard with no custom code. There's a pre-built AI Foundry dashboard ready to import. [grafana.com], [Create a M...ed Grafana]

How to set it up:

1. Create an Azure Managed Grafana workspace (if you don't have one).
2. In Grafana, go to Dashboards → New → Import → enter dashboard ID 24039 (for Foundry).
3. Select your Azure Monitor data source and point it to your Foundry resource.

Tip: You can also import this directly from the Azure portal: Monitor → Dashboards with Grafana → AI Foundry.

That's it. The dashboard gives you (per model deployment):

- Token trends over time (inference, prompt, completion; day over day)
- Request trends over time (AzureOpenAIRequests as a time series)
- Latency trends (bonus)

NOTE: The default time range is 7 days; adjust it to 30/60/90 days for growth trends.

Option 4: Application Insights + KQL Queries (Most Flexible, Custom Reports)

If you want fully custom day-over-day growth calculations (e.g., % change day-to-day), this is the way. [azurefeeds.com]

Setup:

1. Ensure your Foundry project is connected to an Application Insights resource (Foundry → Settings → Connected Resources).
2. Open the App Insights resource → Logs → New Query, or choose a sample query. In the images below, we simply ran 'requests' and set the time range to 24 hours.
There is also a Kusto Query Language (KQL) mode and a Simple mode on the right-hand side:

- Simple mode lets you run out-of-the-box samples.
- KQL mode opens a query window where you can enter custom queries.

Below are the results in grid view, followed by the same view shown as a chart.

Another way to get the above graphs is via Log Analytics. Enable Diagnostic Settings on your Azure OpenAI resource and send them to a Log Analytics workspace. Then open Log Analytics → Logs and try out your sample queries.

Sample KQL for day-over-day token usage (adjust to your needs):

    AzureMetrics
    | where MetricName in ("TokenTransaction", "ProcessedPromptTokens", "GeneratedTokens")
    | where TimeGenerated > ago(30d)
    | summarize DailyTokens = sum(Total) by bin(TimeGenerated, 1d), MetricName
    | order by TimeGenerated asc
    | render timechart

Sample KQL for day-over-day growth % (adjust to your needs):

    AzureMetrics
    | where MetricName == "TokenTransaction"
    | where TimeGenerated > ago(30d)
    | summarize DailyTokens = sum(Total) by Day = bin(TimeGenerated, 1d)
    // prev() needs sorted rows; the first day has no previous value, so its GrowthPct is empty
    | sort by Day asc
    | extend PrevDay = prev(DailyTokens)
    | extend GrowthPct = round((DailyTokens - PrevDay) / PrevDay * 100, 2)
    | project Day, DailyTokens, GrowthPct

Option 5: Azure Monitor Workbooks (Custom Dashboards, Shareable)

Workbooks let you build interactive, parameterized dashboards that combine metrics and KQL logs. What's more, you can select resources from multiple subscriptions and visualize them all in one place.

1. Go to Azure portal → Monitor → Workbooks → New.
2. Add a query panel → select your Log Analytics, App Insights, or Foundry resource → enter the same query you used in Option 4.
3. Do a test run and view the results (as charts or as a list/grid view).
4. Save and share with your team.

Option 6: APIM + Application Insights (Granular Per-Caller/Per-Agent Tracking)
If your app routes requests through Azure API Management, you can use the azure-openai-emit-token-metric policy to send per-request token metrics to Application Insights with custom dimensions (User ID, Subscription ID, Agent, etc.). [Azure API...osoft Docs]

This is ideal for scenarios like:

- "Which agent consumed the most tokens last week?"
- "What's the token usage per API consumer/team?"

NOTE: Microsoft Foundry resources do not track usage by user. Fronting your Foundry resource with APIM can therefore be a way to track users, provided you pass the username/ID in the request context. How you implement this is up to your app design.

Ref: AI-Gateway/labs/token-metrics-emitting/token-metrics-emitting.ipynb at main · Azure-Samples/AI-Gateway · GitHub

Bonus: Check out other APIM + AI related policies here:

- AI-Gateway/labs/semantic-caching at main · Azure-Samples/AI-Gateway
- AI-Gateway/labs/token-rate-limiting at main · Azure-Samples/AI-Gateway
- AI-Gateway/labs/token-metrics-emitting/token-metrics-emitting.ipynb at main · Azure-Samples/AI-Gateway · GitHub

Microsoft Foundry Toolkit for VS Code is Now Generally Available
We are thrilled to announce that the Microsoft Foundry Toolkit for VS Code, formerly AI Toolkit, is now Generally Available (GA)! From first model prompt to production-grade AI agents, Foundry Toolkit lets you build, debug, and ship AI end to end without ever leaving VS Code.

Same Product. New Name.

You may know this extension as AI Toolkit, and we thank you for using it over the past year and for the continuous feedback that has shaped the product. With this GA release, we're rebranding AI Toolkit to Microsoft Foundry Toolkit. The new name reflects where we're headed: a single, unified developer experience for building AI apps and agents on the Microsoft AI platform. Rest assured, this is a name change only: there are no plans to remove or deprecate any existing features.

Empower AI Development from Idea to Production with Foundry Toolkit

The GA release brings together the most requested features into a high-performance workflow:

🧪 Curated Model Playground: Don't waste time with setup. Browse and chat with more than 100 state-of-the-art models from Microsoft Foundry, GitHub, OpenAI, Anthropic, Ollama, and more. Compare performance side by side and export production-ready code in seconds.
🤖 Agent Builder (no-code/low-code): Experiment with agent ideas or build sophisticated agents without writing boilerplate code. Define instructions, link tools from the Foundry catalog, or connect local MCP (Model Context Protocol) servers to have a functional agent running in minutes.
✨ GitHub Copilot-powered agent development: With Foundry tools and skills built into the Toolkit, GitHub Copilot is equipped with deep context to jumpstart agent creation using the Microsoft Agent Framework, often from a single prompt.
🛠️ Deep-Cycle Debugging: Move beyond black-box AI. The Agent Inspector provides real-time workflow visualization, breakpoints, and full local tracing across tool calls and agent chains.
⚡ Edge-Optimized Performance: Specialized support for the Phi model family.
Fine-tune Phi Silica on your data, quantize for NPU/GPU targets, and profile on-device performance to ensure your models run lean and fast.
🚀 Seamless Scale: Transition from local to cloud with one click. Deploy directly to the Microsoft Foundry Agent Service and run continuous evaluations using familiar pytest syntax within the VS Code Test Explorer.

Get Started Today

- Install: Microsoft Foundry Toolkit on the VS Code Marketplace.
- Quick Start: Follow our 3-Minute Getting Started Tutorial to build your first AI agent.
- Deep Dive: Check out the documentation, samples, and workshop.

Join the Community

Join us at the Model Monday event on 4/20, where we will talk through building Foundry agents using VS Code and GitHub Copilot. We can't wait to see what you build. Share your projects, file issues, or suggest features on our GitHub repository. Welcome to the next chapter of AI development!

Six Coding Agents, One Production System: A Field Guide to AgenticOps with AKS-Lab-GitHubCopilot
The shift: from "AI helps me code" to "AI authors my repo"

For two years we've been talking about GitHub Copilot as an inline pair programmer: a clever autocomplete that lives in your editor. That framing is officially out of date. The new reality is agentic delivery: a team of named, scoped AI agents owns slices of your repository, each with its own tools, skills, and refusal rules. They produce pull requests. They run tests. They roll out deployments. And when one finishes its turn, it hands off to the next.

Across the five labs of microsoft/AKS-Lab-GitHubCopilot, you ship ZavaShop, a multi-agent retail supply-chain control plane running on AKS + Azure Container Apps, and along the way you internalize an operating model you can carry to any project. Everything in the repo (specs, agents, MCP servers, tests, Bicep, Helm, GitHub Actions) is authored by six GitHub Copilot Custom Coding Agents working from your IDE, plus the remote GitHub Copilot Coding Agent that closes the PR loop on GitHub. This is what AgenticOps looks like in practice.

Two layers of agents: don't confuse them

The first cognitive hurdle in this lab is keeping two very different agent populations straight:

| Layer | What it is | Where it lives | Examples |
|---|---|---|---|
| Application agents | The product you ship: the runtime ZavaShop fleet that solves a business problem | Production (AKS + ACA) | InventoryAgent, SupplierAgent, LogisticsAgent, PricingAgent, OrchestratorAgent |
| Coding agents | The dev-time team that writes the application agents | Your IDE + GitHub | requirements-analyst, mcp-builder, agent-builder, orchestrator-architect, test-author, deploy-engineer |

Both are built with the Microsoft Agent Framework (MAF). Both use the GitHub Copilot SDK as their model provider. But they exist at different layers of the development lifecycle, and the entire lab is structured around that distinction. If you only remember one thing from this post: the coding agents are how you build the application agents.
That is the whole AgenticOps loop, compressed into one sentence.
GitHub Copilot Coding Agent vs. Custom Coding Agents
There are two flavors of "coding agent" in the GitHub Copilot ecosystem, and this lab uses both.
1. The remote GitHub Copilot Coding Agent
This is the GitHub-side, asynchronous, PR-driven agent. You assign it an issue, it spins up a sandboxed environment, writes the code, runs the tests, and opens a PR for human review. You don't watch it work — you review what it produces. In ZavaShop, Lab 04 (Testing) explicitly uses this agent: you take a failing eval scenario, file it as an issue, assign it to Copilot, and the agent comes back with a PR. Your job is to hold the quality bar, not to type the keystrokes. Important governance choice from AGENTS.md: the remote Coding Agent is allowed to open PRs against src/ and tests/ only — never against infra/ without human review. That single rule is a textbook example of agent-aware policy.
2. The local Custom Coding Agents
These are scoped, in-IDE specialist agents that you select by name (<agent name>) in Copilot Chat. They live as *.agent.md files inside .github/agents/ and are discovered by VS Code on reload. Each one owns exactly one slice of the repository. Six of them ship in this lab:
Phase | Agent | Owns | Refusal rule
Requirements | requirements-analyst | specs/*.md | Refuses to write code
MCP tools | mcp-builder | src/mcp_servers/* | One server per turn
Specialist agents | agent-builder | src/agents/<specialist>/* | One specialist per turn
Orchestration | orchestrator-architect | src/agents/orchestrator/*, src/shared/*, docker-compose.yml | Owns wiring, not business logic
Tests | test-author | tests/** | Never edits src/
Deploy | deploy-engineer | infra/**, .github/workflows/** | Won't touch application code
The pattern that matters here isn't just "we made some custom agents." It's that every agent declares what it owns and what it refuses to do. That refusal envelope is what makes the system safe to delegate to. Without it, you'd just have a noisier autocomplete.
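The ownership and refusal envelope is easy to reason about as a path check. The sketch below is hypothetical and not code from the lab: it encodes several rows of the ownership table as fnmatch-style globs (where * also matches /), lets refusal rules win over ownership, and denies unknown agents outright. The REMOTE_AGENT entry approximates the AGENTS.md rule for the remote Coding Agent; the POLICIES layout and the may_edit name are assumptions, and non-path rules such as "one server per turn" are not modeled.

```python
from fnmatch import fnmatchcase

# Hypothetical encoding of the ownership table. fnmatch's "*" also matches "/",
# so "tests/*" covers every file below tests/.
POLICIES: dict[str, dict[str, list[str]]] = {
    "requirements-analyst": {"owns": ["specs/*.md"], "refuses": ["src/*", "tests/*", "infra/*"]},
    "mcp-builder": {"owns": ["src/mcp_servers/*"], "refuses": []},
    "agent-builder": {"owns": ["src/agents/*"], "refuses": ["src/agents/orchestrator/*"]},
    "orchestrator-architect": {
        "owns": ["src/agents/orchestrator/*", "src/shared/*", "docker-compose.yml"],
        "refuses": [],
    },
    "test-author": {"owns": ["tests/*"], "refuses": ["src/*"]},
    "deploy-engineer": {"owns": ["infra/*", ".github/workflows/*"], "refuses": ["src/*"]},
    # Remote GitHub Copilot Coding Agent: src/ and tests/ only, never infra/.
    "REMOTE_AGENT": {"owns": ["src/*", "tests/*"], "refuses": ["infra/*"]},
}


def may_edit(agent: str, path: str) -> bool:
    """Return True if `path` is inside the agent's owned globs and outside its refusal globs."""
    policy = POLICIES.get(agent)
    if policy is None:
        return False  # Unknown agents get no write access at all.
    if any(fnmatchcase(path, pattern) for pattern in policy["refuses"]):
        return False  # Refusal rules win over ownership.
    return any(fnmatchcase(path, pattern) for pattern in policy["owns"])


if __name__ == "__main__":
    print(may_edit("test-author", "tests/unit/test_pricing.py"))         # True
    print(may_edit("test-author", "src/agents/pricing/agent.py"))        # False
    print(may_edit("agent-builder", "src/agents/orchestrator/flow.py"))  # False
```

Checking refusals first is the design choice that matters: an overlapping ownership glob (agent-builder's src/agents/* also covers the orchestrator folder) can never override an explicit "no".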
Three workflow prompts in .github/prompts/ chain the agents together so you don't have to remember the sequence:
/feature-from-issue — issue → spec → code → tests → PR → deploy
/spec-to-code — drive an existing spec through code + tests
/ship-it — quality gate → build → push → ACR/ACA/AKS rollout → smoke + evals
This is the closest thing I've seen to a programmable software development lifecycle.
Where AgenticOps fits in
DevOps gave us repeatable infrastructure. MLOps gave us repeatable model lifecycles. AgenticOps is what you need when the thing you're operating is itself a fleet of autonomous agents — both at build time and at runtime. The lab makes the four pillars of AgenticOps concrete:
Specs as the contract. /requirements-analyst produces specs/<slug>.md files with goals, contracts, and eval scenarios. Nothing else in the repo is built until that spec exists. Specs are the source of truth that human reviewers actually read.
Skills as living documentation. .github/skills/<skill>/SKILL.md files hold shared, agent-agnostic knowledge — Python conventions, Kubernetes patterns, MAF idioms. Every coding agent declares which skills it must consult before writing code. This is how you stop drift: knowledge lives in one place and is pulled in on demand.
Evals as the quality gate. The repo runs a four-layer test pyramid plus five golden eval scenarios (S1–S5). uv run poe check runs locally and in GitHub Actions. Copilot-authored PRs must pass the same bar a human does — no exceptions.
Observability tied to agent identity. Every agent emits agent.name, agent.run_id, and agent.span_id through structlog. When something misbehaves in production, you can trace the line from "this evaluation failed" all the way back to "this version of this agent, on this run, called this tool with these arguments."
These four pillars aren't ZavaShop-specific. They're the contract for any AgenticOps system: scoped ownership, contracts as code, evals as gates, identity in every span.
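The lab emits agent identity through structlog. To show the same idea without extra dependencies, here is a standard-library-only sketch (not the lab's code): a LoggerAdapter binds agent.name, agent.run_id, and agent.span_id to every record, and a formatter writes one JSON object per line. The make_agent_logger helper is hypothetical, and as a simplification it mints one span id per logger; a real setup would take the span id from the active OpenTelemetry span.

```python
import io
import json
import logging
import uuid

IDENTITY_KEYS = ("agent.name", "agent.run_id", "agent.span_id")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line that carries the agent identity fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage(), "level": record.levelname}
        for key in IDENTITY_KEYS:
            # Dotted keys are not valid attribute names, so read them with getattr().
            payload[key] = getattr(record, key, None)
        return json.dumps(payload)


class AgentLogger(logging.LoggerAdapter):
    """Merge the bound agent identity into the `extra` of every logging call."""

    def process(self, msg, kwargs):
        kwargs["extra"] = {**self.extra, **(kwargs.get("extra") or {})}
        return msg, kwargs


def make_agent_logger(name: str, run_id: str, stream) -> AgentLogger:
    """Hypothetical helper: a JSON logger bound to one agent run."""
    base = logging.getLogger(f"agents.{name}")
    base.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    base.handlers = [handler]  # Replace any previous handler for this agent.
    base.propagate = False
    identity = {
        "agent.name": name,
        "agent.run_id": run_id,
        "agent.span_id": uuid.uuid4().hex[:16],  # Simplification: one span id per logger.
    }
    return AgentLogger(base, identity)


if __name__ == "__main__":
    buffer = io.StringIO()
    log = make_agent_logger("InventoryAgent", "run-42", buffer)
    log.info("tool_called")
    print(buffer.getvalue().strip())
```

Because the identity is bound once, no call site can forget it, which is what makes "this agent, on this run" queryable later.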
Walking through the workshop: which agent does what, when
The five labs are five chapters of one story — ZavaShop going from an empty Azure subscription to a live retail control plane. Each lab activates a different subset of coding agents.
Lab 01 — Environment Setup (no coding agents yet)
You provision the platform: AKS cluster, ACA environment, Azure Container Registry, Key Vault, and the Workload Identity that every agent will wear. Then you install the six Custom Coding Agents into your IDE. Think of this as hiring the development team and giving them their badges.
Lab 02 — Agent Creation (four agents in play)
This is where it clicks. You start by invoking requirements-analyst in Copilot Chat to produce the spec for each ZavaShop application agent. Then mcp-builder is invoked four times to scaffold the four MCP servers — one per domain (inventory DB, supplier API, shipping API, pricing API). Then agent-builder runs four more times to build the typed ChatAgent specialists. Finally, orchestrator-architect wires them together with a MAF Workflow. What's stunning about this lab is the handoff discipline. Every coding agent ends its turn with a line naming the next agent to invoke. You're not orchestrating the work — the agents are.
Lab 03 — Multi-Agent Orchestration & Config (two agents)
The orchestrator stops being a one-shot LLM call and becomes a deterministic Workflow. Secrets move from .env to Key Vault. The whole fleet boots locally with Docker Compose. This is orchestrator-architect's star turn — wiring A2A endpoints, MCP tool registration, Key Vault hydration, OpenTelemetry. Specs come from requirements-analyst; the rest is orchestration.
Lab 04 — Testing (both coding agent flavors)
/test-author writes the four-layer pyramid (unit, MCP contract, integration, eval). Then you switch gears: take a failing eval scenario, file it as a GitHub issue, and assign it to the remote GitHub Copilot Coding Agent.
The agent works asynchronously, opens a PR, and uv run poe check decides whether it passes. This is the lab where the local-vs-remote distinction stops being abstract and starts being operational.
Lab 05 — Deployment & Run (deployment specialist)
/deploy-engineer writes the Helm chart for the AKS orchestrator and the Bicep modules for the ACA specialists. The /ship-it workflow prompt then runs the full pipeline: quality gate → ACR build → ACA deploy → AKS rollout → smoke tests → evals. GitHub Actions, authenticating to Azure via OIDC, re-runs the same pipeline on every push to main.
Notice the pattern across all five labs: at no point does a human write production code from scratch. Humans set goals, review specs, approve PRs, and run quality gates. The keystrokes belong to agents.
How Coding Agents transform the DevOps pipeline
Take a step back from the lab and ask: what actually changes in your DevOps flow when you adopt this model?
The atomic unit of work shifts. In classic DevOps the unit is the commit. In AgenticOps the unit is the spec. A spec drives one or more agents; agents produce commits; commits trigger CI; CI gates promotion. The commit becomes a derived artifact, not the starting point.
Code review changes shape. You're no longer reviewing "did this human understand the codebase?" — you're reviewing "did this agent follow its refusal rules, consult its skills, and produce something that passes the evals?" Reviewers spend less time on style and more time on intent. The diff is often less interesting than the spec it came from.
Governance becomes structural, not procedural. Instead of writing a wiki page that says "don't touch infra without review," you encode that rule in AGENTS.md and keep infra paths out of the agent's tool set. Policy becomes part of the agent definition, not a checklist humans hopefully remember.
The CI pipeline expands. Beyond build/test/deploy, you now have an eval stage that asks "does the system still behave correctly on the golden scenarios?"
— and a Copilot-authored PR has to pass the same eval stage as a human-authored one. The pipeline is the great equalizer.
Onboarding compresses. A new engineer doesn't need to read 50 wiki pages to be productive. They read AGENTS.md, select the relevant agent, and let it walk them through the work. Institutional knowledge lives in .agent.md and SKILL.md files instead of senior engineers' heads.
The net effect is a pipeline that's faster, more uniform, and easier to audit. Faster because agents parallelize what humans serialize. More uniform because every change goes through the same six-agent template. Easier to audit because every artifact has a named author and a refusal rule it had to respect.
What to take away
The AKS-Lab-GitHubCopilot workshop teaches three things at once. The surface lesson is "how to build a multi-agent retail system on AKS." The middle lesson is "how to use GitHub Copilot Custom Agents and the remote Coding Agent." The deepest lesson — and the one I'd argue matters most — is how to design a development process where AI agents are first-class citizens with bounded responsibilities, not free-form copilots. If you take the model and walk away from the lab, three patterns are worth keeping:
Scope before capability. Don't give an agent every tool; give it the smallest surface that makes it useful.
Specs are the API between humans and agents. Invest in requirements-analyst-style flows even if the rest of your stack isn't there yet.
Evals are non-negotiable. The moment an agent can open a PR, you need a quality gate that doesn't care who the author is.
Clone the repo microsoft/AKS-Lab-GitHubCopilot, run Developer: Reload Window, select agents in Copilot Chat, and watch six teammates show up. That's the future of the DevOps pipeline — and it's already shipping.
Resources
microsoft/AKS-Lab-GitHubCopilot — The repository this post is built on.
Best practices for using Copilot to work on tasks — Governance patterns for delegating issues to Copilot.
GitHub Copilot SDK (Python) — The provider used by every agent in this lab.

How to Test AI Agents with LangSmith: A Complete Guide
Testing AI agents is crucial for ensuring reliability and accuracy in production. Evaluation is how you measure that: you score an agent's outputs against defined criteria. Common evaluation types include:
1. Task Success (Pass / Fail)
2. Instruction Adherence
3. Correctness / Accuracy
4. Relevance
5. Groundedness (Hallucination)
6. Coherence / Fluency
7. Tool‑Use Accuracy
8. Safety / Harmfulness
LangSmith provides powerful tools for creating datasets, running evaluations, and using LLM-as-judge techniques. This guide walks through the complete workflow using a practical example.
Prerequisites:
1) Create a LangSmith account.
2) Generate a LangSmith API key and store it in a .env file, loading it whenever you create datasets or run evaluations. Alternatively, set it for the current Windows command prompt session with set LANGCHAIN_API_KEY=<your_api_key_here> (no spaces around =; cmd would otherwise include them in the variable name and value).
Part 1: Creating Your Test Dataset
The foundation of any good evaluation is a quality dataset. LangSmith allows you to create datasets programmatically with input-output pairs that serve as ground truth.

from langsmith import Client


def create_evaluation_dataset():
    client = Client()

    # Create a new dataset
    dataset = client.create_dataset(
        dataset_name="Sample dataset",
        description="A sample dataset in LangSmith.",
    )

    # Define your test examples
    examples = [
        {
            "inputs": {"question": "Which country is Mount Kilimanjaro located in?"},
            "outputs": {"answer": "Mount Kilimanjaro is located in Tanzania."},
        },
        {
            "inputs": {"question": "What is Earth's lowest point?"},
            "outputs": {"answer": "Earth's lowest point is the Dead Sea."},
        },
    ]

    # Add examples to the dataset
    client.create_examples(dataset_id=dataset.id, examples=examples)
    print(f"Created dataset: {dataset.name}")
    return dataset

Best Practices for Dataset Creation
Diverse Examples: Include edge cases and various question types.
Clear Ground Truth: Ensure reference answers are accurate and complete.
Sufficient Volume: Create enough examples to get statistically meaningful results.
Consistent Format: Maintain a consistent input/output structure.
Part 2: Setting Up LLM-as-Judge Evaluation
LLM-as-judge is a powerful technique where you use a language model to evaluate the quality of another model's responses. This approach scales well and can assess subjective qualities like correctness and hallucinations.
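Before wiring up an LLM judge, it helps to see the evaluator contract with a cheap, deterministic check. The sketch below is an illustration, not part of the original walkthrough: it uses the same (inputs, outputs, reference_outputs) signature as the evaluators that follow and returns a dict with key and score, a shape LangSmith accepts as feedback. Token overlap is a crude proxy for correctness, so treat keyword_overlap_evaluator (a hypothetical name) as a baseline to run alongside the LLM judges, not a replacement for them.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def keyword_overlap_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """Score = fraction of reference-answer tokens that also appear in the agent's answer."""
    reference = _tokens(reference_outputs["answer"])
    answer = _tokens(outputs["answer"])
    score = len(reference & answer) / len(reference) if reference else 0.0
    return {"key": "keyword_overlap", "score": round(score, 2)}


if __name__ == "__main__":
    reference = {"answer": "Mount Kilimanjaro is located in Tanzania."}
    print(keyword_overlap_evaluator({}, {"answer": "Mount Kilimanjaro is located in Tanzania."}, reference))
    # {'key': 'keyword_overlap', 'score': 1.0}
    print(keyword_overlap_evaluator({}, {"answer": "Kenya."}, reference))
    # {'key': 'keyword_overlap', 'score': 0.0}
```

To use it, add keyword_overlap_evaluator to the evaluators list passed to client.evaluate. Note that a correct but terse answer such as "Tanzania" scores only 0.17 here, which is exactly the kind of nuance the LLM judge is for.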
import os

from dotenv import load_dotenv
from langsmith import Client, wrappers
from openai import AzureOpenAI
from openevals.llm import create_llm_as_judge
from openevals.prompts import CORRECTNESS_PROMPT

load_dotenv()

# Wrap your AI client for LangSmith tracing
openai_client = wrappers.wrap_openai(
    AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2025-04-01-preview",
    )
)
DEPLOYMENT_NAME = os.environ.get("AZURE_OPENAI_DEPLOYMENT", "gpt-5-mini")

Defining Your Target Function
The target function represents the AI agent you want to test:

def target(inputs: dict) -> dict:
    """The AI agent being evaluated"""
    response = openai_client.chat.completions.create(
        model=DEPLOYMENT_NAME,
        messages=[
            {"role": "system", "content": "Answer the following question accurately"},
            {"role": "user", "content": inputs["question"]},
        ],
    )
    return {"answer": response.choices[0].message.content.strip()}

Creating Custom Evaluators
1. Correctness Evaluator

def correctness_evaluator(inputs: dict, outputs: dict, reference_outputs: dict):
    """Evaluates how correct the answer is compared to the reference"""
    evaluator = create_llm_as_judge(
        prompt=CORRECTNESS_PROMPT,  # Pre-built prompt for correctness
        # The judge model is resolved through LangChain's Azure OpenAI integration,
        # which also expects OPENAI_API_VERSION to be set in the environment.
        model="azure_openai:" + DEPLOYMENT_NAME,
        feedback_key="correctness",
    )
    return evaluator(
        inputs=inputs, outputs=outputs, reference_outputs=reference_outputs
    )

2. Hallucination Evaluator

def hallucination_evaluator(inputs: dict, outputs: dict, reference_outputs: dict):
    """Detects if the answer contains unsupported claims"""
    evaluator = create_llm_as_judge(
        prompt="""You are an expert judge evaluating AI responses for hallucinations.

<question>
{inputs}
</question>

<answer>
{outputs}
</answer>

<reference_answer>
{reference_outputs}
</reference_answer>

Does the answer contain any claims or information that are not supported by the question or reference answer?
Respond with true if the answer is free of hallucinations, false if it contains hallucinated information.
You must also provide a brief explanation of your reasoning.""",
        model="azure_openai:" + DEPLOYMENT_NAME,
        feedback_key="hallucination",
    )
    return evaluator(
        inputs=inputs, outputs=outputs, reference_outputs=reference_outputs
    )

Part 3: Running the Evaluation
Execute the Complete Evaluation Pipeline

def run_evaluation():
    client = Client()

    # Run the evaluation
    experiment_results = client.evaluate(
        target,  # Function to test
        data="Sample dataset",  # Dataset name
        evaluators=[  # List of evaluators
            correctness_evaluator,
            hallucination_evaluator,
        ],
        experiment_prefix="first-eval-in-langsmith",
        max_concurrency=2,  # Control API rate limits
    )
    print("Evaluation Results:")
    print(experiment_results)
    return experiment_results


if __name__ == "__main__":
    run_evaluation()

Understanding Your Results
When the evaluation completes, you'll get detailed metrics including:
Individual Scores: Per-example results for each evaluator
Aggregate Metrics: Overall performance across the dataset
Trace Links: Deep links to view exact model interactions
Comparison Views: Side-by-side comparisons of outputs vs. references
Key Benefits of This Approach
Automated Testing: Run comprehensive evaluations without manual review
Scalable Assessment: Evaluate subjective qualities at scale
Continuous Monitoring: Track performance changes over time
Rich Analytics: Get detailed insights into failure modes

Giving the Copilot SDK Agent a "hardware-level helmet" using Kata microVM on AKS
A Moment That Made Me Pause I was recently building an Agent service with the GitHub Copilot SDK. After getting it up and running, I went back through the execution logs and something jumped out at me: In a single conversation turn, the Agent had executed a shell command, read several files, and pulled down a third-party MCP server from npm via npx — all on its own. I didn't hard-code any of that. The model decided at runtime to run those commands, read those files, and install that package. That's when it hit me: a significant chunk of the code running inside this container was written on the fly — by the model, not by me. This is fundamentally different from a traditional web service. With a regular app, every line of code is written by a human, reviewed, and tested before it reaches production. But an AI Agent? Part of its behavior is generated at runtime. You don't know in advance what it's going to execute. So the question becomes: is the container we put it in actually strong enough? How Container Isolation Actually Works (And Where It Falls Short) Let me use an analogy. Think of a traditional container as an apartment in a building. Each apartment has its own walls — namespaces and cgroups keep things separated. From the inside, it feels like you have your own place. But every apartment shares the same roof — the host Linux kernel. Most of the time, this is fine. But if someone finds a crack in the roof — a kernel vulnerability — they can climb up from their apartment, walk across the roof, and drop into any other apartment in the building. That's a container escape. For a standard web service, this risk is manageable — the code inside your container is predictable. But an AI Agent is different. The code running inside the container is inherently unpredictable — it's not an external attacker you're worried about, it's the tenant itself. 
Docker laid this out clearly in Comparing Sandboxing Approaches for AI Agents: AI Agents are a class of workload that inherently requires stronger sandboxing. The shared-kernel model of traditional containers isn't enough. So what is enough? Meet the microVM: A Private Roof for Every Apartment Sticking with the building analogy — if the problem is a shared roof, the fix is obvious: give every apartment its own roof. You still live in an apartment (container). The building is still managed the same way (Kubernetes). But the ceiling above your head is now yours alone. Even if you punch through it, you only reach your own roof — not your neighbor's. That's the core idea behind a microVM. Koyeb published a great explainer called What Is a microVM. Here's the essence: It's a virtual machine — with its own independent guest kernel, fully isolated from the host kernel. This is where the security comes from. But it's a stripped-down VM — only the bare essentials: CPU, memory, network, block storage. No USB controllers, no sound cards, no GPU passthrough. So it's fast and light — millisecond boot times, small memory footprint, close to the container experience. One line summary: microVM = VM-grade isolation + near-container-grade lightness. How Does Kubernetes Use microVMs? Enter Kata Containers Knowing microVMs are great is one thing — but Kubernetes schedules Pods and containers, not VMs. How do you bridge these two worlds? That's exactly what Kata Containers does. Their tagline nails it: "The speed of containers, the security of VMs." Kata acts as a translation layer between Kubernetes and microVMs: From Kubernetes' perspective, it's still a standard Pod — scheduled, managed, and monitored normally. Under the hood, that Pod is actually running inside a lightweight VM with its own kernel. You don't change your application code. You don't change your CI/CD pipeline. You just tell Kubernetes: "Run this Pod with Kata's RuntimeClass." Kata handles the rest. 
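Concretely, "run this Pod with Kata's RuntimeClass" is a single field on the Pod spec. The Python sketch below builds such a Pod manifest as a plain dict and prints it as JSON, which kubectl apply -f - accepts. It assumes the RuntimeClass is named kata-vm-isolation (the AKS Pod Sandboxing name) and layers in the container hardening this post describes (non-root, all capabilities dropped, read-only root filesystem, RuntimeDefault seccomp). The sandboxed_pod helper and the image reference are hypothetical placeholders, not the sample's actual deployment.yaml.

```python
import json


def sandboxed_pod(name: str, image: str, runtime_class: str = "kata-vm-isolation") -> dict:
    """Build a Pod manifest that asks Kubernetes to run it under a Kata RuntimeClass."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # The only change Kata needs: select the sandboxed runtime instead of the default.
            "runtimeClassName": runtime_class,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        # The image must define a non-root (numeric) user for this to pass.
                        "runAsNonRoot": True,
                        "allowPrivilegeEscalation": False,
                        "readOnlyRootFilesystem": True,
                        "capabilities": {"drop": ["ALL"]},
                        "seccompProfile": {"type": "RuntimeDefault"},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; replace it with the image pushed to your registry.
    manifest = sandboxed_pod("copilot-agent", "myregistry.azurecr.io/copilot-agent:latest")
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Nothing in the application image changes: the same container runs under runc or Kata, and only runtimeClassName decides whether it gets its own guest kernel.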
On AKS, Microsoft has integrated Kata out of the box under the name Pod Sandboxing. The hypervisor is Microsoft Hyper-V (not QEMU), and the RuntimeClass is called kata-vm-isolation. You create a special node pool, and AKS sets everything up automatically.
Now Let's Look at a Real Example
Enough theory — let me walk you through something concrete. I built a sample called AKS_MicroVM that does one thing: run a GitHub Copilot SDK Agent service on AKS, enforced to run inside kata-vm-isolation, a microVM sandbox. Here's the architecture:

HTTPS request comes in
└─ AKS Node Pool (KataVmIsolation enabled)
   └─ Pod (runtimeClassName: kata-vm-isolation)
      └─ Dedicated Hyper-V microVM
         │  Isolated guest kernel + seccomp + cgroup
         │  Egress restricted by NetworkPolicy
         └─ FastAPI service (Python / uvicorn)
            └─ GitHubCopilotAgent
               └─ Copilot CLI (Node.js)
                  └─ MCP servers / tools

From the outside, it's just an ordinary AKS Pod. On the inside, the app runs in its own micro virtual machine with a dedicated kernel.
Project Structure
The entire sample is just these files:

app/                    ← Agent service (Python)
  main.py               ← FastAPI endpoints
  agent.py              ← Copilot Agent wrapper
  tools.py              ← Example function tools
  requirements.txt
  Dockerfile            ← Python 3.12 + Node 20 + Copilot CLI
k8s/                    ← Kubernetes manifests
  namespace.yaml
  runtimeclass.yaml     ← Reference (AKS auto-creates this)
  secret.example.yaml   ← Token placeholder
  deployment.yaml       ← The key file: enforces kata-vm-isolation
  service.yaml
  networkpolicy.yaml    ← Locks down ingress/egress
infra/                  ← Infrastructure scripts
  01-create-aks.sh      ← Create the cluster
  02-build-push.sh      ← Build image, push to ACR
  03-deploy.sh          ← Deploy everything

Three shell scripts to set up infrastructure, six YAML files to deploy the service. That's it.
Not Just a microVM: Five Layers of Defense
I want to emphasize this: the sample doesn't just slap on a microVM and call it a day.
It stacks five layers of protection:
What you're worried about | How this layer addresses it
Malicious code escaping the container | kata-vm-isolation → dedicated microVM with its own kernel
Privilege escalation inside the container | runAsNonRoot + drop ALL caps + read-only filesystem + seccomp
Agent phoning home to unauthorized endpoints | NetworkPolicy allowlist: only Copilot/GitHub/MCP egress permitted
Token leakage | K8s Secret injection (upgradeable to Key Vault via CSI)
Model instructing the Agent to do something dangerous | on_permission_request defaults to deny; only allowlisted operations proceed
The microVM is the outermost wall — hardware-grade isolation. But inside that wall, there are still guards, access controls, and surveillance cameras. You need all of them.
Six Steps to Deploy

# ① Create an AKS cluster with Kata support
bash infra/01-create-aks.sh

# ② Verify the RuntimeClass is ready
kubectl get runtimeclass kata-vm-isolation

# ③ Build the image and push to ACR (script auto-detects your ACR)
bash infra/02-build-push.sh

# ④ Add your GitHub Copilot token
# Edit k8s/secret.example.yaml → rename to secret.yaml (don't commit it!)

# ⑤ Deploy everything
bash infra/03-deploy.sh

# ⑥ Access via API server proxy
kubectl proxy --port=8001

Then chat with the Agent:

curl -s -X POST \
  http://localhost:8001/api/v1/namespaces/copilot-agent/services/copilot-agent:80/proxy/chat \
  -H 'content-type: application/json' \
  -d '{"message":"Briefly introduce Kata Containers."}'

Want streaming output?
Use the stream endpoint:

curl -N -X POST \
  http://localhost:8001/api/v1/namespaces/copilot-agent/services/copilot-agent:80/proxy/chat/stream \
  -H 'content-type: application/json' \
  -d '{"message":"List 3 Linux kernel hardening tips","stream":true}'

How to Verify It's Actually Running in a microVM
One command:

kubectl -n copilot-agent exec deploy/copilot-agent -- uname -r

If the kernel version differs from the node's kernel, your Pod is running in its own guest kernel, not sharing the host's. Proof done.
Gotchas I Hit So You Don't Have To
kubectl port-forward doesn't work with Kata Pods. This is the easiest trap to fall into. The app listener runs inside the microVM, but port-forward connects to the empty sandbox netns on the host, so you'll get connection refused. Use kubectl proxy instead.
Token environment variable names. The Copilot CLI expects GH_TOKEN or GITHUB_TOKEN, not a custom name. The Deployment already injects both from the same Secret.
Read-only filesystem needs emptyDir mounts. The container runs with readOnlyRootFilesystem: true, but the Copilot CLI needs to write to /home/agent/.cache at startup. The Deployment mounts emptyDir volumes at .cache, .copilot, and /tmp; miss one and the CLI won't start.
Keep on_permission_request on deny-by-default. The Agent's tool calls go through a permission gate that defaults to deny, with an allowlist for approved operations. Don't switch this to approve-all in production — ever.
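The deny-by-default idea does not depend on any particular SDK. The sketch below deliberately does not reproduce the Copilot SDK's on_permission_request signature (which this post doesn't show); it is a hypothetical, framework-neutral gate of the kind such a handler could call. A request is described by a kind and a target, file paths are normalized so ".." tricks can't escape an allowed directory, and anything without an explicit allowlist match, including request kinds nobody anticipated, is denied. PermissionRequest, ALLOWLIST, and is_permitted are illustrative names.

```python
import posixpath
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class PermissionRequest:
    """Hypothetical, SDK-neutral description of something the agent wants to do."""

    kind: str    # e.g. "shell", "read", "write", "mcp"
    target: str  # a command line, a file path, or an MCP server name


# Explicit (kind, glob) allowlist. Anything not listed here is denied.
ALLOWLIST: list[tuple[str, str]] = [
    ("read", "/app/*"),
    ("read", "/tmp/*"),
    ("mcp", "github"),
]


def is_permitted(request: PermissionRequest) -> bool:
    """Deny by default: approve only requests whose kind and target match an allowlist entry."""
    target = request.target
    if request.kind in ("read", "write"):
        # Collapse "..", so "/app/../etc/passwd" is checked as "/etc/passwd".
        target = posixpath.normpath(target)
    return any(
        request.kind == kind and fnmatchcase(target, pattern)
        for kind, pattern in ALLOWLIST
    )


if __name__ == "__main__":
    print(is_permitted(PermissionRequest("read", "/app/main.py")))         # True
    print(is_permitted(PermissionRequest("read", "/app/../etc/passwd")))   # False
    print(is_permitted(PermissionRequest("shell", "curl http://x.test")))  # False
```

Shell commands are intentionally absent from the allowlist: glob-matching command lines (say, "ls *") would also approve "ls; rm -rf /", so shell access needs a stricter, exact-match policy if it is allowed at all.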
Wrapping Up: The Thread That Ties It All Together Let me trace the logic one more time: ① Scenario: AI Agents inherently run model-generated, untrusted code inside containers ② Problem: Traditional containers share the host kernel — one escape compromises the entire node ③ Insight: We need hardware-grade isolation, stronger than namespaces alone ④ Solution: microVMs — a dedicated guest kernel for every Pod ⑤ Integration: Kata Containers brings microVM support to Kubernetes natively; AKS Pod Sandboxing makes it turnkey ⑥ Practice: The AKS_MicroVM sample — six steps to deploy, five layers of defense In the age of AI Agents, a container isn't just a box for your application — it's a box for uncertainty. It needs a stronger shell. The microVM is that shell. Full source code: https://github.com/kinfey/Multi-AI-Agents-Cloud-Native/tree/main/code/AKS_MicroVM Further reading: What is a microVM? — Koyeb Comparing Sandboxing Approaches for AI Agents — Docker Kata Containers

Building a Scalable Contract Data Extraction Pipeline with Microsoft Foundry and Python
Architecture Overview
[Architecture diagram: Blob Storage triggers an Azure Function, which calls Document Intelligence, transforms the data, and stores it in Cosmos DB]
Flow:
1. Upload contract files (PDF or ZIP) to Azure Blob Storage.
2. An Azure Function triggers automatically on file upload.
3. Azure AI Document Intelligence extracts layout and tables.
4. A transformation layer converts the output into a canonical JSON format.
5. Data is stored in Azure Cosmos DB.
Step 1: Trigger Processing with Azure Functions
An Azure Function with a Blob trigger enables automatic processing when a file is uploaded. The handler unpacks ZIP archives and passes single PDFs through as-is:

import io
import logging
import zipfile

import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(f"Processing blob: {myblob.name}")
    data = myblob.read()

    if myblob.name.lower().endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as z:
            for file_name in z.namelist():
                if file_name.endswith("/"):
                    continue  # Skip directory entries inside the archive
                logging.info(f"Extracting {file_name}")
                file_data = z.read(file_name)
                # Pass file_data to the extraction step
    else:
        # A single PDF: pass the blob bytes to the extraction step directly
        file_data = data

Best Practices
Keep functions stateless and idempotent.
Handle retries for transient failures.
Store configuration in environment variables.
Step 2: Extract Layout Using Document Intelligence
The prebuilt layout model helps extract tables, text, and structure from documents.

from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
from azure.core.credentials import AzureKeyCredential

client = DocumentIntelligenceClient(
    endpoint="<your-endpoint>",
    credential=AzureKeyCredential("<your-key>"),
)

# In azure-ai-documentintelligence 1.0, raw bytes are sent in the request body
poller = client.begin_analyze_document(
    "prebuilt-layout",
    body=AnalyzeDocumentRequest(bytes_source=file_data),
)
result = poller.result()

Output Includes
Structured tables
Paragraphs and text blocks
Bounding regions for layout context
Step 3: Handle Multi-Page Table Continuity
Contract documents often contain tables split across multiple pages. These need to be merged to preserve data integrity.
def extract_rows(table):
    """Helper used by merge_tables: the table's body rows (header row 0 excluded) as cell text."""
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid[1:]


def merge_tables(tables):
    merged = []
    current = None
    for table in tables:
        # Header text of this table, in column order
        headers = [
            cell.content
            for cell in sorted(table.cells, key=lambda c: c.column_index)
            if cell.row_index == 0
        ]
        if current and headers == current["headers"]:
            # Same headers as the previous table: treat it as a continuation
            current["rows"].extend(extract_rows(table))
        else:
            if current:
                merged.append(current)
            current = {"headers": headers, "rows": extract_rows(table)}
    if current:
        merged.append(current)
    return merged

Key Considerations
Match headers to detect continuation.
Preserve row order.
Avoid duplicate headers.
Step 4: Transform to a Canonical JSON Schema
A consistent schema ensures compatibility across downstream systems.

{
  "id": "contract_123",
  "documentType": "contract",
  "vendorName": "ABC Corp",
  "invoiceDate": "2023-05-05",
  "tables": [
    {
      "name": "Line Items",
      "headers": ["Item", "Qty", "Price"],
      "rows": [
        ["Service A", "2", "100"]
      ]
    }
  ],
  "metadata": {
    "sourceFile": "contract.pdf",
    "processedAt": "2026-04-22T10:00:00Z"
  }
}

Design Tips
Keep the schema flexible and extensible.
Include metadata for traceability.
Avoid excessive nesting.
Step 5: Persist Data in Cosmos DB
Store the transformed data in a scalable NoSQL database.
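Before the data is stored, the transformation layer from Step 4 has to assemble it into the canonical shape. This is an illustrative sketch, not code from the post: build_canonical_document and make_document_id are hypothetical helpers, the vendorName and invoiceDate values are assumed to come from some upstream field-extraction step, and the tables argument is the list returned by merge_tables. Deriving id deterministically from the source file keeps re-processing idempotent: as long as the partition key value is also unchanged, upsert_item overwrites the existing item instead of creating a duplicate.

```python
import hashlib
from datetime import datetime, timezone


def make_document_id(source_file: str) -> str:
    """Stable id derived from the source path, so re-processing the same file yields the same id."""
    digest = hashlib.sha256(source_file.encode("utf-8")).hexdigest()[:16]
    return f"contract_{digest}"


def build_canonical_document(
    source_file: str,
    fields: dict,
    tables: list[dict],
    document_type: str = "contract",
) -> dict:
    """Assemble the canonical JSON shape from Step 4 out of extracted fields and merged tables."""
    return {
        "id": make_document_id(source_file),
        "documentType": document_type,
        "vendorName": fields.get("vendorName"),
        "invoiceDate": fields.get("invoiceDate"),
        "tables": [
            {
                # merge_tables() does not name tables, so fall back to a positional name.
                "name": table.get("name", f"Table {index + 1}"),
                "headers": table["headers"],
                "rows": table["rows"],
            }
            for index, table in enumerate(tables)
        ],
        "metadata": {
            "sourceFile": source_file,
            "processedAt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
    }


if __name__ == "__main__":
    doc = build_canonical_document(
        "contract.pdf",
        {"vendorName": "ABC Corp", "invoiceDate": "2023-05-05"},
        [{"headers": ["Item", "Qty", "Price"], "rows": [["Service A", "2", "100"]]}],
    )
    print(doc["id"], doc["tables"][0]["name"])
```

The resulting dict is what the upsert_item call below expects as canonical_json.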
from azure.cosmos import CosmosClient

client = CosmosClient("<cosmos-uri>", "<key>")
database = client.get_database_client("contracts-db")
container = database.get_container_client("documents")
container.upsert_item(canonical_json)

Best Practices
Choose an appropriate partition key (for example, documentType or vendorName).
Optimize indexing policies.
Monitor request unit (RU) usage.
Observability and Monitoring
To ensure reliability:
Enable logging with Application Insights.
Track processing time and failures.
Monitor document extraction accuracy.
Security Considerations
Store secrets securely using Azure Key Vault.
Use Managed Identity for service authentication.
Apply role-based access control (RBAC) to storage resources.
Conclusion
This approach provides a scalable and maintainable solution for contract data extraction:
Event-driven processing with Azure Functions
Accurate extraction using Document Intelligence
Clean transformation into a reusable schema
Efficient storage with Cosmos DB
This foundation can be extended with validation layers, review workflows, or analytics dashboards depending on your business requirements.
Resources
Contract data extraction – Document Intelligence: Foundry Tools | Microsoft Learn
microsoft/content-processing-solution-accelerator: Programmatically extract data and apply schemas to unstructured documents across text-based and multi-modal content using Azure AI Foundry, Azure OpenAI, Azure AI Content Understanding, and Cosmos DB.

SonarPilot - Bulk Managing SonarQube Issues
Stop Managing SonarQube Issues One by One: Introducing SonarPilot
Introduction
If you've worked on a large project, you know the feeling. You open your SonarQube dashboard, and you're staring at 30,000+ open issues spread across many rules. You need to triage them before the next sprint. You need to assign a category of issues to the right developer. You need to mark 4000 findings as false positives. And the only tool you have is the SonarQube web UI. Click. Scroll. Click. Filter. Click again.
This is the reality for thousands of development teams. SonarQube is a fantastic static analysis platform, but its UI is designed for investigating individual issues, not for making mass decisions across hundreds or thousands of findings at once.
We built SonarPilot to solve exactly this problem. It's a Node.js/TypeScript IP accelerator that bridges SonarQube and Microsoft Excel, turning what used to take hours of repetitive portal work into a fast, familiar, spreadsheet-driven workflow. And we've published it as an IP on the Microsoft Chrysalis Portal so the entire community can benefit. In this blog, we'll walk through the problem in detail, show you how SonarPilot works step by step, and share the scenarios where it can save your team significant time.
The Problem: SonarQube at Scale
SonarQube works brilliantly for catching issues during development. But once a project accumulates technical debt, or when you onboard a legacy codebase, you're quickly faced with a management problem that the tool itself doesn't solve well. Here are the specific pain points we kept running into:
No native bulk-action UI
SonarQube does not provide a way to select all issues under a rule and change their status in one action. You can filter by rule, but resolving or marking issues requires individual interactions.
No offline review or sharing
There's no way to export the backlog, share it with a project manager or architect for review, or triage issues in a meeting without portal access.
Missing rule-level aggregation
Leadership needs answers like "how many OPEN BLOCKER issues do we have in C# code across all security rules?", not a paginated list of individual findings.

Bulk assignment is limited
Assigning a category of issues (e.g., all SQL Injection warnings) to a developer requires manual effort, as bulk assignment is limited to a maximum of 400 issues per operation.

False-positive triaging is tedious
Marking 3000 false positives one at a time is slow enough that many teams simply don't do it, leaving dashboards permanently cluttered with noise.

Introducing SonarPilot

SonarPilot is a command-line tool built with Node.js and TypeScript that operates in two primary modes, plus an optional automation layer:

Read -> Connects to SonarQube, fetches all issues grouped by rule, and exports them to a structured multi-tab Excel workbook
Update -> Reads the triage decisions you made in Excel and pushes them back to SonarQube via its bulk-change API
Auto Pipeline -> File watcher + Azure Service Bus integration for a fully automated, event-driven update cycle

The core idea is simple: use Excel as the triage interface. Everyone already knows Excel, so you don't need to train anyone on the SonarQube portal to make triage decisions. The workbook becomes the artefact (reviewable, shareable, and auditable), and SonarPilot handles the API communication in both directions.

How the Excel Workbook is Structured

When you run Read mode, SonarPilot generates an Excel workbook with the following structure:

Information Sheet -> Report metadata: project key, branch, date generated, language filters, total issue count.
Rules Sheet -> One row per SonarQube rule with:
• Rule key and name
• Severity (BLOCKER, CRITICAL, MAJOR, MINOR, INFO)
• Language
• Issue count (formula-linked to the rule's individual sheet)
• Action column: this is where you make your triage decision

One Sheet Per Rule -> Each rule gets its own tab containing every individual issue under that rule, with full details:
• Issue key
• Description / message
• File path and line number
• Current status
• Current assignee

This structure gives you a complete, rule-first view of your entire code quality backlog in a single file.

The Action Column: Your Triage Vocabulary

The action column on the Rules Sheet is what powers SonarPilot's bulk-update capability. When you're ready to make a decision about an entire rule's worth of issues, you type one of the following values:

Action Value -> Effect in SonarQube (Typical Use Case)
CONFIRM -> Sets status to Confirmed (acknowledging real issues that need fixing)
UNCONFIRM -> Resets status to Reopened (reversing a previous confirmation)
RESOLVE_FP -> Resolves as False Positive (suppressing findings that don't apply)
RESOLVE_WF -> Resolves as Won't Fix (accepted risk or out-of-scope findings)
RESOLVE_FIXED -> Resolves as Fixed (issues already remediated in code)
ASSIGN:username -> Assigns to the specified user (routing issues to the responsible developer)
SET_SEVERITY:value -> Changes the severity level (reclassifying issue priority)
SET_TYPE:value -> Changes the type to BUG, VULNERABILITY, or CODE_SMELL (correcting miscategorised findings)

You type the action once in the Rules Sheet, run Update mode, and SonarPilot applies that action to every issue under that rule. What would have been 300 clicks becomes a single cell edit.

Step-by-Step: The SonarPilot Workflow

Step 1 -> Configure and Connect
Install and configure SonarPilot with zero code changes required.
All settings are driven by environment variables or CLI arguments:

npm install

# Set environment variables
export SONAR_URL=https://your-sonarqube-server.com
export SONAR_TOKEN=your-api-token
export SONAR_PROJECT_KEY=your-project-key
export SONAR_BRANCH=main

Or pass them as CLI arguments:

npx ts-node src/index.read.ts --sonarUrl https://your-server.com --sonarToken <token> --projectKey my-project --branch main

Step 2 -> Export Issues to Excel (Read Mode)
Run the Read command to pull the full issue backlog from SonarQube:

npx ts-node src/index.read.ts

SonarPilot will:
• Authenticate via your SonarQube API token
• Fetch all quality profiles and rules for the project
• Paginate through the full issue list (handling SonarQube's 10,000-issue API limit)
• Generate a structured Excel workbook with the Information sheet, Rules sheet, and one sheet per rule

The output is a timestamped .xlsx file in the configured output directory.

Step 3 -> Triage in Excel
Open the workbook in Microsoft Excel and navigate to the Rules Sheet. For each rule you want to act on, type the appropriate action value in the action column:
• Want to mark all "unused import" findings as Won't Fix? Type RESOLVE_WF
• Want to confirm all SQL injection warnings and assign them? Type CONFIRM in one row, ASSIGN:john.doe in another
• Want to dismiss 200 false positives in a formatting rule? Type RESOLVE_FP

Save the file. That's all the human input needed.

Step 4 -> Push Changes Back (Update Mode)
Run the Update command pointing to your modified workbook:

npx ts-node src/index.write.ts

SonarPilot reads the action column, maps each action to the corresponding SonarQube API call, and executes them as bulk operations. You'll see a summary of how many issues were updated per rule.

Step 5 -> Verify and Archive
Open your SonarQube dashboard; you'll see all the changes reflected immediately. Keep the Excel workbook as a permanent audit record of what was triaged, when, and by whom.
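To make Step 4 more concrete, here is a minimal Python sketch of how an action cell could be turned into parameters for SonarQube's api/issues/bulk_change Web API endpoint and split into batches. It is not SonarPilot's code (SonarPilot is written in TypeScript), and the helper names are hypothetical. The mapping of each action to a do_transition value, the use of the assign / set_severity / set_type parameters, and the batch size are assumptions based on SonarQube's Web API; transition names and per-request limits vary between SonarQube versions, so check the Web API documentation bundled with your server.

```python
from typing import Dict, Iterator, List

# Assumed mapping from the triage vocabulary to SonarQube workflow transitions
# (sent as the do_transition parameter of api/issues/bulk_change).
TRANSITIONS: Dict[str, str] = {
    "CONFIRM": "confirm",
    "UNCONFIRM": "unconfirm",
    "RESOLVE_FP": "falsepositive",
    "RESOLVE_WF": "wontfix",
    "RESOLVE_FIXED": "resolve",
}

# Actions that carry a value after a colon, e.g. ASSIGN:john.doe (assumed parameter names).
VALUE_ACTIONS: Dict[str, str] = {
    "ASSIGN": "assign",
    "SET_SEVERITY": "set_severity",
    "SET_TYPE": "set_type",
}


def action_to_params(action: str) -> Dict[str, str]:
    """Translate one action cell into bulk-change parameters.

    Raises ValueError for anything outside the vocabulary, so a typo in the
    workbook fails loudly instead of silently updating nothing.
    """
    cell = action.strip()
    if cell.upper() in TRANSITIONS:
        return {"do_transition": TRANSITIONS[cell.upper()]}
    name, sep, value = cell.partition(":")
    name = name.strip().upper()
    value = value.strip()
    if sep and value and name in VALUE_ACTIONS:
        # Severity and type values are upper-case enums; usernames are left untouched.
        if name != "ASSIGN":
            value = value.upper()
        return {VALUE_ACTIONS[name]: value}
    raise ValueError(f"Unrecognised action: {action!r}")


def batches(keys: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` issue keys."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(keys), size):
        yield keys[start:start + size]


def build_requests(action: str, issue_keys: List[str], size: int) -> List[Dict[str, str]]:
    """Return one parameter dict per batch; 'issues' is a comma-separated key list."""
    params = action_to_params(action)
    return [{"issues": ",".join(chunk), **params} for chunk in batches(issue_keys, size)]


if __name__ == "__main__":
    for request in build_requests("ASSIGN:john.doe", ["AX1", "AX2", "AX3"], size=2):
        print(request)
    # {'issues': 'AX1,AX2', 'assign': 'john.doe'}
    # {'issues': 'AX3', 'assign': 'john.doe'}
```

In this sketch each dict would become the form body of one POST to /api/issues/bulk_change, authenticated with the API token from Step 1. Validating every action before sending anything keeps a single bad cell from leaving a rule half-updated.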
The Automation Pipeline: Zero-Touch Updates

For teams that want to go further, SonarPilot supports a fully automated pipeline using Azure Service Bus and a local file watcher:

Service Bus Listener -> Listens on an Azure Service Bus queue for trigger messages (e.g., from a CI/CD pipeline completing a SonarQube scan)
Auto-Export -> On receiving a message, automatically runs Read mode to generate the Excel workbook
File Watcher -> Monitors the output directory using chokidar; when the workbook is saved after triage, it triggers Update mode automatically
Service Bus Sender -> After the update completes, sends a completion message back to the queue for downstream consumers

This turns the entire Export → Triage → Update cycle into an event-driven pipeline. Configure it once, and the only manual step is opening the Excel file and making your triage decisions.

# Start the Service Bus listener
npx ts-node src/index.servicebus.ts

# Start the file watcher (in a separate terminal)
npx ts-node src/index.filewatcher.ts

Key Features at a Glance

Rule-Grouped Excel Export -> Issues organised by rule, with one tab per rule rather than a flat dump
8 Bulk Actions -> Confirm, Unconfirm, Resolve (FP/WF/Fixed), Assign, Set Severity, Set Type
Multi-Language Support -> Works with any language SonarQube analyses (C#, Java, TypeScript, Python, etc.)
Pagination Handling -> Automatically works around SonarQube's 10K-issue API limit
Event-Driven Pipeline -> Azure Service Bus + file watcher for hands-free automation
Dependency Injection -> Built with tsyringe for a clean, testable architecture
Zero Portal Dependency -> All triage happens in Excel; no SonarQube UI training needed

Real-World Use Cases

Legacy Codebase Onboarding
A project being migrated to a new platform carries 45,000+ historical SonarQube issues.
Before the first sprint, the architect exports the workbook, marks 12,000 findings as false positives (legacy patterns from the old framework), and assigns the remaining 33,000 issues to developers by rule category, all in a 10-minute Excel session. One update run applies everything, and the dashboard is clean and actionable on day one.

Security Hotspot Triage
The security team needs to classify all BLOCKER and CRITICAL findings across a multi-language project (C#, TypeScript, Java). SonarPilot exports them, grouped by rule across all three languages, into a single workbook. The security reviewer works through the Rules sheet, marking known false positives and assigning real findings to the appropriate team. The entire triage is completed and pushed in under one hour, with no portal access required for the reviewer.

Sprint Quality Check
Before every sprint review, a developer runs Read mode against the main branch and shares the workbook with the delivery lead. The lead sees at a glance which rules have grown in issue count and marks any quick wins for resolution; the update run clears them before the sprint demo. Clean dashboards, visible quality progress, zero portal navigation.

Why Excel? Why Not a Custom UI?

The choice to use Excel as the triage interface was deliberate, and it comes down to one thing: zero adoption friction. Every developer, architect, project manager, and delivery lead already knows Excel. Building a custom web UI would have added weeks of development, required deployment infrastructure, and introduced a learning curve. Excel gives you:

Sorting and filtering -> instantly see all BLOCKER rules, sort by issue count
Formulas -> the issue count on the Rules sheet is formula-linked to each rule's sheet
Shareability -> email the workbook to anyone, no portal access needed
Auditability -> every decision is recorded in the file, which is then archived

Who Should Use SonarPilot?
• Engineering leads who need to triage thousands of issues across multiple projects before a release gate
• Security teams reviewing vulnerability findings and marking false positives in bulk after a penetration test or when onboarding a legacy codebase
• Quality architects who want a shareable, offline artefact for sprint planning meetings, not a live portal session
• DevOps engineers automating quality gates in CI/CD pipelines where SonarQube scan results need programmatic triage
• Migration teams onboarding existing projects to SonarQube, where the initial scan produces thousands of pre-existing findings that need bulk disposition

Getting Started

SonarPilot is published as an IP on the Microsoft Chrysalis Portal and is available on GitHub.

Prerequisites:
• Node.js v18+ (v20 LTS recommended)
• A SonarQube instance (8.x or 9.x) with an API token
• Microsoft Excel (for viewing/editing the workbook)
• Optional: Azure Service Bus namespace (for Auto Pipeline mode)

Quick Start:

# Clone the repository
git clone https://github.com/HiteshDutt/sonar-qube-desktop-client.git
cd sonar-qube-desktop-client

# Install dependencies
npm install

# Configure (edit src/config/appsettings.ts or use environment variables)
export SONAR_URL=https://your-server.com
export SONAR_TOKEN=your-token
export SONAR_PROJECT_KEY=your-project

# Run Read mode
npx ts-node src/index.read.ts

# Make triage decisions in the generated Excel file...

# Run Update mode
npx ts-node src/index.write.ts

What's Next?

We're actively developing SonarPilot and have several enhancements planned:

Branch comparison mode -> Compare issues between two branches to see what's new vs. inherited
Dashboard summary sheet -> Auto-generated charts in the Excel workbook for executive reporting
SonarCloud support -> Extending compatibility to SonarCloud's API
Web UI companion -> A lightweight web interface for teams that prefer browser-based triage

Conclusion

SonarQube is an indispensable tool for code quality.
But managing its output at scale (triaging, assigning, resolving, and auditing thousands of issues) has always been a manual, tedious process. SonarPilot bridges that gap by turning Excel into a powerful triage interface and automating the two-way communication with the SonarQube API.

If your team spends hours clicking through the SonarQube portal to manage issues in bulk, give SonarPilot a try. We built it to solve our own pain, and we hope it helps with yours too.

Got questions or feedback? Drop a comment below or reach out to us directly. We'd love to hear how you're using SonarPilot and what features would help your workflow.

Making Sense of Azure AI Foundry IQ
As enterprise teams build AI agents, the hardest design decisions often have nothing to do with models. Instead, they revolve around a more fundamental question: how should an agent access organizational knowledge in a way that is accurate, secure, and sustainable over time?

Azure AI Foundry IQ is designed to address a specific version of that problem. It is not a general‑purpose data access layer, and it is not a replacement for every retrieval pattern. Understanding where it fits, and where it does not, is key to using it effectively.

This post explores those boundaries and grounds them in concrete, enterprise‑relevant scenarios, before showing how Foundry IQ can be used directly via the Azure AI Search APIs and SDKs.

What Azure AI Foundry IQ Is (and Is Not)

Azure AI Foundry IQ is a managed knowledge layer built on Azure AI Search. It allows you to define a knowledge base that spans multiple content sources, such as SharePoint, Azure Blob Storage, OneLake, existing Azure AI Search indexes, and selected external sources, and to expose them through a single, permission‑aware endpoint.

When an agent queries a knowledge base, Foundry IQ:
• Plans how the query should be executed
• Selects relevant knowledge sources
• Runs retrieval (optionally in multiple steps)
• Enforces user permissions
• Returns grounded results with citations

A single knowledge base can be reused across multiple agents or applications, avoiding duplicated indexing and inconsistent retrieval logic.

What Foundry IQ is not: it does not execute SQL queries, perform aggregations, or provide real‑time numeric accuracy. Foundry IQ retrieves unstructured text, not transactional or analytical data.

Where Foundry IQ Is a Good Fit

1. Multi‑Source, Distributed Knowledge
Foundry IQ is most valuable when relevant knowledge is spread across multiple systems. It removes the need for each agent to manage source‑specific routing and retrieval logic.
This benefit increases as the number of sources grows; with a single source, the overhead is rarely justified.

2. Complex or Multi‑Part Questions
Foundry IQ’s agentic retrieval model is designed for questions that require:
• Decomposition into sub‑questions
• Retrieval from multiple documents
• Synthesis across sources
Its multi‑step retrieval approach is especially effective when no single document can answer the question on its own.

3. Reduced Custom Retrieval Engineering
Foundry IQ automates indexing, chunking, vectorization, and orchestration across sources. This makes it a strong choice for teams that want to focus on agent behavior rather than building and maintaining custom RAG pipelines.

4. Enterprise Security and Governance
Foundry IQ integrates with Microsoft Entra ID and supports document‑level permissions and Purview sensitivity labels where the underlying source allows it. This makes it suitable for internal or regulated scenarios where permission trimming is a hard requirement.

5. Shared Knowledge Across Multiple Agents
A single knowledge base can serve multiple agents or applications, reducing operational overhead and ensuring consistent retrieval behavior across experiences.

6. High Emphasis on Answer Quality and Trust
For scenarios where correctness, grounding, and citations matter more than latency or cost, Foundry IQ’s multi‑step retrieval is designed to produce better‑grounded answers than a basic single‑pass RAG pipeline, especially for multi‑part questions.

Example Scenarios Where Foundry IQ Works Well

Scenario A: Internal Policy and Operations Assistant
An enterprise builds an internal assistant for store managers. Relevant information lives in:
• HR policies in SharePoint
• Safety procedures in Blob Storage
• Operations manuals in OneLake
Questions often span multiple documents. A single Foundry IQ knowledge base unifies these sources and enforces permissions automatically.
Scenario B: Compliance or Regulatory Knowledge Assistant
A compliance team needs answers strictly grounded in approved documents, with citations and access control. Foundry IQ ensures only authorized content is retrieved, reducing the risk of accidental data exposure.

Scenario C: Shared Knowledge Layer for Multiple Internal Agents
Multiple internal agents (chat assistants, workflow helpers, embedded copilots) rely on the same procedural content. A shared knowledge base avoids duplicate indexing and centralizes governance.

Where Foundry IQ Is Not a Good Fit

1. Simple or Single‑Source Q&A
For a single, well‑defined source, Foundry IQ’s orchestration adds complexity without proportional benefit.

2. Structured or Analytical Data Queries
Foundry IQ does not execute live queries or calculations. It retrieves text, not metrics.

3. Ultra‑Low Latency or High‑Throughput Requirements
Agentic retrieval introduces LLM‑in‑the‑loop latency and token costs. For sub‑second responses at scale, simpler retrieval pipelines are more appropriate.

4. Highly Customized Retrieval Logic
Foundry IQ abstracts the retrieval pipeline. If you require fine‑grained control over scoring or transformations, a fully custom search pipeline may be preferable.

Example Scenarios Where Foundry IQ Is the Wrong Tool

Scenario D: Sales and Inventory Analytics Agent
Questions like “What were Q4 sales by region?” require live data queries. Indexing reports leads to stale answers. A direct SQL or analytics tool is the correct solution.

Scenario E: High‑Volume, Low‑Latency Assistant
Voice‑based assistants requiring sub‑second responses cannot tolerate the latency of agentic retrieval.

A Common Architecture Pattern

Most successful implementations combine:
• Foundry IQ for unstructured documents and policies
• Structured data tools for analytics and live queries
• An application or agent layer that routes questions based on intent

This avoids forcing a single tool to solve every problem.
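The routing layer in this pattern can start out very simple. The following Python sketch is a hypothetical keyword-based router that sends analytical questions (counts, totals, sales, quarters) to a structured-data tool and everything else to a Foundry IQ knowledge base. The route names and keyword patterns are illustrative assumptions, not part of Foundry IQ; a production router would more likely use a trained classifier or an LLM-based intent step, but the shape of the decision is the same.

```python
import re
from enum import Enum


class Route(Enum):
    KNOWLEDGE_BASE = "foundry_iq"        # unstructured documents and policies
    STRUCTURED_DATA = "analytics_tool"   # SQL, metrics, and other live queries


# Illustrative signals that a question needs live numbers rather than documents.
ANALYTICAL_PATTERNS = [
    r"\bhow many\b",
    r"\b(total|sum|average|count)\b",
    r"\b(sales|revenue|inventory)\b",
    r"\bq[1-4]\b",
    r"\bby (region|month|store|quarter)\b",
]


def route_question(question: str) -> Route:
    """Pick a backend for a user question using simple keyword matching."""
    text = question.lower()
    if any(re.search(pattern, text) for pattern in ANALYTICAL_PATTERNS):
        return Route.STRUCTURED_DATA
    return Route.KNOWLEDGE_BASE


if __name__ == "__main__":
    for q in ["What were Q4 sales by region?", "What is the HR policy on shift swaps?"]:
        print(q, "->", route_question(q).value)
```

With these patterns, the Scenario D question is routed to the analytics tool, while a Scenario A policy question falls through to the knowledge base. Defaulting to the knowledge base keeps unrecognised questions on the grounded, permission-aware path.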
Querying Foundry IQ Knowledge Bases Directly via Azure AI Search SDK

You can query Azure AI Foundry IQ knowledge bases directly using the azure-search-documents Python SDK, without using Foundry Agent Service:

Your App → Azure AI Search SDK → Foundry IQ Knowledge Base → Grounded Results

This is ideal when you want full orchestration control while still benefiting from managed, agentic retrieval.

How this works

Note: this is a reference implementation.

Install

pip install --pre azure-search-documents azure-identity

Setup (High Level)

Provision Azure AI Search (Basic tier or higher):
• Enable both Azure AD (Microsoft Entra ID) and API key authentication
• Enable a system‑assigned managed identity

Ingest content via knowledge sources:
• Blob Storage, SharePoint, or OneLake
• The index, indexer, data source, and skillset are created automatically
• Knowledge sources and knowledge bases are created via the REST API (2025‑11‑01‑preview)

Create a knowledge base:
• minimal reasoning → semantic retrieval only (no LLM)
• low / medium reasoning → requires an Azure OpenAI model; the search service's managed identity needs the Cognitive Services User role on that Azure OpenAI resource

Querying the Knowledge Base (Python)

Initialize the Client

from azure.identity import DefaultAzureCredential
from azure.search.documents.knowledgebases import KnowledgeBaseRetrievalClient

client = KnowledgeBaseRetrievalClient(
    endpoint="https://<search-service>.search.windows.net",
    knowledge_base_name="<kb-name>",
    credential=DefaultAzureCredential(),
)

Minimal Reasoning (Fast, No LLM)

from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeRetrievalSemanticIntent,
    KnowledgeRetrievalMinimalReasoningEffort,
    KnowledgeRetrievalOutputMode,
)

# Minimal reasoning: pass search intents (not messages)
request = KnowledgeBaseRetrievalRequest(
    intents=[KnowledgeRetrievalSemanticIntent(search="your question here")],
    retrieval_reasoning_effort=KnowledgeRetrievalMinimalReasoningEffort(),
    output_mode=KnowledgeRetrievalOutputMode.EXTRACTIVE_DATA,
)
response = client.retrieve(retrieval_request=request)

Conversational Reasoning (LLM‑Backed)

from azure.search.documents.knowledgebases.models import (
    KnowledgeBaseRetrievalRequest,
    KnowledgeBaseMessage,
    KnowledgeBaseMessageTextContent,
    KnowledgeRetrievalLowReasoningEffort,
    KnowledgeRetrievalOutputMode,
)

# Low / medium reasoning: pass the conversation history as messages (not intents)
request = KnowledgeBaseRetrievalRequest(
    messages=[
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="<first user question>")],
        ),
        KnowledgeBaseMessage(
            role="assistant",
            content=[KnowledgeBaseMessageTextContent(text="<assistant response>")],
        ),
        KnowledgeBaseMessage(
            role="user",
            content=[KnowledgeBaseMessageTextContent(text="<follow-up question>")],
        ),
    ],
    retrieval_reasoning_effort=KnowledgeRetrievalLowReasoningEffort(),
    output_mode=KnowledgeRetrievalOutputMode.EXTRACTIVE_DATA,
)
response = client.retrieve(retrieval_request=request)

Keep in mind:
• intents → minimal reasoning only
• messages → low / medium reasoning only
They are not interchangeable.

Processing the Response

# Extracted content
for msg in (response.response or []):
    for item in (msg.content or []):
        print(item.text)

# Citations (handles blob, SharePoint, OneLake, and search index references)
for ref in (response.references or []):
    ref_id = getattr(ref, "id", None)
    url = getattr(ref, "blob_url", None) or getattr(ref, "url", None)
    print(f"[{ref_id}] {url}")

# Retrieval diagnostics (elapsed_ms may be missing on some activity records)
for record in (response.activity or []):
    elapsed = getattr(record, "elapsed_ms", None)
    timing = f"{elapsed}ms" if elapsed is not None else "n/a"
    print(f"{record.type}: {timing}")

Output Modes

extractiveData -> Feed grounded chunks into your own LLM
answerSynthesis -> Return a ready‑made answer with citations (LLM required)

Security & Permissions

RBAC: grant Search Index Data Reader to the identity used by DefaultAzureCredential

Permission trimming:
• Must be enabled at ingestion (ingestionPermissionOptions)
• Enforced at query time by passing the user’s bearer token

response = client.retrieve(
    retrieval_request=request,
    x_ms_query_source_authorization="Bearer <user-token>",
)

Foundry IQ won't solve every retrieval problem.
But when your agents need grounded, permission-aware answers from content scattered across SharePoint, Blob Storage, and OneLake, it handles the hard parts — so you can focus on what your agent actually does.