Building Knowledge-Grounded AI Agents with Foundry IQ
Foundry IQ now integrates with Foundry Agent Service via MCP (Model Context Protocol), enabling developers to build AI agents grounded in enterprise knowledge. This integration combines Foundry IQ's intelligent retrieval capabilities with Foundry Agent Service's orchestration, so agents can retrieve and reason over enterprise data. Key capabilities include:

- Auto-chunking of documents
- Vector embedding generation
- Permission-aware retrieval
- Semantic reranking
- Citation-backed responses

Together, these capabilities allow AI agents to retrieve enterprise knowledge and generate responses that are accurate, traceable, and aligned with organizational permissions.

Why Use Foundry IQ with Foundry Agent Service?

Intelligent Retrieval

Foundry IQ extends beyond traditional vector search by introducing:

- LLM-powered query decomposition
- Parallel retrieval across multiple sources
- Semantic reranking of results

This enables agents to retrieve the most relevant enterprise knowledge even for complex queries.

Permission-Aware Retrieval

Agents only access content users are authorized to see. Access control lists from sources such as SharePoint, OneLake, and Azure Blob Storage are automatically synchronized and enforced at query time.

Auto-Managed Indexing

Foundry IQ automatically manages:

- Document chunking
- Vector embedding generation
- Indexing

This eliminates the need to manually build and maintain complex ingestion pipelines.

The Three Pillars of Foundry IQ

1. Knowledge Sources

Foundry IQ connects to enterprise data wherever it lives: SharePoint, Azure Blob Storage, OneLake, and more. When you add a knowledge source:

- Auto-chunking: documents are automatically split into optimal segments.
- Auto-embedding: vector embeddings are generated without manual pipelines.
- Auto-ACL sync: access permissions are synchronized from supported sources (SharePoint, OneLake).
- Auto-Purview integration: sensitivity labels are respected from supported sources.
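Foundry IQ performs these steps for you, so none of the following is code you would write against the service. It is only a toy sketch in Python that makes "parallel, permission-aware retrieval across sources" concrete. Everything in it is hypothetical: `Chunk`, `score`, `search_source`, `retrieve`, and the sample data. Keyword overlap stands in for vector search and semantic reranking, and ACLs are modeled as sets of group names. It is not Foundry IQ's implementation or API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str                # which knowledge source the chunk came from
    text: str
    allowed_groups: frozenset  # ACL synced from the source system (hypothetical model)


def score(query: str, chunk: Chunk) -> float:
    """Toy relevance score: fraction of query terms present in the chunk.

    Stands in for vector similarity plus semantic reranking.
    """
    terms = set(query.lower().split())
    words = set(chunk.text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def search_source(query: str, chunks: list, user_groups: set) -> list:
    """Search one source, enforcing the ACL at query time."""
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return [(score(query, c), c) for c in visible]


def retrieve(query: str, sources: dict, user_groups: set, top_k: int = 3) -> list:
    """Query all sources in parallel, then merge, rank, and return citations."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, query, chunks, user_groups)
                   for chunks in sources.values()]
        hits = [hit for future in futures for hit in future.result()]
    # Drop non-matches and rank by score only (Chunk objects are not orderable).
    ranked = sorted((h for h in hits if h[0] > 0), key=lambda h: h[0], reverse=True)
    # (source, text) pairs let the final answer cite where each fact came from.
    return [(chunk.source, chunk.text) for _, chunk in ranked[:top_k]]


# Hypothetical sample data: two sources, one chunk restricted to executives.
SAMPLE_SOURCES = {
    "sharepoint": [
        Chunk("sharepoint", "travel policy allows economy flights", frozenset({"staff"})),
        Chunk("sharepoint", "executive travel policy allows business class", frozenset({"execs"})),
    ],
    "blob": [
        Chunk("blob", "expense reports are due monthly", frozenset({"staff"})),
    ],
}
```

For example, `retrieve("travel policy", SAMPLE_SOURCES, {"staff"})` returns only the economy-flights chunk. The executive chunk is removed by its ACL before ranking ever sees it, which is the sense in which permissions are "enforced at query time."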
2. Knowledge Bases

A Knowledge Base unifies multiple sources into a single queryable index. Multiple agents can share the same knowledge base, ensuring consistent answers across your organization.

3. Agentic Retrieval

Agentic retrieval is an LLM-assisted retrieval pipeline that:

- Decomposes complex questions into subqueries
- Executes searches in parallel across sources
- Applies semantic reranking
- Returns a unified response with citations

Agent → MCP Tool Call → Knowledge Base → Grounded Response with Citations

The retrievalReasoningEffort parameter controls how much LLM processing is applied:

- minimal: fast queries
- low: balanced reasoning
- medium: complex multi-part questions

Project Architecture

FOUNDRY AGENT SERVICE
  Agent (gpt-4.1) ───▶ MCP Tool (knowledge_base_retrieve) ───▶ Project Connection (RemoteTool + MI Auth)
                               │
                               ▼
FOUNDRY IQ (Azure AI Search)
  MCP Endpoint: /knowledgebases/{kb-name}/mcp?api-version=2025-11-01-preview
  Knowledge Sources (Blob, SP, etc.) ── Knowledge Base (unified index) ── Indexed Documents (auto-chunked, auto-embedded)

Prerequisites

Enable RBAC on Azure AI Search

az search service update --name your-search --resource-group your-rg \
  --auth-options aadOrApiKey

Assign Role to Project's Managed Identity

az role assignment create --assignee $PROJECT_MI \
  --role "Search Index Data Reader" \
  --scope "/subscriptions/.../Microsoft.Search/searchServices/{search}"

Install Dependencies

# Quote the version specifier so the shell does not treat ">" as output redirection
pip install "azure-ai-projects>=2.0.0b4" azure-identity python-dotenv requests

Connecting a Knowledge Base to an Agent

Connecting the knowledge base to an agent via MCP takes three steps:

1. Create a project connection: links your AI Foundry project to the knowledge base using ProjectManagedIdentity authentication.
2. Create an agent with MCPTool: the agent uses knowledge_base_retrieve to query the knowledge base.
3. Chat with the agent: use the OpenAI client to have grounded conversations.

Step 1: Create Project Connection

import requests
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

credential = DefaultAzureCredential()

PROJECT_RESOURCE_ID = "/subscriptions/.../providers/Microsoft.CognitiveServices/accounts/.../projects/..."
PROJECT_ENDPOINT = "<your-foundry-project-endpoint>"  # placeholder, used in Step 2
MCP_ENDPOINT = "https://{search}.search.windows.net/knowledgebases/{kb}/mcp?api-version=2025-11-01-preview"


def create_project_connection():
    """Create an MCP (RemoteTool) connection from the project to the knowledge base."""
    # Token for Azure Resource Manager, which hosts the project connections API
    bearer = get_bearer_token_provider(credential, "https://management.azure.com/.default")
    response = requests.put(
        f"https://management.azure.com{PROJECT_RESOURCE_ID}/connections/kb-connection?api-version=2025-10-01-preview",
        headers={"Authorization": f"Bearer {bearer()}"},
        json={
            "name": "kb-connection",
            "properties": {
                "authType": "ProjectManagedIdentity",
                "category": "RemoteTool",
                "target": MCP_ENDPOINT,
                "isSharedToAll": True,
                "audience": "https://search.azure.com/",
                "metadata": {"ApiType": "Azure"},
            },
        },
    )
    response.raise_for_status()

Step 2: Create Agent with MCP Tool

from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import PromptAgentDefinition, MCPTool


def create_agent():
    client = AIProjectClient(endpoint=PROJECT_ENDPOINT, credential=credential)

    # MCP tool connects agent to knowledge base
    mcp_kb_tool = MCPTool(
        server_label="knowledge-base",
        server_url=MCP_ENDPOINT,
        require_approval="never",
        allowed_tools=["knowledge_base_retrieve"],
        project_connection_id="kb-connection",
    )

    # Create agent with knowledge base tool
    agent = client.agents.create_version(
        agent_name="enterprise-assistant",
        definition=PromptAgentDefinition(
            model="gpt-4.1",
            instructions="You MUST use the knowledge_base_retrieve tool for every question. Include citations from sources.",
            tools=[mcp_kb_tool],
        ),
    )
    return agent, client

Step 3: Chat with the Agent

def chat(agent, client):
    openai_client = client.get_openai_client()
    # The conversation keeps multi-turn context on the service side
    conversation = openai_client.conversations.create()

    while True:
        question = input("You: ").strip()
        if question.lower() == "quit":
            break

        # Route the request to the agent created in Step 2
        response = openai_client.responses.create(
            conversation=conversation.id,
            input=question,
            extra_body={
                "agent_reference": {
                    "name": agent.name,
                    "type": "agent_reference",
                }
            },
        )
        print(f"Assistant: {response.output_text}")


if __name__ == "__main__":
    create_project_connection()
    agent, client = create_agent()
    chat(agent, client)

More Information

- Azure AI Search Knowledge Stores
- Foundry Agent Service
- Model Context Protocol (MCP)
- Azure AI Projects SDK

Summary

The integration of Foundry IQ with Foundry Agent Service enables developers to build knowledge-grounded AI agents for enterprise scenarios. By combining MCP-based tool calling, permission-aware retrieval, automatic document processing, and semantic reranking, organizations can build secure, enterprise-ready AI agents that deliver accurate, traceable responses backed by source data.

Take Control of Every Message: Partial Failure Handling for Service Bus Triggers in Azure Functions
The Problem: All-or-Nothing Batch Processing in Azure Service Bus

Azure Service Bus is one of the most widely used messaging services for building event-driven applications on Azure. When you use Azure Functions with a Service Bus trigger in batch mode, your function receives multiple messages at once for efficient, high-throughput processing. But what happens when one message in the batch fails?

Your function receives a batch of 50 Service Bus messages. 49 process perfectly. 1 fails. What happens?

In the default model, the entire batch fails. All 50 messages go back on the queue and get reprocessed, including the 49 that already succeeded. This leads to:

- Duplicate processing: messages that were already handled successfully get processed again.
- Wasted compute: you pay for re-executing work that already completed.
- Repeated retry loops: if that one "poison" message keeps failing, it drags the entire batch back through processing on every attempt.
- Idempotency burden: your downstream systems must handle duplicates gracefully, adding complexity to every consumer.

This is the classic all-or-nothing batch failure problem. Azure Functions solves it with per-message settlement.

The Solution: Per-Message Settlement for Azure Service Bus

Azure Functions gives you direct control over how each individual message is settled in real time, as you process it. Instead of treating the batch as all-or-nothing, you settle each message independently based on its processing outcome.
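The decision behind per-message settlement does not depend on any particular binding: each message gets its own try/except, and only that message's outcome is affected. The sketch below shows that logic in plain Python with no SDK. It assumes a hypothetical `TransientError` type that separates retryable failures from poison messages. `Settlement`, `settle_batch`, and `demo_handler` are illustrative names and are not part of the Azure Functions or Service Bus APIs.

```python
from collections import Counter
from enum import Enum


class Settlement(Enum):
    COMPLETE = "complete"        # processed: remove from the queue
    ABANDON = "abandon"          # transient failure: release the lock for retry
    DEAD_LETTER = "dead_letter"  # poison message: move to the dead-letter queue


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as timeouts."""


def settle_batch(messages, handler):
    """Process each message independently and decide its settlement.

    One message's failure never changes the outcome for the others.
    """
    outcomes = []
    for message in messages:
        try:
            handler(message)
            outcomes.append((message, Settlement.COMPLETE))
        except TransientError:
            outcomes.append((message, Settlement.ABANDON))
        except Exception:
            # Anything else is treated as unprocessable (e.g. a malformed payload).
            outcomes.append((message, Settlement.DEAD_LETTER))
    return outcomes


def demo_handler(message_id: int) -> None:
    """Hypothetical handler: two transient failures and one malformed message."""
    if message_id in (10, 20):
        raise TransientError("database timeout")
    if message_id == 30:
        raise ValueError("malformed payload")


# Summarize a 50-message batch by settlement action.
DEMO_COUNTS = Counter(action for _, action in settle_batch(range(50), demo_handler))
```

On the 50-message batch, this yields 47 completes, 2 abandons, and 1 dead-letter. In a real function, each outcome would map to the matching settlement call, for example `message_actions.complete` or `message_actions.dead_letter` in the Python V2 binding.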
With Service Bus message settlement actions in Azure Functions, you can:

| Action | What It Does |
|---|---|
| Complete | Remove the message from the queue (successfully processed) |
| Abandon | Release the lock so the message returns to the queue for retry, optionally modifying application properties |
| Dead-letter | Move the message to the dead-letter queue (poison message handling) |
| Defer | Keep the message in the queue but make it retrievable only by sequence number |

This means that in a batch of 50 messages, you can:

- Complete 47 that processed successfully
- Abandon 2 that hit a transient error (with updated retry metadata)
- Dead-letter 1 that is malformed and will never succeed

All in a single function invocation. No reprocessing of successful messages. No building failure response objects. No all-or-nothing.

Why This Matters

1. Eliminates Duplicate Processing

When you complete messages individually, successfully processed messages are immediately removed from the queue. There's no chance of them being redelivered, even if other messages in the same batch fail.

2. Enables Granular Error Handling

Different failures deserve different treatments. A malformed message should be dead-lettered immediately. A message that failed due to a transient database timeout should be abandoned for retry. A message that requires manual intervention should be deferred. Per-message settlement gives you this granularity.

3. Implements Exponential Backoff Without External Infrastructure

By combining abandon with modified application properties, you can track retry counts per message and implement exponential backoff patterns directly in your function code, with no additional queues or Durable Functions required.

4. Reduces Cost

You stop paying for redundant re-execution of already-successful work. In high-throughput systems processing millions of messages, this can be a material cost reduction.
5. Simplifies Idempotency Requirements

When successful messages are never redelivered, your downstream systems don't need to guard against duplicates as aggressively. This reduces architectural complexity and potential for bugs.

Before: One Message = One Function Invocation

Before batch support, when there was no cardinality option, Azure Functions processed each Service Bus message as a separate function invocation. If your queue had 50 messages, the runtime spun up 50 individual executions.

Single-Message Processing (The Old Way)

import { app, InvocationContext } from '@azure/functions';

interface Order {
    id: string;
    product: string;
    amount: number;
}

// Business logic (stub for illustration)
async function processOrder(order: Order): Promise<void> {
    // ...
}

// The handler has its own name so the call below does not recurse into it.
async function processOrderMessage(
    message: unknown, // ← One message at a time, no batch
    context: InvocationContext
): Promise<void> {
    try {
        const order = message as Order;
        await processOrder(order);
    } catch (error) {
        context.error('Failed to process message:', error);
        // Rethrow so the runtime abandons the message for retry;
        // on success, the message is auto-completed by default.
        throw error;
    }
}

app.serviceBusQueue('processOrder', {
    connection: 'ServiceBusConnection',
    queueName: 'orders-queue',
    handler: processOrderMessage,
});

What this cost you:

| 50 messages on the queue | Old (single-message) | New (batch + settlement) |
|---|---|---|
| Function invocations | 50 separate invocations | 1 invocation |
| Connection overhead | 50 separate DB/API connections | 1 connection, reused across batch |
| Compute cost | 50× invocation overhead | 1× invocation overhead |
| Settlement control | Binary: throw or don't | 4 actions per message |

Every message paid the full price of a function invocation: startup, connection setup, and teardown. At scale (millions of messages per day), this was a significant cost and latency penalty. And when a message failed, your only options were to throw (send the message back for retry) or swallow the error (lose it silently).

Code Examples

Let's see how this looks across all three major Azure Functions language stacks.
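All three samples that follow dead-letter any failed message. You can instead use abandon with retry metadata, as described under benefit 3. In that case the bookkeeping can live in a small pure function. The sketch below is minimal and rests on assumptions: a hypothetical `retry_count` application property and illustrative limits (5 retries, a 2-second base delay, a 300-second cap). `plan_retry` is not part of any Azure SDK. It only computes the backoff. An abandoned message becomes available for redelivery right away, so how the computed delay is enforced is left open here.

```python
RETRY_COUNT_KEY = "retry_count"  # hypothetical application property name


def plan_retry(properties: dict, max_retries: int = 5,
               base_delay_s: float = 2.0, cap_s: float = 300.0):
    """Decide what to do with a message that just failed transiently.

    Returns (action, properties_to_attach, backoff_seconds). The retry count
    travels with the message in its application properties, so no external
    store is needed to remember how often it has failed.
    """
    retries = int(properties.get(RETRY_COUNT_KEY, 0))
    if retries >= max_retries:
        # Out of retries: stop cycling and park the message for inspection.
        return "dead_letter", dict(properties), None
    # Exponential backoff: 2s, 4s, 8s, ... capped at cap_s.
    delay = min(cap_s, base_delay_s * 2 ** retries)
    updated = {**properties, RETRY_COUNT_KEY: retries + 1}
    return "abandon", updated, delay
```

When the action is "abandon", the returned properties are what you would attach to the abandon call. The Node.js key points note that `abandon()` accepts application properties for exactly this purpose.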
Node.js (TypeScript with @azure/functions-extensions-servicebus)

import '@azure/functions-extensions-servicebus';
import { app, InvocationContext } from '@azure/functions';
import { ServiceBusMessageContext, messageBodyAsJson } from '@azure/functions-extensions-servicebus';

interface Order {
    id: string;
    product: string;
    amount: number;
}

// Business logic (stub for illustration)
async function processOrder(order: Order): Promise<void> {
    // ...
}

export async function processOrderBatch(
    sbContext: ServiceBusMessageContext,
    context: InvocationContext
): Promise<void> {
    const { messages, actions } = sbContext;
    for (const message of messages) {
        try {
            const order = messageBodyAsJson<Order>(message);
            await processOrder(order);
            await actions.complete(message); // ✅ Done
        } catch (error) {
            context.error(`Failed ${message.messageId}:`, error);
            await actions.deadletter(message); // ☠️ Poison
        }
    }
}

app.serviceBusQueue('processOrderBatch', {
    connection: 'ServiceBusConnection',
    queueName: 'orders-queue',
    sdkBinding: true,            // expose SDK message objects and settlement actions
    autoCompleteMessages: false, // settle each message manually
    cardinality: 'many',         // deliver messages as a batch
    handler: processOrderBatch,
});

Key points:

- Enable sdkBinding: true and autoCompleteMessages: false to gain manual settlement control.
- ServiceBusMessageContext provides both the messages array and the actions object.
- Settlement actions: complete(), abandon(), deadletter(), defer().
- Application properties can be passed to abandon() for retry tracking.
- Built-in helpers like messageBodyAsJson<T>() handle Buffer-to-object parsing.

Full sample: serviceBusSampleWithComplete

Python (V2 Programming Model)

import json
import logging
from typing import List

import azure.functions as func
import azurefunctions.extensions.bindings.servicebus as servicebus

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)


@app.service_bus_queue_trigger(arg_name="messages",
                               queue_name="orders-queue",
                               connection="SERVICEBUS_CONNECTION",
                               auto_complete_messages=False,
                               cardinality="many")
def process_order_batch(messages: List[servicebus.ServiceBusReceivedMessage],
                        message_actions: servicebus.ServiceBusMessageActions):
    for message in messages:
        try:
            order = json.loads(message.body)
            process_order(order)
            message_actions.complete(message)  # ✅ Done
        except Exception as e:
            logging.error(f"Failed {message.message_id}: {e}")
            message_actions.dead_letter(message)  # ☠️ Poison


def process_order(order):
    logging.info(f"Processing order: {order['id']}")

Key points:

- Uses azurefunctions.extensions.bindings.servicebus for SDK-type bindings with ServiceBusReceivedMessage.
- Supports both queue and topic triggers with cardinality="many" for batch processing.
- Each message exposes SDK properties like body, enqueued_time_utc, lock_token, message_id, and sequence_number.

Full sample: servicebus_samples_settlement

.NET (C# Isolated Worker)

using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class ServiceBusBatchProcessor(ILogger<ServiceBusBatchProcessor> logger)
{
    [Function(nameof(ProcessOrderBatch))]
    public async Task ProcessOrderBatch(
        // IsBatched delivers an array; AutoCompleteMessages = false leaves settlement to the code below
        [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection", IsBatched = true, AutoCompleteMessages = false)]
        ServiceBusReceivedMessage[] messages,
        ServiceBusMessageActions messageActions)
    {
        foreach (var message in messages)
        {
            try
            {
                var order = message.Body.ToObjectFromJson<Order>();
                await ProcessOrder(order);
                await messageActions.CompleteMessageAsync(message); // ✅ Done
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Failed {MessageId}", message.MessageId);
                await messageActions.DeadLetterMessageAsync(message); // ☠️ Poison
            }
        }
    }

    private Task ProcessOrder(Order order) => Task.CompletedTask;
}

public record Order(string Id, string Product, decimal Amount);

Key points:

- Inject ServiceBusMessageActions directly alongside the message array.
- Each message is settled individually with CompleteMessageAsync, DeadLetterMessageAsync, or AbandonMessageAsync.
- Application properties can be modified on abandon to track retry metadata.

Full sample: ServiceBusReceivedMessageFunctions.cs

Hosted Containers and AI Agent Solutions
If you have built a proof-of-concept AI agent on your laptop and wondered how to turn it into something other people can actually use, you are not alone. The gap between a working prototype and a production-ready service is where most agent projects stall. Hosted containers close that gap faster than any other approach available today.

This post walks through why containers and managed hosting platforms like Azure Container Apps are an ideal fit for multi-agent AI systems, what practical benefits they unlock, and how you can get started with minimal friction.

The problem with "it works on my machine"

Most AI agent projects begin the same way: a Python script, an API key, and a local terminal. That workflow is perfect for experimentation, but it creates a handful of problems the moment you try to share your work.

First, your colleagues need the same Python version, the same dependencies, and the same environment variables. Second, long-running agent pipelines tie up your machine and compete with everything else you are doing. Third, there is no reliable URL anyone can visit to use the system, which means every demo involves a screen share or a recorded video.

Containers solve all three problems in one step. A single Dockerfile captures the runtime, the dependencies, and the startup command. Once the image builds, it runs identically on any machine, any cloud, or any colleague's laptop.

Why containers suit AI agents particularly well

AI agents have characteristics that make them a better fit for containers than many traditional web applications.

Long, unpredictable execution times

A typical web request completes in milliseconds. An agent pipeline that retrieves context from a database, imports a codebase, runs four verification agents in sequence, and generates a report can take two to five minutes.
Managed container platforms handle long-running requests gracefully, with configurable timeouts and automatic keep-alive, whereas many serverless platforms impose strict execution limits that agent workloads quickly exceed.

Heavy, specialised dependencies

Agent applications often depend on large packages: machine learning libraries, language model SDKs, database drivers, and Git tooling. A container image bundles all of these once at build time. There is no cold-start dependency resolution and no version conflict with other projects on the same server.

Stateless by design

Most agent pipelines are stateless. They receive a request, execute a sequence of steps, and return a result. This maps perfectly to the container model, where each instance handles requests independently and the platform can scale the number of instances up or down based on demand.

Reproducible environments

When an agent misbehaves in production, you need to reproduce the issue locally. With containers, the production environment and the local environment are the same image. There is no "works on my machine" ambiguity.

A real example: multi-agent code verification

To make this concrete, consider a system called Opustest, an open-source project that uses the Microsoft Agent Framework with Azure OpenAI to analyse Python codebases automatically. The system runs AI agents in a pipeline:

1. A Code Example Retrieval Agent queries Azure Cosmos DB for curated examples of good and bad Python code, providing the quality standards for the review.
2. A Codebase Import Agent reads all Python files from a Git repository cloned on the server.
3. Four Verification Agents each score a different dimension of code quality (coding standards, functional correctness, known error handling, and unknown error handling) on a scale of 0 to 5.
4. A Report Generation Agent compiles all scores and errors into an HTML report with fix prompts that can be exported and fed directly into a coding assistant.
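The four stages above share one shape. Each takes the accumulated results and adds its own, which is what makes the pipeline stateless per request. The sketch below shows that orchestration in plain Python. It assumes hypothetical stage functions standing in for the real agents and makes no Microsoft Agent Framework or Azure OpenAI calls. It is not Opustest's actual code. Yielding an event before and after each stage is what lets a frontend light up each stage as it runs.

```python
from typing import Callable, Iterator

Step = Callable[[dict], dict]


def run_pipeline(request: dict, stages: list[tuple[str, Step]]) -> Iterator[dict]:
    """Run (name, step) stages in order, yielding progress events.

    Each step receives the accumulated state and returns a new one, so the
    run depends only on the incoming request: no shared server state.
    """
    state = dict(request)
    for number, (name, step) in enumerate(stages, start=1):
        yield {"stage": number, "name": name, "status": "started"}
        state = step(state)
        yield {"stage": number, "name": name, "status": "finished"}
    yield {"status": "done", "report": state.get("report")}


# Hypothetical stand-ins for the four stages; real ones would call agents.
def retrieve_examples(state: dict) -> dict:
    return {**state, "examples": ["good.py", "bad.py"]}  # stand-in for the Cosmos DB lookup


def import_codebase(state: dict) -> dict:
    return {**state, "files": ["app.py", "utils.py"]}  # stand-in for clone + file read


def verify(state: dict) -> dict:
    dimensions = ["coding standards", "functional correctness",
                  "known error handling", "unknown error handling"]
    return {**state, "scores": {d: 4 for d in dimensions}}  # one verifier per dimension


def generate_report(state: dict) -> dict:
    mean = sum(state["scores"].values()) / len(state["scores"])
    return {**state, "report": f"{len(state['files'])} files, mean score {mean:.1f}/5"}


STAGES = [
    ("Code example retrieval", retrieve_examples),
    ("Codebase import", import_codebase),
    ("Verification", verify),
    ("Report generation", generate_report),
]
```

Because `run_pipeline` is a generator, the web layer can forward each event to the browser as soon as it is produced, rather than waiting minutes for the final report.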
The entire pipeline is orchestrated by a FastAPI backend that streams progress updates to the browser via Server-Sent Events. Users paste a Git URL, watch each stage light up in real time, and receive a detailed report at the end.

The app in action

- Landing page: the default Git URL mode, ready for a repository link.
- Local Path mode: toggling to analyse a codebase from a local directory.
- Repository URL entered: a GitHub repository ready for verification.
- Stage 1: the Code Example Retrieval Agent fetching standards from Cosmos DB.
- Stage 3: the four Verification Agents scoring the codebase.
- Stage 4: the Report Generation Agent compiling the final report.
- Verification complete: all stages finished with a success banner.
- Report detail: scores and the errors table with fix prompts.

The Dockerfile

The container definition for this system is remarkably simple:

FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY backend/ backend/
COPY frontend/ frontend/

RUN adduser --disabled-password --gecos "" appuser
USER appuser

EXPOSE 8000

CMD ["uvicorn", "backend.app:app", "--host", "0.0.0.0", "--port", "8000"]

Twenty lines. That is all it takes to package a multi-agent AI system with a web frontend, a FastAPI backend, Git support, and all Python dependencies into a portable, production-ready image.

Notice the security detail: the container runs as a non-root user. This is a best practice that many tutorials skip, but it matters when you are deploying to a shared platform.
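The Dockerfile's CMD starts uvicorn serving the FastAPI backend, which streams stage progress to the browser over Server-Sent Events. The wire format is simple enough to sketch without a framework. The sketch below assumes progress events are plain dicts. `to_sse` and the "progress" event name are hypothetical and are not taken from Opustest.

```python
import json
from typing import Iterable, Iterator


def to_sse(events: Iterable[dict], event_name: str = "progress") -> Iterator[str]:
    """Encode each event dict as one Server-Sent Events message.

    An SSE message is a block of "field: value" lines ended by a blank
    line. json.dumps without indent emits no raw newlines, so each payload
    fits on a single data: line.
    """
    for event in events:
        yield f"event: {event_name}\ndata: {json.dumps(event)}\n\n"
```

In FastAPI, a generator like this can be returned as `StreamingResponse(to_sse(events), media_type="text/event-stream")`, and the browser can consume it with an `EventSource`. Serializing each event to a single JSON line avoids splitting multi-line payloads across several `data:` fields.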
From image to production in one command

With the Azure Developer CLI (azd), deploying this container to Azure Container Apps takes a single command:

azd up

Behind the scenes, azd reads an azure.yaml file that declares the project structure and provisions the infrastructure defined in Bicep templates (a Container Apps environment, an Azure Container Registry, and a Cosmos DB account). It then builds the Docker image, pushes it to the registry, deploys it to the container app, and even seeds the database with sample data via a post-provision hook.

The result is a publicly accessible URL serving the full agent system, with automatic HTTPS, built-in scaling, and zero infrastructure to manage manually.

Microsoft Hosted Agents vs Azure Container Apps: choosing the right home

Microsoft offers two distinct approaches for running AI agent workloads in the cloud. Understanding the difference is important when deciding how to host your solution.

Microsoft Foundry Hosted Agent Service

Microsoft Foundry provides a fully managed agent hosting service. You define your agent's behaviour declaratively, upload it to the platform, and Foundry handles execution, scaling, and lifecycle management. This is an excellent choice when your agents fit within the platform's conventions: single-purpose agents that respond to prompts, use built-in tool integrations, and do not require custom server-side logic or a bespoke frontend.

Key characteristics of hosted agents in Foundry:

- Fully managed execution. You do not provision or maintain any infrastructure. The platform runs your agent and handles scaling automatically.
- Declarative configuration. Agents are defined through configuration and prompt templates rather than custom application code.
- Built-in tool ecosystem. Foundry provides pre-built connections to Azure services, knowledge stores, and evaluation tooling.
- Opinionated runtime. The platform controls the execution environment, request handling, and networking.
Azure Container Apps

Azure Container Apps is a managed container hosting platform. You package your entire application (agents, backend, frontend, and all dependencies) into a Docker image and deploy it. The platform handles scaling, HTTPS, and infrastructure, but you retain full control over what runs inside the container.

Key characteristics of Container Apps:

- Full application control. You own the runtime, the web framework, the agent orchestration logic, and the frontend.
- Custom networking. You can serve a web UI, expose REST APIs, stream Server-Sent Events, or run WebSocket connections.
- Arbitrary dependencies. Your container can include any system package, any Python library, and any tooling (like Git for cloning repositories).
- Portable. The same Docker image runs locally, in CI, and in production without modification.

Why Opustest uses Container Apps

Opustest requires capabilities that go beyond what a managed agent hosting platform provides:

| Requirement | Hosted Agents (Foundry) | Container Apps |
|---|---|---|
| Custom web UI with real-time progress | Not supported natively | Full control via FastAPI and SSE |
| Multi-agent orchestration pipeline | Platform-managed, limited customisation | Custom orchestrator with arbitrary logic |
| Git repository cloning on the server | Not available | Install Git in the container image |
| Server-Sent Events streaming | Not supported | Full HTTP control |
| Custom HTML report generation | Limited to platform outputs | Generate and serve any content |
| Export button for Copilot prompts | Not available | Custom frontend with JavaScript |
| RAG retrieval from Cosmos DB | Possible via built-in connectors | Direct SDK access with full query control |

The core reason is straightforward: Opustest is not just a set of agents. It is a complete web application that happens to use agents as its processing engine. It needs a custom frontend, real-time streaming, server-side Git operations, and full control over how the agent pipeline executes.
Container Apps provides all of this while still offering managed infrastructure, automatic scaling, and zero server maintenance.

When to choose which

Choose Microsoft Hosted Agents when your use case is primarily conversational or prompt-driven, when you want the fastest path to a working agent with minimal code, and when the built-in tool ecosystem covers your integration needs.

Choose Azure Container Apps when you need a custom frontend, custom orchestration logic, real-time streaming, or server-side processing beyond prompt-response patterns. It is also the better fit when your agent system is part of a larger application with its own web server and API surface.

Both approaches use the same underlying AI models via Azure OpenAI. The difference is in how much control you need over the surrounding application.

Five practical benefits of hosted containers for agents

1. Consistent deployments across environments

Whether you are running the container locally with docker run, in a CI pipeline, or on Azure Container Apps, the behaviour is identical. Configuration differences are handled through environment variables, not code changes. This eliminates an entire category of "it works locally but breaks in production" bugs.

2. Scaling without re-architecture

Azure Container Apps can scale from zero instances (paying nothing when idle) to multiple instances under load. Because agent pipelines are stateless, each request is routed to whichever instance is available. You do not need to redesign your application to handle concurrency; the platform does it for you.

3. Isolation between services

If your agent system grows to include multiple services (perhaps a separate service for document processing or a background worker for batch analysis), each service gets its own container. They can be deployed, scaled, and updated independently. A bug in one service does not bring down the others.

4. Built-in observability

Managed container platforms provide logging, metrics, and health checks out of the box.
When an agent pipeline fails after three minutes of execution, you can inspect the container logs to see exactly which stage failed and why, without adding custom logging infrastructure.

5. Infrastructure as code

The entire deployment can be defined in code. Bicep templates, Terraform configurations, or Pulumi programmes describe every resource. This means deployments are repeatable, reviewable, and version-controlled alongside your application code. No clicking through portals, no undocumented manual steps.

Common concerns addressed

"Containers add complexity"

For a single-file script, this is a fair point. But the moment your agent system has more than one dependency, a Dockerfile is simpler to maintain than a set of installation instructions. It is also self-documenting: anyone reading the Dockerfile knows exactly what the system needs to run.

"Serverless is simpler"

Serverless functions are excellent for short, event-driven tasks. But agent pipelines that run for minutes, require persistent connections (like SSE streaming), and depend on large packages are a poor fit for most serverless platforms. Containers give you the operational simplicity of managed hosting without the execution constraints.

"I do not want to learn Docker"

A basic Dockerfile for a Python application is fewer than ten lines. The core concepts are straightforward: start from a base image, install dependencies, copy your code, and specify the startup command. The learning investment is small relative to the deployment problems it solves.

"What about cost?"

Azure Container Apps supports scale-to-zero, meaning you pay nothing when the application is idle. For development and demonstration purposes, this makes hosted containers extremely cost-effective. You only pay for the compute time your agents actually use.

Getting started: a practical checklist

If you are ready to containerise your own agent solution, here is a step-by-step approach.

Step 1: Write a Dockerfile.
Start from an official Python base image. Install system-level dependencies (like Git, if your agents clone repositories), then your Python packages, then your application code. Run as a non-root user.

Step 2: Test locally.

Build and run the image on your machine:

docker build -t my-agent-app .
docker run -p 8000:8000 --env-file .env my-agent-app

If it works locally, the same image will behave the same way in the cloud, given the same environment variables.

Step 3: Define your infrastructure.

Use Bicep, Terraform, or the Azure Developer CLI to declare the resources you need: a container app, a container registry, and any backing services (databases, key vaults, AI endpoints).

Step 4: Deploy.

Push your image to the registry and deploy to the container platform. With azd, this is a single command. With CI/CD, it is a pipeline that runs on every push to your main branch.

Step 5: Iterate.

Change your agent code, rebuild the image, and redeploy. The cycle is fast because Docker layer caching means only the changed layer and the layers after it are rebuilt.

The broader picture

The AI agent ecosystem is maturing rapidly. Frameworks like Microsoft Agent Framework, LangChain, Semantic Kernel, and AutoGen make it straightforward to build sophisticated multi-agent systems. But building is only half the challenge. The other half is running these systems reliably, securely, and at scale.

Hosted containers offer the best balance of flexibility and operational simplicity for agent workloads. They do not impose the execution limits of serverless platforms. They do not require the operational overhead of managing virtual machines. They give you a portable, reproducible unit of deployment that works the same everywhere.

If you have an agent prototype sitting on your laptop, the path to making it available to your team, your organisation, or the world is shorter than you think. Write a Dockerfile, define your infrastructure, run azd up, and share the URL.

Your agents deserve a proper home. Hosted containers are that home.
Resources

- Azure Container Apps documentation
- Microsoft Foundry Hosted Agents
- Azure Developer CLI (azd)
- Microsoft Agent Framework
- Docker getting started guide
- Opustest: AI-powered code verification (source code)

Announcing the IQ Series: Foundry IQ
AI agents are rapidly becoming a new way to build applications. But for agents to be truly useful, they need access to the knowledge and context that helps them reason about the world they operate in. That’s where Foundry IQ comes in.

Today we’re announcing the IQ Series: Foundry IQ, a new set of developer-focused episodes exploring how to build knowledge-centric AI systems using Foundry IQ. The series focuses on the core ideas behind how modern AI systems work with knowledge: how they retrieve information, reason across sources, synthesize answers, and orchestrate multi-step interactions. Instead of treating retrieval as a single step in a pipeline, Foundry IQ approaches knowledge as something that AI systems actively work with throughout the reasoning process. The IQ Series breaks down these concepts and shows how they come together when building real AI applications.

You can explore the series and all the accompanying samples here: 👉 https://aka.ms/iq-series

What is Foundry IQ?

Foundry IQ helps AI systems work with knowledge in a more structured and intentional way. Rather than wiring retrieval logic directly into every application, developers can define knowledge bases that connect to documents, data sources, and other information systems. AI agents can then query these knowledge bases to gather the context they need to generate responses, make decisions, or complete tasks. This model allows knowledge to be organized, reused, and combined across applications instead of being rebuilt for each new scenario.

What's covered in the IQ Series?

The Foundry IQ episodes in the IQ Series explore the key building blocks behind knowledge-driven AI systems, from how knowledge enters the system to how agents ultimately query and use it. The series is released as three weekly episodes:

Foundry IQ: Unlocking Knowledge for Your Agents (March 18, 2026): Introduces Foundry IQ and the core ideas behind it.
The episode explains how AI agents work with knowledge and walks through the main components of Foundry IQ that support knowledge-driven applications.

Foundry IQ: Building the Data Pipeline with Knowledge Sources (March 25, 2026): Focuses on Knowledge Sources and how different types of content flow into Foundry IQ. It explores how systems such as SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web contribute information that AI systems can later retrieve and use.

Foundry IQ: Querying the Multi-Source AI Knowledge Bases (April 1, 2026): Dives into Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions.

Each episode includes a short executive introduction, a tech talk exploring the topic in depth, and a visual recap with doodle summaries of the key ideas. Alongside the episodes, the GitHub repository provides cookbooks with sample code, summaries of the episodes, and additional learning resources, so developers can explore the concepts and apply them in their own projects.

Explore the Repo

All episodes and supporting materials live in the IQ Series repository: 👉 https://aka.ms/iq-series

Inside the repository you’ll find:
The Foundry IQ episode links
Cookbooks for each episode
Links to documentation and additional resources

If you're building AI agents or exploring how AI systems can work with knowledge, the IQ Series is a great place to start. Watch the episodes and explore the cookbooks! We’re excited to see what you build and welcome your feedback and ideas as the series evolves.

Learn how to build agents and workflows in Python
We just concluded Python + Agents, a six-part livestream series where we explored the foundational concepts behind building AI agents in Python using the Microsoft Agent Framework:
Using agents with tools, MCP servers, and subagents
Adding context to agents with database calls, and long-term memory with Redis or Mem0
Monitoring using OpenTelemetry and evaluating quality with the Azure AI Evaluation SDK
AI-driven workflows with conditional branching, structured outputs, and multi-agent orchestration
Adding human-in-the-loop with tool approval and checkpoints

All of the materials from our series are available for you to keep learning from, and are linked below:
Video recordings of each stream
PowerPoint slides that you can use for reviewing or even teaching the material to your own community
Open-source code samples you can run yourself using frontier LLMs from GitHub Models or Microsoft Foundry Models

Spanish speaker? Check out the Spanish version of the series.

🙋🏽‍♂️ Have follow-up questions? Join the weekly Python+AI office hours on Foundry Discord or the weekly Agent Framework office hours.

Building your first agent in Python
📺 Watch YouTube recording

In the first session of our Python + Agents series, we'll kick things off with the fundamentals: what AI agents are, how they work, and how to build your first one using the Microsoft Agent Framework. We'll start with the core anatomy of an agent, then walk through how tool calling works in practice, beginning with a single tool, expanding to multiple tools, and finally connecting to tools exposed through local MCP servers. We'll conclude with the supervisor agent pattern, where a single supervisor agent coordinates subtasks across multiple subagents by treating each agent as a tool. Along the way, we'll share tips for debugging and inspecting agents, like using the DevUI interface from Microsoft Agent Framework for interacting with agent prototypes.
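To make the tool-calling loop and the supervisor pattern concrete without depending on a model or on the Microsoft Agent Framework API, here is a framework-free toy sketch. The planner functions are hypothetical stand-ins for the LLM's decision about which tool to call, and ToyAgent, get_weather, and add_numbers are illustrative names, not framework classes. What it shows is the last idea in the session description: a subagent exposed through as_tool() looks, to the supervisor, like any other tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Tool = Callable[[str], str]
# A planner stands in for the model: it returns (tool_name, argument), or None to answer directly.
Planner = Callable[[str], Optional[Tuple[str, str]]]


@dataclass
class ToyAgent:
    """One planning step, at most one tool call, then a final answer."""

    name: str
    planner: Planner
    tools: Dict[str, Tool] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)  # record of tool calls, for debugging

    def run(self, request: str) -> str:
        decision = self.planner(request)
        if decision is None:
            return f"[{self.name}] no tool needed for: {request}"
        tool_name, arg = decision
        self.trace.append(f"{tool_name}({arg!r})")
        return f"[{self.name}] {self.tools[tool_name](arg)}"

    def as_tool(self) -> Tool:
        # Supervisor pattern: the whole agent becomes a callable tool for another agent.
        return self.run


def get_weather(city: str) -> str:
    # Canned data instead of a real weather API.
    return {"Seattle": "rainy, 12 C"}.get(city, "unknown")


def add_numbers(expr: str) -> str:
    left, right = expr.split("+")
    return str(int(left) + int(right))


weather_agent = ToyAgent("weather", lambda req: ("get_weather", req.split()[-1]),
                         {"get_weather": get_weather})
math_agent = ToyAgent("math", lambda req: ("add_numbers", req.split(":", 1)[1]),
                      {"add_numbers": add_numbers})


def supervisor_plan(request: str) -> Optional[Tuple[str, str]]:
    # Keyword routing stands in for the supervisor model choosing a subagent.
    if "weather" in request:
        return ("weather_agent", request)
    if request.startswith("add:"):
        return ("math_agent", request)
    return None


supervisor = ToyAgent("supervisor", supervisor_plan, {
    "weather_agent": weather_agent.as_tool(),
    "math_agent": math_agent.as_tool(),
})
```

For example, supervisor.run("weather in Seattle") returns "[supervisor] [weather] rainy, 12 C", and the trace lists on each agent show which tool was called at each level, which is the same kind of information a debugging UI surfaces for real agents.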
🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

Adding context and memory to agents
📺 Watch YouTube recording

In the second session of our Python + Agents series, we'll extend agents built with the Microsoft Agent Framework by adding two essential capabilities: context and memory. We'll begin with context, typically supplied through Retrieval‑Augmented Generation (RAG), and show how agents can ground their responses using knowledge retrieved from local data sources such as SQLite or PostgreSQL. This enables agents to provide accurate, domain‑specific answers based on real information rather than hallucinated content. Next, we'll explore memory: both short‑term, thread‑level context and long‑term, persistent memory. You'll see how agents can store and recall information using solutions like Redis or open‑source libraries such as Mem0, enabling them to remember previous interactions, user preferences, and evolving tasks across sessions. By the end, you'll understand how to build agents that are not only capable but also context‑aware and memory‑enabled, resulting in richer, more personalized user experiences.

🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

Monitoring and evaluating agents
📺 Watch YouTube recording

In the third session of our Python + Agents series, we'll focus on two essential components of building reliable agents: observability and evaluation. We'll begin with observability, using OpenTelemetry to capture traces, metrics, and logs from agent actions. You'll learn how to instrument your agents and use a local Aspire dashboard to identify slowdowns and failures. From there, we'll explore how to evaluate agent behavior using the Azure AI Evaluation SDK. You'll see how to define evaluation criteria, run automated assessments over a set of tasks, and analyze the results to measure accuracy, helpfulness, and task success.
By the end of the session, you'll have practical tools and workflows for monitoring, measuring, and improving your agents, so they're not just functional but dependable and verifiably effective.

🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

Building your first AI-driven workflows
📺 Watch YouTube recording

In Session 4 of our Python + Agents series, we'll explore the foundations of building AI‑driven workflows using the Microsoft Agent Framework: defining workflow steps, connecting them, passing data between them, and introducing simple ways to guide the path a workflow takes. We'll begin with a conceptual overview of workflows and walk through their core components: executors, edges, and events. You'll learn how workflows can be composed of simple Python functions or powered by full AI agents when a step requires model‑driven behavior. From there, we'll dig into conditional branching, showing how workflows can follow different paths depending on model outputs, intermediate results, or lightweight decision functions. We'll introduce structured outputs as a way to make branching more reliable and easier to maintain, avoiding vague string checks and ensuring that workflow decisions are based on clear, typed data. We'll see how the DevUI interface makes it easier to develop workflows by visualizing the workflow graph and surfacing streaming events during a workflow's execution. Finally, we'll dive into an E2E demo application that uses workflows inside a user-facing application with a frontend and backend.

🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

Orchestrating advanced multi-agent workflows
📺 Watch YouTube recording

In Session 5 of our Python + Agents series, we'll go beyond workflow fundamentals and explore how to orchestrate advanced, multi‑agent workflows using the Microsoft Agent Framework.
This session focuses on patterns that coordinate multiple steps or multiple agents at once, enabling more powerful and flexible AI‑driven systems. We'll begin by comparing sequential vs. concurrent execution, then dive into techniques for running workflow steps in parallel. You'll learn how fan‑out and fan‑in edges enable multiple branches to run at the same time, how to aggregate their results, and how concurrency allows workflows to scale across tasks efficiently. From there, we'll introduce two multi‑agent orchestration approaches that are built into the framework. We'll start with handoff, where control moves entirely from one agent to another based on workflow logic, which is useful for routing tasks to the right agent as the workflow progresses. We'll then look at Magentic, a planning‑oriented supervisor that generates a high‑level plan for completing a task and delegates portions of that plan to other agents. Finally, we'll wrap up with a demo of an E2E application that showcases a concurrent multi-agent workflow in action.

🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

Adding a human in the loop to agentic workflows
📺 Watch YouTube recording

In the final session of our Python + Agents series, we'll explore how to incorporate human‑in‑the‑loop (HITL) interactions into agentic workflows using the Microsoft Agent Framework. This session focuses on adding points where a workflow can pause, request input or approval from a user, and then resume once the human has responded. HITL is especially important because LLMs can produce uncertain or inconsistent outputs, and human checkpoints provide an added layer of accuracy and oversight. We'll begin with the framework's requests‑and‑responses model, which provides a structured way for workflows to ask questions, collect human input, and continue execution with that data.
We'll move on to tool approval, one of the most frequent reasons an agent requests input from a human, and see how workflows can surface pending tool calls for approval or rejection. Next, we'll cover checkpoints and resuming, which allow workflows to pause and be restarted later. This is especially important for HITL scenarios where the human may not be available immediately. We'll walk through examples that demonstrate how checkpoints store progress, how resuming picks up the workflow state, and how this mechanism supports longer‑running or multi‑step review cycles. This session brings together everything from the series (agents, workflows, branching, orchestration) and shows how to integrate humans thoughtfully into AI‑driven processes, especially when reliability and judgment matter most.

🖼️ Slides for this session
💻 Code repository with examples: python-agentframework-demos
📝 Write-up for this session

From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry
Agentic AI is no longer a future concept: it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt. In this blog post, and the accompanying video tutorial, we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that’s scalable, testable, and production-ready.

The scenario: a retail agent for sales and inventory insights

To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava. The objective is to build an AI agent that can assist the internal team with:
Analyzing sales data (e.g., reasoning over a product catalog, identifying top‑selling categories)
Managing inventory (e.g., detecting products running low on stock, triggering restock actions)

Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit

The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit. Instead of picking a model arbitrarily, we:
Describe the business scenario in natural language
Ask Copilot to perform a comparative analysis between two candidate models
Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics)

Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other, turning model selection into a transparent, repeatable decision.
To go deeper, we explore the AI Toolkit Model Catalog, which lets you:
Browse hundreds of models
Filter by hosting platform (GitHub, Microsoft Foundry, local)
Filter by publisher (open‑source and proprietary)

Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts.

Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI

With the model ready, it’s time to build the agent. Using the Agent Builder UI, we configure:
The agent’s identity (name, role, responsibilities)
Instructions that define tone, behavior, and scope
The model the agent runs on
The tools and data sources it can access

For this scenario, we add:
File search, grounded on uploaded sales logs and a product catalog
Code interpreter, enabling the agent to compute metrics, generate charts, and write reports

We can then test the agent in the playground on the right side by asking business questions like: “What were the top three selling categories in 2025?” The response is not generic: it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer. The Agent Builder also provides local evaluation and tracing.

Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code

UI-based prototyping is powerful, but real solutions often require custom logic. This is where we transition from prototype to production, using a built-in workflow to migrate from the UI prototype to a hosted agent template. The result is a production-ready scaffold that includes:
Agent code (built with Microsoft Agent Framework; you can choose Python or C#)
A YAML-based agent definition
Container configuration files

From here, we extend the agent with custom functions, for example to create and manage restock orders. GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario.
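The custom functions mentioned above are ordinary Python functions that the agent can call as tools. Below is a hypothetical sketch of what a low-stock check and a restock-order function might look like for Zava, using an in-memory inventory in place of a real data store. The function names, fields, SKUs, and threshold are illustrative assumptions, not the template's actual code. Type hints and docstrings are worth writing carefully here, since agent frameworks typically use them to describe a tool to the model.

```python
from dataclasses import asdict, dataclass
from itertools import count
from typing import Dict, List

# Hypothetical in-memory stock levels standing in for Zava's real inventory store.
INVENTORY: Dict[str, int] = {"paint-roller": 3, "drill-bits": 40, "wood-glue": 0}
_order_ids = count(1)


@dataclass
class RestockOrder:
    order_id: int
    sku: str
    quantity: int
    status: str = "pending"


RESTOCK_ORDERS: List[RestockOrder] = []


def find_low_stock(threshold: int = 5) -> List[str]:
    """Return SKUs with fewer than `threshold` units on hand, lowest stock first."""
    low = [sku for sku, qty in INVENTORY.items() if qty < threshold]
    return sorted(low, key=lambda sku: INVENTORY[sku])


def create_restock_order(sku: str, quantity: int) -> dict:
    """Create a pending restock order for `sku` and return it as a plain dict.

    Raises ValueError for unknown SKUs or non-positive quantities, so a bad
    tool call from the model fails loudly instead of creating a bogus order.
    """
    if sku not in INVENTORY:
        raise ValueError(f"unknown SKU: {sku}")
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    order = RestockOrder(order_id=next(_order_ids), sku=sku, quantity=quantity)
    RESTOCK_ORDERS.append(order)
    return asdict(order)
```

Returning a plain dict rather than the dataclass keeps the tool result trivially JSON-serializable, which makes it easy for the agent to quote the order back to the user.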
Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment

Before deploying, we test the agent locally:
Ask it to identify products running out of stock
Trigger a restock action using the custom function
Debug the full tool‑calling flow end to end

Once validated, we deploy the agent to Microsoft Foundry. Deploying to the cloud gives us more than compute: it adds a set of built-in features to operationalize the solution and maintain it in production.

Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry

Production readiness doesn’t stop at deployment. In the Foundry portal, we explore:
Evaluation runs, using both real and synthetic datasets
LLM‑based judges that score responses across multiple metrics, with explanations
Red teaming, where an adversarial agent probes for unsafe or undesired behavior
Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet

These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment.

Why this workflow matters

This end-to-end flow demonstrates a key idea: agentic AI isn’t just about building agents; it’s about operating them responsibly at scale. By combining AI Toolkit in VS Code with Microsoft Foundry, you get:
A smooth developer experience
Clear separation between experimentation and production
Built‑in evaluation, safety, and observability

Resources
Demo Sample: GitHub Repo
Foundry tutorials: Inside Microsoft Foundry (YouTube)

A Practical Path Forward for Heroku Customers with Azure
On February 6, 2026, Heroku announced it is moving to a sustaining engineering model focused on stability, security, reliability, and ongoing support. Many customers are now reassessing how their application platforms will support today’s workloads and future innovation. Microsoft is committed to helping customers migrate and modernize applications from platforms like Heroku to Azure.

Building a Multi-Agent On-Call Copilot with Microsoft Agent Framework
Four AI agents, one incident payload, structured triage in under 60 seconds, powered by Microsoft Agent Framework and Foundry Hosted Agents.

When an incident fires at 3 AM, every second the on-call engineer spends piecing together alerts, logs, and metrics is a second not spent fixing the problem. What if an AI system could ingest the raw incident signals and hand you a structured triage, a Slack update, a stakeholder brief, and a draft post-incident report, all in roughly 4 to 15 seconds depending on the size of the incident? That’s exactly what On-Call Copilot does.

In this post, we’ll walk through how we built it using the Microsoft Agent Framework, how we deployed it as a Foundry Hosted Agent, and the key design decisions that make multi-agent orchestration practical for production workloads. The full source code is open source on GitHub. You can deploy your own instance with a single azd up command.

Why Multi-Agent? The Problem with Single-Prompt Triage

Early AI incident assistants used a single large prompt: “Here is the incident. Give me root causes, actions, a Slack message, and a post-incident report.” This approach has two fundamental problems:

Context overload. A real incident may have 800 lines of logs, 10 alert lines, and dense metrics. Asking one model to process everything and produce four distinct output formats in a single turn pushes token limits and degrades quality.

Conflicting concerns. Triage reasoning and communication drafting are cognitively different tasks. A model optimised for structured JSON analysis often produces stilted Slack messages, and vice versa.

The fix is specialisation: decompose the task into focused agents, give each agent a narrow instruction set, and run them in parallel. This is the core pattern that the Microsoft Agent Framework makes easy.
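The decomposition described above can be sketched with plain asyncio. This is a framework-free illustration, not the On-Call Copilot code: the four coroutines are hypothetical stubs that sleep instead of calling a model, each returns a JSON fragment with its own top-level keys, and run_concurrently fans the payload out with asyncio.gather() and merges the fragments. In the real project, ConcurrentBuilder from the Microsoft Agent Framework does this wiring.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List

# An "agent" here is any coroutine that takes the incident and returns a JSON string.
AgentFn = Callable[[dict], Awaitable[str]]


async def triage_agent(incident: dict) -> str:
    await asyncio.sleep(0.1)  # stands in for a model call
    return json.dumps({"suspected_root_causes": [
        {"hypothesis": "DB connection pool exhausted", "confidence": 0.7}]})


async def summary_agent(incident: dict) -> str:
    await asyncio.sleep(0.1)
    return json.dumps({"summary": {"what_happened": incident["title"],
                                   "current_status": "ONGOING"}})


async def comms_agent(incident: dict) -> str:
    await asyncio.sleep(0.1)
    return json.dumps({"comms": {
        "slack_update": f":rotating_light: {incident['severity']}: {incident['title']}"}})


async def pir_agent(incident: dict) -> str:
    await asyncio.sleep(0.1)
    return json.dumps({"post_incident_report": {"customer_impact": "UNKNOWN"}})


async def run_concurrently(incident: dict, agents: List[AgentFn]) -> Dict[str, object]:
    # Fan out: every agent receives the full payload and runs independently.
    raw_outputs = await asyncio.gather(*(agent(incident) for agent in agents))
    # Fan in: parse each JSON fragment and merge the top-level keys.
    merged: Dict[str, object] = {}
    for raw in raw_outputs:
        fragment = json.loads(raw)
        clash = merged.keys() & fragment.keys()
        if clash:
            # dict.update alone would silently overwrite; fail loudly instead.
            raise ValueError(f"agents returned overlapping keys: {sorted(clash)}")
        merged.update(fragment)
    return merged
```

Calling asyncio.run(run_concurrently(incident, [triage_agent, summary_agent, comms_agent, pir_agent])) takes roughly as long as the slowest stub (about 0.1 s) rather than the sum (0.4 s), which is the same effect behind the parallel-versus-sequential latency comparison in this post. The key-collision check is a design choice: it turns a schema mistake in one agent's instructions into an explicit error instead of a silently missing section.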
Architecture: Four Agents Running Concurrently

On-Call Copilot is deployed as a Foundry Hosted Agent: a containerised Python service running on Microsoft Foundry’s managed infrastructure. The core orchestrator uses ConcurrentBuilder from the Microsoft Agent Framework SDK to run four specialist agents in parallel via asyncio.gather().

Figure: All four panels populated simultaneously: Triage (red), Summary (blue), Comms (green), PIR (purple).
Figure: Architecture. The orchestrator runs four specialist agents concurrently via asyncio.gather(), then merges their JSON fragments into a single response.

All four agents share a single Azure OpenAI Model Router deployment. Rather than hardcoding gpt-4o or gpt-4o-mini, Model Router analyses request complexity and routes each request automatically. A simple triage prompt costs less; a long post-incident synthesis uses a more capable model. One deployment name, zero model-selection code.

Meet the Four Agents

🔍 Triage Agent: Root cause analysis, immediate actions, missing data identification, and runbook alignment.
Output keys: suspected_root_causes · immediate_actions · missing_information · runbook_alignment

📋 Summary Agent: Concise incident narrative: what happened and current status (ONGOING / MITIGATED / RESOLVED).
Output keys: summary.what_happened · summary.current_status

📢 Comms Agent: Audience-appropriate communications: a Slack channel update with emoji conventions, plus a non-technical stakeholder brief.
Output keys: comms.slack_update · comms.stakeholder_update

📝 PIR Agent: Post-incident report: chronological timeline, quantified customer impact, and specific prevention actions.
Output keys: post_incident_report.timeline · .customer_impact · .prevention_actions

The Code: Building the Orchestrator

The entry point is remarkably concise. ConcurrentBuilder handles all the async wiring: you just declare the agents and let the framework handle parallelism, error propagation, and response merging.
main.py: Orchestrator

from agent_framework import ConcurrentBuilder
from agent_framework.azure import AzureOpenAIChatClient
from azure.ai.agentserver.agentframework import from_agent_framework
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from app.agents.triage import TRIAGE_INSTRUCTIONS
from app.agents.comms import COMMS_INSTRUCTIONS
from app.agents.pir import PIR_INSTRUCTIONS
from app.agents.summary import SUMMARY_INSTRUCTIONS

_credential = DefaultAzureCredential()
_token_provider = get_bearer_token_provider(
    _credential, "https://cognitiveservices.azure.com/.default"
)


def create_workflow_builder():
    """Create 4 specialist agents and wire them into a ConcurrentBuilder."""
    triage = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=TRIAGE_INSTRUCTIONS,
        name="triage-agent",
    )
    summary = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=SUMMARY_INSTRUCTIONS,
        name="summary-agent",
    )
    comms = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=COMMS_INSTRUCTIONS,
        name="comms-agent",
    )
    pir = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=PIR_INSTRUCTIONS,
        name="pir-agent",
    )
    return ConcurrentBuilder().participants([triage, summary, comms, pir])


def main():
    builder = create_workflow_builder()
    from_agent_framework(builder.build).run()  # starts on port 8088


if __name__ == "__main__":
    main()

Key insight: DefaultAzureCredential means there are no API keys anywhere in the codebase. The container uses managed identity in production; local development uses your az login session. The same code runs in both environments without modification.

Agent Instructions: Prompts as Configuration

Each agent receives a tightly scoped system prompt that defines its output schema and guardrails.
Here’s the Triage Agent, the most complex of the four:

app/agents/triage.py

TRIAGE_INSTRUCTIONS = """\
You are the **Triage Agent**, an expert Site Reliability Engineer specialising in root cause analysis and incident response.

## Task
Analyse the incident data and return a single JSON object with ONLY these keys:

{
  "suspected_root_causes": [
    {
      "hypothesis": "string – concise root cause hypothesis",
      "evidence": ["string – supporting evidence from the input"],
      "confidence": 0.0 // 0-1, how confident you are
    }
  ],
  "immediate_actions": [
    {
      "step": "string – concrete action with runnable command if applicable",
      "owner_role": "oncall-eng | dba | infra-eng | platform-eng",
      "priority": "P0 | P1 | P2 | P3"
    }
  ],
  "missing_information": [
    {
      "question": "string – what data is missing",
      "why_it_matters": "string – why this data would help"
    }
  ],
  "runbook_alignment": {
    "matched_steps": ["string – runbook steps that match the situation"],
    "gaps": ["string – gaps or missing runbook coverage"]
  }
}

## Guardrails
1. **No secrets** – redact any credential-like material as [REDACTED].
2. **No hallucination** – if data is insufficient, set confidence to 0 and add entries to missing_information.
3. **Diagnostic suggestions** – when data is sparse, include diagnostic steps in immediate_actions.
4. **Structured output only** – return ONLY valid JSON, no prose.
"""

The Comms Agent follows the same pattern but targets a different audience:

app/agents/comms.py

COMMS_INSTRUCTIONS = """\
You are the **Comms Agent**, an expert incident communications writer.

## Task
Return a single JSON object with ONLY this key:

{
  "comms": {
    "slack_update": "Slack-formatted message with emoji, severity, status, impact, next steps, and ETA",
    "stakeholder_update": "Non-technical summary for executives. Focus on business impact and resolution."
  }
}

## Guidelines
- Slack: Use :rotating_light: for active SEV1/2, :warning: for degraded, :white_check_mark: for resolved.
- Stakeholder: No jargon. Translate to business impact.
- Tone: Calm, factual, action-oriented. Never blame individuals.
- Structured output only – return ONLY valid JSON, no prose.
"""

Instructions as config, not code. Agent behaviour is defined entirely by instruction text strings. A non-developer can refine agent behaviour by editing the prompt and redeploying; no changes to Python logic are needed.

The Incident Envelope: What Goes In

The agent accepts a single JSON envelope. It can come from a monitoring alert webhook, a PagerDuty payload, or a manual CLI invocation:

Incident Input (JSON)

{
  "incident_id": "INC-20260217-002",
  "title": "DB connection pool exhausted — checkout-api degraded",
  "severity": "SEV1",
  "timeframe": { "start": "2026-02-17T14:02:00Z", "end": null },
  "alerts": [
    {
      "name": "DatabaseConnectionPoolNearLimit",
      "description": "Connection pool at 99.7% on orders-db-primary",
      "timestamp": "2026-02-17T14:03:00Z"
    }
  ],
  "logs": [
    {
      "source": "order-worker",
      "lines": [
        "ERROR: connection timeout after 30s (attempt 3/3)",
        "WARN: pool exhausted, queueing request (queue_depth=847)"
      ]
    }
  ],
  "metrics": [
    {
      "name": "db_connection_pool_utilization_pct",
      "window": "5m",
      "values_summary": "Jumped from 22% to 99.7% at 14:03Z"
    }
  ],
  "runbook_excerpt": "Step 1: Check DB connection dashboard...",
  "constraints": {
    "max_time_minutes": 15,
    "environment": "production",
    "region": "swedencentral"
  }
}

Declaring the Hosted Agent

The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover the container and route requests to it:

agent.yaml

kind: hosted
name: oncall-copilot
description: |
  Multi-agent hosted agent that ingests incident signals and runs 4 specialist agents concurrently via Microsoft Agent Framework ConcurrentBuilder.
metadata:
  tags:
    - Azure AI AgentServer
    - Microsoft Agent Framework
    - Multi-Agent
    - Model Router
protocols:
  - protocol: responses
environment_variables:
  - name: AZURE_OPENAI_ENDPOINT
    value: ${AZURE_OPENAI_ENDPOINT}
  - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
    value: model-router

The protocols entry (protocol: responses) exposes the agent through the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST; no custom API is needed.

Invoking the Agent

Once deployed, you can invoke the agent with the project’s built-in scripts or directly via curl:

CLI / curl

# Using the included invoke script
python scripts/invoke.py --demo 2       # multi-signal SEV1 demo
python scripts/invoke.py --scenario 1   # Redis cluster outage

# Or with curl directly
TOKEN=$(az account get-access-token \
  --resource https://ai.azure.com --query accessToken -o tsv)

curl -X POST \
  "$AZURE_AI_PROJECT_ENDPOINT/openai/responses?api-version=2025-05-15-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "<incident JSON here>"}
    ],
    "agent": {
      "type": "agent_reference",
      "name": "oncall-copilot"
    }
  }'

The Browser UI

The project includes a zero-dependency browser UI built with plain HTML, CSS, and vanilla JavaScript: no React, no bundler. A Python http.server backend proxies requests to the Foundry endpoint.

Figure: The empty state. Quick-load buttons pre-populate the JSON editor with demo incidents or scenario files.
Figure: Demo 1 loaded: API Gateway 5xx spike, SEV3. The JSON is fully editable before submitting.

Agent Output Panels

Figure: Triage: Root causes ranked by confidence. Evidence is collapsed under each hypothesis.
Figure: Triage: Immediate actions with P0/P1/P2 priority badges and owner roles.
Figure: Comms: Slack card with emoji substitution and a stakeholder executive summary.
Figure: PIR: Chronological timeline with an ONGOING marker, customer impact in a red-bordered box.
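For scripts or services that call the agent from Python, the curl request from Invoking the Agent translates directly to the standard library. This is a sketch under stated assumptions: it reuses exactly the URL path, api-version, headers, and body shape from that curl example, reads AZURE_AI_PROJECT_ENDPOINT from the environment unless an endpoint is passed in, and expects a bearer token obtained the same way as in the curl example (for instance with az account get-access-token). It returns the raw parsed JSON response, since the response schema is not covered in this post; build_request and invoke are illustrative names, not part of the project.

```python
import json
import os
import urllib.request
from typing import Optional

API_VERSION = "2025-05-15-preview"  # same api-version as the curl example


def build_request(incident: dict, token: str, endpoint: Optional[str] = None,
                  agent_name: str = "oncall-copilot") -> urllib.request.Request:
    """Build (but do not send) a POST to the project's Responses endpoint."""
    endpoint = (endpoint or os.environ["AZURE_AI_PROJECT_ENDPOINT"]).rstrip("/")
    url = f"{endpoint}/openai/responses?api-version={API_VERSION}"
    body = {
        # The incident envelope travels as the user message content, serialized to a string.
        "input": [{"role": "user", "content": json.dumps(incident)}],
        "agent": {"type": "agent_reference", "name": agent_name},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def invoke(incident: dict, token: str, endpoint: Optional[str] = None,
           timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_request(incident, token, endpoint), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping build_request separate from invoke lets you inspect or log the outgoing payload without sending it, which is handy when checking that an incident envelope from a webhook was serialized as expected.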
Performance: Parallel Execution Matters

Incident type | Complexity | Parallel latency | Sequential latency (est.)
--- | --- | --- | ---
Single alert, minimal context (SEV4) | Low | 4–6 s | ~16 s
Multi-signal, logs + metrics (SEV2) | Medium | 7–10 s | ~28 s
Full SEV1 with long log lines | High | 10–15 s | ~40 s
Post-incident synthesis (resolved) | High | 10–14 s | ~38 s

Running the four independent agents concurrently with asyncio.gather() cuts total latency by roughly 3–4× compared to sequential execution. For a SEV1 at 3 AM, that’s the difference between a 10-second AI-powered head start and a 40-second wait.

Five Key Design Decisions

Parallel over sequential: Each agent is independent and processes the full incident payload in isolation. ConcurrentBuilder with asyncio.gather() is the right primitive: no inter-agent dependencies, no shared state.

JSON-only agent instructions: Every agent returns only valid JSON with a defined schema. The orchestrator parses each fragment and merges it with merged.update(agent_output). There is no free-text extraction or other post-processing.

No hardcoded model names: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router is the only model reference. Model Router selects a model at runtime based on prompt complexity. When new models become available through Model Router, the agent can use them without code changes.

DefaultAzureCredential everywhere: No API keys. No token management code. Managed identity in production, az login in development. Same code, both environments.

Instructions as configuration: Each agent’s system prompt is a plain Python string. Behaviour changes are text edits, not code logic. A non-developer can refine prompts and redeploy.

Guardrails: Built into the Prompts

The agent instructions include explicit guardrails, enforced through the prompts rather than an external filtering layer:
No hallucination: When data is insufficient, the agent sets confidence: 0 and populates missing_information rather than inventing facts.
Secret redaction: Each agent is instructed to redact credential-like patterns as [REDACTED] in its output.
Mark unknowns: Undeterminable fields use the literal string "UNKNOWN" rather than plausible-sounding guesses.
Diagnostic suggestions: When signal is sparse, immediate_actions includes diagnostic steps that gather missing information before prescribing a fix.

Model Router: Automatic Model Selection

One of the most powerful aspects of this architecture is Model Router. Instead of choosing between gpt-4o, gpt-4o-mini, or o3-mini per agent, you deploy a single model-router endpoint. Model Router analyses each request’s complexity and routes it to the most cost-effective model that can handle it.

Figure: Model Router insights: models selected per request with associated costs.
Figure: Model Router telemetry from Microsoft Foundry: request distribution and cost analysis.

This means you get cost-performance optimisation without writing any model-selection logic. A simple Summary Agent prompt might route to gpt-4o-mini, while a complex Triage Agent prompt with 800 lines of logs might route to gpt-4o, all automatically.

Deployment: One Command

The repo includes both azure.yaml and agent.yaml, so deployment is a single command:

Deploy to Foundry

# Deploy everything: infra + container + Model Router + Hosted Agent
azd up

This provisions the Foundry project resources, builds the Docker image, pushes it to Azure Container Registry, deploys a Model Router instance, and creates the Hosted Agent. For more control, you can build and push the image yourself and use the SDK deploy script:

Manual Docker + SDK deploy

# Build and push (must be linux/amd64)
docker build --platform linux/amd64 -t oncall-copilot:v1 .
docker tag oncall-copilot:v1 $ACR_IMAGE docker push $ACR_IMAGE # Create the hosted agent python scripts/deploy_sdk.py Getting Started Quickstart # Clone git clone https://github.com/microsoft-foundry/oncall-copilot cd oncall-copilot # Install python -m venv .venv source .venv/bin/activate # .venv\Scripts\activate on Windows pip install -r requirements.txt # Set environment variables export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/" export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router" export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>" # Validate schemas locally (no Azure needed) MOCK_MODE=true python scripts/validate.py # Deploy to Foundry azd up # Invoke the deployed agent python scripts/invoke.py --demo 1 # Start the browser UI python ui/server.py # → http://localhost:7860 Extending: Add Your Own Agent Adding a fifth agent is straightforward. Follow this pattern: Create app/agents/<name>.py with a *_INSTRUCTIONS constant following the existing pattern. Add the agent’s output keys to app/schemas.py . Register it in main.py : main.py — Adding a 5th agent from app.agents.my_new_agent import NEW_INSTRUCTIONS new_agent = AzureOpenAIChatClient( ad_token_provider=_token_provider ).create_agent( instructions=NEW_INSTRUCTIONS, name="new-agent", ) workflow = ConcurrentBuilder().participants( [triage, summary, comms, pir, new_agent] ) Ideas for extensions: a ticket auto-creation agent that creates Jira or Azure DevOps items from the PIR output, a webhook adapter agent that normalises PagerDuty or Datadog payloads, or a human-in-the-loop agent that surfaces missing_information as an interactive form. Key Takeaways for AI Engineers The multi-agent pattern isn’t just for chatbots. Any task that can be decomposed into independent subtasks with distinct output schemas is a candidate. Incident response, document processing, code review, data pipeline validation—the pattern transfers. 
- Microsoft Agent Framework gives you ConcurrentBuilder for parallel execution and AzureOpenAIChatClient for Azure-native auth: you write the prompts, and the framework handles the plumbing.
- Foundry Hosted Agents let you deploy containerised agents with managed infrastructure, automatic scaling, and built-in telemetry. No Kubernetes, no custom API gateway.
- Model Router removes per-agent model selection. One deployment name handles all scenarios, with the router making the cost-performance tradeoff per request.
- Prompt-as-config means your agents can be iterated on by anyone who can edit text. The feedback loop from "this output could be better" to "deployed improvement" is minutes, not sprints.

Resources

- Microsoft Agent Framework: SDK powering the multi-agent orchestration
- Model Router: automatic model selection based on prompt complexity
- Foundry Hosted Agents: deploy containerised agents on managed infrastructure
- ConcurrentBuilder Samples: official agents-in-workflow sample this project follows
- DefaultAzureCredential: zero-config auth chain used throughout
- Hosted Agents Concepts: architecture overview of Foundry Hosted Agents

The On-Call Copilot sample is open source under the MIT licence. Contributions, scenario files, and agent instruction improvements are welcome via pull request.

Agent Hooks: Production-Grade Governance for Azure SRE Agent
Introduction

Azure SRE Agent helps engineering teams automate incident response, diagnostics, and remediation tasks. But when you give an agent access to production systems (your databases, your Kubernetes clusters, your cloud resources), you need more than automation. You need governance.

Today, we're diving deep into Agent Hooks, the built-in governance framework in Azure SRE Agent that lets you enforce quality standards, prevent dangerous operations, and maintain audit trails without writing custom middleware or proxies.

Agent Hooks work by intercepting your SRE Agent at critical execution points: before it responds to users (Stop hooks) or after it executes tools (PostToolUse hooks). You define the rules once in your custom agent configuration, and the SRE Agent runtime enforces them automatically across every conversation thread.

In this post, we'll show you how to configure Agent Hooks for a real production scenario: diagnosing and remediating PostgreSQL connection pool exhaustion while maintaining enterprise controls.

The Challenge: Autonomous Remediation with Guardrails

You're managing a production application backed by Azure PostgreSQL Flexible Server. Your on-call team frequently deals with connection pool exhaustion issues that cause latency spikes. You want your SRE Agent to diagnose and resolve these incidents autonomously, but you need to ensure:

- Quality Control: the agent provides thorough, evidence-based analysis instead of superficial guesses
- Safety: the agent can't accidentally execute dangerous commands, but can still perform necessary remediation
- Compliance: every agent action is logged for security audits and post-mortems

Without Agent Hooks, you'd need to build custom middleware, write validation logic around the SRE Agent API, or settle for manual approval workflows. With Agent Hooks, you configure these controls once in your custom agent definition and the SRE Agent platform enforces them automatically.
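To make the Stop-hook contract concrete, here is a toy model in Python of the draft, review, retry cycle. It is not the Azure SRE Agent runtime: the loop, the max_attempts limit, the fake agent, and the keyword check (a deterministic stand-in for the LLM-based quality gate prompt configured in Hook #1) are all assumptions. The only parts taken from this post are the verdict format, {"ok": true} or {"ok": false, "reason": ...}, and the idea that a rejected response sends the agent back to improve it.

```python
import re
from typing import Callable

# Toy model only: the retry loop and max_attempts are assumptions, not the
# Azure SRE Agent runtime. The verdict format matches the Hook #1 prompt.
REQUIRED_SECTIONS = ("## Root Cause", "## Evidence", "## Recommended Actions")


def quality_gate(response: str) -> dict:
    """Deterministic stand-in for the LLM-based Stop hook."""
    missing = [s for s in REQUIRED_SECTIONS if s not in response]
    if missing:
        return {"ok": False, "reason": "Missing sections: " + ", ".join(missing)}
    # Text between "## Evidence" and "## Recommended Actions".
    evidence = response.split("## Evidence", 1)[1].split("## Recommended Actions", 1)[0]
    # Crude check: any digit counts as a metric value.
    if not re.search(r"\d", evidence):
        return {"ok": False, "reason": "## Evidence must include real metric values."}
    return {"ok": True}


def respond_with_stop_hook(
    draft: Callable[[str], str],
    stop_hook: Callable[[str], dict],
    max_attempts: int = 3,
) -> str:
    """Redraft until the Stop hook returns ok, feeding back its reason each time."""
    feedback = ""
    for _ in range(max_attempts):
        response = draft(feedback)
        verdict = stop_hook(response)
        if verdict.get("ok"):
            return response  # only approved responses reach the user
        feedback = verdict.get("reason", "")
    raise RuntimeError(f"Quality gate still failing after {max_attempts} attempts: {feedback}")


attempts: list = []


def fake_agent(feedback: str) -> str:
    """Hypothetical agent: vague on the first try, structured once given feedback."""
    attempts.append(feedback)
    if not feedback:
        return "Looks like a database issue. Restarting should help."
    return (
        "## Root Cause\nConnection pool exhaustion.\n"
        "## Evidence\nP95 latency 847ms; 83 active connections.\n"
        "## Recommended Actions\n1. az postgres flexible-server restart -g <rg> -n <server>\n"
    )


final = respond_with_stop_hook(fake_agent, quality_gate)
print(f"Approved after {len(attempts)} attempts:\n{final}")
```

In the real setup the check is an LLM prompt rather than string matching, which is what lets it judge whether a root cause is actually specific; the toy version only shows where the verdict sits in the flow.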
The Scenario: PostgreSQL Connection Pool Exhaustion

For our demo, we'll use a real production application (octopets-prod-web) experiencing connection pool exhaustion. When this happens:

- P95 latency spikes from ~120ms to 800ms+
- Active connections reach the pool limit
- New requests get queued or fail

In this scenario, the correct remediation is to restart the PostgreSQL Flexible Server to flush stale connections, but we want our agent to do this safely and with proper oversight.

Demo Setup: Three Hooks, Three Purposes

We'll configure three hooks that work together to create a robust governance framework:

- Hook #1: Quality Gate (Stop Hook). Ensures the agent provides structured, evidence-based responses before presenting them to users.
- Hook #2: Safety Guardrails (PostToolUse Hook). Blocks known-dangerous commands and explicitly approves a safe remediation through an allowlist.
- Hook #3: Audit Trail (Global Hook). Logs every tool execution across all agents for compliance and debugging.

Step-by-Step Implementation

Creating the Custom Agent

First, we create a specialized subagent in the Azure SRE Agent platform called sre_analyst_agent, designed for PostgreSQL diagnostics. In the Agent Canvas, we configure the agent instructions:

    You are an SRE agent responsible for diagnosing and remediating production issues for an application backed by an Azure PostgreSQL Flexible Server.

    When investigating a problem:
    - Use available tools to query Azure Monitor metrics, PostgreSQL logs, and connection statistics
    - Look for patterns: latency spikes, connection counts, error rates, CPU/memory pressure
    - Quantify findings with actual numbers where possible (e.g., P95 latency in ms, active connection count, error rate %)

    When presenting your diagnosis, structure your response with these exact sections:

    ## Root Cause
    A precise explanation of what is causing the issue.

    ## Evidence
    Specific metrics and observations that support your root cause. Include actual numbers: latency values in ms, connection counts, error rates, timestamps.

    ## Recommended Actions
    Numbered list of remediation steps ordered by priority. Be specific — include actual resource names and exact commands.

    When executing a fix:
    - Always verify the current state before acting
    - Confirm the fix worked by re-checking the same metrics after the action
    - Report before and after numbers to show impact

This explicit guidance tells the agent how to investigate, how to structure its diagnosis, and how to verify any fix it applies.

Configuring Hook #1: Quality Gate

In the Agent Canvas Hooks tab, we add our first agent-level hook: a Stop hook that fires before the SRE Agent presents its response. This hook uses the SRE Agent's own LLM to evaluate response quality:

- Event Type: Stop
- Hook Type: Prompt
- Activation: Always

Hook Prompt:

    You are a quality gate for an SRE agent that investigates database and app performance issues.

    Review the agent's response below:
    $ARGUMENTS

    Evaluate whether the response meets ALL of the following criteria:
    1. Has a "## Root Cause" section with a specific, clear explanation (not vague — must say specifically what failed, e.g., "connection pool exhaustion due to long-running queries holding connections" not just "database issue")
    2. Has a "## Evidence" section that includes at least one concrete metric or data point with an actual number (e.g., "P95 latency spiked to 847ms", "active connections: 497/500", "error rate: 23% over last 15 minutes")
    3. Has a "## Recommended Actions" section with numbered, specific steps (must include actual resource names or commands, not just "restart the database")

    If ALL three criteria are met with substantive content, respond:
    {"ok": true}

    If ANY criterion is missing, vague, or uses placeholder text, respond:
    {"ok": false, "reason": "Your response needs more depth before it reaches the user. Specifically: ## Root Cause must name the exact failure mechanism, ## Evidence must include real metric values with numbers (latency in ms, connection counts, error rates), ## Recommended Actions must reference actual resource names and specific commands. Go back and verify your findings."}

This hook acts as an automated quality gate built directly into the SRE Agent runtime, catching superficial responses before they reach your on-call engineers.

Configuring Hook #2: Safety Guardrails

Our second agent-level hook is a PostToolUse hook that fires after the SRE Agent executes Bash or Python tools. It checks each command against an allowlist of approved remediation actions first, then against a denylist of destructive patterns:

- Event Type: PostToolUse
- Hook Type: Command (Python)
- Matcher: Bash|ExecuteShellCommand|ExecutePythonCode
- Activation: Always

Hook Script:

    #!/usr/bin/env python3
    import sys, json, re

    # The hook receives its context as JSON on stdin
    context = json.load(sys.stdin)
    tool_input = context.get('tool_input', {})
    command = ''
    if isinstance(tool_input, dict):
        # Read the command text from 'command', or from 'code' for Python execution
        command = tool_input.get('command', '') or tool_input.get('code', '')

    # Safe allowlist — check these FIRST before any blocking logic
    # These are explicitly approved remediation actions for PostgreSQL issues
    safe_allowlist = [
        r'az\s+postgres\s+flexible-server\s+restart',
    ]

    for safe_pattern in safe_allowlist:
        if re.search(safe_pattern, command, re.IGNORECASE):
            print(json.dumps({
                'decision': 'allow',
                'hookSpecificOutput': {
                    'additionalContext': '[SAFETY] ✅ PostgreSQL server restart approved — recognized as a safe remediation action for connection pool exhaustion.'
                }
            }))
            sys.exit(0)

    # Destructive commands to block
    dangerous = [
        (r'\baz\s+postgres\s+flexible-server\s+delete\b', 'az postgres flexible-server delete (permanent server deletion)'),
        (r'\baz\s+\S+\s+delete\b', 'az delete (Azure resource deletion)'),
        (r'\brm\s+-rf\b', 'rm -rf (recursive force delete)'),
        (r'\bsudo\b', 'sudo (privilege escalation)'),
        (r'\bdrop\s+(table|database)\b', 'DROP TABLE/DATABASE (irreversible data loss)'),
        (r'\btruncate\s+table\b', 'TRUNCATE TABLE (irreversible data wipe)'),
        (r'\bdelete\s+from\b(?!.*\bwhere\b)', 'DELETE FROM without WHERE clause (wipes entire table)'),
    ]

    for pattern, label in dangerous:
        if re.search(pattern, command, re.IGNORECASE):
            print(json.dumps({
                'decision': 'block',
                'reason': f'🛑 BLOCKED: {label} is not permitted. Use safe, non-destructive alternatives. For PostgreSQL connection issues, prefer server restart or connection pool configuration changes.'
            }))
            sys.exit(0)

    # Anything that matches neither list is allowed
    print(json.dumps({'decision': 'allow'}))

With this script, the approved restart is allowed immediately, the destructive patterns listed in dangerous are blocked, and any command that matches neither list falls through to the final allow. That prevents the listed forms of accidental data deletion and privilege escalation, but it is not a strict allowlist: to deny everything that is not explicitly approved, change the final decision to 'block' and extend safe_allowlist with the diagnostic commands the agent needs.

Now that we've configured both agent-level hooks, here's what our custom agent looks like in the canvas:

Figure: Agent Canvas showing the sre_analyst_agent configuration with two agent-level hooks attached.

Configuring Hook #3: Audit Trail

Finally, we create a Global hook using the Hooks management page in the Azure SRE Agent Portal. Global hooks apply across all custom agents in your organization, providing centralized governance:
Figure: The Global Hooks management page showing the sre_audit_trail hook configuration with event type, activation mode, matcher pattern, and Python script editor.

- Event Type: PostToolUse
- Hook Type: Command (Python)
- Matcher: * (all tools)
- Activation: On-demand

Hook Script:

    #!/usr/bin/env python3
    import sys, json

    context = json.load(sys.stdin)
    tool_name = context.get('tool_name', 'unknown')
    agent_name = context.get('agent_name', 'unknown')
    succeeded = context.get('tool_succeeded', False)
    turn = context.get('current_turn', '?')

    audit = f'[AUDIT] Turn {turn} | Agent: {agent_name} | Tool: {tool_name} | Success: {succeeded}'
    print(audit, file=sys.stderr)

    print(json.dumps({
        'decision': 'allow',
        'hookSpecificOutput': {
            'additionalContext': audit
        }
    }))

By setting this hook to "on-demand," your SRE engineers can toggle it on or off per conversation thread from the chat interface, enabling detailed audit logging during incident investigations without overwhelming logs during routine queries.

Seeing Agent Hooks in Action

Now let's see how these hooks work together when our SRE Agent investigates a real production incident.

Activating Audit Trail

Before starting our investigation, we toggle on the audit trail hook from the chat interface:

Figure: The "Manage hooks for this thread" menu showing the sre_audit_trail global hook toggled on for this conversation.

This gives us visibility into every tool the agent executes during the investigation.

Starting the Investigation

We prompt our SRE Agent: "Can you check the octopets-prod-web application and diagnose any performance issues?"

The SRE Agent begins gathering metrics from Azure Monitor, and we immediately see our audit trail hook logging each tool execution. This real-time visibility is invaluable for understanding what your SRE Agent is doing and debugging issues when things don't go as planned.
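If you want to check the audit hook before turning it on in real threads, one option is to run it locally the way the script expects to be driven: JSON context in on stdin, decision JSON out on stdout. The harness below is a sketch under that assumption; the helper name run_hook and the sample context values (tool name, agent name, turn number) are hypothetical, while the context keys are the ones the audit script reads.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# The Hook #3 audit logic, embedded so this harness is self-contained.
AUDIT_HOOK = """import sys, json

context = json.load(sys.stdin)
tool_name = context.get('tool_name', 'unknown')
agent_name = context.get('agent_name', 'unknown')
succeeded = context.get('tool_succeeded', False)
turn = context.get('current_turn', '?')

audit = f'[AUDIT] Turn {turn} | Agent: {agent_name} | Tool: {tool_name} | Success: {succeeded}'
print(audit, file=sys.stderr)

print(json.dumps({
    'decision': 'allow',
    'hookSpecificOutput': {'additionalContext': audit},
}))
"""


def run_hook(script: str, context: dict) -> tuple:
    """Run a hook script in a subprocess with `context` as JSON on stdin.

    Returns (decision parsed from stdout, raw stderr text).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "hook.py"
        path.write_text(script, encoding="utf-8")
        result = subprocess.run(
            [sys.executable, str(path)],
            input=json.dumps(context),
            capture_output=True,
            text=True,
            check=True,  # raise if the hook script itself crashes
        )
    return json.loads(result.stdout), result.stderr


# Hypothetical context values; the keys match what the audit script reads.
sample_context = {
    "tool_name": "ExecutePythonCode",
    "agent_name": "sre_analyst_agent",
    "tool_succeeded": True,
    "current_turn": 3,
}

decision, log = run_hook(AUDIT_HOOK, sample_context)
print(log.strip())
print(decision["decision"])
```

Running the hook as a separate process keeps the test close to how a Command hook is isolated from the agent, and check=True surfaces script errors (for example, a malformed context) instead of silently passing.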
Quality Gate Rejection

The SRE Agent completes its initial analysis and attempts to respond. But our Stop hook intercepts it, because the response doesn't meet our quality standards:

Figure: Stop hook rejection message ("Your response needs more depth and specificity...") forcing the agent to re-analyze with more evidence.

The hook rejects the response and forces the SRE Agent to retry: gathering more evidence, querying additional metrics, and providing specific numbers. This self-correction happens automatically within the SRE Agent runtime, with no manual intervention required.

Structured Final Response

After re-verification, the SRE Agent presents a properly structured analysis, with Root Cause, Evidence, and Recommended Actions sections, that passes our quality gate:

Figure: Agent response showing the required structure: a Root Cause section with the connection pool exhaustion diagnosis, an Evidence section with specific metric numbers, and Recommended Actions with the exact restart command.

- Root Cause: connection pool exhaustion
- Evidence: specific metrics (83 active connections, P95 latency 847ms)
- Recommended Actions: restart command with actual resource names

This is the level of rigor we expect from production-ready agents.

Safety Allowlist in Action

The SRE Agent determines it needs to restart the PostgreSQL server to remediate the connection pool exhaustion. Our PostToolUse hook intercepts the command execution and validates it against our allowlist:

Figure: Code execution output showing the PostgreSQL metrics query results and the az postgres flexible-server restart command being executed successfully.

Because the az postgres flexible-server restart command matches our safety allowlist pattern, the hook allows it to proceed. If the SRE Agent had attempted an operation on the denylist (such as DROP DATABASE or an az <resource> delete command), the safety hook would have blocked it immediately. Commands that match neither list, such as firewall rule changes, fall through to the script's default allow.
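To see which commands the Hook #2 script would let through, its decision order can be replayed offline. The sketch below re-implements that order (allowlist first, then denylist, then a default allow) as a plain function named decide; it is a local testing aid rather than how the SRE Agent runtime invokes hooks, the denylist labels are shortened, and the resource names in the sample commands are hypothetical.

```python
import re

# Offline harness: mirrors the decision order of the Hook #2 script
# (allowlist, then denylist, then default allow). Labels are shortened.
SAFE_ALLOWLIST = [
    r'az\s+postgres\s+flexible-server\s+restart',
]

DANGEROUS = [
    (r'\baz\s+postgres\s+flexible-server\s+delete\b', 'az postgres flexible-server delete'),
    (r'\baz\s+\S+\s+delete\b', 'az delete'),
    (r'\brm\s+-rf\b', 'rm -rf'),
    (r'\bsudo\b', 'sudo'),
    (r'\bdrop\s+(table|database)\b', 'DROP TABLE/DATABASE'),
    (r'\btruncate\s+table\b', 'TRUNCATE TABLE'),
    (r'\bdelete\s+from\b(?!.*\bwhere\b)', 'DELETE FROM without WHERE'),
]


def decide(command: str) -> str:
    """Return 'allow' or 'block' for a command, in the hook script's order."""
    # 1. Allowlist first: a match short-circuits every other check.
    for pattern in SAFE_ALLOWLIST:
        if re.search(pattern, command, re.IGNORECASE):
            return 'allow'
    # 2. Denylist: the first matching destructive pattern blocks the command.
    for pattern, _label in DANGEROUS:
        if re.search(pattern, command, re.IGNORECASE):
            return 'block'
    # 3. Anything else falls through to the script's final 'allow'.
    return 'allow'


# Sample commands; resource names (rg-octopets, pg-octopets) are hypothetical.
SAMPLE_COMMANDS = [
    'az postgres flexible-server restart -g rg-octopets -n pg-octopets',
    'psql -c "DROP DATABASE octopets"',
    'psql -c "DELETE FROM orders"',
    'psql -c "DELETE FROM orders WHERE id = 42"',
    'az group delete -n rg-octopets',
    'az postgres flexible-server firewall-rule create -g rg-octopets -n pg-octopets '
    '--rule-name allow-all --start-ip-address 0.0.0.0 --end-ip-address 255.255.255.255',
    'az postgres flexible-server restart -g rg-octopets -n pg-octopets && az group delete -n rg-octopets',
]

for cmd in SAMPLE_COMMANDS:
    print(f'{decide(cmd):<5}  {cmd}')
```

The last sample shows a consequence of checking the allowlist first with re.search: a chained command that merely contains the approved restart is allowed in full, including the resource-group deletion. Anchoring the allowlist pattern and rejecting shell separators such as &&, ; and | would close that gap.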
The Results

After the SRE Agent restarts the PostgreSQL server:

- P95 latency drops from 847ms back to ~120ms
- Active connections reset to healthy levels
- Application performance returns to normal

More importantly, we achieved autonomous remediation with enterprise governance:

✅ Quality assurance: every response met our evidence standards (enforced by Stop hooks)
✅ Safety controls: the restart ran because it matched the allowlist, and denylisted destructive commands would have been blocked (enforced by PostToolUse hooks)
✅ Complete audit trail: every tool call in the thread was logged for compliance (enforced by Global hooks)
✅ Zero manual interventions: the SRE Agent self-corrected when quality standards weren't met

This is the power of Agent Hooks: governance that doesn't get in the way of automation.

Key Takeaways

Agent Hooks bring production-grade governance to Azure SRE Agent:

- Layered Governance: combine agent-level hooks for custom agent-specific controls with global hooks for organization-wide policies
- Fail-Safe by Default: prefer allowlist patterns in PostToolUse hooks over denylists, explicitly permitting safe operations instead of trying to block every dangerous one. The Hook #2 script checks its allowlist first but still allows unmatched commands; make its final decision 'block' for a true default-deny posture
- Self-Correcting SRE Agents: Stop hooks with quality gates create feedback loops that improve response quality without human intervention
- Audit Without Overhead: on-demand global hooks let your engineers toggle detailed logging only during incident investigations
- No Custom Middleware: all governance logic lives in your custom agent configuration, with no need to build validation proxies or wrapper services

Getting Started

Agent Hooks are available now in the Azure SRE Agent platform. You can configure them entirely through the UI, with no API calls or tokens needed:

- Agent-Level Hooks: navigate to the Agent Canvas → Hooks tab and add hooks directly to your custom agent
- Global Hooks: use the Hooks management page to create organization-wide policies
- Thread-Level Control: toggle on-demand hooks from the chat interface using the "Manage hooks" menu

Learn More

- Agent Hooks Documentation
- YAML Schema Reference
- Subagent Builder Guide

Ready to build safer, smarter agents? Start experimenting with Agent Hooks today at sre.azure.com.

Getting Started with Behave: Writing Cucumber Tests in VS Code
What is Behave?

Behave is a BDD test framework for Python that lets you write tests in plain English using Given–When–Then syntax, backed by Python step definitions.

Key benefits:

- Human-readable test scenarios using Gherkin
- Strong alignment between business requirements and test automation
- Easy integration with CI/CD pipelines
- Lightweight and IDE-friendly

Prerequisites

Before getting started, make sure you have:

- Python 3.10+
- Visual Studio Code
- A basic understanding of Python
- Familiarity with BDD concepts (Given / When / Then)

Steps

Download the sample demo zip from GitHub.

Step 1: Create a Virtual Environment and Activate It

    python -m venv .venv
    .venv\Scripts\activate

On macOS/Linux, activate it with source .venv/bin/activate instead.

Install dependencies:

    pip install behave requests

Step 2: Install VS Code Extensions

To get a first-class experience in VS Code, install the following extensions:

- Python (Microsoft)
- Gherkin (for .feature syntax highlighting)
- Behave VSC (optional but recommended)

The Behave VSC extension enables:

- Running tests directly from VS Code
- Step definition navigation
- Gherkin auto-completion
- Test explorer integration

Folder Structure

Why This Structure?
- features/: contains all Gherkin feature files
- features/steps/: contains the Python step implementations (Behave looks for a steps/ directory inside the features directory)
- features/environment.py: optional hooks for setup and teardown
- config/configuration.py: centralized test settings
- behave.ini: configuration file for Behave

Step 3: Write Your First Feature File

Save the scenario as features/login.feature (the file run in Step 8). The step text must match the step definitions in Step 4 exactly; otherwise Behave reports the steps as undefined. The @Login tag lets you select this scenario with behave -t Login.

    Feature: Login functionality

      @Login
      Scenario: Successful login
        Given the user is on the login page
        When the user enters valid credentials
        Then the user should be redirected to the dashboard

Step 4: Writing Step Definitions

Place the step definitions in a Python file under features/steps/:

    from behave import given, when, then

    @given('the user is on the login page')
    def step_user_on_login_page(context):
        print("User navigates to login page")

    @when('the user enters valid credentials')
    def step_user_enters_credentials(context):
        print("User enters username and password")

    @then('the user should be redirected to the dashboard')
    def step_user_redirected(context):
        print("User is redirected to dashboard")

Step 5: Adding Test Configuration (configuration.py)

Create config/configuration.py to centralize environment-specific settings. This helps avoid hardcoding values across test files.

    class TestConfig:
        BASE_URL = "https://example.com"
        TIMEOUT = 30
        BROWSER = "chrome"

Step 6: Using Fixtures with environment.py

The environment.py file is Behave's hook mechanism: Behave calls the functions it defines at points such as the start and end of the run, each feature, and each scenario, similar to fixtures in pytest. Create features/environment.py:

    from config.configuration import TestConfig

    def before_all(context):
        # Runs once before any feature; store shared settings on the context
        print("Setting up test environment")
        context.config_data = TestConfig()

    def before_scenario(context, scenario):
        print(f"Starting scenario: {scenario.name}")

    def after_scenario(context, scenario):
        print(f"Finished scenario: {scenario.name}")

    def after_all(context):
        # Runs once after all features have finished
        print("Tearing down test environment")

Common use cases:

- Initialize browsers or API clients
- Load environment variables
- Clean up test data
- Open/close DB connections

Step 7: Optional Behave Configuration File

Create behave.ini for execution settings. Disabling output capture helps during debugging by showing print output and logs directly in the console.
    [behave]
    stdout_capture = false
    stderr_capture = false
    log_capture = false

Step 8: Running Tests

From the project root, run:

    behave

To run a specific feature:

    behave features/login.feature

To run by tag:

    behave -t Login

Best Practices

✔ Keep feature files business-readable
✔ Avoid logic in feature files
✔ Reuse steps wherever possible
✔ Centralize configs and fixtures
✔ Use tags for selective execution
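One way to apply the "centralize configs" practice, and the "Load environment variables" use case from Step 6, is to let the Step 5 settings be overridden per environment. The sketch below is an assumption, not part of the sample project: it keeps the Step 5 attribute names and defaults, adds a hypothetical from_env() classmethod, and uses hypothetical variable names TEST_BASE_URL, TEST_TIMEOUT and TEST_BROWSER.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TestConfig:
    # Same attribute names and defaults as the Step 5 class.
    BASE_URL: str = "https://example.com"
    TIMEOUT: int = 30
    BROWSER: str = "chrome"

    @classmethod
    def from_env(cls, environ: Optional[Mapping[str, str]] = None) -> "TestConfig":
        """Build a config where environment variables override the defaults.

        TEST_BASE_URL, TEST_TIMEOUT and TEST_BROWSER are hypothetical names;
        rename them to match your CI/CD pipeline.
        """
        env = os.environ if environ is None else environ
        defaults = cls()
        return cls(
            BASE_URL=env.get("TEST_BASE_URL", defaults.BASE_URL),
            # Environment values are strings, so convert the timeout explicitly.
            TIMEOUT=int(env.get("TEST_TIMEOUT", defaults.TIMEOUT)),
            BROWSER=env.get("TEST_BROWSER", defaults.BROWSER),
        )


staging = TestConfig.from_env({"TEST_BASE_URL": "https://staging.example.com", "TEST_TIMEOUT": "5"})
print(staging)
```

In features/environment.py, before_all would then call context.config_data = TestConfig.from_env(), and step code keeps reading context.config_data.BASE_URL as before. Passing a plain dict to from_env() makes the override logic easy to check without touching the real environment.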