Announcing the IQ Series: Foundry IQ
AI agents are rapidly becoming a new way to build applications. But for agents to be truly useful, they need access to the knowledge and context that helps them reason about the world they operate in. That’s where Foundry IQ comes in. Today we’re announcing the IQ Series: Foundry IQ, a new set of developer-focused episodes exploring how to build knowledge-centric AI systems using Foundry IQ. The series focuses on the core ideas behind how modern AI systems work with knowledge, how they retrieve information, reason across sources, synthesize answers, and orchestrate multi-step interactions. Instead of treating retrieval as a single step in a pipeline, Foundry IQ approaches knowledge as something that AI systems actively work with throughout the reasoning process. The IQ Series breaks down these concepts and shows how they come together when building real AI applications. You can explore the series and all the accompanying samples here: 👉 https://aka.ms/iq-series What is Foundry IQ? Foundry IQ helps AI systems work with knowledge in a more structured and intentional way. Rather than wiring retrieval logic directly into every application, developers can define knowledge bases that connect to documents, data sources, and other information systems. AI agents can then query these knowledge bases to gather the context they need to generate responses, make decisions, or complete tasks. This model allows knowledge to be organized, reused, and combined across applications, instead of being rebuilt for each new scenario. What's covered in the IQ Series? The Foundry IQ episodes in the IQ Series explore the key building blocks behind knowledge-driven AI systems, from how knowledge enters the system to how agents ultimately query and use it. The series is released as three weekly episodes: Foundry IQ: Unlocking Knowledge for Your Agents — March 18, 2026: Introduces Foundry IQ and the core ideas behind it. The episode explains how AI agents work with knowledge and walks through the main components of Foundry IQ that support knowledge-driven applications. Foundry IQ: Building the Data Pipeline with Knowledge Sources — March 25, 2026: Focuses on Knowledge Sources and how different types of content flow into Foundry IQ. It explores how systems such as SharePoint, Fabric, OneLake, Azure Blob Storage, Azure AI Search, and the web contribute information that AI systems can later retrieve and use. Foundry IQ: Querying the Multi-Source AI Knowledge Bases — April 1, 2026: Dives into Knowledge Bases and how multiple knowledge sources can be organized behind a single endpoint. The episode demonstrates how AI systems query across these sources and synthesize information to answer complex questions. Each episode includes a short executive introduction, a tech talk exploring the topic in depth, and a visual recap with doodle summaries of the key ideas. Alongside the episodes, the GitHub repository provides cookbooks with sample code, summaries of the episodes, and additional learning resources, so developers can explore the concepts and apply them in their own projects. Explore the Repo All episodes and supporting materials live in the IQ Series repository: 👉 https://aka.ms/iq-series Inside the repository you’ll find: The Foundry IQ episode links Cookbooks for each episode Links to documentation and additional resources If you're building AI agents or exploring how AI systems can work with knowledge, the IQ Series is a great place to start. Watch the episodes and explore the cookbooks!
We’re excited to see what you build and welcome your feedback & ideas as the series evolves.
From Prototype to Production: Building a Hosted Agent with AI Toolkit & Microsoft Foundry Agentic AI is no longer a future concept — it’s quickly becoming the backbone of intelligent, action-oriented applications. But while it’s easy to prototype an AI agent, taking it all the way to production requires much more than a clever prompt. In this blog post - and the accompanying video tutorial - we walk through the end-to-end journey of an AI engineer building, testing, and operationalizing a hosted AI agent using AI Toolkit in Visual Studio Code and Microsoft Foundry. The goal is to show not just how to build an agent, but how to do it in a way that’s scalable, testable, and production ready. The scenario: a retail agent for sales and inventory insights To make things concrete, the demo uses a fictional DIY and home‑improvement retailer called Zava. The objective is to build an AI agent that can assist the internal team in: Analyzing sales data (e.g. reason over a product catalog, identify top‑selling categories, etc.) Managing inventory (e.g. detect products running low on stock, trigger restock actions, etc.) Chapter 1 (min 00:00 – 01:20): Model selection with GitHub Copilot and AI Toolkit The journey starts in Visual Studio Code, using GitHub Copilot together with the AI Toolkit. Instead of picking a model arbitrarily, we: Describe the business scenario in natural language Ask Copilot to perform a comparative analysis between two candidate models Define explicit evaluation criteria (reasoning quality, tool support, suitability for analytics) Copilot leverages AI Toolkit skills to explain why one model is a better fit than the other — turning model selection into a transparent, repeatable decision. To go deeper, we explore the AI Toolkit Model Catalog, which lets you: Browse hundreds of models Filter by hosting platform (GitHub, Microsoft Foundry, local) Filter by publisher (open‑source and proprietary) Once the right model is identified, we deploy it to Microsoft Foundry with a single click and validate it with test prompts. Chapter 2 (min 01:20 – 02:48): Rapid agent prototyping with Agent Builder UI With the model ready, it’s time to build the agent. Using the Agent Builder UI, we configure: The agent’s identity (name, role, responsibilities) Instructions that define tone, behavior, and scope The model the agent runs on The tools and data sources it can access For this scenario, we add: File search, grounded on uploaded sales logs and a product catalog Code interpreter, enabling the agent to compute metrics, generate charts, and write reports We can then test the agent in the right-side playground by asking business questions like: “What were the top three selling categories in 2025?” The response is not generic — it’s grounded in the retailer’s data, and you can inspect which tools and data were used to produce the answer. The Agent Builder also provides local evaluation and tracing functionalities. Chapter 3 (min 02:48 – 04:04): From UI prototype to hosted agent code UI-based prototyping is powerful, but real solutions often require custom logic. This is where we transition from prototype to production by using a built-in workflow to migrate from UI to a hosted agent template. The result is a production-ready scaffold that includes: Agent code (built with Microsoft Agent Framework; you can choose between Python or C#) A YAML-based agent definition Container configuration files From here, we extend the agent with custom functions — for example, to create and manage restock orders.
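As an illustration, a custom restock function in the Python variant of the scaffold might look like the sketch below. The function name, parameters, and in-memory store are hypothetical stand-ins for Zava's real order system; the actual template wires tools into the agent definition for you.

from typing import Annotated

# Hypothetical in-memory store standing in for Zava's order system.
_restock_orders: list[dict] = []

def create_restock_order(
    product_id: Annotated[str, "SKU of the product to restock"],
    quantity: Annotated[int, "Number of units to order"],
) -> str:
    """Create a restock order and return a confirmation the agent can relay."""
    order = {"id": len(_restock_orders) + 1, "product_id": product_id, "quantity": quantity}
    _restock_orders.append(order)
    return f"Restock order #{order['id']} created for {quantity} units of {product_id}."

# The function is then registered as a tool on the agent, e.g.:
# agent = chat_client.create_agent(instructions=..., tools=[create_restock_order])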
GitHub Copilot helps accelerate this step by adapting the template to the Zava business scenario. Chapter 4 (min 04:04 – 05:12): Local debugging and cloud deployment Before deploying, we test the agent locally: Ask it to identify products running out of stock Trigger a restock action using the custom function Debug the full tool‑calling flow end to end Once validated, we deploy the agent to Microsoft Foundry. By deploying the agent to the cloud, we don’t just get compute power, but a whole set of built-in features to operationalize our solution and maintain it in production. Chapter 5 (min 05:12 – 08:04): Evaluation, safety, and monitoring in Foundry Production readiness doesn’t stop at deployment. In the Foundry portal, we explore: Evaluation runs, using both real and synthetic datasets LLM‑based judges that score responses across multiple metrics, with explanations Red teaming, where an adversarial agent probes for unsafe or undesired behavior Monitoring dashboards, tracking usage, latency, regressions, and cost across the agent fleet These capabilities make it possible to move from ad‑hoc testing to continuous quality and safety assessment. Why this workflow matters This end-to-end flow demonstrates a key idea: Agentic AI isn’t just about building agents — it’s about operating them responsibly at scale. By combining AI Toolkit in VS Code with Microsoft Foundry, you get: A smooth developer experience Clear separation between experimentation and production Built‑in evaluation, safety, and observability Resources Demo Sample: GitHub Repo Foundry tutorials: Inside Microsoft Foundry - YouTube

A Practical Path Forward for Heroku Customers with Azure
On February 6, 2026, Heroku announced it is moving to a sustaining engineering model focused on stability, security, reliability, and ongoing support. Many customers are now reassessing how their application platforms will support today’s workloads and future innovation. Microsoft is committed to helping customers migrate and modernize applications from platforms like Heroku to Azure.

Building a Multi-Agent On-Call Copilot with Microsoft Agent Framework
Four AI agents, one incident payload, structured triage in under 60 seconds powered by Microsoft Agent Framework and Foundry Hosted Agents. Multi-Agent Microsoft Agent Framework Foundry Hosted Agents Python SRE / Incident Response When an incident fires at 3 AM, every second the on-call engineer spends piecing together alerts, logs, and metrics is a second not spent fixing the problem. What if an AI system could ingest the raw incident signals and hand you a structured triage, a Slack update, a stakeholder brief, and a draft post-incident report, all in under 10 seconds? That’s exactly what On-Call Copilot does. In this post, we’ll walk through how we built it using the Microsoft Agent Framework, deployed it as a Foundry Hosted Agent, and discuss the key design decisions that make multi-agent orchestration practical for production workloads. The full source code is open-source on GitHub. You can deploy your own instance with a single azd up. Why Multi-Agent? The Problem with Single-Prompt Triage Early AI incident assistants used a single large prompt: “Here is the incident. Give me root causes, actions, a Slack message, and a post-incident report.” This approach has two fundamental problems: Context overload. A real incident may have 800 lines of logs, 10 alert lines, and dense metrics. Asking one model to process everything and produce four distinct output formats in a single turn pushes token limits and degrades quality. Conflicting concerns. Triage reasoning and communication drafting are cognitively different tasks. A model optimised for structured JSON analysis often produces stilted Slack messages—and vice versa. The fix is specialisation: decompose the task into focused agents, give each agent a narrow instruction set, and run them in parallel. This is the core pattern that the Microsoft Agent Framework makes easy. Architecture: Four Agents Running Concurrently On-Call Copilot is deployed as a Foundry Hosted Agent—a containerised Python service running on Microsoft Foundry’s managed infrastructure. The core orchestrator uses ConcurrentBuilder from the Microsoft Agent Framework SDK to run four specialist agents in parallel via asyncio.gather(). All four panels populated simultaneously: Triage (red), Summary (blue), Comms (green), PIR (purple). Architecture: The orchestrator runs four specialist agents concurrently via asyncio.gather(), then merges their JSON fragments into a single response. All four agents in the solution share a single Azure OpenAI Model Router deployment. Rather than hardcoding gpt-4o or gpt-4o-mini, Model Router analyses request complexity and routes automatically. A simple triage prompt costs less; a long post-incident synthesis uses a more capable model. One deployment name, zero model-selection code. Meet the Four Agents 🔍 Triage Agent Root cause analysis, immediate actions, missing data identification, and runbook alignment. suspected_root_causes · immediate_actions · missing_information · runbook_alignment 📋 Summary Agent Concise incident narrative: what happened and current status (ONGOING / MITIGATED / RESOLVED). summary.what_happened · summary.current_status 📢 Comms Agent Audience-appropriate communications: Slack channel update with emoji conventions, plus a non-technical stakeholder brief. comms.slack_update · comms.stakeholder_update 📝 PIR Agent Post-incident report: chronological timeline, quantified customer impact, and specific prevention actions.
post_incident_report.timeline · .customer_impact · .prevention_actions The Code: Building the Orchestrator The entry point is remarkably concise. ConcurrentBuilder handles all the async wiring—you just declare the agents and let the framework handle parallelism, error propagation, and response merging. main.py — Orchestrator from agent_framework import ConcurrentBuilder from agent_framework.azure import AzureOpenAIChatClient from azure.ai.agentserver.agentframework import from_agent_framework from azure.identity import DefaultAzureCredential, get_bearer_token_provider from app.agents.triage import TRIAGE_INSTRUCTIONS from app.agents.comms import COMMS_INSTRUCTIONS from app.agents.pir import PIR_INSTRUCTIONS from app.agents.summary import SUMMARY_INSTRUCTIONS _credential = DefaultAzureCredential() _token_provider = get_bearer_token_provider( _credential, "https://cognitiveservices.azure.com/.default" ) def create_workflow_builder(): """Create 4 specialist agents and wire them into a ConcurrentBuilder.""" triage = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent( instructions=TRIAGE_INSTRUCTIONS, name="triage-agent", ) summary = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent( instructions=SUMMARY_INSTRUCTIONS, name="summary-agent", ) comms = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent( instructions=COMMS_INSTRUCTIONS, name="comms-agent", ) pir = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent( instructions=PIR_INSTRUCTIONS, name="pir-agent", ) return ConcurrentBuilder().participants([triage, summary, comms, pir]) def main(): builder = create_workflow_builder() from_agent_framework(builder.build).run() # starts on port 8088 if __name__ == "__main__": main() Key insight: DefaultAzureCredential means there are no API keys anywhere in the codebase. The container uses managed identity in production; local development uses your az login session. The same code runs in both environments without modification. Agent Instructions: Prompts as Configuration Each agent receives a tightly scoped system prompt that defines its output schema and guardrails. Here’s the Triage Agent—the most complex of the four: app/agents/triage.py TRIAGE_INSTRUCTIONS = """\ You are the **Triage Agent**, an expert Site Reliability Engineer specialising in root cause analysis and incident response. ## Task Analyse the incident data and return a single JSON object with ONLY these keys: { "suspected_root_causes": [ { "hypothesis": "string – concise root cause hypothesis", "evidence": ["string – supporting evidence from the input"], "confidence": 0.0 // 0-1, how confident you are } ], "immediate_actions": [ { "step": "string – concrete action with runnable command if applicable", "owner_role": "oncall-eng | dba | infra-eng | platform-eng", "priority": "P0 | P1 | P2 | P3" } ], "missing_information": [ { "question": "string – what data is missing", "why_it_matters": "string – why this data would help" } ], "runbook_alignment": { "matched_steps": ["string – runbook steps that match the situation"], "gaps": ["string – gaps or missing runbook coverage"] } } ## Guardrails 1. **No secrets** – redact any credential-like material as [REDACTED]. 2. **No hallucination** – if data is insufficient, set confidence to 0 and add entries to missing_information. 3. **Diagnostic suggestions** – when data is sparse, include diagnostic steps in immediate_actions. 4. **Structured output only** – return ONLY valid JSON, no prose. 
""" The Comms Agent follows the same pattern but targets a different audience: app/agents/comms.py COMMS_INSTRUCTIONS = """\ You are the **Comms Agent**, an expert incident communications writer. ## Task Return a single JSON object with ONLY this key: { "comms": { "slack_update": "Slack-formatted message with emoji, severity, status, impact, next steps, and ETA", "stakeholder_update": "Non-technical summary for executives. Focus on business impact and resolution." } } ## Guidelines - Slack: Use :rotating_light: for active SEV1/2, :warning: for degraded, :white_check_mark: for resolved. - Stakeholder: No jargon. Translate to business impact. - Tone: Calm, factual, action-oriented. Never blame individuals. - Structured output only – return ONLY valid JSON, no prose. """ Instructions as config, not code. Agent behaviour is defined entirely by instruction text strings. A non-developer can refine agent behaviour by editing the prompt and redeploying no Python changes needed. The Incident Envelope: What Goes In The agent accepts a single JSON envelope. It can come from a monitoring alert webhook, a PagerDuty payload, or a manual CLI invocation: Incident Input (JSON) { "incident_id": "INC-20260217-002", "title": "DB connection pool exhausted — checkout-api degraded", "severity": "SEV1", "timeframe": { "start": "2026-02-17T14:02:00Z", "end": null }, "alerts": [ { "name": "DatabaseConnectionPoolNearLimit", "description": "Connection pool at 99.7% on orders-db-primary", "timestamp": "2026-02-17T14:03:00Z" } ], "logs": [ { "source": "order-worker", "lines": [ "ERROR: connection timeout after 30s (attempt 3/3)", "WARN: pool exhausted, queueing request (queue_depth=847)" ] } ], "metrics": [ { "name": "db_connection_pool_utilization_pct", "window": "5m", "values_summary": "Jumped from 22% to 99.7% at 14:03Z" } ], "runbook_excerpt": "Step 1: Check DB connection dashboard...", "constraints": { "max_time_minutes": 15, "environment": "production", "region": "swedencentral" } } Declaring the Hosted Agent The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover and route requests to the container: agent.yaml kind: hosted name: oncall-copilot description: | Multi-agent hosted agent that ingests incident signals and runs 4 specialist agents concurrently via Microsoft Agent Framework ConcurrentBuilder. metadata: tags: - Azure AI AgentServer - Microsoft Agent Framework - Multi-Agent - Model Router protocols: - protocol: responses environment_variables: - name: AZURE_OPENAI_ENDPOINT value: ${AZURE_OPENAI_ENDPOINT} - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME value: model-router The protocols: [responses] declaration exposes the agent via the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST no custom API needed. 
Invoking the Agent Once deployed, you can invoke the agent with the project’s built-in scripts or directly via curl: CLI / curl # Using the included invoke script python scripts/invoke.py --demo 2 # multi-signal SEV1 demo python scripts/invoke.py --scenario 1 # Redis cluster outage # Or with curl directly TOKEN=$(az account get-access-token \ --resource https://ai.azure.com --query accessToken -o tsv) curl -X POST \ "$AZURE_AI_PROJECT_ENDPOINT/openai/responses?api-version=2025-05-15-preview" \ -H "Authorization: Bearer $TOKEN" \ -H "Content-Type: application/json" \ -d '{ "input": [ {"role": "user", "content": "<incident JSON here>"} ], "agent": { "type": "agent_reference", "name": "oncall-copilot" } }' The Browser UI The project includes a zero-dependency browser UI built with plain HTML, CSS, and vanilla JavaScript—no React, no bundler. A Python http.server backend proxies requests to the Foundry endpoint. The empty state. Quick-load buttons pre-populate the JSON editor with demo incidents or scenario files. Demo 1 loaded: API Gateway 5xx spike, SEV3. The JSON is fully editable before submitting. Agent Output Panels Triage: Root causes ranked by confidence. Evidence is collapsed under each hypothesis. Triage: Immediate actions with P0/P1/P2 priority badges and owner roles. Comms: Slack card with emoji substitution and a stakeholder executive summary. PIR: Chronological timeline with an ONGOING marker, customer impact in a red-bordered box. Performance: Parallel Execution Matters

| Incident Type | Complexity | Parallel Latency | Sequential (est.) |
| --- | --- | --- | --- |
| Single alert, minimal context (SEV4) | Low | 4–6 s | ~16 s |
| Multi-signal, logs + metrics (SEV2) | Medium | 7–10 s | ~28 s |
| Full SEV1 with long log lines | High | 10–15 s | ~40 s |
| Post-incident synthesis (resolved) | High | 10–14 s | ~38 s |

asyncio.gather() running four independent agents cuts total latency by 3–4× compared to sequential execution. For a SEV1 at 3 AM, that’s the difference between a 10-second AI-powered head start and a 40-second wait. Five Key Design Decisions Parallel over sequential Each agent is independent and processes the full incident payload in isolation. ConcurrentBuilder with asyncio.gather() is the right primitive—no inter-agent dependencies, no shared state. JSON-only agent instructions Every agent returns only valid JSON with a defined schema. The orchestrator merges fragments with merged.update(agent_output). No parsing, no extraction, no post-processing. No hardcoded model names AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router is the only model reference. Model Router selects the best model at runtime based on prompt complexity. When new models ship, the agent gets better for free. DefaultAzureCredential everywhere No API keys. No token management code. Managed identity in production, az login in development. Same code, both environments. Instructions as configuration Each agent’s system prompt is a plain Python string. Behaviour changes are text edits, not code logic. A non-developer can refine prompts and redeploy. Guardrails: Built into the Prompts The agent instructions include explicit guardrails that don’t require external filtering: No hallucination: When data is insufficient, the agent sets confidence: 0 and populates missing_information rather than inventing facts. Secret redaction: Each agent is instructed to redact credential-like patterns as [REDACTED] in its output. Mark unknowns: Undeterminable fields use the literal string "UNKNOWN" rather than plausible-sounding guesses.
Diagnostic suggestions: When signal is sparse, immediate_actions includes diagnostic steps that gather missing information before prescribing a fix. Model Router: Automatic Model Selection One of the most powerful aspects of this architecture is Model Router. Instead of choosing between gpt-4o, gpt-4o-mini, or o3-mini per agent, you deploy a single model-router endpoint. Model Router analyses each request’s complexity and routes it to the most cost-effective model that can handle it. Model Router insights: models selected per request with associated costs. Model Router telemetry from Microsoft Foundry: request distribution and cost analysis. This means you get optimal cost-performance without writing any model-selection logic. A simple Summary Agent prompt may route to gpt-4o-mini, while a complex Triage Agent prompt with 800 lines of logs routes to gpt-4o, all automatically. Deployment: One Command The repo includes both azure.yaml and agent.yaml, so deployment is a single command: Deploy to Foundry # Deploy everything: infra + container + Model Router + Hosted Agent azd up This provisions the Foundry project resources, builds the Docker image, pushes to Azure Container Registry, deploys a Model Router instance, and creates the Hosted Agent. For more control, you can use the SDK deploy script: Manual Docker + SDK deploy # Build and push (must be linux/amd64) docker build --platform linux/amd64 -t oncall-copilot:v1 . docker tag oncall-copilot:v1 $ACR_IMAGE docker push $ACR_IMAGE # Create the hosted agent python scripts/deploy_sdk.py Getting Started Quickstart # Clone git clone https://github.com/microsoft-foundry/oncall-copilot cd oncall-copilot # Install python -m venv .venv source .venv/bin/activate # .venv\Scripts\activate on Windows pip install -r requirements.txt # Set environment variables export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/" export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router" export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>" # Validate schemas locally (no Azure needed) MOCK_MODE=true python scripts/validate.py # Deploy to Foundry azd up # Invoke the deployed agent python scripts/invoke.py --demo 1 # Start the browser UI python ui/server.py # → http://localhost:7860 Extending: Add Your Own Agent Adding a fifth agent is straightforward. Follow this pattern: Create app/agents/<name>.py with a *_INSTRUCTIONS constant following the existing pattern. Add the agent’s output keys to app/schemas.py. Register it in main.py: main.py — Adding a 5th agent from app.agents.my_new_agent import NEW_INSTRUCTIONS new_agent = AzureOpenAIChatClient( ad_token_provider=_token_provider ).create_agent( instructions=NEW_INSTRUCTIONS, name="new-agent", ) workflow = ConcurrentBuilder().participants( [triage, summary, comms, pir, new_agent] ) Ideas for extensions: a ticket auto-creation agent that creates Jira or Azure DevOps items from the PIR output, a webhook adapter agent that normalises PagerDuty or Datadog payloads, or a human-in-the-loop agent that surfaces missing_information as an interactive form. Key Takeaways for AI Engineers The multi-agent pattern isn’t just for chatbots. Any task that can be decomposed into independent subtasks with distinct output schemas is a candidate. Incident response, document processing, code review, data pipeline validation—the pattern transfers.
Microsoft Agent Framework gives you ConcurrentBuilder for parallel execution and AzureOpenAIChatClient for Azure-native auth—you write the prompts, the framework handles the plumbing. Foundry Hosted Agents let you deploy containerised agents with managed infrastructure, automatic scaling, and built-in telemetry. No Kubernetes, no custom API gateway. Model Router eliminates the model selection problem. One deployment name handles all scenarios with optimal cost-performance tradeoffs. Prompt-as-config means your agents are iterable by anyone who can edit text. The feedback loop from “this output could be better” to “deployed improvement” is minutes, not sprints. Resources Microsoft Agent Framework SDK powering the multi-agent orchestration Model Router Automatic model selection based on prompt complexity Foundry Hosted Agents Deploy containerised agents on managed infrastructure ConcurrentBuilder Samples Official agents-in-workflow sample this project follows DefaultAzureCredential Zero-config auth chain used throughout Hosted Agents Concepts Architecture overview of Foundry Hosted Agents The On-Call Copilot sample is open source under the MIT licence. Contributions, scenario files, and agent instruction improvements are welcome via pull request.

Agent Hooks: Production-Grade Governance for Azure SRE Agent
Introduction Azure SRE Agent helps engineering teams automate incident response, diagnostics, and remediation tasks. But when you're giving an agent access to production systems—your databases, your Kubernetes clusters, your cloud resources—you need more than just automation. You need governance. Today, we're diving deep into Agent Hooks, the built-in governance framework in Azure SRE Agent that lets you enforce quality standards, prevent dangerous operations, and maintain audit trails without writing custom middleware or proxies. Agent Hooks work by intercepting your SRE Agent at critical execution points—before it responds to users (Stop hooks) or after it executes tools (PostToolUse hooks). You define the rules once in your custom agent configuration, and the SRE Agent runtime enforces them automatically across every conversation thread. In this post, we'll show you how to configure Agent Hooks for a real production scenario: diagnosing and remediating PostgreSQL connection pool exhaustion while maintaining enterprise controls. The Challenge: Autonomous Remediation with Guardrails You're managing a production application backed by Azure PostgreSQL Flexible Server. Your on-call team frequently deals with connection pool exhaustion issues that cause latency spikes. You want your SRE Agent to diagnose and resolve these incidents autonomously, but you need to ensure: Quality Control: The agent provides thorough, evidence-based analysis instead of superficial guesses Safety: The agent can't accidentally execute dangerous commands, but can still perform necessary remediation Compliance: Every agent action is logged for security audits and post-mortems Without Agent Hooks, you'd need to build custom middleware, write validation logic around the SRE Agent API, or settle for manual approval workflows. With Agent Hooks, you configure these controls once in your custom agent definition and the SRE Agent platform enforces them automatically. The Scenario: PostgreSQL Connection Pool Exhaustion For our demo, we'll use a real production application (octopets-prod-web) experiencing connection pool exhaustion. When this happens: P95 latency spikes from ~120ms to 800ms+ Active connections reach the pool limit New requests get queued or fail The correct remediation is to restart the PostgreSQL Flexible Server to flush stale connections—but we want our agent to do this safely and with proper oversight. Demo Setup: Three Hooks, Three Purposes We'll configure three hooks that work together to create a robust governance framework: Hook #1: Quality Gate (Stop Hook) Ensures the agent provides structured, evidence-based responses before presenting them to users. Hook #2: Safety Guardrails (PostToolUse Hook) Blocks dangerous commands while allowing safe operations through an explicit allowlist. Hook #3: Audit Trail (Global Hook) Logs every tool execution across all agents for compliance and debugging. Step-by-Step Implementation Creating the Custom Agent First, we create a specialized subagent in the Azure SRE Agent platform called sre_analyst_agent designed for PostgreSQL diagnostics. In the Agent Canvas, we configure the agent instructions: You are an SRE agent responsible for diagnosing and remediating production issues for an application backed by an Azure PostgreSQL Flexible Server. 
When investigating a problem: - Use available tools to query Azure Monitor metrics, PostgreSQL logs, and connection statistics - Look for patterns: latency spikes, connection counts, error rates, CPU/memory pressure - Quantify findings with actual numbers where possible (e.g., P95 latency in ms, active connection count, error rate %) When presenting your diagnosis, structure your response with these exact sections: ## Root Cause A precise explanation of what is causing the issue. ## Evidence Specific metrics and observations that support your root cause. Include actual numbers: latency values in ms, connection counts, error rates, timestamps. ## Recommended Actions Numbered list of remediation steps ordered by priority. Be specific — include actual resource names and exact commands. When executing a fix: - Always verify the current state before acting - Confirm the fix worked by re-checking the same metrics after the action - Report before and after numbers to show impact This explicit guidance ensures the agent knows the correct remediation path. Configuring Hook #1: Quality Gate In the Agent Canvas' Hooks tab, we add our first agent-level hook—a Stop hook that fires before the SRE Agent presents its response. This hook uses the SRE Agent's own LLM to evaluate response quality: Event Type: Stop Hook Type: Prompt Activation: Always Hook Prompt: You are a quality gate for an SRE agent that investigates database and app performance issues. Review the agent's response below: $ARGUMENTS Evaluate whether the response meets ALL of the following criteria: 1. Has a "## Root Cause" section with a specific, clear explanation (not vague — must say specifically what failed, e.g., "connection pool exhaustion due to long-running queries holding connections" not just "database issue") 2. Has a "## Evidence" section that includes at least one concrete metric or data point with an actual number (e.g., "P95 latency spiked to 847ms", "active connections: 497/500", "error rate: 23% over last 15 minutes") 3. Has a "## Recommended Actions" section with numbered, specific steps (must include actual resource names or commands, not just "restart the database") If ALL three criteria are met with substantive content, respond: {"ok": true} If ANY criterion is missing, vague, or uses placeholder text, respond: {"ok": false, "reason": "Your response needs more depth before it reaches the user. Specifically: ## Root Cause must name the exact failure mechanism, ## Evidence must include real metric values with numbers (latency in ms, connection counts, error rates), ## Recommended Actions must reference actual resource names and specific commands. Go back and verify your findings."} This hook acts as an automated quality gate built directly into the SRE Agent runtime, catching superficial responses before they reach your on-call engineers. Configuring Hook #2: Safety Guardrails Our second agent-level hook is a PostToolUse hook that fires after the SRE Agent executes Bash or Python tools. 
This implements an allowlist pattern to control what commands can actually run in production: Event Type: PostToolUse Hook Type: Command (Python) Matcher: Bash|ExecuteShellCommand|ExecutePythonCode Activation: Always Hook Script: #!/usr/bin/env python3 import sys, json, re context = json.load(sys.stdin) tool_input = context.get('tool_input', {}) command = '' if isinstance(tool_input, dict): command = tool_input.get('command', '') or tool_input.get('code', '') # Safe allowlist — check these FIRST before any blocking logic # These are explicitly approved remediation actions for PostgreSQL issues safe_allowlist = [ r'az\s+postgres\s+flexible-server\s+restart', ] for safe_pattern in safe_allowlist: if re.search(safe_pattern, command, re.IGNORECASE): print(json.dumps({ 'decision': 'allow', 'hookSpecificOutput': { 'additionalContext': '[SAFETY] ✅ PostgreSQL server restart approved — recognized as a safe remediation action for connection pool exhaustion.' } })) sys.exit(0) # Destructive commands to block dangerous = [ (r'\baz\s+postgres\s+flexible-server\s+delete\b', 'az postgres flexible-server delete (permanent server deletion)'), (r'\baz\s+\S+\s+delete\b', 'az delete (Azure resource deletion)'), (r'\brm\s+-rf\b', 'rm -rf (recursive force delete)'), (r'\bsudo\b', 'sudo (privilege escalation)'), (r'\bdrop\s+(table|database)\b', 'DROP TABLE/DATABASE (irreversible data loss)'), (r'\btruncate\s+table\b', 'TRUNCATE TABLE (irreversible data wipe)'), (r'\bdelete\s+from\b(?!.*\bwhere\b)', 'DELETE FROM without WHERE clause (wipes entire table)'), ] for pattern, label in dangerous: if re.search(pattern, command, re.IGNORECASE): print(json.dumps({ 'decision': 'block', 'reason': f'🛑 BLOCKED: {label} is not permitted. Use safe, non-destructive alternatives. For PostgreSQL connection issues, prefer server restart or connection pool configuration changes.' })) sys.exit(0) print(json.dumps({'decision': 'allow'})) This ensures only pre-approved PostgreSQL operations can execute, preventing accidental data deletion or configuration changes.
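Because command hooks communicate over stdin and stdout as plain JSON, you can exercise a hook script locally before attaching it to an agent. A minimal harness sketch; the file names and the sample context are hypothetical, and real hook payloads may carry additional fields:

#!/usr/bin/env python3
# test_safety_hook.py — hypothetical local harness for the allowlist hook above
import json
import subprocess

# A fabricated PostToolUse context; real payloads may include more fields.
sample_context = {
    "tool_name": "Bash",
    "tool_input": {"command": "az postgres flexible-server restart --name orders-db"},
}

result = subprocess.run(
    ["python3", "safety_hook.py"],  # the hook script above, saved locally
    input=json.dumps(sample_context),
    capture_output=True,
    text=True,
    check=True,
)
decision = json.loads(result.stdout)
print(decision["decision"])  # expected: "allow" (matches the safe allowlist)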
Now that we've configured both agent-level hooks, here's what our custom agent looks like in the canvas: - Overview of sre_analyst_agent with hooks. Agent Canvas showing the sre_analyst_agent configuration with two agent-level hooks attached. Configuring Hook #3: Audit Trail Finally, we create a Global hook using the Hooks management page in the Azure SRE Agent Portal. Global hooks apply across all custom agents in your organization, providing centralized governance: - Global Hooks Management Page - Creating the sre_audit_trail global hook. The Global Hooks management page showing the sre_audit_trail hook configuration with event type, activation mode, matcher pattern, and Python script editor. Event Type: PostToolUse Hook Type: Command (Python) Matcher: * (all tools) Activation: On-demand Hook Script: #!/usr/bin/env python3 import sys, json context = json.load(sys.stdin) tool_name = context.get('tool_name', 'unknown') agent_name = context.get('agent_name', 'unknown') succeeded = context.get('tool_succeeded', False) turn = context.get('current_turn', '?') audit = f'[AUDIT] Turn {turn} | Agent: {agent_name} | Tool: {tool_name} | Success: {succeeded}' print(audit, file=sys.stderr) print(json.dumps({ 'decision': 'allow', 'hookSpecificOutput': { 'additionalContext': audit } })) By setting this as "on-demand," your SRE engineers can toggle this hook on/off per conversation thread from the chat interface—enabling detailed audit logging during incident investigations without overwhelming logs during routine queries. Seeing Agent Hooks in Action Now let's see how these hooks work together when our SRE Agent investigates a real production incident. Activating Audit Trail Before starting our investigation, we toggle on the audit trail hook from the chat interface: - Managing hooks for this thread with sre_audit_trail activated. The "Manage hooks for this thread" menu showing the sre_audit_trail global hook toggled on for this conversation. This gives us visibility into every tool the agent executes during the investigation. Starting the Investigation We prompt our SRE Agent: "Can you check the octopets-prod-web application and diagnose any performance issues?" The SRE Agent begins gathering metrics from Azure Monitor, and we immediately see our audit trail hook logging each tool execution: This real-time visibility is invaluable for understanding what your SRE Agent is doing and debugging issues when things don't go as planned. Quality Gate Rejection The SRE Agent completes its initial analysis and attempts to respond. But our Stop hook intercepts it—the response doesn't meet our quality standards: - Stop hook forcing agent to provide more detailed analysis. Stop hook rejection message: "Your response needs more depth and specificity..." forcing the agent to re-analyze with more evidence. The hook rejects the response and forces the SRE Agent to retry—gathering more evidence, querying additional metrics, and providing specific numbers. This self-correction happens automatically within the SRE Agent runtime, with no manual intervention required. Structured Final Response After re-verification, the SRE Agent presents a properly structured analysis that passes our quality gate, with Root Cause, Evidence, and Recommended Actions. Agent response showing the required structure: Root Cause section with connection pool exhaustion diagnosis, Evidence section with specific metric numbers, and Recommended Actions with the exact restart command. Root Cause: Connection pool exhaustion Evidence: Specific metrics (83 active connections, P95 latency 847ms) Recommended Actions: Restart command with actual resource names This is the level of rigor we expect from production-ready agents. Safety Allowlist in Action The SRE Agent determines it needs to restart the PostgreSQL server to remediate the connection pool exhaustion. Our PostToolUse hook intercepts the command execution and validates it against our allowlist: - PostgreSQL metrics query and restart command output.
Code execution output showing the PostgreSQL metrics query results and the az postgres flexible-server restart command being executed successfully. Because the az postgres flexible-server restart command matches our safety allowlist pattern, the hook allows it to proceed. If the SRE Agent had attempted any unapproved operation (like DROP DATABASE or firewall rule changes), the safety hook would have blocked it immediately. The Results After the SRE Agent restarts the PostgreSQL server: P95 latency drops from 847ms back to ~120ms Active connections reset to healthy levels Application performance returns to normal But more importantly, we achieved autonomous remediation with enterprise governance: ✅ Quality assurance: Every response met our evidence standards (enforced by Stop hooks) ✅ Safety controls: Only pre-approved operations executed (enforced by PostToolUse hooks) ✅ Complete audit trail: Every tool call logged for compliance (enforced by Global hooks) ✅ Zero manual interventions: The SRE Agent self-corrected when quality standards weren't met This is the power of Agent Hooks—governance that doesn't get in the way of automation. Key Takeaways Agent Hooks bring production-grade governance to Azure SRE Agent: Layered Governance: Combine agent-level hooks for custom agent-specific controls with global hooks for organization-wide policies Fail-Safe by Default: Use allowlist patterns in PostToolUse hooks rather than denylists—explicitly permit safe operations instead of trying to block every dangerous one Self-Correcting SRE Agents: Stop hooks with quality gates create feedback loops that improve response quality without human intervention Audit Without Overhead: On-demand global hooks let your engineers toggle detailed logging only during incident investigations No Custom Middleware: All governance logic lives in your custom agent configuration—no need to build validation proxies or wrapper services Getting Started Agent Hooks are available now in the Azure SRE Agent platform. You can configure them entirely through the UI—no API calls or tokens needed: Agent-Level Hooks: Navigate to the Agent Canvas → Hooks tab and add hooks directly to your custom agent Global Hooks: Use the Hooks management page to create organization-wide policies Thread-Level Control: Toggle on-demand hooks from the chat interface using the "Manage hooks" menu Learn More Agent Hooks Documentation YAML Schema Reference Subagent Builder Guide Ready to build safer, smarter agents? Start experimenting with Agent Hooks today at sre.azure.com.

Getting Started with Behave: Writing Cucumber Tests in VS Code
What is Behave? Behave is a BDD test framework for Python that allows you to write tests in plain English using Given–When–Then syntax, backed by Python step definitions. Key benefits: Human‑readable test scenarios using Gherkin Strong alignment between business requirements and test automation Easy integration with CI/CD pipelines Lightweight and IDE‑friendly Prerequisites Before getting started, ensure you have the following installed: Python 3.10+ Visual Studio Code Basic understanding of Python Familiarity with BDD concepts (Given / When / Then) Steps Download the sample demo zip from GitHub. Step 1: Create a Virtual Environment and activate it. python -m venv .venv .venv\Scripts\activate Install Dependencies pip install behave requests Step 2: Install VS Code Extensions To get a first‑class experience in VS Code, install the following extensions: Python (Microsoft) Gherkin (for .feature syntax highlighting) Behave VSC (optional but recommended) The Behave VSC extension enables: Running tests directly from VS Code Step definition navigation Gherkin auto‑completion Test explorer integration Folder Structure Why This Structure? features/ – contains all Gherkin feature files steps/ – contains Python step implementations environment.py – optional hooks for setup/teardown config/configuration.py – shared test configuration behave.ini – configuration file for Behave Step 3: Write Your First Feature File Feature: Login functionality @Login Scenario: Successful login Given the application is running When the user enters valid credentials Then the user should see the dashboard Step 4: Writing Step Definitions from behave import given, when, then @given('the application is running') def step_application_running(context): print("Application is up and running") @when('the user enters valid credentials') def step_user_enters_credentials(context): print("User enters username and password") @then('the user should see the dashboard') def step_user_sees_dashboard(context): print("User sees the dashboard") Step 5: Adding Test Configuration (configuration.py) Create config/configuration.py to centralize environment-specific settings. This helps avoid hardcoding values across test files. class TestConfig: BASE_URL = "https://example.com" TIMEOUT = 30 BROWSER = "chrome" Step 6: Using Fixtures with environment.py The environment.py file is Behave’s hook mechanism. It runs before and after tests, similar to fixtures in pytest. Create features/environment.py: from config.configuration import TestConfig def before_all(context): print("Setting up test environment") context.config_data = TestConfig() def before_scenario(context, scenario): print(f"Starting scenario: {scenario.name}") def after_scenario(context, scenario): print(f"Finished scenario: {scenario.name}") def after_all(context): print("Tearing down test environment") Common Use Cases Initialize browsers or API clients Load environment variables Clean up test data Open/close DB connections
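To tie Steps 5 and 6 together, the placeholder step from Step 4 could later be evolved to consume the shared configuration attached to the context. A small sketch, assuming the application exposes a /health endpoint (hypothetical):

import requests
from behave import given

@given('the application is running')
def step_application_running(context):
    # context.config_data was attached in before_all (environment.py)
    cfg = context.config_data
    response = requests.get(f"{cfg.BASE_URL}/health", timeout=cfg.TIMEOUT)
    assert response.status_code == 200, f"App not healthy: {response.status_code}"

Because requests was installed in Step 1, this needs no extra dependencies; swap the URL and assertion for whatever readiness signal your application actually provides.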
Step 7: Optional Behave Configuration File Create behave.ini for execution settings. This helps during debugging by showing logs directly in the console. [behave] stdout_capture = false stderr_capture = false log_capture = false Step 8: Running Tests From the project root, run: behave To run a specific feature: behave features/login.feature Run by tag: behave -t Login Best Practices ✔ Keep feature files business-readable ✔ Avoid logic in feature files ✔ Reuse steps wherever possible ✔ Centralize configs and fixtures ✔ Use tags for selective execution

Introducing a Sample for Building a Practical AI App with Microsoft Agent Framework, Microsoft Foundry, MCP, and Aspire
Building an AI agent itself has become easier than it used to be. But deploying agents as part of a real production application (operating them with multiple services, persistent state management, and production-grade infrastructure) quickly becomes complex. Developers in the .NET community have also been asking for a cloud-native, production-grade sample that runs both locally and in the cloud. In response, we built an open-source sample app called Interview Coach: an AI chat web app that conducts mock job interviews. The sample shows how the following technologies fit together in a production-oriented service: Microsoft Agent Framework, Microsoft Foundry, Model Context Protocol (MCP), and Aspire. The app is a working interview simulator: an AI coach asks the user behavioral and technical questions and, at the end, provides a summary of their interview performance as feedback. This post introduces the design patterns the app uses and the problems they solve. You can try the Interview Coach demo app here.

Why Microsoft Agent Framework? If you have been building AI agents in .NET, you have probably used Semantic Kernel, AutoGen, or both. Microsoft Agent Framework is the next step: built by the same team, it integrates both projects into a single framework. Concretely, it combines AutoGen's agent abstractions with Semantic Kernel's enterprise features (state management, type safety, middleware, telemetry, and so on) and adds graph-based workflows for multi-agent orchestration. For .NET developers, the benefits are: One unified framework: no need to agonize over choosing between Semantic Kernel and AutoGen. Familiar development patterns: agents use dependency injection, IChatClient, and the same hosting model as ASP.NET apps. Designed for production: OpenTelemetry, middleware pipelines, and Aspire integration are available from the start. Multi-agent orchestration: sequential workflows, parallel execution, handoff patterns, group chat, and more. Interview Coach brings these capabilities together as a real application rather than a mere Hello World.

Why Microsoft Foundry? AI agents need more than just a model; they also need infrastructure. Microsoft Foundry is Azure's platform for building and managing AI applications, and it is also the recommended backend for Microsoft Agent Framework. With Foundry, you can use the following from a single portal: Model access: a catalog of models from OpenAI, Meta, Mistral, and others, available through one endpoint. Content safety: built-in moderation and PII detection keep agents from producing problematic output. Cost-optimized routing: requests are automatically routed to the model best suited to the task. Evaluation and fine-tuning: measure agent quality and improve it continuously. Enterprise governance: authentication, access control, and compliance via Entra ID and Microsoft Defender. In Interview Coach, Foundry provides the model endpoint that powers the agents. Because the agent code targets the IChatClient interface, Foundry is just one configuration option, but it is the most convenient one thanks to the rich tooling available from day one. What does Interview Coach do?
Interview Coach is a conversational AI that conducts mock job interviews. The user provides a resume and a job description, and from there the agents drive the interview process. Intake: collects the resume and the job description. Behavioral interview: asks questions tailored to your experience based on the STAR method (a framework for structuring answers about past behavior, named for Situation, Task, Action, Result). Technical interview: asks technical questions matched to the role you are applying for. Summary: evaluates your interview performance and generates a review with concrete feedback. The user interacts with the system through a Blazor web UI, and the AI's responses are streamed in real time.

Aside: what is a behavioral interview? A behavioral interview is an interviewing technique that digs into a candidate's specific past behavior to judge whether their behavioral traits, skills, and way of thinking fit the profile the company is looking for. Rather than relying on knowledge or stated motivation alone, it predicts future performance from past facts, such as how the candidate coped when under stress.

Architecture overview The application is split into multiple services, all orchestrated by Aspire: LLM Provider: Microsoft Foundry (recommended) for access to a variety of models. WebUI: a Blazor-based chat interface for the interview conversation. Agent: the component that owns the interview logic, built on Microsoft Agent Framework. MarkItDown MCP Server: uses Microsoft's MarkItDown (a Python library that turns almost anything into Markdown) to convert resumes (PDF or DOCX) to Markdown for parsing. InterviewData MCP Server: an MCP server implemented in .NET that persists interview session data to SQLite. Aspire manages service discovery, health checks, and telemetry. Each component runs as an independent process, and a single command starts the whole application.

Pattern 1: Multi-agent handoff What makes this sample especially interesting is its use of the handoff pattern. Instead of a single agent handling everything, the interview process is split across five specialist agents:

| Agent | Role | Tools |
| --- | --- | --- |
| Triage | Routes each message to the right agent | None (routing only) |
| Receptionist | Creates the session and collects the resume and job description | MarkItDown + InterviewData |
| Behavioral Interviewer | Conducts the behavioral interview using the STAR method | InterviewData |
| Technical Interviewer | Asks technical questions matched to the role | InterviewData |
| Summarizer | Generates the final interview summary | InterviewData |

In the handoff pattern, one agent hands control of the conversation over to the next agent entirely; the receiving agent then owns the rest of the conversation. This differs from the agent-as-tools pattern, in which a main agent calls other agents as helper tools but keeps control of the conversation itself. Here is how the handoff workflow is configured: var workflow = AgentWorkflowBuilder .CreateHandoffBuilderWith(triageAgent) .WithHandoffs(triageAgent, [receptionistAgent, behaviouralAgent, technicalAgent, summariserAgent]) .WithHandoffs(receptionistAgent, [behaviouralAgent, triageAgent]) .WithHandoffs(behaviouralAgent, [technicalAgent, triageAgent]) .WithHandoffs(technicalAgent, [summariserAgent, triageAgent]) .WithHandoff(summariserAgent, triageAgent) .Build(); The happy path runs Receptionist → Behavioral → Technical → Summarizer, with each specialist handing off directly to the next. If something unexpected happens, the agent returns to the Triage agent, which re-routes the conversation. The sample also includes a single-agent mode for simpler deployments, which lets you compare the single-agent and multi-agent approaches.

Pattern 2: MCP for tool integration In this project, tools are not implemented inside the agents; each is provided as an independent MCP (Model Context Protocol) server. The MarkItDown server, for example, can be reused not just here but in a completely different agent project, and a tools team can release tools independently of the agents team. MCP is also language-agnostic: in this sample, MarkItDown runs as a Python server while the agents are implemented in .NET. At startup, the agent discovers tools through an MCP client and passes them to the agents that need them. var receptionistAgent = new ChatClientAgent( chatClient: chatClient, name: "receptionist", instructions: "You are the Receptionist. Set up sessions and collect documents...", tools: [.. markitdownTools, .. interviewDataTools]); Each agent is assigned only the tools it needs: Triage agent: no tools (routing only) Interviewer agents: access to session data Receptionist agent: document parsing + session access This design follows the principle of least privilege.

Pattern 3: Orchestration with Aspire Aspire is responsible for managing the application as a whole. The app host defines the service topology: which services exist, how they depend on each other, and what configuration each receives. This gives you the following capabilities: Service discovery.
Services find each other by service name rather than fixed URLs. Health checks. The Aspire dashboard shows the status of each component. Distributed tracing. OpenTelemetry is wired in through the shared service defaults. One-command startup. Running aspire run --file ./apphost.cs starts every service. At deployment time, running azd up deploys the entire application to Azure Container Apps.

Getting started Prerequisites: .NET 10 SDK or later An Azure subscription A Microsoft Foundry project Docker Desktop or another container runtime Run locally: git clone https://github.com/Azure-Samples/interview-coach-agent-framework.git cd interview-coach-agent-framework # Configure credentials dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:Endpoint "<your-endpoint>" dotnet user-secrets --file ./apphost.cs set MicrosoftFoundry:Project:ApiKey "<your-key>" # Start all services aspire run --file ./apphost.cs Open the Aspire dashboard and wait until every service reports Running. Then click the WebUI endpoint to start a mock interview. The DevUI visualizes how the handoff pattern behaves, and the chat UI lets you talk to the agents as the interview candidate.

Deploy to Azure azd auth login azd up That's it; Aspire and azd handle the rest automatically. Once you have finished deploying and testing, you can safely delete every created resource by running azd down --force --purge

What you can learn from this sample Working through Interview Coach will teach you: How to use Microsoft Foundry as a model backend Building single-agent and multi-agent systems with Microsoft Agent Framework Splitting a workflow across specialist agents with handoff orchestration Creating and consuming MCP tool servers that are independent of the agent code Orchestrating a multi-service application with Aspire Prompt design that produces consistent, structured behavior Deploying the entire application with azd up

Try it out The full source code is available on GitHub: Azure-Samples/interview-coach-agent-framework If you are new to Microsoft Agent Framework, we recommend starting with the framework documentation and the Hello World sample. Then come back to this sample to see how these pieces fit together in a larger project. If you build something with these patterns, please open an issue and let us know.

What's next? We are currently working on further integration scenarios, including: Microsoft Foundry Agent Service GitHub Copilot A2A and more. We will keep updating this sample as these capabilities ship.

Resources Microsoft Agent Framework documentation Introducing Microsoft Agent Framework preview Microsoft Agent Framework Reaches Release Candidate Microsoft Foundry documentation Microsoft Foundry Agent Service Microsoft Foundry Portal Microsoft.Extensions.AI Model Context Protocol specification Aspire documentation ASP.NET Blazor

Phi-4-Reasoning-Vision-15B: Use Cases In-Depth
Phi-4-Reasoning-vision-15B is Microsoft's latest vision reasoning model released on Microsoft Foundry. It combines high-resolution visual perception with selective, task-aware reasoning, making it the first model in the Phi-4 family to simultaneously achieve both "seeing clearly" and "thinking deeply" as a small language model (SLM). Traditional vision models only perform passive perception — recognizing "what's in" an image. Phi-4-Reasoning-Vision-15B goes further by performing structured, multi-step reasoning: understanding visual structure in images, connecting it with textual context, and reaching actionable conclusions. This enables developers to build intelligent applications ranging from chart analysis to GUI automation. Core Design Features 2.1 Selective Reasoning The model's most critical design feature is its hybrid reasoning behavior. It can switch between "reasoning mode" and "non-reasoning mode" based on the prompt: When deep reasoning is needed (e.g., math problems, logical analysis) → Multi-step reasoning chain is activated When fast perception is sufficient (e.g., OCR, element localization) → Direct output with reduced latency 2.2 Three Thinking Modes (from Notebook Examples) Developers can precisely control reasoning behavior via the thinking_mode parameter:

| Mode | Trigger | Description | Best For |
| --- | --- | --- | --- |
| hybrid (Mixed) | Default | Model autonomously decides whether deep reasoning is needed | General use, balancing speed and accuracy |
| think (Deep Thinking) | Appends <think> token | Forces full reasoning chain | Complex math / science / logic problems |
| nothink (Fast Response) | Appends <nothink> token | Skips reasoning chain, outputs directly | Low-latency perception tasks, simple Q&A |

The corresponding code implementation: def run_inference(processor, model, prompt, image, thinking_mode="hybrid"): ## FORM MESSAGE AND LOAD IMAGE messages = [ { "role": "user", "content": prompt, } ] ## PROCESS INPUTS prompt = processor.tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True, return_dict=False, ) if thinking_mode == "think": prompt = str(prompt) + "<think>" elif thinking_mode == "nothink": prompt = str(prompt) + "<|dummy_84|>" print(f"Prompt: {prompt}") inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device) ## GENERATE RESPONSE output_ids = model.generate( **inputs, max_new_tokens=1024, temperature=None, top_p=None, do_sample=False, use_cache=False, ) ## DECODE RESPONSE sequence_length = inputs["input_ids"].shape[1] sequence_length -= 1 if thinking_mode == "think" else 0 # account for the extra <think> token appended in think mode new_output_ids = output_ids[:, sequence_length:] model_output = processor.batch_decode( new_output_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False )[0] return model_output This design allows developers to dynamically balance latency and accuracy at runtime — essential for real-time interactive applications. Key Use Cases Use Case 1: GUI Agents (Computer Use Agents) This is one of the model's most important application areas. The model receives a screenshot and a natural language instruction, then outputs the normalized bounding box coordinates for the target UI element. The Notebook also provides a plot_boxes() visualization function that compares model predictions (red box) against ground truth annotations (green box).
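The notebook's actual helper may differ; here is a minimal sketch of what such a function can look like, assuming boxes are given as normalized (x_min, y_min, x_max, y_max) corners and the image is a NumPy array:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

def plot_boxes(image, predicted_box, ground_truth_box):
    """Overlay predicted (red) and ground-truth (green) boxes on a screenshot.

    Boxes are (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates.
    """
    fig, ax = plt.subplots()
    ax.imshow(image)
    height, width = image.shape[:2]  # assumes an HxWxC image array
    for box, color in [(predicted_box, "red"), (ground_truth_box, "green")]:
        x0, y0, x1, y1 = box
        ax.add_patch(patches.Rectangle(
            (x0 * width, y0 * height),               # top-left corner in pixels
            (x1 - x0) * width, (y1 - y0) * height,   # box width and height
            linewidth=2, edgecolor=color, facecolor="none",
        ))
    ax.axis("off")
    plt.show()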
Key Use Cases

Use Case 1: GUI Agents (Computer Use Agents)

This is one of the model's most important application areas. The model receives a screenshot and a natural language instruction, then outputs the normalized bounding box coordinates for the target UI element. The Notebook also provides a plot_boxes() visualization function that compares model predictions (red box) against ground-truth annotations (green box).

Real-World Example: E-Commerce Shopping Agent

As described in the official documentation, in retail scenarios the model serves as the perception layer for computer-use agents:

- Screen comprehension: identifies products, prices, filters, promotions, buttons, and cart states
- Grounded output: produces actionable coordinates that upstream agent models (e.g., Fara-7B) can use to execute clicks, scrolls, and other interactions
- Real-time decision support: compact model size and low-latency inference, suitable for navigating dense product listings and comparing options

Use Case 2: Mathematical and Scientific Visual Reasoning

Typical applications:

- Interpreting geometric figures and function graphs for problem-solving
- Analyzing scientific experiment diagrams and data charts
- Education: students photograph and upload problems, and the model shows the complete reasoning process and solution steps

Use Case 3: Document, Chart, and Table Understanding

Typical applications:

- IT operations: interpreting monitoring dashboards, performance charts, and incident reports to assist diagnosis and decision-making
- Financial analysis: extracting metrics from report screenshots and interpreting trends
- Enterprise report automation: processing scanned documents and tables to generate structured summaries

Samples

1. Using Phi-4-Reasoning-Vision-15B to detect jaywalking: Sample Code
2. Using Phi-4-Reasoning-Vision-15B for math: Sample Code
3. Using Phi-4-Reasoning-Vision-15B as a GUI agent: Sample Code

Model Comparison at a Glance

Below is a comparison of Phi-4-Reasoning-Vision-15B against comparable models on key tasks (benchmark charts for "No Thinking Mode" and "Thinking Mode"). Phi-4-Reasoning-Vision-15B shows clear advantages in math reasoning and GUI grounding tasks while remaining competitive in general multimodal understanding.

Summary

Phi-4-Reasoning-Vision-15B represents a significant milestone for small vision reasoning models:

- Sees clearly: high-resolution visual perception supporting documents, charts, UI screenshots, and more
- Thinks deeply: selective multi-step reasoning chains that rival larger models on complex tasks
- Runs fast: 15B parameters plus NoThink mode, suitable for real-time interactive applications
- Adapts flexibly: three thinking modes switchable on the fly, letting developers dynamically balance accuracy and latency at runtime

Whether you are building e-commerce shopping agents, IT operations assistants, or educational tutoring tools, this model provides a complete capability chain from "seeing" to "understanding" to "acting."

Resources

1. Read the official blog: Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model
2. Learn more about Phi-4-reasoning-vision on Hugging Face: https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B
3. Learn more about the Microsoft Phi family: Microsoft Phi Cookbook
Giving Your AI Agents Reliable Skills with the Agent Skills SDK

AI agents are becoming increasingly capable, but they often do not have the context they need to do real work reliably. Your agent can reason well, but it does not actually know how to do the specific things your team needs it to do. For example, it cannot follow your company's incident response playbook, it does not know your escalation policy, and it has no idea how to page the on-call engineer at 3 AM. There are many ways to close this gap, from RAG to custom tool implementations. Agent Skills is one approach that stands out because it is designed around portability and progressive disclosure, keeping context window usage minimal while giving agents access to deep expertise on demand.

What is Agent Skills?

Agent Skills is an open format for giving agents new capabilities and expertise. The format was originally developed by Anthropic and released as an open standard. It is now supported by a growing list of agent products including Claude Code, VS Code, GitHub, OpenAI Codex, Cursor, Gemini CLI, and many others.

As defined in the spec, a skill is a folder on disk containing a SKILL.md file with metadata and instructions, plus optional scripts, references, and assets:

```
incident-response/
  SKILL.md               # Required: instructions + metadata
  references/            # Optional: additional documentation
    severity-levels.md
    escalation-policy.md
  scripts/               # Optional: executable code
    page-oncall.sh
  assets/                # Optional: templates, diagrams, data files
```

The SKILL.md file has YAML frontmatter with a name and description (so agents know when the skill is relevant), followed by markdown instructions that tell the agent how to perform the task. The format is intentionally simple: self-documenting, extensible, and portable.

What makes this design practical is progressive disclosure. The spec is built around the idea that agents should not load everything at once. It works in three stages:

1. Discovery: At startup, agents load only the name and description of each available skill, just enough to know when it might be relevant.
2. Activation: When a task matches a skill's description, the agent reads the full SKILL.md instructions into context.
3. Execution: The agent follows the instructions, optionally loading referenced files or executing bundled scripts as needed.

This keeps agents fast while giving them access to deep context on demand.
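For illustration, here is what the SKILL.md behind that flow might look like for the incident-response skill above. The frontmatter fields follow the spec; the instruction content itself is invented:

```markdown
---
name: incident-response
description: Guides the agent through the team's incident response playbook,
  including severity classification and escalation.
---

# Incident Response

When the user reports an incident:

1. Classify severity using references/severity-levels.md.
2. For SEV1 incidents, follow references/escalation-policy.md and run
   scripts/page-oncall.sh to page the on-call engineer.
3. Summarize the actions taken and cite the reference documents used.
```

Only the `name` and `description` are loaded at discovery; the body and the referenced files come in during activation and execution.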
The format is well-designed and widely adopted, but if you want to use skills from your own agents, there is a gap between the spec and a working implementation.

The Agent Skills SDK

Conceptually, a skill is more than a folder. It is a unit of expertise: a name, a description, a body of instructions, and a set of supporting resources. The file layout is one way to represent that, but there is nothing about the concept that requires a filesystem. The Agent Skills SDK is an open-source Python library built around that idea, treating skills as abstract units of expertise that can be stored anywhere and consumed by any agent framework. It does this by addressing two challenges that come up when you try to use the format from your own agents.

The first is where skills live. The spec defines skills as folders on disk, and the tools that support the format today all assume skills are local files. Files are inherently portable, and that is one of the format's strengths. But in the real world, not every team can or wants to serve skills from the filesystem. Maybe your team keeps them in an S3 bucket. Maybe they are in Azure Blob Storage behind your CDN. Maybe they live in a database alongside the rest of your application data. At the moment, if your skills are not on the local filesystem, you are on your own. The SDK changes where skills are served from, not how they are authored. The content and format stay the same regardless of the storage backend, so skills remain portable across providers.

The second is how agents consume them. The spec defines the progressive disclosure pattern, but actually implementing it in your agent requires real work. You need to figure out how to validate skills against the spec, generate a catalog for the system prompt, expose the right tools for on-demand content retrieval, and handle the back-and-forth of the agent requesting metadata, then the body, then individual references or scripts. That is a lot of plumbing regardless of where the skills are stored, and the work multiplies if you want to support more than one agent framework.

The SDK solves both by separating where skills come from (providers) from how agents use them (integrations), so you can mix and match freely. Load skills from the filesystem today, move them to an HTTP server tomorrow, swap in a custom database provider next month, and your agent code does not change at all.

How the SDK works

The SDK is a set of Python packages organized around two ideas: storage-agnostic providers and progressive disclosure.

The provider abstraction means your skills can live anywhere. The SDK ships with providers for the local filesystem and static HTTP servers, but the SkillProvider interface is simple enough that you can write your own in a few methods: a Cosmos DB provider, a Git provider, a SharePoint provider, whatever makes sense for your team. The rest of the SDK does not care where the data comes from.
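As a sketch of that extensibility, a custom provider might look roughly like the following. The method names and signatures here are assumptions made for illustration, not the SDK's actual SkillProvider interface; check the API docs in the repo for the real contract:

```python
# A hypothetical database-backed provider. The agentskills-core package is
# documented to define the SkillProvider interface, but the methods below
# are assumed, not copied from it.
from agentskills_core import SkillProvider  # assumed import path

class DatabaseSkillProvider(SkillProvider):
    def __init__(self, db):
        self.db = db  # any client exposing a get(key) -> str method

    async def get_manifest(self, skill_name: str) -> str:
        # Return the SKILL.md content used for discovery and activation.
        return self.db.get(f"{skill_name}/SKILL.md")

    async def get_resource(self, skill_name: str, path: str) -> str:
        # Return a reference, script, or asset bundled with the skill.
        return self.db.get(f"{skill_name}/{path}")
```

Because the rest of the SDK only talks to the provider interface, a provider like this drops into the same registry and tooling shown below.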
On top of that, the SDK implements the progressive disclosure pattern from the spec as a set of tools that any LLM agent can use. At startup, the SDK generates a skills catalog containing each skill's name and description. Your agent injects this catalog into its system prompt so it knows what is available. Then, during a conversation, the agent calls tools to retrieve content on demand, following the same discovery-activation-execution flow the spec describes.

Here is the flow in practice:

1. You register skills from any source (local files, an HTTP server, your own database).
2. The SDK generates a catalog and tool usage instructions, which you inject into the system prompt.
3. The agent calls tools to retrieve content on demand.

This matters because context windows are finite. An incident response skill might have a main body, three reference documents, two scripts, and a flowchart. The agent should not load all of that upfront. It should read the body first, then pull the escalation policy only when the conversation actually gets to escalation.

A quick example

Here is what it looks like in practice. Start by loading a skill from the filesystem:

```python
from pathlib import Path

from agentskills_core import SkillRegistry
from agentskills_fs import LocalFileSystemSkillProvider

provider = LocalFileSystemSkillProvider(Path("my-skills"))
registry = SkillRegistry()
await registry.register("incident-response", provider)
```

Now wire it into a LangChain agent:

```python
from langchain.agents import create_agent

from agentskills_langchain import get_tools, get_tools_usage_instructions

tools = get_tools(registry)
skills_catalog = await registry.get_skills_catalog(format="xml")
tool_usage_instructions = get_tools_usage_instructions()

system_prompt = (
    "You are an SRE assistant. Use the available skill tools to look up "
    "incident response procedures, severity definitions, and escalation "
    "policies. Always cite which reference document you used.\n\n"
    f"{skills_catalog}\n\n"
    f"{tool_usage_instructions}"
)

agent = create_agent(
    llm,
    tools,
    system_prompt=system_prompt,
)
```

That is it. The agent now knows what skills are available and has tools to fetch their content. When a user asks "How do I handle a SEV1 incident?", the agent will call get_skill_body to read the instructions, then get_skill_reference to pull the severity levels document, all without you writing any of that retrieval logic.

The same pattern works with Microsoft Agent Framework:

```python
from agentskills_agentframework import get_tools, get_tools_usage_instructions

tools = get_tools(registry)
skills_catalog = await registry.get_skills_catalog(format="xml")
tool_usage_instructions = get_tools_usage_instructions()

system_prompt = (
    "You are an SRE assistant. Use the available skill tools to look up "
    "incident response procedures, severity definitions, and escalation "
    "policies. Always cite which reference document you used.\n\n"
    f"{skills_catalog}\n\n"
    f"{tool_usage_instructions}"
)

agent = Agent(
    client=client,
    instructions=system_prompt,
    tools=tools,
)
```

What is in the SDK

The SDK is split into small, composable packages so you only install what you need:

- agentskills-core handles registration, validation, the skills catalog, and the progressive disclosure API. It also defines the SkillProvider interface that all providers implement.
- agentskills-fs and agentskills-http are the two built-in providers. The filesystem provider loads skills from local directories. The HTTP provider loads them from any static file host: S3, Azure Blob Storage, GitHub Pages, a CDN, or anything that serves files over HTTP.
- agentskills-langchain and agentskills-agentframework generate framework-native tools and tool usage instructions from a skill registry.
- agentskills-mcp-server spins up an MCP server that exposes skill access and usage instructions as tools and resources, so any MCP-compatible client can use them.

Because providers and integrations are separate packages, you can combine them however you want. Use the filesystem provider during development, switch to the HTTP provider in production, or write a custom provider that reads skills from your own database. The integration layer does not need to know or care.

Where to go from here

The full source, working examples, and detailed API docs are on GitHub: github.com/pratikxpanda/agentskills-sdk

The repo includes end-to-end examples for both LangChain and Microsoft Agent Framework, covering filesystem providers, HTTP providers, and MCP. There is also a sample incident-response skill you can use to try things out. A proposal to contribute this SDK to the official agentskills repository has been submitted. If you find it useful, feel free to show your support on the GitHub issue.

To learn more about the Agent Skills format itself:

- What are skills? covers the format and why it matters.
- Specification is the complete format reference for SKILL.md files.
- Integrate skills explains how to add skills support to your agent.
- Example skills on GitHub are a good starting point for writing your own.

The SDK is MIT licensed and contributions are welcome. If you have questions or ideas, post a question here or open an issue on the repo.
Building MCP Apps with Azure Functions MCP Extension

Today, we are thrilled to announce the release of MCP App support in the Azure Functions MCP (Model Context Protocol) extension! You can now build MCP Apps using the Functions MCP extension in Python, TypeScript, and .NET.

What are MCP Apps

Until now, MCP has primarily been a way for AI agents to "talk" to data and tools. A tool would take an input, perform a task, and return a text response. While powerful, text has limits. For example, it is easier to see a chart than to read a long list of data points. It is also more convenient and accurate to provide complex inputs via a form than through a series of text responses. MCP Apps address these limits by allowing MCP servers to return interactive HTML interfaces that render directly in the conversation.

The following scenarios show how the UI capabilities of MCP Apps improve the user experience of MCP tools in ways that text cannot:

- Data exploration: A sales analytics tool returns an interactive dashboard. Users filter by region, drill down into specific accounts, and export reports without leaving the conversation.
- Configuration wizards: A deployment tool presents a form with dependent fields. Selecting "production" reveals additional security options; selecting "staging" shows different defaults.
- Real-time monitoring: A server health tool shows live metrics that update as systems change. No need to re-run the tool to see current status.

Building MCP Apps with Azure Functions MCP Extension

Azure Functions is the ideal platform for hosting remote MCP servers because of its built-in authentication, event-driven scaling from 0 to N, and serverless billing. This ensures your agentic tools are secure, cost-effective, and ready to handle any load.

How It Works: Connecting Tools to Resources

Building an MCP App involves two main components:

- Tools: executable functions that allow an LLM to interact with external systems (e.g., querying a database or sending an email).
- Resources: read-only data entities (e.g., log files, API docs, or database schemas) that provide the LLM with information without triggering side effects.

You connect the tools to resources via the tools' metadata.

1. The Tool with UI Metadata

The following code snippet defines an MCP tool called GetWeather using the McpToolTrigger and associated metadata using McpMetadata. The metadata declares that the tool has an associated UI, telling AI clients that when this tool is invoked, there is a specific visual component available to display the results.

Example (Python):

```python
TOOL_METADATA = '{"ui": {"resourceUri": "ui://weather/index.html"}}'

@app.mcp_tool(metadata=TOOL_METADATA)
@app.mcp_tool_property(arg_name="location", description="City name to check weather for (e.g., Seattle, New York, Miami)")
def get_weather(location: str) -> str:
    result = weather_service.get_current_weather(location)
    return json.dumps(result)
```

Example (C#):

```csharp
private const string ToolMetadata = """
    {
      "ui": {
        "resourceUri": "ui://weather/index.html"
      }
    }
    """;

[Function(nameof(GetWeather))]
public async Task<object> GetWeather(
    [McpToolTrigger(nameof(GetWeather), "Returns current weather for a location via Open-Meteo.")]
    [McpMetadata(ToolMetadata)] ToolInvocationContext context,
    [McpToolProperty("location", "City name to check weather for (e.g., Seattle, New York, Miami)")] string location)
{
    var result = await _weatherService.GetCurrentWeatherAsync(location);
    return result;
}
```
2. The Resource Serving the UI

The following snippet defines an MCP resource called GetWeatherWidget, which serves the bundled HTML at the matching URI. The MimeType is set to text/html;profile=mcp-app. Note that the resource URI (ui://weather/index.html) is the same as the one specified in the tool metadata above.

Example (Python):

```python
RESOURCE_METADATA = '{"ui": {"prefersBorder": true}}'
WEATHER_WIDGET_URI = "ui://weather/index.html"
WEATHER_WIDGET_NAME = "Weather Widget"
WEATHER_WIDGET_DESCRIPTION = "Interactive weather display for MCP Apps"
WEATHER_WIDGET_MIME_TYPE = "text/html;profile=mcp-app"

@app.mcp_resource_trigger(
    arg_name="context",
    uri=WEATHER_WIDGET_URI,
    resource_name=WEATHER_WIDGET_NAME,
    description=WEATHER_WIDGET_DESCRIPTION,
    mime_type=WEATHER_WIDGET_MIME_TYPE,
    metadata=RESOURCE_METADATA
)
def get_weather_widget(context) -> str:
    # Get the path to the widget HTML file
    current_dir = Path(__file__).parent
    file_path = current_dir / "app" / "dist" / "index.html"
    return file_path.read_text(encoding="utf-8")
```

Example (C#):

```csharp
// Optional UI metadata
private const string ResourceMetadata = """
    {
      "ui": {
        "prefersBorder": true
      }
    }
    """;

[Function(nameof(GetWeatherWidget))]
public string GetWeatherWidget(
    [McpResourceTrigger(
        "ui://weather/index.html",
        "Weather Widget",
        MimeType = "text/html;profile=mcp-app",
        Description = "Interactive weather display for MCP Apps")]
    [McpMetadata(ResourceMetadata)] ResourceInvocationContext context)
{
    var file = Path.Combine(AppContext.BaseDirectory, "app", "dist", "index.html");
    return File.ReadAllText(file);
}
```

See the quickstarts in the Get Started section for full sample code.

3. Putting It All Together

1. User asks: "What's the weather in Seattle?"
2. The agent calls the GetWeather tool.
3. The tool returns weather data (as a normal tool result).
4. The tool also includes ui.resourceUri metadata (ui://weather/index.html) telling the client an interactive UI is available.
5. The client fetches the UI resource from ui://weather/index.html and loads it in a sandboxed iframe.
6. The client passes the tool result to the UI app.
7. The user sees an interactive weather widget instead of plain text.

Get Started

You can start building today using our samples. Each sample demonstrates how to define tools that trigger interactive UI components:

- Python quickstart
- TypeScript quickstart
- .NET quickstart

Documentation

- Learn more about the Azure Functions MCP extension.
- Learn more about MCP Apps.

Next Step: Authentication

The samples above secure the MCP Apps using access keys. Learn how to secure the apps using Microsoft Entra and the built-in MCP auth feature.
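To exercise the round trip from outside a UI-capable client, here is a minimal sketch that calls the deployed server with the official MCP Python SDK. The endpoint URL shape, the key header, and the tool name are assumptions based on the Python sample above; check your Function App's output for the actual MCP endpoint and key configuration:

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # Assumed endpoint shape for the Functions MCP extension; replace with
    # the MCP URL shown for your deployed Function App.
    url = "https://<your-function-app>.azurewebsites.net/runtime/webhooks/mcp"
    headers = {"x-functions-key": "<your-mcp-extension-key>"}  # access-key auth, per the samples

    async with streamablehttp_client(url, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Tool name assumed to match the Python sample's get_weather function.
            result = await session.call_tool("get_weather", {"location": "Seattle"})
            print(result.content)
            # A UI-capable client would additionally fetch ui://weather/index.html
            # and render it in a sandboxed iframe, as described above.

asyncio.run(main())
```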