
Microsoft Developer Community Blog
11 MIN READ

Building a Multi-Agent On-Call Copilot with Microsoft Agent Framework

Lee_Stott
Microsoft
Mar 12, 2026

Hosted with Microsoft Foundry Hosted Agents

Four AI agents, one incident payload, structured triage in under 60 seconds, powered by Microsoft Agent Framework and Foundry Hosted Agents.

Multi-Agent · Microsoft Agent Framework · Foundry Hosted Agents · Python · SRE / Incident Response

When an incident fires at 3 AM, every second the on-call engineer spends piecing together alerts, logs, and metrics is a second not spent fixing the problem. What if an AI system could ingest the raw incident signals and hand you a structured triage, a Slack update, a stakeholder brief, and a draft post-incident report, all in well under a minute?

That’s exactly what On-Call Copilot does. In this post, we’ll walk through how we built it using the Microsoft Agent Framework, deployed it as a Foundry Hosted Agent, and discuss the key design decisions that make multi-agent orchestration practical for production workloads.

The full source code is open-source on GitHub. You can deploy your own instance with a single azd up.

Why Multi-Agent? The Problem with Single-Prompt Triage

Early AI incident assistants used a single large prompt: “Here is the incident. Give me root causes, actions, a Slack message, and a post-incident report.” This approach has two fundamental problems:

  1. Context overload. A real incident may have 800 lines of logs, 10 alert lines, and dense metrics. Asking one model to process everything and produce four distinct output formats in a single turn pushes token limits and degrades quality.
  2. Conflicting concerns. Triage reasoning and communication drafting are cognitively different tasks. A model optimised for structured JSON analysis often produces stilted Slack messages—and vice versa.

The fix is specialisation: decompose the task into focused agents, give each agent a narrow instruction set, and run them in parallel. This is the core pattern that the Microsoft Agent Framework makes easy.
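
Stripped of framework details, the pattern is plain fan-out: every specialist receives the same payload and runs concurrently. An illustrative sketch (assuming each agent object exposes an awaitable run(); the ConcurrentBuilder code later in this post does the real wiring):

Illustrative fan-out (plain asyncio)
import asyncio

async def fan_out(agents, incident_json: str):
    """Run every specialist on the same payload concurrently."""
    # Each agent sees the full incident but has a narrow instruction set.
    return await asyncio.gather(*(agent.run(incident_json) for agent in agents))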

Architecture: Four Agents Running Concurrently

On-Call Copilot is deployed as a Foundry Hosted Agent—a containerised Python service running on Microsoft Foundry’s managed infrastructure. The core orchestrator uses ConcurrentBuilder from the Microsoft Agent Framework SDK to run four specialist agents in parallel via asyncio.gather().

On-Call Copilot results: all four agent panels showing Triage, Summary, Comms, and PIR output
All four panels populated simultaneously: Triage (red), Summary (blue), Comms (green), PIR (purple).

Architecture diagram: the ConcurrentBuilder orchestrator runs the Triage, Summary, Comms, and PIR agents in parallel via asyncio.gather(), then merges their JSON fragments into a single response.

All four agents share a single Azure OpenAI Model Router deployment. Rather than hardcoding gpt-4o or gpt-4o-mini, Model Router analyses request complexity and routes automatically: a simple triage prompt costs less, while a long post-incident synthesis uses a more capable model. One deployment name, zero model-selection code.
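
In code, that routing is invisible: the deployment name is pure configuration. A minimal sketch, assuming AzureOpenAIChatClient accepts an explicit deployment_name and otherwise falls back to the AZURE_OPENAI_CHAT_DEPLOYMENT_NAME environment variable set in agent.yaml:

Model Router wiring (sketch)
import os

from agent_framework.azure import AzureOpenAIChatClient

# The only "model selection" in the codebase is a deployment name;
# Model Router picks the underlying model per request.
client = AzureOpenAIChatClient(
    deployment_name=os.environ.get(
        "AZURE_OPENAI_CHAT_DEPLOYMENT_NAME", "model-router"
    ),
)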

Meet the Four Agents

🔍 Triage Agent

Root cause analysis, immediate actions, missing data identification, and runbook alignment.

suspected_root_causes · immediate_actions · missing_information · runbook_alignment

📋 Summary Agent

Concise incident narrative: what happened and current status (ONGOING / MITIGATED / RESOLVED).

summary.what_happened · summary.current_status

📢 Comms Agent

Audience-appropriate communications: Slack channel update with emoji conventions, plus a non-technical stakeholder brief.

comms.slack_update · comms.stakeholder_update

📝 PIR Agent

Post-incident report: chronological timeline, quantified customer impact, and specific prevention actions.

post_incident_report.timeline · .customer_impact · .prevention_actions
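
Because each agent owns a disjoint set of top-level keys, the four fragments compose into one response without conflicts. An illustrative sketch of the merged shape (values abbreviated, not real output):

Merged response shape (illustrative)
MERGED_SHAPE = {
    # Triage Agent
    "suspected_root_causes": [...],
    "immediate_actions": [...],
    "missing_information": [...],
    "runbook_alignment": {"matched_steps": [...], "gaps": [...]},
    # Summary Agent
    "summary": {"what_happened": "...", "current_status": "ONGOING"},
    # Comms Agent
    "comms": {"slack_update": "...", "stakeholder_update": "..."},
    # PIR Agent
    "post_incident_report": {
        "timeline": [...],
        "customer_impact": "...",
        "prevention_actions": [...],
    },
}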

The Code: Building the Orchestrator

The entry point is remarkably concise. ConcurrentBuilder handles all the async wiring—you just declare the agents and let the framework handle parallelism, error propagation, and response merging.

main.py — Orchestrator
from agent_framework import ConcurrentBuilder
from agent_framework.azure import AzureOpenAIChatClient
from azure.ai.agentserver.agentframework import from_agent_framework
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

from app.agents.triage import TRIAGE_INSTRUCTIONS
from app.agents.comms import COMMS_INSTRUCTIONS
from app.agents.pir import PIR_INSTRUCTIONS
from app.agents.summary import SUMMARY_INSTRUCTIONS

_credential = DefaultAzureCredential()
_token_provider = get_bearer_token_provider(
    _credential, "https://cognitiveservices.azure.com/.default"
)

def create_workflow_builder():
    """Create 4 specialist agents and wire them into a ConcurrentBuilder."""
    triage = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=TRIAGE_INSTRUCTIONS, name="triage-agent",
    )
    summary = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=SUMMARY_INSTRUCTIONS, name="summary-agent",
    )
    comms = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=COMMS_INSTRUCTIONS, name="comms-agent",
    )
    pir = AzureOpenAIChatClient(ad_token_provider=_token_provider).create_agent(
        instructions=PIR_INSTRUCTIONS, name="pir-agent",
    )
    return ConcurrentBuilder().participants([triage, summary, comms, pir])

def main():
    builder = create_workflow_builder()
    from_agent_framework(builder.build).run()  # starts on port 8088

if __name__ == "__main__":
    main()

 

Key insight: DefaultAzureCredential means there are no API keys anywhere in the codebase. The container uses managed identity in production; local development uses your az login session. The same code runs in both environments without modification.

Agent Instructions: Prompts as Configuration

Each agent receives a tightly scoped system prompt that defines its output schema and guardrails. Here’s the Triage Agent—the most complex of the four:

app/agents/triage.py
 
TRIAGE_INSTRUCTIONS = """\
You are the **Triage Agent**, an expert Site Reliability Engineer
specialising in root cause analysis and incident response.

## Task
Analyse the incident data and return a single JSON object with ONLY these keys:

{
  "suspected_root_causes": [
    {
      "hypothesis": "string – concise root cause hypothesis",
      "evidence": ["string – supporting evidence from the input"],
      "confidence": 0.0  // 0-1, how confident you are
    }
  ],
  "immediate_actions": [
    {
      "step": "string – concrete action with runnable command if applicable",
      "owner_role": "oncall-eng | dba | infra-eng | platform-eng",
      "priority": "P0 | P1 | P2 | P3"
    }
  ],
  "missing_information": [
    {
      "question": "string – what data is missing",
      "why_it_matters": "string – why this data would help"
    }
  ],
  "runbook_alignment": {
    "matched_steps": ["string – runbook steps that match the situation"],
    "gaps": ["string – gaps or missing runbook coverage"]
  }
}

## Guardrails
1. **No secrets** – redact any credential-like material as [REDACTED].
2. **No hallucination** – if data is insufficient, set confidence to 0
   and add entries to missing_information.
3. **Diagnostic suggestions** – when data is sparse, include diagnostic
   steps in immediate_actions.
4. **Structured output only** – return ONLY valid JSON, no prose.
"""
  

The Comms Agent follows the same pattern but targets a different audience:

app/agents/comms.py
 
COMMS_INSTRUCTIONS = """\
You are the **Comms Agent**, an expert incident communications writer.

## Task
Return a single JSON object with ONLY this key:

{
  "comms": {
    "slack_update": "Slack-formatted message with emoji, severity,
                     status, impact, next steps, and ETA",
    "stakeholder_update": "Non-technical summary for executives.
                           Focus on business impact and resolution."
  }
}

## Guidelines
- Slack: Use :rotating_light: for active SEV1/2, :warning: for degraded,
  :white_check_mark: for resolved.
- Stakeholder: No jargon. Translate to business impact.
- Tone: Calm, factual, action-oriented. Never blame individuals.
- Structured output only – return ONLY valid JSON, no prose.
"""
 
Instructions as config, not code. Agent behaviour is defined entirely by instruction text strings. A non-developer can refine an agent by editing the prompt and redeploying; no Python changes are needed.

The Incident Envelope: What Goes In

The agent accepts a single JSON envelope. It can come from a monitoring alert webhook, a PagerDuty payload, or a manual CLI invocation:

Incident Input (JSON)
{
  "incident_id": "INC-20260217-002",
  "title": "DB connection pool exhausted — checkout-api degraded",
  "severity": "SEV1",
  "timeframe": {
    "start": "2026-02-17T14:02:00Z",
    "end": null
  },
  "alerts": [
    {
      "name": "DatabaseConnectionPoolNearLimit",
      "description": "Connection pool at 99.7% on orders-db-primary",
      "timestamp": "2026-02-17T14:03:00Z"
    }
  ],
  "logs": [
    {
      "source": "order-worker",
      "lines": [
        "ERROR: connection timeout after 30s (attempt 3/3)",
        "WARN: pool exhausted, queueing request (queue_depth=847)"
      ]
    }
  ],
  "metrics": [
    {
      "name": "db_connection_pool_utilization_pct",
      "window": "5m",
      "values_summary": "Jumped from 22% to 99.7% at 14:03Z"
    }
  ],
  "runbook_excerpt": "Step 1: Check DB connection dashboard...",
  "constraints": {
    "max_time_minutes": 15,
    "environment": "production",
    "region": "swedencentral"
  }
}
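
Whatever the source, it is worth failing fast on malformed envelopes before burning model tokens. A hypothetical pre-flight check (not part of the repo), based on the fields shown above:

Envelope pre-flight check (hypothetical)
REQUIRED_KEYS = {"incident_id", "title", "severity", "timeframe"}

def validate_envelope(envelope: dict) -> None:
    """Reject envelopes missing the fields the agents rely on."""
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"incident envelope missing keys: {sorted(missing)}")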

 

Declaring the Hosted Agent

The agent is registered with Microsoft Foundry via a declarative agent.yaml file. This tells Foundry how to discover and route requests to the container:

agent.yaml
kind: hosted
name: oncall-copilot
description: |
  Multi-agent hosted agent that ingests incident signals
  and runs 4 specialist agents concurrently via
  Microsoft Agent Framework ConcurrentBuilder.
metadata:
  tags:
    - Azure AI AgentServer
    - Microsoft Agent Framework
    - Multi-Agent
    - Model Router
protocols:
  - protocol: responses
environment_variables:
  - name: AZURE_OPENAI_ENDPOINT
    value: ${AZURE_OPENAI_ENDPOINT}
  - name: AZURE_OPENAI_CHAT_DEPLOYMENT_NAME
    value: model-router

The protocols: [responses] declaration exposes the agent via the Foundry Responses API on port 8088. Clients can invoke it with a standard HTTP POST; no custom API is needed.

Invoking the Agent

Once deployed, you can invoke the agent with the project’s built-in scripts or directly via curl:

CLI / curl
# Using the included invoke script
python scripts/invoke.py --demo 2      # multi-signal SEV1 demo
python scripts/invoke.py --scenario 1  # Redis cluster outage

# Or with curl directly
TOKEN=$(az account get-access-token \
  --resource https://ai.azure.com --query accessToken -o tsv)

curl -X POST \
  "$AZURE_AI_PROJECT_ENDPOINT/openai/responses?api-version=2025-05-15-preview" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": [
      {"role": "user", "content": "<incident JSON here>"}
    ],
    "agent": {
      "type": "agent_reference",
      "name": "oncall-copilot"
    }
  }'
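
The same call from Python, using only the standard library plus azure-identity (a sketch mirroring the curl request above):

Python equivalent (sketch)
import json
import os
import urllib.request

from azure.identity import DefaultAzureCredential

# Same token scope as the curl example: https://ai.azure.com
token = DefaultAzureCredential().get_token("https://ai.azure.com/.default").token

payload = {
    "input": [{"role": "user", "content": "<incident JSON here>"}],
    "agent": {"type": "agent_reference", "name": "oncall-copilot"},
}

request = urllib.request.Request(
    os.environ["AZURE_AI_PROJECT_ENDPOINT"]
    + "/openai/responses?api-version=2025-05-15-preview",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))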

 

The Browser UI

The project includes a zero-dependency browser UI built with plain HTML, CSS, and vanilla JavaScript—no React, no bundler. A Python http.server backend proxies requests to the Foundry endpoint.
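
The proxy exists so the browser never handles Azure credentials. A condensed sketch of the pattern (the repo's ui/server.py will differ in details such as static-file serving and error handling):

ui/server.py — proxy pattern (condensed sketch)
import os
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

from azure.identity import DefaultAzureCredential

ENDPOINT = os.environ["AZURE_AI_PROJECT_ENDPOINT"]
credential = DefaultAzureCredential()

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Forward the browser's payload to Foundry with a fresh AAD token.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        token = credential.get_token("https://ai.azure.com/.default").token
        upstream = urllib.request.Request(
            ENDPOINT + "/openai/responses?api-version=2025-05-15-preview",
            data=body,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(upstream) as response:
            result = response.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(result)

if __name__ == "__main__":
    HTTPServer(("localhost", 7860), ProxyHandler).serve_forever()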

On-Call Copilot UI: empty state with quick-load preset buttons
The empty state. Quick-load buttons pre-populate the JSON editor with demo incidents or scenario files.
Demo 1 loaded: API Gateway SEV3 incident JSON in the editor
Demo 1 loaded: API Gateway 5xx spike, SEV3. The JSON is fully editable before submitting.

Agent Output Panels

Triage panel showing root causes ranked by confidence with evidence
Triage: Root causes ranked by confidence. Evidence is collapsed under each hypothesis.
Triage panel: immediate actions with priority badges and owner roles
Triage: Immediate actions with P0/P1/P2 priority badges and owner roles.
Comms panel: Slack card with emoji and stakeholder executive summary
Comms: Slack card with emoji substitution and a stakeholder executive summary.
PIR panel: chronological timeline and customer impact statement
PIR: Chronological timeline with an ONGOING marker, customer impact in a red-bordered box.

Performance: Parallel Execution Matters

| Incident Type | Complexity | Parallel Latency | Sequential (est.) |
|---|---|---|---|
| Single alert, minimal context (SEV4) | Low | 4–6 s | ~16 s |
| Multi-signal, logs + metrics (SEV2) | Medium | 7–10 s | ~28 s |
| Full SEV1 with long log lines | High | 10–15 s | ~40 s |
| Post-incident synthesis (resolved) | High | 10–14 s | ~38 s |

asyncio.gather() running four independent agents cuts total latency by 3–4× compared to sequential execution. For a SEV1 at 3 AM, that’s the difference between a 10-second AI-powered head start and a 40-second wait.

Five Key Design Decisions

  1. Parallel over sequential. Each agent is independent and processes the full incident payload in isolation. ConcurrentBuilder with asyncio.gather() is the right primitive—no inter-agent dependencies, no shared state.
  2. JSON-only agent outputs. Every agent returns only valid JSON with a defined schema. The orchestrator merges fragments with merged.update(agent_output), as shown in the sketch after this list. No parsing, no extraction, no post-processing.
  3. No hardcoded model names. AZURE_OPENAI_CHAT_DEPLOYMENT_NAME=model-router is the only model reference. Model Router selects the best model at runtime based on prompt complexity. When new models ship, the agent gets better for free.
  4. DefaultAzureCredential everywhere. No API keys. No token management code. Managed identity in production, az login in development. Same code, both environments.
  5. Instructions as configuration. Each agent’s system prompt is a plain Python string. Behaviour changes are text edits, not code logic. A non-developer can refine prompts and redeploy.
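
The merge mentioned in decision 2 is worth seeing concretely. A sketch of the composition step (an assumed helper, not the repo's exact code):

Fragment merge (sketch)
import json

def merge_fragments(agent_outputs: list[str]) -> dict:
    """Compose the four agents' JSON fragments into one response."""
    merged: dict = {}
    for raw in agent_outputs:
        fragment = json.loads(raw)  # each agent emits valid JSON only
        merged.update(fragment)     # top-level keys are disjoint by design
    return merged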

Guardrails: Built into the Prompts

The agent instructions include explicit guardrails that don’t require external filtering (a spot-check sketch follows this list):

  • No hallucination: When data is insufficient, the agent sets confidence: 0 and populates missing_information rather than inventing facts.
  • Secret redaction: Each agent is instructed to redact credential-like patterns as [REDACTED] in its output.
  • Mark unknowns: Undeterminable fields use the literal string "UNKNOWN" rather than plausible-sounding guesses.
  • Diagnostic suggestions: When signal is sparse, immediate_actions includes diagnostic steps that gather missing information before prescribing a fix.
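
Because these guardrails define a checkable contract, a downstream spot-check is cheap. A hypothetical helper (not in the repo) that asserts the no-hallucination rule on triage output:

Guardrail spot-check (hypothetical)
def check_triage_guardrails(triage: dict) -> list[str]:
    """Flag triage output that breaks the prompt-level contract."""
    problems = []
    causes = triage.get("suspected_root_causes", [])
    for cause in causes:
        confidence = cause.get("confidence", 0.0)
        if not 0.0 <= confidence <= 1.0:
            problems.append(f"confidence out of range: {confidence!r}")
    # Zero-confidence hypotheses must come with questions to answer.
    if any(c.get("confidence") == 0 for c in causes) and not triage.get(
        "missing_information"
    ):
        problems.append("zero-confidence cause without missing_information")
    return problems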

Model Router: Automatic Model Selection

One of the most powerful aspects of this architecture is Model Router. Instead of choosing between gpt-4o, gpt-4o-mini, or o3-mini per agent, you deploy a single model-router endpoint. Model Router analyses each request’s complexity and routes it to the most cost-effective model that can handle it.

Model Router showing which models were selected for each request and associated costs
Model Router insights: models selected per request with associated costs.
Model Router telemetry dashboard from Microsoft Foundry
Model Router telemetry from Microsoft Foundry: request distribution and cost analysis.

This means you get optimal cost-performance without writing any model-selection logic. A simple Summary Agent prompt may route to gpt-4o-mini, while a complex Triage Agent prompt with 800 lines of logs routes to gpt-4o, all automatically.

Deployment: One Command

The repo includes both azure.yaml and agent.yaml, so deployment is a single command:

Deploy to Foundry
# Deploy everything: infra + container + Model Router + Hosted Agent
azd up
 

This provisions the Foundry project resources, builds the Docker image, pushes to Azure Container Registry, deploys a Model Router instance, and creates the Hosted Agent. For more control, you can use the SDK deploy script:

Manual Docker + SDK deploy
# Build and push (must be linux/amd64)
docker build --platform linux/amd64 -t oncall-copilot:v1 .
docker tag oncall-copilot:v1 $ACR_IMAGE
docker push $ACR_IMAGE


# Create the hosted agent
python scripts/deploy_sdk.py

Getting Started

Quickstart
# Clone
git clone https://github.com/microsoft-foundry/oncall-copilot
cd oncall-copilot

# Install
python -m venv .venv
source .venv/bin/activate  # .venv\Scripts\activate on Windows
pip install -r requirements.txt

# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://<account>.openai.azure.com/"
export AZURE_OPENAI_CHAT_DEPLOYMENT_NAME="model-router"
export AZURE_AI_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"

# Validate schemas locally (no Azure needed)
MOCK_MODE=true python scripts/validate.py

# Deploy to Foundry
azd up

# Invoke the deployed agent
python scripts/invoke.py --demo 1

# Start the browser UI
python ui/server.py   # → http://localhost:7860

 

Extending: Add Your Own Agent

Adding a fifth agent is straightforward. Follow this pattern:

  1. Create app/agents/<name>.py with a *_INSTRUCTIONS constant following the existing pattern.
  2. Add the agent’s output keys to app/schemas.py.
  3. Register it in main.py:
main.py — Adding a 5th agent
from app.agents.my_new_agent import NEW_INSTRUCTIONS

new_agent = AzureOpenAIChatClient(
    ad_token_provider=_token_provider
).create_agent(
    instructions=NEW_INSTRUCTIONS,
    name="new-agent",
)

workflow = ConcurrentBuilder().participants(
    [triage, summary, comms, pir, new_agent]
)

Ideas for extensions: a ticket auto-creation agent that creates Jira or Azure DevOps items from the PIR output, a webhook adapter agent that normalises PagerDuty or Datadog payloads, or a human-in-the-loop agent that surfaces missing_information as an interactive form.

Key Takeaways for AI Engineers

The multi-agent pattern isn’t just for chatbots. Any task that can be decomposed into independent subtasks with distinct output schemas is a candidate. Incident response, document processing, code review, data pipeline validation—the pattern transfers.
  • Microsoft Agent Framework gives you ConcurrentBuilder for parallel execution and AzureOpenAIChatClient for Azure-native auth—you write the prompts, the framework handles the plumbing.
  • Foundry Hosted Agents let you deploy containerised agents with managed infrastructure, automatic scaling, and built-in telemetry. No Kubernetes, no custom API gateway.
  • Model Router eliminates the model selection problem. One deployment name handles all scenarios with optimal cost-performance tradeoffs.
  • Prompt-as-config means your agents are iterable by anyone who can edit text. The feedback loop from “this output could be better” to “deployed improvement” is minutes, not sprints.

Resources

Updated Mar 06, 2026
Version 1.0