Apps on Azure Blog
Get started with PagerDuty MCP server and PagerDuty SRE Agent in Azure SRE Agent

dbandaru, Microsoft
Feb 25, 2026

Connect Azure SRE Agent to PagerDuty's incident management platform using the official PagerDuty MCP server for incidents, on-call schedules, services, escalation policies, and more. Also learn how to leverage PagerDuty's built-in SRE Agent for AI-powered incident triage and resolution.

Overview

The PagerDuty MCP server is a cloud-hosted bridge between your PagerDuty account and Azure SRE Agent. Once configured, it enables real-time interaction with incidents, on-call schedules, services, teams, escalation policies, event orchestration, incident workflows, status pages, and more through natural language. All actions respect the permissions of the user account associated with the API token.

The server uses Streamable HTTP transport with a single Authorization custom header for authentication. Azure SRE Agent connects directly to the PagerDuty-hosted endpoint—no npm packages, local proxies, or container deployments are required. Since there is no dedicated PagerDuty connector type in the portal, you use the generic MCP server (User provided connector) option and configure the authorization header manually.
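As a rough illustration of what a Streamable HTTP client puts on the wire, the sketch below builds (but does not send) an MCP `initialize` request for the hosted endpoint. The JSON-RPC body and the `Accept` header follow the public MCP specification; the `build_initialize_request` helper, the client name, and the chosen protocol version string are assumptions for illustration, and the request Azure SRE Agent actually sends may differ.

```python
import json
import urllib.request

MCP_URL = "https://mcp.pagerduty.com/mcp"  # US endpoint from this article


def build_initialize_request(token: str, url: str = MCP_URL) -> urllib.request.Request:
    """Build (without sending) a JSON-RPC `initialize` call for an MCP server."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed MCP revision with Streamable HTTP
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical
        },
    }
    headers = {
        "Authorization": f"Token {token}",  # single custom header, Token scheme
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


request = build_initialize_request("u+xxxxxxxxxxxxxxxx")
print(request.get_method(), request.full_url)
```

Nothing here needs to run on your side when you use the portal connector; the sketch only shows why a URL and one header are enough to configure the connection.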

Key capabilities

| Area | Capabilities |
|------|--------------|
| Incidents | Create, list, manage incidents; add notes, responders; view alerts; find related/outlier/past incidents |
| Services | Create, list, update, and get service details |
| On-Call & Schedules | List on-calls, manage schedules, create overrides, list schedule users |
| Escalation Policies | List and get escalation policy details |
| Teams & Users | Create, update, delete teams; manage team members; list and get user data |
| Alert Grouping | Create, update, delete, and list alert grouping settings |
| Change Events | List and get change events by service or incident |
| Event Orchestration | Manage event orchestration routers, global rules, and service rules |
| Incident Workflows | List, get, and start incident workflows |
| Log Entries | List and get log entry details |
| Status Pages | Create and manage status page posts, updates, impacts, and severities |

This is the official PagerDuty-hosted MCP server. It exposes 60+ tools covering incidents, services, on-call, escalation, event orchestration, incident workflows, status pages, and more. The hosted service at mcp.pagerduty.com exposes all tools (both read and write) by default. Tool availability depends on your PagerDuty plan and user account permissions.


Prerequisites

  • Azure SRE Agent resource deployed in Azure
  • PagerDuty account with an active plan
  • PagerDuty user account with appropriate permissions
  • User API Token: Created from My Profile > User Settings > API Access (see Step 1)

Step 1: Create a PagerDuty API token

Generate the User API Token needed to authenticate with the PagerDuty MCP server. PagerDuty uses a single token for both authentication and authorization—the token inherits all permissions of the user account that creates it.

Navigate to API Access in PagerDuty

  1. Log in to your PagerDuty account (EU accounts sign in at https://app.eu.pagerduty.com/)
  2. Select your user avatar in the top-right corner of the navigation bar
  3. Select My Profile from the dropdown menu
  4. Select the User Settings tab at the top of your profile page
  5. Scroll down to the API Access section

Create a User API Token

  1. In the API Access section, select Create API User Token
  2. Enter a descriptive name for the token (e.g., sre-agent-mcp)
  3. Select Create Token
  4. Copy the token value immediately—it is displayed only once and cannot be retrieved later

The token format will look like: u+xxxxxxxxxxxxxxxx

Store the API token securely. If you lose it, you must delete the old token and create a new one. Navigate back to My Profile > User Settings > API Access to manage your tokens.

Choose the right account for token creation

The API token inherits all permissions of the PagerDuty user account that creates it. Consider these options:

| Account type | When to use | Permissions |
|--------------|-------------|-------------|
| Personal account | Quick testing and development | Full permissions of your user role |
| Service account (recommended for production) | Production deployments | Create a dedicated PagerDuty user with a restricted role |
| Read-only account | Monitoring-only use cases | Create a user with the Observer or Restricted Access role |

For production use, create a dedicated PagerDuty user with a Responder or Observer role (depending on whether write access is needed), then generate the token from that account. This ensures the integration continues to work if team members leave the organization and limits the blast radius of a compromised token.

PagerDuty also supports Account-level API keys (created under Integrations > Developer Tools > API Access Keys), but the MCP server requires a User API Token, not an account-level key.


Step 2: Add the MCP connector

Connect the PagerDuty MCP server to your SRE Agent in the Azure portal. Because the portal has no dedicated PagerDuty connector type, use the generic MCP server (User provided connector) option.

Determine your regional endpoint

Select the endpoint URL that matches your PagerDuty account's service region:

| Region | Endpoint URL |
|--------|--------------|
| US (default) | https://mcp.pagerduty.com/mcp |
| EU | https://mcp.eu.pagerduty.com/mcp |
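If you script your connector setup, the region-to-endpoint mapping can be captured in a small lookup. This is a minimal sketch: the two URLs come from the table above, while the `mcp_endpoint` helper and the `"us"`/`"eu"` keys are hypothetical names chosen for illustration.

```python
# Hosted PagerDuty MCP endpoints, keyed by service region (keys are illustrative).
MCP_ENDPOINTS = {
    "us": "https://mcp.pagerduty.com/mcp",
    "eu": "https://mcp.eu.pagerduty.com/mcp",
}


def mcp_endpoint(region: str = "us") -> str:
    """Return the hosted MCP endpoint for a PagerDuty service region."""
    key = region.strip().lower()
    if key not in MCP_ENDPOINTS:
        raise ValueError(f"Unsupported service region: {region!r} (expected 'us' or 'eu')")
    return MCP_ENDPOINTS[key]


print(mcp_endpoint("EU"))
```

Failing fast on an unknown region is deliberate: a connector pointed at the wrong region tends to surface later as confusing authentication or empty-data errors.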

Using the Azure portal

  1. In Azure portal, navigate to your SRE Agent resource
  2. Select Builder > Connectors
  3. Select Add connector
  4. Select MCP server (User provided connector) and select Next

Figure: Select "MCP server" (User provided connector) as the connector type in the Add a connector dialog

  5. Configure the connector:

| Field | Value |
|-------|-------|
| Name | pagerduty-mcp |
| Connection type | Streamable-HTTP |
| URL | https://mcp.pagerduty.com/mcp (use the EU endpoint for the EU service region) |
| Authentication | Custom headers |
| Authorization | Token <your-pagerduty-api-token> |

  6. Select Next to review

Figure: Configure the PagerDuty MCP connector with the endpoint URL and Authorization header using Token authentication

  7. Select Add connector

The token format in the Authorization header must be Token <your-api-token> (not Bearer). For example: Token u+abcdefg123456789. Using the wrong format will result in 401 Unauthorized errors.
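The header rule above is easy to get wrong when copying values between tools. The following sketch shows one way to build the header value from a raw token and reject input that already carries a scheme prefix; the `authorization_header` helper is hypothetical and not part of any PagerDuty or Azure SDK.

```python
def authorization_header(token: str) -> dict[str, str]:
    """Build the Authorization header for the PagerDuty MCP server (Token scheme)."""
    raw = token.strip()
    if not raw:
        raise ValueError("API token is empty")
    # Guard against pasting a full header value such as "Bearer ..." or "Token ...".
    if raw.lower().startswith(("bearer ", "token ")):
        raise ValueError("Pass only the raw token value, without a scheme prefix")
    return {"Authorization": f"Token {raw}"}


print(authorization_header("u+abcdefg123456789"))
```

The value on the right-hand side of the returned dictionary is what goes into the Authorization field of the connector form.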

Once the connector shows Connected status, the PagerDuty MCP tools are automatically available to your agent. You can verify by checking the tools list in the connector details.

Figure: Connectors list showing pagerduty-mcp with Connected status


Step 3: Create a PagerDuty subagent (optional)

Create a specialized subagent to give the AI focused PagerDuty incident management expertise and better prompt responses.

  1. Navigate to Builder > Subagents
  2. Select Add subagent
  3. Paste the following YAML configuration:
api_version: azuresre.ai/v1
kind: AgentConfiguration
metadata:
  owner: your-team@contoso.com
  version: "1.0.0"
spec:
  name: PagerDutyIncidentExpert
  display_name: PagerDuty Incident Expert

  system_prompt: |
    You are a PagerDuty incident management expert with access to incidents,
    services, on-call schedules, escalation policies, teams, event orchestration,
    incident workflows, status pages, and more via the PagerDuty MCP server.

    ## Capabilities

    ### Incidents
    - List and search incidents with `list_incidents`
    - Get incident details with `get_incident`
    - Create new incidents with `create_incident`
    - Manage incidents (update status, urgency, assignment, escalation) with `manage_incidents`
    - Add notes with `add_note_to_incident` and list notes with `list_incident_notes`
    - Add responders with `add_responders`
    - View alerts from incidents with `list_alerts_from_incident` and `get_alert_from_incident`
    - Find related incidents with `get_related_incidents`
    - Find similar past incidents with `get_past_incidents`
    - Identify outlier incidents with `get_outlier_incident`

    ### Services
    - List all services with `list_services`
    - Get service details with `get_service`
    - Create new services with `create_service`
    - Update service configuration with `update_service`

    ### On-Call & Schedules
    - List current on-calls with `list_oncalls`
    - Get schedule details with `get_schedule`
    - List all schedules with `list_schedules`
    - List users in a schedule with `list_schedule_users`
    - Create and update schedules with `create_schedule` and `update_schedule`
    - Create schedule overrides with `create_schedule_override`

    ### Escalation Policies
    - List escalation policies with `list_escalation_policies`
    - Get escalation policy details with `get_escalation_policy`

    ### Teams & Users
    - List teams with `list_teams` and get team details with `get_team`
    - Create, update, and delete teams
    - Manage team members with `add_team_member` and `remove_team_member`
    - List users with `list_users` and get user data with `get_user_data`

    ### Event Orchestration
    - List and get event orchestrations
    - Manage orchestration routers, global rules, and service rules
    - Append rules to event orchestration routers

    ### Incident Workflows
    - List and get incident workflows
    - Start incident workflows with `start_incident_workflow`

    ### Status Pages
    - Create and manage status page posts and updates
    - List status page impacts, severities, and statuses

    ### Log Entries
    - List and get log entry details for audit trails

    ### Alert Grouping
    - Create, update, and manage alert grouping settings

    ### Change Events
    - List and get change events, including by service or incident

    ## Best Practices

    When investigating incidents:
    - Start with `list_incidents` to find active or recent incidents
    - Use `get_incident` for full details including status and assignments
    - Check `list_alerts_from_incident` to see triggering alerts
    - Use `get_related_incidents` to find correlated issues
    - Use `get_past_incidents` to find similar historical incidents
    - Check `list_oncalls` to identify who is currently on-call
    - Review `list_incident_notes` for any existing investigation notes

    When managing on-call:
    - Use `list_oncalls` to see current on-call assignments
    - Use `get_schedule` and `list_schedule_users` for schedule details
    - Use `create_schedule_override` for temporary coverage changes

    When handling errors:
    - If 401 errors occur, explain the token may be invalid or expired
    - If 403 errors occur, explain which permissions may be missing
    - Suggest the user verify their API token is valid and has sufficient permissions

  mcp_connectors:
    - pagerduty-mcp

  handoffs: []
  4. Select Save

The mcp_connectors field references the connector name you created in Step 2. This gives the subagent access to all tools provided by the PagerDuty MCP server.
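Because the link between subagent and connector is just a name match, a typo in `mcp_connectors` leaves the subagent without tools. The sketch below illustrates that relationship with a plain dictionary standing in for the parsed `spec` section; the `missing_connectors` helper is hypothetical, and the portal performs its own validation.

```python
def missing_connectors(spec: dict, configured_connectors: set[str]) -> list[str]:
    """Return connector names referenced by the spec that are not configured."""
    referenced = spec.get("mcp_connectors") or []
    return [name for name in referenced if name not in configured_connectors]


# Dictionary mirroring the relevant fields of the YAML above.
subagent_spec = {
    "name": "PagerDutyIncidentExpert",
    "mcp_connectors": ["pagerduty-mcp"],
    "handoffs": [],
}

print(missing_connectors(subagent_spec, {"pagerduty-mcp"}))  # prints []
```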


Step 4: Add a PagerDuty skill (optional)

Skills provide contextual knowledge and best practices that help agents use tools more effectively. Create a PagerDuty skill to give your agent expertise in incident management, on-call scheduling, and escalation workflows.

  1. Navigate to Builder > Skills
  2. Select Add skill
  3. Paste the following skill configuration:
api_version: azuresre.ai/v1
kind: SkillConfiguration
metadata:
  owner: your-team@contoso.com
  version: "1.0.0"
spec:
  name: pagerduty_incident_management
  display_name: PagerDuty Incident Management

  description: |
    Expertise in PagerDuty's incident management platform including incidents,
    on-call schedules, services, teams, escalation policies, event orchestration,
    incident workflows, and status pages. Use for managing incidents, checking
    on-call status, investigating alerts, escalating issues, and navigating
    PagerDuty data via the PagerDuty MCP server.

  instructions: |
    ## Overview

    PagerDuty is an incident management and on-call scheduling platform for
    operations teams. The PagerDuty MCP server enables natural language interaction
    with your PagerDuty account data including incidents, services, schedules,
    teams, escalation policies, and more.

    **Authentication:** A single `Authorization` custom header with the format
    `Token <api-token-value>`. All actions respect the permissions of the user
    account associated with the token.

    **Regional endpoints:** The hosted MCP server has two endpoints—US
    (`mcp.pagerduty.com`) and EU (`mcp.eu.pagerduty.com`). Ensure the connector
    URL matches your PagerDuty service region.

    ## Incident Management

    Use `list_incidents` to search and filter incidents, `get_incident` for
    details, and `manage_incidents` to update status, urgency, assignment,
    or escalation level.

    **Common incident workflows:**

    ```
    # List all triggered incidents
    Use list_incidents with status "triggered"

    # List high-urgency incidents
    Use list_incidents filtered by urgency "high"

    # Get details for a specific incident
    Use get_incident with the incident ID

    # Acknowledge an incident
    Use manage_incidents to set status to "acknowledged"

    # Resolve an incident
    Use manage_incidents to set status to "resolved"

    # Escalate an incident
    Use manage_incidents to escalate to the next level
    ```

    ## On-Call Management

    Use `list_oncalls` to see current on-call assignments, `get_schedule` for
    schedule details, and `create_schedule_override` for temporary coverage.

    **Common on-call workflows:**

    ```
    # Who is currently on-call?
    Use list_oncalls to see all current on-call assignments

    # Who is on-call for a specific escalation policy?
    Use list_oncalls filtered by escalation_policy_id

    # Get details for a schedule
    Use get_schedule with the schedule ID

    # Create a temporary override
    Use create_schedule_override with start/end times and user
    ```

    ## Service Management

    Use `list_services` to discover services, `get_service` for details, and
    `create_service` or `update_service` for configuration changes.

    **Service investigation patterns:**

    ```
    # List all services
    Use list_services

    # Get service details including integrations
    Use get_service with the service ID

    # Find incidents for a specific service
    Use list_incidents filtered by service_id
    ```

    ## Escalation Policy Management

    Use `list_escalation_policies` to discover policies and `get_escalation_policy`
    for details including escalation rules and targets.

    ## Team Management

    Use `list_teams` to discover teams, `get_team` for details, and team member
    management tools for roster changes.

    ## Incident Investigation Workflow

    For structured incident investigation:
    1. `list_incidents` — find active or recent incidents
    2. `get_incident` — get full incident details and current status
    3. `list_alerts_from_incident` — see triggering alerts and their details
    4. `get_alert_from_incident` — get specific alert details
    5. `get_related_incidents` — find correlated incidents
    6. `get_past_incidents` — find similar historical incidents
    7. `list_oncalls` — identify who is currently on-call
    8. `list_incident_notes` — review existing investigation notes
    9. `add_note_to_incident` — document findings
    10. `manage_incidents` — update status, urgency, or escalate

    ## Event Orchestration

    Use event orchestration tools to manage how events are routed and
    processed:
    - `list_event_orchestrations` — discover orchestration configurations
    - `get_event_orchestration_router` — view routing rules
    - `append_event_orchestration_router_rule` — add new routing rules
    - `get_event_orchestration_global` — view global orchestration rules
    - `get_event_orchestration_service` — view service-level rules

    ## Incident Workflows

    Use `list_incident_workflows` to discover automated workflows and
    `start_incident_workflow` to trigger them for an incident.

    ## Status Page Management

    Use status page tools to communicate during incidents:
    - `list_status_pages` — discover status pages
    - `create_status_page_post` — create a new incident post
    - `create_status_page_post_update` — add updates to existing posts
    - `list_status_page_impacts` — view impact categories
    - `list_status_page_severities` — view severity levels

    ## Troubleshooting

    | Issue | Solution |
    |-------|----------|
    | 401 Unauthorized | Verify the API token is valid and not expired |
    | 403 Forbidden | Check that the user account has sufficient permissions |
    | Connection refused | Verify firewall allows HTTPS to mcp.pagerduty.com |
    | EU region errors | Ensure you are using `mcp.eu.pagerduty.com` for EU accounts |
    | Token format error | Use `Token <value>` format, not `Bearer <value>` |
    | No data returned | Verify the token's user account has access to the requested resources |

  mcp_connectors:
    - pagerduty-mcp
  4. Select Save

Reference the skill in your subagent

Update your subagent configuration to include the skill:

spec:
  name: PagerDutyIncidentExpert
  skills:
    - pagerduty_incident_management
  mcp_connectors:
    - pagerduty-mcp

Step 5: Test the integration

  1. Open a new chat session with your SRE Agent
  2. Try these example prompts:

Incident management

Show me all currently triggered incidents

Get details for incident P1234567 including the timeline and notes

Create a new high-urgency incident for the payment-service with title "Payment processing degraded"

Acknowledge all triggered incidents assigned to me

On-call and schedules

Who is currently on-call for the platform-engineering escalation policy?

Show me the on-call schedule for the next 7 days

Create a schedule override for John Smith covering Saturday 9am to Monday 9am

List all users in the primary on-call schedule

Service and team management

List all services and their current status

Get details for the checkout-service including escalation policy and integrations

Show me all teams and their members

What escalation policies are configured for the payment team?

Incident investigation

Find incidents related to the current database outage

Show me similar past incidents to P1234567

What alerts triggered incident P1234567?

List all notes and timeline entries for the most recent SEV-1 incident

Event orchestration and workflows

List all event orchestration configurations

Show me the routing rules for the production orchestration

What incident workflows are available?

Start the "SEV-1 Response" workflow for incident P1234567

Status page management

List all status pages

Create a new status page post for the ongoing API degradation

Add an update to the current status page post indicating the issue is being investigated

What severity levels are available for status page posts?

Available tools

Incidents

| Tool | Description |
|------|-------------|
| `get_incident` | Get details of a specific incident by ID |
| `list_incidents` | List and filter incidents by status, urgency, service, and more |
| `create_incident` | Create a new incident on a specified service |
| `manage_incidents` | Update incident status, urgency, assignment, or escalation level |
| `add_note_to_incident` | Add an investigation note to an incident |
| `list_incident_notes` | List all notes on an incident |
| `add_responders` | Add additional responders to an incident |
| `list_alerts_from_incident` | List all alerts associated with an incident |
| `get_alert_from_incident` | Get details of a specific alert from an incident |
| `get_outlier_incident` | Identify outlier incidents based on patterns |
| `get_past_incidents` | Find similar historical incidents |
| `get_related_incidents` | Find incidents related to a specific incident |

Services

| Tool | Description |
|------|-------------|
| `get_service` | Get details of a specific service |
| `list_services` | List all services in the account |
| `create_service` | Create a new service |
| `update_service` | Update service configuration |

On-Call & Schedules

| Tool | Description |
|------|-------------|
| `list_oncalls` | List current on-call assignments |
| `get_schedule` | Get details of a specific schedule |
| `list_schedules` | List all schedules |
| `list_schedule_users` | List users in a specific schedule |
| `create_schedule` | Create a new on-call schedule |
| `update_schedule` | Update an existing schedule |
| `create_schedule_override` | Create a temporary schedule override |

Escalation Policies

| Tool | Description |
|------|-------------|
| `list_escalation_policies` | List all escalation policies |
| `get_escalation_policy` | Get details of a specific escalation policy |

Teams & Users

| Tool | Description |
|------|-------------|
| `get_team` | Get details of a specific team |
| `list_teams` | List all teams |
| `list_team_members` | List members of a specific team |
| `create_team` | Create a new team |
| `update_team` | Update team details |
| `delete_team` | Delete a team |
| `add_team_member` | Add a user to a team |
| `remove_team_member` | Remove a user from a team |
| `get_user_data` | Get details of a specific user |
| `list_users` | List all users in the account |

Alert Grouping

| Tool | Description |
|------|-------------|
| `create_alert_grouping_setting` | Create an alert grouping configuration |
| `get_alert_grouping_setting` | Get details of an alert grouping setting |
| `list_alert_grouping_settings` | List all alert grouping settings |
| `update_alert_grouping_setting` | Update an alert grouping setting |
| `delete_alert_grouping_setting` | Delete an alert grouping setting |

Change Events

| Tool | Description |
|------|-------------|
| `get_change_event` | Get details of a specific change event |
| `list_change_events` | List all change events |
| `list_incident_change_events` | List change events related to an incident |
| `list_service_change_events` | List change events for a specific service |

Event Orchestration

| Tool | Description |
|------|-------------|
| `get_event_orchestration` | Get details of an event orchestration |
| `list_event_orchestrations` | List all event orchestrations |
| `get_event_orchestration_router` | Get routing rules for an orchestration |
| `update_event_orchestration_router` | Update routing rules |
| `append_event_orchestration_router_rule` | Add a new routing rule |
| `get_event_orchestration_global` | Get global orchestration rules |
| `get_event_orchestration_service` | Get service-level orchestration rules |

Incident Workflows

| Tool | Description |
|------|-------------|
| `get_incident_workflow` | Get details of an incident workflow |
| `list_incident_workflows` | List all incident workflows |
| `start_incident_workflow` | Start an incident workflow for a specific incident |

Log Entries

| Tool | Description |
|------|-------------|
| `get_log_entry` | Get details of a specific log entry |
| `list_log_entries` | List log entries for audit and investigation |

Status Pages

| Tool | Description |
|------|-------------|
| `create_status_page_post` | Create a new status page incident post |
| `create_status_page_post_update` | Add an update to a status page post |
| `get_status_page_post` | Get details of a status page post |
| `list_status_page_impacts` | List available impact categories |
| `list_status_page_post_updates` | List updates for a status page post |
| `list_status_page_severities` | List available severity levels |
| `list_status_page_statuses` | List available status values |
| `list_status_pages` | List all status pages |

Write operations

The PagerDuty MCP server supports both read and write operations. The hosted service at mcp.pagerduty.com exposes all tools (both read and write) by default.

Write tools

Write operations include creating and modifying PagerDuty resources:

| Category | Write tools |
|----------|-------------|
| Incidents | `create_incident`, `manage_incidents`, `add_note_to_incident`, `add_responders` |
| Services | `create_service`, `update_service` |
| Schedules | `create_schedule`, `update_schedule`, `create_schedule_override` |
| Teams | `create_team`, `update_team`, `delete_team`, `add_team_member`, `remove_team_member` |
| Alert Grouping | `create_alert_grouping_setting`, `update_alert_grouping_setting`, `delete_alert_grouping_setting` |
| Event Orchestration | `update_event_orchestration_router`, `append_event_orchestration_router_rule` |
| Incident Workflows | `start_incident_workflow` |
| Status Pages | `create_status_page_post`, `create_status_page_post_update` |
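If you review or audit which tool calls can change PagerDuty state, the table above can be turned into a simple lookup. This is an illustrative sketch only: the tool names are copied from the table, while the `is_write_tool` helper is hypothetical and the set is not guaranteed to match every tool the hosted server exposes to your account.

```python
# Write tools as listed in the table above, grouped by category.
WRITE_TOOLS = {
    # Incidents
    "create_incident", "manage_incidents", "add_note_to_incident", "add_responders",
    # Services
    "create_service", "update_service",
    # Schedules
    "create_schedule", "update_schedule", "create_schedule_override",
    # Teams
    "create_team", "update_team", "delete_team", "add_team_member", "remove_team_member",
    # Alert Grouping
    "create_alert_grouping_setting", "update_alert_grouping_setting",
    "delete_alert_grouping_setting",
    # Event Orchestration
    "update_event_orchestration_router", "append_event_orchestration_router_rule",
    # Incident Workflows
    "start_incident_workflow",
    # Status Pages
    "create_status_page_post", "create_status_page_post_update",
}


def is_write_tool(tool_name: str) -> bool:
    """Return True if the tool is listed above as modifying PagerDuty resources."""
    return tool_name in WRITE_TOOLS


for name in ("list_incidents", "manage_incidents"):
    print(name, "write" if is_write_tool(name) else "read")
```

A lookup like this can back a review step, for example flagging write calls in a transcript, but the effective control remains the role of the PagerDuty user that owns the token.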

PagerDuty also provides a self-hosted MCP server that can be run locally. The self-hosted server exposes only read-only tools by default; write tools require the --enable-write-tools flag at startup. For Azure SRE Agent, the hosted service at mcp.pagerduty.com is recommended as it requires no infrastructure management and exposes all tools automatically.


Troubleshooting

Authentication issues

| Error | Cause | Solution |
|-------|-------|----------|
| 401 Unauthorized | Invalid or revoked API token | Verify the token is correct and active in User Settings > API Access |
| 403 Forbidden | Insufficient user permissions | Ensure the user account associated with the token has the required PagerDuty role |
| Connection refused | Firewall blocking outbound HTTPS | Verify firewall allows HTTPS traffic to mcp.pagerduty.com (port 443) |
| Token format error | Using Bearer instead of Token | The Authorization header must use Token <value> format, not Bearer <value> |

Data and permission issues

| Error | Cause | Solution |
|-------|-------|----------|
| No data returned | Token user lacks access to the resource | Verify the user account has access to the requested services, teams, or incidents |
| EU region errors | Using US endpoint for EU account | Switch the connector URL to https://mcp.eu.pagerduty.com/mcp |
| Write operation failed | User lacks write permissions | Verify the token's user account has a role that allows write operations (e.g., Manager, Admin) |
| Rate limit exceeded | Too many API requests | PagerDuty rate limits vary by plan; reduce request frequency or contact PagerDuty support |
| Incident not found | Wrong incident ID or no access | Verify the incident ID and that the token's user has access to the incident's service |
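For the "Rate limit exceeded" case, one common way to reduce request frequency in your own scripts is exponential backoff. The sketch below is a generic illustration, not PagerDuty's documented retry policy: it assumes throttled calls surface as HTTP status 429, and the helper names, base delay, and cap are all hypothetical.

```python
import time
from typing import Callable


def backoff_delays(max_retries: int = 5, base_seconds: float = 1.0,
                   cap_seconds: float = 30.0) -> list[float]:
    """Exponentially growing delays (1s, 2s, 4s, ...) capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * (2 ** attempt)) for attempt in range(max_retries)]


def call_with_backoff(call: Callable[[], int], delays: list[float],
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Invoke call() until it returns a status other than 429 or the delays run out."""
    status = call()
    for delay in delays:
        if status != 429:  # assumed throttling status code
            break
        sleep(delay)
        status = call()
    return status


# Simulated responses: throttled twice, then success. No real sleeping in the demo.
responses = iter([429, 429, 200])
print(call_with_backoff(lambda: next(responses), backoff_delays(), sleep=lambda s: None))
```

The `sleep` parameter is injectable so the retry logic can be exercised without waiting; in real use, leave it at the default.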

Verify the connection

Test the server endpoint directly:

curl -I "https://mcp.pagerduty.com/mcp" \
  -H "Authorization: Token <your-api-token>"

Expected response: 200 OK confirms authentication is working.
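If curl is not available, the same check can be approximated with the Python standard library. This sketch mirrors the `curl -I` command above by sending a HEAD request with the Token header; the `build_probe` and `probe_status` helpers are hypothetical names, and the status code you actually see depends on how the hosted server handles HEAD requests.

```python
import urllib.error
import urllib.request


def build_probe(token: str, url: str = "https://mcp.pagerduty.com/mcp") -> urllib.request.Request:
    """Equivalent of `curl -I`: a HEAD request carrying the Token header."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}, method="HEAD"
    )


def probe_status(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the probe and return the HTTP status code, including 4xx/5xx codes."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 401 when the token or header format is wrong


probe = build_probe("u+xxxxxxxxxxxxxxxx")
print(probe.get_method(), probe.full_url)
```

To run the check for real, call `probe_status(build_probe(token))` with your own token, ideally read from an environment variable rather than pasted into the script.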

Re-authorize the integration

If you encounter persistent issues:

  1. Navigate to My Profile > User Settings > API Access in PagerDuty
  2. Delete the existing API User Token
  3. Create a new API User Token
  4. Update the connector in the SRE Agent portal with the new token value in the Authorization header (format: Token <new-token>)

Limitations

| Limitation | Details |
|------------|---------|
| User-scoped permissions | API token permissions are tied to the creating user's account; the token cannot exceed the user's access level |
| Self-hosted write restriction | The self-hosted MCP server only exposes read-only tools by default; write tools require the --enable-write-tools flag |
| Rate limits | API rate limits apply per your PagerDuty plan; high-frequency usage may be throttled |
| No dedicated connector type | The portal does not have a dedicated PagerDuty connector; you must use the generic MCP server connector and configure headers manually |
| Two regional endpoints only | Only US and EU service regions are supported; the endpoint must match your account's service region |
| Token rotation | API tokens do not automatically expire; manual rotation is recommended as a security best practice |

Security considerations

How permissions work

  • User-scoped: All actions respect the permissions of the PagerDuty user account that created the API token
  • Token-based: A single User API Token in the Authorization header provides both authentication and authorization
  • Role-based: The token inherits the PagerDuty role (Observer, Responder, Manager, Admin, etc.) of the creating user

Admin controls

PagerDuty administrators can:

  • Create and revoke User API tokens from user profile settings
  • Assign roles to user accounts to control permission scope
  • Use service accounts with restricted roles to limit the blast radius of compromised tokens
  • Monitor API token usage through PagerDuty's audit logs
  • Enforce token rotation policies as part of security governance

PagerDuty User API tokens can read and modify sensitive operational data including incidents, on-call schedules, and service configurations. Use service account tokens with restricted roles, grant only the permissions your agent needs, and rotate tokens regularly. Monitor the PagerDuty audit logs for unusual activity.


PagerDuty SRE Agent

In addition to connecting Azure SRE Agent to PagerDuty via MCP, PagerDuty offers its own built-in SRE Agent—an AI-powered assistant that works side-by-side with responders during incident triage and resolution. When combined with the Azure SRE Agent MCP integration, you get a powerful end-to-end incident management experience.

What is PagerDuty SRE Agent?

PagerDuty’s SRE Agent transforms incident response in the Operations Console and Slack by automatically analyzing incidents, providing key context, and recommending remediation actions. It accelerates triage to reduce risk, cost, and cognitive load, and it continuously learns to prevent repeat issues.

Key features

  • Automated incident analysis: Ingests and analyzes runbooks, SOPs, and diagnostics (e.g., error logs) to surface likely root causes
  • Playbook generation: Generates and saves playbooks for recurring issues based on past resolutions
  • Pattern detection: Detects patterns, recalls similar incidents, and provides structured troubleshooting
  • Actionable nudges: Recommends next steps through interactive buttons such as “Upload Runbook,” “Analyze Past Incidents,” “Generate a Playbook,” and “Search Logs”
  • Continuous learning: Builds memory from resolved incidents including incident playbooks, service runbooks, incident summaries, and service profiles
  • Observability integrations: Retrieves log data from platforms like Grafana, Datadog, New Relic, and AWS CloudWatch for deeper investigation

Prerequisites

  • PagerDuty Advance add-on (required for both Operations Console and Slack access)
  • AIOps add-on (required for Operations Console access)
  • Available on Enterprise, Business, and Professional plans
  • An Account Owner or Global Admin role is required to enable SRE Agent

Step 1: Enable PagerDuty SRE Agent

  1. In the PagerDuty web app, navigate to AI > AI Settings
  2. Select the Assistant and AI Agents Configuration tab
  3. Under AI Agents, find SRE Agent and toggle the switch to the on position

If you don’t have Account Owner or Global Admin permissions, click Request to Admin next to the SRE Agent toggle. This sends an email request to your admins to enable it for you.

Step 2: Configure tool integrations (optional)

PagerDuty SRE Agent can retrieve log data and runbooks from external tools for deeper investigation. Set up Workflow Integrations and select Allow SRE Agent access for each integration.

Supported integrations include:

  • Log platforms: Grafana, Datadog, New Relic, AWS CloudWatch
  • Runbook sources: Confluence, GitHub

For runbook sources, update your event payload to include the runbook URL in custom_details:

"custom_details": {
  "runbook_url": {
    "confluence": "https://YOUR-RUNBOOK-LINK"
  }
}

For more details, see Agent Tooling Configuration.
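To show where the `custom_details` fragment sits in a complete event, here is a minimal sketch that assembles a trigger event in the shape used by the PagerDuty Events API v2. It assumes your events are sent through that API; the `build_trigger_event` helper, the routing key, and the example values are placeholders, and the function only builds the payload without sending it.

```python
import json


def build_trigger_event(routing_key: str, summary: str, source: str,
                        runbook_url: str, severity: str = "critical") -> dict:
    """Assemble an Events API v2 style trigger event with a Confluence runbook link."""
    return {
        "routing_key": routing_key,  # integration key of the target service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
            # Same structure as the fragment shown above.
            "custom_details": {
                "runbook_url": {"confluence": runbook_url},
            },
        },
    }


event = build_trigger_event(
    routing_key="YOUR-ROUTING-KEY",  # placeholder
    summary="Payment processing degraded",
    source="payment-service",
    runbook_url="https://YOUR-RUNBOOK-LINK",
)
print(json.dumps(event, indent=2))
```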

Step 3: Use SRE Agent in Operations Console

  1. Navigate to AIOps > Operations Console
  2. Optional: Add the SRE Agent column to the Operations Console for faster incident triage
  3. Select an incident by clicking its Title
  4. Select the SRE Agent tab and wait for the agent to load your incident summary
  5. Begin troubleshooting by asking questions or using the agent’s nudge buttons (e.g., Upload Runbook, Analyze Past Incidents, Generate a Playbook)

How it works with Azure SRE Agent

Azure SRE Agent has a built-in direct integration with PagerDuty’s SRE Agent. This means you can query PagerDuty’s AI-powered SRE Agent directly from within Azure SRE Agent’s chat interface—no separate tab or tool switching required.

Built-in PagerDuty Incident Management Agent

Azure SRE Agent includes a dedicated PagerDuty Incident Management Agent that provides the following tools:

| Tool | Description |
|------|-------------|
| `QueryPagerDutyIncidentChat` | Queries PagerDuty’s SRE Agent (Advance Chat API) for intelligent insights, troubleshooting guidance, runbook generation, or diagnostic recommendations about a specific incident |
| `GetPagerDutyIncidentById` | Retrieves details for a specific PagerDuty incident by its ID |
| `ResolvePagerDutyIncident` | Resolves a PagerDuty incident directly from Azure SRE Agent |
| `AcknowledgePagerDutyIncident` | Acknowledges a PagerDuty incident |
| `AddNoteToPagerDutyIncident` | Adds notes to a PagerDuty incident for tracking investigation progress |

Querying PagerDuty SRE Agent from Azure SRE Agent

The QueryPagerDutyIncidentChat tool connects directly to PagerDuty’s Advance Chat API (https://api.pagerduty.com/advance/chat) using your PagerDuty API token. When you ask Azure SRE Agent a question about a PagerDuty incident, it automatically calls PagerDuty’s SRE Agent and returns the AI-powered response. This enables scenarios like:

  • “What caused incident Q391Y5VW0YYUEL?” — PagerDuty SRE Agent analyzes the incident context and provides root cause analysis
  • “Generate a runbook for incident Q391Y5VW0YYUEL” — PagerDuty SRE Agent creates a step-by-step runbook based on the incident details
  • “How do I troubleshoot incident Q391Y5VW0YYUEL?” — PagerDuty SRE Agent recommends diagnostic and remediation steps
  • “Provide mitigation steps for incident Q391Y5VW0YYUEL” — PagerDuty SRE Agent suggests actions prioritized by urgency and impact
  • “Triage incident Q391Y5VW0YYUEL” — PagerDuty SRE Agent provides a full triage summary with next steps

Configuration

The PagerDuty SRE Agent integration uses the same API token you configured for PagerDuty incident management. No additional setup is required beyond the standard PagerDuty connector configuration. When PagerDuty is configured as your incident management platform in Azure SRE Agent settings, the QueryPagerDutyIncidentChat tool is automatically available.

The PagerDuty Advance Chat API requires a PagerDuty Advance subscription. Each query to the SRE Agent consumes 4 PagerDuty Advance credits. Ensure your account has sufficient credits for your expected usage.
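For capacity planning, the credit cost is simple arithmetic. The sketch below uses the figure of 4 credits per query stated above; the helper names and the example volumes are hypothetical, and your actual consumption should be confirmed against your PagerDuty Advance usage reporting.

```python
CREDITS_PER_QUERY = 4  # per the note above


def estimate_credits(queries: int) -> int:
    """Credits consumed by a given number of SRE Agent queries."""
    if queries < 0:
        raise ValueError("queries must be non-negative")
    return queries * CREDITS_PER_QUERY


def queries_affordable(credit_balance: int) -> int:
    """Number of whole queries a credit balance can cover."""
    if credit_balance < 0:
        raise ValueError("credit_balance must be non-negative")
    return credit_balance // CREDITS_PER_QUERY


# Example: 25 incidents per month with 3 queries each.
print(estimate_credits(25 * 3))   # 300
print(queries_affordable(1000))   # 250
```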

End-to-end workflow

With PagerDuty configured as both an MCP connector and an incident management platform, Azure SRE Agent enables a seamless workflow:

  1. Detect: Azure SRE Agent monitors your Azure infrastructure and detects issues
  2. Correlate: Azure SRE Agent retrieves related PagerDuty incidents for the affected Azure resources
  3. Triage: Azure SRE Agent queries PagerDuty’s SRE Agent for AI-powered root cause analysis, troubleshooting steps, and runbook recommendations
  4. Act: Azure SRE Agent acknowledges, adds notes to, or resolves PagerDuty incidents—all from a single conversation
  5. Learn: PagerDuty SRE Agent saves incident learnings and playbooks for future incidents, improving response over time

For the best experience, configure both the PagerDuty MCP connector (for service and schedule queries) and PagerDuty as your incident management platform (for direct SRE Agent access). This gives your team the full breadth of PagerDuty capabilities from within Azure SRE Agent.

For full documentation on PagerDuty SRE Agent capabilities, including best practices and example questions, see the PagerDuty SRE Agent documentation.


Updated Feb 26, 2026
Version 6.0