Apps on Azure Blog

Enterprise-Ready and Extensible: Update on the Azure SRE Agent Preview

Mayunk_Jain
Microsoft
Sep 01, 2025

We’re thrilled to share new updates to the Azure SRE Agent shaped by what we’ve learned from customers during the preview. Also, as previously announced, billing for all Azure SRE Agent customers will begin on September 1, 2025.

At Build, we unveiled the Azure SRE Agent with an end-to-end incident-handling experience: detecting, diagnosing, and mitigating issues, then handing them off to development teams. Since then, it has grown in capability, coverage, and enterprise readiness. Designed to streamline incident response, improve uptime, and reduce operational costs, the agent is a pre-built AI assistant that brings intelligent automation to your Azure workloads. During the preview, customers explored features like natural language operations, proactive best practices, incident response, and daily health reports, and their feedback helped shape the updates we’re sharing today.

Before we dive into what’s new, here’s an important update: as previously announced, billing for all Azure SRE Agent customers officially begins on September 1, 2025. The billing model includes two components: an always-on flow for continuous monitoring and an active flow for incident mitigation and task execution. Pricing is based on Azure Agent Units (AAUs), a new metric that standardizes agentic processing across Azure’s growing family of AI agents. Customers can estimate costs using the Azure pricing calculator. Please refer to the billing announcement blog for more information.
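
To make the two-component model concrete, here is a minimal sketch of how an estimate could be split into an always-on part and an active part. The rates, their units (per hour and per second), and the names `AauRates` and `estimate_monthly_cost` are assumptions for illustration only; they are not published pricing, so use the Azure pricing calculator for real numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AauRates:
    """Hypothetical placeholder rates; substitute the published values."""
    always_on_aau_per_hour: float  # assumed: AAUs per hour of continuous monitoring
    active_aau_per_second: float   # assumed: AAUs per second of active task work
    price_per_aau: float           # assumed: currency units charged per AAU


def estimate_monthly_cost(rates: AauRates, agent_hours: float, active_seconds: float) -> dict:
    """Split an estimate into the always-on and active components."""
    always_on_aau = rates.always_on_aau_per_hour * agent_hours
    active_aau = rates.active_aau_per_second * active_seconds
    total_aau = always_on_aau + active_aau
    return {
        "always_on_aau": always_on_aau,
        "active_aau": active_aau,
        "total_aau": total_aau,
        "total_cost": total_aau * rates.price_per_aau,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not published pricing.
    rates = AauRates(always_on_aau_per_hour=1.0, active_aau_per_second=0.5, price_per_aau=0.5)
    print(estimate_monthly_cost(rates, agent_hours=720, active_seconds=3600))
```

Keeping the two components separate in the result makes it easier to see whether the baseline monitoring or the incident work dominates the estimate.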

With billing starting soon, we’re excited to highlight how the Azure SRE Agent has matured since Build. From deeper diagnostics to smarter integrations, the agent is now more capable, secure, and adaptable than ever, and ready to support enterprise-scale incident response with confidence.

Today, the agent is:

  • Secure by design: Operates with read-only permissions and user-approved actions for governance.
  • Diagnostic-rich: Delivers deeper insights across a wider range of Azure services.
  • Seamlessly integrated: Connects with Azure Monitor, PagerDuty, ServiceNow, GitHub, and Azure DevOps.
  • Extensible incident handling: Uses past incident patterns and user-supplied runbooks to guide response actions.
  • Source code aware: Performs source-code-level root cause analysis for pinpoint accuracy.

Whether you’re automating incident response or keeping humans in the loop, the agent adapts to your operational style securely and consistently.

What’s New Since Build

1. Granular Permissions with Governance

Configure the agent with read-only access to Azure resources. When a write operation is needed, it requests explicit user approval. This ensures safe-by-default operations with full auditability.
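
The read-by-default, approve-to-write pattern described above can be sketched as a small gate. This is an illustrative sketch, not the agent's implementation: `Action`, `ApprovalGate`, and the `approver` callback are hypothetical names, and the audit log here is just an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    description: str
    is_write: bool  # True if the action would modify a resource


@dataclass
class ApprovalGate:
    # Hypothetical callback that asks a human; returns True to approve.
    approver: Callable[[Action], bool]
    audit_log: List[str] = field(default_factory=list)

    def run(self, action: Action, execute: Callable[[], str]) -> str:
        """Run reads immediately; run writes only after explicit approval."""
        if action.is_write and not self.approver(action):
            self.audit_log.append(f"DENIED: {action.description}")
            return "skipped"
        label = "APPROVED" if action.is_write else "READ"
        self.audit_log.append(f"{label}: {action.description}")
        return execute()


if __name__ == "__main__":
    gate = ApprovalGate(approver=lambda action: False)  # a reviewer who declines everything
    print(gate.run(Action("list app settings", is_write=False), lambda: "ok"))
    print(gate.run(Action("restart web app", is_write=True), lambda: "restarted"))
    print(gate.audit_log)
```

Every decision, including denials, lands in the audit log, which is the property the "full auditability" claim depends on.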

2. Expanded Azure Service Skills

The agent now supports deeper diagnostics and safe operations across:

  • Azure CLI, kubectl and psql CLI: Answers questions and assists with operations across subscriptions, Kubernetes clusters and databases.
  • PostgreSQL: Diagnoses issues and performs safe maintenance tasks.
  • Azure API Management (APIM): Inspects gateway behavior, policies, and runtime signals.
  • Azure Functions, App Service, AKS, ACA: Offers richer diagnostics and operational insights.

And with Azure CLI support, it can reason over any Azure service, even those not explicitly listed.

3. Incident Management Integrations

In addition to Azure Monitor alerts (enabled by default) and PagerDuty, the agent now integrates with ServiceNow for incident intake, updates, and status synchronization. These integrations ensure that incidents are automatically captured, diagnosed, and mitigated quickly.

4. DevOps Loop Closure

In addition to GitHub, the agent now supports generating incident reports directly into Azure DevOps work items. These reports can be assigned to coding agents to automatically create pull requests and merge changes after validation. This closes the loop from detection to remediation, ensuring that learnings turn into actionable fixes, not just documentation.

5. Extensible Incident Handling

The agent now supports customizable incident handling by learning from past incidents and applying user-supplied instructions. When similar issues arise, it can identify previously successful resolution steps and reuse them to ensure consistent outcomes. You can also define specific runbook instructions, allowing the agent to follow your preferred workflows, whether fully automated or with human oversight.
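
As a rough illustration of the idea, the sketch below prefers a user-supplied runbook when one matches and otherwise reuses the steps from the most similar past incident. The post does not describe how the agent measures similarity; the word-overlap (Jaccard) score, the keyword-based runbook lookup, the threshold, and all names here are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PastIncident:
    title: str
    resolution_steps: List[str]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two titles, from 0.0 to 1.0."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def suggest_steps(
    new_title: str,
    history: List[PastIncident],
    runbooks: Dict[str, List[str]],
    threshold: float = 0.5,
) -> List[str]:
    """Return runbook steps if a keyword matches, else steps from a similar past incident."""
    # User-supplied runbooks take priority over learned patterns.
    for keyword, steps in runbooks.items():
        if keyword.lower() in new_title.lower():
            return steps
    # Otherwise fall back to the closest past incident, if it is close enough.
    best = max(history, key=lambda inc: jaccard(new_title, inc.title), default=None)
    if best is not None and jaccard(new_title, best.title) >= threshold:
        return best.resolution_steps
    return []


if __name__ == "__main__":
    history = [PastIncident("High CPU on checkout API", ["scale out", "check recent deploy"])]
    runbooks = {"postgres": ["check connection count", "review slow queries"]}
    print(suggest_steps("High CPU on checkout API pods", history, runbooks))
    print(suggest_steps("Postgres connection spike", history, runbooks))
```

Returning an empty list when nothing matches leaves room for a human to decide, which fits the human-oversight option mentioned above.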

6. Source Code Aware RCA

Root cause analysis now includes source code context via connections to Azure DevOps and GitHub. After analyzing logs, metrics, and exceptions, the agent links impacted resources to relevant code and validates suspected causes, pointing to specific files and methods for faster and more confident resolution.
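
One building block of this kind of analysis is mapping an exception's stack trace to files and functions in your own repository. The sketch below does that for a Python-style traceback; it is an illustration only, since the post does not describe the agent's internals, and `locate_frames`, `CodeLocation`, and the `repo_prefix` parameter are hypothetical names.

```python
import re
from typing import List, NamedTuple


class CodeLocation(NamedTuple):
    path: str
    line: int
    function: str


# Matches frame lines of the form: File "path", line N, in function
_FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def locate_frames(trace: str, repo_prefix: str) -> List[CodeLocation]:
    """Return frames whose path starts with repo_prefix, most recent call first."""
    frames = [
        CodeLocation(m["path"], int(m["line"]), m["func"])
        for m in _FRAME.finditer(trace)
        if m["path"].startswith(repo_prefix)  # skip library and runtime frames
    ]
    return list(reversed(frames))


if __name__ == "__main__":
    trace = (
        "Traceback (most recent call last):\n"
        '  File "/app/src/api/orders.py", line 42, in create_order\n'
        "    total = price_for(cart)\n"
        '  File "/app/src/pricing.py", line 10, in price_for\n'
        "    return cart.total / cart.count\n"
        "ZeroDivisionError: division by zero\n"
    )
    for location in locate_frames(trace, "/app/src/"):
        print(location)
```

Filtering to the repository prefix keeps the result focused on code the team owns, which is where a fix is most likely to land.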

What’s in It for You

  • Reduced Toil: Less manual triage and fewer repetitive tasks.
  • Improved Uptime: Faster detection and mitigation keep services running.
  • Lower MTTR: Faster diagnosis and mitigation.
  • Enterprise-Grade Safety: Read-only permissions + approvals = safe by default.

The Azure SRE Agent is built to meet the demands of modern cloud operations: secure, extensible, and ready to scale with your team. Whether you're looking to reduce toil, improve uptime, or close the loop between detection and remediation, the agent is ready to help.

Ready to get started?

Updated Aug 29, 2025
Version 1.0

1 Comment

  • elmehdiba (Copper Contributor)

    Great update on the Azure SRE Agent! The move towards a more secure and extensible solution is excellent. Looking forward to seeing how these improvements will optimize large-scale incident response.