azure sre agent
Bring Your Own GitHub App: Connecting Azure SRE Agent to Enterprise Repositories
What if your SRE agent could access your enterprise GitHub repositories the same way your CI/CD pipelines do, with a governed service identity rather than a personal token?

Azure SRE Agent connects to your GitHub repositories to build rich context about your systems: source code, infrastructure definitions, deployment configs, skills, runbooks, and operational history. This context is what turns generic troubleshooting into root cause analysis that points to the exact file, the exact commit, the exact config change.

This is part of the Azure SRE Agent announcements at Build 2026.

For teams on github.com, connecting is a quick OAuth sign-in. Today, we are extending that same deep context to GitHub Enterprise Cloud and introducing Bring Your Own GitHub App as a first-class authentication model for enterprise teams that need governed, app-based access to their repositories.

Enterprise GitHub, enterprise identity

Large organizations run on GitHub Enterprise Cloud with EMU (Enterprise Managed Users). In these environments, every identity is governed centrally: tokens are scoped by policy, rotated on schedule, and tied to individual humans. When an SRE agent needs to access your repositories, the identity it uses matters.

With a GitHub App, the agent operates under a service identity registered and owned by your organization. Every repository operation (every clone, every issue query, every file read) is attributed to the App’s installation, not to an individual engineer. Your security and compliance teams can trace agent activity to a governed service identity, and your audit logs reflect exactly what happened.

GitHub Apps are the same identity model enterprises already use for CI/CD pipelines, deployment automation, and internal tooling. BYO App extends it to your SRE agent.
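For readers who have not worked with GitHub Apps directly, the following is a minimal Python sketch of how an App identity authenticates in general; it is not Azure SRE Agent's actual implementation. It assembles the short-lived JWT that a GitHub App presents (GitHub expects RS256, an `iat` slightly in the past, and an `exp` no more than ten minutes out) and builds the REST path used to exchange that JWT for an installation token. The function names, the `sign_rs256` callback, and any sample IDs are illustrative assumptions; signing is left to a callback so private key handling stays outside the sketch.

```python
import base64
import json
import time
from typing import Callable, Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWT segments use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_app_jwt(
    client_id: str,
    sign_rs256: Callable[[bytes], bytes],  # hypothetical callback that signs with the App's private key
    now: Optional[int] = None,
) -> str:
    """Assemble a GitHub App JWT; the RS256 signature comes from `sign_rs256`."""
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iat": issued_at - 60,   # backdated 60 seconds to tolerate clock drift
        "exp": issued_at + 540,  # 9 minutes ahead, inside GitHub's 10-minute limit
        "iss": client_id,        # the App's client ID
    }
    signing_input = (
        b64url(json.dumps(header).encode("utf-8"))
        + "."
        + b64url(json.dumps(claims).encode("utf-8"))
    )
    signature = sign_rs256(signing_input.encode("ascii"))
    return signing_input + "." + b64url(signature)


def installation_token_url(api_base: str, installation_id: int) -> str:
    """REST path where an App JWT is exchanged for an installation token."""
    return f"{api_base.rstrip('/')}/app/installations/{installation_id}/access_tokens"
```

In practice `sign_rs256` would wrap an RSA signing routine backed by the App's private key, and the resulting JWT would be sent as a Bearer token in a POST to the URL returned by `installation_token_url`.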
How it works

When you bring your own GitHub App to Azure SRE Agent, the authentication flow uses short-lived tokens with explicit permissions:

- Your organization registers a GitHub App on your GHE instance (or github.com) with the specific repository permissions you choose: Contents: Read, Metadata: Read, and optionally Issues or Pull Requests.
- The App’s private key lives in Azure Key Vault. The agent’s managed identity reads the PEM at runtime, mints a JWT, and exchanges it for an installation token that expires in about an hour. The private key never leaves Key Vault.
- Permissions are declared, not inherited. The App has exactly the access you configured at registration. The agent cannot exceed those boundaries regardless of who set it up.
- Token refresh is automatic. No human token to expire, no refresh chain to maintain. The agent mints new installation tokens as needed.

For organizations managing multiple GitHub instances (say, one for platform engineering and another for application teams), each instance gets its own GitHub App with its own Key Vault secret. You can assign a different user-assigned managed identity per App for security isolation. Disconnecting one host does not affect others.

What your agent does with GitHub access

Once connected, your agent uses GitHub for more than source code. Repositories hold the artifacts that define how your services run and how your agent reasons about them:

- Source code and infrastructure definitions. The agent reads application code, Bicep templates, Terraform configurations, and Dockerfiles to understand what a service actually does, not what the docs say it does.
- Skills and runbooks. Teams store agent skills, response plans, and operational runbooks as files in repositories. GitHub access lets the agent load and update these artifacts directly.
- Configuration and deployment history.
Helm charts, pipeline definitions, environment configs, and release manifests give the agent the context to correlate an incident with what changed and when.
- Issues and pull requests. The agent can search issues for known problems, check recent PRs for regression candidates, and create issues or PRs when it identifies a fix.

Logs tell the agent what happened. Code tells it why. Your skills and runbooks tell it what to do about it.

The difference with BYO App is the identity under which all of this happens. These operations occur under your organization’s App identity with the permissions you declared, the audit trail you govern, and the key lifecycle you control.

GitHub Enterprise Cloud hosts

For GitHub Enterprise Cloud domains (*.ghe.com), the Code Access wizard automatically selects BYO App as the authentication method. This is by design: GHE Cloud hosts use App-based authentication exclusively.

The setup:

1. Create a GitHub App on your GHE instance. Set Contents: Read and Metadata: Read at minimum. Install it on the repositories your agent needs.
2. Store the private key in Azure Key Vault. Store the full PEM content as a secret.
3. Grant the agent’s managed identity Key Vault Secrets User on that vault.
4. Enter Client ID and Key Vault secret URI in Code Access. The agent validates credentials and loads your repositories.

BYO App on github.com works the same way, which is useful when your organization’s policy requires App-based authentication even for public GitHub.

Resources

- Create new SRE Agent - https://aka.ms/sreagent
- SRE Agent Documentation - https://aka.ms/sreagent/newdocs
- SRE Agent recipes - https://aka.ms/sreagent/recipes
- Build 2026 SRE Agent announcements - https://aka.ms/Build26/blog/SREAgent

New in Azure SRE Agent: Log Analytics and Application Insights Connectors
Azure SRE Agent now supports Log Analytics and Application Insights as log providers. Connect your workspaces and App Insights resources, and the agent can query them directly during investigations.

Why This Matters

Log Analytics and Application Insights are common destinations for Azure operational data: container logs, application traces, dependency failures, security events. The agent could already access this data through az monitor CLI commands if you granted RBAC roles to its managed identity, and that approach still works. But it required manual RBAC setup, and the agent had to shell out to the CLI for every query.

With these connectors, setup is simpler and querying is faster. You pick a workspace, we handle the RBAC grants, and the agent gets native MCP-backed query tools instead of going through the CLI.

What You Get

Two new connector types in Builder > Connectors (or through the onboarding flow under Logs):

- Log Analytics: connect a workspace. The agent can query ContainerLog, Syslog, AzureDiagnostics, KubeEvents, SecurityEvent, custom tables, anything in that workspace.
- Application Insights: connect an App Insights resource. The agent gets access to requests, dependencies, exceptions, traces, and custom telemetry.

You can connect multiple workspaces and App Insights resources. The agent knows which ones are available and targets the right one based on the investigation.

Setup

If you want early access, enable "Early access to features" under Settings > Basics. From there you can add connectors in two ways:

- Through onboarding: Click Logs in the onboarding flow, then select Log Analytics Workspace or Application Insights under Additional connectors.
- Through Builder: Go to Builder > Connectors in the sidebar and add a Log Analytics or Application Insights connector.

Pick your resource from the dropdown and save. If discovery doesn't find your resource, both connector types have a manual entry fallback.
On save, we grant the agent's managed identity Log Analytics Reader and Monitoring Reader on the target resource group. If your account can't assign roles, you can grant them separately.

Backed by Azure MCP

Under the hood, this uses the Azure MCP Server with the monitor namespace. When you save your first connector, we spin up an MCP server instance automatically. The agent gets access to tools like:

- monitor_workspace_log_query: KQL against a workspace
- monitor_resource_log_query: KQL against a specific resource
- monitor_workspace_list: discover workspaces
- monitor_table_list: list tables in a workspace

Everything is read-only. The agent can query but never modify your monitoring configuration. If different connectors use different managed identities, the system handles per-call identity routing automatically.

What It Looks Like

An alert fires on your AKS cluster. The agent starts investigating and queries your connected workspace:

```kusto
ContainerLog
| where TimeGenerated > ago(30m)
| where LogEntry contains "error" or LogEntry contains "exception"
| summarize count() by ContainerID, LogEntry
| top 10 by count_
```

```kusto
KubeEvents
| where TimeGenerated > ago(1h)
| where Reason in ("BackOff", "Failed", "Unhealthy")
| summarize count() by Reason, Name, Namespace
| order by count_ desc
```

The agent also ships with built-in skills for common Log Analytics and App Insights query patterns, so it knows which tables to look at and how to structure queries for typical failure scenarios.

Things to Know

- Read-only: the agent can query data but cannot modify alerts, retention, or workspace config.
- Resource discovery needs Reader: the dropdown uses Azure Resource Graph. If your resources don't show up, use the manual entry fallback.
- One identity per connector: if workspaces need different managed identities, create separate connectors.

Learn More

- Azure SRE Agent documentation
- Azure MCP Server

We'd love feedback. Try it out and let us know what works and what doesn't.
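The ContainerLog query shown under What It Looks Like can be treated as a template. As a hedged illustration (the function name and its parameters are hypothetical and are not part of the connector or the Azure MCP Server), this Python sketch rebuilds that query with an adjustable time window, keyword list, and row limit. Keywords are restricted to simple characters so that nothing can break out of the quoted KQL string.

```python
import re
from typing import Sequence

# Only letters, digits, underscore, space, and hyphen are accepted as keywords.
_SAFE_KEYWORD = re.compile(r"[A-Za-z0-9_ -]+")


def container_error_query(
    minutes: int = 30,
    keywords: Sequence[str] = ("error", "exception"),
    top: int = 10,
) -> str:
    """Build the ContainerLog error-summary query as a multi-line KQL string."""
    if minutes <= 0 or top <= 0:
        raise ValueError("minutes and top must be positive")
    if not keywords:
        raise ValueError("at least one keyword is required")
    for word in keywords:
        if not _SAFE_KEYWORD.fullmatch(word):
            raise ValueError(f"unsupported characters in keyword: {word!r}")

    # One 'contains' test per keyword, joined with 'or'.
    match_clause = " or ".join(f'LogEntry contains "{word}"' for word in keywords)
    return "\n".join(
        [
            "ContainerLog",
            f"| where TimeGenerated > ago({minutes}m)",
            f"| where {match_clause}",
            "| summarize count() by ContainerID, LogEntry",
            f"| top {top} by count_",
        ]
    )
```

Called with its defaults, the function reproduces the first query from the example; widening the window to an hour is `container_error_query(minutes=60)`.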
Azure SRE Agent is generally available. Learn more at sre.azure.com/docs.

Azure SRE Agent at Microsoft Build 2026: Bringing agentic operations to the enterprise
Build 2026 Update

When we launched Azure SRE Agent, the promise was simple: reduce operational toil, improve uptime, and evolve teams from manual incident response toward AI-powered operations. Since GA in March 2026, that promise has held up in production. Teams are using the agent to diagnose live issues, reason across telemetry and code, and automate response workflows, and the footprint has grown fast.

But there's a gap between an agent that works in a dev/test environment and one that works in your production environment. Real production environments sit behind private networks with strict egress rules for enterprise security. Their code lives in a GitHub Enterprise tenant that a consumer OAuth sign-in can't reach. Platform teams need to govern what the agent can learn and use, and connectors must scale across many tools and many teams.

At Microsoft Build 2026, we're announcing five releases that take a major step toward enterprise adoption at scale:

- VNet integration (preview): Run SRE Agent inside your private Azure workloads, with full support for enterprise network boundaries and private connectivity.
- Managed Connectors: A redesigned connector experience for governing, securing, and scaling connections across observability, incident management, code, and collaboration tools, plus an expanded SaaS connector catalog including Jira, GitLab, Slack, Power BI, and more.
- Granular permissions model: Set allow, ask, and deny rules on individual tools. Admins can set guardrails that apply everywhere; agent users can approve tools for the rest of their conversation without waiting on policy changes.
- Native GitHub Enterprise support: Ground investigations in your enterprise repositories and workflows, so an incident can become an issue, an investigation, a pull request, and a repair plan, all under a governed service identity.
- Private Plugins Marketplace: Give platform teams a governed way to publish approved skills, MCP tools, and operational workflows to every SRE Agent in the tenant.

Together with our Infrastructure-as-Code templates, these releases make Azure SRE Agent easy to integrate into secure environments with locked-down networks, regulated teams, and complex codebases.

Read the series

- VNet integration - https://aka.ms/sreagent/blog/VNET
- Managed Connectors - https://aka.ms/sreagent/blog/connectorsv2
- SRE Agent permissions model - https://aka.ms/sreagent/blog/HooksAndToolPermissions
- Native GitHub Enterprise support - https://aka.ms/sreagent/blog/githubenterprise
- Private Plugins Marketplace - https://aka.ms/sreagent/blog/privatepluginmarketplace

📺 Watch the on-demand session from #MSBuild 2026 - https://aka.ms/sreagent/build26

Get started

- Create an SRE Agent - https://aka.ms/sreagent
- Documentation - https://aka.ms/sreagent/newdocs
- Recipes - https://aka.ms/sreagent/recipes

What's next

We're exploring Microsoft Entra Agent ID for first-class agent identity and Microsoft Agent 365 integration for centralized agent governance. What enterprise controls would unlock adoption in your production environments? Tell us in the comments below.

VNet integration for Azure SRE Agent (preview)
For many production systems, the logs, databases, private endpoints, repositories, and runbooks an SRE Agent needs to do its job are behind network boundaries your security team already governs. VNet integration for Azure SRE Agent, now in preview, puts the agent's outbound traffic under those same controls (your virtual network, your NSG rules, your private DNS) so it reaches only what your network allows.

The principle is one your security team already applies to every other workload: a component's network access shouldn't depend on the component behaving correctly. Identity governs what the agent can reach. Permissions and hooks shape what it does within reach. The network sits beneath both: it blocks any request to a destination you haven't allowed, no matter what the agent decides.

Why egress control matters

Two reasons. First, the agent reads sensitive things by design. Inspecting logs, code, configuration, and internal systems is the whole point during an incident, which means you have to decide where that data can go. Open egress gives that data a path out of your network, a risk you wouldn't accept for any other production-adjacent workload.

Second, it reasons over text it didn't write (logs, issue descriptions, tool output), which is how prompt injection gets in. Handling that is partly model safety, and Azure SRE Agent runs under Microsoft's Responsible AI standard with safety work from OpenAI and Anthropic. Network controls add another layer: an instruction that tries to reach a destination you haven't allowed can't run, because the network blocks it.

For example, an agent investigating an outage might query Log Analytics, read deployment configuration, and call an internal runbook, all private resources. With VNet integration, those calls follow the routes, DNS, and firewall rules your workloads already use. A request to an external endpoint you haven't allowed fails at the network boundary.
It doesn't depend on the model recognizing the risk and refusing; the network stops it either way.

Choose an egress mode

Azure SRE Agent has three egress modes, and you don't have to start at the strongest.

- Unrestricted: all outbound traffic is allowed.
- Limited: all outbound traffic is denied except an explicit list of hosts. This gives you host-level control without setting up a full VNet.
- Azure VNet: outbound traffic goes through a delegated subnet in your network, with your NSG rules and private DNS applied. This is the recommended mode for production and regulated workloads.

How Azure VNet mode works

Outbound traffic takes one of two paths, and every call takes exactly one.

Your VNet. Everything not placed on the managed path goes through a delegated subnet in your own network, where your NSG rules, private DNS, and firewall all apply. The agent is just another workload on that subnet, so it can reach what the subnet can reach: databases behind private endpoints, internal services, monitoring stores, and key vaults, the parts of production that aren't reachable from the public internet. The resources that matter most during an incident are usually the private ones. If your network connects to on-premises over ExpressRoute or VPN, the agent can reach those systems too, as long as your existing routes and rules allow it.

The managed infra path. Some destinations go through Azure SRE Agent's managed infrastructure network instead: platform services the agent needs, plus optional categories you turn on (package registries, code repositories, and remote MCP servers). This path skips your VNet, so your NSG rules and Firewall Policies don't apply to it. Treat it as a deliberate exception, used only where you need it.

Why public services start on the managed path

Public services are hard to allow by IP address. GitHub, PyPI, npm, NuGet, apt, and the container registries run on large, changing IP ranges, and they don't map to a single Azure service tag.
If your NSG filters by IP and port, keeping those lists up to date is constant work. When a list falls behind, the agent can't pull a package or read a repository, and an investigation stalls on a networking problem that has nothing to do with the incident.

Each category has a toggle: package registries (PyPI, npm, NuGet, apt), code repositories (GitHub, GitHub Enterprise, Azure DevOps), remote MCP servers, and a list of additional hostnames. Starting with these on the managed path keeps the agent working reliably without maintaining an IP allowlist. For build-time dependencies, that's usually fine.

If you want this traffic inspected too, the next step is name-based (FQDN) egress filtering in your own network. Once your firewall can allow github.com and pypi.org by name, you can move these categories off the managed path and route them through your VNet instead.

Configure it

Two decisions: the subnet, and what (if anything) uses the bypass, that is, the managed path.

1. Navigate to Settings > Workspace Configuration > Network.
2. Choose Azure VNet as the egress mode.
3. Select a subnet that is /28 or larger and delegated to `Microsoft.App/environments`.
4. Decide which categories, if any, use the bypass.
5. Restrict who can change the egress mode and bypass toggles. These settings widen or narrow the agent's reach, so govern them like any production network control.
6. Test the outbound behavior before using the agent with production data.

A reasonable setup for most enterprises during preview: use Azure VNet mode, keep package registries and code repositories on the bypass if you need reliable access to them, and route everything else through your VNet. Stricter environments can turn those categories off and rely on their own name-based firewall rules.

What it doesn't cover yet

VNet integration is in preview, with two limitations to know. It covers outbound traffic only: reaching the agent privately from inside your network isn't part of this preview.
And connector traffic still routes over the public internet; the governance and credential isolation in Connectors V2 still apply. Use VNet integration for outbound control of the agent workspace, and combine it with identity, RBAC, tool permissions, hooks, and connector governance for a complete set of controls.

Where it fits

VNet integration doesn't replace identity, RBAC, tool permissions, or connector governance. It controls where traffic can go. The agent still needs the right identity and permissions to access a resource in the first place.

- Identity is the foundation: your RBAC assignments decide what the agent can reach.
- Permissions and hooks shape what it does within reach: allow/ask/deny rules control what runs, and hooks let you inspect or change a tool call before it runs.
- VNet integration sits underneath, controlling where traffic can go no matter what the agent tries to do.

You want the agent to be capable. You also want a boundary that holds whether or not it is.

Get started

- Create an SRE Agent - https://aka.ms/sreagent
- Documentation - https://aka.ms/sreagent/newdocs
- Recipes - https://aka.ms/sreagent/recipes
- Build 2026 Announcement - https://aka.ms/Build26/blog/SREAgent

Managed Connectors for SRE Agent (preview) - Govern what your agent can do
Giving an agent access to a tool is the easy part. The harder question is what it's allowed to do with that access. "Can the agent copy a file in OneDrive?" mostly answers itself. "Can it copy any file, to any destination, over one that's already there?" is the one that decides whether the integration has a governance layer.

Managed Connectors is built around that second question. It expands the catalog of tools the agent can reach (OneDrive, SharePoint, Google Drive, GitLab, Power BI, Microsoft Security Copilot, with more being added regularly) and pairs it with a governance model that keeps the policy for those tools outside the agent's control.

This is part of the Azure SRE Agent announcements at Build 2026.

What's new

Managed Connectors is the next generation of our connector experience. It significantly expands the catalog of third-party and first-party SaaS integrations available to SRE Agent and surfaces each one to the agent as a curated set of operations through the Model Context Protocol (MCP), the same standard the agent already uses for every other tool source.

Governance: the agent gets capability, you keep control

The governance model is the headline of this release, so it's worth being concrete about it. When you add a connector, you walk through a short wizard (Set up connector, Configure tools, Review & Save), and the "Configure tools" step is where the policy is set. Three things make it different from "just wire the API up to the LLM":

- You choose what's exposed; it isn't automatic. A connector might offer 40+ operations; in the wizard you pick the ones the agent can use. The rest aren't shown to the model, so it can't call them.
- Parameter policy lives outside the agent. For each selected operation you can mark parameters as user-defined (pinned to a value you specify) or agent-defined (the agent fills it in).
On the Microsoft Planner “Create a task” tool, for example, you can choose the group ID from a list of your joined groups. The agent then provides the task details but can’t assign the task to an arbitrary group, because the group ID isn’t a parameter it sees when invoking the tool.
- Per-tool approval is built in. Each operation has an Allow/Ask toggle integrated directly into the creation and edit wizards. "Ask" routes the call through the agent runtime's human-in-the-loop approval flow before it executes. On that same Microsoft Planner connector, you might leave read-only tools like “List tasks” or “Get plan details” on Allow, but flip “Delete a task” to Ask so a human must confirm before anything is removed. This is enforced in the agent's runtime; it is not a prompt instruction the model can be talked out of following.

Credential Isolation

- No long-lived secrets in the agent. No API keys, no client secrets, no certificates, no OAuth tokens. All service credentials are encrypted at rest and stored outside the agent’s trust boundary.
- Automatic token refresh. Once you consent, the internal connector resource keeps your tokens valid. You won't be asked to re-authenticate unless your service itself requires it.
- You consent once, in your own browser, with your own service. SRE Agent never proxies your password or the sign-in flow.
- Per-connection authorization. Each connection is bound to the specific SRE Agent instance you set it up on and cannot be used by external threat actors.

How it fits together

All of this is stored and evaluated outside the agent loop. Each configured connector becomes an MCP server that the SRE Agent runtime registers as a tool source, using the same standard wire format the agent uses for everything else, so adoption on the model side is trivial.
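To make the separation concrete, here is a minimal Python sketch of a policy layer that sits between the model and a connector. It is an illustration under stated assumptions, not the product's implementation: `ToolPolicy`, `PolicyGateway`, the tool names, and the `approve` callback are all hypothetical. It shows the three behaviors described above: only selected operations are visible, pinned parameters are merged in outside the agent, and Ask operations require approval before they run.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolPolicy:
    """Admin-side policy for one exposed operation (hypothetical shape)."""
    pinned: Dict[str, Any] = field(default_factory=dict)  # user-defined parameters
    approval: str = "allow"  # "allow" or "ask"


class PolicyGateway:
    """Evaluates policy outside the agent loop, before a call is sent."""

    def __init__(
        self,
        policies: Dict[str, ToolPolicy],
        approve: Callable[[str, Dict[str, Any]], bool],  # human-in-the-loop stand-in
    ) -> None:
        self._policies = policies
        self._approve = approve

    def visible_tools(self) -> List[str]:
        # The model only ever sees operations an admin selected.
        return sorted(self._policies)

    def prepare_call(self, tool: str, agent_args: Dict[str, Any]) -> Dict[str, Any]:
        policy = self._policies.get(tool)
        if policy is None:
            raise PermissionError(f"{tool!r} is not exposed to the agent")
        overlap = sorted(set(agent_args) & set(policy.pinned))
        if overlap:
            raise PermissionError(f"agent may not set pinned parameters: {overlap}")
        # Pinned values are added here, so the agent never supplies them.
        final_args = {**agent_args, **policy.pinned}
        if policy.approval == "ask" and not self._approve(tool, final_args):
            raise PermissionError(f"{tool!r} was not approved")
        return final_args
```

With a `create_task` policy that pins `group_id`, the agent supplies only the task details and the gateway adds the group; a `delete_task` policy set to "ask" fails closed whenever the approver declines.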
Each layer does one job, and the trust boundary between "what the model decided" and "what was actually sent" is explicit and inspectable: the agent never sees the operations you didn't select, never sees the parameter slots you pinned, and cannot bypass approval on operations you marked Ask.

How to try it

1. Open the SRE Agent portal and go to Builder > Connectors.
2. Pick a connector from the catalog with the “Preview” label and go through the creation wizard steps.
3. At the “Set up connector” step, choose how the connector authenticates. Start with “OAuth” if you just want to sign in and see it working against your own account.
4. At “Configure tools”, select the operations you want to expose, pin any parameters that shouldn't be agent-controlled, and mark sensitive operations as “Ask.”
5. Review & Save. The connector is registered with the runtime and immediately available to your agent. You can enable or disable specific tools or connectors in the “Capabilities” section.
6. Edit connector. After creating the new connector, you can go back at any point to authenticate it with a different account, add or remove operations, update tool parameters, and configure approval policies.

Resources

- Create new SRE Agent - https://aka.ms/sreagent
- SRE Agent Documentation - https://aka.ms/sreagent/newdocs
- SRE Agent recipes - https://aka.ms/sreagent/recipes
- Build 2026 SRE Agent announcements - https://aka.ms/Build26/blog/SREAgent

Shaping what Azure SRE Agent does: Tool Permissions and Hooks
When an AI agent runs against production, the first question every security team asks is "What can it do, who decided it could, and what stops it from doing something it should not?"

Azure SRE Agent reached general availability in March. Since then, teams inside Microsoft and customers running it against real production workloads have asked for the same thing: finer-grained controls over what the agent can do on its own, and a clear answer to who governs each call that reaches a tool.

Today at Build 2026, we are releasing global tool access policies as one of a set of new governance controls. This post covers how they work.

Tool access policies give security and platform teams a single place to define which tools the agent can invoke, under what conditions, and what requires human approval before it runs. Underneath those policies sits the identity the agent runs as, the bedrock that every other control layer depends on. It is defense in depth applied to agent behavior: layers of control, each one holding on its own, so that governing the agent is something you can read, audit, and reason about as you scale it across production.

Identity is the bedrock: managed identity today, agent identity next

Start here, because nothing else matters if you skip it. The identity the SRE Agent runs as, and the Azure RBAC role assignments on that identity, are the most powerful boundary the agent works within. If your role assignments do not grant the agent access to a resource, none of the controls below come into play, because the agent cannot reach the resource to begin with. Network rules, tool permissions, hooks, and connector contracts all sit on top of an RBAC story that you write. The features in this post add layers above that floor. They do not replace it.

Today the SRE Agent operates as a managed identity, and your RBAC role assignments on that identity govern what it can do. This is the bedrock, and it is the same model your other Azure workloads already use.
You assign roles, you scope them, and the agent inherits exactly what you granted and nothing more. Everything that follows assumes the bedrock is in place.

With identity settled, the next question is the obvious one: where is the agent allowed to send its traffic?

Permissions: govern what the agent does with a tool

Identity decides what the agent can reach. Permissions decide what the agent does with the access it has, down to the individual tool. Two levels cover the range: a point-and-click grid for the common cases, and hooks when a decision needs your own code.

The grid is the easy mode. Every tool the agent can use (built-in tools along with MCP servers, services, and custom tools) shows up in one searchable list with two switches. On/Off sets whether the tool is available at all; turn it off and the agent cannot use it. Allow/Ask sets what happens when it is on: Allow lets the agent run the tool automatically; Ask requires a human to approve every time, except in Autonomous mode. Select tools in bulk to flip a whole category at once, filter by category or permission, and use the Advanced permissions tab when you want rules that apply at global, per-agent, or per-thread scope instead of tool by tool. Defaults stay put until you touch them, and the engine is fail-closed: if a rule cannot be evaluated, the call is blocked rather than allowed.

That covers most of what teams need. Underneath those switches are three rules (allow, ask, and deny), and the Advanced tab is where you set them by scope. Global rules apply to every agent and thread, Agent rules to one custom agent, and Thread rules to a single conversation. Deny is the hard one: it blocks the tool outright no matter the run mode, and a deny at a higher scope always wins, so an Allow at thread scope cannot reopen something denied globally. That split is deliberate.
A platform team sets the Global guardrails that should never be crossed and the Asks that always need a human, and service teams add their own Allow rules at Agent scope for routine work, without being able to override the guardrails above them.

Platform team, Global scope:

```
deny:  bash(az * delete *)        - never delete, on any agent or thread
deny:  bash(kubectl delete *)
ask:   bash(az webapp restart *)  - always confirm, even in Autonomous
allow: bash(az monitor *)         - auto-approve monitoring queries
```

Service team, Agent scope:

```
allow: bash(kubectl get *)        - routine read-only work
allow: bash(kubectl describe *)
```

Two details make this safe to lean on. Rules match the canonicalized tool invocation rather than the raw text, so enforcement holds no matter how the command was assembled. And fail-closed has a softer edge than a hard stop: a cached last-known-good policy covers transient failures, so a blip in the policy store blocks the call rather than silently widening access.

You can find these under Capabilities > Tools.

The layer worth spending time on is hooks. Allow and Ask answer "should this tool run." Hooks answer "should this specific call run, given exactly what it is about to do." A hook fires before the agent runs a tool and receives the actual call, parameters and all. Your code then decides the outcome and can reshape it: rewrite parameters before they are sent, inject extra context into the pipeline as a user message so the agent reconsiders before its next step, block the call outright, or redirect the agent toward a safer path. Because your code sees the real parameters, the decision can depend on anything you can express in code: which resource the call targets, whether a value falls outside an allowed range, the time of day, the result of an external policy lookup. This is where you write the rule the grid cannot.

Two kinds of hook, mixable on the same agent. Command hooks are a script you write; reach for these when code is enough.
Prompt hooks put a separate LLM in the loop as a judge that evaluates the call in context; reach for these when the decision needs reasoning rather than a fixed rule.

A real example from our own internal test agent: when the agent tries to list files through the shell with ls or dir, a hook blocks the call. The agent absorbs the signal, reconsiders, and reaches for the ListDir tool instead. The hook did not argue with a human. It shaped what happened next.

As with the grid, configure nothing and the agent behaves exactly as it does today. Both are additive.

Authoring one is a short form. You name the hook, pick the event (Pre Tool Use, so it runs before the call), and set a tool matcher, either picked from the tool menu or written as a regex like (FetchWebpage|SearchMemory) with anchors and lookaheads when you need them, so the hook fires only on the calls you care about. You set a timeout and a fail mode (Block, so a hook that errors or hangs stops the call rather than waving it through), and you write the body in Bash or Python.

A command hook reads the call as JSON on stdin (the event name, the tool name, its parameters, and the call id) and answers on stdout. Print nothing and exit zero to allow. Return a block decision with a reason to stop the call, and that reason is what the agent reads back. You can also substitute: run a cheaper or safer version yourself, block the real call, and hand your own output back as the result, so the agent never runs the expensive or risky original.

```bash
#!/bin/bash
# Read the tool-call JSON from stdin and extract the tool name
input=$(cat)
tool=$(echo "$input" | jq -r '.tool_name')

# Block one tool, with a reason the agent will read
if [ "$tool" = "ExampleToolName" ]; then
  echo '{"decision":"block","reason":"Blocked ExampleToolName by hook policy."}'
  exit 0
fi

# Otherwise allow: print nothing and exit 0
exit 0
```

You can find these under Builder > Hooks.

Each layer holds on its own

The layers stack. Identity is the floor: your RBAC assignments decide what the agent can reach at all.
Permissions, the grid and hooks together, decide what it does with a tool. You author each layer, each one holds whether or not the layer above it behaves as expected, and all of it configures through the same ARM and Bicep surface your platform team already uses, reproducible the way the rest of your Azure estate is.

The upgrade path is additive and non-breaking. Existing agents keep working. Turn on each control when you are ready, in the order your governance requires.

There is more coming. We run Azure SRE Agent inside Microsoft on our own production workloads, so we feel the same gaps you do, and the next round is shaped by what we hear from teams running it in production today. Which control is doing the most for you, and which one are you still waiting on? Let us know and thank you!

Getting started

- Create new SRE Agent: https://aka.ms/sreagent
- SRE Agent Documentation: https://aka.ms/sreagent/newdocs
- SRE Agent recipes: https://aka.ms/sreagent/recipes
- Build 2026 Announcement: https://aka.ms/Build26/blog/SREAgent

Private Plugins with Azure SRE Agent
SREs and platform teams are building operational skills specific to their infrastructure: investigation runbooks, compliance checks, cost analysis playbooks, deployment verification procedures. The next step is making that work reusable across every agent in the organization without exposing it publicly.

Today, SRE Agent supports plugin marketplaces hosted in private GitHub repositories, including GitHub Enterprise. This is part of the Azure SRE Agent announcements at Build 2026. You can now point SRE Agent at a private repo when adding a marketplace or installing a plugin. Authentication is handled per marketplace and supports OAuth, GitHub PATs, and GitHub Apps for GHE tenants.

From one agent to an organization's plugin catalog

Most teams start with a single SRE Agent connected to their services. The agent learns their infrastructure, runs their runbooks, and handles their incidents. It works well.

Then adoption grows. A second team stands up their own agent. Then a third. Platform engineering wants every agent to run the same compliance checks. Security needs approval hooks enforced consistently. FinOps has cost governance skills that should be standard across the organization. Suddenly the question isn't "how do I set up my agent," it's "how do we share operational knowledge across all of them."

Without a distribution model, teams end up copying skill files between agents manually. A platform team writes a runbook, shares it over email or a wiki link, and each service team pastes it into their agent individually. When the runbook improves, some agents get updated, some don't. There's no version tracking, no central catalog, and no way to know which agent is running which version of which skill. Private marketplace support solves this.

How private plugin marketplaces meet enterprise needs

A platform team publishes once, every agent installs. Codify best practices as plugins in a private GitHub repo. Service teams add that repo as a marketplace in their agents and install what they need. Compliance checks, cost governance thresholds, incident playbooks, and deployment verification procedures are all distributed through versioned plugins.

Each team retains ownership. Security controls which plugins enforce approval hooks. FinOps locks cost thresholds into parameter values. Platform engineering governs infrastructure investigation patterns. The marketplace is the distribution layer for organizational standards.

Versions are pinned, updates are explicit. Each installation locks to the commit at install time. A merged PR upstream does not change any agent's behavior. Teams promote new versions on their own schedule: validate in dev, promote to staging, then production. Different agents can run different versions simultaneously.

Reuse across environments and tools. The same plugin works across dev, staging, and production agents, and can be reused by local coding agents and other services that support plugins. One source of truth, not separate copies per environment.

Accessing private plugin marketplaces

Private repo support adds authentication to the SRE Agent's plugin workflow so your agent can clone and install from repos that require credentials. Authentication is configured once per marketplace. Every plugin within it inherits the credentials.

| Auth method | When to use | Setup |
|-------------|-------------|-------|
| OAuth | github.com repos your agent can already access | Uses your existing GitHub connection. One click. |
| Personal access token | Private repos in other orgs on github.com | Per-marketplace PAT. Scoped to just that marketplace. |
| GitHub App | GitHub Enterprise (*.ghe.com) | BYO App with private key in Azure Key Vault. Short-lived tokens minted at runtime. |

Getting started

1. In SRE Agent, navigate to Builder > Plugins, then click Add Marketplace and enter the URL of the private marketplace you want to connect to.
2. Click Connect to GitHub to complete the OAuth sign-in.
3. Click Add and you will see the plugins available from your connected marketplace.
4. Click the plugin you want to install. In the detail view you can browse the skills packaged with the plugin.
5. Click Install to install the plugin.

You can now see the skills imported from plugins under Capabilities > Skills > Custom Skills.

The bottom line

Private repo support turns the Plugin Marketplace from a public skill catalog into your organization's internal distribution platform for operational automation. Your team writes the plugins. Your agents install them. Your GitHub permissions control who has access.

Try it yourself: create a private repo with a marketplace.json and a few skills, add it as a marketplace in your agent, and install a plugin.

Resources

- SRE Agent documentation: https://aka.ms/sreagent/newdocs
- SRE Agent overview: https://aka.ms/sreagent/newdocsoverview
- Plugin Marketplace capability page: https://aka.ms/sreagent/newdocs/capabilities/plugin-marketplace
- Build 2026 SRE Agent announcements: https://aka.ms/Build26/blog/SREAgent

Get started with Atlassian Rovo MCP server in Azure SRE Agent
Connect Azure SRE Agent to Jira, Confluence, Compass, and Jira Service Management using the official Atlassian Rovo MCP server.

Overview

The Atlassian Rovo MCP server is a cloud-hosted bridge between your Atlassian Cloud site and Azure SRE Agent. Once configured, it enables real-time interaction with Jira, Confluence, Compass, and Jira Service Management data through natural language. All actions respect your existing Atlassian user permissions.

The server supports API token (Basic or Bearer auth) for headless or automated setups. Azure SRE Agent connects using Streamable-HTTP transport directly to the Atlassian-hosted endpoint.

Key capabilities

| Product | Capabilities |
|---------|--------------|
| Jira | Search issues with JQL, create/update tickets, add comments and worklogs, transition issues through workflows |
| Confluence | Search pages with CQL, create/update pages and live docs, manage inline and footer comments |
| Compass | Create/delete service components and relationships, manage custom fields, query dependencies |
| Jira Service Management | Query ops alerts, view on-call schedules, get team info, escalate alerts |
| Rovo Search | Natural language search across Jira and Confluence, fetch content by ARI |

[!NOTE] This is the official Atlassian-hosted MCP server at https://mcp.atlassian.com/v1/mcp. The server exposes 46+ tools across five product areas. Tool availability depends on authentication method and granted scopes.

Prerequisites

- Azure SRE Agent resource deployed in Azure
- Atlassian Cloud site with one or more of: Jira, Confluence, Compass, or Jira Service Management
- User account with appropriate permissions in the Atlassian products you want to access
- For API token auth: Organization admin must enable API token authentication in the Rovo MCP server settings

Step 1: Get your Atlassian credentials

Choose one of the two authentication methods below. API token (Option A) is recommended for Azure SRE Agent because it enables headless configuration without browser-based flows.

Option A: Personal API token (recommended for Azure SRE Agent)

API token authentication allows headless configuration without browser-based OAuth flows, which makes it ideal for Azure SRE Agent connectors.

Navigate to the API token page

1. Log in to your Atlassian account
2. Select your profile avatar in the top-right corner
3. Select Manage account
4. In the left sidebar, select Security
5. Under the API tokens section, you can manage your existing tokens

Alternatively, use this direct link that pre-selects all MCP scopes:

Direct URL: Create API token with all MCP scopes

Create the token

1. Navigate to the Atlassian API token creation page to create a token with all MCP scopes preselected
2. Optionally click Back to manually select only the scopes you need (see Available scopes)
3. Copy the generated token and note the email address associated with your Atlassian account

Base64-encode your credentials:

    # Format: email:api_token
    echo -n "your.email@example.com:YOUR_API_TOKEN_HERE" | base64

On Windows PowerShell:

    [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes("your.email@example.com:YOUR_API_TOKEN_HERE"))

This produces a base64-encoded string you'll use in the connector configuration as the Authorization: Basic <value> header.

[!IMPORTANT] Store the API token securely. It cannot be viewed again after creation. If lost, generate a new token from the same API tokens page.

[!NOTE] API token authentication must be enabled by your organization admin. If you cannot create a token, ask your admin to enable API token authentication in the Rovo MCP server settings at admin.atlassian.com > Security > Rovo MCP server.
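The Base64 step in Option A can also be scripted. The following is a minimal Python sketch, assuming the same `email:api_token` format as the shell and PowerShell commands; `basic_auth_header` is a hypothetical helper name used only for illustration and is not part of any Atlassian or Azure SDK.

```python
import base64


def basic_auth_header(email: str, api_token: str) -> str:
    """Build the value for an Authorization header using HTTP Basic auth.

    Hypothetical helper for illustration; not part of any SDK.
    """
    # Same "email:api_token" format as the shell and PowerShell commands
    credentials = f"{email}:{api_token}".encode("utf-8")
    encoded = base64.b64encode(credentials).decode("ascii")
    return f"Basic {encoded}"


if __name__ == "__main__":
    # Placeholder values; substitute your own email and API token
    print(basic_auth_header("your.email@example.com", "YOUR_API_TOKEN_HERE"))
```

The returned string is the full header value, including the `Basic ` prefix. Treat it with the same care as the token itself, since Base64 is an encoding and not encryption.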
Available scopes

The API token supports the following scope categories:

| Category | Scopes |
|----------|--------|
| Jira | `read:jira-work`, `write:jira-work`, `read:jira-user` |
| Confluence | `read:page:confluence`, `write:page:confluence`, `read:comment:confluence`, `write:comment:confluence`, `read:space:confluence`, `read:hierarchical-content:confluence`, `read:confluence-user`, `search:confluence` |
| Compass | `read:component:compass`, `write:component:compass` |
| JSM | `read:incident:jira-service-management`, `write:incident:jira-service-management`, `read:ops-alert:jira-service-management`, `write:ops-alert:jira-service-management`, `read:ops-config:jira-service-management`, `read:servicedesk-request` |
| Bitbucket | `read:repository:bitbucket`, `write:repository:bitbucket`, `read:pullrequest:bitbucket`, `write:pullrequest:bitbucket`, `read:pipeline:bitbucket`, `write:pipeline:bitbucket`, `read:user:bitbucket`, `read:workspace:bitbucket`, `admin:repository:bitbucket` |
| Platform | `read:me`, `read:account`, `search:rovo:mcp` |

[!NOTE] Bitbucket scopes are available in the token, but Bitbucket tools are not yet listed on the official supported tools page. Bitbucket tool support may be added in a future update.

Step 2: Add the MCP connector

Connect the Atlassian Rovo MCP server to your SRE Agent using the portal.

Using the Azure portal (API token auth)

1. In Azure portal, navigate to your SRE Agent resource
2. Select Builder > Connectors
3. Select Add connector
4. Select MCP server (User provided connector) and select Next
5. Configure the connector:

| Field | Value |
|-------|-------|
| Name | atlassian-rovo-mcp |
| Connection type | Streamable-HTTP |
| URL | https://mcp.atlassian.com/v1/mcp |
| Authentication | Custom headers |
| Header Key | Authorization |
| Header Value | Basic <your_base64_encoded_email_and_token> |

6. Select Next to review
7. Select Add connector

Step 3: Create an Atlassian subagent

Create a specialized subagent to give the AI focused Atlassian expertise and better prompt responses.
1. Navigate to Builder > Subagents
2. Select Add subagent
3. Paste the following YAML configuration:

    api_version: azuresre.ai/v1
    kind: AgentConfiguration
    metadata:
      owner: your-team@contoso.com
      version: "1.0.0"
    spec:
      name: AtlassianRovoExpert
      display_name: Atlassian Rovo Expert
      system_prompt: |
        You are an Atlassian expert with access to Jira, Confluence, Compass,
        and Jira Service Management via the Atlassian Rovo MCP server.

        ## Capabilities

        ### Jira
        - Search issues using JQL (Jira Query Language) with `searchJiraIssuesUsingJql`
        - Create, update, and transition issues through workflows
        - Add comments, worklogs, and manage issue metadata
        - Look up user account IDs and project configurations

        ### Confluence
        - Search pages and content using CQL (Confluence Query Language) with `searchConfluenceUsingCql`
        - Create and update pages and live docs with Markdown content
        - Add inline and footer comments on pages
        - Navigate spaces and page hierarchies

        ### Compass
        - Create, query, and delete service components (services, libraries, applications)
        - Define and manage relationships between components
        - Manage custom field definitions and component metadata
        - View component activity events (deployments, alerts)

        ### Jira Service Management
        - Query ops alerts by ID, alias, or search criteria
        - View on-call schedules and current/next responders
        - Get team info including escalation policies and roles
        - Acknowledge, close, or escalate alerts

        ### Cross-Product
        - Use Rovo Search (`search`) for natural language queries across Jira and Confluence
        - Fetch specific content by Atlassian Resource Identifier (ARI) using `fetch`
        - Get current user info and list accessible cloud sites

        ## Best Practices

        When searching Jira:
        - Use JQL for precise queries: `project = "MYPROJ" AND status = "Open"`
        - Start with broad searches, then refine based on results
        - Use `currentUser()` for user-relative queries
        - Use `openSprints()` for active sprint work

        When searching Confluence:
        - Use CQL for structured searches: `space = "ENG" AND type = page`
        - Use Rovo Search for natural language queries when JQL/CQL isn't needed
        - Consider space keys to narrow results

        When creating content:
        - Confirm project/space/issue type with the user before creating
        - Use `getJiraIssueTypeMetaWithFields` to check required fields
        - Use `getConfluenceSpaces` to list available spaces

        When handling errors:
        - If access is denied, explain what permission is needed
        - Suggest the user contact their Atlassian administrator
        - For expired tokens, advise re-authentication
      mcp_connectors:
        - atlassian-rovo-mcp
      handoffs: []

4. Select Save

[!NOTE] The mcp_connectors field references the connector name you created in Step 2. This gives the subagent access to all tools provided by the Atlassian Rovo MCP server.

Step 4: Add an Atlassian skill

Skills provide contextual knowledge and best practices that help agents use tools more effectively. Create an Atlassian skill to give your agent expertise in JQL, CQL, and Atlassian workflows.

1. Navigate to Builder > Skills
2. Select Add skill
3. Paste the following skill configuration:

    api_version: azuresre.ai/v1
    kind: SkillConfiguration
    metadata:
      owner: your-team@contoso.com
      version: "1.0.0"
    spec:
      name: atlassian_rovo
      display_name: Atlassian Rovo
      description: |
        Expertise in Atlassian Cloud products including Jira, Confluence, Compass,
        and Jira Service Management. Use for searching issues with JQL, creating and
        updating pages, managing service components, investigating ops alerts, and
        navigating Atlassian workspaces via the Rovo MCP server.
      instructions: |
        ## Overview

        Atlassian Cloud provides integrated tools for project tracking (Jira),
        documentation (Confluence), service catalog management (Compass), and
        incident management (Jira Service Management). The Atlassian Rovo MCP
        server enables natural language interaction with all four products.

        **Authentication:** OAuth 2.1 or API token (Basic/Bearer). All actions respect existing user permissions.
        ## Searching Jira with JQL

        JQL (Jira Query Language) enables precise issue searches. Always use
        `searchJiraIssuesUsingJql` for structured queries.

        **Common JQL patterns:**

        ```jql
        # Open issues assigned to current user
        assignee = currentUser() AND status != Done

        # Bugs created in the last 7 days
        project = "MYPROJ" AND type = Bug AND created >= -7d

        # High-priority issues in active sprints
        project = "MYPROJ" AND priority in (High, Highest) AND sprint in openSprints()

        # Full-text search
        project = "MYPROJ" AND text ~ "payment error"

        # Issues updated recently
        updated >= -24h ORDER BY updated DESC
        ```

        **JQL operators:** `=`, `!=`, `~` (contains), `in`, `>=`, `<=`, `NOT`, `AND`, `OR`

        **JQL functions:** `currentUser()`, `openSprints()`, `startOfDay()`, `endOfDay()`, `membersOf("group")`

        ## Searching Confluence with CQL

        CQL (Confluence Query Language) searches pages, blog posts, and attachments.
        Use `searchConfluenceUsingCql` for structured queries.

        **Common CQL patterns:**

        ```cql
        # Search by title
        title ~ "Architecture"

        # Search in specific space
        space = "ENG" AND type = page

        # Full-text content search
        text ~ "deployment pipeline"

        # Recently modified pages
        lastModified >= now("-7d") AND type = page

        # Pages by label
        label = "runbook" AND space = "SRE"
        ```

        **CQL fields:** `title`, `text`, `space`, `type`, `label`, `creator`, `lastModified`

        ## Creating Jira Issues

        Follow this workflow:
        1. `getVisibleJiraProjects` - list available projects
        2. `getJiraProjectIssueTypesMetadata` - list issue types for the project
        3. `getJiraIssueTypeMetaWithFields` - get required/optional fields
        4. `createJiraIssue` - create the issue

        Common issue types: Story, Bug, Task, Epic, Sub-task.

        ## Creating Confluence Pages

        Pages support Markdown content:
        1. `getConfluenceSpaces` - list available spaces
        2. `getPagesInConfluenceSpace` - optionally find a parent page
        3. `createConfluencePage` - create the page with space, title, and body

        ## Working with Compass

        Component types: SERVICE, LIBRARY, APPLICATION, CAPABILITY, CLOUD_RESOURCE,
        DATA_PIPELINE, MACHINE_LEARNING_MODEL, UI_ELEMENT, WEBSITE, OTHER.

        Relationship types: DEPENDS_ON, OTHER.

        ## Jira Service Management Operations

        For incident and alert management:
        - `getJsmOpsAlerts` - query alerts by ID, alias, or search
        - `updateJsmOpsAlert` - acknowledge, close, or escalate alerts
        - `getJsmOpsScheduleInfo` - view on-call schedules and responders
        - `getJsmOpsTeamInfo` - list teams with escalation policies

        ## Cross-Product Workflows

        - Use `search` (Rovo Search) for natural language queries across products
        - Use `fetch` with ARIs (Atlassian Resource Identifiers) for direct content retrieval
        - Use `getAccessibleAtlassianResources` to list cloud sites and get cloudIds

        ## Troubleshooting

        | Issue | Solution |
        |-------|----------|
        | JQL syntax error | Check field names; quote values with spaces |
        | CQL returns no results | Verify space key; try broader terms |
        | Cannot create issue | Verify "Create" permission in the project |
        | Cannot edit page | Verify "Edit" permission in the space |
        | OAuth expired | Re-invoke any tool to trigger fresh OAuth flow |
        | "Site admin must authorize" | Admin must complete initial 3LO consent |
        | cloudId errors | Use `getAccessibleAtlassianResources` to find correct cloudId |
      mcp_connectors:
        - atlassian-rovo-mcp

4. Select Save

Reference the skill in your subagent

Update your subagent configuration to include the skill:

    spec:
      name: AtlassianRovoExpert
      skills:
        - atlassian_rovo
      mcp_connectors:
        - atlassian-rovo-mcp

Step 5: Test the integration

1. Open a new chat session with your SRE Agent
2. Try these example prompts:

Jira workflows

- Find all open bugs assigned to me in the PAYMENTS project
- Create a new story in project PLATFORM titled "Implement rate limiting for API gateway"
- Show me the available transitions for issue PLATFORM-1234
- Add a comment to PLATFORM-1234: "Reviewed and approved for deployment"
- Log 2 hours of work on PLATFORM-1234 with description "Code review and testing"

Confluence workflows

- Search Confluence for pages about "incident response runbooks"
- Show me the spaces I have access to
- Create a new Confluence page in the Engineering space titled "Q3 2025 Architecture Review"
- What pages are under the "Runbooks" parent page?

Compass workflows

- List all service components in Compass
- Create a new service component called "payment-gateway"
- What components depend on the api-gateway service?
- Show me recent activity events for the auth-service component

Jira Service Management workflows

- Show me active ops alerts from the last 24 hours
- Who is currently on-call for the platform-engineering schedule?
- Acknowledge alert with alias "high-cpu-prod-web-01"
- Get team info and escalation policies for the SRE team

Cross-product workflows

- Search across Jira and Confluence for content related to "deployment pipeline"
- What Atlassian cloud sites do I have access to?
- Fetch the Confluence page linked to Jira issue PLATFORM-500

Available tools

Jira tools (14 tools)

| Tool | Description | Required Scopes |
|------|-------------|-----------------|
| `searchJiraIssuesUsingJql` | Search issues using a JQL query | `read:jira-work` |
| `getJiraIssue` | Get issue details by ID or key | `read:jira-work` |
| `createJiraIssue` | Create a new issue in a project | `write:jira-work` |
| `editJiraIssue` | Update fields on an existing issue | `write:jira-work` |
| `addCommentToJiraIssue` | Add a comment to an issue | `write:jira-work` |
| `addWorklogToJiraIssue` | Add a time tracking worklog entry | `write:jira-work` |
| `transitionJiraIssue` | Perform a workflow transition | `write:jira-work` |
| `getTransitionsForJiraIssue` | List available workflow transitions | `read:jira-work` |
| `getVisibleJiraProjects` | List projects the user can access | `read:jira-work` |
| `getJiraProjectIssueTypesMetadata` | List issue types in a project | `read:jira-work` |
| `getJiraIssueTypeMetaWithFields` | Get create-field metadata for a project and issue type | `read:jira-work` |
| `getJiraIssueRemoteIssueLinks` | List remote links (e.g., Confluence pages) on an issue | `read:jira-work` |
| `lookupJiraAccountId` | Find user account IDs by name or email | `read:jira-work` |

Confluence tools (11 tools)

| Tool | Description | Required Scopes |
|------|-------------|-----------------|
| `searchConfluenceUsingCql` | Search content using a CQL query | `search:confluence` |
| `getConfluencePage` | Get page content by ID (as Markdown) | `read:page:confluence` |
| `createConfluencePage` | Create a new page or live doc with Markdown body | `write:page:confluence` |
| `updateConfluencePage` | Update an existing page (title, body, location) | `write:page:confluence` |
| `getConfluenceSpaces` | List spaces by key, ID, type, status, or labels | `read:space:confluence` |
| `getPagesInConfluenceSpace` | List pages in a space, filtered by title/status/type | `read:page:confluence` |
| `getConfluencePageDescendants` | List descendant pages under a parent page | `read:hierarchical-content:confluence` |
| `createConfluenceFooterComment` | Create a footer comment or reply on a page | `write:page:confluence` |
| `createConfluenceInlineComment` | Create an inline comment tied to selected text | `write:page:confluence` |
| `getConfluencePageFooterComments` | List footer comments on a page (as Markdown) | `read:comment:confluence` |
| `getConfluencePageInlineComments` | List inline comments on a page | `read:comment:confluence` |

Compass tools (13 tools)

| Tool | Description | Required Scopes |
|------|-------------|-----------------|
| `getCompassComponents` | Search or list components | `read:component:compass` |
| `getCompassComponent` | Get component details by ID | `read:component:compass` |
| `createCompassComponent` | Create a service, library, or other component | `write:component:compass` |
| `deleteCompassComponent` | Delete an existing component and its definitions | `write:component:compass` |
| `createCompassComponentRelationship` | Create a relationship between two components | `write:component:compass` |
| `deleteCompassComponentRelationship` | Remove a relationship between two components | `write:component:compass` |
| `getCompassComponentActivityEvents` | List recent activity events (deployments, alerts) | `read:component:compass` |
| `getCompassComponentLabels` | Get labels applied to a component | `read:component:compass` |
| `getCompassComponentTypes` | List available component types | `read:component:compass` |
| `getCompassComponentsOwnedByMyTeams` | List components owned by your teams | `read:component:compass` |
| `getCompassCustomFieldDefinitions` | List custom field definitions | `read:component:compass` |
| `createCompassCustomFieldDefinition` | Create a custom field definition | `write:component:compass` |
| `deleteCompassCustomFieldDefinition` | Delete a custom field definition | `write:component:compass` |

Jira Service Management tools (4 tools)

[!NOTE] JSM tools only support authentication via API token. These tools are available only if API token authentication is enabled by your organization admin.
| Tool | Description | Required Scopes |
|------|-------------|-----------------|
| `getJsmOpsAlerts` | Get alert by ID/alias or search by query and time window | `read:ops-alert:jira-service-management`, `read:ops-config:jira-service-management`, `read:jira-user` |
| `getJsmOpsScheduleInfo` | List on-call schedules or get current/next responders | `read:ops-config:jira-service-management`, `read:jira-user` |
| `getJsmOpsTeamInfo` | List ops teams with escalation policies and roles | `read:ops-config:jira-service-management`, `read:jira-user` |
| `updateJsmOpsAlert` | Acknowledge, unacknowledge, close, or escalate an alert | `read:ops-alert:jira-service-management`, `write:ops-alert:jira-service-management` |

Rovo / Shared platform tools (4 tools)

| Tool | Description | Required Scopes |
|------|-------------|-----------------|
| `search` | Natural language search across Jira and Confluence (not JQL/CQL) | `search:rovo:mcp` |
| `fetch` | Fetch content by Atlassian Resource Identifier (ARI) | `search:rovo:mcp` |
| `atlassianUserInfo` | Get current user details (account ID) | `read:me` |
| `getAccessibleAtlassianResources` | List accessible cloud sites and their cloudIds | `read:account`, `read:me` |

Troubleshooting

Authentication issues

| Error | Cause | Solution |
|-------|-------|----------|
| 401 Unauthorized | Invalid or expired API token | Generate a new token at id.atlassian.com |
| 403 Forbidden | Missing product permissions | Verify you have access to the Atlassian product (Jira, Confluence, etc.) |
| "Your site admin must authorize this app" | First-time setup requires admin | A site admin must complete initial 3LO consent |
| "Your organization admin must authorize access from a domain" | Domain not allowed | Admin must add the domain in Rovo MCP server settings |
| "You don't have permission to connect from this IP address" | IP allowlisting enabled | Admin must add your IP range to the allowlist |
| API token auth fails | Feature disabled by admin | Admin must enable API token authentication |

Data and permission issues

| Error | Cause | Solution |
|-------|-------|----------|
| No data returned | Wrong cloudId or expired session | Use `getAccessibleAtlassianResources` to find the correct cloudId |
| Cannot create issue | Missing project permission | Verify "Create" permission in the Jira project |
| Cannot update page | Missing space permission | Verify "Edit" permission in the Confluence space |
| Tool not available | Missing scopes | Re-create API token with required scopes |
| Compass tools unavailable | Scopes not available for API tokens | Some Compass tools require OAuth 2.1 |
| JSM tools not working | API token auth disabled | Admin must enable API token authentication |

Verify the connection

Test the server endpoint directly:

    # Test with API token (Basic auth)
    curl -I https://mcp.atlassian.com/v1/mcp \
      -H "Authorization: Basic <your_base64_encoded_credentials>"

    # Test with service account (Bearer auth)
    curl -I https://mcp.atlassian.com/v1/mcp \
      -H "Authorization: Bearer <your_api_key>"

Expected response: 200 OK confirms authentication is working.

Re-authorize the integration

If you encounter persistent issues:

1. Go to id.atlassian.com/manage-profile/apps
2. Find and revoke the MCP app authorization
3. Generate a new API token or re-invoke a tool to trigger fresh OAuth flow

Limitations

| Limitation | Details |
|------------|---------|
| Limited tool availability with API tokens | Some tools (e.g., certain Compass tools) may not be available because required scopes aren't available for API tokens |
| No bounded cloudId | API tokens are not bound to a specific cloudId. Tools must explicitly pass the cloudId where needed |
| No domain allowlist validation | API token auth doesn't use OAuth redirect URIs, so domain allowlist checks cannot be performed |
| Bitbucket tools | Bitbucket scopes are available in the token, but Bitbucket tools are not yet listed as supported |
| JSM requires API token | Jira Service Management tools only work with API token authentication, not OAuth 2.1 |

Security considerations

How permissions work

- User-scoped: All actions respect the authenticated user's existing Atlassian permissions
- Product-level: Access requires matching product permissions (Jira, Confluence, Compass)
- Session-based: OAuth tokens expire and require re-authentication; API tokens persist until revoked

Admin controls

Atlassian administrators can:

- Enable or disable API token authentication in Rovo MCP server settings
- Manage and revoke MCP app access from the Connected Apps list
- Control which external domains can connect via domain allowlists
- Monitor activity through Atlassian audit logs
- Configure IP allowlisting for additional security

[!IMPORTANT] MCP clients can perform actions in Jira, Confluence, and Compass with your existing permissions. Use least privilege, review high-impact changes before confirming, and monitor audit logs for unusual activity. See MCP Clients - Understanding security risks.

Related content

- Atlassian Rovo MCP Server - Getting started
- Atlassian Rovo MCP Server - Supported tools
- Atlassian Rovo MCP Server - API token authentication
- Atlassian Rovo MCP Server - OAuth 2.1 configuration
- Control Atlassian Rovo MCP Server settings
- MCP Clients - Understanding security risks
- MCP integration overview
- Build a custom subagent

Who's Calling Your Service? Designing for Humans and Agents at the Same Time
We're building three interfaces for Azure SRE Agent: an interactive CLI for humans at a terminal, an agent mode for coding agents that spawn it as a subprocess, and an MCP server for humans inside coding agents and for remote agents in other ecosystems. The CLI and agent mode are coming. The MCP server ships first. That ordering wasn't obvious at first, and this post is about why we landed there.

Three interfaces, one question: who's actually calling this?

When we started designing the Azure SRE Agent CLI, the question we kept running into was deceptively simple: who's the caller? A human at 2 AM during an incident? Yes. But also a coding agent mid-session that wants SRE Agent capabilities without leaving VS Code. And a PagerDuty SRE agent running an automated triage loop with no human in the picture. And another Azure SRE Agent instance that wants to delegate a sub-task. Four callers. Same backend. None of them want the same thing.

The three interfaces map to these callers:

- Interactive CLI: humans at a terminal, terse and incident-optimized
- Agent Mode: coding agents like Copilot CLI that spawn it as a subprocess
- MCP Server: humans inside coding agents, and remote agents in other ecosystems

The MCP server is the one shipping now. Here's why.

The CLI requires someone to reach for it

The interactive CLI and agent mode have something in common: the caller has to know Azure SRE Agent exists and decide to invoke it. A human types a command. A coding agent spawns a subprocess. Either way, it's a deliberate call.

The MCP server works differently. It surfaces itself as tools inside whatever environment the caller is already in. The model decides when to use them. An SRE working in Copilot CLI doesn't open a separate terminal and type a command. They ask a question and the right tool fires. A remote agent in a PagerDuty loop doesn't spawn a subprocess. It speaks a protocol and gets a response.

That's the difference. The CLI requires intent. The MCP server meets callers where they already are.

Two callers, one protocol

The MCP surface has two audiences. They speak the same protocol but come from completely different contexts.

Humans inside coding agents. An SRE in VS Code Copilot, Claude Desktop, GitHub Copilot CLI, or Cursor is already in a session: writing a deployment script, reviewing a runbook, debugging a failing service. They don't want a context switch. They want SRE Agent capabilities alongside the work they're already doing. Connect the MCP server once and it's just there.

Remote agents in other ecosystems. An AWS DevOps agent handling a cross-cloud incident might need to check Azure resource health without bouncing the call to a human. A PagerDuty SRE agent might pull an incident summary as part of its triage loop. One Azure SRE Agent instance might delegate work to another. MCP is what makes any of this work without custom integrations on both sides. Both sides agree on the protocol. Neither side has to know the other's internals.

| Caller | Context | What they need |
|--------|---------|----------------|
| Human in Copilot CLI / VS Code Copilot | Mid-workflow, coding session | Readable summary, minimal overhead |
| Human in Claude Desktop / Cursor | Agentic session | SRE tools available in the conversation |
| AWS DevOps Agent | Automated incident loop | Defined schema, stable fields |
| PagerDuty SRE Agent | Triage pipeline | Parseable, sparse, no narrative |
| Other Azure SRE Agent | Delegated sub-task | Agent-to-agent contract |

Tool descriptions are product decisions

Each MCP tool maps to a specific SRE Agent capability. Tools have names, descriptions in natural language, and JSON input schemas. The descriptions do more work than they look like they should. When someone in Copilot CLI asks "what's wrong with my API gateway," the model reads tool descriptions to decide which tool to call.
A description that says "Returns health status for an Azure resource" gets invoked less reliably than one that says "Check whether an Azure resource (VM, gateway, database, container) is healthy, degraded, or unreachable. Use this when diagnosing an active outage or validating state after a deployment." The second version tells the model when to reach for the tool, not just what it does.

PM, engineering, and content design reviewed descriptions together. When an invocation misfired in testing, the fix was almost always the description, not the schema. We iterated on tool descriptions the same way you'd iterate on a system prompt, because that's what they are.

One output shape, two callers

The human-in-a-coding-agent and the remote-agent-in-an-automated-loop want different things from the same tool response. A human wants something readable. A remote agent wants something parseable. The obvious answer is to return different shapes based on who's calling. We didn't do that.

Every tool response follows the same contract: defined fields, stable semantics, no preamble, no internal reasoning, plus one summary field with a plain-language sentence. A human reads the summary. A remote agent ignores it and parses the structured fields. The overhead is negligible in both directions.

We briefly considered branching on a caller-type hint in the request header. The problem: it added surface area to maintain and created subtle failure modes when the hint was wrong or missing. One shape, always.

What's harder than it looks

Statelessness is a feature for remote agents and friction for humans. MCP tools are stateless by design. Each invocation is independent. Remote agents love this; they don't want to manage session state across calls. Humans working interactively want context to carry forward. We handled it by making every response self-sufficient: the tool returns enough context that the model can construct a coherent follow-up call without re-explaining the situation.
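As a rough illustration of the single response contract and the self-sufficient responses described above, here is a minimal Python sketch. The field names (`resource_id`, `status`, `context`) and the `follow_up_arguments` helper are hypothetical and chosen only for illustration; the post itself specifies only defined fields, stable semantics, and one plain-language `summary` field.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ToolResponse:
    """One shape for every caller: structured fields plus a summary sentence."""

    resource_id: str  # hypothetical field: what the call was about
    status: str       # hypothetical field: "healthy", "degraded", or "unreachable"
    summary: str      # plain-language sentence for human readers
    context: dict = field(default_factory=dict)  # details needed for a follow-up call


def render(response: ToolResponse) -> str:
    """Serialize the response the same way regardless of who is calling."""
    return json.dumps(asdict(response), sort_keys=True)


def follow_up_arguments(payload: dict) -> dict:
    """Build the next call's arguments from the previous response alone."""
    return {"resource_id": payload["resource_id"], **payload["context"]}


response = ToolResponse(
    resource_id="gateway-01",
    status="degraded",
    summary="gateway-01 is degraded.",
    context={"resource_type": "gateway"},
)
payload = json.loads(render(response))

# A human (via the model) reads payload["summary"];
# a remote agent skips it and parses payload["status"].
print(payload["summary"])
print(follow_up_arguments(payload))
```

Because the follow-up arguments come entirely from the previous payload, the caller carries the context and the tool itself keeps no session state.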
The tool doesn't remember. It returns enough that memory is cheap for whoever holds it.

You can't test the remote agent use case the same way you test the human use case. Spin up Copilot CLI, connect the server, ask a question, and you can watch what happens. You can't easily simulate an AWS agent calling you cold with no prior context about what Azure SRE Agent does. Designing for that caller meant writing descriptions and schemas that work for a model meeting your tools for the first time, with no assumed vocabulary and no assumed workflow.

What's next

The interactive CLI and agent mode follow the same three-node architecture. The interactive CLI is for humans at the terminal: terse, incident-optimized, with progressive disclosure. Agent Mode is for coding agents that spawn the CLI as a subprocess and want direct access to SRE Agent capabilities without a protocol layer in between. Both are in progress.

In the meantime, connect SRE Agent to your MCP client and it will show up where you're already working.

Part of a series on the design decisions behind Azure SRE Agent. Companion posts on the CLI and agent mode will follow when they ship.

Access Your SRE Agent from Any IDE, Terminal, or AI Assistant
Your team already uses an SRE Agent: it monitors your services, learns your architecture, and handles operational tasks. Now developers can talk to that agent in natural language from the interfaces they already use every day: their editor, their terminal, their AI assistant. Check what the agent knows, ask it a question, search its memories, wire it into a workflow, all without leaving the tool where they're already writing code.

What we're announcing

Azure SRE Agent tools are now shipping in the Azure MCP Server. The @azure/mcp package includes a full set of SRE Agent tools that let you manage and operate your SRE Agents from any MCP-compatible client: GitHub Copilot CLI, VS Code Copilot, Cursor, Claude Desktop, or any agent framework that speaks MCP. No separate CLI. No portal tab. No custom integration code. Your SRE Agent becomes accessible wherever you already think and work.

This is about meeting developers where they are. Your SRE Agent has deep context about your systems: incident history, architecture knowledge, operational patterns. Now that expertise is accessible from VS Code, from your terminal, from any MCP-compatible AI assistant. Just type a question in natural language and your agent responds, right inside the workflow you're already in. Your SRE Agent stops being a destination you visit and becomes part of how your team works every day.

This post walks through what you can do and how to get it running, using GitHub Copilot CLI as the example. The same setup works in VS Code Copilot, Claude Desktop, Cursor, and any other MCP-compatible client.
What this unlocks

Once the Azure MCP Server is connected to Copilot CLI, you can talk to your SRE Agent infrastructure the same way you'd ask a colleague:

- "List my SRE agents in subscription <sub-id>"
- "Create a Kusto connector named prod-logs on agent myagent pointing at cluster https://help.kusto.windows.net, database Samples"
- "Search memories on agent myagent for 'deployment failures'"
- "Pause the nightly scheduled task on agent myagent"
- "Generate an architecture plan for a multi-region web app"

The full capability set breaks down into seven areas:

- Manage SRE Agents. List, get, and create Microsoft.App/agents resources in your subscription. Discover which tools a given agent has access to. Resource groups are resolved automatically via Resource Graph.
- Configure connectors. Create and manage Kusto connectors, MCP connectors (both http and stdio transports), Azure Monitor connectors, and more. Connectors go through ARM and show up in the Azure portal alongside anything you created there. MCP connectors default to system-assigned managed identity.
- Run and inspect threads. Create and list conversation threads, get thread details, send messages, and manage hooks on a thread. This is how you talk to the agent programmatically or inspect what it's doing mid-run.
- Schedule recurring work. Create, list, pause, resume, and delete scheduled tasks.
- Manage incidents. List active incidents and run incident setup commands for PagerDuty and ServiceNow.
- Knowledge and prompts. Manage common prompts like safety rules and standing instructions. Search, upload, and delete memories. List and delete skills. Fetch agent docs by topic.
- Author workflows. Generate architecture plans from requirements. Generate, validate, and apply YAML workflows.

Safety

Giving an AI assistant broad management access to your SRE Agents means it's worth knowing what guardrails are in place:

Destructive operations require --confirm true.
Any delete (connectors, hooks, memories, skills, scheduled tasks, sub-agents) refuses to run without the explicit flag. There's no way to accidentally tear something down through an autocompleted command.

Secrets are stripped before they reach your client. Bearer tokens, API keys, passwords, connection strings, and Authorization headers are redacted from connector and tool responses.

Error messages are sanitized. Upstream error bodies are scrubbed for secrets and truncated before surfacing, so credentials don't leak through error text.

Data-plane calls are pinned to *.azuresre.ai. HTTPS is required; the host suffix is enforced to prevent SSRF. http:// is only allowed for localhost.

Third-party hosts are pinned. ServiceNow connectors are restricted to .service-now.com and .servicenowservices.com. PagerDuty subdomains must be valid DNS labels.

MCP connector secrets must be env-referenced. Header and environment values for MCP connectors must use ${env:NAME} syntax; literal secrets are rejected so they never enter LLM context.

Prerequisites

Before connecting anything, make sure you have the following installed and authenticated.

Node.js LTS

    node --version

If you're not on a current LTS, update via nodejs.org or your package manager. The Azure MCP Server is tested against the active Node.js LTS releases.

Azure CLI

The MCP server uses DefaultAzureCredential, which picks up credentials from az login. You need the Azure CLI installed and signed in. Install: https://learn.microsoft.com/cli/azure/install-azure-cli

    az --version
    az login

If you work across multiple tenants:

    az login --tenant <tenant-id>

If you have multiple subscriptions, set a default:

    az account set --subscription <subscription-id>

GitHub Copilot CLI

Install Copilot CLI following the official instructions: https://docs.github.com/en/copilot/how-tos/use-copilot-agents/use-copilot-cli

Once installed, launch it and authenticate with your GitHub account.
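If you want to check the prerequisites in one pass, a small preflight script can help. The sketch below is only an illustration: it assumes the tools are on PATH under the names `node`, `npx`, and `az`, the `missing_tools` helper is hypothetical, and it checks presence only, not versions or whether you are signed in.

```python
import shutil
from typing import Callable, Optional

# Executables the setup steps rely on (assumed to be on PATH under these names).
REQUIRED_TOOLS = ("node", "npx", "az")


def missing_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required tools that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print(f"Missing from PATH: {', '.join(missing)}")
    else:
        print("All prerequisite tools found.")
```

The `which` parameter exists so the lookup can be swapped out when testing; in normal use the default `shutil.which` is enough.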
Connect the Azure MCP Server to Copilot CLI

The Azure MCP Server runs as an npm package (@azure/mcp) and launches via npx. The easiest way to add it is interactively from within Copilot CLI:

    /mcp add

Follow the prompts: name it azure, set the command to npx, and args to -y @azure/mcp@latest server start.

Or add it manually to your MCP config file (~/.copilot/mcp.json or .copilot/mcp.json in your repo):

    {
      "mcpServers": {
        "azure": {
          "type": "stdio",
          "command": "npx",
          "args": ["-y", "@azure/mcp@latest", "server", "start"]
        }
      }
    }

Restart Copilot CLI after saving. On the next launch, npx fetches @azure/mcp and starts the server automatically.

If you'd rather install globally:

    npm install -g @azure/mcp
    azmcp server start

Keeping it up to date

@latest means npx pulls the newest version on each launch, but npx caches aggressively. If you upgrade and the old version is still running:

    npx clear-npx-cache
    # or: rm -rf ~/.npm/_npx

Then restart Copilot CLI.

For environments where you want version stability, pin an exact version instead:

    "args": ["-y", "@azure/mcp@0.x.y", "server", "start"]

Bump the version string when you're ready to upgrade.

Set up access to your SRE Agents

The MCP server doesn't pin to a specific agent. It discovers agents dynamically from your subscription and you target one per command. Two steps to get access working.
Assign RBAC

You need two roles on the Microsoft.App/agents resource (or at the resource group or subscription level):

| Role | What it covers |
|---|---|
| Reader | Control-plane: list and get agents and connectors via ARM |
| SRE Agent Administrator | Data-plane: threads, memories, scheduled tasks, prompts, and everything on the agent's own endpoint |

    az role assignment create \
      --assignee <your-upn-or-objectid> \
      --role "Reader" \
      --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.App/agents/<agentName>

    az role assignment create \
      --assignee <your-upn-or-objectid> \
      --role "SRE Agent Administrator" \
      --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.App/agents/<agentName>

On Windows PowerShell, use a single line or backtick continuations instead of \.

Find your agents

From Copilot CLI, ask: "List my SRE agents in subscription <sub-id>"

This returns each agent's name, resource group, and endpoint. Once you have that, you're ready to work.

How the calls work under the hood

Two distinct layers, worth knowing which is which.

Control-plane (agents, connectors): goes through Azure Resource Manager at Microsoft.App/agents, API version 2025-05-01-preview. Anything you create or modify shows up in the Azure portal.

Data-plane (threads, memories, scheduled tasks, incidents, prompts, skills, hooks, docs, workflows): goes through the agent's own endpoint at https://<name>--<hash>.<region>.azuresre.ai. The server handles the SRE Agent token audience automatically.

You don't need to manage separate credentials for the data plane. Your az login session covers both.
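To make the data-plane host pinning concrete, here is a minimal Python sketch of the kind of check the safety rules describe: HTTPS with a host under .azuresre.ai, and plain http:// only for localhost. The function name and constants are hypothetical, this is not the server's actual implementation, and accepting https://localhost is an assumption made here because the post does not say either way.

```python
from urllib.parse import urlsplit

ALLOWED_SUFFIX = ".azuresre.ai"  # data-plane host suffix named in the post
LOCAL_HOST = "localhost"


def is_allowed_data_plane_url(url: str) -> bool:
    """Accept HTTPS endpoints under *.azuresre.ai, or localhost for local testing."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased by urlsplit
    if host == LOCAL_HOST:
        # Assumption: localhost is accepted over http or https.
        return parts.scheme in ("http", "https")
    # Compare against the parsed hostname, not the raw string, so that a URL like
    # https://azuresre.ai.evil.example or https://evil.example/.azuresre.ai fails.
    return parts.scheme == "https" and host.endswith(ALLOWED_SUFFIX)


if __name__ == "__main__":
    for candidate in (
        "https://myagent--abc123.eastus.azuresre.ai/threads",
        "http://myagent--abc123.eastus.azuresre.ai/threads",
        "https://azuresre.ai.evil.example/threads",
        "http://localhost:8080/threads",
    ):
        print(candidate, is_allowed_data_plane_url(candidate))
```

Matching on the parsed hostname with a leading dot in the suffix is what keeps look-alike hosts such as evil-azuresre.ai from passing.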
When things go wrong

| Symptom | What's happening | Fix |
|---|---|---|
| 401/403 on data-plane calls | Missing SRE Agent Administrator role | Assign the role at the agent scope |
| 403 on ARM calls | Missing Reader role | Assign Reader at subscription, RG, or agent scope |
| "No agent endpoint" | Agent not fully provisioned | Check provisioningState in the portal |
| sreagent_* tools not showing up | npx cache is stale | npx clear-npx-cache, restart Copilot CLI |
| Wrong tenant errors | Credentials from a different tenant | az login --tenant <id>, restart Copilot CLI |

Verify it's working

Ask Copilot CLI: "List my Azure subscriptions" or "List my SRE agents." If sreagent_* tools appear in the tool list and return results, you're connected and on a version that includes this release.

Get started with Azure SRE Agent

If you don't have an SRE Agent yet, you can create one in minutes from the Azure portal or through the CLI. Connect it to your code, your logs, and your incident sources, and it starts building expertise from day one. Once you've added the Azure MCP Server to your editor, your agent is one sentence away in every session.

Resources

- SRE Agent documentation: https://aka.ms/sreagent/newdocs
- SRE Agent overview: https://aka.ms/sreagent/newdocsoverview
- Azure MCP Server: https://aka.ms/azmcp
- Azure MCP Server get-started: https://learn.microsoft.com/azure/developer/azure-mcp-server/get-started
- Deep Context blog: https://aka.ms/sreagent/blogs/deepcontextblog