Your tools. Your workflows. SRE Agent adapts.
SRE Agent natively integrates with PagerDuty, ServiceNow, and Azure Monitor. But your team might use Jira for incident tracking. Grafana for dashboards. Loki for logs. Prometheus for metrics.
These aren't natively supported. That doesn't matter.
SRE Agent supports MCP, the Model Context Protocol. Any MCP-compatible server extends the agent's capabilities. Connect your Grafana instance. Connect your Jira. The agent queries logs, correlates errors, and creates tickets with root cause analysis across tools that were never designed to talk to each other.
The Scenario
I built a grocery store app that simulates a realistic SRE scenario: an external supplier API starts rate limiting your requests. Customers see "Unable to check inventory" errors. The on-call engineer gets paged.
The goal: SRE Agent should diagnose the issue by querying Loki logs through Grafana, identify the root cause, and create a Jira ticket with findings and recommendations.
The app runs on Azure Container Apps with Loki for logs and Azure Managed Grafana for visualization.
Deploy it yourself: github.com/dm-chelupati/grocery-sre-demo
How I Set Up SRE Agent: Step by Step
Step 1: Create SRE Agent
I created an SRE Agent and gave it Reader access to my subscription.
Step 2: Connect to Grafana and Jira via MCP
Neither MCP server had a remotely hosted option, and their stdio setup didn't match what SRE Agent supports. So I hosted them myself as Azure Container Apps:
Grafana MCP Server: connects to my Azure Managed Grafana instance
Atlassian MCP Server: connects to my Jira Cloud instance
Now I have two endpoints SRE Agent can reach:
- https://ca-mcp-grafana.<env>.azurecontainerapps.io/mcp
- https://ca-mcp-jira.<env>.azurecontainerapps.io/mcp
I added both to SRE Agent's MCP configuration as remotely hosted servers.
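Before registering a self-hosted endpoint, it can help to confirm that it actually speaks MCP over HTTP. The sketch below builds the JSON-RPC `initialize` request that an MCP client sends first over the streamable HTTP transport. The host name, the client name, and the absence of authentication headers are assumptions for illustration; your deployment may require a token or a different protocol version.

```python
import json
import urllib.request

# Hypothetical endpoint; substitute your own Container Apps host.
GRAFANA_MCP_URL = "https://ca-mcp-grafana.example.azurecontainerapps.io/mcp"


def build_initialize_request(endpoint, client_name="mcp-smoke-test"):
    """Build (but do not send) an MCP `initialize` request for an HTTP endpoint."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; the server may negotiate another one.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": client_name, "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_initialize_request(GRAFANA_MCP_URL)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To send it, pass the request to `urllib.request.urlopen(req, timeout=10)`. A healthy server should reply with a JSON-RPC result describing its capabilities, either as a JSON body or as a server-sent event.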
Step 3: Create Sub-Agent with Tools and Instructions
I created a sub-agent specifically for incident diagnosis with these tools enabled:
- Grafana MCP (for querying Loki logs)
- Atlassian MCP (for creating Jira tickets)
Instructions were simple:
You are expert in diagnosing applications running on Azure services. You need to use the Grafana tools to get the logs, metrics or traces and create a summary of your findings inside Jira as a ticket. use your knowledge base file loki-queries.md to learn about app configuration with loki and Query the loki for logs in Grafana.
Step 4: Invoke Sub-Agent and Watch It Work
I went to the SRE Agent chat and asked:
@JiraGrafanaexpert: My container app ca-api-3syj3i2fat5dm in resource group rg-groceryapp is experiencing rate limit errors from a supplier API when checking product inventory.
The agent:
- Queried Loki via Grafana MCP: {app="grocery-api"} |= "error"
- Found 429 rate limit errors spiking: 55+ requests hitting supplier API limits
- Identified root cause: SUPPLIER_RATE_LIMIT_429 from FreshFoods Wholesale API
- Created a Jira ticket with the findings and remediation steps
One prompt. Logs queried. Root cause identified. Ticket created with remediation steps.
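The query the agent ran is plain LogQL: a label selector followed by a line filter. As a rough sketch of how such queries are composed, the helpers below assemble a log query and a count-over-time metric query as strings. The function names are hypothetical, and only the `app="grocery-api"` label comes from this demo; the agent itself issues its queries through the Grafana MCP tools, not through this code.

```python
import json


def logql(labels, contains=(), parse_json=False):
    """Compose a LogQL log query from stream labels and line filters."""
    selector = ",".join(
        f"{name}={json.dumps(value)}" for name, value in sorted(labels.items())
    )
    query = "{" + selector + "}"
    for needle in contains:
        # `|=` keeps only lines that contain the given substring.
        query += f" |= {json.dumps(needle)}"
    if parse_json:
        # `| json` extracts JSON log fields as labels for later filtering.
        query += " | json"
    return query


def count_over_time(stream_query, window="5m"):
    """Wrap a log query in a metric query that counts matching lines per window."""
    return f"sum(count_over_time({stream_query} [{window}]))"


if __name__ == "__main__":
    errors = logql({"app": "grocery-api"}, contains=["error"])
    print(errors)  # {app="grocery-api"} |= "error"
    print(count_over_time(logql({"app": "grocery-api"}, contains=["429"])))
```

The second query is the kind you would use to see whether 429 responses are spiking rather than to read individual log lines.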
Making It Better: The Knowledge File
SRE Agent can explore and discover how your apps are wired, but you can speed that up. When querying observability data sources, the agent needs to learn the schema, available labels, table structures, and query syntax. For Loki, that means understanding LogQL, knowing which labels your apps use, and what JSON fields appear in logs.
SRE Agent can figure things out, but with context, it gets there faster, just like humans.
I created a knowledge file, loki-queries.md, that gives the agent a head start.
With this context, the agent knows exactly which labels to query, what fields to extract from JSON logs, and which query patterns to use.
See my full knowledge file
How MCP Makes This Possible
SRE Agent supports two ways to connect MCP servers:
stdio: runs locally via command. This works for MCP servers that can be invoked via npx, node, or uvx. For example: npx -y @modelcontextprotocol/server-github.
Remotely hosted: an HTTP endpoint with streamable transport, such as:
- https://mcp-server.example.com/sse or /mcp
The catch: Not every MCP server fits these options out of the box.
Some servers only support stdio but not the npx/node/uvx formats SRE Agent expects. Others don't offer a hosted endpoint at all.
The solution: host them yourself. Deploy the MCP server as a container with an HTTP endpoint. That's what I did with the Grafana MCP Server and the Atlassian MCP Server: I deployed both as Azure Container Apps exposing /mcp endpoints.
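Many MCP servers ship their own HTTP mode, and using it is usually the simplest way to containerize them. Where a server is stdio-only, the underlying idea is a bridge: a process that receives JSON-RPC messages and relays them to the server's stdin, then reads the reply from its stdout. The sketch below shows only that relay core, with a fake child process standing in for a real MCP server. `StdioBridge` and `FAKE_SERVER` are hypothetical names, this is not how the demo containers are built, and a real deployment would still need an HTTP front end, session handling, and support for notifications that produce no reply.

```python
import json
import subprocess
import sys


class StdioBridge:
    """Relay JSON-RPC requests to a child process over newline-delimited stdio."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered, since each message is one line
        )

    def call(self, message):
        """Send one request and block until one response line comes back."""
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()
        line = self.proc.stdout.readline()
        if not line:
            raise RuntimeError("MCP server closed its stdout")
        return json.loads(line)

    def close(self):
        """Signal end of input and wait for the child to exit."""
        self.proc.stdin.close()
        self.proc.wait(timeout=5)
        self.proc.stdout.close()


# Stand-in for a real stdio MCP server: answers every request with an empty result.
FAKE_SERVER = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    reply = {'jsonrpc': '2.0', 'id': req['id'], 'result': {}}\n"
    "    print(json.dumps(reply), flush=True)\n"
)

if __name__ == "__main__":
    bridge = StdioBridge([sys.executable, "-c", FAKE_SERVER])
    print(bridge.call({"jsonrpc": "2.0", "id": 1, "method": "ping"}))
    bridge.close()
```

An HTTP handler in front of `call()` would turn each POST body into a request and return the reply, which is roughly the shape of an endpoint that SRE Agent can reach.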
Why This Matters
Enterprise tooling is fragmented across Azure and non-Azure ecosystems. Some teams use Azure Monitor, others use Datadog. Incident tracking might be ServiceNow in one org and Jira in another. Logs live in Loki, Splunk, or Elasticsearch, and sometimes in all three.
SRE Agent meets you where you are. Azure-native tools work out of the box. Everything else connects via MCP. Your observability stack stays the same. Your ticketing system stays the same. The agent becomes the orchestration layer that ties them together.
One agent. Any tool. Intelligent workflows across your entire ecosystem.
Try It Yourself
- Create an SRE Agent
- Deploy MCP servers for your tools (Grafana, Atlassian)
- Create a sub-agent with the MCP tools connected
- Add a knowledge file with your app context
- Ask it to diagnose an issue
Watch logs become tickets. Errors become action items. Context becomes intelligence.
Learn More
Azure SRE Agent is currently in preview.