What if your deployments could fix themselves?
The Deployment Remediation Challenge
Modern operations teams face a recurring nightmare:
- A deployment ships at 9 AM
- Errors spike at 9:15 AM
- By the time you correlate logs, identify the bad revision, and execute a rollback—it's 10:30 AM
- Your users felt 75 minutes of degraded experience
The data to detect and fix this existed the entire time—but it was scattered across clouds and platforms:
- Error logs and traces → Dynatrace (third-party observability cloud)
- Deployment history and revisions → Azure Container Apps API
- Resource health and metrics → Azure Monitor
- Rollback commands → Azure CLI
Your observability data lives in one cloud. Your deployment data lives in another. Stitching together log analysis from Dynatrace with deployment correlation from Azure—and then executing remediation—required a human to manually bridge these silos.
What if an AI agent could unify data from third-party observability platforms with Azure deployment history and act on it automatically—every week, before users even notice?
Enter SRE Agent + Model Context Protocol (MCP) + Subagents
Azure SRE Agent doesn't just work with Azure. Using the Model Context Protocol (MCP), you can connect external observability platforms like Dynatrace directly to your agent. Combined with subagents for specialized expertise and scheduled tasks for automation, you can build an automated deployment remediation system.
Here's what I configured in SRE Agent for my Azure Container Apps environment:
| Component | Purpose |
|---|---|
| Dynatrace MCP Connector | Connects to Dynatrace's MCP gateway for log queries via DQL |
| 'Dynatrace' Subagent | Log analysis specialist that executes DQL queries and identifies root causes |
| 'Remediation' Subagent | Deployment remediation specialist that correlates errors with deployments and executes rollbacks |
| Scheduled Task | Weekly Monday 9:30 AM health check for the 'octopets-prod-api' Container App |
Subagent workflow:
The subagent workflow in SRE Agent Builder: 'OctopetsScheduledTask' triggers 'RemediationSubagent' (12 tools), which hands off to 'DynatraceSubagent' (3 MCP tools) for log analysis.
How I Set It Up: Step by Step
Step 1: Connect Dynatrace via MCP
SRE Agent supports the Model Context Protocol (MCP) for connecting external data sources. Dynatrace exposes an MCP gateway that provides access to its APIs as first-class tools.
Connection configuration:
```json
{
  "name": "dynatrace-mcp-connector",
  "dataConnectorType": "Mcp",
  "dataSource": "Endpoint=https://<your-tenant>.live.dynatrace.com/platform-reserved/mcp-gateway/v0.1/servers/dynatrace-mcp/mcp;AuthType=BearerToken;BearerToken=<your-api-token>"
}
```
Once connected, SRE Agent automatically discovers Dynatrace tools.
💡 Tip: When creating your Dynatrace API token, grant the `entities.read`, `events.read`, and `metrics.read` scopes for comprehensive access.
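The `dataSource` value packs the endpoint and the credential into one semicolon-delimited string, which is easy to mistype. Here is a minimal sketch, assuming the `Key=Value;Key=Value` layout in the connection configuration above is the whole format. The helpers `build_data_source`, `parse_data_source`, and `connector_config` are hypothetical and not part of SRE Agent. They only assemble and sanity-check the string before you paste it into the connector.

```python
import json

# Keys the example connection string is expected to carry (assumption based on the config above).
REQUIRED_KEYS = ("Endpoint", "AuthType", "BearerToken")


def build_data_source(tenant: str, token: str) -> str:
    """Assemble the semicolon-delimited dataSource string (hypothetical helper)."""
    endpoint = (
        f"https://{tenant}.live.dynatrace.com"
        "/platform-reserved/mcp-gateway/v0.1/servers/dynatrace-mcp/mcp"
    )
    return f"Endpoint={endpoint};AuthType=BearerToken;BearerToken={token}"


def parse_data_source(data_source: str) -> dict:
    """Split 'Key=Value;...' pairs and check that the required keys are present."""
    parts = {}
    for segment in data_source.split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    missing = [k for k in REQUIRED_KEYS if not parts.get(k)]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not parts["Endpoint"].startswith("https://"):
        raise ValueError("Endpoint must use https")
    return parts


def connector_config(tenant: str, token: str) -> str:
    """Return connector JSON using the same field names as the example config."""
    data_source = build_data_source(tenant, token)
    parse_data_source(data_source)  # fail fast on typos before pasting
    return json.dumps(
        {
            "name": "dynatrace-mcp-connector",
            "dataConnectorType": "Mcp",
            "dataSource": data_source,
        },
        indent=2,
    )
```

Keep the real token out of source control. Read it from an environment variable or a secret store when you call `connector_config`.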
Step 2: Build Specialized Subagents
Generic agents are good. Specialized agents are better.
I created two subagents that work together in a coordinated workflow—one for Dynatrace log analysis, the other for deployment remediation.
DynatraceSubagent
This subagent is the log analysis specialist. It uses the Dynatrace MCP tools to execute DQL queries and identify root causes.
Key capabilities:
- Executes DQL queries via MCP tools (`create-dql`, `execute-dql`, `explain-dql`)
- Fetches 5xx error counts and request volumes, and detects error spikes
- Returns consolidated analysis with root cause, affected services, and error patterns
👉 View full DynatraceSubagent configuration here
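The post does not say how DynatraceSubagent decides that 5xx errors have spiked. One common approach, sketched here as an assumption rather than the subagent's actual logic, compares each time bucket's 5xx rate against a baseline drawn from earlier buckets and flags buckets whose z-score exceeds a threshold. `detect_spikes`, its parameters, and the per-bucket error and request counts (the kind of series a DQL query could return) are hypothetical.

```python
from statistics import mean, pstdev


def detect_spikes(error_counts, request_counts, baseline_points=6,
                  z_threshold=3.0, min_rate=0.01):
    """Return indices of time buckets whose 5xx rate spikes above the baseline.

    Hypothetical helper: the first `baseline_points` buckets are treated as
    normal traffic. A later bucket is flagged when its 5xx rate is at least
    `min_rate` and `z_threshold` standard deviations above the baseline mean.
    """
    # 5xx rate per bucket; a bucket with no requests counts as a zero rate.
    rates = [e / r if r else 0.0 for e, r in zip(error_counts, request_counts)]
    if len(rates) <= baseline_points:
        return []  # not enough data beyond the baseline
    baseline = rates[:baseline_points]
    mu, sigma = mean(baseline), pstdev(baseline)

    spikes = []
    for i in range(baseline_points, len(rates)):
        if sigma:
            z = (rates[i] - mu) / sigma
        else:
            # Perfectly flat baseline: any increase counts as anomalous.
            z = float("inf") if rates[i] > mu else 0.0
        if rates[i] >= min_rate and z >= z_threshold:
            spikes.append(i)
    return spikes
```

For example, `detect_spikes([0, 1, 0, 1, 0, 0, 40, 55, 60], [1000] * 9)` flags buckets 6, 7, and 8. The first flagged index marks the spike onset that can then be lined up against deployment times.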
RemediationSubagent
This is the deployment remediation specialist. It correlates Dynatrace log analysis with Azure Container Apps deployment history, generates correlation charts, and executes rollbacks when confidence is high.
Key capabilities:
- Retrieves Container Apps revision history (`GetDeploymentTimes`, `ListRevisions`)
- Generates correlation charts (`PlotTimeSeriesData`, `PlotBarChart`, `PlotAreaChartWithCorrelation`)
- Computes a confidence score (0-100%) for deployment causation
- Executes a rollback and traffic shift when confidence exceeds 70%
👉 View full RemediationSubagent configuration here
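The post states that RemediationSubagent computes a 0-100% confidence score and rolls back only above 70%, but not how the score is derived. The sketch below uses a simple, hypothetical formula purely to illustrate the gate: 70 points for the share of 5xx errors attributed to the new revision, and 30 points for how soon after the deployment the spike began. `RevisionStats`, `causation_confidence`, and the weights are assumptions. Only the 70% threshold comes from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

ROLLBACK_THRESHOLD = 70.0  # from the post: roll back only when confidence > 70%


@dataclass
class RevisionStats:
    """Per-revision facts the correlation step would gather (hypothetical shape)."""
    name: str
    deployed_at: datetime
    errors_5xx: int


def causation_confidence(new: RevisionStats, previous: RevisionStats,
                         spike_onset: datetime,
                         max_lag: timedelta = timedelta(minutes=30)) -> float:
    """Hypothetical 0-100 score: 70 points attribution + 30 points timing."""
    # Attribution: share of the 5xx errors that landed on the new revision.
    total = new.errors_5xx + previous.errors_5xx
    attribution = new.errors_5xx / total if total else 0.0

    # Timing: full credit if the spike starts at deployment, decaying linearly
    # to zero at `max_lag`; a spike that predates the deployment earns nothing.
    lag = spike_onset - new.deployed_at
    if lag < timedelta(0) or lag > max_lag:
        timing = 0.0
    else:
        timing = 1.0 - lag / max_lag

    return round(70 * attribution + 30 * timing, 1)


def should_roll_back(confidence: float) -> bool:
    """Apply the post's gate: act only when confidence exceeds 70%."""
    return confidence > ROLLBACK_THRESHOLD
```

For instance, with hypothetical inputs of 155 errors on revision 0000039, none on 0000038, and a spike 15 minutes after deployment, the score is 70 + 15 = 85.0, which clears the threshold. The weights are arbitrary. They are chosen only so that evidence like the charts in this walkthrough, where errors are isolated to one revision, dominates the decision.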
The power of specialization: each agent focuses on its own domain. DynatraceSubagent handles log analysis; RemediationSubagent handles deployment correlation and rollback. When the workflow runs, RemediationSubagent hands off to DynatraceSubagent for analysis, receives the findings back (a bi-directional handoff), and continues with remediation. This is simple delegation, not a single monolithic agent trying to do everything.
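The handoff pattern can be sketched in plain Python. This is not SRE Agent's API: the classes below are hypothetical stand-ins, with `query_errors` playing the role of the Dynatrace MCP tools and `roll_back` the role of the rollback tool. What the sketch shows is the control flow. The remediation side owns the workflow, delegates analysis, receives structured findings back, and then decides what to do.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Findings:
    """What the log specialist hands back (hypothetical shape)."""
    service: str
    root_cause: str
    error_count: int


class LogAnalysisSubagent:
    """Stand-in for DynatraceSubagent; `query_errors` stands in for its MCP tools."""

    def __init__(self, query_errors: Callable[[str], List[str]]):
        self.query_errors = query_errors

    def analyze(self, service: str) -> Findings:
        messages = self.query_errors(service)
        # Naive "root cause": the most frequent error message (illustrative only).
        root = max(set(messages), key=messages.count) if messages else "none"
        return Findings(service, root, len(messages))


class RemediationSubagent:
    """Owns the workflow: delegates log analysis, then resumes with the result."""

    def __init__(self, log_agent: LogAnalysisSubagent,
                 roll_back: Callable[[str], None]):
        self.log_agent = log_agent
        self.roll_back = roll_back

    def run(self, service: str) -> str:
        findings = self.log_agent.analyze(service)  # hand off, get findings back
        if findings.error_count == 0:
            return f"{service}: healthy"
        # Confidence scoring is omitted here to keep the handoff itself visible.
        self.roll_back(service)
        return f"{service}: rolled back ({findings.root_cause})"
```

Returning findings as a small structured object lets the remediation side reason over the result instead of re-parsing free text.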
Step 3: Create the Weekly Scheduled Task
Now the automation. I configured a scheduled task that runs every Monday at 9:30 AM to check whether deployments in the last 4 hours caused any issues—and automatically remediate if needed.
Scheduled task configuration:
| Setting | Value |
|---|---|
| Task Name | OctopetsScheduledTask |
| Frequency | Weekly |
| Day of Week | Monday |
| Time | 9:30 AM |
| Response Subagent | RemediationSubagent |
Configuring the OctopetsScheduledTask in the SRE Agent portal
The key insight: the scheduled task is just a coordinator. It immediately hands off to the RemediationSubagent, which orchestrates the entire workflow including handoffs to DynatraceSubagent.
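The schedule and the 4-hour lookback window come down to a little date arithmetic. Here is a minimal sketch, assuming the scheduler interprets 9:30 AM in a single fixed time zone (naive datetimes here). `next_run`, `lookback_window`, and `in_scope` are hypothetical helpers, not SRE Agent settings.

```python
from datetime import datetime, time, timedelta

RUN_WEEKDAY = 0              # Monday (datetime.weekday(): Monday == 0)
RUN_TIME = time(9, 30)       # 9:30 AM, per the task configuration
LOOKBACK = timedelta(hours=4)


def next_run(now: datetime) -> datetime:
    """Next Monday 9:30 AM strictly after `now`."""
    days_ahead = (RUN_WEEKDAY - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead),
                                 RUN_TIME, tzinfo=now.tzinfo)
    if candidate <= now:     # already past this Monday's slot
        candidate += timedelta(days=7)
    return candidate


def lookback_window(run_at: datetime):
    """(start, end) of the deployment window a run inspects."""
    return run_at - LOOKBACK, run_at


def in_scope(deployed_at: datetime, run_at: datetime) -> bool:
    """True if a deployment falls inside the run's lookback window."""
    start, end = lookback_window(run_at)
    return start <= deployed_at <= end
```

Under these assumptions, the opening scenario's 9 AM Monday deployment falls inside the 5:30 to 9:30 AM window of that morning's run. Deployments shipped later in the week fall outside a 4-hour window on a weekly schedule, so widening the window or running the task more often would extend coverage.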
Step 4: See It In Action
Here's what happens when the scheduled task runs:
The scheduled task triggering and initiating Dynatrace analysis for octopets-prod-api
The DynatraceSubagent analyzes the logs and identifies the root cause:
DynatraceSubagent executing DQL queries and returning consolidated log analysis
The RemediationSubagent then generates correlation charts:
- 5xx errors spiked after deploying revision 0000039
- Error volume isolated to revision 0000039; prior revision 0000038 shows zero errors
- 5xx error rate correlates strongly with deployment of revision 0000039
Finally, with a 95% confidence score, SRE Agent executes the rollback autonomously:
RemediationSubagent executing rollback and traffic shift autonomously
The agent detected the bad deployment, generated visual evidence, and automatically shifted 100% of traffic to the last known working revision, all without human intervention.
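To perform the same rollback by hand, or to check what the agent did, you need two steps: pick the rollback target, then route traffic to it. The sketch below makes several assumptions. Revisions use zero-padded suffixes (as `0000038` and `0000039` do), so sorting is chronological. Full revision names follow the `<app>--<suffix>` convention. The app runs in multiple-revision mode, so traffic weights apply. The sketch builds, but does not run, an `az containerapp ingress traffic set` command. The resource group name and error counts are placeholders, and this is not necessarily the call SRE Agent makes.

```python
import shlex


def last_known_good(revisions, error_counts, bad_revision):
    """Most recent revision older than `bad_revision` with exactly zero recorded 5xx errors.

    Revisions with no error data are skipped rather than assumed healthy.
    """
    ordered = sorted(revisions)  # zero-padded suffixes sort chronologically
    idx = ordered.index(bad_revision)
    for candidate in reversed(ordered[:idx]):
        if error_counts.get(candidate) == 0:
            return candidate
    return None


def traffic_shift_command(app, resource_group, revision_suffix):
    """Azure CLI call that routes 100% of ingress traffic to one revision."""
    revision_name = f"{app}--{revision_suffix}"  # assumes default '<app>--<suffix>' naming
    return [
        "az", "containerapp", "ingress", "traffic", "set",
        "--name", app,
        "--resource-group", resource_group,
        "--revision-weight", f"{revision_name}=100",
    ]


if __name__ == "__main__":
    # Hypothetical error counts; only "0000038 shows zero errors" comes from the post.
    target = last_known_good(["0000037", "0000038", "0000039"],
                             {"0000037": 0, "0000038": 0, "0000039": 412},
                             "0000039")
    print(shlex.join(traffic_shift_command("octopets-prod-api",
                                           "my-resource-group", target)))
```

Skipping revisions without error data is a deliberately conservative choice: rolling back to a revision nobody has observed could trade one incident for another.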
Why This Matters
| Before | After |
|---|---|
| Manually check Dynatrace after incidents | Automated DQL queries via MCP |
| Manually stitch together logs and deployments | Subagents correlate data automatically |
| Rollback requires a human decision and execution | Confidence-based auto-remediation |
| 75+ minutes of degraded experience before rollback | Under 5 minutes from task trigger to rollback |
| Reactive incident response | Proactive weekly health checks |
Try It Yourself
- Connect your observability tool via MCP (Dynatrace, Datadog, Prometheus—any tool with an MCP gateway)
- Build a log analysis subagent that knows how to query your observability data
- Build a remediation subagent that can correlate logs with deployments and execute fixes
- Wire them together with handoffs so the subagents can delegate log analysis
- Create a scheduled task to trigger the workflow automatically