Apps on Azure Blog

Unifying Scattered Observability Data from Dynatrace + Azure for Self-Healing with SRE Agent

Vineela-Suri (Microsoft)
Jan 26, 2026

What if your deployments could fix themselves?

The Deployment Remediation Challenge

Modern operations teams face a recurring nightmare:

  • A deployment ships at 9 AM
  • Errors spike at 9:15 AM  
  • By the time you correlate logs, identify the bad revision, and execute a rollback—it's 10:30 AM
  • Your users felt 75 minutes of degraded experience

The data to detect and fix this existed the entire time—but it was scattered across clouds and platforms:

  • Error logs and traces → Dynatrace (third-party observability cloud)
  • Deployment history and revisions → Azure Container Apps API
  • Resource health and metrics → Azure Monitor
  • Rollback commands → Azure CLI

Your observability data lives in one cloud. Your deployment data lives in another. Stitching together log analysis from Dynatrace with deployment correlation from Azure—and then executing remediation—required a human to manually bridge these silos.

What if an AI agent could unify data from third-party observability platforms with Azure deployment history and act on it automatically—every week, before users even notice?

Enter SRE Agent + Model Context Protocol (MCP) + Subagents

Azure SRE Agent doesn't just work with Azure. Using the Model Context Protocol (MCP), you can connect external observability platforms like Dynatrace directly to your agent. Combined with subagents for specialized expertise and scheduled tasks for automation, you can build an automated deployment remediation system.

Here's what I configured for my Azure Container Apps environment inside SRE Agent:

| Component | Purpose |
| --- | --- |
| Dynatrace MCP Connector | Connects to Dynatrace's MCP gateway for log queries via DQL |
| 'Dynatrace' Subagent | Log analysis specialist that executes DQL queries and identifies root causes |
| 'Remediation' Subagent | Deployment remediation specialist that correlates errors with deployments and executes rollbacks |
| Scheduled Task | Weekly health check (Mondays, 9:30 AM) for the 'octopets-prod-api' Container App |

Subagent workflow:

The subagent workflow in SRE Agent Builder: 'OctopetsScheduledTask' triggers 'RemediationSubagent' (12 tools), which hands off to 'DynatraceSubagent' (3 MCP tools) for log analysis.

How I Set It Up: Step by Step

Step 1: Connect Dynatrace via MCP

SRE Agent supports the Model Context Protocol (MCP) for connecting external data sources. Dynatrace exposes an MCP gateway that provides access to its APIs as first-class tools.

Connection configuration:

{
  "name": "dynatrace-mcp-connector",
  "dataConnectorType": "Mcp",
  "dataSource": "Endpoint=https://<your-tenant>.live.dynatrace.com/platform-reserved/mcp-gateway/v0.1/servers/dynatrace-mcp/mcp;AuthType=BearerToken;BearerToken=<your-api-token>"
}

Once connected, SRE Agent automatically discovers Dynatrace tools. 

💡 Tip: When creating your Dynatrace API token, grant the `entities.read`, `events.read`, and `metrics.read` scopes for comprehensive access.
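If you manage several tenants, it can help to generate the connection configuration instead of hand-editing the connection string. The following is a minimal sketch that reproduces the JSON shape shown above; the function name `build_connector_config` and the sample tenant ID are hypothetical, and the only format it relies on is the one in the example configuration.

```python
import json


def build_connector_config(tenant: str, api_token: str,
                           name: str = "dynatrace-mcp-connector") -> dict:
    """Build a connector definition shaped like the example configuration.

    `tenant` is the Dynatrace environment ID (the <your-tenant> placeholder).
    """
    # A semicolon would split the key=value;key=value connection string.
    if ";" in tenant or ";" in api_token:
        raise ValueError("tenant and token must not contain ';'")
    endpoint = (
        f"https://{tenant}.live.dynatrace.com"
        "/platform-reserved/mcp-gateway/v0.1/servers/dynatrace-mcp/mcp"
    )
    data_source = (
        f"Endpoint={endpoint};AuthType=BearerToken;BearerToken={api_token}"
    )
    return {
        "name": name,
        "dataConnectorType": "Mcp",
        "dataSource": data_source,
    }


if __name__ == "__main__":
    # Placeholder values only; load the real token from a secret store.
    config = build_connector_config("abc12345", "<your-api-token>")
    print(json.dumps(config, indent=2))
```

Keeping the token out of source control matters more than the helper itself; the sketch only makes the string format harder to mistype.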

Step 2: Build Specialized Subagents

Generic agents are good. Specialized agents are better.

I created two subagents that work together in a coordinated workflow—one for Dynatrace log analysis, the other for deployment remediation.

DynatraceSubagent

This subagent is the log analysis specialist. It uses the Dynatrace MCP tools to execute DQL queries and identify root causes.

Key capabilities:

  • Executes DQL queries via MCP tools (`create-dql`, `execute-dql`, `explain-dql`)
  • Fetches 5xx error counts and request volumes, and detects error spikes
  • Returns consolidated analysis with root cause, affected services, and error patterns
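To make "spike detection" concrete, here is one way it could work once a DQL query has returned 5xx counts per time bucket. This is a sketch under assumptions, not how DynatraceSubagent does it: the rolling-baseline rule, the window size, and the threshold are all illustrative choices.

```python
from statistics import mean, pstdev


def detect_spikes(counts: list[int], window: int = 6,
                  threshold: float = 3.0) -> list[int]:
    """Return indexes of buckets that exceed a rolling baseline.

    A bucket is flagged when its count is greater than
    mean + threshold * stddev of the `window` buckets before it.
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the deviation at 1.0 so a flat baseline does not flag
        # a change of one or two errors as a spike.
        sigma = max(pstdev(baseline), 1.0)
        if counts[i] > mu + threshold * sigma:
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Hypothetical 5xx counts per bucket; the jump starts at index 7.
    series = [2, 1, 3, 2, 2, 1, 2, 40, 45, 3]
    print(detect_spikes(series))
```

Note that this simple rule flags only the onset of a spike: once a high value enters the baseline window, the following buckets look normal by comparison. For correlating with a deployment time, the onset is usually the interesting point anyway.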

👉 View full DynatraceSubagent configuration here

RemediationSubagent

This is the deployment remediation specialist. It correlates Dynatrace log analysis with Azure Container Apps deployment history, generates correlation charts, and executes rollbacks when confidence is high.

Key capabilities:

  • Retrieves Container Apps revision history (`GetDeploymentTimes`, `ListRevisions`)
  • Generates correlation charts (`PlotTimeSeriesData`, `PlotBarChart`, `PlotAreaChartWithCorrelation`)
  • Computes confidence score (0-100%) for deployment causation
  • Executes rollback and traffic shift when confidence > 70%
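The post does not spell out how the confidence score is computed, so the following is only a hypothetical scoring rule to show the shape of the decision. The two signals (error-rate increase between revisions, and how soon after the deployment errors spiked) and their weights are assumptions; the only value taken from the workflow is the 70% rollback threshold.

```python
from dataclasses import dataclass

ROLLBACK_THRESHOLD = 70.0  # percent; rollback only when confidence > 70


@dataclass
class RevisionErrors:
    revision: str
    requests: int
    errors_5xx: int

    @property
    def error_rate(self) -> float:
        return self.errors_5xx / self.requests if self.requests else 0.0


def deployment_confidence(previous: RevisionErrors, current: RevisionErrors,
                          minutes_to_spike: float) -> float:
    """Hypothetical 0-100 score that the new revision caused the errors."""
    # Signal 1 (up to 60 points): error-rate increase over the previous
    # revision, saturating at +10 percentage points.
    rate_delta = max(current.error_rate - previous.error_rate, 0.0)
    rate_points = min(rate_delta / 0.10, 1.0) * 60
    # Signal 2 (up to 40 points): errors that start soon after the
    # deployment score higher; the signal fades to zero after an hour.
    time_points = max(0.0, 1.0 - minutes_to_spike / 60.0) * 40
    return round(rate_points + time_points, 1)


def should_roll_back(confidence: float) -> bool:
    return confidence > ROLLBACK_THRESHOLD


if __name__ == "__main__":
    old = RevisionErrors("0000038", requests=1000, errors_5xx=0)
    new = RevisionErrors("0000039", requests=1000, errors_5xx=250)
    score = deployment_confidence(old, new, minutes_to_spike=15)
    print(score, should_roll_back(score))
```

Whatever rule you use, keeping the threshold check in one small function makes it easy to audit when the agent is allowed to act on its own.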

👉 View full RemediationSubagent configuration here

 

The power of specialization: each agent focuses on its own domain. DynatraceSubagent handles log analysis; RemediationSubagent handles deployment correlation and rollback. When the workflow runs, RemediationSubagent hands off to DynatraceSubagent for analysis, gets the findings back (the handoff is bi-directional), and continues with remediation. This is simple delegation rather than a single monolithic agent trying to do everything.

Step 3: Create the Weekly Scheduled Task

Now the automation. I configured a scheduled task that runs every Monday at 9:30 AM to check whether deployments in the last 4 hours caused any issues—and automatically remediate if needed.

Scheduled task configuration:

| Setting | Value |
| --- | --- |
| Task Name | OctopetsScheduledTask |
| Frequency | Weekly |
| Day of Week | Monday |
| Time | 9:30 AM |
| Response Subagent | RemediationSubagent |

 

Configuring the OctopetsScheduledTask in the SRE Agent portal

The key insight: the scheduled task is just a coordinator. It immediately hands off to the RemediationSubagent, which orchestrates the entire workflow including handoffs to DynatraceSubagent.
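For reference, the timing behind this schedule is easy to reason about in code. The sketch below computes the next Monday 9:30 AM run and the 4-hour lookback window that the check covers. It illustrates the arithmetic only and is not how SRE Agent schedules tasks; the function names are hypothetical, and the datetimes are naive, so they are interpreted in whatever time zone you pass in.

```python
from datetime import datetime, timedelta

LOOKBACK = timedelta(hours=4)  # the check covers deployments in this window


def next_run(now: datetime, weekday: int = 0,
             hour: int = 9, minute: int = 30) -> datetime:
    """Return the next weekly run strictly after `now` (weekday 0 = Monday)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed; move to next week.
        candidate += timedelta(days=7)
    return candidate


def lookback_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the (start, end) range the health check should analyze."""
    return run_time - LOOKBACK, run_time


if __name__ == "__main__":
    run = next_run(datetime(2026, 1, 28, 12, 0))  # a Wednesday
    start, end = lookback_window(run)
    print(run, start, end)
```

One consequence worth noticing: with a weekly trigger and a 4-hour window, only deployments made between 5:30 AM and 9:30 AM on Monday fall inside the check.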

Step 4: See It In Action

Here's what happens when the scheduled task runs:

The scheduled task triggering and initiating Dynatrace analysis for octopets-prod-api

The DynatraceSubagent analyzes the logs and identifies the root cause:

DynatraceSubagent executing DQL queries and returning consolidated log analysis

The RemediationSubagent then generates correlation charts:

5xx errors spiked after deploying revision 0000039

Error volume isolated to revision 0000039; prior revision 0000038 shows zero errors

5xx error rate correlates strongly with deployment of revision 0000039

Finally, with a 95% confidence score, SRE Agent executes the rollback autonomously:

RemediationSubagent executing rollback and traffic shift autonomously.

The agent detected the bad deployment, generated visual evidence, and automatically shifted 100% traffic to the last known working revision—all without human intervention.
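If you were doing the same traffic shift by hand, the Azure CLI equivalent is `az containerapp ingress traffic set` with a revision weight of 100. The sketch below only builds that command so you can review it before running it. The resource group name and the full revision name are hypothetical placeholders, and it assumes the app runs in multiple-revision mode with the previous revision still active.

```python
import shlex


def rollback_command(app: str, resource_group: str,
                     good_revision: str) -> list[str]:
    """Build the az CLI call that sends 100% of traffic to one revision."""
    return [
        "az", "containerapp", "ingress", "traffic", "set",
        "--name", app,
        "--resource-group", resource_group,
        "--revision-weight", f"{good_revision}=100",
    ]


if __name__ == "__main__":
    cmd = rollback_command(
        app="octopets-prod-api",
        resource_group="octopets-rg",                # hypothetical name
        good_revision="octopets-prod-api--0000038",  # assumed full name
    )
    # Print for review; pass `cmd` to subprocess.run(cmd, check=True)
    # only after confirming the target revision.
    print(shlex.join(cmd))
```

Building the argument list separately from executing it keeps the destructive step explicit, which is useful while you are still deciding how much autonomy to give the agent.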

Why This Matters

| Before | After |
| --- | --- |
| Manually check Dynatrace after incidents | Automated DQL queries via MCP |
| Stitch together logs + deployments manually | Subagents correlate data automatically |
| Rollback requires human decision + execution | Confidence-based auto-remediation |
| 75+ minutes from deployment to rollback | Under 5 minutes with autonomous workflow |
| Reactive incident response | Proactive weekly health checks |

Try It Yourself

  1. Connect your observability tool via MCP (Dynatrace, Datadog, Prometheus—any tool with an MCP gateway)
  2. Build a log analysis subagent that knows how to query your observability data
  3. Build a remediation subagent that can correlate logs with deployments and execute fixes
  4. Wire them together with handoffs so the remediation subagent can delegate log analysis
  5. Create a scheduled task to trigger the workflow automatically

Learn More

Updated Jan 26, 2026
Version 1.0