Apps on Azure Blog

Proactive Cloud Ops with SRE Agent: Scheduled Checks for Cloud Optimization

dchelupati (Microsoft)
Jan 19, 2026

Cloud operations isn't just about keeping things running - it's about running them better.

The Cloud Optimization Challenge

Your cloud environment is always changing:

  • New features ship weekly
  • Traffic patterns shift seasonally
  • Costs creep up quietly
  • Security best practices evolve
  • Teams spin up resources and forget them

It's Monday morning. You open the Azure portal. Everything looks... fine. But "fine" isn't great. That VM has been at 8% CPU for weeks. A Key Vault secret expires in 12 days. 

Nothing's broken. But security is drifting, costs are creeping, and capacity gaps are growing silently.

The question isn't "Is something broken?" It's "Could this be better?"

Four Pillars of Cloud Optimization

Pillar | What Teams Want | The Challenge
Security | Stay compliant, reduce risk | Config drift, legacy settings, expiring creds
Cost | Spend efficiently, justify budget | Hard to spot waste across 100s of resources
Performance | Meet SLOs, handle growth | Know when to scale before demand hits
Availability | Maximize uptime, build resilience | Hidden dependencies, single points of failure

Most teams check these sometimes. SRE Agent checks them continuously.

Enter SRE Agent + Scheduled Tasks

SRE Agent can pull data from Azure Monitor, resource configurations, metrics, logs, traces, errors, and cost data, then analyze it on a schedule. If you use tools outside Azure (Datadog, PagerDuty, Splunk), you can connect those via MCP servers so the agent sees your full observability stack.

My setup uses Azure-native sources. Here's how I wired it up.

How I Set It Up: Step by Step

Step 1: Create SRE Agent with Subscription Access

I created an SRE Agent without attaching it to any specific resource group. Instead, I gave it Reader access at the subscription level. This lets the agent scan across all my resource groups for optimization opportunities.

No resource group configuration needed. The agent builds a knowledge graph of everything across the subscription: VMs, storage accounts, Key Vaults, NSGs, and web apps.

Step 2: Create and Upload My Organization Practices

I created an org-practices.md file that defines what "good" looks like for my team:

I uploaded this to SRE Agent's knowledge base. Now the agent knows our bar, not just Azure defaults.
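To make "our bar" concrete, here is a minimal Python sketch of how a team's standards could be written down as checkable thresholds. Everything in it is illustrative: the names `OrgPractices`, `min_avg_cpu_percent`, and `secret_expiry_warning_days` are hypothetical, the values are placeholders rather than the contents of the actual org-practices.md, and the agent works from the uploaded Markdown file, not from code like this.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class OrgPractices:
    # Hypothetical thresholds; substitute your team's own values.
    min_avg_cpu_percent: float = 20.0     # below this, a VM counts as underused
    secret_expiry_warning_days: int = 30  # warn when expiry falls inside this window


def check_vm_cpu(name: str, avg_cpu_percent: float, practices: OrgPractices) -> Optional[str]:
    """Return a finding if the VM's average CPU is under the team's bar, else None."""
    if avg_cpu_percent < practices.min_avg_cpu_percent:
        return (
            f"{name}: average CPU {avg_cpu_percent:.0f}% is below "
            f"{practices.min_avg_cpu_percent:.0f}%"
        )
    return None


def check_secret_expiry(
    name: str, expires_on: date, today: date, practices: OrgPractices
) -> Optional[str]:
    """Return a finding if the secret expires within the warning window, else None."""
    days_left = (expires_on - today).days
    if days_left <= practices.secret_expiry_warning_days:
        return f"{name}: expires in {days_left} days"
    return None
```

With these placeholder values, the VM sitting at 8% CPU and the secret expiring in 12 days from the Monday-morning example would both be flagged, while a team with a different bar would only need to change the two thresholds.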

👉 See my full org-practices.md

Source repos for this demo:

Step 3: Connect to Teams Channel

I connected SRE Agent to my team's Teams channel so findings land where we already work. Critical findings get immediate notifications. Warnings go into a daily digest. No more logging into separate dashboards. The insights come to us.
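The routing rule described here (critical findings right away, warnings batched into a digest) can be sketched in a few lines of Python. This only illustrates the rule; it is not how the Teams integration is implemented, and the `severity` and `title` field names are assumptions.

```python
from collections import defaultdict


def route_findings(findings):
    """Split findings into immediate notifications and a daily digest by severity."""
    routed = defaultdict(list)
    for finding in findings:
        # Assumed convention: only "critical" interrupts the team immediately.
        channel = "immediate" if finding["severity"] == "critical" else "digest"
        routed[channel].append(finding["title"])
    return dict(routed)
```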

Step 4: Connect Resource Groups to GitHub Repos

I added the two demo resource groups to the SRE Agent and linked each one to its corresponding GitHub repo:

Resource Group | GitHub Repository
rg-security-opt-demo | security-demoapp
rg-cost-opt-sreademo | costoptimizationapp

This enables the agent to create GitHub issues for findings, linking violations directly to the repo responsible for that infrastructure.

Step 5: Test with Prompts

Before setting up automation, I tested the agent with manual prompts to make sure it was finding the right issues. The agent ran the checks, compared against my org-practices.md, and identified the issues.

Security Check:

Scan resource group "rg-security-opt-demo" for any violations of our security practices defined in org-practices.md in your knowledge base. List violations with severity and remediation steps. Make sure to check against all critical requirements, send a message in the Teams channel with your findings, and create an issue in the GitHub repo https://github.com/dm-chelupati/security-demoapp.git

Cost Check:

Scan resource group "rg-cost-opt-sreademo" for any violations of our cost practices defined in org-practices.md in your knowledge base. List violations with severity and remediation steps. Make sure to check against all critical requirements, send a message in the Teams channel with your findings, and create an issue in the GitHub repo https://github.com/dm-chelupati/costoptimizationapp.git
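The two prompts differ only in the resource group, the practice area, and the target repo, so they are easy to keep in sync with a small template. The sketch below is one way to do that in Python. `PROMPT_TEMPLATE`, `CHECKS`, and `build_prompt` are hypothetical names for illustration, and the agent itself just receives the resulting text.

```python
# Shared wording for both checks; only the three placeholders vary.
PROMPT_TEMPLATE = (
    'Scan resource group "{resource_group}" for any violations of our {area} practices '
    "defined in org-practices.md in your knowledge base. "
    "List violations with severity and remediation steps. "
    "Make sure to check against all critical requirements, "
    "send a message in the Teams channel with your findings, "
    "and create an issue in the GitHub repo {repo_url}"
)

# Practice area -> (resource group, linked repo), taken from the demo setup above.
CHECKS = {
    "security": (
        "rg-security-opt-demo",
        "https://github.com/dm-chelupati/security-demoapp.git",
    ),
    "cost": (
        "rg-cost-opt-sreademo",
        "https://github.com/dm-chelupati/costoptimizationapp.git",
    ),
}


def build_prompt(area: str) -> str:
    """Fill the shared template for one practice area ("security" or "cost")."""
    resource_group, repo_url = CHECKS[area]
    return PROMPT_TEMPLATE.format(
        resource_group=resource_group, area=area, repo_url=repo_url
    )
```

Calling `build_prompt("cost")` yields the cost prompt, and adding a third check later means adding one entry to `CHECKS` rather than copying the whole paragraph.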

Step 6: Check Output via GitHub Issues

After running prompts, I checked GitHub. The agent had created issues. Each issue has the root cause, impact, and fix ready for the team to action or for Coding Agent to pick up and create a PR.

👉 See the actual issues created:

Step 7: Set Up Scheduled Triggers

This is where it gets powerful. I configured recurring schedules:

Weekly Security Check (Wednesdays 8 AM):

Create a scheduled trigger that performs security practices checks against the org practices in knowledge base org-practices.md, creates a GitHub issue, and sends a Teams message on a weekly basis on Wednesdays at 8 AM UTC

Weekly Cost Review (Mondays 8 AM):

Create a scheduled trigger that performs cost practices checks against the org practices in knowledge base org-practices.md, creates a GitHub issue, and sends a Teams message on a weekly basis on Mondays at 8 AM UTC
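If you want to sanity-check when these triggers should fire, the schedule is easy to reproduce. Here is a minimal Python sketch, assuming a timezone-aware `now`; `next_weekly_run` is a hypothetical helper, not part of SRE Agent. In standard five-field cron notation the same schedules would read `0 8 * * 3` (Wednesdays) and `0 8 * * 1` (Mondays), though whether the agent accepts cron syntax is not something this post covers.

```python
from datetime import datetime, timedelta, timezone

MONDAY, WEDNESDAY = 0, 2  # datetime.weekday() numbering: Monday is 0


def next_weekly_run(now: datetime, weekday: int, hour: int = 8) -> datetime:
    """Return the first run at `hour`:00 UTC on `weekday` strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC before comparing.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so use next week's.
        candidate += timedelta(days=7)
    return candidate
```

For example, at 9 AM UTC on Monday, Jan 19, 2026, the next security check falls on Wednesday, Jan 21 at 8 AM UTC, and the next cost review on Monday, Jan 26, since that day's 8 AM slot has already passed.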

Now optimization runs automatically. Every week, fresh findings land in GitHub Issues and Teams.

Why Context Makes the SRE Agent Powerful

Think about hiring a new SRE. They're excellent at their craft: they know Kubernetes, networking, and Azure inside out. But on day one, they can't solve problems in your environment yet. Why? They don't have context:

  • What are your SLOs? What's "acceptable" latency for your app?
  • When do you rotate secrets? Monthly? Quarterly? Before each release?
  • Which resources are production-critical vs. dev experiments?
  • What's your tagging policy? Who owns what?
  • How do you deploy? GitOps? Pipelines? Manual approvals?

A great engineer becomes your great engineer once they learn how your team operates.

SRE Agent works the same way.

Out of the box, it knows Azure resource types, networking, and best practices. But it doesn't know your bar. Is 20% CPU utilization acceptable or wasteful? Should secrets expire in 30 days or 90? Are public endpoints ever okay, or never?

The more context you give the agent (your SLOs, your runbooks, your policies), the more it reasons like a team member who understands your environment, not just Azure in general.

That's why Step 2 matters so much. When I uploaded our standards, the agent stopped checking generic Azure best practices and started checking our best practices.

Bring your existing knowledge: You don't have to start from scratch. If your team's documentation already lives in Atlassian Confluence, SharePoint, or other tools, you can connect those via MCP servers. The agent pulls context from where your team already works, so there is no need to duplicate content.

Why This Matters

Before this setup, optimization was a quarterly thing. Now it happens automatically:

Before | After
Check security when audit requests it | Daily automated posture check
Find waste when finance complains | Weekly savings report in Teams
Discover capacity issues during incidents | Scheduled headroom analysis
Expire credentials and debug at 2 AM | 30-day warning with exact secret names

Optimization isn't a project anymore. It's a practice.

Try It Yourself

  1. Create an SRE Agent with access to your subscription
  2. Upload your team's standards (security policies, cost thresholds, tagging rules)
  3. Set up a scheduled trigger; start with a daily security check
  4. Watch the first report land in Teams

See what you've been missing while everything looked "fine."

Learn More

Azure SRE Agent is currently in preview.
