azure sre agent
1 TopicProactive Monitoring Made Simple with Azure SRE Agent
SRE teams strive for proactive operations, catching issues before they impact customers and reducing the chaos of incident response. While perfection may be elusive, the real goal is minimizing outages and gaining immediate line of sight into production environments. Today, that’s harder than ever. It requires correlating countless signals and alerts, understanding how they relate—or don’t relate—to each other, and assigning the right sense of urgency and impact. Anything that shortens this cycle, accelerates detection, and enables automated remediation is what modern SRE teams crave. What if you could skip the scripting and pipelines? What if you could simply describe what you want in plain language and let it run automatically on a schedule? Scheduled Tasks for Azure SRE Agent With Scheduled Tasks for Azure SRE Agent, that what-if scenario is now a reality. Scheduled tasks combine natural language prompts with Azure SRE Agent’s automation capabilities, so you can express intent, set a schedule, and let the agent do the rest—without writing a single line of code. This means: ⚡ Faster incident response through early detection ✅ Better compliance with automated checks 🎯 More time for high-value engineering work and innovation 💡 The shift from reactive to proactive: Instead of waiting for alerts to fire or customers to report issues, you’re continuously monitoring, validating, and catching problems before they escalate. How Scheduled Tasks Work Under the Hood When you create a Scheduled Task, the process is more than just running a prompt on a timer. Here’s what happens: 1. Prompt Interpretation and Plan Creation The SRE Agent takes your natural language prompt—such as “Scan all resources for security best practices”—and converts it into a structured execution plan. This plan defines the steps, tools, and data sources required to fulfill your request. 2. Built-In Tools and MCP Integration The agent uses its built-in capabilities (Azure CLI, Log Analytics workspace, Appinsights) and can also leverage 3 rd party data sources or tools via MCP server integration for extended functionality. 3. Results Analysis and Smart Summarization After execution, the agent analyzes results, identifies anomalies or issues, and provides actionable summaries not just raw data dumps. 4. Notification and Escalation Based on findings, the agent can: Post updates to Teams channels Create or update incidents Send email notifications Trigger follow-up actions Real-World Use Cases for Proactive Ops Here’s where scheduled tasks shine for SRE teams: Use Case Prompt Example Schedule Security Posture Check “Scan all subscriptions for resources with public endpoints and flag any that shouldn’t be exposed” Daily Cost Anomaly Detection “Compare this week’s spend against last week and alert if any service exceeds 20% growth” Weekly Compliance Drift Detection “Check all storage accounts for encryption settings and report any non-compliant resources” Daily Resource Health Summary “Summarize the health status of all production VMs and highlight any degraded instances” Every 4 hours Incident Trend Analysis “Analyze ICM incidents from the past week, identify patterns, and summarize top contributing services” Weekly Getting Started in 3 Steps Step 1: Define Your Intent Write a natural language prompt describing what you want to monitor or check. Be specific about: - What resources or scope - What conditions to look for - What action to take if issues are found Example: > “Every morning at 8 AM, check all production Kubernetes clusters for pods in CrashLoopBackOff state. If any are found, post a summary to the #sre-alerts Teams channel with cluster name, namespace, and pod details.” Step 2: Set Your Schedule Choose how often the task should run: - Cron expressions for precise control - Simple intervals (hourly, daily, weekly) Step 3: Define Where to Receive Updates Include in your prompt where you want results delivered when the task finishes execution. The agent can use its built-in tools and connectors to: - Post summaries to a Teams channel - Send email notifications - Create or update ICM incidents Example prompt with notification: > "Check all production databases for long-running queries over 30 seconds. If any are found, post a summary to the #database-alerts Teams channel." Why This Matters for Proactive Operations Traditional monitoring approaches have limitations: Traditional Approach With Scheduled Tasks Write scripts, maintain pipelines Describe in plain language Static thresholds and rules Contextual, AI-powered analysis Alert fatigue from noisy signals Smart summarization of what matters Separate tools for check vs. action Unified detection and response Requires dedicated DevOps effort Any SRE can create and modify The result? Your team spends less time building and maintaining monitoring infrastructure and more time on the work that truly requires human expertise. Best Practices for Scheduled Tasks Start simple, iterate — Begin with one or two high-value checks and expand as you gain confidence Be specific in prompts — The more context you provide, the better the results Set appropriate frequencies — Not everything needs to run hourly; match the schedule to the risk Review and refine — Check task results periodically and adjust prompts for better accuracy What’s Next? Scheduled tasks are just the beginning. We’re continuing to invest in capabilities that help SRE teams shift left—catching issues earlier, automating routine checks, and freeing up time for strategic work. Ready to Start? Use this sample that shows how to create a scheduled health check sub-agent: https://github.com/microsoft/sre-agent/blob/main/samples/automation/samples/02-scheduled-health-check-sample.md This example demonstrates: - Building a HealthCheckAgent using built-in tools like Azure CLI and Log Analytics Workspace - Scheduling daily health checks for a container app at 7 AM - Sending email alerts when anomalies are detected 🔗 Explore more samples here: https://github.com/microsoft/sre-agent/tree/main/samples More to Learn Ignite 2025 announcements: https://aka.ms/ignite25/blog/sreagent Documentation: https://aka.ms/sreagent/docs Support & Feature Requests: https://github.com/microsoft/sre-agent/issues105Views0likes0Comments