In our recent blog post, we highlighted how Azure SRE Agent has evolved into an extensible AI-powered operations platform. One of the most requested capabilities from customers has been the ability for agents to retain knowledge across sessions: learning from past incidents, remembering team preferences, and continuously improving troubleshooting accuracy. Today, we're excited to dive deeper into Azure SRE Agent memory, a feature that changes how your operations teams work with AI.
Why Memory Matters for AI Operations
Every seasoned SRE knows that institutional knowledge is invaluable. The most effective on-call engineers aren't just technically skilled; they remember the quirks of specific services, recall solutions from past incidents, and know the team's preferred diagnostic approaches. Until now, AI assistants started every conversation from scratch, forcing teams to repeatedly explain context that experienced engineers would simply know.
The SRE Agent Memory changes this paradigm. It enables agents to:
- Remember team facts, preferences, and context across all conversations
- Retrieve relevant runbooks and documentation during troubleshooting
- Learn from past sessions to improve future responses
- Share knowledge across your entire team automatically
Context Engineering: The Key to Better AI Outcomes
At the heart of the memory is a concept we call context engineering: the practice of purposefully curating and optimizing the information you provide to the agent to get better results. Rather than hoping the AI figures things out, you systematically build a knowledge foundation that makes every interaction smarter.
The workflow is simple:
- Identify gaps: Use Session Insights to see where the agent struggled or lacked knowledge
- Add targeted context: Upload runbooks to the Knowledge Base or save facts with User Memories
- Track improvement: Review subsequent sessions to measure whether your additions improved outcomes
- Iterate: Continuously refine your context based on real session data
This feedback loop transforms ad-hoc troubleshooting into a systematically improving process, where each session makes future sessions more effective.
Memory Components at a Glance
The memory consists of three complementary components that work together to give your agents comprehensive knowledge:
User Memories: Quick Chat Commands for Team Knowledge
Save facts, preferences, and context using simple chat commands. User Memories are ideal for team standards, service configurations, and workflow patterns that should persist across all conversations.
Key benefits:
- Instant setup, no configuration required
- Managed directly in chat with #remember, #forget, and #retrieve commands
- Shared across all team members automatically
- Works across all conversations and agents
Example commands:
#remember Team owns app-service-prod in East US region
#remember For latency issues, check Redis cache first
#remember Production deployments happen Tuesdays at 2 PM PST
When you save a memory, it's instantly available across all your team's conversations. The agent automatically retrieves relevant memories during reasoning; no additional configuration is needed.
Saving team knowledge with the #remember command
Use #retrieve to search and display your saved memories:
Retrieving saved memories with the #retrieve command
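To make the command semantics concrete, here is a minimal Python sketch of a shared memory store driven by the three chat commands. The `MemoryStore` class and its substring matching are assumptions made for illustration; this post does not describe how the agent stores or ranks memories, and the real #retrieve presumably uses relevance-based search rather than keyword matching.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy in-memory stand-in for a shared team memory (hypothetical)."""
    facts: list[str] = field(default_factory=list)

    def handle(self, message: str) -> list[str]:
        """Dispatch a chat message that starts with a memory command."""
        command, _, text = message.strip().partition(" ")
        text = text.strip()
        if command == "#remember":
            if not text:
                raise ValueError("#remember needs a fact to save")
            self.facts.append(text)
            return [text]
        if command == "#forget":
            if not text:
                raise ValueError("#forget needs text to match")
            # Remove every fact that contains the given text (case-insensitive).
            removed = [f for f in self.facts if text.lower() in f.lower()]
            self.facts = [f for f in self.facts if f not in removed]
            return removed
        if command == "#retrieve":
            # Naive keyword match: keep facts containing any query word.
            words = text.lower().split()
            return [f for f in self.facts if any(w in f.lower() for w in words)]
        raise ValueError(f"not a memory command: {command!r}")


store = MemoryStore()
store.handle("#remember Team owns app-service-prod in East US region")
store.handle("#remember For latency issues, check Redis cache first")
store.handle("#remember Production deployments happen Tuesdays at 2 PM PST")
matches = store.handle("#retrieve latency")
```

In this toy model, `matches` holds only the Redis latency fact, which mirrors the idea that a retrieval returns the memories relevant to the question rather than everything saved.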
Knowledge Base: Direct Document Uploads for Runbooks and Guides
Upload markdown and text files directly to the agent's knowledge base. Documents are automatically indexed using semantic search and available for agent retrieval during troubleshooting.
The Knowledge Base uses intelligent indexing that combines keyword matching with semantic similarity. Documents are automatically split into optimal chunks, so agents retrieve the most relevant sections, not entire documents.
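The idea of chunking plus blended keyword and semantic scoring can be sketched in a few lines of Python. This is not the Knowledge Base implementation, which this post does not describe: the heading-based chunker, the `alpha` weighting, and especially the character-trigram similarity (a crude stand-in for a real embedding model) are all assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def chunk_markdown(text: str) -> list[str]:
    """Split a markdown document into one chunk per heading section."""
    chunks, current = [], []
    for line in text.splitlines():
        if re.match(r"#{1,6} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query words that appear verbatim in the chunk."""
    q = set(re.findall(r"\w+", query.lower()))
    c = set(re.findall(r"\w+", chunk.lower()))
    return len(q & c) / len(q) if q else 0.0


def trigram_vector(text: str) -> Counter:
    """Character-trigram counts: a placeholder for a real embedding."""
    s = re.sub(r"\s+", " ", text.lower())
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def similarity_score(query: str, chunk: str) -> float:
    """Cosine similarity between the two trigram vectors."""
    a, b = trigram_vector(query), trigram_vector(chunk)
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, chunks: list[str], alpha: float = 0.5) -> list[tuple[float, str]]:
    """Rank chunks by a weighted blend of both scores, best first."""
    scored = [
        (alpha * keyword_score(query, c) + (1 - alpha) * similarity_score(query, c), c)
        for c in chunks
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


runbook = """# Redis latency
Check cache hit ratio and connected clients.

# SQL timeouts
Inspect long-running queries and DTU usage.
"""
chunks = chunk_markdown(runbook)
best_score, best_chunk = search("redis cache latency", chunks)[0]
```

Here the query surfaces only the Redis section of the runbook, which illustrates why documents with clear headings tend to retrieve well: each section becomes a focused chunk.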
Key benefits:
- Supports .md and .txt files (up to 16MB per file)
- Automatic chunking and semantic indexing
- Simple file upload interface
- Instant availability after upload
Best for: Static runbooks, troubleshooting guides, internal documentation, and configuration templates.
Navigate to Settings > Knowledge Base to access document management. There you will find two actions: Add File, which uploads one or more .txt or .md files, and Delete, which removes files individually or in bulk.
Session Insights: Automated Analysis of Your Troubleshooting Sessions
Get automated feedback on your troubleshooting sessions with timelines, performance analysis, and key learnings. Session Insights help you understand what happened, learn from mistakes, and continuously improve.
Key benefits:
- Automatic analysis after conversations complete
- Chronological timeline of actions taken
- Performance scoring with specific improvement suggestions
- Key learnings for future sessions
Navigate to Settings > Session Insights to view your troubleshooting analysis:
Session Insights dashboard showing analysis of past troubleshooting sessions
You can also manually trigger insight generation for any conversation by clicking the Generate Session Insights icon in the chat footer:
Manually triggering Session Insights generation
Each insight includes:
- Timeline: A chronological narrative showing what actions were taken and their outcomes
- What Went Well: Highlights correct understanding and effective actions
- Areas for Improvement: Shows what could be done better with specific remediation steps
- Key Learnings: Actionable takeaways for future sessions
- Investigation Quality Score: Sessions rated on a 1-5 scale for completeness
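If you record these fields yourself, the "track improvement" step of the context engineering workflow becomes a simple comparison of scores before and after a change. The sketch below assumes you copy insights out of the portal by hand; the `SessionInsight` record and `score_change` helper are hypothetical names, and this post does not describe an export format or API for insights.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SessionInsight:
    """Hypothetical record mirroring the insight fields listed above."""
    timeline: list[str]
    went_well: list[str]
    improvements: list[str]
    key_learnings: list[str]
    quality_score: int  # Investigation Quality Score on the 1-5 scale

    def __post_init__(self) -> None:
        if not 1 <= self.quality_score <= 5:
            raise ValueError("quality_score must be between 1 and 5")


def score_change(before: list[SessionInsight], after: list[SessionInsight]) -> float:
    """Mean score after a context change minus the mean score before it."""
    return mean(i.quality_score for i in after) - mean(i.quality_score for i in before)


def insight(score: int) -> SessionInsight:
    """Build a placeholder insight that carries only a score."""
    return SessionInsight([], [], [], [], score)


# Two sessions before uploading a runbook, two sessions after.
delta = score_change([insight(2), insight(3)], [insight(4), insight(4)])
```

A positive `delta` suggests the added context helped, although with only a handful of sessions the difference is indicative rather than conclusive.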
How Azure SRE Agent Uses Memory: The SearchMemory Tool
During conversations, incident handling, and scheduled tasks, Azure SRE Agents search across memory sources to retrieve relevant context using the SearchMemory tool.
Enabling Memory Retrieval in Custom Sub-Agents
When building custom sub-agents with the Sub-Agent Builder, you can enable memory retrieval by adding the SearchMemory tool to your sub-agent's toolset. This allows your custom automation to leverage all the knowledge stored in User Memories and the Knowledge Base.
How it works:
- In the Sub-Agent Builder, add the SearchMemory tool to your sub-agent's available tools
- The tool automatically searches across all memory sources using intelligent retrieval
- Your sub-agent receives relevant context to inform its responses and actions
This means your custom sub-agents, whether handling specific incident types, automating runbook execution, or performing scheduled health checks, can all benefit from your team's accumulated knowledge.
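Conceptually, a tool like SearchMemory fans a query out to each memory source and merges the ranked hits into one list for the sub-agent. The Python sketch below illustrates only that pattern: the `search_memory` function, the `Source` signature, and both stub sources are hypothetical, since this post does not document the tool's actual interface or ranking.

```python
from typing import Callable

# A source takes a query and returns (relevance, text) pairs. Hypothetical shape.
Source = Callable[[str], list[tuple[float, str]]]


def search_memory(query: str, sources: dict[str, Source], top_k: int = 3) -> list[tuple[float, str, str]]:
    """Query every source and return the top_k hits as (score, source, text)."""
    hits = []
    for name, source in sources.items():
        for score, text in source(query):
            hits.append((score, name, text))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:top_k]


def user_memories(query: str) -> list[tuple[float, str]]:
    """Stub source: exact substring match over saved facts."""
    facts = [
        "For latency issues, check Redis cache first",
        "Team uses East US for production workloads",
    ]
    return [(1.0, f) for f in facts if query.lower() in f.lower()]


def knowledge_base(query: str) -> list[tuple[float, str]]:
    """Stub source: pretends one runbook section matches latency queries."""
    if "latency" in query.lower():
        return [(0.6, "Runbook section: diagnosing Redis latency")]
    return []


results = search_memory(
    "latency",
    {"user_memories": user_memories, "knowledge_base": knowledge_base},
)
```

Tagging each hit with its source name lets a sub-agent cite where a piece of context came from, which is useful when a saved fact and a runbook disagree.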
Choosing the Right Memory Type
| Feature | User Memories | Knowledge Base |
|---|---|---|
| Setup | Instant (chat commands) | Quick (file upload) |
| Management | Chat commands | Portal UI |
| Content Size | Short facts | Documents (up to 16MB) |
| Best Use Case | Team preferences | Static runbooks |
| Team Sharing | Shared | Shared |
Quick guidance:
- User Memories: Short, focused facts (1-2 sentences) for immediate team context
- Knowledge Base: Well-structured documents with clear headers for procedural knowledge
Getting Started in Minutes
1. Start with User Memories
Open any chat with your Azure SRE Agent and save immediate team knowledge:
#remember Team owns services: app-service-prod, redis-cache-prod, and sql-db-prod
#remember For latency issues, check Redis cache health first
#remember Team uses East US for production workloads
That's it: these facts are now available across all conversations.
2. Upload Key Documents
Add critical runbooks and guides to the Knowledge Base:
- Navigate to Settings > Knowledge Base
- Upload .md or .txt files
- Files are automatically indexed and available immediately
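Before uploading a folder of runbooks, it can help to check locally which files the Knowledge Base would reject. The sketch below applies the two limits stated in this post (.md or .txt only, up to 16MB per file). The helper names are hypothetical, and reading "16MB" as 16 MiB is an assumption, since the post does not say which unit is meant.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".md", ".txt"}
MAX_BYTES = 16 * 1024 * 1024  # assumption: "16MB" is read as 16 MiB


def upload_problems(path: Path) -> list[str]:
    """Return the reasons a file would fail the stated upload limits."""
    problems = []
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported extension {path.suffix!r}")
    if not path.is_file():
        problems.append("not a regular file")
    elif path.stat().st_size > MAX_BYTES:
        problems.append(f"larger than {MAX_BYTES} bytes")
    return problems


def check_folder(folder: Path) -> dict[str, list[str]]:
    """Map each file that would be rejected to the reasons why."""
    report = {}
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        problems = upload_problems(path)
        if problems:
            report[str(path)] = problems
    return report
```

Calling `check_folder(Path("runbooks"))` on a local directory returns an empty dictionary when every file is ready to upload.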
3. Review Session Insights
After troubleshooting sessions, check Settings > Session Insights to see what went well and where the agent needs more context. Use this feedback to identify gaps and add targeted memories or documentation.
Best Practices for Building Agent Memory
Content Organization
- Keep memories focused and specific
- Use consistent terminology across your team
- Avoid duplication: choose one source of truth for each piece of information
Security
Never store:
- Credentials, API keys, or secrets
- Personally identifiable information (PII)
- Customer data or logs
- Confidential business information
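One lightweight safeguard is to screen text for obvious red flags before saving it as a memory or uploading it. The patterns below are illustrative assumptions only and will miss many real secrets; they are not a feature of Azure SRE Agent, and a dedicated secret scanner is the better choice for anything beyond a quick check.

```python
import re

# Illustrative patterns only; a real secret scanner covers far more cases.
SUSPICIOUS_PATTERNS = {
    "key=value secret": re.compile(
        r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "long opaque string": re.compile(r"\b[A-Za-z0-9+/_-]{32,}\b"),
}


def screen_memory(text: str) -> list[str]:
    """Return the names of any suspicious patterns the text matches."""
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items() if pattern.search(text)]


safe = screen_memory("For latency issues, check Redis cache first")
flagged = screen_memory("Redis password: hunter2")
```

An empty result, as for `safe`, means none of these particular patterns matched; it does not prove the text is free of sensitive data.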
Maintenance
- Regularly review and update memories
- Remove outdated information using #forget
- Consolidate duplicate entries
- Use #retrieve to audit what's been saved
The Impact: Smarter Troubleshooting, Lower MTTR
The Azure SRE Agent memory is designed to improve operations in several ways:
- Faster troubleshooting: Agents immediately understand your environment and preferences
- Reduced toil: No more repeatedly explaining the same context
- Institutional knowledge capture: Critical team knowledge persists even as team members change
- Continuous improvement: Each session makes future sessions more effective
By systematically building your agent's knowledge foundation, you create an operations assistant that truly understands your environment, reducing mean time to resolution (MTTR) and freeing your team to focus on high-value work.
What's Next?
We're continually enhancing the memory based on customer feedback. Your input is critical: use the thumbs up/down feedback in the agent, or share your thoughts in our GitHub repo.
What operational knowledge would you like your AI agent to remember? Let us know!
This blog post is part of our ongoing series on Azure SRE Agent capabilities. See our previous post on automation, integration, and extensibility features.