devops
382 TopicsUnlocking Application Modernisation with GitHub Copilot
AI-driven modernisation is unlocking new opportunities you may not have even considered yet. It's also allowing organisations to re-evaluate previously discarded modernisation attempts that were considered too hard, complex or simply didn't have the skills or time to do. During Microsoft Build 2025, we were introduced to the concept of Agentic AI modernisation and this post from Ikenna Okeke does a great job of summarising the topic - Reimagining App Modernisation for the Era of AI | Microsoft Community Hub. This blog post however, explores the modernisation opportunities that you may not even have thought of yet, the business benefits, how to start preparing your organisation, empowering your teams, and identifying where GitHub Copilot can help. I’ve spent the last 8 months working with customers exploring usage of GitHub Copilot, and want to share what my team members and I have discovered in terms of new opportunities to modernise, transform your applications, bringing some fun back into those migrations! Let’s delve into how GitHub Copilot is helping teams update old systems, move processes to the cloud, and achieve results faster than ever before. Background: The Modernisation Challenge (Then vs Now) Modernising legacy software has always been hard. In the past, teams faced steep challenges: brittle codebases full of technical debt, outdated languages (think decades-old COBOL or VB6), sparse documentation, and original developers long gone. Integrating old systems with modern cloud services often requiring specialised skills that were in short supply – for example, check out this fantastic post from Arvi LiVigni (@arilivigni ) which talks about migrating from COBOL “the number of developers who can read and write COBOL isn’t what it used to be,” making those systems much harder to update". Common pain points included compatibility issues, data migrations, high costs, security vulnerabilities, and the constant risk that any change could break critical business functions. It’s no wonder many modernisation projects stalled or were “put off” due to their complexity and risk. So, what’s different now (circa 2025) compared to two years ago? In a word: Intelligent AI assistance. Tools like GitHub Copilot have emerged as AI pair programmers that dramatically lower the barriers to modernisation. Arvi’s post talks about how only a couple of years ago, developers had to comb through documentation and Stack Overflow for clues when deciphering old code or upgrading frameworks. Today, GitHub Copilot can act like an expert co-developer inside your IDE, ready to explain mysterious code, suggest updates, and even rewrite legacy code in modern languages. This means less time fighting old code and more time implementing improvements. As Arvi says “nine times out of 10 it gives me the right answer… That speed – and not having to break out of my flow – is really what’s so impactful.” In short, AI coding assistants have evolved from novel experiments to indispensable tools, reimagining how we approach software updates and cloud adoption. I’d also add from my own experience – the models we were using 12 months ago have already been superseded by far superior models with ability to ingest larger context and tackle even further complexity. It's easier to experiment, and fail, bringing more robust outcomes – with such speed to create those proof of concepts, experimentation and failing faster, this has also unlocked the ability to test out multiple hypothesis’ and get you to the most confident outcome in a much shorter space of time. Modernisation is easier now because AI reduces the heavy lifting. Instead of reading the 10,000-line legacy program alone, a developer can ask Copilot to explain what the code does or even propose a refactored version. Rather than manually researching how to replace an outdated library, they can get instant recommendations for modern equivalents. These advancements mean that tasks which once took weeks or months can now be done in days or hours – with more confidence and less drudgery - more fun! The following sections will dive into specific opportunities unlocked by GitHub Copilot across the modernisation journey which you may not even have thought of. Modernisation Opportunities Unlocked by Copilot Modernising an application isn’t just about updating code – it involves bringing everyone and everything up to speed with cloud-era practices. Below are several scenarios and how GitHub Copilot adds value, with the specific benefits highlighted: 1. AI-Assisted Legacy Code Refactoring and Upgrades Instant Code Comprehension: GitHub Copilot can explain complex legacy code in plain English, helping developers quickly understand decades-old logic without scouring scarce documentation. For example, you can highlight a cryptic COBOL or C++ function and ask Copilot to describe what it does – an invaluable first step before making any changes. This saves hours and reduces errors when starting a modernisation effort. Automated Refactoring Suggestions: The AI suggests modern replacements for outdated patterns and APIs, and can even translate code between languages. For instance, Copilot can help convert a COBOL program into JavaScript or C# by recognising equivalent constructs. It also uses transformation tools (like OpenRewrite for Java/.NET) to systematically apply code updates – e.g. replacing all legacy HTTP calls with a modern library in one sweep. Developers remain in control, but GitHub Copilot handles the tedious bulk edits. Bulk Code Upgrades with AI: GitHub Copilot’s App Modernisation capabilities can analyse an entire codebase and generate a detailed upgrade plan, then execute many of the code changes automatically. It can upgrade framework versions (say from .NET Framework 4.x to .NET 6, or Java 8 to Java 17) by applying known fix patterns and even fixing compilation errors after the upgrade. Teams can finally tackle those hundreds of thousand-line enterprise applications – a task that could take multiple years with GitHub Copilot handling the repetitive changes. Technical Debt Reduction: By cleaning up old code and enforcing modern best practices, GitHub Copilot helps chip away at years of technical debt. The modernised codebase is more maintainable and stable, which lowers the long-term risk hanging over critical business systems. Notably, the tool can even scan for known security vulnerabilities during refactoring as it updates your code. In short, each legacy component refreshed with GitHub Copilot comes out safer and easier to work on, instead of remaining a brittle black box. 2. Accelerating Cloud Migration and Azure Modernisation Guided Azure Migration Planning: GitHub Copilot can assess a legacy application’s cloud readiness and recommend target Azure services for each component. For instance, it might suggest migrating an on-premises database to Azure SQL, moving file storage to Azure Blob Storage, and converting background jobs to Azure Functions. This provides a clear blueprint to confidently move an app from servers to Azure PaaS. One-Click Cloud Transformations: GitHub Copilot comes with predefined migration tasksthat automate the code changes required for cloud adoption. With one click, you can have the AI apply dozens of modifications across your codebase. For example: File storage: Replace local file read/writes with Azure Blob Storage SDK calls. Email/Comms: Swap out SMTP email code for Azure Communication Services or SendGrid. Identity: Migrate authentication from Windows AD to Azure AD (Entra ID) libraries. Configuration: Remove hard-coded configurations and use Azure App Configuration or Key Vault for secrets. GitHub Copilot performs these transformations consistently, following best practices (like using connection strings from Azure settings). After applying the changes, it even fixes any compile errors automatically, so you’re not left with broken builds. What used to require reading countless Azure migration guides is now handled in minutes. Automated Validation & Deployment: Modernisation doesn’t stop at code changes. GitHub Copilot can also generate unit tests to validate that the application still behaves correctly after the migration. It helps ensure that your modernised, cloud-ready app passes all its checks before going live. When you’re ready to deploy, GitHub Copilot can produce the necessary Infrastructure-as-Code templates (e.g. Azure Resource Manager Bicep files or Terraform configs) and even set up CI/CD pipeline scripts for you. In other words, the AI can configure the Azure environment and deployment process end-to-end. This dramatically reduces manual effort and error, getting your app to the cloud faster and with greater confidence. Integrations: GitHub Copilot also helps tackle larger migration scenarios that were previously considered too complex. For example, many enterprises want to retire expensive proprietary integration platforms like MuleSoft or Apigee and use Azure-native services instead, but rewriting hundreds of integration workflows was daunting. Now, GitHub Copilot can assist in translating those workflows: for instance, converting an Apigee API proxy into an Azure API Management policy, or a MuleSoft integration into an Azure Logic App. Multi-Cloud Migrations: if you plan to consolidate from other clouds into Azure, GitHub Copilot can suggest equivalent Azure services and SDK calls to replace AWS or GCP-specific code. These AI-assisted conversions significantly cut down the time needed to reimplement functionality on Azure. The business impact can be substantial. By lowering the effort of such migrations, GitHub Copilot makes it feasible to pursue opportunities that deliver big cost savings and simplification. 3. Boosting Developer Productivity and Quality Instant Unit Tests (TDD Made Easy): Writing tests for old code can be tedious, but GitHub Copilot can generate unit test cases on the fly. Developers can highlight an existing function and ask Copilot to create tests; it will produce meaningful test methods covering typical and edge scenarios. This makes it practical to apply test-driven development practices even to legacy systems – you can quickly build a safety net of tests before refactoring. By catching bugs early through these AI-generated tests, teams gain confidence to modernise code without breaking things. It essentially injects quality into the process from the start, which is crucial for successful modernisation. DevOps Automation: GitHub Copilot helps modernise your build and deployment process as well. It can draft CI/CD pipeline configurations, Dockerfiles, Kubernetes manifests, and other DevOps scripts by leveraging its knowledge of common patterns. For example, when setting up a GitHub Actions workflow to deploy your app, GitHub Copilot will autocomplete significant parts (like build steps, test runs, deployment jobs) based on the project structure. This not only saves time but also ensures best practices (proper caching, dependency installation, etc.) are followed by default. Microsoft even provides an extension where you can describe your Azure infrastructure needs in plain language and have GitHub Copilot generate the corresponding templates and pipeline YAML. By automating these pieces, teams can move to cloud-based, automated deployments much faster. Behaviour-Driven Development Support: Teams practicing BDD write human-readable scenarios (e.g. using Gherkin syntax) describing application behaviour. GitHub Copilot’s AI is adept at interpreting such descriptions and suggesting step definition code or test implementations to match. For instance, given a scenario “When a user with no items checks out, then an error message is shown,” GitHub Copilot can draft the code for that condition or the test steps required. This helps bridge the gap between non-technical specifications and actual code. It makes BDD more efficient and accessible, because even if team members aren’t strong coders, the AI can translate their intent into working code that developers can refine. Quality and Consistency: By using AI to handle boilerplate and repetitive tasks, developers can focus more on high-value improvements. GitHub Copilot’s suggestions are based on a vast corpus of code, which often means it surfaces well-structured, idiomatic patterns. Starting from these suggestions, developers are less likely to introduce errors or reinvent the wheel, which leads to more consistent code quality across the project. The AI also often reminds you of edge cases (for example, suggesting input validation or error handling code that might be missed), contributing to a more robust application. In practice, many teams find that adopting GitHub Copilot results in fewer bugs and quicker code reviews, as the code is cleaner on the first pass. It’s like having an extra set of eyes on every pull request, ensuring standards are met. Business Benefits of AI-Powered Modernisation Bringing together the technical advantages above, what’s the payoff for the business and stakeholders? Modernising with GitHub Copilot can yield multiple tangible and intangible benefits: Accelerated Time-to-Market: Modernisation projects that might have taken a year can potentially be completed in a few months, or an upgrade that took weeks can be done in days. This speed means you can deliver new features to customers sooner and respond faster to market changes. It also reduces downtime or disruption since migrations happen more swiftly. Cost Savings: By automating repetitive work and reducing the effort required from highly paid senior engineers, GitHub Copilot can trim development costs. Faster project completion also means lower overall project cost. Additionally, running modernised apps on cloud infrastructure (with updated code) often lowers operational costs due to more efficient resource usage and easier maintenance. There’s also an opportunity cost benefit: developers freed up by Copilot can work on other value-adding projects in parallel. Improved Quality & Reliability: GitHub Copilot’s contributions to testing, bug-fixing, and even security (like patching known vulnerabilities during upgrades) result in more robust applications. Modernised systems have fewer outages and security incidents than shaky legacy ones. Stakeholders will appreciate that with GitHub Copilot, modernisation doesn’t mean “trading one set of bugs for another” – instead, you can increase quality as you modernise (GitHub’s research noted higher code quality when using Copilot, as developers are less likely to introduce errors or skip tests). Business Agility: A modernised application (especially one refactored for cloud) is typically more scalable and adaptable. New integrations or features can be added much faster once the platform is up-to-date. GitHub Copilot helps clear the modernisation hurdle, after which the business can innovate on a solid, flexible foundation (for example, once a monolith is broken into microservices or moved to Azure PaaS, you can iterate on it much faster in the future). AI-assisted modernisation thus unlocks future opportunities (like easier expansion, integrations, AI features, etc.) that were impractical on the legacy stack. Employee Satisfaction and Innovation: Developer happiness is a subtle but important benefit. When tedious work is handled by AI, developers can spend more time on creative tasks – designing new features, improving user experience, exploring new technologies. This can foster a culture of innovation. Moreover, being seen as a company that leverages modern tools (like AI Copilot) helps attract and retain top tech talent. Teams that successfully modernise critical systems with Copilot will gain confidence to tackle other ambitious projects, creating a positive feedback loop of improvement. To sum up, GitHub Copilot acts as a force-multiplier for application modernisation. It enables organisations to do more with less: convert legacy “boat anchors” into modern, cloud-enabled assets rapidly, while improving quality and developer morale. This aligns IT goals with business goals – faster delivery, greater efficiency, and readiness for the future. Call to Action: Embrace the Future of Modernisation GitHub Copilot has proven to be a catalyst for transforming how we approach legacy systems and cloud adoption. If you’re excited about the possibilities, here are next steps and what to watch for: Start Experimenting: If you haven’t already, try GitHub Copilot on a sample of your code. Use Copilot or Copilot Chat to explain a piece of old code or generate a unit test. Seeing it in action on your own project can build confidence and spark ideas for where to apply it. Identify a Pilot Project: Look at your application portfolio for a candidate that’s ripe for modernisation – maybe a small legacy service that could be moved to Azure, or a module that needs a refactor. Use GitHub Copilot to assess and estimate the effort. Often, you’ll find tasks once deemed “too hard” might now be feasible. Early successes will help win support for larger initiatives. Stay Tuned for Our Upcoming Blog Series: This post is just the beginning. In forthcoming posts, we’ll dive deeper into: Setting Up Your Organisation for Copilot Adoption: Practical tips on preparing your enterprise environment – from licensing and security considerations to training programs. We’ll discuss best practices (like running internal awareness campaigns, defining success metrics, and creating Copilot champions in your teams) to ensure a smooth rollout. Empowering Your Colleagues: How to foster a culture that embraces AI assistance. This includes enabling continuous learning, sharing prompt techniques and knowledge bases, and addressing any scepticism. We’ll cover strategies to support developers in using Copilot effectively, so that everyone from new hires to veteran engineers can amplify their productivity. Identifying High-Impact Modernisation Areas: Guidance on spotting where GitHub Copilot can add the most value. We’ll look at different domains – code, cloud, tests, data – and how to evaluate opportunities (for example, using telemetry or feedback to find repetitive tasks suited for AI, or legacy components with high ROI if modernised). Engage and Share: As you start leveraging Copilot for modernisation, share your experiences and results. Success stories (even small wins like “GitHub Copilot helped reduce our code review times” or “we migrated a component to Azure in 1 sprint”) can build momentum within your organisation and the broader community. We invite you to discuss and ask questions in the comments or in our tech community forums. Take a look at the new App Modernisation Guidance—a comprehensive, step-by-step playbook designed to help organisations: Understand what to modernise and why Migrate and rebuild apps with AI-first design Continuously optimise with built-in governance and observability Modernisation is a journey, and AI is the new compass and Copilot to guide the way. By embracing tools like GitHub Copilot, you position your organisation to break through modernisation barriers that once seemed insurmountable. The result is not just updated software, but a more agile, cloud-ready business and a happier, more productive development team. Now is the time to take that step. Empower your team with Copilot, and unlock the full potential of your applications and your developers. Stay tuned for more insights in our next posts, and let’s modernise what’s possible together!1.3KViews4likes1CommentSearch Less, Build More: Inner Sourcing with GitHub Copilot and ADO MCP Server
Developers burn cycles context‑switching: opening five repos to find a logging example, searching a wiki for a data masking rule, scrolling chat history for the latest pipeline pattern. Organisations that I speak to are often on the path of transformational platform engineering projects but always have the fear or doubt of "what if my engineers don't use these resources". While projects like Backstage still play a pivotal role in inner sourcing and discoverability I also empathise with developers who would argue "How would I even know in the first place, which modules have or haven't been created for reuse". In this blog we explore how we can ensure organisational standards and developer satisfaction without any heavy lifting on either side, no custom model training, no rewriting or relocating of repositories and no stagnant local data. Using GitHub Copilot + Azure DevOps MCP server (with the free `code_search` extension) we turn the IDE into an organizational knowledge interface. Instead of guessing or re‑implementing, engineers can start scaffolding projects or solving issues as they would normally (hopefully using Copilot) and without extra prompting. GitHub Copilot can lean into organisational standards and ensure recommendations are made with code snippets directly generated from existing examples. What Is the Azure DevOps MCP Server + code_search Extension? MCP (Model Context Protocol) is an open standard that lets agents (like GitHub Copilot) pull in structured, on-demand context from external systems. MCP servers contain natural language explanations of the tools that the agent can utilise allowing dynamic decision making of when to implement certain toolsets over others. The Azure DevOps MCP Server is the ADO Product Team's implementation of that standard. It exposes your ADO environment in a way Copilot can consume. Out of the box it gives you access to: Projects – list and navigate across projects in your organization. Repositories – browse repos, branches, and files. Work items – surface user stories, bugs, or acceptance criteria. Wiki's – pull policies, standards, and documentation. This means Copilot can ground its answers in live ADO content, instead of hallucinating or relying only on what’s in the current editor window. The ADO server runs locally from your own machine to ensure that all sensitive project information remains within your secure network boundary. This also means that existing permissions on ADO objects such as Projects or Repositories are respected. Wiki search tooling available out of the box with ADO MCP server is very useful however if I am honest I have seen these wiki's go unused with documentation being stored elsewhere either inside the repository or in a project management tool. This means any tool that needs to implement code requires the ability to accurately search the code stored in the repositories themself. That is where the code_search extension enablement in ADO is so important. Most organisations have this enabled already however it is worth noting that this pre-requisite is the real unlock of cross-repo search. This allows for Copilot to: Query for symbols, snippets, or keywords across all repos. Retrieve usage examples from code, not just docs. Locate standards (like logging wrappers or retry policies) wherever they live. Back every recommendation with specific source lines. In short: MCP connects Copilot to Azure DevOps. code_search makes that connection powerful by turning it into a discovery engine. What is the relevance of Copilot Instructions? One of the less obvious but most powerful features of GitHub Copilot is its ability to follow instructions files. Copilot automatically looks for these files and uses them as a “playbook” for how it should behave. There are different types of instructions you can provide: Organisational instructions – apply across your entire workspace, regardless of which repo you’re in. Repo-specific instructions – scoped to a particular repository, useful when one project has unique standards or patterns. Personal instructions – smaller overrides layered on top of global rules when a local exception applies. (Stored in .github/copilot-instructions.md) In this solution, I’m using a single personal instructions file. It tells Copilot: When to search (e.g., always query repos and wikis before answering a standards question). Where to look (Azure DevOps repos, wikis, and with code_search, the code itself). How to answer (responses must cite the repo/file/line or wiki page; if no source is found, say so). How to resolve conflicts (prefer dated wiki entries over older README fragments). As a small example, a section of a Copilot instruction file could look like this: # GitHub Copilot Instructions for Azure DevOps MCP Integration This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request. ## Core Principles ### 1. Azure DevOps Integration - **Always prioritize Azure DevOps MCP tools** when users ask about: - Work items, stories, bugs, tasks - Pull requests and code reviews - Build pipelines and deployments - Repository operations and branch management - Wiki pages and documentation - Test plans and test cases - Project and team information ### 2. Organizational Context Awareness - Before suggesting solutions, **check existing organizational patterns** by: - Searching code across repositories for similar implementations - Referencing established coding standards and frameworks - Looking for existing shared libraries and utilities - Checking architectural decision records (ADRs) in wikis ### 3. Cross-Repository Intelligence - When providing code suggestions: - **Search for existing patterns** in other repositories first - **Reference shared libraries** and common utilities - **Maintain consistency** with organizational standards - **Suggest reusable components** when appropriate ## Tool Usage Guidelines ### Work Items and Project Management When users mention bugs, features, tasks, or project planning: ``` ✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item ✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration ✅ Use: work_list_team_iterations, core_list_projects The result... To test this I created 3 ADO Projects each with between 1-2 repositories. The repositories were light with only ReadMe's inside containing descriptions of the "repo" and some code snippets examples for usage. I have then created a brand-new workspace with no context apart from a Copilot instructions document (which could be part of a repo scaffold or organisation wide) which tells Copilot to search code and the wikis across all ADO projects in my demo environment. It returns guidance and standards from all available repo's and starts to use it to formulate its response. In the screenshot I have highlighted some key parts with red boxes. The first being a section of the readme that Copilot has identified in its response, that part also highlighted within CoPilot chat response. I have highlighted the rather generic prompt I used to get this response at the bottom of that window too. Above I have highlighted Copilot using the MCP server tooling searching through projects, repo's and code. Finally the largest box highlights the instructions given to Copilot on how to search and how easily these could be optimised or changed depending on the requirements and organisational coding standards. How did I implement this? Implementation is actually incredibly simple. As mentioned I created multiple projects and repositories within my ADO Organisation in order to test cross-project & cross-repo discovery. I then did the following: Enable code_search - in your Azure DevOps organization (Marketplace → install extension). Login to Azure - Use the AZ CLI to authenticate to Azure with "az login". Create vscode/mcp.json file - Snippet is provided below, the organisation name should be changed to your organisations name. Start and enable your MCP server - In the mcp.json file you should see a "Start" button. Using the snippet below you will be prompted to add your organisation name. Ensure your Copilot agent has access to the server under "tools" too. View this setup guide for full setup instructions (azure-devops-mcp/docs/GETTINGSTARTED.md at main · microsoft/azure-devops-mcp) Create a Copilot Instructions file - with a search-first directive. I have inserted the full version used in this demo at the bottom of the article. Experiment with Prompts – Start generic (“How do we secure APIs?”). Review the output and tools used and then tailor your instructions. Considerations While this is a great approach I do still have some considerations when going to production. Latency - Using MCP tooling on every request will add some latency to developer requests. We can look at optimizing usage through copilot instructions to better identify when Copilot should or shouldn't use the ADO MCP server. Complex Projects and Repositories - While I have demonstrated cross project and cross repository retrieval my demo environment does not truly simulate an enterprise ADO environment. Performance should be tested and closely monitored as organisational complexity increases. Public Preview - The ADO MCP server is moving quickly but is currently still public preview. We have demonstrated in this article how quickly we can make our Azure DevOps content discoverable. While their are considerations moving forward this showcases a direction towards agentic inner sourcing. Feel free to comment below how you think this approach could be extended or augmented for other use cases! Resources MCP Server Config (/.vscode/mcp.json) { "inputs": [ { "id": "ado_org", "type": "promptString", "description": "Azure DevOps organization name (e.g. 'contoso')" } ], "servers": { "ado": { "type": "stdio", "command": "npx", "args": ["-y", "@azure-devops/mcp", "${input:ado_org}"] } } } Copilot Instructions (/.github/copilot-instructions.md) # GitHub Copilot Instructions for Azure DevOps MCP Integration This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request. ## Core Principles ### 1. Azure DevOps Integration - **Always prioritize Azure DevOps MCP tools** when users ask about: - Work items, stories, bugs, tasks - Pull requests and code reviews - Build pipelines and deployments - Repository operations and branch management - Wiki pages and documentation - Test plans and test cases - Project and team information ### 2. Organizational Context Awareness - Before suggesting solutions, **check existing organizational patterns** by: - Searching code across repositories for similar implementations - Referencing established coding standards and frameworks - Looking for existing shared libraries and utilities - Checking architectural decision records (ADRs) in wikis ### 3. Cross-Repository Intelligence - When providing code suggestions: - **Search for existing patterns** in other repositories first - **Reference shared libraries** and common utilities - **Maintain consistency** with organizational standards - **Suggest reusable components** when appropriate ## Tool Usage Guidelines ### Work Items and Project Management When users mention bugs, features, tasks, or project planning: ``` ✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item ✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration ✅ Use: work_list_team_iterations, core_list_projects ``` ### Code and Repository Operations When users ask about code, branches, or pull requests: ``` ✅ Use: repo_list_repos_by_project, repo_list_pull_requests_by_repo ✅ Use: repo_list_branches_by_repo, repo_search_commits ✅ Use: search_code for finding patterns across repositories ``` ### Documentation and Knowledge Sharing When users need documentation or want to create/update docs: ``` ✅ Use: wiki_list_wikis, wiki_get_page_content, wiki_create_or_update_page ✅ Use: search_wiki for finding existing documentation ``` ### Build and Deployment When users ask about builds, deployments, or CI/CD: ``` ✅ Use: pipelines_get_builds, pipelines_get_build_definitions ✅ Use: pipelines_run_pipeline, pipelines_get_build_status ``` ## Response Patterns ### 1. Discovery First Before providing solutions, always discover organizational context: ``` "Let me first check what patterns exist in your organization..." → Search code, check repositories, review existing work items ``` ### 2. Reference Organizational Standards When suggesting code or approaches: ``` "Based on patterns I found in your [RepositoryName] repository..." "Following your organization's standard approach seen in..." "This aligns with the pattern established in [TeamName]'s implementation..." ``` ### 3. Actionable Integration Always offer to create or update Azure DevOps artifacts: ``` "I can create a work item for this enhancement..." "Should I update the wiki page with this new pattern?" "Let me link this to the current iteration..." ``` ## Specific Scenarios ### New Feature Development 1. **Search existing repositories** for similar features 2. **Check architectural patterns** and shared libraries 3. **Review related work items** and planning documents 4. **Suggest implementation** based on organizational standards 5. **Offer to create work items** and documentation ### Bug Investigation 1. **Search for similar issues** across repositories and work items 2. **Check related builds** and recent changes 3. **Review test results** and failure patterns 4. **Provide solution** based on organizational practices 5. **Offer to create/update** bug work items and documentation ### Code Review and Standards 1. **Compare against organizational patterns** found in other repositories 2. **Reference coding standards** from wiki documentation 3. **Suggest improvements** based on established practices 4. **Check for reusable components** that could be leveraged ### Documentation Requests 1. **Search existing wikis** for related content 2. **Check for ADRs** and technical documentation 3. **Reference patterns** from similar projects 4. **Offer to create/update** wiki pages with findings ## Error Handling If Azure DevOps MCP tools are not available or fail: 1. **Inform the user** about the limitation 2. **Provide alternative approaches** using available information 3. **Suggest manual steps** for Azure DevOps integration 4. **Offer to help** with configuration if needed ## Best Practices ### Always Do: - ✅ Search organizational context before suggesting solutions - ✅ Reference existing patterns and standards - ✅ Offer to create/update Azure DevOps artifacts - ✅ Maintain consistency with organizational practices - ✅ Provide actionable next steps ### Never Do: - ❌ Suggest solutions without checking organizational context - ❌ Ignore existing patterns and implementations - ❌ Provide generic advice when specific organizational context is available - ❌ Forget to offer Azure DevOps integration opportunities --- **Remember: The goal is to provide intelligent, context-aware assistance that leverages the full organizational knowledge base available through Azure DevOps while maintaining development efficiency and consistency.**1.2KViews1like3CommentsSecurity Where It Matters: Runtime Context and AI Fixes Now Integrated in Your Dev Workflow
Security teams and developers face the same frustrating cycle: thousands of alerts, limited time, and no clear way to know which issues matter most. Applications suffer attacks as quickly as once every three minutes, 1 emphasizing the importance of proactive security that prioritizes critical, exploitable vulnerabilities. Microsoft is leading this shift with new integrations in the end-to-end solution that combines GitHub Advanced Security’s developer-first application security tool with Microsoft Defender for Cloud's runtime protection, enhanced by agentic remediation. Now available in public preview. This integration empowers organizations to secure code to cloud and accelerates tackling of security issues in their software portfolio using agentic remediation and runtime context-based vulnerability prioritization. The result: fewer distractions, faster fixes, better collaboration and more proactive security from code to cloud. The DevSecOps Dilemma— too many alerts, not enough action Over the past decade, the application security industry has made significant strides in improving detection accuracy and fostering collaboration between security teams and developers. These advances have enabled both groups to work together on real issues and drive meaningful progress. However, despite these improvements, remediation trends across the industry have remained stagnant. Quarter after quarter, year after year, vulnerability counts continue to rise with critical / high vulnerabilities constituting 17.4% of vulnerability backlogs and a mean-time-to-remediation (MTTR) of 116 days 2 Today, three big challenges slow teams down: Security teams are drowning in alert fatigue, struggling to distinguish real, exploitable risks from noise. At the same time, AI is rapidly introducing new threat vectors that defenders have little time to research or understand—leaving organizations vulnerable to missed threats and evolving attack techniques. Developers lack clear prioritization while remediation takes long, so they lose time fixing issues that may never be exploited. Remediation cycles are slow, leaving systems exposed to potential attacks while teams debate which issues matter most or search for the right person to fix them Both teams rely on separate, non-integrated tools, making collaboration slow and frustrating. Development and security teams frequently operate in silos, reducing efficiency and creating blind spots. This leads to wasted time, unresolved threats, and growing backlogs. Teams are stuck reacting to noise instead of solving real problems. DevSecOps reimagined in the era of AI Your app is live and serving thousands of customers. Defender for Cloud detects a vulnerability in an internet-facing API that handles sensitive data. In the past, this alert would age in a dashboard while developers worked on unrelated fixes because they didn’t know this was the critical one. Now, with the new integration, a security campaign can be created in GitHub filtering for runtime risk (internet exposed, sensitive data etc.) notifying the developer to prioritize this issue. The developer views the issue in their workflow, understands why it matters, and uses Copilot Autofix to apply an AI-suggested fix in minutes. The developer can then select these risks at bulk and assign the GitHub Copilot coding agent to create a draft PR for a multi merge fix ready for human review. Virtual Registry: Code-to-Runtime Mapping Code to runtime mapping is possible with the Virtual Registry which makes GitHub a trusted source for artifact metadata. Integrated with Microsoft Defender for Cloud, the Virtual Registry enables smarter risk prioritization and faster incident response. Teams can quickly answer: Is this vulnerability running in production? Is it exposed to sensitive workloads? Do I need to act now? By combining runtime and repository context, the Virtual Registry streamlines alert triage and incident response. We shipped a new set of filters to both Code Scanning and Dependabot and Security Campaigns that are based on the artifact metadata that is stored in the Virtual Registry. Faster fixes with agentic remediation The integration includes Copilot Autofix, an AI-powered tool that suggests code changes to fix security problems. It checks that the fixes work and helps developers resolve issues quickly, without switching tools. To complete the agentic work flow we can be bulk assign these autofixes to GitHub Copilot Coding agent to create a draft Pull Request awaiting human review. Why this matters Fewer alerts to sort through: Focus only on what’s exploitable in production. Faster fixes: AI-powered fix suggestions through GitHub Copilot Autofix have shown to fix 50% of alerts within the PR with a 70% reduction in mean time-to-remediation 3 Better teamwork: Developers and security teams collaborate seamlessly. With collaborative security now powered by connected context, we’ve seen 68% of alert remediated using GitHub Advanced Security’s security campaigns. 3 Try it now This feature is available in public preview and will be showcased at Microsoft Ignite. If your team builds cloud-native applications, this integration helps you protect code to cloud more effectively—without slowing down development. Customer FAQs How do I start using the integration? From Microsoft Defender for Cloud: Go to the environment section in the Defender for Cloud portal. Grant a new GitHub connector or update an existing one to provide consent to scan your source code. If you use GitHub, setup is one click. You’ll immediately see initial scan results and recommended fixes. From GitHub: You will be able to filter alerts by runtime context in addition to receiving AI-suggested fixes. How do I purchase this integration? For GitHub: GitHub Advanced Security (GHAS) is available as: Code Security SKU: $30 per committer/month (available April 2025) GHAS Bundle: $49 per committer/month (available now) GitHub Enterprise Cloud GitHub Copilot For Microsoft Defender for Cloud CSPM: Defender CSPM: $5 per billable resource/month Both can be enabled through the Azure Portal as Azure meters. [1]: Software Under Siege | AppSec Threat Report 2025 | Contrast Security [2]: Edgescan | Vulnerability Statistics Report 2025 [3]: GitHub Internal Data1.2KViews2likes1CommentAzure File copy task v4 and later causes 403 error
I've configured a release pipeline in ADO which copies some files to a Storage Account. Using Azure File copy task version 6 consistently fails with a 403 error. RESPONSE Status: 403 This request is not authorized to perform this operation using this permission. After much wasted time checking IP restrictions, checking access and recreating service connections I tried using an earlier version of the task that some other pipelines which do the same thing were using. I found that using version 4 or later of the file copy task causes the issue. Setting the task version to 3 works. Are there any known issues around this?22Views0likes1CommentNever Explain Context Twice: Introducing Azure SRE Agent memory
In our recent blog post, we highlighted how Azure SRE Agent has evolved into an extensible AI-powered operations platform. One of the most requested capabilities from customers has been the ability for agents to retain knowledge across sessions-learning from past incidents, remembering team preferences, and continuously improving troubleshooting accuracy. Today, we're excited to dive deeper into the Azure SRE Agent memory, a powerful feature that transforms how your operations teams work with AI. Why Memory Matters for AI Operations Every seasoned SRE knows that institutional knowledge is invaluable. The most effective on-call engineers aren't just technically skilled, they remember the quirks of specific services, recall solutions from past incidents, and know the team's preferred diagnostic approaches. Until now, AI assistants started every conversation from scratch, forcing teams to repeatedly explain context that experienced engineers would simply know. The SRE Agent Memory changes this paradigm. It enables agents to: Remember team facts, preferences, and context across all conversations Retrieve relevant runbooks and documentation during troubleshooting Learn from past sessions to improve future responses Share knowledge across your entire team automatically Context Engineering: The Key to Better AI Outcomes At the heart of the memory is a concept we call context engineering, the practice of purposefully curating and optimizing the information you provide to the agent to get better results. Rather than hoping the AI figures things out, you systematically build a knowledge foundation that makes every interaction smarter. The workflow is simple: Identify gaps: Use Session Insights to see where the agent struggled or lacked knowledge Add targeted context: Upload runbooks to the Knowledge Base or save facts with User Memories Track improvement: Review subsequent sessions to measure whether your additions improved outcomes Iterate: Continuously refine your context based on real session data This feedback loop transforms ad-hoc troubleshooting into a systematically improving process, where each session makes future sessions more effective. Memory Components at a Glance The memory consists of three complementary components that work together to give your agents comprehensive knowledge: 🧠 User Memories: Quick Chat Commands for Team Knowledge Save facts, preferences, and context using simple chat commands. User Memories are ideal for team standards, service configurations, and workflow patterns that should persist across all conversations. Key benefits: ✅ Instant setup-no configuration required ✅ Managed directly in chat with #remember, #forget, and #retrieve commands ✅ Shared across all team members automatically ✅ Works across all conversations and agents Example commands: #remember Team owns app-service-prod in East US region #remember For latency issues, check Redis cache first #remember Production deployments happen Tuesdays at 2 PM PST When you save a memory, it's instantly available across all your team's conversations. The agent automatically retrieves relevant memories during reasoning, no additional configuration needed. Saving team knowledge with the #remember command Use #retrieve to search and display your saved memories: Retrieving saved memories with the #retrieve command 📚 Knowledge Base: Direct Document Uploads for Runbooks and Guides Upload markdown and text files directly to the agent's knowledge base. Documents are automatically indexed using semantic search and available for agent retrieval during troubleshooting. The Knowledge Base uses intelligent indexing that combines keyword matching with semantic similarity. Documents are automatically split into optimal chunks, so agents retrieve the most relevant sections, not entire documents. Key benefits: ✅ Supports .md and .txt files (up to 16MB per file) ✅ Automatic chunking and semantic indexing ✅ Simple file upload interface ✅ Instant availability after upload Best for: Static runbooks, troubleshooting guides, internal documentation, and configuration templates. Navigate to Settings > Knowledge Base to access document management. There you will find Add File, allows you to upload txt and md file(s) and Delete, allows you to delete individual or bulk files. 📊 Session Insights: Automated Analysis of Your Troubleshooting Sessions Get automated feedback on your troubleshooting sessions with timelines, performance analysis, and key learnings. Session Insights help you understand what happened, learn from mistakes, and continuously improve. Key benefits: ✅ Automatic analysis after conversations complete ✅ Chronological timeline of actions taken ✅ Performance scoring with specific improvement suggestions ✅ Key learnings for future sessions Navigate to Settings > Session Insights to view your troubleshooting analysis: Session Insights dashboard showing analysis of past troubleshooting sessions You can also manually trigger insight generation for any conversation by clicking the Generate Session Insights icon in the chat footer: Manually triggering Session Insights generation Each insight includes: Timeline: A chronological narrative showing what actions were taken and their outcomes What Went Well: Highlights correct understanding and effective actions Areas for Improvement: Shows what could be done better with specific remediation steps Key Learnings: Actionable takeaways for future sessions Investigation Quality Score: Sessions rated on a 1-5 scale for completeness How Azure SRE Agent Use Memory: The SearchMemory Tool During conversations, incident handling, and scheduled tasks, Azure SRE Agents search across memory sources to retrieve relevant context using the SearchMemory tool. Enabling Memory Retrieval in Custom Sub-Agents When building custom sub-agents with the Sub-Agent Builder, you can enable memory retrieval by adding the SearchMemory tool to your sub-agent's toolset. This allows your custom automation to leverage all the knowledge stored in User Memories and the Knowledge Base. How it works: In the Sub-Agent Builder, add the SearchMemory tool to your sub-agent's available tools The tool automatically searches across all memory sources using intelligent retrieval Your sub-agent receives relevant context to inform its responses and actions This means your custom sub-agents, whether handling specific incident types, automating runbook execution, or performing scheduled health checks, can all benefit from your team's accumulated knowledge. Choosing the Right Memory Type Feature User Memories Knowledge Base Setup Instant (chat commands) Quick (file upload) Management Chat commands Portal UI Content Size Short facts Documents (up to 16MB) Best Use Case Team preferences Static runbooks Team Sharing ✅ Shared ✅ Shared Quick guidance: User Memories: Short, focused facts (1-2 sentences) for immediate team context Knowledge Base: Well-structured documents with clear headers for procedural knowledge Getting Started in Minutes 1. Start with User Memories Open any chat with your Azure SRE Agent and save immediate team knowledge: #remember Team owns services: app-service-prod, redis-cache-prod, and sql-db-prod #remember For latency issues, check Redis cache health first #remember Team uses East US for production workloads That's it, these facts are now available across all conversations. 2. Upload Key Documents Add critical runbooks and guides to the Knowledge Base: Navigate to Settings > Knowledge Base Upload .md or .txt files Files are automatically indexed and available immediately 3. Review Session Insights After troubleshooting sessions, check Settings > Session Insights to see what went well and where the agent needs more context. Use this feedback to identify gaps and add targeted memories or documentation. Best Practices for Building Agent Memory Content Organization Keep memories focused and specific Use consistent terminology across your team Avoid duplication, choose one source of truth for each piece of information Security Never store: ❌ Credentials, API keys, or secrets ❌ Personal identifiable information (PII) ❌ Customer data or logs ❌ Confidential business information Maintenance Regularly review and update memories Remove outdated information using #forget Consolidate duplicate entries Use #retrieve to audit what's been saved The Impact: Smarter Troubleshooting, Lower MTTR The Azure SRE Agent memory delivers measurable improvements: Faster troubleshooting: Agents immediately understand your environment and preferences Reduced toil: No more repeatedly explaining the same context Institutional knowledge capture: Critical team knowledge persists even as team members change Continuous improvement: Each session makes future sessions more effective By systematically building your agent's knowledge foundation, you create an operations assistant that truly understands your environment, reducing mean time to resolution (MTTR) and freeing your team to focus on high-value work. Ready to Get Started? Azure SRE Agent home page Product documentation Pricing information Demo recordings What's Next? We're continually enhancing the memory based on customer feedback. Your input is critical, use the thumbs up/down feedback in the agent, or share your thoughts in our GitHub repo. What operational knowledge would you like your AI agent to remember? Let us know! This blog post is part of our ongoing series on Azure SRE Agent capabilities. See our previous post on automation, integration, and extensibility features.361Views0likes0CommentsApplying DevOps Principles on Lean Infrastructure. Lessons From Scaling to 102K Users.
Hi Azure Community, I'm a Microsoft Certified DevOps Engineer, and I want to share an unusual journey. I have been applying DevOps principles on traditional VPS infrastructure to scale to 102,000 users with 99.2% uptime. Why am I posting this in an Azure community? Because I'm planning migration to Azure in 2026, and I want to understand: What mistakes am I already making that will bite me during migration? THE CURRENT SETUP Platform: Social commerce (West Africa) Users: 102,000 active Monthly events: 2 million Uptime: 99.2% Infrastructure: Single VPS Stack: PHP/Laravel, MySQL, Redis Yes - one VPS. No cloud. No Kubernetes. No microservices. WHY I HAVEN'T USED AZURE YET Honest answer: Budget constraints in emerging market startup ecosystem. At our current scale, fully managed Azure services would significantly increase monthly burn before product-market expansion. The funding we raised needs to last through growth milestones. The trade: I manually optimize what Azure would auto-scale. I debug what Application Insights would catch. I do by hand what Azure Functions would automate. DEVOPS PRACTICES THAT KEPT US RUNNING Even on single-server infrastructure, core DevOps principles still apply: CI/CD Pipeline (GitHub Actions) • 3-5 deployments weekly • Zero-downtime deploys • Automated rollback on health check failures • Feature flags for gradual rollouts Monitoring & Observability • Custom monitoring (would love Application Insights) • Real-time alerting • Performance tracking and slow query detection • Resource usage monitoring Automation • Automated backups • Automated database optimization • Automated image compression • Automated security updates Infrastructure as Code • Configs in Git • Deployment scripts • Environment variables • Documented procedures Testing & Quality • Automated test suite • Pre-deployment health checks • Staging environment • Post-deployment verification KEY OPTIMIZATIONS Async Job Processing • Upload endpoint: 8 seconds → 340ms • 4x capacity increase Database Optimization • Feed loading: 6.4 seconds → 280ms • Strategic caching • Batch processing Image Compression • 3-8MB → 180KB (94% reduction) • Critical for mobile users Caching Strategy • Redis for hot data • Query result caching • Smart invalidation Progressive Enhancement • Server-rendered pages • 2-3 second loads on 4G WHAT I'M WORRIED ABOUT FOR AZURE MIGRATION This is where I need your help: Architecture Decisions • App Service vs Functions + managed services? • MySQL vs Azure SQL? • When does cost/benefit flip for managed services? Cost Management • How do startups manage Azure costs during growth? • Reserved instances vs pay-as-you-go? • Which Azure services are worth the premium? Migration Strategy • Lift-and-shift first, or re-architect immediately? • Zero-downtime migration with 102K active users? • Validation approach before full cutover? Monitoring & DevOps • Application Insights - worth it from day one? • Azure DevOps vs GitHub Actions for Azure deployments? • Operational burden reduction with managed services? Development Workflow • Local development against Azure services? • Cost-effective staging environments? • Testing Azure features without constant bills? MY PLANNED MIGRATION PATH Phase 1: Hybrid (Q1 2026) • Azure CDN for static assets • Azure Blob Storage for images • Application Insights trial • Keep compute on VPS Phase 2: Compute Migration (Q2 2026) • App Service for API • Azure Database for MySQL • Azure Cache for Redis • VPS for background jobs Phase 3: Full Azure (Q3 2026) • Azure Functions for processing • Full managed services • Retire VPS QUESTIONS FOR THIS COMMUNITY Question 1: Am I making migration harder by waiting? Should I have started with Azure at higher cost to avoid technical debt? Question 2: What will break when I migrate? What works on VPS but fails in cloud? What assumptions won't hold? Question 3: How do I validate before cutting over? Parallel infrastructure? Gradual traffic shift? Safe patterns? Question 4: Cost optimization from day one? What to optimize immediately vs later? Common cost mistakes? Question 5: DevOps practices that transfer? What stays the same? What needs rethinking for cloud-native? THE BIGGER QUESTION Have you migrated from self-hosted to Azure? What surprised you? I know my setup isn't best practice by Azure standards. But it's working, and I've learned optimization, monitoring, and DevOps fundamentals in practice. Will those lessons transfer? Or am I building habits that cloud will expose as problematic? Looking forward to insights from folks who've made similar migrations. --- About the Author: Microsoft Certified DevOps Engineer and Azure Developer. CTO at social commerce platform scaling in West Africa. Preparing for phased Azure migration in 2026. P.S. I got the Azure certifications to prepare for this migration. Now I need real-world wisdom from people who've actually done it!44Views0likes0CommentsExpanding the Public Preview of the Azure SRE Agent
We are excited to share that the Azure SRE Agent is now available in public preview for everyone instantly – no sign up required. A big thank you to all our preview customers who provided feedback and helped shape this release! Watching teams put the SRE Agent to work taught us a ton, and we’ve baked those lessons into a smarter, more resilient, and enterprise-ready experience. You can now find Azure SRE Agent directly in the Azure Portal and get started, or use the link below. 📖 Learn more about SRE Agent. 👉 Create your first SRE Agent (Azure login required) What’s New in Azure SRE Agent - October Update The Azure SRE Agent now delivers secure-by-default governance, deeper diagnostics, and extensible automation—built for scale. It can even resolve incidents autonomously by following your team’s runbooks. With native integrations across Azure Monitor, GitHub, ServiceNow, and PagerDuty, it supports root cause analysis using both source code and historical patterns. And since September 1, billing and reporting are available via Azure Agent Units (AAUs). Please visit product documentation for the latest updates. Here are a few highlights for this month: Prioritizing enterprise governance and security: By default, the Azure SRE Agent operates with least-privilege access and never executes write actions on Azure resources without explicit human approval. Additionally, it uses role-based access control (RBAC) so organizations can assign read-only or approver roles, providing clear oversight and traceability from day one. This allows teams to choose their desired level of autonomy from read-only insights to approval-gated actions to full automation without compromising control. Covering the breadth and depth of Azure: The Azure SRE Agent helps teams manage and understand their entire Azure footprint. With built-in support for AZ CLI and kubectl, it works across all Azure services. But it doesn’t stop there—diagnostics are enhanced for platforms like PostgreSQL, API Management, Azure Functions, AKS, Azure Container Apps, and Azure App Service. Whether you're running microservices or managing monoliths, the agent delivers consistent automation and deep insights across your cloud environment. Automating Incident Management: The Azure SRE Agent now plugs directly into Azure Monitor, PagerDuty, and ServiceNow to streamline incident detection and resolution. These integrations let the Agent ingest alerts and trigger workflows that match your team’s existing tools—so you can respond faster, with less manual effort. Engineered for extensibility: The Azure SRE Agent incident management approach lets teams reuse existing runbooks and customize response plans to fit their unique workflows. Whether you want to keep a human in the loop or empower the Agent to autonomously mitigate and resolve issues, the choice is yours. This flexibility gives teams the freedom to evolve—from guided actions to trusted autonomy—without ever giving up control. Root cause, meet source code: The Azure SRE Agent now supports code-aware root cause analysis (RCA) by linking diagnostics directly to source context in GitHub and Azure DevOps. This tight integration helps teams trace incidents back to the exact code changes that triggered them—accelerating resolution and boosting confidence in automated responses. By bridging operational signals with engineering workflows, the agent makes RCA faster, clearer, and more actionable. Close the loop with DevOps: The Azure SRE Agent now generates incident summary reports directly in GitHub and Azure DevOps—complete with diagnostic context. These reports can be assigned to a GitHub Copilot coding agent, which automatically creates pull requests and merges validated fixes. Every incident becomes an actionable code change, driving permanent resolution instead of temporary mitigation. Getting Started Start here: Create a new SRE Agent in the Azure portal (Azure login required) Blog: Announcing a flexible, predictable billing model for Azure SRE Agent Blog: Enterprise-ready and extensible – Update on the Azure SRE Agent preview Product documentation Product home page Community & Support We’d love to hear from you! Please use our GitHub repo to file issues, request features, or share feedback with the team5.5KViews2likes3CommentsAnnouncing a flexible, predictable billing model for Azure SRE Agent
Billing for Azure SRE Agent will start on September 1, 2025. Announced at Microsoft Build 2025, Azure SRE Agent is a pre-built AI agent for root cause analysis, uptime improvement, and operational cost reduction. Learn more about the billing model and example scenarios.3.5KViews1like1CommentReimagining AI Ops with Azure SRE Agent: New Automation, Integration, and Extensibility features
Azure SRE Agent offers intelligent and context aware automation for IT operations. Enhanced by customer feedback from our preview, the SRE Agent has evolved into an extensible platform to automate and manage tasks across Azure and other environments. Built on an Agentic DevOps approach - drawing from proven practices in internal Azure operations - the Azure SRE Agent has already saved over 20,000 engineering hours across Microsoft product teams operations, delivering strong ROI for teams seeking sustainable AIOps. An Operations Agent that adapts to your playbooks Azure SRE Agent is an AI powered operations automation platform that empowers SREs, DevOps, IT operations, and support teams to automate tasks such as incident response, customer support, and developer operations from a single, extensible agent. Its value proposition and capabilities have evolved beyond diagnosis and mitigation of Azure issues, to automating operational workflows and seamless integration with the standards and processes used in your organization. SRE Agent is designed to automate operational work and reduce toil, enabling developers and operators to focus on high-value tasks. By streamlining repetitive and complex processes, SRE Agent accelerates innovation and improves reliability across cloud and hybrid environments. In this article, we will look at what’s new and what has changed since the last update. What’s New: Automation, Integration, and Extensibility Azure SRE Agent just got a major upgrade. From no-code automation to seamless integrations and expanded data connectivity, here’s what’s new in this release: No-code Sub-Agent Builder: Rapidly create custom automations without writing code. Flexible, event-driven triggers: Instantly respond to incidents and operational changes. Expanded data connectivity: Unify diagnostics and troubleshooting across more data sources. Custom actions: Integrate with your existing tools and orchestrate end-to-end workflows via MCP. Prebuilt operational scenarios: Accelerate deployment and improve reliability out of the box. Unlike generic agent platforms, Azure SRE Agent comes with deep integrations, prebuilt tools, and frameworks specifically for IT, DevOps, and SRE workflows. This means you can automate complex operational tasks faster and more reliably, tailored to your organization’s needs. Sub-Agent Builder: Custom Automation, No Code Required Empower teams to automate repetitive operational tasks without coding expertise, dramatically reducing manual workload and development cycles. This feature helps address the need for targeted automation, letting teams solve specific operational pain points without relying on one-size-fits-all solutions. Modular Sub-Agents: Easily create custom sub-agents tailored to your team’s needs. Each sub-agent can have its own instructions, triggers, and toolsets, letting you automate everything from outage response to customer email triage. Prebuilt System Tools: Eliminate the inefficiency of creating basic automation from scratch, and choose from a rich library of hundreds of built-in tools for Azure operations, code analysis, deployment management, diagnostics, and more. Custom Logic: Align automation to your unique business processes by defining your automation logic and prompts, teaching the agent to act exactly as your workflow requires. Flexible Triggers: Automate on Your Terms Invoke the agent to respond automatically to mission-critical events, not wait for manual commands. This feature helps speed up incident response and eliminate missed opportunities for efficiency. Multi-Source Triggers: Go beyond chat-based interactions, and trigger the agent to automatically respond to Incident Management and Ticketing systems like PagerDuty and ServiceNow, Observability Alerting systems like Azure Monitor Alerts, or even on a cron-based schedule for proactive monitoring and best-practices checks. Additional trigger sources such as GitHub issues, Azure DevOps pipelines, email, etc. will be added over time. This means automation can start exactly when and where you need it. Event-Driven Operations: Integrate with your CI/CD, monitoring, or support systems to launch automations in response to real-world events - like deployments, incidents, or customer requests. Vital for reducing downtime, it ensures that business-critical actions happen automatically and promptly. Expanded Data Connectivity: Unified Observability and Troubleshooting Integrate data, enabling comprehensive diagnostics and troubleshooting and faster, more informed decision-making by eliminating silos and speeding up issue resolution. Multiple Data Sources: The agent can now read data from Azure Monitor, Log Analytics, and Application Insights based on its Azure role-based access control (RBAC). Additional observability data sources such as Dynatrace, New Relic, Datadog, and more can be added via the Remote Model Context Protocol (MCP) servers for these tools. This gives you a unified view for diagnostics and automation. Knowledge Integration: Rather than manually detailing every instruction in your prompt, you can upload your Troubleshooting Guide (TSG) or Runbook directly, allowing the agent to automatically create an execution plan from the file. You may also connect the agent to resources like SharePoint, Jira, or documentation repositories through Remote MCP servers, enabling it to retrieve needed files on its own. This approach utilizes your organization’s existing knowledge base, streamlining onboarding and enhancing consistency in managing incidents. Azure SRE Agent is also building multi-agent collaboration by integrating with PagerDuty and Neubird, enabling advanced, cross-platform incident management and reliability across diverse environments. Custom Actions: Automate Anything, Anywhere Extend automation beyond Azure and integrate with any tool or workflow, solving the problem of limited automation scope and enabling end-to-end process orchestration. Out-of-the-Box Actions: Instantly automate common tasks like running azcli, kubectl, creating GitHub issues, or updating Azure resources, reducing setup time and operational overhead. Communication Notifications: The SRE Agent now features built-in connectors for Outlook, enabling automated email notifications, and for Microsoft Teams, allowing it to post messages directly to Teams channels for streamlined communication. Bring Your Own Actions: Drop in your own Remote MCP servers to extend the agent’s capabilities to any custom tool or workflow. Future-proof your agentic DevOps by automating proprietary or emerging processes with confidence. Prebuilt Operations Scenarios Address common operational challenges out of the box, saving teams time and effort while improving reliability and customer satisfaction. Incident Response: Minimize business impact and reduce operational risk by automating detection, diagnosis, and mitigation of your workload stack. The agent has built-in runbooks for common issues related to many Azure resource types including Azure Kubernetes Service (AKS), Azure Container Apps (ACA), Azure App Service, Azure Logic Apps, Azure Database for PostgreSQL, Azure CosmosDB, Azure VMs, etc. Support for additional resource types is being added continually, please see product documentation for the latest information. Root Cause Analysis & IaC Drift Detection: Instantly pinpoint incident causes with AI-driven root cause analysis including automated source code scanning via GitHub and Azure DevOps integration. Proactively detect and resolve infrastructure drift by comparing live cloud environments against source-controlled IaC, ensuring configuration consistency and compliance. Handle Complex Investigations: Enable the deep investigation mode that uses a hypothesis-driven method to analyze possible root causes. It collects logs and metrics, tests hypotheses with iterative checks, and documents findings. The process delivers a clear summary and actionable steps to help teams accurately resolve critical issues. Incident Analysis: The integrated dashboard offers a comprehensive overview of all incidents managed by the SRE Agent. It presents essential metrics, including the number of incidents reviewed, assisted, and mitigated by the agent, as well as those awaiting human intervention. Users can leverage aggregated visualizations and AI-generated root cause analyses to gain insights into incident processing, identify trends, enhance response strategies, and detect areas for improvement in incident management. Inbuilt Agent Memory: The new SRE Agent Memory System transforms incident response by institutionalizing the expertise of top SREs - capturing, indexing, and reusing critical knowledge from past incidents, investigations, and user guidance. Benefit from faster, more accurate troubleshooting, as the agent learns from both successes and mistakes, surfacing relevant insights, runbooks, and mitigation strategies exactly when needed. This system leverages advanced retrieval techniques and a domain-aware schema to ensure every on-call engagement is smarter than the last, reducing mean time to resolution (MTTR) and minimizing repeated toil. Automatically gain a continuously improving agent that remembers what works, avoids past pitfalls, and delivers actionable guidance tailored to the environment. GitHub Copilot and Azure DevOps Integration: Automatically triage, respond to, and resolve issues raised in GitHub or Azure DevOps. Integration with modern development platforms such as GitHub Copilot coding agent increases efficiency and ensures that issues are resolved faster, reducing bottlenecks in the development lifecycle. Ready to get started? Azure SRE Agent home page Product overview Pricing Page Pricing Calculator Pricing Blog Demo recordings Deployment samples What’s Next? Give us feedback: Your feedback is critical - You can Thumbs Up / Thumbs Down each interaction or thread, or go to the “Give Feedback” button in the agent to give us in-product feedback - or you can create issues or just share your thoughts in our GitHub repo at https://github.com/microsoft/sre-agent. We’re just getting started. In the coming months, expect even more prebuilt integrations, expanded data sources, and new automation scenarios. We anticipate continuous growth and improvement throughout our agentic AI platforms and services to effectively address customer needs and preferences. Let us know what Ops toil you want to automate next!2.2KViews0likes0CommentsAzure SRE Agent: Expanding Observability and Multi-Cloud Resilience
The Azure SRE Agent continues to evolve as a cornerstone for operational excellence and incident management. Over the past few months, we have made significant strides in enabling integrations with leading observability platforms—Dynatrace, New Relic, and Datadog—through Model Context Protocol (MCP) Servers. These partnerships serve joint customers, enabling automated remediation across diverse environments. Deepening Integrations with MCP Servers Our collaboration with these partners is more than technical—it’s about delivering value at scale. Datadog, New Relic, and Dynatrace are all Azure Native ISV Service partners. With these integrations Azure Native customers can also choose to add these MCP servers directly from the Azure Native partners’ resource: Datadog: At Ignite, Azure SRE Agent was presented with the Datadog MCP Server, to demonstrate how our customers can streamline complex workflows. Customers can now bring their Datadog MCP Server into Azure SRE Agent, extending knowledge capabilities and centralizing logs and metrics. Find Datadog Azure Native offerings on Marketplace. New Relic: When an alert fires in New Relic, the Azure SRE Agent calls the New Relic MCP Server to provide Intelligent Observability insights. This agentic integration with the New Relic MCP Server offers over 35 specialized tools across, entity and account management, alerts and monitoring, data analysis and queries, performance analysis, and much more. The advanced remediation skills of the Azure SRE Agent + New Relic AI help our joint customers diagnose and resolve production issues faster. Find New Relic’s Azure Native offering on Marketplace Dynatrace: The Dynatrace integration bridges Microsoft Azure's cloud-native infrastructure management with Dynatrace's AI-powered observability platform, leveraging the Davis AI engine and remote MCP server capabilities for incident detection, root cause analysis, and remediation across hybrid cloud environments. Check out Dynatrace’s Azure Native offering on Marketplace. These integrations are made possible by Azure SRE Agent’s MCP connectors. The MCP connectors in Azure SRE Agent act as the bridge between the agent and MCP servers, enabling dynamic discovery and execution of specialized tools for observability and incident management across diverse environments. This feature allows customers to build their own custom sub-agents to leverage tools from MCP Servers from integrated platforms like Dynatrace, Datadog, and New Relic to complement the agent’s diagnostic and remediation capabilities. By connecting Azure SRE Agent to external MCP servers scenarios such as cross-platform telemetry analysis are unlocked. Looking Ahead: Multi-Agent Collaboration Azure SRE Agent isn’t stopping with MCP integrations We’re actively working with PagerDuty and NeuBird to support dynamic use cases via agent-to-agent collaboration: PagerDuty: PagerDuty’s PD Advance SRE Agent is an AI-powered assistant that triages incidents by analyzing logs, diagnostics, past incident history, and runbooks to surface relevant context and recommended remediations. At Ignite, PagerDuty and Microsoft demonstrated how Azure SRE Agent can ingest PagerDuty incidents and collaborate with PagerDuty’s SRE Agent to complement triage using historical patterns, runbook intelligence and Azure diagnostics. NeuBird: NeuBird’s Agentic AI SRE, Hawkeye, autonomously investigates and resolves incidents across hybrid, and multi-cloud environments. By connecting to telemetry sources like Azure Monitor, Prometheus, and GitHub, Hawkeye delivers real-time diagnosis and targeted fixes. Building on the work presented at SRE Day this partnership underscores our commitment to agentic ecosystems where specialized agents collaborate for complex scenarios. Sign up for the private preview to try the integration, here. Additionally, please check out NeuBird on Marketplace. These efforts reflect a broader vision: Azure SRE Agent as a hub for cross-platform reliability, enabling customers to manage incidents across Azure, on-premises, and other clouds with confidence. Why This Matters As organizations embrace distributed architectures, the need for integrated, intelligent, and multi-cloud-ready SRE solutions has never been greater. By partnering with industry leaders and pioneering agent-to-agent workflows, Azure SRE Agent is setting the stage for a future where resilience is not just reactive—it’s proactive and collaborative.632Views2likes0Comments