best practices
The Agent that investigates itself
Azure SRE Agent handles tens of thousands of incident investigations each week for internal Microsoft services and external teams running it for their own systems. Last month, one of those incidents was about the agent itself.

Our KV cache hit rate alert started firing. Cached token percentage was dropping across the fleet. We didn't open dashboards. We simply asked the agent. It spawned parallel subagents, searched logs, read through its own source code, and produced the analysis.

First finding: Claude Haiku at 0% cache hits. The agent checked the input distribution and found that the average call was ~180 tokens, well below Anthropic's 4,096-token minimum for Haiku prompt caching. Structurally, these requests could never be cached. They were false positives.

The real regression was in Claude Opus: cache hit rate fell from ~70% to ~48% over a week. The agent correlated the drop against the deployment history and traced it to a single PR that restructured prompt ordering, breaking the common prefix that caching relies on. It submitted two fixes: one to exclude all uncacheable requests from the alert, and the other to restore prefix stability in the prompt pipeline.

That investigation is how we develop now. We rarely start with dashboards or manual log queries. We start by asking the agent. Three months earlier, it could not have done any of this. The breakthrough was not building better playbooks. It was harness engineering: enabling the agent to discover context as the investigation unfolded. This post is about the architecture decisions that made it possible.

Where we started

In our last post, Context Engineering for Reliable AI Agents: Lessons from Building Azure SRE Agent, we described how moving to a single generalist agent unlocked more complex investigations. Resolution rates were climbing, and for many internal teams the agent could now autonomously investigate and mitigate roughly 50% of incidents. We were moving in the right direction.

But the scores weren't uniform, and when we dug into why, the pattern was uncomfortable. The high-performing scenarios shared a trait: they'd been built with heavy human scaffolding. They relied on custom response plans for specific incident types, hand-built subagents for known failure modes, and pre-written log queries exposed as opaque tools. We weren't measuring the agent's reasoning - we were measuring how much engineering had gone into the scenario beforehand. On anything new, the agent had nowhere to start.

We found these gaps through manual review. Every week, engineers read through lower-scored investigation threads and pushed fixes: tighten a prompt, fix a tool schema, add a guardrail. Each fix was real. But we could only review fifty threads a week. The agent was handling ten thousand. We were debugging at human speed. The gap between those two numbers was where our blind spots lived. We needed an agent powerful enough to take this toil off us: an agent that could investigate itself. Dogfooding wasn't a philosophy - it was the only way to scale.

The Inversion: Three bets

The problem we faced was structural, and the KV cache investigation shows it clearly. The cache rate drop was visible in telemetry, but the cause was not. The agent had to correlate telemetry with deployment history, inspect the relevant code, and reason over the diff that broke prefix stability.
We kept hitting the same gap in different forms: logs pointing in multiple directions, failure modes in uninstrumented paths, regressions that only made sense at the commit level. Telemetry showed symptoms, but not what actually changed. We'd been building the agent to reason over telemetry. We needed it to reason over the system itself.

The instinct when agents fail is to restrict them: pre-write the queries, pre-fetch the context, pre-curate the tools. It feels like control. In practice, it creates a ceiling. The agent can only handle what engineers anticipated in advance. The answer is an agent that can discover what it needs as the investigation unfolds. In the KV cache incident, each step, from metric anomaly to deployment history to a specific diff, followed from what the previous step revealed. It was not a pre-scripted path. Navigating towards the right context with progressive discovery is key to creating deep agents that can handle novel scenarios. Three architectural decisions made this possible, and each one compounded on the last.

Bet 1: The Filesystem as the Agent's World

Our first bet was to give the agent a filesystem as its workspace instead of a custom API layer. Everything it reasons over - source code, runbooks, query schemas, past investigation notes - is exposed as files. It interacts with that world using read_file, grep, find, and shell. No SearchCodebase API. No RetrieveMemory endpoint. This is an old Unix idea: reduce heterogeneous resources to a single interface. Coding agents already work this way. It turns out the same pattern works for an SRE agent.

Frontier models are trained on developer workflows: navigating repositories, grepping logs, patching files, running commands. The filesystem is not an abstraction layered on top of that prior. It matches it. When we materialized the agent's world as a repo-like workspace, our human "Intent Met" score - whether the agent's investigation addressed the actual root cause as judged by the on-call engineer - rose from 45% to 75% on novel incidents. But interface design is only half the story. The other half is what you put inside it.

Code Repositories: the highest-leverage context

Teams had prewritten log queries because they did not trust the agent to generate correct ones. That distrust was justified. Models hallucinate table names, guess column schemas, and write queries against the wrong cluster. But the answer was not tighter restriction. It was better grounding. The repo is the schema. Everything else is derived from it. When the agent reads the code that produces the logs, query construction stops being guesswork. It knows the exact exceptions thrown, and the conditions under which each path executes. Stack traces start making sense, and logs become legible.
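To make that concrete, here is a minimal sketch of the grounding idea (illustrative code, not our actual pipeline): derive the exception names directly from source with grep, then build the query from that ground truth. The repository language, table name, and column names are assumptions for the example.

// Sketch only: ground a log query in what the code can actually throw.
import { execFileSync } from "node:child_process";

// 1. Extract the exception types the service really throws.
// (grep exits non-zero when nothing matches; error handling elided.)
function thrownExceptions(repoPath: string): string[] {
  const out = execFileSync(
    "grep",
    ["-rhoE", "throw new [A-Za-z]+Exception", "--include=*.cs", repoPath],
    { encoding: "utf8" },
  );
  const names = out
    .split("\n")
    .map((l) => l.replace("throw new ", "").trim())
    .filter(Boolean);
  return [...new Set(names)];
}

// 2. Build the query from ground truth instead of guessed schema.
// "AppExceptions", "ExceptionType", and "TimeGenerated" are illustrative names.
function buildQuery(exceptions: string[]): string {
  const list = exceptions.map((e) => `"${e}"`).join(", ");
  return [
    "AppExceptions",
    `| where ExceptionType in (${list})`,
    "| summarize count() by ExceptionType, bin(TimeGenerated, 1h)",
  ].join("\n");
}

A query built this way can still be wrong, but it can no longer reference an exception the code never throws.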
But beyond query grounding, code access unlocked three new capabilities that telemetry alone could not provide:

- Ground truth over documentation. Docs drift and dashboards show symptoms. The code is what the service actually does. In practice, most investigations only made sense when logs were read alongside implementation.
- Point-in-time investigation. The agent checks out the exact commit at incident time, not current HEAD, so it can correlate the failure against the actual diffs. That's what cracked the KV cache investigation: a PR broke prefix stability, and the diff was the only place this was visible. Without commit history, you can't distinguish a code regression from external factors.
- Reasoning even where telemetry is absent. Some code paths are not well instrumented. The agent can still trace logic through source and explain behavior even when logs do not exist. This is especially valuable in novel failure modes - the ones most likely to be missed precisely because no one thought to instrument them.

Memory as a filesystem, not a vector store

Our first memory system used RAG over past session learnings. It had a circular dependency: a limited agent learned from limited sessions and produced limited knowledge. Garbage in, garbage out. But the deeper problem was retrieval. In an SRE context, embedding similarity is a weak proxy for relevance. "KV cache regression" and "prompt prefix instability" may be distant in embedding space yet still describe the same causal chain. We tried re-ranking, query expansion, and hybrid search. None fixed the core mismatch between semantic similarity and diagnostic relevance.

We replaced RAG with structured Markdown files that the agent reads and writes through its standard tool interface. The model names each file semantically: overview.md for a service summary, team.md for ownership and escalation paths, logs.md for cluster access and query patterns, debugging.md for failure modes and prior learnings. Each carries just enough context to orient the agent, with links to deeper files when needed.

The key design choice was to let the model navigate memory, not retrieve it through query matching. The agent starts from a structured entry point and follows the evidence toward what matters. RAG assumes you know the right query before you know what you need. File traversal lets relevance emerge as context accumulates. This removed chunking, overlap tuning, and re-ranking entirely. It also proved more accurate, because frontier models are better at following context than embeddings are at guessing relevance. As a side benefit, memory state can be snapshotted periodically.

One problem remains unsolved: staleness. When two sessions write conflicting patterns to debugging.md, the model must reconcile them. When a service changes behavior, old entries can become misleading. We rely on timestamps and explicit deprecation notes, but we do not have a systemic solution yet. This is an active area of work, and anyone building memory at scale will run into it.
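A minimal sketch of the pattern (illustrative, not the product's implementation): memory is just files plus the agent's existing file tools. The entry-point file links to deeper ones, and writes are timestamped appends so later sessions can at least see which notes are newest.

// Memory-as-filesystem sketch. File names mirror the post; paths are hypothetical.
import * as fs from "node:fs/promises";
import * as path from "node:path";

const MEMORY_ROOT = "/workspace/memory/my-service";

// The entire "memory API" is the same generic file access the agent already has.
async function readMemory(file: string): Promise<string> {
  return fs.readFile(path.join(MEMORY_ROOT, file), "utf8");
}

// Orient first: read the entry point and collect the Markdown links that act
// as the index. The model, not an embedding search, picks what to open next.
async function orient(): Promise<string[]> {
  const overview = await readMemory("overview.md");
  return [...overview.matchAll(/\]\(([^)]+\.md)\)/g)].map((m) => m[1]);
}

// Writing a learning is an append with a timestamp, which is what gives later
// sessions a fighting chance against staleness and conflicting notes.
async function appendLearning(note: string): Promise<void> {
  const stamped = `\n- ${new Date().toISOString()}: ${note}`;
  await fs.appendFile(path.join(MEMORY_ROOT, "debugging.md"), stamped);
}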
The sandbox as epistemic boundary

The filesystem also defines what the agent can see. If something is not in the sandbox, the agent cannot reason about it. We treat that as a feature, not a limitation. Security boundaries and epistemic boundaries are enforced by the same mechanism. Inside that boundary, the agent has full execution: arbitrary bash, python, jq, and package installs through pip or apt. That scope unlocks capabilities we never would have built as custom tools. It opens PRs with the gh CLI, like the prompt-ordering fix from the KV cache incident. It pushes Grafana dashboards, like a cache-hit-rate dashboard we now track by model. It installs domain-specific CLI tools mid-investigation when needed. No bespoke integration required, just a shell. The recurring lesson was simple: a generally capable agent in the right execution environment outperforms a specialized agent with bespoke tooling. Custom tools accumulate maintenance costs. Shell commands compose for free.

Bet 2: Context Layering

Code access tells the agent what a service does. It does not tell the agent what it can access, which resources its tools are scoped to, or where an investigation should begin. This gap surfaced immediately. Users would ask "which team do you handle incidents for?" and the agent had no answer. Tools alone are not enough. An integration also needs ambient context so the model knows what exists, how it is configured, and when to use it. We fixed this with context hooks: structured context injected at prompt construction time to orient the agent before it takes action.

- Connectors - what can I access? A manifest of wired systems such as Log Analytics, Outlook, and Grafana, along with their configuration.
- Repositories - what does this system do? Serialized repo trees, plus files like AGENTS.md, Copilot.md, and CLAUDE.md with team-specific instructions.
- Knowledge map - what have I learned before? A two-tier memory index with a top-level file linking to deeper scenario-specific files, so the model can drill down only when needed.
- Azure resource topology - where do things live? A serialized map of relationships across subscriptions, resource groups, and regions, so investigations start in the right scope.

Together, these context hooks turn a cold start into an informed one. That matters because a bad early choice does not just waste tokens. It sends the investigation down the wrong trajectory. A capable agent still needs to know what exists, what matters, and where to start.

Bet 3: Frugal Context Management

Layered context creates a new problem: budget. Serialized repo trees, resource topology, connector manifests, and a memory index fill context fast. Once the agent starts reading source files and logs, complex incidents hit context limits. We needed our context usage to be deliberately frugal.

Tool result compression via the filesystem

Large tool outputs are expensive because they consume context before the agent has extracted any value from them. In many cases, only a small slice or a derived summary of that output is actually useful. Our framework exposes these results as files to the agent. The agent can then use tools like grep, jq, or python to process them outside the model interface, so that only the final result enters context. The filesystem isn't just a capability abstraction - it's also a budget management primitive.

Context Pruning and Auto-Compact

Long investigations accumulate dead weight. As hypotheses narrow, earlier context becomes noise. We handle this with two compaction strategies. Context Pruning runs mid-session. When context usage crosses a threshold, we trim or drop stale tool calls and outputs, keeping the window focused on what still matters. Auto-Compact kicks in when a session approaches its context limit. The framework summarizes findings and working hypotheses, then resumes from that summary. From the user's perspective, there's no visible limit. Long investigations just work.

Parallel subagents

The KV cache investigation required reasoning along two independent hypotheses: whether the alert definition was sound, and whether cache behavior had actually regressed. The agent spawned parallel subagents for each task, each operating in its own context window. Once both finished, it merged their conclusions. This pattern generalizes to any task with independent components. It speeds up the search, keeps intermediate work from consuming the main context window, and prevents one hypothesis from biasing another.
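The fan-out/fan-in shape is simple enough to sketch. Here runSubagent is a stand-in for whatever spawns an isolated agent session; the hypothesis names are the two from the KV cache incident.

// Parallel-hypotheses sketch: each subagent gets its own context window, so
// one line of reasoning cannot bias another. Only the merged conclusions
// re-enter the main context.
type Hypothesis = { name: string; prompt: string };

async function investigateInParallel(
  runSubagent: (prompt: string) => Promise<string>,
  hypotheses: Hypothesis[],
): Promise<string> {
  const findings = await Promise.all(
    hypotheses.map((h) =>
      runSubagent(h.prompt).then((report) => `${h.name}:\n${report}`),
    ),
  );
  return findings.join("\n\n");
}

// Usage, mirroring the incident above:
// investigateInParallel(run, [
//   { name: "Alert definition", prompt: "Is the KV cache alert sound?" },
//   { name: "Cache behavior", prompt: "Did the Opus cache hit rate regress?" },
// ]);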
The Feedback loop

These architectural bets have enabled us to close the original scaling gap. Instead of debugging the agent at human speed, we could finally start using it to fix itself. As an example, we were hitting various LLM errors: timeouts, 429s (too many requests), failures in the middle of response streaming, 400s from code bugs that produced malformed payloads. These paper cuts would cause investigations to stall midway, and some conversations broke entirely. So we set up a daily monitoring task for these failures. The agent searches the last 24 hours of errors, clusters the top hitters, traces each to its root cause in the codebase, and submits a PR. We review it manually before merging. Over two weeks, the errors were reduced by more than 80%.

Over the last month, we have successfully used our agent across a wide range of scenarios:

- Analyzed our user churn rate and built dashboards we now review weekly.
- Correlated which builds needed the most hotfixes, surfacing flaky areas of the codebase.
- Ran security analysis and found vulnerabilities in the read path.
- Helped fill out parts of its own Responsible AI review, with strict human review.
- Handles customer-reported issues and LiveSite alerts end to end.

Whenever it gets stuck, we talk to it and teach it, ask it to update its memory, and it doesn't fail that class of problem again. The title of this post is literal. The agent investigating itself is not a metaphor. It is a real workflow, driven by scheduled tasks, incident triggers, and direct conversations with users.

What We Learned

We spent months building scaffolding to compensate for what the agent could not do. The breakthrough was removing it. Every prewritten query was a place we told the model not to think. Every curated tool was a decision made on its behalf. Every pre-fetched context was a guess about what would matter before we understood the problem.

The inversion was simple but hard to accept: stop pre-computing the answer space. Give the model a structured starting point, a filesystem it knows how to navigate, context hooks that tell it what it can access, and budget management that keeps it sharp through long investigations.

The agent that investigates itself is both the proof and the product of this approach. It finds its own bugs, traces them to root causes in its own code, and submits its own fixes. Not because we designed it to. Because we designed it to reason over systems, and it happens to be one.

We are still learning. Staleness is unsolved, budget tuning remains largely empirical, and we regularly discover assumptions baked into context that quietly constrain the agent. But we have crossed a new threshold: from an agent that follows your playbook to one that writes the next one.

Thanks to visagarwal for co-authoring this post.

An AI led SDLC: Building an End-to-End Agentic Software Development Lifecycle with Azure and GitHub.
This is due to the inevitable move towards fully agentic, end-to-end SDLCs. We may not yet be at a point where software engineers are managing fleets of agents creating the billion-dollar AI abstraction layer, but (as I will evidence in this article) we are certainly on the precipice of such a world. Before we dive into the reality of agentic development today, let me examine two very different modules from university and their relevance in an AI-first development environment.

Manual Requirements Translation

At university I dedicated two whole years to a unit called "Systems Design". This was one of my favourite units, primarily focused on requirements translation. Often, I would receive a scenario between "The Proprietor" and "The Proprietor's wife", who seemed to be in a never-ending cycle of new product ideas. These tasks would be analysed, broken down, manually refined, and then mapped to some kind of early-stage application architecture (potentially some pseudo-code and a UML diagram or two). The big intellectual effort in this exercise was taking human intention and turning it into something tangible to build from (the work of BAs). Today, by the time I have opened Notepad and started to decipher requirements, an agent can already have created a comprehensive list, a service blueprint, and a code scaffold to start the process (*cough* spec-kit *cough*).

Manual debugging

Need I say any more? Old-school debugging with print()'s and breakpoints is dead. I spent countless hours learning to debug in a classroom and then later with my own software, stepping through execution line by line, reading through logs, and understanding what to look for; where correlation did and didn't mean causation. I think back to my year at IBM as a fresh-faced intern in a cloud engineering team, where around 50% of my time was debugging different issues until it was sufficiently "narrowed down", and then reading countless Stack Overflow posts figuring out the actual change I would need to make to a PowerShell script or Jenkins pipeline. Already in Azure, with the emergence of SRE agents, that debug process looks entirely different. The debug process for software even more so...

#terminallastcommand WHY IS THIS NOT RUNNING?
#terminallastcommand Review these logs and surface errors relating to XYZ.

As I said: breakpoints are dead, for now at least.

Caveat - Is this a good thing?

One more deviation from the main core of the article if you would be so kind (if you are not as kind, skip to the implementation walkthrough below). Is this actually a good thing? Is a software engineering degree now worthless? What if I love printf()? I don't know is my answer today, at the start of 2026. Two things worry me: one theoretical and one very real.

To start with the theoretical: today AI takes a significant amount of the "donkey work" away from developers. How does this impact cognitive load at both ends of the spectrum? The list that "donkey work" encapsulates is certainly growing. As a result, on one end of the spectrum humans are left with the complicated parts yet to be within an agent's remit. This could have quite an impact on our ability to perform tasks. If we are constantly dealing with the complex and advanced, when do we have time to re-root ourselves in the foundations? Will we see an increase in developer burnout? How do technical people perform without the mundane or routine tasks? I often hear people who have been in the industry for years discuss how simple infrastructure, computing, development, etc.
were 20 years ago, almost with a longing to return to a world where today's zero trust, globally replicated architectures are a twinkle in an architect's eye. Is constantly working on only the most complex problems a good thing?

At the other end of the spectrum, what if the performance of AI tooling and agents outperforms our wildest expectations? Suddenly, AI tools and agents are picking up more and more of today's complicated and advanced tasks. Will developers, architects, and organisations lose some ability to innovate? Fundamentally, we are not talking about artificial general intelligence when we say AI; we are talking about incredibly complex predictive models that can augment the existing ideas they are built upon but are not, in themselves, innovators. Put simply, in the words of Scott Hanselman: "Spicy auto-complete". Does increased reliance on these agents in more and more of our business processes remove the opportunity for innovative ideas? For example, if agents were football managers, would we ever have graduated from Neil Warnock and Mick McCarthy football to Pep? Would every agent just augment a 'lump it long and hope' approach? We hear about learning loops, but can these learning loops evolve into "innovation loops"?

Past the theoretical and the game of 20 questions, the very real concern I have is off the back of some data shared recently on Stack Overflow traffic. Stack Overflow traffic has dipped significantly since the release of GitHub Copilot in October 2021, and as the product has matured that trend has only accelerated. Data from 12 months ago suggests that Stack Overflow has lost 77% of new questions compared to 2022.

Stack Overflow democratises access to problem-solving (I have to be careful not to talk in the past tense here), but I will admit I cannot remember the last time I was reviewing Stack Overflow or furiously searching through solutions that are vaguely similar to my own issue. This causes some concern over the data available in the future to train models. Today, models can be grounded in real, tested scenarios built by developers in anger. What happens with this question drop when API schemas change, when the technology built for today is old and deprecated, and the dataset is stale and never returning to its peak? How do we mitigate this impact? There is potential for some closed-loop type continuous improvement in the future, but do we think this is a scalable solution? I am unsure.

So, back to the question: "Is this a good thing?". It's great today; the long-term impacts are yet to be seen. If we think that AGI may never be achieved, or is at least a very distant horizon, then understanding the foundations of your technical discipline is still incredibly important. Developers will not only be the managers of their fleet of agents, but also the janitors mopping up the mess when there is an accident (albeit likely mopping with AI-augmented tooling).

An AI First SDLC Today - The Reality

Enough reflection and nostalgia (I don't think that's why you clicked the article), let's start building something. For the rest of this article I will be building an AI-led, agent-powered software development lifecycle. The example I will be building is an AI-generated weather dashboard. It's a simple example, but if agents can generate, test, deploy, observe, and evolve this application, it proves that today, and into the future, the process can likely scale to more complex domains. Let's start with the entry point.
The problem statement that we will build from: "As a user I want to view real time weather data for my city so that I can plan my day."

We will use this as the single input for our AI-led SDLC. This is what we will pass to Spec Kit, and then watch our app and subsequent features get built in front of our eyes. The goal is that we will use:

- Spec Kit to get going and move from textual idea to requirements and scaffold.
- A coding agent to implement our plan.
- A quality agent to assess the output and quality of the code.
- GitHub Actions that not only host the agents (abstracted) but also handle the build and deployment.
- An SRE agent proactively monitoring and opening issues automatically.

The end-to-end flow that we will review through this article is the following:

Step 1: Spec-driven development - Spec First, Code Second

A big piece of realising an AI-led SDLC today relies on spec-driven development (SDD). One of the best summaries for SDD that I have seen is: "Version control for your thinking". Instead of huge specs that are stale and buried in a knowledge repository somewhere, SDD looks to make them a first-class citizen within the SDLC. Architectural decisions, business logic, and intent can be captured and versioned as a product evolves; an executable artefact that evolves with the project.

In 2025, GitHub released the open-source Spec Kit: a tool that enables the goal of placing a specification at the centre of the engineering process. Specs drive the implementation, checklists, and task breakdowns, steering an agent towards the end goal. This article from GitHub does a great job explaining the basics, so if you'd like to learn more it's a great place to start (https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/). In short, Spec Kit generates requirements, a plan, and tasks to guide a coding agent through an iterative, structured development process. Through the Spec Kit constitution, organisational standards and tech-stack preferences are adhered to throughout each change.

I did notice one (likely intentional) gap in functionality that would cement Spec Kit's role in an autonomous SDLC. That gap is that the implement stage is designed to run within an IDE or client coding agent. You can now, in the IDE, toggle between task implementation locally or with an agent in the cloud. That is great, but it still requires you to drive through the IDE. Thinking about this in the context of an AI-led SDLC (where we are pushing tasks from Spec Kit to a coding agent outside of my own desktop), it was clear that a bridge was needed. As a result, I used Spec Kit to create the Spec-to-issue tool. This allows us to take the tasks and plan generated by Spec Kit, parse the important parts, and automatically create a GitHub issue, with the option to auto-assign the coding agent. From the perspective of an autonomous AI-led SDLC, Spec Kit really is the entry point that triggers the flow. How Spec Kit is surfaced to users will vary depending on the organisation and the context of the users. For the rest of this demo I use Spec Kit to create a weather app calling out to the OpenWeather API, and then add additional features with new specs.
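To give a flavour of what the Spec-to-issue bridge does, here is a hypothetical sketch (not the tool's actual source): read the plan and tasks Spec Kit generated, then shell out to the gh CLI. The spec paths and issue title are assumptions, and how the coding agent gets assigned varies by setup; some configurations need the GraphQL API rather than a plain assignee flag.

// Hypothetical spec-to-issue bridge: tasks.md + plan.md -> GitHub issue.
import { readFileSync } from "node:fs";
import { execFileSync } from "node:child_process";

const specDir = "specs/001-weather-app"; // illustrative Spec Kit output folder
const plan = readFileSync(`${specDir}/plan.md`, "utf8");
const tasks = readFileSync(`${specDir}/tasks.md`, "utf8");

// `gh issue create` is a real gh CLI command; the body carries the spec
// content so the coding agent has the full breakdown as its brief.
execFileSync("gh", [
  "issue", "create",
  "--title", "Implement: real-time weather dashboard",
  "--body", `## Plan\n\n${plan}\n\n## Tasks\n\n${tasks}`,
  "--assignee", "@copilot", // assignment mechanics differ across setups
]);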
With one simple prompt of /speckit.specify "Application feature/idea/change", I suddenly had a really clear breakdown of the tasks and plan required to get to my desired end state, while respecting the context and preferences I had previously set in my Spec Kit constitution. I had mentioned a desire for test-driven development, that I required certain coverage, and that all solutions were to be Azure-native. The real benefit here, compared to prompting directly into the coding agent, is that breaking one large task into individual, measurable, small components that are clear and methodical improves the coding agent's ability to perform them by a considerable degree. We can see an example below of not just creating a whole application, but another spec to iterate on an existing application and add a feature. We can see the result of the spec creation, the issue in our GitHub repo, and, most importantly for the next step, that our coding agent, GitHub Copilot, has been assigned automatically.

Step 2: GitHub Coding Agent - Iterative, autonomous software creation

Talking of coding agents, GitHub Copilot's coding agent is an autonomous agent in GitHub that can take a scoped development task and work on it in the background using the repository's context. It can make code changes and produce concrete outputs like commits and pull requests for a developer to review. The developer stays in control by reviewing, requesting changes, or taking over at any point. This does the heavy lifting in our AI-led SDLC. We have already seen great success with customers who have adopted the coding agent when it comes to carrying out menial tasks to save developers time. These coding agents can work in parallel to human developers and with each other.

In our example we see that the coding agent creates a new branch for its changes and creates a PR, which it works through as it ticks off the various tasks generated in our spec. One huge positive of the coding agent that sets it apart from other similar solutions is the transparency in decision-making and actions taken. The monitoring and observability built directly into the feature means that the agent's "thinking" is easily visible: the iterations and steps being taken can be viewed in full sequence in the Agents tab. Furthermore, the action that the agent is running is also transparently available to view in the Actions tab, meaning problems can be assessed very quickly.

Once the coding agent is finished, it has run the required tests and, even in the case of a UI change, goes as far as calling the Playwright MCP server and screenshotting the change to showcase in the PR. We are then asked to review the change. In this demo, I also created a GitHub Action that is triggered when a PR review is requested: it creates the required resources in Azure and surfaces the (in this case) Azure Container Apps revision URL, making it even smoother for the human in the loop to evaluate the changes. Just like any normal PR, if changes are required comments can be left; when they are, the coding agent can pick them up and action what is needed. It's also worth noting that for any manual intervention here, use of GitHub Codespaces would work very well to make minor changes or perform testing on an agent's branch. We can even see that the unit tests specified in our spec have been executed by our coding agent.

The pattern used here (Spec Kit -> coding agent) overcomes one of the biggest challenges we see with the coding agent. Unlike an IDE-based coding agent, the GitHub.com coding agent is left to its own iterations and implementation without input until the PR review. This can lead to subpar performance, especially compared to IDE agents which have constant input and interruption.
The concise and considered breakdown generated from Spec Kit provides the structure and foundation for the agent to execute on; very little is left to interpretation for the coding agent.

Step 3: GitHub Code Quality Review (Human in the loop with agent assistance)

GitHub Code Quality is a feature (currently in preview) that proactively identifies code quality risks and opportunities for enhancement, both in PRs and through repository scans. These are surfaced within a PR and also in repo-level scoreboards. This means that PRs can now extend existing static code analysis: Copilot can action CodeQL, PMD, and ESLint scanning on top of the new, in-context code quality findings and autofixes. Furthermore, we receive a summary of the actual changes made. This can be used to assist the human in the loop in understanding what changes have been made and whether enhancements or improvements are required.

Thinking about this in the context of review coverage, one of the challenges in already-lean development teams is finding the time to give proper credence to PRs. Now, with AI-assisted quality scanning, we can be more confident in our overall evaluation and test coverage. I would expect that use of these tools alongside existing human review processes would increase repository code quality and reduce uncaught errors. The data points support this too. The Qodo 2025 AI Code Quality report showed that usage of AI code reviews increased quality improvements to 81% (from 55%). A similar Atlassian RovoDev study from 2026 showed that 38.7% of comments left by AI agents in code reviews lead to additional code fixes. LLMs in their current form are never going to achieve 100% accuracy; however, these are still considerable, significant gains in one of the most important (and often neglected) parts of the SDLC.

With a significant number of software supply chain attacks recently, it is also not a stretch to imagine that many projects could benefit from "independently" (I use this term loosely) reviewed and summarised PRs and commits. In the future this could potentially be done by a specialist sub-agent during a PR or merge, focused on identifying malicious code that may be hidden within otherwise normal contributions; case in point, the "near-miss" XZ Utils attack.

Step 4: GitHub Actions for build and deploy - No agents here, just deterministic automation

This step will be our briefest, as the idea of CI/CD and automation needs no introduction. It is worth noting that while I am sure there are additional opportunities for using agents within a build and deploy pipeline, I have not investigated them. I often speak with customers about deterministic and non-deterministic business process automation, and the importance of distinguishing between the two. Some processes were created to be deterministic because that is all that was available at the time; the number of conditions required to deal with N possible flows just did not scale. However, now those processes can be non-deterministic. Good examples include IVR decision trees in customer service or hard-coded sales routines to retain a customer regardless of context; these would benefit from less determinism in their execution. However, some processes remain best as deterministic flows: financial transactions, policy engines, document ingestion. While all these flows may be part of an AI solution in the future (possibly as a tool an agent calls, or as part of a larger agent-based orchestration), the processes themselves are deterministic for a reason.
Just because we could have dynamic decision-making doesn't mean we should. Infrastructure deployment and CI/CD pipelines are one good example of this, in my opinion. We could have an agent decide what service best fits our codebase and which region we should deploy to, but do we really want to, and do the benefits outweigh the potential negatives?

In this process flow we use a deterministic GitHub Action to deploy our weather application into our "development" environment and then promote it through the environments until we reach production, where we now want to ensure that the application is running smoothly. We also use an action, as mentioned above, to deploy and surface our agent's changes. In Azure Container Apps we can do this in a secure sandbox environment called a "Dynamic Session" to ensure strong isolation of what is essentially "untrusted code".

Often enterprises view the building and development of AI applications as something that requires a completely new process to take to production. While certain additional processes are new (evaluation, model deployment, etc.), many of our traditional SDLC principles are just as relevant as ever before, CI/CD pipelines being a great example of that: checked-in code that is predictably deployed alongside the required services to run tests or promote through environments. Whether you are deploying a Java calculator app or a multi-agent customer service bot, CI/CD even in this new world is a non-negotiable.

We can see that our geolocation feature is running on our Azure Container Apps revision, and we can begin to evaluate whether we agree with Copilot that all the feature requirements have been met. In this case they have. If they hadn't, we'd just jump into the PR and add a new comment with "@copilot" requesting our changes.

Step 5: SRE Agent - Proactive agentic day two operations

The SRE agent service on Azure is an operations-focused agent that continuously watches a running service using telemetry such as logs, metrics, and traces. When it detects incidents or reliability risks, it can investigate signals, correlate likely causes, and propose or initiate response actions such as opening issues, creating runbook-guided fixes, or escalating to an on-call engineer. It effectively automates parts of day two operations while keeping humans in control of approval and remediation. It can be run in two different permission models: one with a reader role that can temporarily take user permissions for approved actions when identified, and a privileged level that allows it to autonomously take approved actions on resources and resource types within the resource groups it is monitoring.

In our example, our SRE agent could take actions to ensure our container app runs as intended: restarting pods, changing traffic allocations, and alerting for secret expiry. The SRE agent can also perform detailed debugging to save human SREs time, summarising the issue, fixes tried so far, and narrowing down potential root causes to reduce time to resolution, even across the most complex issues. My initial concern with these types of autonomous fixes (be it VPA on Kubernetes or an SRE agent across your infrastructure) is always that they can very quickly mask problems, or become an anti-pattern where you have drift between your IaC and what is actually running in Azure.

One of my favourite features of SRE agents is sub-agents. Sub-agents can be created to handle very specific tasks that the primary SRE agent can leverage.
Examples include alerting, report generation, and potentially other third-party integrations or tooling that require a more concise context. In my example, I created a GitHub sub-agent to be called by the primary agent after every issue that is resolved. When called, the GitHub sub-agent creates an issue summarising the origin, context, and resolution. This really brings us full circle. We can then potentially assign this to our coding agent to implement the fix before we proceed with the rest of the cycle; for example, a change where a port is incorrect in some Bicep, or min scale has been adjusted because of latency observed by the SRE agent. These are quick fixes that can be easily implemented by a coding agent, subsequently creating an autonomous feedback loop with human review.

Conclusion

The journey through this AI-led SDLC demonstrates that it is possible, with today's tooling, to improve any existing SDLC with AI assistance, evolving from simply using a chat interface in an IDE. By combining spec-driven development with Spec Kit, autonomous coding agents, AI-augmented quality checks, deterministic CI/CD pipelines, and proactive SRE agents, we see an emerging ecosystem where human creativity and oversight guide an increasingly capable fleet of collaborative agents.

As with all AI solutions we design today, I remind myself that "this is as bad as it gets". If the last two years are anything to go by, the rate of change in this space means this article may look very different in 12 months. I imagine Spec-to-issue will no longer be required as a bridge, as native solutions evolve to make this process even smoother. There are also some areas of an AI-led SDLC that are not included in this post, things like reviewing the inner-loop process or the use of existing enterprise patterns and blueprints. I also did not review the use of third-party plugins or tools available through GitHub. These would make for an interesting expansion of the demo. We also did not look at the creation of custom coding agents, which could be hosted in Microsoft Foundry; this is especially pertinent with the recent announcement of Anthropic models now being available to deploy in Foundry.

Does today's tooling mean that developers, QAs, and engineers are no longer required? Absolutely not (and if I am honest, I can't see that changing any time soon). However, it is evidently clear that in the next 12 months, enterprises who reshape their SDLC (and any other business process) to become one augmented by agents will innovate faster, learn faster, and deliver faster, leaving organisations who resist this shift struggling to keep up.

Give your Foundry Agent Custom Tools with MCP Servers on Azure Functions
This blog post is for developers who have an MCP server deployed to Azure Functions and want to connect it to Microsoft Foundry agents. It walks through why you'd want to do this, the different authentication options available, and how to get your agent calling your MCP tools.

Connect your MCP server on Azure Functions to Foundry Agent

If you've been following along with this blog series, you know that Azure Functions is a great place to host remote MCP servers. You get scalable infrastructure, built-in auth, and serverless billing. All the good stuff. But hosting an MCP server is only half the picture. The real value comes when something actually uses those tools. Microsoft Foundry lets you build AI agents that can reason, plan, and take actions. By connecting your MCP server to an agent, you're giving it access to your custom tools, whether that's querying a database, calling an API, or running some business logic. The agent discovers your tools, decides when to call them, and uses the results to respond to the user.

Why connect MCP servers to Foundry agents?

You might already have an MCP server that works great with VS Code, VS, Cursor, or other MCP clients. Connecting that same server to a Foundry agent means you can reuse those tools in a completely different context, i.e. in an enterprise AI agent that your team or customers interact with. No need to rebuild anything. Your MCP server stays the same; you're just adding another consumer.

Prerequisites

Before proceeding, make sure you have the following:

1. An MCP server deployed to Azure Functions. If you don't have one yet, you can deploy one quickly by following one of the samples: Python, TypeScript, or .NET.
2. A Foundry project with a deployed model and a Foundry agent.

Authentication options

Depending on where you are in development, you can pick what makes sense and upgrade later. Here's a summary:

- Key-based (default): The agent authenticates by passing a shared function access key in the request header. This is the default authentication for HTTP endpoints in Functions. Use for development, or when Entra auth isn't required.
- Microsoft Entra: The agent authenticates using either its own identity (agent identity) or the shared identity of the Foundry project (project managed identity). Use agent identity for production scenarios, but limit shared identity to development.
- OAuth identity passthrough: The agent prompts users to sign in and authorize access, using the provided token to authenticate. Use in production, when each user must authenticate individually.
- Unauthenticated: The agent makes unauthenticated calls. Development only, or tools that access only public information.

Connect your MCP server to your Foundry agent

If your server uses key-based auth or is unauthenticated, it should be relatively straightforward to set up the connection from a Foundry agent. Microsoft Entra and OAuth identity passthrough require extra steps to set up. Check out detailed step-by-step instructions for each authentication method.
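For key-based auth, a quick smoke test from any HTTP client shows the mechanics. A minimal sketch, assuming Node 18+ with top-level await and a placeholder app name: x-functions-key is the standard Functions access-key header, and the body is a plain JSON-RPC tools/list request.

// Development-only smoke test against a key-protected MCP endpoint.
const endpoint =
  "https://<FUNCTION_APP_NAME>.azurewebsites.net/runtime/webhooks/mcp";

const res = await fetch(endpoint, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Streamable HTTP clients typically accept both response formats.
    "Accept": "application/json, text/event-stream",
    "x-functions-key": process.env.MCP_KEY ?? "", // shared function access key
  },
  body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/list" }),
});
console.log(res.status, await res.text());

This is the same key the Foundry connection supplies for you once it is configured on the tool.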
At a high level, the process looks like this:

1. Enable built-in MCP authentication: When you deploy a server to Azure Functions, key-based auth is the default. You'll need to disable that and enable built-in MCP auth instead. If you deployed one of the sample servers in the Prerequisites section, this step is already done for you.
2. Get your MCP server endpoint URL. For MCP extension-based servers, it's https://<FUNCTION_APP_NAME>.azurewebsites.net/runtime/webhooks/mcp
3. Get your credentials based on your chosen auth method: a managed identity configuration, or OAuth credentials.
4. Add the MCP server as a tool in the Foundry portal by navigating to your agent, adding a new MCP tool, and providing the endpoint and credentials.

[Screenshot: Microsoft Entra connection required fields]
[Screenshot: OAuth Identity required fields]

Once the server is configured as a tool, test it in the Agent Builder playground by sending a prompt that triggers one of your MCP tools.

Closing thoughts

What I find exciting about this is the composability. You build your MCP server once and it works everywhere: VS Code, VS, Cursor, ChatGPT, and now Foundry agents. The MCP protocol is becoming the universal interface for tool use in AI, and Azure Functions makes it easy to host these servers at scale and with security. Are you building agents with Foundry? Have you connected your MCP servers to other clients? I'd love to hear what tools you're exposing and how you're using them. Share your thoughts with us!

What's next

In the next blog post, we'll go deeper into other MCP topics and cover new MCP features and developments in Azure Functions. Stay tuned!

MCP Apps on Azure Functions: Quick Start with TypeScript
Azure Functions makes hosting MCP apps simple: build locally, create a secure endpoint, and deploy fast with Azure Developer CLI (azd). This guide shows you how, using a weather app example.

What Are MCP Apps?

MCP Apps let MCP servers return interactive HTML interfaces, such as data visualizations, forms, and dashboards, that render directly inside MCP-compatible hosts (Visual Studio Code Copilot, Claude, ChatGPT, etc.). Learn more about MCP Apps in the official documentation. An interactive UI removes many of the restrictions of plain text, for example when your scenario involves:

- Interactive Data: Replace lists with clickable maps or charts for deep exploration.
- Complex Setup: Use one-page forms instead of long, back-and-forth questioning.
- Rich Media: Embed native viewers to pan, zoom, or rotate 3D models and documents.
- Live Updates: Maintain real-time dashboards that refresh without new prompts.
- Workflow Management: Handle multi-step tasks like approvals with navigation buttons and persistent state.

MCP App Hosting as a Feature

Azure Functions provides an easy abstraction to help you build MCP servers without having to learn the nitty-gritty of the MCP protocol. When hosting your MCP App on Functions, you get:

- MCP tools (server logic): Handle client requests, call backend services, return structured data; Azure Functions manages the MCP protocol details for you.
- MCP resources (UI payloads such as app widgets): Serve interactive HTML, JSON documents, or formatted content; just focus on your UI logic.
- Secure HTTPS access: Built-in authentication using Azure Functions keys, plus built-in MCP authentication with OAuth support for enterprise-grade security.
- Easy deployment with Bicep and azd: Infrastructure as Code for reliable deployments.
- Local development: Test and debug locally before deploying.
- Auto-scaling: Azure Functions handles scaling, retries, and monitoring automatically.

The weather app in this repo is an example of this feature, not the only use case.

Architecture Overview

Example: The classic Weather App

The sample implementation includes:

- A GetWeather MCP tool that fetches weather by location (calls Open-Meteo geocoding and forecast APIs)
- A Weather Widget MCP resource that serves interactive HTML/JS code (runs in the client; fetches data via the GetWeather tool)
- A TypeScript service layer that abstracts API calls and data transformation (runs on the server)
- Bidirectional communication: the client-side UI calls server-side tools, receives data, and renders locally
- Local and remote testing flow for MCP clients (via MCP Inspector, VS Code, or custom clients)

How UI Rendering Works in MCP Apps

In the Weather App example:

1. Azure Functions serves getWeatherWidget as a resource, returning weather-app.ts compiled to HTML/JS
2. The client renders the Weather Widget UI
3. The user interacts with the widget, or requests are made internally
4. The widget calls the getWeather tool; the server processes the request and returns weather data
5. The widget renders the weather data on the client side

This architecture keeps the UI responsive locally while using server-side logic and data on demand.

Quick Start

Check out the repository: https://github.com/Azure-Samples/remote-mcp-functions-typescript

Run locally:

npm install
npm run build
func start

Local endpoint: http://0.0.0.0:7071/runtime/webhooks/mcp

Deploy to Azure:

azd provision
azd deploy

Remote endpoint: https://<FUNCTION_APP_NAME>.azurewebsites.net/runtime/webhooks/mcp

TypeScript MCP Tools Snippet (Get Weather service)

In Azure Functions, you define MCP tools using app.mcpTool(). The toolName and description tell clients what this tool does, toolProperties defines the input arguments (like location as a string), and handler points to your function that processes the request.

app.mcpTool("getWeather", {
  toolName: "GetWeather",
  description: "Returns current weather for a location via Open-Meteo.",
  toolProperties: {
    location: arg.string().describe("City name to check weather for")
  },
  handler: getWeather,
});
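For completeness, here is a plausible shape for the getWeather handler registered above. Treat the handler signature and argument access as assumptions (the exact shape comes from the @azure/functions programming model); the Open-Meteo endpoints are the real public APIs the sample calls.

// Sketch of a getWeather handler: geocode the city, then fetch the forecast.
export async function getWeather(args: { location: string }): Promise<string> {
  // 1. Geocode the city name with Open-Meteo's geocoding API.
  const geo = await fetch(
    "https://geocoding-api.open-meteo.com/v1/search?name=" +
      encodeURIComponent(args.location) + "&count=1",
  ).then((r) => r.json());
  const match = geo.results?.[0];
  if (!match) return `No location found for "${args.location}"`;

  // 2. Fetch the current weather for those coordinates.
  const wx = await fetch(
    "https://api.open-meteo.com/v1/forecast?latitude=" + match.latitude +
      "&longitude=" + match.longitude + "&current_weather=true",
  ).then((r) => r.json());

  const c = wx.current_weather;
  return JSON.stringify({ city: match.name, temperatureC: c.temperature, windKmh: c.windspeed });
}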
Resource Trigger Snippet (Weather App Hook)

MCP resources are defined using app.mcpResource(). The uri is how clients reference this resource, resourceName and description provide metadata, mimeType tells clients what type of content to expect, and handler is your function that returns the actual content (like HTML for a widget).

app.mcpResource("getWeatherWidget", {
  uri: "ui://weather/index.html",
  resourceName: "Weather Widget",
  description: "Interactive weather display for MCP Apps",
  mimeType: "text/html;profile=mcp-app",
  handler: getWeatherWidget,
});

Sample repos and references

- Complete sample repository with TypeScript implementation: https://github.com/Azure-Samples/remote-mcp-functions-typescript
- Official MCP extension documentation: https://learn.microsoft.com/azure/azure-functions/functions-bindings-mcp?pivots=programming-language-typescript
- Java sample: https://github.com/Azure-Samples/remote-mcp-functions-java
- .NET sample: https://github.com/Azure-Samples/remote-mcp-functions-dotnet
- Python sample: https://github.com/Azure-Samples/remote-mcp-functions-python
- MCP Inspector: https://github.com/modelcontextprotocol/inspector

Final Takeaway

MCP Apps are just MCP servers, but they represent a paradigm shift by transforming the AI from a text-based chatbot into a functional interface. Instead of forcing users to navigate complex tasks through back-and-forth conversations, these apps embed interactive UIs and tools directly into the chat, significantly improving the user experience and the usefulness of MCP servers. Azure Functions allows developers to quickly build and host an MCP app by providing an easy abstraction and deployment experience. The platform also provides built-in features to secure and scale your MCP apps, plus a serverless pricing model so you can just focus on the business logic.

Unit Testing Helm Charts with Terratest: A Pattern Guide for Type-Safe Validation
Helm charts are the de facto standard for packaging Kubernetes applications. But here's a question worth asking: how do you know your chart actually produces the manifests you expect, across every environment, before it reaches a cluster? If you're like most teams, the answer is some combination of helm template eyeball checks, catching issues in staging, or hoping for the best. That's slow, error-prone, and doesn't scale. In this post, we'll walk through a better way: a render-and-assert approach to unit testing Helm charts using Terratest and Go. The result? Type-safe, automated tests that run locally in seconds with no cluster required.

The Problem

Let's start with why this matters. Helm charts are templates that produce YAML, and templates have logic: conditionals, loops, value overrides per environment. That logic can break silently:

- A values-prod.yaml override points to the wrong container registry
- A security context gets removed during a refactor and nobody notices
- An ingress host is correct in dev but wrong in production
- HPA scaling bounds are accidentally swapped between environments
- Label selectors drift out of alignment with pod templates, causing orphaned ReplicaSets

These aren't hypothetical scenarios. They're real bugs that slip through helm lint and code review because those tools don't understand what your chart should produce. They only check whether the YAML is syntactically valid. These bugs surface at deploy time, or worse, in production. So how do we catch them earlier?

The Approach: Render and Assert

The idea is straightforward. Instead of deploying to a cluster to see if things work, we render the chart locally and validate the output programmatically. Here's the three-step model:

1. Render: Terratest calls helm template with your base values.yaml + an environment-specific values-<env>.yaml override
2. Unmarshal: The rendered YAML is deserialized into real Kubernetes API structs (appsV1.Deployment, coreV1.ConfigMap, networkingV1.Ingress, etc.)
3. Assert: Testify assertions validate every field that matters, including names, labels, security context, probes, resource limits, ingress routing, and more

No cluster. No mocks. No flaky integration tests. Just fast, deterministic validation of your chart's output. Here's what that looks like in practice:

// Arrange
options := &helm.Options{
    ValuesFiles: s.valuesFiles,
}
output := helm.RenderTemplate(s.T(), options, s.chartPath, s.releaseName, s.templates)

// Act
var deployment appsV1.Deployment
helm.UnmarshalK8SYaml(s.T(), output, &deployment)

// Assert: security context is hardened
secCtx := deployment.Spec.Template.Spec.Containers[0].SecurityContext
require.Equal(s.T(), int64(1000), *secCtx.RunAsUser)
require.True(s.T(), *secCtx.RunAsNonRoot)
require.True(s.T(), *secCtx.ReadOnlyRootFilesystem)
require.False(s.T(), *secCtx.AllowPrivilegeEscalation)

Notice something important here: because you're working with real Go structs, the compiler catches schema errors. If you typo a field path like secCtx.RunAsUsr, the code won't compile. With YAML-based assertion tools, that same typo would fail silently at runtime. This type safety is a big deal when you're validating complex resources like Deployments.

What to Test: 16 Patterns Across 6 Categories

That covers the how. But what should you actually assert? Through applying this approach across multiple charts, we've identified 16 test patterns that consistently catch real bugs.
They fall into six categories:

- Identity & Labels: resource names, 5 standard Helm/K8s labels, selector alignment
- Configuration: environment-specific configmap data, env var injection
- Container: image registry per env, ports, resource requests/limits
- Security: non-root user, read-only FS, dropped capabilities, AppArmor, seccomp, SA token automount
- Reliability: startup/liveness/readiness probes, volume mounts
- Networking & Scaling: ingress hosts/TLS per env, service port wiring, HPA bounds per env

You don't need all 16 on day one. Start with resource name and label validation, since those apply to every resource and catch the most common _helpers.tpl bugs. Then add security and environment-specific patterns as your coverage grows. Now, let's look at how to structure these tests to handle the trickiest part: multiple environments.

Multi-Environment Testing

One of the most common Helm chart bugs is environment drift, where values that are correct in dev are wrong in production. A single test suite that only validates one set of values will miss these entirely. The solution is to maintain separate test suites per environment:

tests/unit/my-chart/
├── dev/    ← Asserts against values.yaml + values-dev.yaml
├── test/   ← Asserts against values.yaml + values-test.yaml
└── prod/   ← Asserts against values.yaml + values-prod.yaml

Each environment's tests assert the merged result of values.yaml + values-<env>.yaml. So when your values-prod.yaml overrides the container registry to prod.azurecr.io, the prod tests verify exactly that, while the dev tests verify dev.azurecr.io. This structure catches a class of bugs that no other approach does: "it works in dev" issues where an environment-specific override has a typo, a missing field, or an outdated value. But environment-specific configuration isn't the only thing worth testing per commit. Let's talk about security.

Security as Code

Security controls in Kubernetes manifests are notoriously easy to weaken by accident. Someone refactors a deployment template, removes a securityContext block they think is unused, and suddenly your containers are running as root in production. No linter catches this. No code reviewer is going to diff every field of a rendered manifest. With this approach, you encode your security posture directly into your test suite. Every deployment test should validate:

- Container runs as non-root (UID 1000)
- Root filesystem is read-only
- All Linux capabilities are dropped
- Privilege escalation is blocked
- AppArmor profile is set to runtime/default
- Seccomp profile is set to RuntimeDefault
- Service account token automount is disabled

If someone removes a security control during a refactor, the test fails immediately, not after a security review weeks later. Security becomes a CI gate, not a review checklist. With patterns and environments covered, the next question is: how do you wire this into your CI/CD pipeline?

CI/CD Integration with Azure DevOps

These tests integrate naturally into Azure DevOps pipelines. Since they're just Go tests that call helm template under the hood, all you need is a Helm CLI and a Go runtime on your build agent.
With patterns and environments covered, the next question is: how do you wire this into your CI/CD pipeline?

CI/CD Integration with Azure DevOps

These tests integrate naturally into Azure DevOps pipelines. Since they're just Go tests that call helm template under the hood, all you need is a Helm CLI and a Go runtime on your build agent. A typical multi-stage pipeline looks like:

```yaml
stages:
  - stage: Build       # Package the Helm chart
  - stage: Dev         # Lint + test against values-dev.yaml
  - stage: Test        # Lint + test against values-test.yaml
  - stage: Production  # Lint + test against values-prod.yaml
```

Each stage uses a shared template that installs Helm and Go, extracts the packaged chart, runs helm lint, and executes the Go tests with gotestsum. Environment gates ensure production tests pass before deployment proceeds. Here's the key part of a reusable test template:

```yaml
- script: |
    export PATH=$PATH:/usr/local/go/bin:$(go env GOPATH)/bin
    go install gotest.tools/gotestsum@latest
    cd $(Pipeline.Workspace)/helm.artifact/tests/unit
    gotestsum --format testname --junitfile $(Agent.TempDirectory)/test-results.xml \
      -- ./${{ parameters.helmTestPath }}/... -count=1 -timeout 50m
  displayName: 'Test helm chart'
  env:
    HELM_RELEASE_NAME: ${{ parameters.helmReleaseName }}
    HELM_VALUES_FILE_OVERRIDE: ${{ parameters.helmValuesFileOverride }}

- task: PublishTestResults@2
  displayName: 'Publish test results'
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '$(Agent.TempDirectory)/test-results.xml'
  condition: always()
```

The PublishTestResults@2 task makes pass/fail results visible on the build's Tests tab, showing individual test names, durations, and failure details. The condition: always() ensures results are published even when tests fail, so you always have visibility. At this point you might be wondering: why Go and Terratest? Why not a simpler YAML-based tool?

Why Terratest + Go Instead of helm-unittest?

helm-unittest is a popular YAML-based alternative, and it's a fair question. Both tools are valid. Here's why we landed on Terratest:

| | Terratest + Go | helm-unittest (YAML) |
| --- | --- | --- |
| Type safety | Renders into real K8s API structs; compiler catches schema errors | String matching on raw YAML; typos in field paths fail silently |
| Language features | Loops, conditionals, shared setup, table-driven tests | Limited to YAML assertion DSL |
| Debugging | Standard Go debugger, stack traces | YAML diff output only |
| Ecosystem alignment | Same language as Terraform tests, one testing stack | Separate tool, YAML-only |

The type safety argument is the strongest. When you unmarshal into appsV1.Deployment, the Go compiler guarantees your assertions reference real fields. With helm-unittest, a YAML path like spec.template.spec.containers[0].securityContest (note the typo) would silently pass because it matches nothing, rather than failing loudly. That said, if your team has no Go experience and needs the lowest adoption barrier, helm-unittest is a reasonable starting point. For teams already using Go or Terraform, Terratest is the stronger long-term choice.

Getting Started

Ready to try this? Here's a minimal project structure to get you going:

```
your-repo/
├── charts/
│   └── your-chart/
│       ├── Chart.yaml
│       ├── values.yaml
│       ├── values-dev.yaml
│       ├── values-test.yaml
│       ├── values-prod.yaml
│       └── templates/
├── tests/
│   └── unit/
│       ├── go.mod
│       └── your-chart/
│           ├── dev/
│           ├── test/
│           └── prod/
└── Makefile
```

Prerequisites: Go 1.22+, Helm 3.14+. You'll need three Go module dependencies:

- github.com/gruntwork-io/terratest v0.46.16
- github.com/stretchr/testify v1.8.4
- k8s.io/api v0.28.4

Initialize your test module, write your first test using the patterns above, and run:

```
cd tests/unit
HELM_RELEASE_NAME=your-chart \
HELM_VALUES_FILE_OVERRIDE=values-dev.yaml \
go test -v ./your-chart/dev/... -timeout 30m
```

Start with a ConfigMap test.
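As a concrete starting point, here's a hedged sketch of what that first ConfigMap test might look like, assuming the chart has a templates/configmap.yaml and a suite like the DeploymentSuite shown earlier. The resource name and the LOG_LEVEL data key are illustrative assumptions:

```go
// Assumes a suite like DeploymentSuite from the earlier sketch and the
// import coreV1 "k8s.io/api/core/v1". The name and data key asserted
// here are placeholders, not the post's chart.
func (s *DeploymentSuite) TestConfigMapData() {
	options := &helm.Options{ValuesFiles: s.valuesFiles}
	output := helm.RenderTemplate(
		s.T(), options, s.chartPath, s.releaseName,
		[]string{"templates/configmap.yaml"},
	)

	var configMap coreV1.ConfigMap
	helm.UnmarshalK8SYaml(s.T(), output, &configMap)

	s.Require().Equal("your-chart-config", configMap.Name)
	s.Require().Equal("debug", configMap.Data["LOG_LEVEL"])
}
```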
The ConfigMap is the simplest resource type and lets you validate the full render-unmarshal-assert flow before tackling Deployments. Once that passes, work your way through the pattern categories, adding security and environment-specific assertions as you go.

Wrapping Up

Unit testing Helm charts with Terratest gives you something that helm lint and manual review can't:

- Type-safe validation: the compiler catches schema errors, not production
- Environment-specific coverage: each environment's values are tested independently
- Security as code: security controls are verified on every commit, not in periodic reviews
- Fast feedback: tests run in seconds with no cluster required
- CI/CD integration: JUnit results published natively to Azure DevOps

The patterns we've covered here are the ones that have caught the most real bugs for us. Start small with resource names and labels, and expand from there. The investment is modest, and the first time a test catches a broken values-prod.yaml override before it reaches production, it'll pay for itself.

We'd Love Your Feedback

We'd love to hear how this approach works for your team:

- Which patterns were most useful for your charts?
- What resource types or patterns are missing?
- How did the adoption experience go?

Drop a comment below. Happy to dig into any of these topics further!
Shared Agent Context: How We Are Tackling Partner Agent Collaboration

Your Azure SRE agent detects a spike in error rates. It triages with cloud-native telemetry, but the root cause trail leads into a third-party observability platform your team also runs. The agent can't see that data. A second agent can, one that speaks Datadog or Dynatrace or whatever your team chose. The two agents talk to each other using protocols like MCP or directly via an API endpoint and come up with a remediation. The harder question is what happens to the conversation afterward.

TL;DR

Two AI agents collaborate on incidents using two communication paths: a direct real-time channel (MCP) for fast investigation, and a shared memory layer that writes to systems your team already uses, like PagerDuty, GitHub Issues, or ServiceNow. No new tools to adopt. No ephemeral conversations that vanish when the incident closes.

The problem

Most operational AI agents work in isolation. Your cloud monitoring agent doesn't have access to your third-party observability stack. Your Datadog specialist doesn't know what your Azure resource topology looks like. When an incident spans both, a human has to bridge the gap manually. At 2 AM. With half the context missing.

And even when two agents do exchange information directly, the conversation is ephemeral. The investigation ends, the findings disappear. The next on-call engineer sees a resolved alert with no record of what was tried, what was found, or why the remediation worked. The next agent that hits the same pattern starts over from scratch. What we needed was somewhere for both agents to persist their findings, somewhere humans could see it too. And we really didn't want to force teams onto a new system just to get there.

Two communication paths

Direct agent-to-agent (real-time)

During an active investigation, the primary agent calls the partner agent directly. The partner runs whatever domain-specific analysis it's good at (log searches, span analysis, custom metric queries) and returns findings in real time. This is the fast path. The direct channel uses MCP, so any partner agent can plug in without custom integration work. The primary agent doesn't need to understand the internals of Datadog or Dynatrace. It asks questions, gets answers.

Shared memory (durable)

After the direct exchange, both agents write their actions and findings to external systems that humans already use. This is the durable path, the one that creates audit trails and makes handoffs work. The shared memory backends are systems your team already has open during an incident:

| Backend | What gets written | Good fit for |
| --- | --- | --- |
| Incident platform (e.g., PagerDuty) | Timeline notes, on-call handoff context | Teams with alerting-centric workflows |
| Issue tracker (e.g., GitHub Issues) | Code-level findings, root cause analysis, action comments | Teams with dev workflow integration |
| ITSM system (e.g., ServiceNow) | Work notes, ITSM-compliant audit trail | Enterprise IT, regulated industries |

The important thing: this doesn't require a new system. Agents write to whatever your team already uses.
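One way to picture that backend-agnostic design, as a sketch rather than the product's actual implementation: every backend satisfies the same small append-only interface, so swapping PagerDuty for ServiceNow is a wiring change. All names here are invented for illustration:

```go
package sharedmemory

import "context"

// Illustrative sketch of the backend-agnostic idea, not the product's
// real interface. All names here are invented for illustration.

// Note is one append-only entry in the shared investigation record.
type Note struct {
	IncidentID string
	Author     string // "primary-agent", "partner-agent", or a human
	Body       string
}

// Backend is anything that can durably append a note where humans
// already look: an incident platform, issue tracker, or ITSM system.
type Backend interface {
	Append(ctx context.Context, n Note) error
}

// Fanout writes every note to all configured backends, so adding
// GitHub Issues alongside PagerDuty is a config change, not a rewrite.
type Fanout struct {
	Backends []Backend
}

func (f Fanout) Append(ctx context.Context, n Note) error {
	for _, b := range f.Backends {
		if err := b.Append(ctx, n); err != nil {
			return err
		}
	}
	return nil
}
```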
How it works

| Step | Actor | What happens | Path |
| --- | --- | --- | --- |
| 1 | Alert source | Monitoring fires an alert | — |
| 2 | Primary agent | Receives alert, triages, starts investigating with native tools | Internal |
| 3 | Primary agent | Calls partner agent for domain-specific analysis (third-party logs, spans) | Direct via MCP or API |
| 4 | Partner agent | Runs analysis, returns findings in real time | Direct via MCP or API |
| 5 | Primary agent | Correlates partner findings with native data, runs remediation | Internal |
| 6 | Both agents | Write findings, actions, and resolution to external systems | Shared memory via existing sources |
| 7 | Agent or human | Verifies resolution, closes incident | Shared memory via existing sources |

Steps 3 through 5 happen in real time over the direct channel. Nothing gets written to shared memory until the investigation has actual results. The investigation is fast; the record-keeping is thorough.

Who does what

In this system the primary agent owns the full incident lifecycle: detection, triage, investigation, remediation, closure. The partner agent gets called when the primary agent needs to see into a part of the stack it can't access natively. It does the specialized deep-dive, returns what it found, and the primary agent takes it from there. Both agents write to shared memory and the primary agent acts on the proposed next steps.

| | Primary agent | Partner agent |
| --- | --- | --- |
| Communication | Calls partner directly; writes to shared memory after | Responds to calls; writes enrichment to shared memory |
| Scope | Full lifecycle | Domain-specific deep-dive |
| Tools | Cloud-native monitoring, CLI, runbooks, issue trackers | Third-party observability APIs |
| Typical share | ~80% of investigation + all remediation | ~20%, specialized enrichment |

Why shared context should live where humans already work

If your agent writes its findings to a system nobody checks, you've built a very expensive diary. Write them to a GitHub Issue, a ServiceNow ticket, a Jira epic, or whatever your team actually monitors, and the dynamics change: humans can participate without changing their workflow. Your team already watches these systems. When an agent posts its reasoning and pending decisions to a place engineers already check, anyone can review or correct it using the tools they know. Comments, reactions, status updates. No custom approval UI. The collaboration features built into your workflow tool become the oversight mechanism for free.

That persistence pays off in a second way. Every entry the agent writes is a record that future runs can search. Instead of context that disappears when a conversation ends, you accumulate operational history. How was this incident type handled last time? What did the agent try? What did the human override? That history is retrievable by both people and agents through the same interface, without spinning up a separate vector database.

You could build a dedicated agent database for all this. But nobody will look at it. Teams already have notifications, permissions, and audit trails configured in their existing tools. A purpose-built system means a new UI to learn, new permissions to manage, and one more thing competing for attention. Store context where people already look and you skip all of that. The best agent memory is the one your team is already reading.
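To make "writes where humans already look" concrete, here's a hedged sketch of appending an investigation note as a GitHub Issue comment over the public REST API. The repo, issue number, and note body are placeholders, and a real connector would add retries and error handling:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// Appends an investigation note as a comment on an existing GitHub
// issue. Illustrative only: repo, issue number, and note body are
// placeholders; error handling is minimal for brevity.
func main() {
	note := map[string]string{
		"body": "Partner agent findings: p99 latency regression traced to " +
			"an upstream Cloud Run revision. Full query log attached to the incident.",
	}
	payload, _ := json.Marshal(note)

	url := "https://api.github.com/repos/contoso/payments/issues/1234/comments"
	req, _ := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("GITHUB_TOKEN"))
	req.Header.Set("Accept", "application/vnd.github+json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("GitHub responded:", resp.Status) // 201 Created on success
}
```

Because the comment is additive, this also satisfies the append-only principle described next: nothing in the issue's history is overwritten.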
Design principles

A few opinions that came out of watching real incidents:

- Investigate first, persist second. The primary agent calls the partner directly for real-time analysis. Both agents write to shared memory only after findings are collected. Investigation speed should never be bottlenecked by writes to external systems.
- Humans see everything through shared context. The direct path is agent-to-agent only, but the shared context layer is where humans can see the full picture and step in. Agents don't bypass human visibility.
- Append-only. Both agents' writes are additive. No overwrites, no deletions. You can always reconstruct the full history of an investigation.
- Backend-agnostic. Swapping PagerDuty for ServiceNow, or adding GitHub Issues alongside either one, is a connector config change.

What this actually gets you

The practical upside is pretty simple: investigations aren't waiting on writes to external systems, nothing is lost when the conversation ends, and the next on-call engineer picks up where the last one left off instead of starting over. Every action from both agents shows up in the systems humans already look at. Adding a new partner agent or a new shared memory backend is a connector change. The architecture doesn't care which specific tools your team chose.

The fast path is for investigation. The durable path is for everything else.
Take Control of Every Message: Partial Failure Handling for Service Bus Triggers in Azure Functions

The Problem: All-or-Nothing Batch Processing in Azure Service Bus

Azure Service Bus is one of the most widely used messaging services for building event-driven applications on Azure. When you use Azure Functions with a Service Bus trigger in batch mode, your function receives multiple messages at once for efficient, high-throughput processing. But what happens when one message in the batch fails?

Your function receives a batch of 50 Service Bus messages. 49 process perfectly. 1 fails. What happens? In the default model, the entire batch fails. All 50 messages go back on the queue and get reprocessed, including the 49 that already succeeded. This leads to:

- Duplicate processing — messages that were already handled successfully get processed again
- Wasted compute — you pay for re-executing work that already completed
- Infinite retry loops — if that one "poison" message keeps failing, it blocks the entire batch indefinitely
- Idempotency burden — your downstream systems must handle duplicates gracefully, adding complexity to every consumer

This is the classic all-or-nothing batch failure problem. Azure Functions solves it with per-message settlement.

The Solution: Per-Message Settlement for Azure Service Bus

Azure Functions gives you direct control over how each individual message is settled in real time, as you process it. Instead of treating the batch as all-or-nothing, you settle each message independently based on its processing outcome. With Service Bus message settlement actions in Azure Functions, you can:

| Action | What It Does |
| --- | --- |
| Complete | Remove the message from the queue (successfully processed) |
| Abandon | Release the lock so the message returns to the queue for retry, optionally modifying application properties |
| Dead-letter | Move the message to the dead-letter queue (poison message handling) |
| Defer | Keep the message in the queue but make it only retrievable by sequence number |

This means in a batch of 50 messages, you can:

- Complete 47 that processed successfully
- Abandon 2 that hit a transient error (with updated retry metadata)
- Dead-letter 1 that is malformed and will never succeed

All in a single function invocation. No reprocessing of successful messages. No building failure response objects. No all-or-nothing.

Why This Matters

1. Eliminates Duplicate Processing

When you complete messages individually, successfully processed messages are immediately removed from the queue. There's no chance of them being redelivered, even if other messages in the same batch fail.

2. Enables Granular Error Handling

Different failures deserve different treatments. A malformed message should be dead-lettered immediately. A message that failed due to a transient database timeout should be abandoned for retry. A message that requires manual intervention should be deferred. Per-message settlement gives you this granularity (a sketch of this routing follows at the end of this list).

3. Implements Exponential Backoff Without External Infrastructure

By combining abandon with modified application properties, you can track retry counts per message and implement exponential backoff patterns directly in your function code, no additional queues or Durable Functions required.

4. Reduces Cost

You stop paying for redundant re-execution of already-successful work. In high-throughput systems processing millions of messages, this can be a material cost reduction.

5. Simplifies Idempotency Requirements

When successful messages are never redelivered, your downstream systems don't need to guard against duplicates as aggressively. This reduces architectural complexity and potential for bugs.
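Here's a hedged sketch of that routing logic using the Node.js extension shown later in this post. The TransientError class and the bare abandon call are illustrative assumptions; check the extension docs for the exact abandon signature before relying on it:

```typescript
import { InvocationContext } from '@azure/functions';
import { ServiceBusMessageContext } from '@azure/functions-extensions-servicebus';

// Illustrative error taxonomy: anything classified as transient gets
// retried; everything else is treated as poison.
class TransientError extends Error {}

export async function settleByOutcome(
  sbContext: ServiceBusMessageContext,
  context: InvocationContext
): Promise<void> {
  const { messages, actions } = sbContext;

  for (const message of messages) {
    try {
      await handle(message); // your business logic (placeholder below)
      await actions.complete(message); // ✅ gone for good
    } catch (error) {
      if (error instanceof TransientError) {
        await actions.abandon(message); // 🔁 back on the queue for retry
      } else {
        await actions.deadletter(message); // ☠️ never going to succeed
      }
      context.error(`Settled ${message.messageId} after failure:`, error);
    }
  }
}

async function handle(message: unknown): Promise<void> {
  // Placeholder: parse the body and do the real work here.
}
```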
Before: One Message = One Function Invocation

Before batch support, there was no cardinality option; Azure Functions processed each Service Bus message as a separate function invocation. If your queue had 50 messages, the runtime spun up 50 individual executions.

Single-Message Processing (The Old Way)

```typescript
import { app, InvocationContext } from '@azure/functions';

async function processOrder(
  message: unknown, // ← one message at a time, no batch
  context: InvocationContext
): Promise<void> {
  try {
    const order = message as Order;
    await handleOrder(order); // business logic (helper not shown)
  } catch (error) {
    context.error('Failed to process message:', error);
    // Message auto-completes by default.
    throw error;
  }
}

app.serviceBusQueue('processOrder', {
  connection: 'ServiceBusConnection',
  queueName: 'orders-queue',
  handler: processOrder,
});
```

What this cost you:

| 50 messages on the queue | Old (single-message) | New (batch + settlement) |
| --- | --- | --- |
| Function invocations | 50 separate invocations | 1 invocation |
| Connection overhead | 50 separate DB/API connections | 1 connection, reused across batch |
| Compute cost | 50× invocation overhead | 1× invocation overhead |
| Settlement control | Binary: throw or don't | 4 actions per message |

Every message paid the full price of a function invocation: startup, connection setup, teardown. At scale (millions of messages/day), this was a significant cost and latency penalty. And when a message failed, your only option was to throw (retry the whole message) or swallow the error (lose it silently).

Code Examples

Let's see how this looks across all three major Azure Functions language stacks.

Node.js (TypeScript with @azure/functions-extensions-servicebus)

```typescript
import '@azure/functions-extensions-servicebus';
import { app, InvocationContext } from '@azure/functions';
import {
  ServiceBusMessageContext,
  messageBodyAsJson,
} from '@azure/functions-extensions-servicebus';

interface Order {
  id: string;
  product: string;
  amount: number;
}

export async function processOrderBatch(
  sbContext: ServiceBusMessageContext,
  context: InvocationContext
): Promise<void> {
  const { messages, actions } = sbContext;

  for (const message of messages) {
    try {
      const order = messageBodyAsJson<Order>(message);
      await processOrder(order);
      await actions.complete(message); // ✅ Done
    } catch (error) {
      context.error(`Failed ${message.messageId}:`, error);
      await actions.deadletter(message); // ☠️ Poison
    }
  }
}

app.serviceBusQueue('processOrderBatch', {
  connection: 'ServiceBusConnection',
  queueName: 'orders-queue',
  sdkBinding: true,
  autoCompleteMessages: false,
  cardinality: 'many',
  handler: processOrderBatch,
});
```

Key points:

- Enable sdkBinding: true and autoCompleteMessages: false to gain manual settlement control
- ServiceBusMessageContext provides both the messages array and actions object
- Settlement actions: complete(), abandon(), deadletter(), defer()
- Application properties can be passed to abandon() for retry tracking
- Built-in helpers like messageBodyAsJson<T>() handle Buffer-to-object parsing

Full sample: serviceBusSampleWithComplete
Python (V2 Programming Model)

```python
import json
import logging
from typing import List

import azure.functions as func
import azurefunctions.extensions.bindings.servicebus as servicebus

app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)


@app.service_bus_queue_trigger(arg_name="messages",
                               queue_name="orders-queue",
                               connection="SERVICEBUS_CONNECTION",
                               auto_complete_messages=False,
                               cardinality="many")
def process_order_batch(messages: List[servicebus.ServiceBusReceivedMessage],
                        message_actions: servicebus.ServiceBusMessageActions):
    for message in messages:
        try:
            order = json.loads(message.body)
            process_order(order)
            message_actions.complete(message)  # ✅ Done
        except Exception as e:
            logging.error(f"Failed {message.message_id}: {e}")
            message_actions.dead_letter(message)  # ☠️ Poison


def process_order(order):
    logging.info(f"Processing order: {order['id']}")
```

Key points:

- Uses azurefunctions.extensions.bindings.servicebus for SDK-type bindings with ServiceBusReceivedMessage
- Supports both queue and topic triggers with cardinality="many" for batch processing
- Each message exposes SDK properties like body, enqueued_time_utc, lock_token, message_id, and sequence_number

Full sample: servicebus_samples_settlement

.NET (C# Isolated Worker)

```csharp
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class ServiceBusBatchProcessor(ILogger<ServiceBusBatchProcessor> logger)
{
    [Function(nameof(ProcessOrderBatch))]
    public async Task ProcessOrderBatch(
        [ServiceBusTrigger("orders-queue", Connection = "ServiceBusConnection")]
        ServiceBusReceivedMessage[] messages,
        ServiceBusMessageActions messageActions)
    {
        foreach (var message in messages)
        {
            try
            {
                var order = message.Body.ToObjectFromJson<Order>();
                await ProcessOrder(order);
                await messageActions.CompleteMessageAsync(message); // ✅ Done
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Failed {MessageId}", message.MessageId);
                await messageActions.DeadLetterMessageAsync(message); // ☠️ Poison
            }
        }
    }

    private Task ProcessOrder(Order order) => Task.CompletedTask;
}

public record Order(string Id, string Product, decimal Amount);
```

Key points:

- Inject ServiceBusMessageActions directly alongside the message array
- Each message is individually settled with CompleteMessageAsync, DeadLetterMessageAsync, or AbandonMessageAsync
- Application properties can be modified on abandon to track retry metadata

Full sample: ServiceBusReceivedMessageFunctions.cs
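Building on the C# sample, here's a hedged sketch of the retry-tracking pattern mentioned earlier: abandon with a modified application property until a maximum attempt count, then dead-letter. The "retry-count" property name and the threshold are illustrative assumptions, not a documented convention:

```csharp
// Hedged sketch: tracks attempts in an application property and
// dead-letters after MaxAttempts. The property name "retry-count"
// and the threshold are illustrative assumptions.
private const int MaxAttempts = 3;

private async Task SettleWithRetryTracking(
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions,
    Exception failure)
{
    var attempts = message.ApplicationProperties.TryGetValue("retry-count", out var raw)
        ? Convert.ToInt32(raw)
        : 0;

    if (attempts + 1 >= MaxAttempts)
    {
        // Exhausted retries: move to the dead-letter queue with context.
        await messageActions.DeadLetterMessageAsync(
            message,
            deadLetterReason: "MaxRetriesExceeded",
            deadLetterErrorDescription: failure.Message);
        return;
    }

    // Put the message back on the queue with an incremented retry counter.
    await messageActions.AbandonMessageAsync(
        message,
        new Dictionary<string, object> { ["retry-count"] = attempts + 1 });
}
```

Note that abandon makes the message immediately available again; to add a true delay between attempts you would combine this counter with deferral or a scheduled resend.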
Get started with NeuBird Hawkeye MCP server in Azure SRE Agent

Integrate NeuBird Hawkeye MCP with Azure SRE Agent

TL;DR

If your infrastructure spans multiple clouds (say Azure and GCP, or Azure alongside any other cloud provider), investigating incidents means jumping between completely separate consoles, log systems, and monitoring stacks. Azure SRE Agent now integrates with NeuBird Hawkeye via Model Context Protocol (MCP), so you can investigate incidents across all of your clouds and monitoring tools from a single conversation.

Key benefits:

- 90-second investigations vs 3-4 hours of manual dashboard-hopping
- Multi-cloud support: Azure, GCP, and other cloud providers investigated from a single conversation
- 42 MCP tools across 7 categories for investigation, analysis, and remediation
- Real-time streaming progress: watch investigations unfold step-by-step (v2.0+)
- MTTR tracking and continuous improvement metrics

The problem: incidents don't stay in one cloud

When an alert fires at 3 AM, your on-call engineer doesn't just need to find the problem; they need to figure out which cloud it's in. A single incident can involve an Azure Function calling a GCP Cloud Run service, with logs split across Azure Monitor and GCP Cloud Logging. Here's what that looks like:

| Challenge | Time Cost |
| --- | --- |
| Correlate signals across multiple monitoring tools | 30-45 minutes |
| Query logs and metrics from multiple clouds | 45-60 minutes |
| Piece together the chain of events | 30-45 minutes |
| Identify root cause and develop fixes | 60-90 minutes |
| Total | 3-4 hours |

Sound familiar? "Is it the database? The cache? The load balancer? Let me check the GCP console... now Azure Monitor... now the other logging stack... wait, what time zone is this in?"

What NeuBird Hawkeye does

NeuBird Hawkeye is an autonomous incident investigation platform that connects to your cloud providers and uses AI to:

- Investigate alerts from your monitoring tools automatically
- Query multiple data sources across cloud providers and observability platforms
- Generate detailed RCAs with incident timelines
- Provide corrective actions with ready-to-execute scripts
- Learn from your architecture through customizable instructions

Supported integrations:

| Category | Platforms |
| --- | --- |
| Cloud Providers | Azure, Google Cloud Platform, AWS |
| Monitoring Tools | Datadog, Grafana, Dynatrace, New Relic |
| Incident Management | PagerDuty, ServiceNow, FireHydrant, Incident.io |
| Log Aggregation | CloudWatch, Azure Monitor, Google Cloud Logging |

How the integration works

With the new Hawkeye MCP server integration, Azure SRE Agent leverages Hawkeye's autonomous investigation capabilities through natural language conversation.

What is Model Context Protocol (MCP)?

Model Context Protocol is an open standard that enables AI agents to securely connect to external tools and data sources. It's like a universal adapter for connecting LLMs to the real world. How it connects: your SRE Agent can now talk to Hawkeye, which talks to Azure, GCP, and other cloud platforms, all through a single conversational interface.
42 MCP tools across 7 categories

| Category | Tool Count | Examples |
| --- | --- | --- |
| Projects | 5 | List, create, update, delete projects |
| Connections | 10 | Azure, GCP, and other cloud provider connections |
| Investigations | 11 | List alerts, investigate, monitor progress, RCA, follow-ups |
| Instructions | 7 | Create, test, validate, apply investigation rules |
| Analytics | 4 | MTTR, time saved, quality scores, session reports |
| Discovery | 2 | Explore resources across all project connections |
| Help | 1 | Interactive guidance system |

Remote MCP server

Azure SRE Agent connects to the hosted Hawkeye MCP server via Streamable-HTTP transport. No local installation or Node.js required.

| Mode | Best For | How It Works |
| --- | --- | --- |
| Streamable-HTTP | All users, teams, enterprise | Connect to the hosted MCP endpoint at https://<your-deployment-name>.app.neubird.ai/mcp |

The endpoint supports secure authentication via email/password headers or bearer tokens (JWT).

Example investigation walkthrough

Here's an example investigation with Azure SRE Agent + Hawkeye:

Step 1: List Uninvestigated Alerts

User: Show me uninvestigated alerts from the last 24 hours

Hawkeye response:

```
Hawkeye Projects:
- Production (UUID: b4b902d4-a540-466b-96cf-4ec2f734b010)

Uninvestigated Alerts (Production):
1. [P1] High CPU usage on api-service pods - 3 instances affected
2. [P2] Database connection pool exhausted - payment-service
3. [P3] Increased error rate in user-authentication endpoint
```

Step 2: Investigate an Alert

User: Investigate the high CPU alert

The agent calls hawkeye_investigate_alert and streams real-time progress (v2.0+):

```
Preparing telemetry sources             [2s]
Loading investigation context           [3s]
Classifying datasources (Step 1)        [18s]
Fine-tuning datasources                 [26s]
Consulting telemetry (Query 7)          [1m 6s]
Analyzing telemetry data (Analysis 12)  [1m 13s]
Summarizing results                     [1m 29s]
Investigation completed! Here's what I found...
```

Step 3: Get Root Cause Analysis

User: Show me the root cause analysis

Hawkeye provides a comprehensive RCA:

Incident summary

| Field | Value |
| --- | --- |
| Severity | P1 |
| Duration | 45 minutes |
| Status | Resolved |
| Affected Services | api-service (3 pods) |

Root cause: a memory leak in the api-service v2.3.1 deployment caused garbage collection pressure, leading to CPU exhaustion. The leak originated from unclosed database connections in the /users/search endpoint.

Timeline

| Time | Event |
| --- | --- |
| 14:00 UTC | Deployment of api-service v2.3.1 completed |
| 14:15 UTC | Memory usage begins climbing (500MB → 1.2GB) |
| 14:30 UTC | GC pressure increases, CPU spikes to 95%+ |
| 14:35 UTC | Alert fired: High CPU usage |
| 14:45 UTC | Auto-scaling triggered (3 → 6 pods) |
| 14:50 UTC | Investigation identified unclosed connections |
| 14:55 UTC | Rollback initiated to v2.3.0 |

Corrective actions

Immediate fix:

```bash
# Rollback to previous stable version
kubectl set image deployment/api-service api-service=api-service:v2.3.0

# Force restart to clear leaked connections
kubectl rollout restart deployment/api-service
```

Preventive measures:

- Add connection pool monitoring to your cloud monitoring dashboard
- Implement database connection lifecycle tests in CI pipeline
- Add memory leak detection to staging environment tests

Time saved

| Metric | Value |
| --- | --- |
| Manual investigation estimate | 3-4 hours |
| Hawkeye investigation time | 90 seconds |
| Time saved | ~3.5 hours |

Step 4: Follow-Up Questions

The conversation doesn't stop at the RCA. You can ask intelligent follow-ups:

User: Has this happened before?

Agent: Checking investigation history... Yes, a similar pattern was detected 2 weeks ago in the same service. The previous incident was also caused by unclosed database connections, but in a different endpoint (/orders/list).
Recommendation: Implement connection lifecycle testing across all endpoints.

User: What tests should we add to catch this?

Agent: Based on the RCA, I recommend:

1. Connection pool exhaustion test:
   - Simulate high request volume to database-heavy endpoints
   - Monitor for unclosed connections over 1000 requests
2. Memory leak detection:
   - Run load tests with memory profiling enabled
   - Alert if memory growth exceeds 20% over a 10-minute baseline
3. Integration test for connection lifecycle:
   - Verify all database connections are properly closed
   - Check connection pool statistics after each test run

Key capabilities

1. Multi-Cloud Investigation

Hawkeye unifies investigation across all your connected infrastructure. Example scenario: an alert fires about an Azure Function returning errors. Hawkeye can:

- Query Azure Monitor for function execution metrics
- Check GCP Cloud Logging for upstream API errors on the GCP side
- Review GCP Cloud Monitoring metrics for the dependent Cloud Run service
- Correlate with recent deployments in GitHub Actions or Azure DevOps

"Finally, one place to investigate instead of 7 browser tabs!"

2. Instruction Management

Customize how Hawkeye investigates incidents by creating instructions:

| Instruction Type | Purpose | Example |
| --- | --- | --- |
| SYSTEM | Provide architecture context | "We use microservices on Kubernetes with PostgreSQL and Redis" |
| FILTER | Reduce investigation noise | "Only investigate P1 and P2 incidents" |
| RCA | Guide investigation steps | "For database issues, check slow queries and connection pools first" |
| GROUPING | Group related alerts | "Group alerts from the same service within 5 minutes" |

Instruction testing workflow

Before deploying instructions to production, test them on past investigations:

| Step | Action | Tool |
| --- | --- | --- |
| 1 | Validate content | hawkeye_validate_instruction |
| 2 | Apply to test session | hawkeye_apply_session_instruction |
| 3 | Rerun investigation | hawkeye_rerun_session |
| 4 | Compare RCAs | Manual review |
| 5 | Measure improvement | Check quality score |
| 6 | Deploy if better | hawkeye_create_project_instruction |

Note: test instruction changes on historical data before applying them to live investigations. No more "oops, that filter was too aggressive!"

3. Analytics and Continuous Improvement

Track the effectiveness of your incident response process:

| Metric | What It Measures |
| --- | --- |
| MTTR | Mean Time to Resolution |
| Time Saved | Efficiency gains vs manual investigation |
| Quality Score | Accuracy and completeness of RCAs |
| Noise Reduction | Percentage of duplicate/grouped alerts |

Use cases for analytics:

- Justify investment in SRE tooling to leadership
- Demonstrate continuous improvement over time
- Identify patterns in recurring incidents
- Measure impact of instruction changes

4. Proactive Investigation

You don't need an alert to investigate. Create manual investigations for proactive analysis:

User: Investigate potential memory leak in user-api pods. Memory usage increased from 500MB to 1.2GB between 8am-10am UTC today.

Hawkeye will:

- Query metrics for the specified time range
- Correlate with deployment events
- Check for similar patterns in the past
- Provide root cause analysis and recommendations

When to use proactive investigation:

| Use Case | Example |
| --- | --- |
| Pre-production testing | "Investigate performance regression in staging" |
| Performance analysis | "Why did latency increase after the last deploy?" |
| Capacity planning | "Analyze memory growth trends over the past month" |
| Post-incident deep dive | "What else happened during that outage?" |
Setup guide

Prerequisites

- Azure SRE Agent resource
- Active Hawkeye account (contact NeuBird to get started)
- At least one connected cloud provider in Hawkeye (Azure, GCP, etc.)

Step 1: Add the Remote MCP Connector

1. Navigate to your SRE Agent at sre.azure.com (e.g., https://sre.azure.com/agents/subscriptions/3eaf90b4-f4fa-416e-a0aa-ac2321d9decb/resourceGroups/sre-agent/providers/Microsoft.App/agents/dbandaru-pagerduty)
2. Go to Builder > Connectors
3. Click Add connector > MCP server (User provided connector)

| Field | Value |
| --- | --- |
| Name | hawkeye-mcp |
| Connection type | Streamable-HTTP |
| URL | https://<your-deployment-name>.app.neubird.ai/mcp |
| Authentication | Custom headers |

Authentication headers:

| Header | Value |
| --- | --- |
| X-Hawkeye-Email | Your Hawkeye email |
| X-Hawkeye-Password | Your Hawkeye password |

Or use a bearer token (JWT) for CI/CD:

| Header | Value |
| --- | --- |
| Authorization | Bearer <your-jwt-token> |

To obtain a bearer token:

```bash
curl -s -X POST "https://<your-deployment-name>.app.neubird.ai/api/v1/user/login" \
  -H "Content-Type: application/json" \
  -d '{"email": "your@email.com", "password": "your-password"}' \
  | jq -r '.access_token'
```

Step 2: Create a Hawkeye skill

After adding the connector, create a skill that knows how to use the Hawkeye tools. The skill has a system prompt tuned for incident investigation and a reference to your MCP connector.

1. In the left navigation, select Builder > Skills
2. Click Add skill
3. Paste the following YAML configuration (see below)
4. Click Save

```yaml
api_version: azuresre.ai/v1
kind: AgentConfiguration
metadata:
  owner: your-team@contoso.com
  version: "1.0.0"
spec:
  name: HawkeyeInvestigator
  display_name: Hawkeye Incident Investigator
  system_prompt: |
    You are an incident investigation specialist with access to NeuBird
    Hawkeye's autonomous investigation platform.

    ## Capabilities

    ### Finding alerts
    - List uninvestigated alerts from the last N hours/days
    - Filter alerts by severity (P1, P2, P3, P4)
    - Search alerts by keyword or service name

    ### Running investigations
    - Investigate existing alerts by alert ID
    - Create manual investigations for proactive analysis
    - Monitor investigation progress in real-time

    ### Root cause analysis
    - Retrieve detailed RCA reports with incident timelines
    - View chain of thought and reasoning
    - Get data sources and queries consulted
    - Ask follow-up questions about incidents

    ### Remediation
    - Execute corrective action scripts
    - Implement preventive measures
    - Generate post-mortem documentation

    ### Project management
    - List and switch between Hawkeye projects
    - View connected data sources and sync status
    - Create and manage investigation instructions
    - Get organization-wide incident analytics (MTTR, time saved)

    ## Best practices
    - Start with uninvestigated alerts from the last 24 hours
    - Investigations typically complete in 30-90 seconds
    - First investigation may take 5-10 minutes while connections sync
    - Review corrective actions before executing

    ## Permissions
    All investigations use the connected data sources in your Hawkeye
    project. Ensure connections are properly synced before investigating.
  mcp_connectors:
    - hawkeye-mcp
  handoffs: []
```

The mcp_connectors field references the connector name from Step 1. This gives the skill access to all 42 Hawkeye tools.

Customizing the skill: edit the system prompt to match your team's workflow. For example, add instructions like "Always check P1 alerts first" or "Include deployment history in every investigation." The YAML above is a starting point.
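If you want to sanity-check the endpoint before wiring up the connector, a hedged sketch of a connectivity probe follows. The JSON-RPC shape comes from the generic MCP specification (tools/list), not from Hawkeye-specific documentation, and many MCP servers require an initialize handshake first, so treat this as a sketch rather than a documented API:

```bash
# Illustrative connectivity check, assuming the server accepts a bare
# JSON-RPC tools/list call. Some MCP servers expect an `initialize`
# handshake first; verify against NeuBird's docs before scripting this.
TOKEN=$(curl -s -X POST "https://<your-deployment-name>.app.neubird.ai/api/v1/user/login" \
  -H "Content-Type: application/json" \
  -d '{"email": "your@email.com", "password": "your-password"}' \
  | jq -r '.access_token')

curl -s "https://<your-deployment-name>.app.neubird.ai/mcp" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | jq .
```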
Step 3: Test the Integration

1. Open a chat session with your SRE Agent
2. Type /agent and select HawkeyeInvestigator
3. Try these prompts:

- Show me uninvestigated alerts from the last 24 hours
- List all Hawkeye projects and their connections
- Investigate the first P1 alert
- Show me the root cause analysis
- What corrective actions are recommended?
- Has this happened before?

Security

Authentication methods

| Method | Headers | Best For |
| --- | --- | --- |
| Email/Password | X-Hawkeye-Email + X-Hawkeye-Password | Simple setup, most use cases |
| Bearer Token (JWT) | Authorization: Bearer <token> | CI/CD pipelines, OAuth, enterprise |

Data security

- Encrypted traffic: HTTPS with TLS 1.2+
- Read-only access to cloud providers and monitoring tools
- SOC 2 compliant: secure data processing environment
- RBAC support: role-based access at project level

Access controls

- Each user authenticates with their own Hawkeye credentials
- Investigations scoped to connected data sources in your project
- Respects existing IAM and RBAC policies

Security note: store credentials in environment variables, never in config files. Hawkeye only needs read access to investigate.

Available MCP tools (42)

Project tools (5)

| Tool | Description |
| --- | --- |
| hawkeye_list_projects | List all Hawkeye projects |
| hawkeye_create_project | Create a new project |
| hawkeye_get_project_details | Get project configuration |
| hawkeye_update_project | Update project name or description |
| hawkeye_delete_project | Delete a project (requires confirmation) |

Connection tools (10)

| Tool | Description |
| --- | --- |
| hawkeye_list_connections | List all available connections |
| hawkeye_create_aws_connection | Create AWS connection with IAM role |
| hawkeye_create_datadog_connection | Create Datadog connection with API keys |
| hawkeye_wait_for_connection_sync | Wait for connection to reach SYNCED state |
| hawkeye_add_connection_to_project | Link connections to a project |
| hawkeye_list_project_connections | List connections for a specific project |
| + 4 additional tools | Azure, GCP, and other connections |

Investigation tools (11)

| Tool | Description |
| --- | --- |
| hawkeye_list_sessions | List investigation sessions with filtering |
| hawkeye_investigate_alert | Investigate an alert (supports real-time streaming) |
| hawkeye_create_manual_investigation | Create investigation from custom prompt (supports streaming) |
| hawkeye_get_investigation_status | Get real-time progress with step-by-step breakdown |
| hawkeye_get_rca | Retrieve root cause analysis |
| hawkeye_continue_investigation | Ask follow-up questions on completed investigations |
| hawkeye_get_chain_of_thought | View investigation reasoning steps |
| hawkeye_get_investigation_sources | List data sources consulted |
| hawkeye_get_investigation_queries | List queries executed during investigation |
| hawkeye_get_follow_up_suggestions | Get suggested follow-up questions |
| hawkeye_get_rca_score | Get investigation quality score |

Instruction tools (7)

| Tool | Description |
| --- | --- |
| hawkeye_list_project_instructions | List project instructions with type/status filtering |
| hawkeye_create_project_instruction | Create SYSTEM/FILTER/RCA/GROUPING instruction |
| hawkeye_validate_instruction | Validate instruction content before applying |
| hawkeye_apply_session_instruction | Apply instruction to session for testing |
| hawkeye_rerun_session | Rerun investigation with updated instructions |
| + 2 additional tools | Update and delete instructions |

Analytics tools (4)

| Tool | Description |
| --- | --- |
| hawkeye_get_incident_report | Get organization-wide analytics (MTTR, time saved) |
| hawkeye_inspect_session | Get session metadata |
| hawkeye_get_session_report | Get summary reports for multiple sessions |
| hawkeye_get_session_summary | Get detailed analysis and scoring for a session |
Discovery tools (2)

| Tool | Description |
| --- | --- |
| hawkeye_discover_project_resources | Explore available resources across all project connections |
| hawkeye_list_connection_resource_types | Get resource types for connection type and telemetry type |

Help tools (1)

| Tool | Description |
| --- | --- |
| hawkeye_get_guidance | Interactive help system with embedded knowledge base |

Use cases

1. Faster Incident Response

| Phase | Before Hawkeye | After Hawkeye |
| --- | --- | --- |
| Alert detection | Alert notification | Alert notification |
| Investigation | Log into multiple cloud consoles | Ask: "Investigate this alert" |
| Correlation | Manual log/metric analysis | Automated multi-source query |
| Root cause | 2-4 hours | 2-3 minutes |
| Remediation | Write runbook, execute | Copy/paste bash script, execute |

Result: roughly 95% reduction in MTTR for common incident types.

2. Knowledge Retention

The problem:

- Senior engineer leaves
- Tribal knowledge lost
- Junior engineers struggle with same issues

The Hawkeye solution:

- Capture investigation patterns through instructions
- Preserve institutional knowledge in reusable rules
- Train new engineers with past investigation history

3. Reduced Toil

Common repetitive investigations:

| Issue Type | Manual Time | Hawkeye Time | Frequency |
| --- | --- | --- | --- |
| Database connection issues | 2 hours | 90 seconds | 3x/week |
| Pod restart loops | 1.5 hours | 60 seconds | 5x/week |
| Deployment failures | 3 hours | 2 minutes | 2x/week |

Result: engineers spend more time on prevention and architecture, less on firefighting.

4. Cross-Team Collaboration

Platform team provides:

- SYSTEM instructions describing architecture
- FILTER instructions for noise reduction
- RCA instructions for common patterns

Application team benefits:

- Investigations leverage platform context
- No need for deep infrastructure knowledge
- Consistent incident response across teams

5. Continuous Learning

Track and improve over time:

| Month | MTTR | Time Saved | Quality Score | Noise Reduction |
| --- | --- | --- | --- | --- |
| Month 1 | 45 min | 15 hours | 7.2/10 | 20% |
| Month 3 | 12 min | 45 hours | 8.5/10 | 55% |
| Month 6 | 3 min | 90 hours | 9.1/10 | 78% |

Result: data-driven improvement of incident response processes.

Next steps

The Hawkeye MCP integration is available now for all Azure SRE Agent customers.

Get started

1. Contact NeuBird to set up a Hawkeye account
2. Connect your cloud providers (Azure, GCP, etc.)
3. Add the Hawkeye MCP connector to your SRE Agent
4. Create a Hawkeye skill in Builder > Skills
5. Start investigating!

Learn more

- Hawkeye MCP documentation
- Tool reference (all 42 tools)
- Advanced workflows
- hawkeye-mcp-server on npm
- NeuBird help documentation
- Azure SRE Agent MCP integration guide
- NeuBird AI

Need OAuth support? Contact NeuBird support: support@neubird.ai

Try it out

Ready to get started? Quick start checklist:

1. Sign up for Hawkeye at https://neubird.ai/contact-us/
2. Connect your cloud infrastructure (Azure, GCP, etc.)
3. Install the MCP connector in Azure SRE Agent
4. Create a Hawkeye skill in Builder > Skills
5. Test with "Show me uninvestigated alerts"
6. Investigate your first incident in under 2 minutes!

Questions? Drop a comment below or reach out to the Azure SRE Agent team. Want to see Hawkeye in action? Request a demo from NeuBird: https://neubird.ai/contact-us/

Azure SRE Agent helps SRE teams build automated incident response workflows. Learn more at aka.ms/sreagent.

Tags: #Azure #SREAgent #NeuBird #Hawkeye #MCP #IncidentResponse #DevOps #SRE #AI #Automation #CloudOps #MTTR #RootCauseAnalysis
Migrating Ant Builds to Maven with GitHub Copilot app modernization

Many legacy Java applications still rely on Apache Ant for building, packaging, and dependency management. While Ant remains flexible, it lacks the structured lifecycle, dependency resolution, and ecosystem support that modern build tools like Maven provide. Migrating from Ant to Maven improves maintainability, build reproducibility, and IDE compatibility, and enables modern Java workflows such as dependency upgrades, framework updates, and containerization. GitHub Copilot app modernization accelerates this transition by analyzing an Ant-based project, generating a migration plan, and applying transformations to produce a Maven-based build aligned with modern Java tooling.

What GitHub Copilot app modernization Supports

GitHub Copilot app modernization can help teams:

- Detect Ant build scripts (build.xml) and related custom task files
- Recommend Maven project structure and lifecycle alignment
- Generate an initial pom.xml with matched project metadata
- Map Ant targets to Maven phases where possible
- Identify external dependencies and translate them into Maven coordinates
- Migrate resource directories and compiled output locations
- Surface code or configuration changes required for a Maven-driven build
- Validate the new Maven configuration through iterative builds

This modernizes the build foundation before performing other upgrades such as JDK, Spring, Jakarta, or container-readiness transformations.

Project Analysis

When you open an Ant-based project in Visual Studio Code or IntelliJ IDEA, GitHub Copilot app modernization performs an analysis:

- Detects build.xml and auxiliary Ant scripts
- Identifies classpaths defined across Ant targets
- Evaluates manually referenced JARs in lib directories
- Inspects source layout and output directories
- Determines project metadata such as groupId, artifactId, and version
- Determines whether frameworks or libraries require updates before Maven migration

This analysis forms the basis of the migration plan.

Migration Plan Generation

GitHub Copilot app modernization produces a migration plan that outlines:

- The recommended Maven project layout (src/main/java, src/test/java, resources directories)
- A generated pom.xml with discovered dependencies
- Mapped Ant targets to Maven lifecycle phases (compile, test, package)
- Plugin configurations needed to replicate custom Ant functionality
- Suggested removal of lib directory JARs in favor of dependency management
- Notes on unsupported or manual-review areas (custom Ant tasks, script-heavy targets, specialized packaging logic)

You can review and adjust the plan before proceeding.

Automated Transformations

Once confirmed, GitHub Copilot app modernization applies targeted updates:

- Generates the project's pom.xml
- Migrates dependency JAR references to Maven dependency entries
- Moves source and resource files into a Maven-compatible structure
- Updates ignore files, build output directories, and paths
- Introduces common Maven plugins for compiler, surefire, assembly, or shading
- Suggests replacements for custom Ant tasks if built-in Maven plugins exist

This automated work removes most of the manual lifting normally required for Ant-to-Maven transitions.

Build & Fix Iteration

After applying the transformations, the tool attempts to build the new Maven project:

1. Runs the build
2. Captures missing dependencies, incorrect scopes, or misaligned plugin versions
3. Suggests targeted fixes
4. Applies adjustments and rebuilds
5. Iterates until the project compiles or no further automated fixes are possible

This helps stabilize the migration quickly.
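To make the target concrete, here's the general shape of a minimal pom.xml such a migration might produce for a simple Ant project. This is illustrative only: the coordinates, Java version, and dependency below are assumptions, and the actual generated file depends on what the analysis discovers in your build.xml and lib directory:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <!-- Metadata inferred from the Ant project; placeholders here. -->
  <groupId>com.example</groupId>
  <artifactId>legacy-app</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <!-- lib/*.jar references translated into managed coordinates. -->
  <dependencies>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.11.0</version>
    </dependency>
  </dependencies>
</project>
```

The key shift is that the javac and jar logic scattered across Ant targets collapses into the standard Maven lifecycle, so `mvn package` replaces the chain of custom targets.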
Security & Behavior Validation GitHub Copilot app modernization also performs additional validation: Flags CVEs introduced or resolved through dependency discovery Alerts you to behavioral differences between Ant‑driven and Maven‑driven builds Highlights test failures, packaging differences, or altered classpaths that may need review These findings allow developers to refine the migration safely. Expected Output After the migration, you can expect: A newly generated and fully structured Maven project A populated pom.xml with dependencies, plugins, and metadata Updated project layout aligned with Maven standards Removed or deprecated Ant build files where appropriate Aligned dependency versions ready for further modernization A summary file detailing: Build changes Dependency mappings Code or config adjustments Remaining manual review items Developer Responsibilities While GitHub Copilot app modernization automates the mechanical migration from Ant to Maven, developers remain responsible for: Reviewing tests and build artifacts for behavioral differences Validating packaging steps for WAR/EAR/JAR outputs Replacing complex custom Ant scripts with proper Maven plugins Verifying deployment and CI workflows dependent on Ant build logic Confirming integration points that rely on Ant‑specific tasks or ordering Once validated, the Maven‑based structure becomes a strong foundation for further modernization such as JDK upgrades, Spring migration, Jakarta adoption, and containerization. Learn More For project setup and the complete modernization workflow, refer to the Microsoft Learn guide for upgrading Java projects with GitHub Copilot app modernization. Quickstart: Upgrade a Java Project with GitHub Copilot App Modernization | Microsoft Learn146Views1like0CommentsProactive Health Monitoring and Auto-Communication Now Available for Azure Container Registry
Proactive Health Monitoring and Auto-Communication Now Available for Azure Container Registry

Today, we're introducing Azure Container Registry's (ACR) latest service health enhancement: automated auto-communication through Azure Service Health alerts. When ACR detects degradation in critical operations—authentication, image push, and pull—your teams are now proactively notified through Azure Service Health, delivering better transparency and faster communication without waiting for manual incident reporting. For platform teams, SRE organizations, and enterprises with strict SLA requirements, this means container registry health events are now communicated automatically and integrated into your existing incident management and observability workflows.

Background: Why Registry Availability Matters

Container registries sit at the heart of modern software delivery. Every CI/CD pipeline build, every Kubernetes pod startup, and every production deployment depends on the ability to authenticate, push artifacts, and pull images reliably. When a registry experiences degradation—even briefly—the downstream impact can cascade quickly: failed pipelines, delayed deployments, and application startup failures across multiple clusters and environments.

Until now, ACR customers discovered service issues primarily through two paths: monitoring their own workloads for symptoms (failed pulls, auth errors), or checking the Azure Status page reactively. Neither approach gives your team the head start needed to coordinate an effective response before impact is felt.

Auto-Communication Through Azure Service Health Alerts

ACR now provides faster communication when:

- Degradation is detected in your region
- Automated remediation is in progress
- Engineering teams have been engaged and are actively mitigating

These notifications arrive through Azure Service Health, the same platform your teams already use to track planned maintenance and health advisories across all your Azure resources. You receive timely visibility into registry health events—with rich context including tracking IDs, affected regions, impacted resources, and mitigation timelines—without needing to open a support request or continuously monitor dashboards.

Who Benefits

This capability delivers value across every team that depends on container registry availability:

- Enterprise platform teams managing centralized registries for large organizations will receive early warning before CI/CD pipelines begin failing across hundreds of development teams.
- SRE organizations can integrate ACR health signals into their existing incident management workflows—via webhook integration with PagerDuty, Opsgenie, ServiceNow, and similar tools—rather than relying on synthetic monitoring or customer reports.
- Teams with strict SLA requirements can now correlate production incidents with documented ACR service events, supporting post-incident reviews and customer communication.
- All ACR customers gain a level of registry observability that previously required custom monitoring infrastructure to approximate.

A Part of ACR's Broader Observability Strategy

Automated Service Health auto-communication is one component of ACR's ongoing investment in service health and observability. Combined with Azure Monitor metrics, diagnostic logs, and events, Service Health alerts give your teams a layered observability posture:

| Signal | What It Tells You |
| --- | --- |
| Service Health alerts | ACR-wide service events in your regions, with official mitigation status |
| Azure Monitor metrics | Registry-level request rates, success rates, and storage utilization (available soon) |
| Diagnostic logs | Repository and operation-level audit trail |
What's next: we are working on exposing additional ACR metrics through Azure Monitor, giving you deeper visibility into registry operations—such as authentication, pull and push API requests, and error breakdowns—directly in the Azure portal. This will enable self-service diagnostics, allowing your teams to investigate and troubleshoot registry issues independently without opening a support request.

Getting Started

To configure Service Health alerts for ACR, navigate to Service Health in the Azure portal, create an alert rule filtering on Container Registry, and attach an action group with your preferred notification channels (email, SMS, webhook). Alerts can also be created programmatically via ARM templates or Bicep for infrastructure-as-code workflows. For the full step-by-step setup guide—including recommended alert configurations for production-critical, maintenance awareness, and comprehensive monitoring scenarios—see Configure Service Health alerts for Azure Container Registry.
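For the infrastructure-as-code route, here's a hedged Bicep sketch of a Service Health activity log alert filtered to Container Registry. The impactedServices service-name string and the API version are assumptions; a portal-exported template from a manually created alert is the authoritative reference:

```bicep
// Hedged sketch: a Service Health activity log alert scoped to the
// subscription and filtered to Container Registry events. The service
// name value and API version are assumptions; verify against a
// portal-exported template before deploying.
@description('Resource ID of an existing action group (email/SMS/webhook).')
param actionGroupId string

resource acrServiceHealthAlert 'Microsoft.Insights/activityLogAlerts@2020-10-01' = {
  name: 'acr-service-health-alert'
  location: 'global'
  properties: {
    enabled: true
    scopes: [
      subscription().id
    ]
    condition: {
      allOf: [
        {
          field: 'category'
          equals: 'ServiceHealth'
        }
        {
          field: 'properties.impactedServices[*].ServiceName'
          containsAny: [
            'Container Registry'
          ]
        }
      ]
    }
    actions: {
      actionGroups: [
        {
          actionGroupId: actionGroupId
        }
      ]
    }
  }
}
```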