best practices
1681 TopicsThe Agent that investigates itself
Azure SRE Agent handles tens of thousands of incident investigations each week for internal Microsoft services and external teams running it for their own systems. Last month, one of those incidents was about the agent itself. Our KV cache hit rate alert started firing. Cached token percentage was dropping across the fleet. We didn't open dashboards. We simply asked the agent. It spawned parallel subagents, searched logs, read through its own source code, and produced the analysis. First finding: Claude Haiku at 0% cache hits. The agent checked the input distribution and found that the average call was ~180 tokens, well below Anthropic’s 4,096-token minimum for Haiku prompt caching. Structurally, these requests could never be cached. They were false positives. The real regression was in Claude Opus: cache hit rate fell from ~70% to ~48% over a week. The agent correlated the drop against the deployment history and traced it to a single PR that restructured prompt ordering, breaking the common prefix that caching relies on. It submitted two fixes: one to exclude all uncacheable requests from the alert, and the other to restore prefix stability in the prompt pipeline. That investigation is how we develop now. We rarely start with dashboards or manual log queries. We start by asking the agent. Three months earlier, it could not have done any of this. The breakthrough was not building better playbooks. It was harness engineering: enabling the agent to discover context as the investigation unfolded. This post is about the architecture decisions that made it possible. Where we started In our last post, Context Engineering for Reliable AI Agents: Lessons from Building Azure SRE Agent, we described how moving to a single generalist agent unlocked more complex investigations. The resolution rates were climbing, and for many internal teams, the agent could now autonomously investigate and mitigate roughly 50% of incidents. We were moving in the right direction. But the scores weren't uniform, and when we dug into why, the pattern was uncomfortable. The high-performing scenarios shared a trait: they'd been built with heavy human scaffolding. They relied on custom response plans for specific incident types, hand-built subagents for known failure modes, and pre-written log queries exposed as opaque tools. We weren’t measuring the agent’s reasoning – we were measuring how much engineering had gone into the scenario beforehand. On anything new, the agent had nowhere to start. We found these gaps through manual review. Every week, engineers read through lower-scored investigation threads and pushed fixes: tighten a prompt, fix a tool schema, add a guardrail. Each fix was real. But we could only review fifty threads a week. The agent was handling ten thousand. We were debugging at human speed. The gap between those two numbers was where our blind spots lived. We needed an agent powerful enough to take this toil off us. An agent which could investigate itself. Dogfooding wasn't a philosophy - it was the only way to scale. The Inversion: Three bets The problem we faced was structural - and the KV cache investigation shows it clearly. The cache rate drop was visible in telemetry, but the cause was not. The agent had to correlate telemetry with deployment history, inspect the relevant code, and reason over the diff that broke prefix stability. We kept hitting the same gap in different forms: logs pointing in multiple directions, failure modes in uninstrumented paths, regressions that only made sense at the commit level. Telemetry showed symptoms, but not what actually changed. We'd been building the agent to reason over telemetry. We needed it to reason over the system itself. The instinct when agents fail is to restrict them: pre-write the queries, pre-fetch the context, pre-curate the tools. It feels like control. In practice, it creates a ceiling. The agent can only handle what engineers anticipated in advance. The answer is an agent that can discover what it needs as the investigation unfolds. In the KV cache incident, each step, from metric anomaly to deployment history to a specific diff, followed from what the previous step revealed. It was not a pre-scripted path. Navigating towards the right context with progressive discovery is key to creating deep agents which can handle novel scenarios. Three architectural decisions made this possible – and each one compounded on the last. Bet 1: The Filesystem as the Agent's World Our first bet was to give the agent a filesystem as its workspace instead of a custom API layer. Everything it reasons over – source code, runbooks, query schemas, past investigation notes – is exposed as files. It interacts with that world using read_file, grep, find, and shell. No SearchCodebase API. No RetrieveMemory endpoint. This is an old Unix idea: reduce heterogeneous resources to a single interface. Coding agents already work this way. It turns out the same pattern works for an SRE agent. Frontier models are trained on developer workflows: navigating repositories, grepping logs, patching files, running commands. The filesystem is not an abstraction layered on top of that prior. It matches it. When we materialized the agent’s world as a repo-like workspace, our human "Intent Met" score - whether the agent's investigation addressed the actual root cause as judged by the on-call engineer - rose from 45% to 75% on novel incidents. But interface design is only half the story. The other half is what you put inside it. Code Repositories: the highest-leverage context Teams had prewritten log queries because they did not trust the agent to generate correct ones. That distrust was justified. Models hallucinate table names, guess column schemas, and write queries against the wrong cluster. But the answer was not tighter restriction. It was better grounding. The repo is the schema. Everything else is derived from it. When the agent reads the code that produces the logs, query construction stops being guesswork. It knows the exact exceptions thrown, and the conditions under which each path executes. Stack traces start making sense, and logs become legible. But beyond query grounding, code access unlocked three new capabilities that telemetry alone could not provide: Ground truth over documentation. Docs drift and dashboards show symptoms. The code is what the service actually does. In practice, most investigations only made sense when logs were read alongside implementation. Point-in-time investigation. The agent checks out the exact commit at incident time, not current HEAD, so it can correlate the failure against the actual diffs. That's what cracked the KV cache investigation: a PR broke prefix stability, and the diff was the only place this was visible. Without commit history, you can't distinguish a code regression from external factors. Reasoning even where telemetry is absent. Some code paths are not well instrumented. The agent can still trace logic through source and explain behavior even when logs do not exist. This is especially valuable in novel failure modes – the ones most likely to be missed precisely because no one thought to instrument them. Memory as a filesystem, not a vector store Our first memory system used RAG over past session learnings. It had a circular dependency: a limited agent learned from limited sessions and produced limited knowledge. Garbage in, garbage out. But the deeper problem was retrieval. In SRE Context, embedding similarity is a weak proxy for relevance. “KV cache regression” and “prompt prefix instability” may be distant in embedding space yet still describe the same causal chain. We tried re-ranking, query expansion, and hybrid search. None fixed the core mismatch between semantic similarity and diagnostic relevance. We replaced RAG with structured Markdown files that the agent reads and writes through its standard tool interface. The model names each file semantically: overview.md for a service summary, team.md for ownership and escalation paths, logs.md for cluster access and query patterns, debugging.md for failure modes and prior learnings. Each carry just enough context to orient the agent, with links to deeper files when needed. The key design choice was to let the model navigate memory, not retrieve it through query matching. The agent starts from a structured entry point and follows the evidence toward what matters. RAG assumes you know the right query before you know what you need. File traversal lets relevance emerge as context accumulates. This removed chunking, overlap tuning, and re-ranking entirely. It also proved more accurate, because frontier models are better at following context than embeddings are at guessing relevance. As a side benefit, memory state can be snapshotted periodically. One problem remains unsolved: staleness. When two sessions write conflicting patterns to debugging.md, the model must reconcile them. When a service changes behavior, old entries can become misleading. We rely on timestamps and explicit deprecation notes, but we do not have a systemic solution yet. This is an active area of work, and anyone building memory at scale will run into it. The sandbox as epistemic boundary The filesystem also defines what the agent can see. If something is not in the sandbox, the agent cannot reason about it. We treat that as a feature, not a limitation. Security boundaries and epistemic boundaries are enforced by the same mechanism. Inside that boundary, the agent has full execution: arbitrary bash, python, jq, and package installs through pip or apt. That scope unlocks capabilities we never would have built as custom tools. It opens PRs with gh cli, like the prompt-ordering fix from KV cache incident. It pushes Grafana dashboards, like a cache-hit-rate dashboard we now track by model. It installs domain-specific CLI tools mid-investigation when needed. No bespoke integration required, just a shell. The recurring lesson was simple: a generally capable agent in the right execution environment outperforms a specialized agent with bespoke tooling. Custom tools accumulate maintenance costs. Shell commands compose for free. Bet 2: Context Layering Code access tells the agent what a service does. It does not tell the agent what it can access, which resources its tools are scoped to, or where an investigation should begin. This gap surfaced immediately. Users would ask "which team do you handle incidents for?" and the agent had no answer. Tools alone are not enough. An integration also needs ambient context so the model knows what exists, how it is configured, and when to use it. We fixed this with context hooks: structured context injected at prompt construction time to orient the agent before it takes action. Connectors - what can I access? A manifest of wired systems such as Log Analytics, Outlook, and Grafana, along with their configuration. Repositories - what does this system do? Serialized repo trees, plus files like AGENTS.md, Copilot.md, and CLAUDE.md with team-specific instructions. Knowledge map - what have I learned before? A two-tier memory index with a top-level file linking to deeper scenario-specific files, so the model can drill down only when needed. Azure resource topology - where do things live? A serialized map of relationships across subscriptions, resource groups, and regions, so investigations start in the right scope. Together, these context hooks turn a cold start into an informed one. That matters because a bad early choice does not just waste tokens. It sends the investigation down the wrong trajectory. A capable agent still needs to know what exists, what matters, and where to start. Bet 3: Frugal Context Management Layered context creates a new problem: budget. Serialized repo trees, resource topology, connector manifests, and a memory index fill context fast. Once the agent starts reading source files and logs, complex incidents hit context limits. We needed our context usage to be deliberately frugal. Tool result compression via the filesystem Large tool outputs are expensive because they consume context before the agent has extracted any value from them. In many cases, only a small slice or a derived summary of that output is actually useful. Our framework exposes these results as files to the agent. The agent can then use tools like grep, jq, or python to process them outside the model interface, so that only the final result enters context. The filesystem isn't just a capability abstraction - it's also a budget management primitive. Context Pruning and Auto Compact Long investigations accumulate dead weight. As hypotheses narrow, earlier context becomes noise. We handle this with two compaction strategies. Context Pruning runs mid-session. When context usage crosses a threshold, we trim or drop stale tool calls and outputs - keeping the window focused on what still matters. Auto-Compact kicks in when a session approaches its context limit. The framework summarizes findings and working hypotheses, then resumes from that summary. From the user's perspective, there's no visible limit. Long investigations just work. Parallel subagents The KV cache investigation required reasoning along two independent hypotheses: whether the alert definition was sound, and whether cache behavior had actually regressed. The agent spawned parallel subagents for each task, each operating in its own context window. Once both finished, it merged their conclusions. This pattern generalizes to any task with independent components. It speeds up the search, keeps intermediate work from consuming the main context window, and prevents one hypothesis from biasing another. The Feedback loop These architectural bets have enabled us to close the original scaling gap. Instead of debugging the agent at human speed, we could finally start using it to fix itself. As an example, we were hitting various LLM errors: timeouts, 429s (too many requests), failures in the middle of response streaming, 400s from code bugs that produced malformed payloads. These paper cuts would cause investigations to stall midway and some conversations broke entirely. So, we set up a daily monitoring task for these failures. The agent searches for the last 24 hours of errors, clusters the top hitters, traces each to its root cause in the codebase, and submits a PR. We review it manually before merging. Over two weeks, the errors were reduced by more than 80%. Over the last month, we have successfully used our agent across a wide range of scenarios: Analyzed our user churn rate and built dashboards we now review weekly. Correlated which builds needed the most hotfixes, surfacing flaky areas of the codebase. Ran security analysis and found vulnerabilities in the read path. Helped fill out parts of its own Responsible AI review, with strict human review. Handles customer-reported issues and LiveSite alerts end to end. Whenever it gets stuck, we talk to it and teach it, ask it to update its memory, and it doesn't fail that class of problem again. The title of this post is literal. The agent investigating itself is not a metaphor. It is a real workflow, driven by scheduled tasks, incident triggers, and direct conversations with users. What We Learned We spent months building scaffolding to compensate for what the agent could not do. The breakthrough was removing it. Every prewritten query was a place we told the model not to think. Every curated tool was a decision made on its behalf. Every pre-fetched context was a guess about what would matter before we understood the problem. The inversion was simple but hard to accept: stop pre-computing the answer space. Give the model a structured starting point, a filesystem it knows how to navigate, context hooks that tell it what it can access, and budget management that keeps it sharp through long investigations. The agent that investigates itself is both the proof and the product of this approach. It finds its own bugs, traces them to root causes in its own code, and submits its own fixes. Not because we designed it to. Because we designed it to reason over systems, and it happens to be one. We are still learning. Staleness is unsolved, budget tuning remains largely empirical, and we regularly discover assumptions baked into context that quietly constrain the agent. But we have crossed a new threshold: from an agent that follows your playbook to one that writes the next one. Thanks to visagarwal for co-authoring this post.10KViews5likes0CommentsDemystifying GitHub Copilot Security Controls: easing concerns for organizational adoption
At a recent developer conference, I delivered a session on Legacy Code Rescue using GitHub Copilot App Modernization. Throughout the day, conversations with developers revealed a clear divide: some have fully embraced Agentic AI in their daily coding, while others remain cautious. Often, this hesitation isn't due to reluctance but stems from organizational concerns around security and regulatory compliance. Having witnessed similar patterns during past technology shifts, I understand how these barriers can slow adoption. In this blog, I'll demystify the most common security concerns about GitHub Copilot and explain how its built-in features address them, empowering organizations to confidently modernize their development workflows. GitHub Copilot Model Training A common question I received at the conference was whether GitHub uses your code as training data for GitHub Copilot. I always direct customers to the GitHub Copilot Trust Center for clarity, but the answer is straightforward: “No. GitHub uses neither Copilot Business nor Enterprise data to train the GitHub model.” Notice this restriction also applies to third-party models as well (e.g. Anthropic, Google). GitHub Copilot Intellectual Property indemnification policy A frequent concern I hear is, since GitHub Copilot’s underlying models are trained on sources that include public code, it might simply “copy and paste” code from those sources. Let’s clarify how this actually works: Does GitHub Copilot “copy/paste”? “The AI models that create Copilot’s suggestions may be trained on public code, but do not contain any code. When they generate a suggestion, they are not “copying and pasting” from any codebase.” To provide an additional layer of protection, GitHub Copilot includes a “duplicate detection filter”. This feature helps prevent suggestions that closely match public code from being surfaced. (Note: This duplicate detection currently does not apply to the Copilot coding agent.) More importantly, customers are protected by an Intellectual Property indemnification policy. This means that if you receive an unmodified suggestion from GitHub Copilot and face a copyright claim as a result, Microsoft will defend you in court. GitHub Copilot Data Retention Another frequent question I hear concerns GitHub Copilot’s data retention policies. For organizations on GitHub Copilot Business and Enterprise plans, retention practices depend on how and where the service is accessed from: Access through IDE for Chat and Code Completions: Prompts and Suggestions: Not retained. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. Other GitHub Copilot access and use: Prompts and Suggestions: Retained for 28 days. User Engagement Data: Kept for two years. Feedback Data: Stored for as long as needed for its intended purpose. For Copilot Coding Agent, session logs are retained for the life of the account in order to provide the service. Excluding content from GitHub Copilot To prevent GitHub Copilot from indexing sensitive files, you can configure content exclusions at the repository or organization level. In VS Code, use the .copilotignore file to exclude files client-side. Note that files listed in .gitignore are not indexed by default but may still be referenced if open or explicitly referenced (unless they’re excluded through .copilotignore or content exclusions). The life cycle of a GitHub Copilot code suggestion Here are the key protections at each stage of the life cycle of a GitHub Copilot code suggestion: In the IDE: Content exclusions prevent files, folders, or patterns from being included. GitHub proxy (pre-model safety): Prompts go through a GitHub proxy hosted in Microsoft Azure for pre-inference checks: screening for toxic or inappropriate language, relevance, and hacking attempts/jailbreak-style prompts before reaching the model. Model response: With the public code filter enabled, some suggestions are suppressed. The vulnerability protection feature blocks insecure coding patterns like hardcoded credentials or SQL injections in real time. Disable access to GitHub Copilot Free Due to the varying policies associated with GitHub Copilot Free, it is crucial for organizations to ensure it is disabled both in the IDE and on GitHub.com. Since not all IDEs currently offer a built-in option to disable Copilot Free, the most reliable method to prevent both accidental and intentional access is to implement firewall rule changes, as outlined in the official documentation. Agent Mode Allow List Accidental file system deletion by Agentic AI assistants can happen. With GitHub Copilot agent mode, the "Terminal auto approve” setting in VS Code can be used to prevent this. This setting can be managed centrally using a VS Code policy. MCP registry Organizations often want to restrict access to allow only trusted MCP servers. GitHub now offers an MCP registry feature for this purpose. This feature isn’t available in all IDEs and clients yet, but it's being developed. Compliance Certifications The GitHub Copilot Trust Center page lists GitHub Copilot's broad compliance credentials, surpassing many competitors in financial, security, privacy, cloud, and industry coverage. SOC 1 Type 2: Assurance over internal controls for financial reporting. SOC 2 Type 2: In-depth report covering Security, Availability, Processing Integrity, Confidentiality, and Privacy over time. SOC 3: General-use version of SOC 2 with broad executive-level assurance. ISO/IEC 27001:2013: Certification for a formal Information Security Management System (ISMS), based on risk management controls. CSA STAR Level 2: Includes a third-party attestation combining ISO 27001 or SOC 2 with additional cloud control matrix (CCM) requirements. TISAX: Trusted Information Security Assessment Exchange, covering automotive-sector security standards. In summary, while the adoption of AI tools like GitHub Copilot in software development can raise important questions around security, privacy, and compliance, it’s clear that existing safeguards in place help address these concerns. By understanding the safeguards, configurable controls, and robust compliance certifications offered, organizations and developers alike can feel more confident in embracing GitHub Copilot to accelerate innovation while maintaining trust and peace of mind.Proactive Health Monitoring and Auto-Communication Now Available for Azure Container Registry
Today, we're introducing Azure Container Registry's (ACR) latest service health enhancement: automated auto-communication through Azure Service Health alerts. When ACR detects degradation in critical operations—authentication, image push, and pull—your teams are now proactively notified through Azure Service Health, delivering better transparency and faster communication without waiting for manual incident reporting. For platform teams, SRE organizations, and enterprises with strict SLA requirements, this means container registry health events are now communicated automatically and integrated into your existing incident management and observability workflows. Background: Why Registry Availability Matters Container registries sit at the heart of modern software delivery. Every CI/CD pipeline build, every Kubernetes pod startup, and every production deployment depends on the ability to authenticate, push artifacts, and pull images reliably. When a registry experiences degradation—even briefly—the downstream impact can cascade quickly: failed pipelines, delayed deployments, and application startup failures across multiple clusters and environments. Until now, ACR customers discovered service issues primarily through two paths: monitoring their own workloads for symptoms (failed pulls, auth errors), or checking the Azure Status page reactively. Neither approach gives your team the head start needed to coordinate an effective response before impact is felt. Auto-Communication Through Azure Service Health Alerts ACR now provides faster communication when: Degradation is detected in your region Automated remediation is in progress Engineering teams have been engaged and are actively mitigating These notifications arrive through Azure Service Health, the same platform your teams already use to track planned maintenance and health advisories across all your Azure resources. You receive timely visibility into registry health events—with rich context including tracking IDs, affected regions, impacted resources, and mitigation timelines—without needing to open a support request or continuously monitor dashboards. Who Benefits This capability delivers value across every team that depends on container registry availability: Enterprise platform teams managing centralized registries for large organizations will receive early warning before CI/CD pipelines begin failing across hundreds of development teams. SRE organizations can integrate ACR health signals into their existing incident management workflows—via webhook integration with PagerDuty, Opsgenie, ServiceNow, and similar tools—rather than relying on synthetic monitoring or customer reports. Teams with strict SLA requirements can now correlate production incidents with documented ACR service events, supporting post-incident reviews and customer communication. All ACR customers gain a level of registry observability that previously required custom monitoring infrastructure to approximate. A Part of ACR's Broader Observability Strategy Automated Service Health auto-communication is one component of ACR's ongoing investment in service health and observability. Combined with Azure Monitor metrics, diagnostic logs and events, Service Health alerts give your teams a layered observability posture: Signal What It Tells You Service Health alerts ACR-wide service events in your regions, with official mitigation status Azure Monitor metrics Registry-level request rates, success rates, and storage utilization. This will be available soon Diagnostic logs Repository and operation-level audit trail What's next: We are working on exposing additional ACR metrics through Azure Monitor, giving you deeper visibility into registry operations—such as authentication, pull and push API requests, and error breakdowns—directly in the Azure portal. This will enable self-service diagnostics, allowing your teams to investigate and troubleshoot registry issues independently without opening a support request. Getting Started To configure Service Health alerts for ACR, navigate to Service Health in the Azure portal, create an alert rule filtering on Container Registry, and attach an action group with your preferred notification channels (email, SMS, webhook). Alerts can also be created programmatically via ARM templates or Bicep for infrastructure-as-code workflows. For the full step-by-step setup guide—including recommended alert configurations for production-critical, maintenance awareness, and comprehensive monitoring scenarios—see Configure Service Health alerts for Azure Container Registry.136Views0likes0CommentsModern Database Protection: From Visibility to Threat Detection with Microsoft Defender for Cloud
Databases sit at the heart of modern businesses. They support everyday apps, reports and AI tools. For example, any time you engage a site that requires a username and password, there is a database at the back end that stores your login information. As organizations adopt multi-cloud and hybrid architectures, databases are generated all the time, creating database sprawl. As a result, tracking and managing every database, catching misconfigurations and vulnerabilities, knowing where sensitive information lives, all becomes increasingly difficult leaving a huge security gap. And because companies store their most valuable data, like your login information, credit card and social security numbers, in databases, databases are the main target for threat actors. Securing databases is no longer optional, yet getting started can feel daunting. Database security needs to address the gaps mentioned above – help organizations see their databases to help them monitor for misconfigurations and vulnerabilities, sensitive information and any suspicious activities that occur within the database that are indicative of an attack. Further, database security must meet customers where they are – in multi-cloud and hybrid environments. This five part blog series will introduce and explore database-specific security needs and how Defender for Cloud addresses the gaps through its deep visibility into your database estate, detection of misconfiguration, vulnerabilities and sensitive information, threat protection with alerts and Integrated security platform to manage it all. This blog, part one, will begin with an overview of today’s database infrastructure security needs. Then we will introduce Microsoft Defender for Cloud’s unique database protection capabilities to help address this gap. Modern Database Architectures and Their Security Implications Modern databases can be deployed in two main ways: on your own infrastructure or as a cloud service. In an on-premises or IaaS (Infrastructure as a Service) setup, you manage the underlying server or virtual machine. For example, running a SQL Server on a self-managed Windows server—whether in your data center or on a cloud VM in Azure or AWS—is an IaaS deployment (Microsoft Defender for Cloud refers to these as “SQL servers on machines”) that require server maintenance. The other approach is PaaS (Platform as a Service), where a cloud provider manages the host server for you. In a PaaS scenario, you simply use a hosted database service (such as Azure SQL Database, Azure SQL Managed Instance, Azure Database for PostgreSQL, or Amazon RDS) without worrying about the operating system or server maintenance. In either case, you need to secure both the database host (the server or VM) and the database itself (the data and database engine). It’s also important to distinguish between a database’s control plane and data plane. The control plane includes the external settings that govern your database environment—like network firewall rules or who can access the system. The data plane involves information and queries inside the database. An attacker might exploit a weak firewall setting on the control plane or use stolen credentials to run malicious queries on the data plane. To fully protect a database, you need visibility into both planes to catch suspicious behavior. Effective database protection must span both IaaS and PaaS environments and monitor both the control plane and data plane because they are common targets for threat actors. Security teams can then detect suspicious activity such as SQL injections, brute-force attempts, and lateral movement through your environment. A Unified Approach to Database Protection Built for Multicloud Modern database environments are fragmented across deployment models, database ownership, and teams. Databases run across IaaS and PaaS, span control and data planes, and in multiple clouds, yet protection is often pieced together from disconnected point solutions Microsoft Defender for Cloud is a cloud native application protection platform (CNAPP) solution that provides a unified, cloud-native approach to database protection—bringing together discovery, posture management, and threat detection across SQL (Iaas and Paas), open-source relational databases (OSS), and Cosmos DB databases. Defender for Cloud’s database protection uses both agent-based and agentless solutions to protect database resources on-premises, hybrid, multi-cloud and Azure. A lightweight agent-based solution is used for SQL servers on Azure virtual machines or virtual machines hosted outside Azure and allows for deeper inspection, while an agentless approach for managed databases stored in Azure or AWS RDS resources provide protection with seamless integration. Additionally, Defender for Cloud brings in other signals from the cloud environment, surfacing a secure score for security posture, an asset inventory, regulatory compliance, governance capabilities, and a cloud security graph that allows for proactive risk exploration. The value of database security in Defender for Cloud starts with pre and post breach visibility. Vulnerability assessment and data security posture management helps security admins understand their database security posture and, by following Defender for Cloud’s recommendations, security admins can harden their environment proactively. Vulnerability assessments scans surface remediation steps for configurations that do not follow industry’s best practices. These recommendations may include enabling encryption when data is at rest where applicable or database server should restrict public access ranges. Data security posture management in Defender for Cloud automatically helps security admins prioritize the riskiest databases by discovering sensitive data and surfacing related exposure and risk. When databases are associated with certain risks, Defender for Cloud will provide its findings in three ways: risk-based security recommendations, attack path analysis with Defender CSPM and the data and AI dashboard. The risk level is determined by other context related to the resource like, internet exposure or sensitive information. This way, Security admins will have a solid understanding of their database environment pre-breach and will have a prioritized list of resources to remediate based on risk or posture level. While we can do our best to harden the environment, breaches can still happen. Timely post-breach response is just as important. Threat detection capabilities within Defender for Cloud will identify anomalous activity in near real time so SOC analytes can take action to contain the attack immediately. Defender for Cloud monitors both the control and the data plane for any anomalous activity that indicates a threat, from brute force attack detections to access and query anomalies. To provide a unified security experience, Defender for Cloud natively integrates with the Microsoft Defender Portal. The Defender portal brings signals from Defender for Cloud to provide a single cloud-agnostic security experience, equipping security teams with tools like secure score for security posture, attack paths, and incidents and alerts. When anomalous activities occur in the environment, time is of the essence. Security teams must have context and tools to investigate a database resource, both in the control plan and the data plane, to remediate and mitigate future attacks quickly. Defender for Cloud and the Defender portal brings together a security ecosystem that allows SOC analysts to investigate, correlate activities and incidents with alerts, contain and respond accordingly. Take Action: Close the Database Blind Spot Today Modern database environments demand more than isolated controls or point solutions. As databases span hybrid and multiple clouds, security teams need a unified approach that delivers visibility, context, and actionable protection where the data lives. Microsoft Defender for Cloud provides organizations the visibility into all of your databases in a centralized Defender portal using its unique control and data plane findings so that security teams can identify misconfigurations. prioritize them based on cloud-context risk-based recommendations or proactively identify other attack scenarios using the attack path analysis while SOC analysts can investigate alerts and act quickly. Follow this story for part two. We’ll go into Defender for Cloud’s unique visibility into database resources to find misconfiguration gaps, sensitive information exposure, and contextual risks that may exist in your environment. Resources: Get started with Defender for Databases. Learn more about SQL vulnerability assessment. Learn more about Data Security Posture Management Learn more about Advanced Threat Protection Reviewers: YuriDiogenes, lisetteranga, talberdahImplementing the Backend-for-Frontend (BFF) / Curated API Pattern Using Azure API Management
Modern digital applications rarely serve a single type of client. Web portals, mobile apps, partner integrations, and internal tools often consume the same backend services—yet each has different performance, payload, and UX requirements. Exposing backend APIs directly to all clients frequently leads to over-fetching, chatty networks, and tight coupling between UI and backend domain models. This is where a Curated API or Backend for Frontend API design pattern becomes useful. What Is the Backend-for-Frontend (BFF) Pattern? The Backend-for-Frontend (BFF)—also known as the Curated API pattern—solves this problem by introducing a client-specific API layer that shapes, aggregates, and optimizes data specifically for the consuming experience. There is very good architectural guidance on this at Azure Architecture Center [Check out the 1st Link on Citation section] The BFF pattern introduces a dedicated backend layer for each frontend experience. Instead of exposing generic backend services directly, the BFF: Aggregates data from multiple backend services Filters and reshapes responses Optimizes payloads for a specific client Shields clients from backend complexity and change Each frontend (web, mobile, partner) can evolve independently, without forcing backend services to accommodate UI-specific concerns. Why Azure API Management Is a Natural Fit for BFF Azure API Management is commonly used as an API gateway, but its policy engine enables much more than routing and security. Using APIM policies, you can: Call multiple backend services (sequentially or in parallel) Transform request and response payloads to provide a unform experience Apply caching, rate limiting, authentication, and resiliency policies All of this can be achieved without modifying backend code, making APIM an excellent place to implement the BFF pattern. When Should You Use a Curated API in APIM? Using APIM as a BFF makes sense when: Frontend clients require optimized, experience-specific payloads Backend services must remain generic and reusable You want to reduce round trips from mobile or low-bandwidth clients You want to implement uniform polices for cross cutting concerns, authentication/authorization, caching, rate-limiting and logging, etc. You want to avoid building and operating a separate aggregation service You need strong governance, security, and observability at the API layer How the BFF Pattern Works in Azure API Management There is a Git Hub Repository [Check out the 2nd Link on Citation section] that provides a wealth of information and samples on how to create complex APIM policies. I recently contributed to this repository with a sample policy for Curated APIs [Check out the 3rd Link on Citation section] At a high level, the policy follows this flow: APIM receives a single client request APIM issues parallel calls to multiple backend services as shown below <wait for="all"> <send-request mode="copy" response-variable-name="operation1" timeout="{{bff-timeout}}" ignore-error="false"> <set-url>@("{{bff-baseurl}}/operation1?param1=" + context.Request.Url.Query.GetValueOrDefault("param1", "value1"))</set-url> </send-request> <send-request mode="copy" response-variable-name="operation2" timeout="{{bff-timeout}}" ignore-error="false"> <set-url>{{bff-baseurl}}/operation2</set-url> </send-request> <send-request mode="copy" response-variable-name="operation3" timeout="{{bff-timeout}}" ignore-error="false"> <set-url>{{bff-baseurl}}/operation3</set-url> </send-request> <send-request mode="copy" response-variable-name="operation4" timeout="{{bff-timeout}}" ignore-error="false"> <set-url>{{bff-baseurl}}/operation4</set-url> </send-request> </wait> Few things to consider The Wait policy allows us to make multiple requests using nested send-request policies. The for="all" attribute value implies that the policy execution will await all the nested send requests before moving to the next one. {{bff-baseurl}}: This example assumes a single base URL for all end points. It does not have to be. The calls can be made to any endpoint response-variable-name attribute sets a unique variable name to hold response object from each of the parallel calls. This will be used later in the policy to transform and produce the curated result. timeout attribute: This example assumes uniform timeouts for each endpoint, but it might vary as well. ignore-error: set this to true only when you are not concerned about the response from the backend (like a fire and forget request) otherwise keep it false so that the response variable captures the response with error code. Once responses from all the requests have been received (or timed out) the policy execution moves to the next policy Then the responses from all requests are collected and transformed into a single response data <!-- Collect the complete response in a variable. --> <set-variable name="finalResponseData" value="@{ JObject finalResponse = new JObject(); int finalStatus = 200; // This assumes the final success status (If all backend calls succeed) is 200 - OK, can be customized. string finalStatusReason = "OK"; void ParseBody(JObject element, string propertyName, IResponse response){ string body = ""; if(response!=null){ body = response.Body.As<string>(); try{ var jsonBody = JToken.Parse(body); element.Add(propertyName, jsonBody); } catch(Exception ex){ element.Add(propertyName, body); } } else{ element.Add(propertyName, body); //Add empty body if the response was not captured } } JObject PrepareResponse(string responseVariableName){ JObject responseElement = new JObject(); responseElement.Add("operation", responseVariableName); IResponse response = context.Variables.GetValueOrDefault<IResponse>(responseVariableName); if(response == null){ finalStatus = 207; // if any of the responses are null; the final status will be 207 finalStatusReason = "Multi Status"; ParseBody(responseElement, "error", response); return responseElement; } int status = response.StatusCode; responseElement.Add("status", status); if(status == 200){ // This assumes all the backend APIs return 200, if they return other success responses (e.g. 201) add them here ParseBody(responseElement, "body", response); } else{ // if any of the response codes are non success, the final status will be 207 finalStatus = 207; finalStatusReason = "Multi Status"; ParseBody(responseElement, "error", response); } return responseElement; } // Gather responses into JSON Array // Pass on the each of the response variable names here. JArray finalResponseBody = new JArray(); finalResponseBody.Add(PrepareResponse("operation1")); finalResponseBody.Add(PrepareResponse("operation2")); finalResponseBody.Add(PrepareResponse("operation3")); finalResponseBody.Add(PrepareResponse("operation4")); // Populate finalResponse with aggregated body and status information finalResponse.Add("body", finalResponseBody); finalResponse.Add("status", finalStatus); finalResponse.Add("reason", finalStatusReason); return finalResponse; }" /> What this code does is prepare the response into a single JSON Object. using the help of the PrepareResponse function. The JSON not only collects the response body from each response variable, but it also captures the response codes and determines the final response code based on the individual response codes. For the purpose of his example, I have assumed all operations are GET operations and if all operations return 200 then the overall response is 200-OK, otherwise it is 206 -Partial Content. This can be customized to the actual scenario as needed. Once the final response variable is ready, then construct and return a single response based on the above calculation <!-- This shows how to return the final response code and body. Other response elements (e.g. outbound headers) can be curated and added here the same way --> <return-response> <set-status code="@((int)((JObject)context.Variables["finalResponseData"]).SelectToken("status"))" reason="@(((JObject)context.Variables["finalResponseData"]).SelectToken("reason").ToString())" /> <set-body>@(((JObject)context.Variables["finalResponseData"]).SelectToken("body").ToString(Newtonsoft.Json.Formatting.None))</set-body> </return-response> This effectively turns APIM into an experience-specific backend tailored to frontend needs. When not to use APIM for BFF Implementation? While this approach works well when you want to curate a few responses together and apply a unified set of policies, there are some cases where you might want to rethink this approach When the need for transformation is complex. Maintaining a lot of code in APIM is not fun. If the response transformation requires a lot of code that needs to be unit tested and code that might change over time, it might be better to sand up a curation service. Azure Functions and Azure Container Apps are well suited for this. When each backend endpoint requires very complex request transformation, then that also increases the amount of code, then that would also indicate a need for an independent curation service. If you are not already using APIM then this does not warrant adding one to your architecture just to implement BFF. Conclusion Using APIM is one of the many approaches you can use to create a BFF layer on top of your existing endpoint. Let me know your thoughts con the comments on what you think of this approach. Citations Azure Architecture Center – Backend-for-Frontends Pattern Azure API Management Policy Snippets (GitHub) Curated APIs Policy Example (GitHub) Send-request Policy ReferenceMS designer is NOT WORKING!
It only makes 1 image not 4, it takes way too long time,. and it doesnt follow prompting,. and it makes imegs in 3:2 not 16:9... thsi has been goign on for over 2 weeks,... alsmot 3.. i have been in contact with support many times woith zero help.. HOW can you have a product in your office bunlde ant it not workign an zero supprot on it? I need teh application to make my videos,. im just abptu to cancel all subscritopons to micrposft an switch to apple..16Views0likes0CommentsOn-demand webinar: Maximize the Cost Efficiency of AI Agents on Azure
AI agents are quickly becoming central to how organizations automate work, engage customers, and unlock new insights. But as adoption accelerates, so do questions about cost, ROI, and long-term sustainability. That’s exactly what the Maximize the Cost Efficiency of AI Agents on Azure webinar is designed to address. The webinar will provide practical guidance on building and scaling AI agents on Azure with financial discipline in mind. Rather than focusing only on technology, the session helps learners connect AI design decisions to real business outcomes—covering everything from identifying high-impact use cases and understanding cost drivers to forecasting ROI. Whether you’re just starting your AI journey or expanding AI agents across the enterprise, the session will equip you with strategies to make informed, cost-conscious decisions at every stage—from architecture and model selection to ongoing optimization and governance. Who should attend? If you are in one of these roles and are a decision maker or can influence decision makers in AI decisions or need to show ROI metrics on AI, this session is for you. Developer Administrator Solution Architect AI Engineer Business Analyst Business User Technology Manager Why attending the webinar? In the webinar, you’ll hear how to translate theory into real-world scenarios, walk through common cost pitfalls, and show how organizations are applying these principles in practice. Most importantly, the webinar helps you connect the dots faster, turning what you’ve learned into actionable insights you can apply immediately, ask questions live, and gain clarity on how to maximize ROI while scaling AI responsibly. If you care about building AI agents that are not only innovative but also efficient, governable, and financially sustainable, this training—and this webinar that complements it—are well worth your time. Missed it? Watch it on-demand Who will speak at the webinar? Your speakers will be: Carlotta Castelluccio: Carlotta is a Senior AI Advocate with the mission of helping every developer to succeed with AI, by building innovative solutions responsibly. To achieve this goal, she develops technical content, and she hosts skilling sessions, enabling her audience to take the most out of AI technologies and to have an impact on Microsoft AI products’ roadmap. Nitya Narasimhan: Nitya is a PhD and Polyglot with 25+ years of software research & development experience spanning mobile, web, cloud and AI. She is an innovator (12+ patents), a visual storyteller (@sketchtedocs), and an experienced community builder in the Greater New York area. As a senior AI Advocate on the Core AI Developer Relations team, she acts as "developer 0" for the Microsoft Foundry platform, providing product feedback and empowering AI developers to build trustworthy AI solutions with code samples, open-source curricula and content-initiatives like Model Mondays. Prior to joining Microsoft, she spent a decade in Motorola Labs working on ubiquitous & mobile computing research, founded Google Developer Groups in New York, and consulted for startups building real-time experiences for enterprise. Her current interests span Model understanding & customization, E2E Observability & Safety, and agentic AI workflows for maintainable software. Moderator Lee Stott is a Principal Cloud Advocate at Microsoft, working in the Core AI Developer Relations Team. He helps developers and organizations build responsibly with AI and cloud technologies through open-source projects, technical guidance, and global developer programs. Based in the UK, Lee brings deep hands-on experience across AI, Azure, and developer tooling. Useful resources Microsoft Learn Training Path: https://aka.ms/maximize-cost-efficiency-ai-agents-training Session Deck: https://aka.ms/maximize-cost-efficiency-ai-agents-deckMCP vs mcp-cli: Dynamic Tool Discovery for Token-Efficient AI Agents
Introduction The AI agent ecosystem is evolving rapidly, and with it comes a scaling challenge that many developers are hitting context window bloat. When building systems that integrate with multiple MCP (Model Context Protocol) servers, you're forced to load all tool definitions upfront—consuming thousands of tokens just to describe what tools could be available. mcp-cli: a lightweight tool that changes how we interact with MCP servers. But before diving into mcp-cli, it's essential to understand the foundational protocol itself, the design trade-offs between static and dynamic approaches, and how they differ fundamentally. Part 1: Understanding MCP (Model Context Protocol) What is MCP? The Model Context Protocol (MCP) is an open standard for connecting AI agents and applications to external tools, APIs, and data sources. Think of it as a universal interface that allows: AI Agents (Claude, Gemini, etc.) to discover and call tools Tool Providers to expose capabilities in a standardized way Seamless Integration between diverse systems without custom adapters New to MCP see https://aka.ms/mcp-for-beginners How MCP Works MCP operates on a simple premise: define tools with clear schemas and let clients discover and invoke them. Basic MCP Flow: Tool Provider (MCP Server) ↓ [Tool Definitions + Schemas] ↓ AI Agent / Client ↓ [Discover Tools] → [Invoke Tools] → [Get Results] Example: A GitHub MCP server exposes tools like: search_repositories - Search GitHub repos create_issue - Create a GitHub issue list_pull_requests - List open PRs Each tool comes with a JSON schema describing its parameters, types, and requirements. The Static Integration Problem Traditionally, MCP integration works like this: Startup: Load ALL tool definitions from all servers Context Window: Send every tool schema to the AI model Invocation: Model chooses which tool to call Execution: Tool is invoked and result returned The Problem: When you have multiple MCP servers, the token cost becomes substantial: Scenario Token Count 6 MCP Servers, 60 tools (static loading) ~47,000 tokens After dynamic discovery ~400 tokens Token Reduction 99% 🚀 For a production system with 10+ servers exposing 100+ tools, you're burning through thousands of tokens just describing capabilities, leaving less context for actual reasoning and problem-solving. Key Issues: ❌ Reduced effective context length for actual work ❌ More frequent context compactions ❌ Hard limits on simultaneous MCP servers ❌ Higher API costs Part 2: Enter mcp-cli – Dynamic Context Discovery What is mcp-cli? mcp-cli is a lightweight CLI tool (written in Bun, compiled to a single binary) that implements dynamic context discovery for MCP servers. Instead of loading everything upfront, it pulls in information only when needed. Static vs. Dynamic: The Paradigm Shift Traditional MCP (Static Context): AI Agent Says: "Load all tool definitions from all servers" ↓ Context Window Bloat ❌ ↓ Limited space for reasoning mcp-cli (Dynamic Discovery): AI Agent Says: "What servers exist?" ↓ mcp-cli responds AI Agent Says: "What are the params for tool X?" ↓ mcp-cli responds AI Agent Says: "Execute tool X" mcp-cli executes and responds Result: You only pay for information you actually use. ✅ Core Capabilities mcp-cli provides three primary commands: 1. Discover - What servers and tools exist? mcp-cli Lists all configured MCP servers and their tools. 2. Inspect - What does a specific tool do? mcp-cli info <server> <tool> Returns the full JSON schema for a tool (parameters, descriptions, types). 3. Execute - Run a tool mcp-cli call <server> <tool> '{"arg": "value"}' Executes the tool and returns results. Key Features of mcp-cli Feature Benefit Stdio & HTTP Support Works with both local and remote MCP servers Connection Pooling Lazy-spawn daemon avoids repeated startup overhead Tool Filtering Control which tools are available via allowedTools/disabledTools Glob Searching Find tools matching patterns: mcp-cli grep "*mail*" AI Agent Ready Designed for use in system instructions and agent skills Lightweight Single binary, minimal dependencies Part 3: Detailed Comparison Table Aspect Traditional MCP mcp-cli Protocol HTTP/REST or Stdio Stdio/HTTP (via CLI) Context Loading Static (upfront) Dynamic (on-demand) Tool Discovery All at once Lazy enumeration Schema Inspection Pre-loaded On-request Token Usage High (~47k for 60 tools) Low (~400 for 60 tools) Best For Direct server integration AI agent tool use Implementation Server-side focus CLI-side focus Complexity Medium Low (CLI handles it) Startup Time One call Multiple calls (optimized) Scaling Limited by context Unlimited (pay per use) Integration Custom implementation Pre-built mcp-cli Part 4: When to Use Each Approach Use Traditional MCP (HTTP Endpoints) when: ✅ Building a direct server integration ✅ You have few tools (< 10) and don't care about context waste ✅ You need full control over HTTP requests/responses ✅ You're building a specialized integration (not AI agents) ✅ Real-time synchronous calls are required Use mcp-cli when: ✅ Integrating with AI agents (Claude, Gemini, etc.) ✅ You have multiple MCP servers (> 2-3) ✅ Token efficiency is critical ✅ You want a standardized, battle-tested tool ✅ You prefer CLI-based automation ✅ Connection pooling and lazy loading are beneficial ✅ You're building agent skills or system instructions Conclusion MCP (Model Context Protocol) defines the standard for tool sharing and discovery. mcp-cli is the practical tool that makes MCP efficient for AI agents by implementing dynamic context discovery. The fundamental difference: MCP mcp-cli What The protocol standard The CLI tool Where Both server and client Client-side CLI Problem Solved Tool standardization Context bloat Architecture Protocol Implementation Think of it this way: MCP is the language, mcp-cli is the interpreter that speaks fluently. For AI agent systems, dynamic discovery via mcp-cli is becoming the standard. For direct integrations, traditional MCP HTTP endpoints work fine. The choice depends on your use case, but increasingly, the industry is trending toward mcp-cli for its efficiency and scalability. Resources MCP Specification mcp-cli GitHub New to MCP see https://aka.ms/mcp-for-beginners Practical demo: AnveshMS/mcp-cli-example