Building Production-Ready, Secure, Observable AI Agents with Real-Time Voice with Microsoft Foundry
We're excited to announce the general availability of Foundry Agent Service, Observability in Foundry Control Plane, and the Microsoft Foundry portal — plus Voice Live integration with Agent Service in public preview — giving teams a production-ready platform to build, deploy, and operate intelligent AI agents with enterprise-grade security and observability.

Generally Available: Evaluations, Monitoring, and Tracing in Microsoft Foundry
If you've shipped an AI agent to production, you've likely run into the same uncomfortable realization: the hard part isn't getting the agent to work - it's keeping it working. Models get updated, prompts get tweaked, retrieval pipelines drift, and user traffic surfaces edge cases that never appeared in your eval suite. Quality isn't something you establish once. It's something you have to continuously measure.

Today, we're making that continuous measurement a first-class operational capability. Evaluations, Monitoring, and Tracing in Microsoft Foundry are now generally available through Foundry Control Plane. These aren't standalone tools bolted onto the side of the platform - they're deeply integrated with Azure Monitor, which means AI agent observability now lives in the same operational plane as the rest of your infrastructure.

The Problem With Point-in-Time Evaluation

Most evaluation workflows are designed around a pre-deployment gate. You build a test dataset, run your evals, review the scores, and ship. That approach has real value - but it has a hard ceiling. In production, agent behavior is a function of many things that change independently of your code:

- Foundation model updates ship continuously and can shift output style, reasoning patterns, and edge case handling in ways that don't always surface on your benchmark set.
- Prompt changes can have nonlinear effects downstream, especially in multi-step agentic flows.
- Retrieval pipeline drift changes what context your agent actually sees at inference time. A document index that was fresh last month may have stale or subtly different content today.
- Real-world traffic distribution is never exactly what you sampled for your test set. Production surfaces long-tail inputs that feel obvious in hindsight but were invisible during development.

The implication is straightforward: evaluation has to be continuous, not episodic.
You need quality signals at development time, at every CI/CD commit, and continuously against live production traffic - all using the same evaluator definitions so results are comparable across environments. That's the core design principle behind Foundry Observability.

Continuous Evaluation Across the Full AI Lifecycle

Built-In Evaluators

Foundry's built-in evaluators cover the most critical quality and safety dimensions for production agent systems:

- Coherence and Relevance measure whether responses are internally consistent and on-topic relative to the input. These are table-stakes signals for any conversational or task-completion agent.
- Groundedness is particularly important for RAG-based architectures. It measures whether the model's output is actually supported by the retrieved context - as opposed to plausible-sounding content the model generated from its parametric memory. Groundedness failures are a leading indicator of hallucination risk in production, and they're often invisible to human reviewers at scale.
- Retrieval Quality evaluates the retrieval step independently from generation. Groundedness failures can originate in two places: the model may be ignoring good context, or the retrieval pipeline may not be surfacing relevant context in the first place. Splitting these signals makes it much easier to pinpoint root cause.
- Safety and Policy Alignment evaluates whether outputs meet your deployment's policy requirements - content safety, topic restrictions, response format compliance, and similar constraints.
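Groundedness is worth a concrete illustration. The toy check below is not Foundry's built-in evaluator - that one is model-based and far more capable - but it shows the underlying question a groundedness signal answers: how much of a response is actually supported by the retrieved context?

```python
# Toy groundedness heuristic (illustration only; Foundry's built-in
# evaluator is model-based, not a lexical overlap like this).
# Scores the fraction of content words in the response that also
# appear in the retrieved context; low scores flag hallucination risk.
import re

def lexical_groundedness(response: str, context: str) -> float:
    tokenize = lambda s: {w for w in re.findall(r"[a-z']+", s.lower()) if len(w) > 3}
    resp_words = tokenize(response)
    if not resp_words:
        return 1.0  # nothing to support, vacuously grounded
    ctx_words = tokenize(context)
    return len(resp_words & ctx_words) / len(resp_words)

context = "The invoice was paid on March 3 via bank transfer."
grounded = "The invoice was paid via bank transfer."
ungrounded = "The customer disputed the charge and requested a refund."

assert lexical_groundedness(grounded, context) > 0.8
assert lexical_groundedness(ungrounded, context) < 0.5
```

A lexical overlap like this misses paraphrase entirely, which is exactly why the production signal needs to be model-based.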
These evaluators are designed to run at every stage of the AI lifecycle:

- Local development - run evals inline as you iterate on prompts, retrieval config, or orchestration logic
- CI/CD pipelines - gate every commit against your quality baselines; catch regressions before they reach production
- Production traffic monitoring - continuously evaluate sampled live traffic and surface trends over time

Because the evaluators are identical across all three contexts, a score in CI means the same thing as a score in production monitoring. See the Practical Guide to Evaluations and the Built-in Evaluators Reference for a deeper walkthrough.

Custom Evaluators - Encoding Your Own Definition of Quality

Built-in evaluators cover common signals well, but production agents often need to satisfy criteria specific to a domain, regulatory environment, or internal standard. Foundry supports two types of custom evaluators (currently in public preview):

- LLM-as-a-Judge evaluators let you configure a prompt and grading rubric, then use a language model to apply that rubric to your agent's outputs. This is the right approach for quality dimensions that require reasoning or contextual judgment - whether a response appropriately acknowledges uncertainty, whether a customer-facing message matches your brand tone, or whether a clinical summary meets documentation standards. You write a judge prompt with a scoring scale (e.g., 1–5 with criteria for each level) that evaluates a given {input} / {response} pair. Foundry runs this at scale and aggregates scores into your dashboards alongside built-in results.
- Code-based evaluators are Python functions that implement any evaluation logic you can express programmatically - regex matching, schema validation, business rule checks, compliance assertions, or calls to external systems. If your organization has documented policies about what a valid agent response looks like, you can encode those policies directly into your evaluation pipeline.
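To make the code-based option concrete, here is a sketch of the kind of Python function such an evaluator might wrap. The function name, the policy rules, and the result shape are illustrative assumptions, not Foundry's required evaluator signature:

```python
# Sketch of a code-based evaluator: a plain Python function encoding a
# documented response policy. All names and the result schema here are
# hypothetical; Foundry's actual registration API may expect a
# different shape.
import re

def support_reply_evaluator(response: str) -> dict:
    """Return pass/fail signals for a customer-support reply policy."""
    checks = {
        # Policy: never expose internal ticket URLs
        "no_internal_links": "internal.corp" not in response,
        # Policy: replies must include a case reference like CASE-12345
        "has_case_reference": bool(re.search(r"CASE-\d{5}", response)),
        # Policy: keep replies under 1200 characters
        "within_length_limit": len(response) <= 1200,
    }
    return {"passed": all(checks.values()), "checks": checks}

result = support_reply_evaluator(
    "Thanks for reaching out! Your request is tracked as CASE-48213."
)
assert result["passed"] is True
```

Because the logic is just a function, the same check can run in a unit test, a CI gate, and against sampled production traffic.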
Custom and built-in evaluators compose naturally - running against the same traffic, producing results in the same schema, feeding into the same dashboards and alert rules.

Monitoring and Alerting - AI Quality as an Operational Signal

All observability data produced by Foundry - evaluation results, traces, latency, token usage, and quality metrics - is published directly to Azure Monitor. This is where the integration pays off for teams already on Azure. What this enables that siloed AI monitoring tools can't:

- Cross-stack correlation. When your groundedness score drops, is it a model update, a retrieval pipeline issue, or an infrastructure problem affecting latency? With AI quality signals and infrastructure telemetry in the same Azure Monitor Application Insights workspace, you can answer that in minutes rather than hours of manual correlation across disconnected systems.
- Unified alerting. Configure Azure Monitor alert rules on any evaluation metric - trigger a PagerDuty incident when groundedness drops below threshold, send a Teams notification when safety violations spike, or create automated runbook responses when retrieval quality degrades. These are the same alert mechanisms your SRE team already uses.
- Enterprise governance by default. Azure Monitor's RBAC, retention policies, diagnostic settings, and audit logging apply automatically to all AI observability data. You inherit the governance framework your organization has already built and approved.
- Grafana and existing dashboards. If your team uses Azure Managed Grafana, evaluation metrics can flow into existing dashboards alongside your other operational metrics - a single pane of glass for application health, infrastructure performance, and AI agent quality.

The Agent Monitoring Dashboard in the Foundry portal provides an AI-native view out of the box - evaluation metric trends, safety threshold status, quality score distributions, and latency breakdowns.
Everything in that dashboard is backed by Azure Monitor data, so SRE teams can always drill deeper.

End-to-End Tracing: From Quality Signal to Root Cause

A groundedness score tells you something is wrong. A trace tells you exactly where the failure occurred and what the agent actually did. Foundry provides OpenTelemetry-based distributed tracing that follows each request through your entire agent system: model calls, tool invocations, retrieval steps, orchestration logic, and cross-agent handoffs. Traces capture the full execution path - inputs, outputs, latency at each step, tool call parameters and responses, and token usage.

The key design decision: evaluation results are linked directly to traces. When you see a low groundedness score in your monitoring dashboard, you navigate directly to the specific trace that produced it - no manual timestamp correlation, no separate trace ID lookup. The connection is made automatically.

Foundry auto-collects traces across the frameworks your agents are likely already built on:

- Microsoft Agent Framework
- Semantic Kernel
- LangChain and LangGraph
- OpenAI Agents SDK

For custom or less common orchestration frameworks, the Azure Monitor OpenTelemetry Distro provides an instrumentation path. Microsoft is also contributing upstream to the OpenTelemetry project - working with Cisco Outshift, we've contributed semantic conventions for multi-agent trace correlation, standardizing how agent identity, task context, and cross-agent handoffs are represented in OTel spans.

Note: Tracing is currently in public preview, with GA shipping by end of March.

Prompt Optimizer (Public Preview)

One persistent friction point in agent development is the iteration loop between writing prompts and measuring their effect. You make a change, run your evals, look at the delta, try to infer what about the change mattered, and repeat. Prompt Optimizer tightens this loop.
It analyzes your existing prompt and applies structured prompt engineering techniques - clarifying ambiguous instructions, improving formatting for model comprehension, restructuring few-shot examples, making implicit constraints explicit - with paragraph-level explanations for every change it makes. The transparency is deliberate. Rather than producing a black-box "optimized" prompt, it shows you exactly what it changed and why. You can add constraints, trigger another optimization pass, and iterate until satisfied. When you're done, apply it with one click.

The value compounds alongside continuous evaluation: run your eval suite against the current prompt, optimize, run evals again, see the measured improvement. That feedback loop - optimize, measure, optimize - is the closest thing to a systematic approach to prompt engineering that currently exists.

What Makes Our Approach to Observability Different

There are other evaluation and observability tools in the AI ecosystem. The differentiation in Foundry's approach comes down to specific architectural choices:

- Unified lifecycle coverage, not just pre-deployment testing. Most existing evaluation tools are designed for offline, pre-deployment use. Foundry's evaluators run in the same form at development time, in CI/CD, and against live production traffic. Your quality metrics are actually comparable across the lifecycle - you can tell whether production quality matches what you saw in testing, rather than operating two separate measurement systems that can't be compared.
- No separate observability silo. Publishing all observability data to Azure Monitor means you don't operate a separate system for AI quality alongside your existing infrastructure monitoring. AI incidents route through your existing on-call rotations. AI quality data is subject to the same retention and compliance controls as the rest of your telemetry.
- Framework-agnostic tracing.
Auto-instrumentation across Semantic Kernel, LangChain, LangGraph, and the OpenAI Agents SDK means you're not locked into a specific orchestration framework. The OpenTelemetry foundation means trace data is portable to any compatible backend, protecting your investment as the tooling landscape evolves.
- Composable evaluators. Built-in and custom evaluators run in the same pipeline, against the same traffic, producing results in the same schema, feeding into the same dashboards and alert rules. You don't choose between generic coverage and domain-specific precision - you get both.
- Evaluation linked to traces. Most systems treat evaluation and tracing as separate concerns. Foundry treats them as two views of the same event - closing the loop between detecting a quality problem and diagnosing it.

Getting Started

If you're building agents on Microsoft Foundry, or using Semantic Kernel, LangChain, LangGraph, or the OpenAI Agents SDK and want to add production observability, the entry point is Foundry Control Plane.

Try it: You'll need a Foundry project with an agent and an Azure OpenAI deployment. Enable observability by navigating to Foundry Control Plane and connecting your Azure Monitor workspace. Then walk through the Practical Guide to Evaluations, explore the Built-in Evaluators Reference, and set up end-to-end tracing for your agents.

Retina 1.0 Is Now Available
We are excited to announce the first major release of Retina - a significant milestone for the project. This version brings many new features, enhancements, and bug fixes. The Retina maintainer team would like to thank all contributors, community members, and early adopters who helped make this 1.0 release possible.

What is Retina?

Retina is an open-source Kubernetes network observability platform. It enables you to continuously observe and measure network health, and investigate network issues on demand with integrated Kubernetes-native workflows.

Why Retina?

Kubernetes networking failures are rarely isolated or easy to reproduce. Pods are ephemeral, services span multiple nodes, and network traffic crosses multiple layers (CNI, kube-proxy, node networking, policies), making crucial evidence difficult to capture. Manually connecting to nodes and stitching together logs or packet captures simply does not scale as clusters grow in size and complexity. A modern approach to observability must automate and centralize data collection while exposing rich, actionable insights.

Retina represents a major step forward in solving the complexities of Kubernetes observability by leveraging the power of eBPF. Its cloud-agnostic design, deep integration with Hubble, and support for both real-time metrics and on-demand packet captures make it an invaluable tool for DevOps, SecOps, and compliance teams across diverse environments.

What Does It Do?

Retina collects two types of telemetry: metrics and packet captures. The Retina shell additionally enables ad-hoc troubleshooting via pre-installed networking tools.

Metrics

Metrics provide continuous observability. They can be exported to multiple storage options such as Prometheus or Azure Monitor, and visualized in a variety of ways, including Grafana or Azure Log Analytics. Retina supports two control planes: Hubble and Standard. Both are supported regardless of the underlying CNI.
The choice of control plane affects which metrics are collected: Hubble metrics or Standard metrics. You can customize which metrics are collected by enabling/disabling their corresponding plugins. Some examples of metrics include:

- Incoming/outgoing traffic
- Dropped packets
- TCP/UDP
- DNS
- API server latency
- Node/interface statistics

Packet Captures

Captures provide on-demand observability. They allow users to perform distributed packet captures across the cluster, based on specified Nodes/Pods and other supported filters. They can be triggered via the CLI or through the capture CRD, and may be output to persistent storage options such as the host filesystem, a PVC, or a storage blob. The result of a capture contains more than just a .pcap file: Retina also captures networking metadata such as iptables rules, socket statistics, kernel network information from /proc/net, and more.

Shell

The Retina shell enables deep ad-hoc troubleshooting by providing a suite of networking tools. The CLI command starts an interactive shell on a Kubernetes node that runs a container image including standard tools such as ping or curl, as well as specialized tools like bpftool, pwru, Inspektor Gadget, and more. The Retina shell is currently only available on Linux. Note that some tools require particular capabilities to execute; these can be passed as parameters through the CLI.

Use Cases

Debugging Pod Connectivity Issues: When services can't communicate, Retina enables rapid, automated distributed packet capture and drop metrics, drastically reducing troubleshooting time. The Retina shell also brings specialized tools for deep manual investigations.

Continuous Monitoring of Network Health: Operators can set up alerts and dashboards for DNS failures, API server latency, or packet drops, gaining ongoing visibility into cluster networking.
Security Auditing and Compliance: Flow logs (in Hubble mode) and metrics support security investigations and compliance reporting, enabling quick identification of unexpected connections or data transfers.

Multi-Cluster / Multi-Cloud Visibility: Retina standardizes network observability across clouds, supporting unified dashboards and processes for SRE teams.

Where Does It Run?

Retina is designed for broad compatibility across Kubernetes distributions, cloud providers, and operating systems. There are no Azure-specific dependencies - Retina runs anywhere Kubernetes does.

- Operating Systems: Both Linux and Windows nodes are supported.
- Kubernetes Distributions: Retina is distribution-agnostic, deployable on managed services (AKS, EKS, GKE) or self-managed clusters.
- CNI / Network Stack: Retina works with any CNI, focusing on kernel-level events rather than CNI-specific logs.
- Cloud Integration: Retina exports metrics to Azure Monitor and Log Analytics, with pre-built Grafana dashboards for AKS. Integration with AWS CloudWatch or GCP Stackdriver is possible via Prometheus.
- Observability Stacks: Retina integrates with Prometheus and Grafana, Cilium Hubble (for flow logs and UI), and can be extended to other exporters.

Design Overview

Retina's architecture consists of two layers: a data collection layer in kernel space, and a processing layer that converts low-level signals into Kubernetes-aware telemetry in user space. When Retina is installed, each node in the cluster runs a Retina agent which collects raw network telemetry from the host kernel - backed by eBPF on Linux, and HNS/VFP on Windows. The agent processes the raw network data and enriches it with Kubernetes metadata, which is then exported for consumption by monitoring tools such as Prometheus, Grafana, or Hubble UI. Modularity and extensibility are central to the design philosophy.
Retina's plugin model lets you enable only the telemetry you need, and add new sources by implementing a common plugin interface. Built-in plugins include Drop Reason, DNS, Packet Forward, and more. Check out our architecture docs for a deeper dive into Retina's design.

Get Started

Thanks to Helm charts, deploying Retina is streamlined across all environments and can be done with one configurable command. For complete documentation, visit our installation docs. To install Retina with the Standard control plane and Basic metrics mode:

VERSION=$( curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
  --version $VERSION \
  --namespace kube-system \
  --set image.tag=$VERSION \
  --set operator.tag=$VERSION \
  --set logLevel=info \
  --set operator.enabled=true \
  --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"

Once Retina is running in your cluster, you can configure Prometheus and Grafana to scrape and visualize your metrics. Install the Retina CLI with Krew:

kubectl krew install retina

Get Involved

Retina is open-source under the MIT License and welcomes community contributions. Since its announcement in early 2024, the project has gained significant traction, with contributors from multiple organizations helping to expand its capabilities. The project is hosted on GitHub at microsoft/retina and documentation is available at retina.sh. If you would like to contribute to Retina, you can follow our contributor guide.

What's Next?

Retina 1.1, of course! We are also discussing the future roadmap and exploring the possibility of moving the project to community ownership. Stay tuned! In the meantime, we welcome you to raise an issue if you find any bugs, or start a discussion if you have any questions or suggestions. You can also reach out to the Retina team via email - we would love to hear from you!
References

- Retina
- Deep Dive into Retina: Open-Source Kubernetes Network Observability
- Troubleshooting Network Issues with Retina
- Retina: Bridging Kubernetes Observability and eBPF Across the Clouds

Project Pavilion Presence at KubeCon NA 2025
KubeCon + CloudNativeCon NA took place in Atlanta, Georgia, from 10-13 November, and continued to highlight the ongoing growth of the open source, cloud-native community. Microsoft participated throughout the event and supported several open source projects in the Project Pavilion. Microsoft's involvement reflected our commitment to upstream collaboration, open governance, and enabling developers to build secure, scalable, and portable applications across the ecosystem.

The Project Pavilion serves as a dedicated, vendor-neutral space on the KubeCon show floor reserved for CNCF projects. Unlike the corporate booths, it focuses entirely on open source collaboration. It brings maintainers and contributors together with end users for hands-on demos, technical discussions, and roadmap insights. This space helps attendees discover emerging technologies and understand how different projects fit into the cloud-native ecosystem. It plays a critical role in exchanging ideas, resolving challenges, and strengthening collaboration across CNCF-approved technologies.

Why Our Presence Matters

KubeCon NA remains one of the most influential gatherings for developers and organizations shaping the future of cloud-native computing. For Microsoft, participating in the Project Pavilion helps advance our goals of:

- Open governance and community-driven innovation
- Scaling vital cloud-native technologies
- Secure and sustainable operations
- Learning from practitioners and adopters
- Enabling developers across clouds and platforms

Many of Microsoft's products and cloud services are built on or aligned with CNCF and open-source technologies. Being active within these communities ensures that we are contributing back to the ecosystem we depend on and designing by collaborating with the community, not just for it.
Microsoft-Supported Pavilion Projects

containerd
Representative: Wei Fu

The containerd team engaged with project maintainers and ecosystem partners to explore solutions for improving AI model workflows. A key focus was the challenge of handling large OCI artifacts (often 500+ GiB) used in AI training workloads. Current image-pulling flows require containerd to fetch and fully unpack blobs, which significantly delays pod startup for large models. Collaborators from Docker, NTT, and ModelPack discussed a non-unpacking workflow that would allow training workloads to consume model data directly. The team plans to prototype this behavior as an experimental feature in containerd. Additional discussions included updates related to nerdbox and next steps for the erofs snapshotter.

Copacetic
Representative: Joshua Duffney

The Copa booth attracted roughly 75 attendees, with strong representation from federal agencies and financial institutions - a sign of growing adoption in regulated industries. A lightning talk delivered at the conference significantly boosted traffic and engagement. Key feedback and insights included:

- High interest in customizable package update sources
- Demand for application-level patching beyond OS-level updates
- Need for clearer CI/CD integration patterns
- Expectations around in-cluster image patching
- Questions about runtime support, including Podman

The conversations revealed several documentation gaps and feature opportunities that will inform Copa's roadmap and future enablement efforts.

Drasi
Representative: Nandita Valsan

KubeCon NA 2025 marked Drasi's first in-person presence since its launch in October 2024 and its entry into the CNCF Sandbox in early 2025. With multiple kiosk slots, the team interacted with ~70 visitors across shifts.
Engagement highlights included:

- New community members joining the Drasi Discord and starring GitHub repositories
- Meaningful discussions with observability and incident management vendors interested in change-driven architectures
- Positive reception to Aman Singh's conference talk, which led attendees back to the booth for deeper technical conversations

Post-event follow-ups are underway with several sponsors and partners to explore collaboration opportunities.

Flatcar Container Linux
Representatives: Sudhanva Huruli and Vamsi Kavuru

The Flatcar project had some fantastic conversations at the pavilion. Attendees were eager to learn about bare metal provisioning, GPU support for AI workloads, and how Flatcar's fully automated build and test process keeps things simple and developer friendly. Questions around Talos vs. Flatcar and CoreOS sparked lively discussions, with the team emphasizing Flatcar's usability and independence from an OS-level API. Interest came from government agencies and financial institutions, and the preview of Flatcar on AKS opened the door to deeper conversations about real-world adoption. The Project Pavilion proved to be the perfect venue for authentic, technical exchanges.

Flux
Representative: Dipti Pai

The Flux booth was active throughout all three days of the Project Pavilion, where Microsoft joined other maintainers to highlight new capabilities in Flux 2.7, including improved multi-tenancy, enhanced observability, and streamlined cloud-native integrations. Visitors shared real-world GitOps experiences - both successes and challenges - which provided valuable insights for the project's ongoing development. Microsoft's involvement reinforced strong collaboration within the Flux community and a continued commitment to advancing GitOps practices.

Headlamp
Representatives: Joaquim Rocha, Will Case, and Oleksandr Dubenko

Headlamp had a booth for all three days of the conference, engaging with both longstanding users and first-time attendees.
The increased visibility from becoming a Kubernetes sub-project was evident, with many attendees sharing their usage patterns across large tech organizations and smaller industrial teams. The booth enabled maintainers to:

- Gather insights into how teams use Headlamp in different environments
- Introduce Headlamp to new users discovering it via talks or hallway conversations
- Build stronger connections with the community and understand evolving needs

Inspektor Gadget
Representatives: Jose Blanquicet and Mauricio Vásquez Bernal

Hosting a half-day kiosk session, Inspektor Gadget welcomed approximately 25 visitors. Attendees included newcomers interested in learning the basics and existing users looking for updates. The team showcased new capabilities, including the tcpdump gadget and Prometheus metrics export, and invited visitors to the upcoming contribfest to encourage participation.

Istio
Representatives: Keith Mattix, Jackie Maertens, Steven Jin Xuan, Niranjan Shankar, and Mike Morris

The Istio booth continued to attract a mix of experienced adopters and newcomers seeking guidance. Technical discussions focused on:

- Enhancements to multicluster support in ambient mode
- Migration paths from sidecars to ambient
- Improvements in Gateway API availability and usage
- Performance and operational benefits for large-scale deployments

Users, including several Azure customers, expressed appreciation for Microsoft's sustained investment in Istio as part of their service mesh infrastructure.

Notary Project
Representatives: Feynman Zhou and Toddy Mladenov

The Notary Project booth saw significant interest from practitioners concerned with software supply chain security. Attendees discussed signing, verification workflows, and integrations with Azure services and Kubernetes clusters. The conversations will influence upcoming improvements across Notary Project and Ratify, reinforcing Microsoft's commitment to secure artifacts and verifiable software distribution.
Open Policy Agent (OPA) - Gatekeeper
Representative: Jaydip Gabani

The OPA/Gatekeeper booth enabled maintainers to connect with both new and existing users to explore use cases around policy enforcement, Rego/CEL authoring, and managing large policy sets. Many conversations surfaced opportunities around simplifying best practices and reducing management complexity. The team also promoted participation in an ongoing Gatekeeper/OPA survey to guide future improvements.

ORAS
Representatives: Feynman Zhou and Toddy Mladenov

ORAS engaged developers interested in OCI artifacts beyond container images, including AI/ML models, metadata, backups, and multi-cloud artifact workflows. Attendees appreciated ORAS's ecosystem integrations and found the booth examples useful for understanding how artifacts are tagged, packaged, and distributed. Many users shared how they leverage ORAS with Azure Container Registry and other OCI-compatible registries.

Radius
Representative: Zach Casper

The Radius booth attracted the attention of platform engineers looking for ways to simplify their developers' experience while enforcing enterprise-grade infrastructure and security best practices. Attendees saw demos on deploying a database to Kubernetes and using managed databases from AWS and Azure without modifying the application deployment logic. They also saw a preview of Radius integration with GitHub Copilot, enabling AI coding agents to autonomously deploy and test applications in the cloud.

Conclusion

KubeCon + CloudNativeCon North America 2025 reinforced the essential role of open source communities in driving innovation across cloud-native technologies. Through the Project Pavilion, Microsoft teams were able to exchange knowledge with other maintainers, gather user feedback, and support projects that form foundational components of modern cloud infrastructure.
Microsoft remains committed to building alongside the community and strengthening the ecosystem that powers so much of today's cloud-native development. For anyone interested in exploring or contributing to these open source efforts, please reach out directly to each project's community to get involved, or contact Lexi Nadolski at lexinadolski@microsoft.com for more information.

Beyond the Chat Window: How Change-Driven Architecture Enables Ambient AI Agents
AI agents are everywhere now - powering chat interfaces, answering questions, helping with code. We've gotten remarkably good at this conversational paradigm. But while the world has been focused on chat experiences, something new is quietly emerging: ambient agents. These aren't replacements for chat; they're an entirely new category of AI system that operates in the background, sensing, processing, and responding to the world in real time. And here's the thing: this is a new frontier. The infrastructure we need to build these systems barely exists yet. Or at least, it didn't until now.

Two Worlds: Conversational and Ambient

Let me paint you a picture of the conversational AI paradigm we know well. You open a chat window. You type a question. You wait. The AI responds. Rinse and repeat. It's the digital equivalent of having a brilliant assistant sitting at a desk, ready to help when you tap them on the shoulder.

Now imagine a completely different kind of assistant - one that watches for important changes, anticipates needs, and springs into action without being asked. That's the promise of ambient agents: AI systems that, as LangChain puts it, "listen to an event stream and act on it accordingly, potentially acting on multiple events at a time."

This isn't an evolution of chat; it's a fundamentally different interaction paradigm. Both have their place. Chat is great for collaboration and back-and-forth reasoning. Ambient agents excel at continuous monitoring and autonomous response. Instead of human-initiated conversations, ambient agents operate by detecting changes in upstream systems and maintaining context across time without constant prompting. The use cases are compelling and distinct from chat.
Imagine a project management assistant that operates in two modes: you can chat with it to ask "summarize project status," but it also runs in the background, constantly monitoring for newly created tickets or failed deployment pipelines and automatically reassigning tasks. Or consider a DevOps agent that you can query conversationally ("what's our current CPU usage?") but that also monitors your infrastructure continuously, detecting anomalies and starting remediation before you even know there's a problem.

The Challenge: Real-Time Change Detection

Here's where building ambient agents gets tricky. While chat-based agents work perfectly within the request-response paradigm, ambient agents need something entirely different: continuous monitoring and real-time change detection. How do you efficiently detect changes across multiple data sources? How do you avoid the performance nightmare of constant polling? How do you ensure your agent reacts instantly when something critical happens?

Developers trying to build ambient agents hit the same wall: creating a reliable, scalable change detection system is hard. You either end up with:

- Polling hell: Constantly querying databases, burning through resources, and still missing changes between polls
- Legacy system rewrites: Massive, expensive, multi-year projects to rewrite legacy systems so that they produce domain events
- Webhook spaghetti: Managing dozens of event sources, each with different formats and reliability guarantees

This is where the story takes an interesting turn.

Enter Drasi: The Change Detection Engine You Didn't Know You Needed

Drasi is not another AI framework. Instead, it solves the problem that ambient agents need solved: intelligent change detection. Think of it as the sensory system for your AI agents, the infrastructure that lets them perceive changes in the world.
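To make the polling-versus-push contrast concrete, here is a minimal, framework-free sketch in plain Python. All names here are illustrative, not Drasi or langchain-drasi APIs: the consumer either re-reads a store until something differs, or subscribes a handler that is invoked only at the moment of change.

```python
import time

# --- Polling approach: the consumer repeatedly asks "anything new?" ---
class PolledStore:
    def __init__(self):
        self.value = 0

    def read(self):
        return self.value

def poll_for_change(store, last_seen, interval=0.01, max_polls=100):
    """Burn cycles re-reading the store until the value differs."""
    polls = 0
    while polls < max_polls:
        polls += 1
        current = store.read()
        if current != last_seen:
            return current, polls
        time.sleep(interval)
    return last_seen, polls

# --- Change-driven approach: the producer notifies subscribers ---
class ChangeNotifier:
    def __init__(self):
        self.value = 0
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def set(self, new_value):
        old, self.value = self.value, new_value
        for handler in self._handlers:
            handler(old, new_value)  # the change is pushed, not polled

notifier = ChangeNotifier()
seen = []
notifier.subscribe(lambda old, new: seen.append((old, new)))
notifier.set(42)  # the handler runs exactly once, at the moment of change
```

The polling version pays for every empty check and can still miss intermediate states between polls; the change-driven version does zero work until something actually changes, which is the property Drasi's continuous queries provide at system scale.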
Drasi is built around three simple components:

- Sources: Connectivity to the systems that Drasi can observe as sources of change (PostgreSQL, MySQL, Cosmos DB, Kubernetes, EventHub)
- Continuous Queries: Graph-based queries (using Cypher/GQL) that monitor for specific change patterns
- Reactions: What happens when a continuous query detects changes, or lack thereof

But here's the killer feature: Drasi doesn't just detect that something changed. It understands what changed and why it matters, and even whether something should have changed but did not. Using continuous queries, you can define complex conditions that your agents care about, and Drasi handles all the plumbing to deliver those insights in real time.

The Bridge: langchain-drasi Integration

Now, detecting changes is only part of the challenge. You need to connect those changes to your AI agents in a way that makes sense. That's where langchain-drasi comes in: a purpose-built integration that bridges Drasi's change detection with LangChain's agent frameworks. It achieves this by leveraging the Drasi MCP Reaction, which exposes Drasi continuous queries as MCP resources.
The integration provides a simple Tool that agents can use to:

- Discover available queries automatically
- Read current query results on demand
- Subscribe to real-time updates that flow directly into agent memory and workflow

Here's what this looks like in practice:

```python
from langchain_drasi import create_drasi_tool, MCPConnectionConfig

# Configure connection to Drasi MCP server
mcp_config = MCPConnectionConfig(server_url="http://localhost:8083")

# Create the tool with notification handlers
drasi_tool = create_drasi_tool(
    mcp_config=mcp_config,
    notification_handlers=[buffer_handler, console_handler]
)

# Now your agent can discover and subscribe to data changes
# No more polling, no more webhooks, just reactive intelligence
```

The beauty is in the notification handlers: pre-built components that determine how changes flow into your agent's consciousness:

- BufferHandler: Queues changes for sequential processing
- LangGraphMemoryHandler: Automatically integrates changes into agent checkpoints
- LoggingHandler: Integrates with standard logging infrastructure

This isn't just plumbing; it's the foundation for what we might call "change-driven architecture" for AI systems.

Example: The Seeker Agent Has Entered the Chat

Let's make this concrete with my favorite example from the langchain-drasi repository: a hide-and-seek-inspired non-player character (NPC) AI agent that seeks human players in a multi-player game environment.

The Scenario

Imagine a game where players move around a 2D map, updating their positions in a PostgreSQL database. But here's the twist: the NPC agent doesn't have omniscient vision.
It can only detect players under specific conditions:

- Stationary targets: When a player doesn't move for more than 3 seconds (they're exposed)
- Frantic movement: When a player moves more than once in less than a second (panicking reveals your position)

This creates interesting strategic gameplay: players must balance staying still (safe from detection but vulnerable if found) with moving carefully (one move per second is the sweet spot). The NPC agent seeks based on these glimpses of player activity. These detection rules are defined as Drasi continuous queries that monitor the player positions table. For reference, these are the two continuous queries we will use.

Detecting when a player doesn't move for more than 3 seconds is a great example of detecting the absence of change, using the trueLater function:

```cypher
MATCH
  (p:player { type: 'human' })
WHERE drasi.trueLater(
  drasi.changeDateTime(p) <= (datetime.realtime() - duration( { seconds: 3 } )),
  drasi.changeDateTime(p) + duration( { seconds: 3 } )
)
RETURN
  p.id, p.x, p.y
```

Detecting when a player moves more than once in less than a second is an example of using the previousValue function to compare the current state with a prior state:

```cypher
MATCH
  (p:player { type: 'human' })
WHERE drasi.changeDateTime(p).epochMillis
  - drasi.previousValue(drasi.changeDateTime(p).epochMillis) < 1000
RETURN
  p.id, p.x, p.y
```

Here's the neat part: you can dynamically adjust the game's difficulty by adding or removing queries with different conditions. No code changes required; just deploy new Drasi queries.

The traditional approach would have your agent constantly polling the data source checking these conditions: "Any player moves? How about now? Now? Now?"

The Workflow in Action

The agent operates through a LangGraph-based state machine with two distinct phases:

1.
Setup Phase (First Run Only)

- Setup queries prompt: Prompts the AI model to discover available Drasi queries
- Setup queries call model: The AI model calls the Drasi tool with the discover operation
- Setup queries tools: Executes the Drasi tool calls to subscribe to relevant queries

This phase loops until the AI model has discovered and subscribed to all relevant queries.

2. Main Seeking Loop (Continuous)

- Check sensors: Consumes any new Drasi notifications from the buffer into the workflow state
- Evaluate targets: Uses the AI model to parse sensor data and extract target positions
- Select and plan: Selects the closest target and plans a path
- Execute move: Executes the next move via the game API

The loop continues indefinitely, reacting to new notifications. No polling. No delays. No wasted resources checking positions that don't meet the detection criteria. Just pure, reactive intelligence flowing from meaningful data changes to agent actions. The continuous queries act as intelligent filters, only alerting the agent when relevant changes occur.

Click here for the full implementation.

The Bigger Picture: Change-Driven Architecture

What we're seeing with Drasi and ambient agents isn't just a new tool; it's a new architectural pattern for AI systems. The core idea is profound: AI agents can react to the world changing, not just wait to be asked about it. This pattern enables entirely new categories of applications that complement traditional chat interfaces.

The example might seem playful, but it demonstrates that AI agents can perceive and react to their environment in real time. Today it's seeking players in a game. Tomorrow it could be:

- Managing city traffic flows based on real-time sensor data
- Coordinating disaster response as situations evolve
- Optimizing supply chains as demand patterns shift
- Protecting networks as threats emerge

The change detection infrastructure is here. The patterns are emerging. The only question is: what will you build?

Where to Go from Here

Ready to dive deeper?
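Before wrapping up, the main seeking loop described above can be approximated in a few lines of plain Python. This is a rough sketch with hypothetical names and hard-coded sample notifications; the real agent uses LangGraph for the state machine and the Drasi tool's BufferHandler to fill the notification buffer.

```python
import math
from collections import deque

# Stand-in for the Drasi notification buffer: in the real system, results
# from the two continuous queries arrive here via the BufferHandler.
notifications = deque([
    {"id": "p1", "x": 4, "y": 2},   # stationary player detected
    {"id": "p2", "x": 9, "y": 9},   # frantic mover detected
])

def check_sensors(state):
    """Drain any pending Drasi notifications into the workflow state."""
    while notifications:
        state["targets"].append(notifications.popleft())
    return state

def select_and_plan(state):
    """Pick the detected target closest to the NPC's current position."""
    npc = state["npc_pos"]
    if not state["targets"]:
        return None
    return min(
        state["targets"],
        key=lambda t: math.dist(npc, (t["x"], t["y"])),
    )

def execute_move(state, target):
    """Step one tile toward the target (stand-in for the game API call)."""
    x, y = state["npc_pos"]
    tx, ty = target["x"], target["y"]
    state["npc_pos"] = (x + (tx > x) - (tx < x), y + (ty > y) - (ty < y))
    return state

# One pass of the continuous seeking loop
state = {"npc_pos": (0, 0), "targets": []}
state = check_sensors(state)
target = select_and_plan(state)   # p1 at (4, 2) is closer than p2 at (9, 9)
state = execute_move(state, target)
```

The key point is that `check_sensors` never queries the database: it only consumes notifications that a continuous query already deemed relevant, so an idle game costs the agent nothing.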
Here are your next steps:

- Explore Drasi: Head to drasi.io and discover the power of the change detection platform
- Try langchain-drasi: Clone the GitHub repository and run the Hide-and-Seek example yourself
- Join the conversation: The space is new and needs diverse perspectives. Join the community on Discord.

Let us know if you have built ambient agents and what challenges you faced with real-time change detection.