# Shift-Left Governance for AI Agents: How the Agent Governance Toolkit Helps You Catch Violations
In part one of this series, we covered AGT's runtime governance: the policy engine, zero-trust identity, execution sandboxing, and the OWASP Agentic AI risk mapping. That post focused on what happens when an agent acts: policy evaluation at the moment a tool call fires, trust scoring when agents communicate, audit logging when decisions are made. Runtime governance is essential, but it is the last line of defense.

After that post went live, a pattern emerged in conversations with teams adopting AGT. The same question kept coming up: runtime checks are useful, but what about everything before production? We realized runtime governance was only half the story. So we went back and built tooling for every stage of the software development lifecycle, from the moment a developer saves a file to the moment an artifact ships to users.

## Why Runtime Governance Is Not Enough

AI agents are a new class of workload. They reason about what to do, select tools, call APIs, read databases, and spawn sub-processes, often in loops that run without direct human oversight. The OWASP Agentic AI Top 10 (published December 2025) identifies risks like excessive agency, insecure tool use, privilege escalation, and supply chain compromise. These risks span the entire lifecycle, not just runtime.

Consider a few scenarios that runtime governance alone cannot prevent:

- A developer commits a policy YAML file with a typo that silently disables all deny rules. The agent runs unprotected until someone notices.
- A dependency update introduces a package with a known critical CVE. The agent starts using a vulnerable library before any security team reviews it.
- A contributor adds a raw cryptographic import to an application module, bypassing the security-audited signing library. The code compiles and ships.
- A GitHub Actions workflow uses an expression injection pattern that allows an attacker to execute arbitrary code in CI.
- A release ships without a Software Bill of Materials (SBOM), making it impossible to trace which components are affected when the next log4j-style vulnerability drops.

Each of these is a governance failure, but none of them happens at runtime. They happen at commit time, at PR review time, at build time, or at release time. A comprehensive governance strategy needs coverage at every stage.

## Four Stages of Pre-Runtime Governance

Governance violations can enter a codebase at four distinct stages of the development lifecycle. Each stage carries a different class of risk, and each needs a different kind of check:

| Stage | When It Runs | What It Catches | AGT Tooling |
|---|---|---|---|
| Commit-time | Before code leaves the developer machine | Malformed policies, schema violations, secrets, stub code, unauthorized crypto | Pre-commit hooks, quality gates |
| PR-time | When a pull request is opened or updated | Vulnerable dependencies, missing attestation, secrets in history, unpinned versions | GitHub Actions (attestation, dependency review, secret scanning, supply chain checks) |
| CI/Build-time | On every push and pull request to main | Compliance violations, binary security issues, dependency confusion, workflow injection | Governance Verify action, Security Scan action, CodeQL, BinSkim, policy validation |
| Release-time | Before artifacts are published | Missing provenance, unsigned artifacts, incomplete SBOMs | SBOM generation, Sigstore signing, build attestation, OpenSSF Scorecard |

Just as with bugs, the earlier you catch a governance violation, the cheaper it is to fix. A malformed policy file caught at commit time costs zero CI minutes. A secret caught in PR review never reaches the default branch. A dependency confusion attack blocked in CI never reaches production. An unsigned artifact blocked at release time never reaches users.
## Stage 1: Commit-Time Governance with Pre-Commit Hooks

The fastest governance feedback loop is local. Within the AGT project, we've implemented three pre-commit hooks that run automatically whenever a developer stages files for commit, validating governance artifacts before they ever leave the developer's machine.

### Built-In Hooks

The toolkit's `.pre-commit-hooks.yaml` defines three hooks that any repository can adopt:

| Hook ID | What It Validates | File Pattern |
|---|---|---|
| validate-policy | YAML/JSON policy files against the AGT policy schema, checking for required fields, valid operators, and structural correctness | Files matching `*polic*.yaml`, `*polic*.yml`, `*polic*.json` |
| validate-plugin-manifest | Plugin manifest files for required fields and schema compliance | Files matching `plugin.json`, `plugin.yaml`, `plugin.yml` |
| evaluate-plugin-policy | Plugin manifests against a governance policy file, evaluating whether the plugin would be allowed under the organization's rules | Files matching `plugin.json`, `plugin.yaml`, `plugin.yml` |

To adopt these hooks, add AGT as a pre-commit hook source:

```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/microsoft/agent-governance-toolkit
    rev: main  # pin to a release tag in production
    hooks:
      - id: validate-policy
      - id: validate-plugin-manifest
      - id: evaluate-plugin-policy
        args: ['--policy', 'policies/marketplace-policy.yaml']
```

Then install and run:

```bash
pip install pre-commit
pre-commit install
pre-commit run --all-files
```

### Extended Quality Gates

Beyond schema validation, we built a pre-commit rollout template (see the full example in the repository) with additional governance-specific quality gates designed to help prevent common security anti-patterns from entering the codebase:

- **Policy validation (agt-validate):** Runs the full AGT policy CLI in strict mode, catching not just schema errors but semantic issues like conflicting rules.
- **Health check (agt-doctor):** Runs on pre-push (before code leaves the machine entirely), performing a broader health check of the governance configuration.
- **Plugin metadata check (agency-json-required):** Ensures every plugin directory contains the required `agency.json` metadata file.
- **Stub detection (no-stubs):** Blocks TODO, FIXME, HACK, and `raise NotImplementedError` markers in staged production code. Test files are excluded. (A minimal sketch of this kind of check appears at the end of this section.)
- **Unauthorized crypto detection (no-custom-crypto):** Blocks raw cryptographic imports (`hashlib`, `hmac`, `crypto.subtle`, `System.Security.Cryptography`, `ring`, `ed25519-dalek`) outside designated security modules. This helps ensure all cryptographic operations go through the audited AGT signing libraries.
- **Secret scanning (detect-secrets):** Integrates Yelp's detect-secrets for pattern-based secret detection on every commit.

### Phased Rollout for Teams

Adopting pre-commit hooks across a team requires a thoughtful rollout. The AGT documentation includes a phased adoption guide:

- **Week 1:** Install hooks in permissive mode. Hooks warn on violations but do not block the commit. This lets developers see what would be caught without disrupting workflow.
- **Week 2:** Switch to strict mode for policy validation only. Policy files must pass schema validation to be committed.
- **Week 3:** Enable all hooks as blocking. Stubs, unauthorized crypto, and secrets are now blocked at commit time.
- **Week 4:** Graduate to full blocking mode and remove the permissive fallback.

This approach helps teams build confidence in the governance tooling before it becomes a hard gate.
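If you want to prototype a gate like no-stubs before adopting the full template, the check is small enough to write yourself. Here is a minimal sketch in Python; the marker list and test-file exclusion are assumptions for illustration, not the template's exact implementation:

```python
#!/usr/bin/env python3
"""Minimal stub-detection check in the spirit of AGT's no-stubs hook.
Illustrative only: the marker set and test-file filter are assumptions,
not the rollout template's exact implementation."""
import re
import sys
from pathlib import Path

STUB_PATTERN = re.compile(r"\b(?:TODO|FIXME|HACK)\b|raise\s+NotImplementedError")

def main(paths: list[str]) -> int:
    exit_code = 0
    for name in paths:
        path = Path(name)
        if "test" in path.name:  # crude stand-in for the template's test exclusion
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STUB_PATTERN.search(line):
                print(f"{path}:{lineno}: stub marker found: {line.strip()}")
                exit_code = 1  # non-zero exit makes pre-commit block the commit
    return exit_code

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))  # pre-commit passes staged file names as arguments
```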
## Stage 2: PR-Time Gates

Pre-commit hooks catch issues on the developer's machine, but they can be bypassed (force push, direct GitHub edits, hooks not installed). PR-time gates provide the second layer of defense, running in GitHub Actions on every pull request before merge is allowed.

### Governance Attestation

The Governance Attestation action validates that PR authors have completed a structured attestation checklist before their code can merge. The default checklist covers seven sections:

- Security review
- Privacy review
- Legal review
- Responsible AI review
- Accessibility review
- Release Readiness / Safe Deployment
- Org-specific Launch Gates

The action is fully configurable. Organizations can customize the required sections, set a minimum PR body length, and choose their own attestation format. Outputs include the validation status, a list of errors for missing sections, and a JSON mapping of sections to checkbox counts. Here is an example workflow:

```yaml
# .github/workflows/pr-governance.yml
name: PR Governance
on:
  pull_request:
    types: [opened, edited, synchronize]
jobs:
  attestation:
    runs-on: ubuntu-latest
    steps:
      - uses: microsoft/agent-governance-toolkit/action/governance-attestation@main
        with:
          required-sections: |
            1) Security review
            2) Privacy review
            3) Responsible AI review
```

### Dependency Review

The dependency review workflow helps block PRs that introduce dependencies with known CVEs or disallowed licenses. It uses the GitHub dependency-review-action with a curated license allowlist:

```yaml
- uses: actions/dependency-review-action@v4
  with:
    fail-on-severity: moderate
    comment-summary-in-pr: always
    allow-licenses: >
      MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, PSF-2.0,
      Python-2.0, 0BSD, Unlicense, CC0-1.0, CC-BY-4.0, Zlib,
      BSL-1.0, MPL-2.0
```

This runs on every PR that touches dependency manifests (`package.json`, `Cargo.toml`, `pyproject.toml`, `requirements.txt`). Dependencies with moderate or higher CVEs are flagged, and dependencies with licenses not on the allowlist are blocked.

### Secret Scanning

The secret scanning workflow runs on every PR to the main branch and on a weekly schedule. It combines two complementary approaches:

- **Gitleaks:** Pattern-based secret detection across the full git history, catching API keys, tokens, and credentials that may have been committed at any point.
- **High-entropy string scanning:** Regex-based detection of common secret patterns, including GitHub tokens (`ghp_`, `gho_`), AWS access keys (`AKIA`), Slack tokens (`xox`), and base64-encoded strings with high entropy.

### Supply Chain Integrity

A dedicated supply chain check workflow triggers when dependency manifest files change. It enforces two rules that help prevent supply chain attacks:

- **Exact version pinning:** No `^` or `~` version ranges in `package.json` files. This prevents unexpected minor/patch version updates that could introduce compromised code. (A sketch of this check follows below.)
- **Lockfile presence:** Every package directory with dependencies must have a corresponding lockfile (`package-lock.json`, `pnpm-lock.yaml`, or `yarn.lock`). Lockfiles help ensure reproducible builds with verified integrity hashes.
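The pinning rule is mechanical enough to sketch. Here is a minimal Python version of the check, assuming only that dependencies live under the standard `dependencies` and `devDependencies` keys; it is not the AGT workflow's exact script:

```python
"""Illustrative check for unpinned npm version ranges.
Not the AGT supply chain workflow's exact script."""
import json
import sys

def find_unpinned(manifest_path: str) -> list[str]:
    with open(manifest_path, encoding="utf-8") as f:
        manifest = json.load(f)
    violations = []
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            # '^' and '~' allow floating minor/patch upgrades; require exact pins
            if spec.startswith(("^", "~")):
                violations.append(f"{manifest_path}: {name} uses range '{spec}'")
    return violations

if __name__ == "__main__":
    problems = [v for path in sys.argv[1:] for v in find_unpinned(path)]
    print("\n".join(problems) or "all dependencies pinned")
    sys.exit(1 if problems else 0)
```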
### Quality Gates

The quality gates workflow mirrors the pre-commit hooks at the PR level, providing defense in depth. It runs four checks on every pull request:

| Gate | Purpose |
|---|---|
| No Stubs/TODOs | Blocks TODO, FIXME, HACK markers in production code (test files excluded) |
| No Unauthorized Crypto | Blocks raw cryptographic imports outside designated security modules |
| Security Audit Required | Changes to security-sensitive paths require accompanying audit documentation |
| Dependency Audit Trail | Vendored patches must have an audit trail explaining the patch and its provenance |

These gates catch anything that bypasses pre-commit hooks: force-pushed commits, direct GitHub web edits, commits from contributors who have not installed the hooks.

## Stage 3: CI/Build-Time Governance

Once a PR passes the gate workflows, the main CI pipeline and specialized workflows perform deeper, more computationally intensive analysis.

### The Governance Verify Action

The Governance Verify action is the primary CI-time governance check. It is a GitHub Actions composite action that installs the toolkit and runs the compliance CLI against your repository. It supports four modes:

| Command | What It Does |
|---|---|
| governance-verify | Runs the full compliance verification suite, checking governance controls and reporting how many pass |
| marketplace-verify | Validates a plugin manifest against marketplace requirements (required fields, signing, metadata) |
| policy-evaluate | Evaluates a specific policy file against a JSON context, returning the allow/deny decision with the matched rule |
| all | Runs governance-verify, then marketplace-verify and policy-evaluate if the corresponding paths are provided |

Here is an example:

```yaml
# .github/workflows/governance-ci.yml
name: Governance CI
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: microsoft/agent-governance-toolkit/action@main
        with:
          command: all
          policy-path: policies/
          manifest-path: plugin.json
          output-format: json
          fail-on-warning: 'true'
```

The action outputs structured data including controls-passed, controls-total, violations count, and full command output in JSON format. This makes it straightforward to integrate with dashboards, Slack notifications, or downstream decision logic.

### The Security Scan Action

A separate security scan action scans directories for secrets, CVEs, and dangerous code patterns. Unlike the PR-time secret scanning (which focuses on git history), this action performs deep content analysis of the current codebase:

```yaml
- uses: microsoft/agent-governance-toolkit/action/security-scan@main
  with:
    paths: 'plugins/ scripts/'
    min-severity: high
    exemptions-file: .security-exemptions.json
```

The action supports configurable severity thresholds (critical, high, medium, low), an exemptions file for acknowledged findings, and structured JSON output with findings-count, blocking-count, and detailed findings.

### Policy Validation Workflow

A dedicated policy validation workflow triggers whenever YAML files or the policy engine source code changes. It performs two jobs in sequence:

1. **Validate policies:** Discovers all policy files matching the `*policy*` naming convention, then validates each file using the AGT policy CLI. (A rough sketch of the discovery step follows below.)
2. **Test policies:** Runs the policy CLI unit tests to verify that policy evaluation behavior is correct after the changes.

This ensures that policy file edits do not break the policy engine and that policy semantics are preserved.
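To make the discovery step concrete, here is a rough Python sketch that finds files matching the `*policy*` convention and checks that they at least parse; the real workflow goes further and validates against the full AGT schema via the policy CLI:

```python
"""Rough sketch of policy-file discovery plus a basic parse check.
The actual workflow runs the AGT policy CLI's full schema validation;
this only demonstrates the discovery-and-check shape."""
from pathlib import Path
import yaml  # pip install pyyaml

def discover_policy_files(root: str = ".") -> list[Path]:
    # Match the *policy* naming convention used by the workflow
    return [p for p in Path(root).rglob("*polic*")
            if p.suffix in {".yaml", ".yml", ".json"}]

def parse_check(path: Path) -> str | None:
    try:
        with path.open(encoding="utf-8") as f:
            doc = yaml.safe_load(f)  # YAML is a superset of JSON, so this covers both
    except yaml.YAMLError as exc:
        return f"{path}: not parseable ({exc})"
    if not isinstance(doc, dict):
        return f"{path}: expected a mapping at the top level"
    return None

if __name__ == "__main__":
    errors = [e for p in discover_policy_files() if (e := parse_check(p))]
    print("\n".join(errors) or "all policy files parse")
```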
### CodeQL and Static Analysis

AGT uses GitHub's CodeQL for semantic static analysis of Python and TypeScript code. The CodeQL workflow runs on pushes and PRs, performing deep dataflow analysis that goes beyond pattern matching. Results are uploaded as SARIF to GitHub's Security tab, providing a centralized view of code quality issues.

### Dependency Confusion Scanning

A dedicated CI job runs a dependency confusion scanner on every build. This is a targeted defense against a specific supply chain attack vector in which an attacker registers a public package with the same name as an internal package. The scanner checks that:

- Internal package names do not collide with public PyPI or npm packages
- Notebook `pip install` commands only reference packages that are registered and expected

### Workflow Security Auditing

When GitHub Actions workflow files change, a workflow security job scans for common CI/CD security issues:

- **Expression injection:** Detects patterns like `${{ github.event.pull_request.title }}` used directly in `run:` blocks, which can allow arbitrary code execution.
- **Overly permissive permissions:** Flags workflows that request more permissions than necessary.
- **Unpinned action references:** Detects actions referenced by branch name instead of commit SHA, which is a supply chain risk.

### .NET Binary Analysis with BinSkim

For the .NET SDK (Microsoft.AgentGovernance), the CI pipeline runs Microsoft BinSkim binary security analysis on compiled assemblies. BinSkim checks for security-relevant compiler and linker settings in compiled binaries, such as DEP (Data Execution Prevention), ASLR (Address Space Layout Randomization), and stack protection. Results are uploaded as SARIF to GitHub code scanning alongside the CodeQL results.

### The ci-complete Gate Pattern

With many CI jobs that conditionally run based on path filters, AGT uses a pattern called ci-complete: a single gate job that is configured as the sole required status check in branch protection. This job runs unconditionally (`if: always()`), depends on all other CI jobs, and checks that none of them failed. Jobs that were skipped (because no relevant files changed) are acceptable. This pattern ensures that branch protection works correctly with conditional CI jobs, preventing the common issue where skipped jobs report as "skipped" and fail required status checks.
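The decision rule the gate implements is small enough to state precisely. Here is a Python sketch of that logic; the job names are hypothetical, and in a real workflow this rule runs in the gate job's script step over the `needs` context:

```python
"""The ci-complete decision rule, sketched in Python. In GitHub Actions this
logic lives in the gate job's script step, reading each dependency's result
from the `needs` context. Job names here are hypothetical."""

ALLOWED = {"success", "skipped"}  # skipped jobs are acceptable; failures are not

def gate(job_results: dict[str, str]) -> bool:
    """Return True if the merge gate should pass."""
    bad = {name: res for name, res in job_results.items() if res not in ALLOWED}
    for name, res in bad.items():
        print(f"blocking: job '{name}' ended with result '{res}'")
    return not bad

# Hypothetical run: one job skipped by a path filter, all others green.
assert gate({"lint": "success", "codeql": "success", "binskim": "skipped"})
assert not gate({"lint": "success", "policy-validate": "failure"})
```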
## Language-Specific Compile-Time Enforcement

Beyond the language-agnostic CI checks, each AGT SDK uses its language's native compiler and tooling to enforce governance standards at compile time.

### .NET: The Strictest Compile-Time Checks

The .NET SDK (Microsoft.AgentGovernance) enforces compile-time governance through MSBuild properties in `Directory.Build.props` and `Directory.Build.targets`, which apply automatically to every project in the SDK:

| Feature | MSBuild Property | Effect |
|---|---|---|
| Nullable reference types | `<Nullable>enable</Nullable>` | The compiler warns on every possible null dereference, helping prevent NullReferenceException at compile time |
| Warnings as errors | `<TreatWarningsAsErrors>true</TreatWarningsAsErrors>` | All compiler warnings become build errors for packable projects; no warnings can ship to consumers |
| Strong-name signing | `<SignAssembly>true</SignAssembly>` | Assemblies are signed with a strong-name key (AgentGovernance.snk), enabling identity verification |
| Deterministic builds | `<ContinuousIntegrationBuild>true</ContinuousIntegrationBuild>` | Identical source code produces bit-for-bit identical binaries in CI, enabling build verification |
| SourceLink | Microsoft.SourceLink.GitHub package | Users can step into AGT source code when debugging, supporting transparency and auditability |
| Symbol packages | `<IncludeSymbols>true</IncludeSymbols>` | .snupkg symbol packages are published alongside NuGet packages for debugging support |

### TypeScript: Strict Compilation and Linting

The TypeScript SDK (@microsoft/agentmesh-sdk) uses strict compiler settings and ESLint for build-time governance:

- **Strict mode** (`"strict": true` in tsconfig.json) enables all strict type-checking options, including noImplicitAny, strictNullChecks, strictFunctionTypes, and strictBindCallApply.
- **Consistent file naming** (forceConsistentCasingInFileNames) prevents cross-platform issues where imports work on case-insensitive file systems (Windows, macOS) but fail on case-sensitive ones (Linux CI).
- **Declaration generation** (`declaration: true` with `declarationMap: true`) produces .d.ts files for consumers, enabling downstream type checking.
- **ESLint with @typescript-eslint** provides static analysis during the build process, catching issues beyond what the TypeScript compiler checks.

### Python: Type Safety and Fast Linting

Python packages in AGT use typed package markers and static analysis tooling configured in pyproject.toml:

- **py.typed marker:** Each package includes a py.typed file, signalling to type checkers (mypy, pyright, Pylance) that the package supports type checking. Consumers get type errors if they misuse the AGT API (illustrated below).
- **mypy:** Configured as a dev dependency with project-specific settings in pyproject.toml. Provides static type checking that catches type mismatches before runtime.
- **ruff:** A fast Python linter written in Rust, configured in pyproject.toml and enforced in CI. Ruff checks for hundreds of code quality rules at build time.
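To illustrate what the py.typed marker buys consumers, here is a generic example; the function shown is invented for this illustration and is not AGT's actual API:

```python
"""Generic illustration of what a py.typed package gives consumers.
The function below is a made-up stand-in, not AGT's API."""

def evaluate_policy(policy_name: str, strict: bool = True) -> bool:
    """A typed library function, standing in for any py.typed API."""
    return strict and policy_name.startswith("prod-")

ok = evaluate_policy("prod-readonly")  # fine: types line up

# Because the package ships a py.typed marker, the next call is rejected
# by mypy/pyright in the consumer's own CI, before it ever runs:
#     evaluate_policy(policy_name=42)
#     error: Argument "policy_name" has incompatible type "int"; expected "str"
```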
## Stage 4: Release-Time Gates

Before artifacts reach users, the release pipeline adds a final layer of verification. These gates help ensure that what ships is exactly what was built, is signed by the expected publisher, and has a complete inventory of its components.

| Gate | Tool | What It Produces |
|---|---|---|
| SBOM generation | Anchore/Syft | SPDX and CycloneDX software bills of materials listing every component, dependency, and license |
| Python signing | Sigstore | Cryptographic signature using OpenID Connect identity, verifiable without manual key distribution |
| .NET signing | Release pipeline | Microsoft Authenticode and NuGet signing through the release pipeline |
| Build provenance | actions/attest-build-provenance | SLSA provenance attestation linking the artifact to its source commit and build environment |
| SBOM attestation | actions/attest-sbom | Binds the SBOM to the specific release artifact, creating a verifiable link between the inventory and the binary |

Additionally, the OpenSSF Scorecard runs on a schedule, providing an automated security posture assessment that covers branch protection, dependency management, CI/CD practices, and more. The score is published to the OpenSSF Scorecard website, giving consumers a transparent view of the project's security practices.

## How It All Fits Together: Defense in Depth

This approach follows a defense-in-depth principle: every check exists at multiple layers, so that bypassing one layer does not compromise the whole system.

Secret scanning, for example, runs at three levels: detect-secrets at commit time (pre-commit hook), Gitleaks at PR time (secret scanning workflow), and the Security Scan action at CI time (content analysis). A developer who bypasses pre-commit hooks will still be caught by the PR-time gate. A contributor who force-pushes past the PR gate will still be caught by the CI pipeline.

Similarly, policy validation runs at commit time (validate-policy hook), at PR time (quality gates), and at CI time (policy validation workflow). Each layer adds depth: the commit-time hook catches schema errors, and the CI pipeline catches semantic issues and runs regression tests.

The ci-complete gate job ties everything together. By depending on every CI job and serving as the single required status check, it ensures that no code merges to the main branch unless every applicable check has passed.

## Getting Started

You can adopt AGT's shift-left governance incrementally. Here are three starting points, from lowest to highest effort:

### 1. Add the Governance Verify Action (5 minutes)

Add a single GitHub Actions workflow that runs the compliance check on every PR:

```yaml
# .github/workflows/governance.yml
name: Governance
on: [pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: microsoft/agent-governance-toolkit/action@main
        with:
          command: governance-verify
```

### 2. Enable Pre-Commit Hooks (15 minutes)

Add a `.pre-commit-config.yaml` referencing AGT's hooks, install them, and run against all existing files to establish a baseline. Start in permissive mode and graduate to strict over four weeks.

### 3. Full Pipeline Integration (1-2 hours)

Add the complete set of PR-time gates (attestation, dependency review, secret scanning, supply chain checks, quality gates), configure the Security Scan action for your plugin directories, and enable SBOM generation and signing in your release workflow.

The AGT repository itself serves as a reference implementation: every workflow described in this post is running in production at aka.ms/agent-governance-toolkit.
## Important Notes

The policy files, workflow configurations, and code samples in this post are illustrative examples. Your organization's governance requirements may differ. Review and customize all configurations before deploying to production. The Agent Governance Toolkit is designed to help organizations implement governance controls for AI agents; it does not guarantee compliance with any specific regulatory framework. Always consult your organization's security and legal teams when defining governance policies.

## What Comes Next

Pre-runtime governance is one piece of the puzzle. Combined with the runtime governance capabilities covered in part one of this series (policy engines, zero-trust identity, execution sandboxing, audit logging), it provides coverage across the full lifecycle.

The project continues to grow. Since the initial release, we've added a multi-stage policy pipeline (pre_input, pre_tool, post_tool, pre_output stages), approval workflows with human-in-the-loop gates, DLP attribute ratchets for monotonic session state, and OpenTelemetry instrumentation for governance operations. Over 45 step-by-step tutorials are available in the documentation.

Everything described in this post is available today in the public GitHub repository. The full source, documentation, tutorials, and examples are at aka.ms/agent-governance-toolkit, open source under the MIT license. We welcome contributions, feedback, and issue reports from the community.

# Project Pavilion Presence at KubeCon EU 2026
KubeCon + CloudNativeCon Europe 2026 took place from 23 to 26 March at RAI Amsterdam, and it was a strong one. The themes running through the week reflected where the cloud native community is right now: AI moving from experimentation into production, platform engineering continuing to mature, and security and sovereignty top of mind for organizations across Europe. Microsoft was there throughout, and once again supported a range of open source projects in the Project Pavilion.

The Project Pavilion is a dedicated, vendor-neutral space on the show floor reserved for CNCF projects. It is where the work gets talked about honestly. Maintainers and contributors meet directly with end users, share what they are building, get real feedback on what is and is not working, and have the kinds of technical conversations that are hard to have anywhere else. For open source communities, it is one of the most valuable parts of the event.

## Why Our Presence Matters

Microsoft's products and services are built on and alongside many of the technologies represented in the pavilion, and the health of these communities matters to us directly. Showing up means our teams hear firsthand what is working, what is missing, and where these projects need to go next. It also means we get to contribute as community members, not just as a company name on a sponsor board. That distinction matters to us, and to the communities we are part of.

## Microsoft-Supported Pavilion Projects

### Confidential Containers

Representative: Jeremi Piotrowski

The Confidential Containers booth gave attendees a chance to learn more about the project and its approach to protecting workloads using hardware-based trusted execution environments. Jeremi was on hand throughout the kiosk hours, fielding questions from interested users and developers exploring confidential computing in Kubernetes environments. Conversations touched on use cases around data privacy, regulated workloads, and the role Confidential Containers plays in the broader cloud-native security landscape.

### Drasi

Representatives: Daniel Gerlag and Nandita Valsan

The Drasi team had a busy time in the pavilion, engaging around 40 attendees across two kiosk shifts in focused technical conversations. Most visitors were developers and platform engineers curious about change-driven architectures and real-time data processing. There was strong positive feedback on the newly introduced Drasi Server modes and embeddable library, which complement Drasi for Kubernetes. The team came away with useful validation of current design decisions and good input for the roadmap ahead.

### Envoy

Representative: Mikhail Krinkin

The Envoy booth was staffed for the full duration of KubeCon EU by maintainers from Microsoft, Google, Isovalent, and Tetrate, reflecting the broad and healthy contributor base behind the project. The biggest topic at the booth was migration from ingress-nginx to Gateway API implementations. The archival of ingress-nginx pushed a lot of users into making changes they were not quite ready for, and questions ranged from technical specifics like HTTP default differences between Envoy and Nginx, to more foundational questions about what Envoy and Gateway API actually are. The team had anticipated this and invested in the ingress2gateway project to give users a clear migration path. Extensibility was another frequent conversation topic, with dynamic modules increasingly becoming the go-to answer for user-specific requirements.
Starting with the 1.38 release of Envoy, dynamic modules will have a backward-compatible ABI, a sign of real production readiness for that feature.

### Flatcar Container Linux

Representatives: Thilo Fromm and Mathieu Tortuyaux

The Flatcar booth had great energy, with maintainers from Microsoft, STACKIT, and CloudBase joining for conversations throughout the pavilion hours. Operational sovereignty came up again and again as a theme, with users and consulting partners sharing how they are building their Kubernetes offerings on Flatcar because of how reliable and secure it is.

There were a lot of meaningful conversations. Lambda.ai currently runs Flatcar on their control plane and is looking at extending it to worker and customer clusters, with interest in contributing to the project. ReeVo has built their hosted Kubernetes distro on Flatcar across multi-cloud and bare metal environments and is planning to move hundreds of customer clusters over soon. Users from ClearScore, Avassa, Recorded Future, and several other organizations also stopped by with positive feedback on the project's robustness and security. STACKIT uses Flatcar as the default OS for their hosted Kubernetes offering and sponsors a full-time maintainer for the project. The team also connected with TAG Infrastructure to talk through Flatcar's CNCF graduation progress.

### Headlamp

Representatives: René Dudfield and Santhosh Nagaraj S

The Headlamp booth was a busy one, with users, contributors, and partner projects all stopping by throughout the pavilion hours. Conversations covered real-world deployments, federation challenges, multi-tenant namespace visibility, and feature requests like multi-CR data aggregation. There was notable interest from consultancies deploying Headlamp across hundreds of customer clusters, as well as from companies already running it at cloud scale. Several CNCF projects expressed interest in building UIs for their own projects inside Headlamp, with a few even getting started right there at the conference. The team also heard from users getting budget approved to migrate from the deprecated Kubernetes Dashboard, which is a good sign for the project's growing momentum. Demand for air-gapped AI agent support and deeper Azure and AKS integrations for internal developer platforms came up as clear areas to watch.

### Hyperlight

Representative: Ralph Squillace

The Hyperlight booth ran as a half-day session on Tuesday, in line with the project's current Sandbox status, but the corner location in the project area made a real difference in visibility. Ralph was fielding questions from the moment the doors opened, with a steady stream of visitors right up until the shift ended. Live and recorded demos were central to the conversations, helping attendees quickly grasp what Hyperlight does and how it fits into their environments. One standout visit came from an engineer at SAP who spent nearly an hour at the booth, pushing the conversation from fundamentals and embedding examples all the way through to agentic protection scenarios in Kubernetes. That conversation continued beyond KubeCon and turned into a scheduled meeting to explore a proof of concept, a good example of the kind of follow-on engagement the pavilion can generate.

### Inspektor Gadget

Representatives: Michael Friese and Qasim Sarfraz

The Inspektor Gadget booth had a lot of great energy, drawing in contributors, new users, and people just discovering the project for the first time.
There was genuine excitement around Inspektor Gadget Desktop and its visual troubleshooting experience for Kubernetes and Linux environments. The integration with HolmesGPT, which was also featured in the keynote, came up frequently and was one of the main talking points throughout the event. A theme that surfaced consistently in conversations with platform engineers was multi-tenancy, with teams looking for ways to safely give developers ad-hoc access to troubleshoot issues independently while keeping overall control at the platform level. It was a good set of conversations that reflected both the project's maturity and the growing demand for a flexible observability framework.

### Istio

Representatives: Mitch Connors, Mikhail Krinkin, Jackie Maertens, and Mike Morris

The Istio booth had steady traffic throughout the conference, with a noticeable shift in who was stopping by. More visitors came from teams with existing sidecar-based production deployments looking for guidance on moving to ambient mode, which is a change from previous years when ambient interest was mostly coming from greenfield users. The motivation to make the move was often tied to cost optimization and performance, with teams having read case studies and feeling more confident about the direction. That said, the increased interest also surfaced some real gaps, including requests for clearer migration guidance, more clarity around architectural differences like mTLS egress workflows, and better support for VM-based workloads. The team is planning to prioritize migration guidance over the coming months. The updated Istio Day format, with a half day of sessions at the Cloud Native Theater stage, also drew a strong crowd with standing room only throughout.

### Notary Project

Representatives: Toddy Mladenov and Flora Taagen

The Notary Project kiosk drew a wide range of visitors, from people learning about container image signing for the first time to experienced engineers asking detailed questions about what is coming next on the roadmap. A major highlight of the week was the project's conference talk on per-layer dm-verity signing, which drew a packed room and over 660 online sign-ups, one of the stronger turnouts for a project-level session at the event. The talk walked through how the new capability moves container security beyond pull-time verification to continuous runtime protection, backed by dm-verity, which generated a lengthy Q&A and a lot of enthusiasm from the audience. The team also sees a real opportunity ahead as AI workloads push organizations to think harder about the integrity of models, datasets, and container images, and the interest at the booth reinforced that Notary Project is well positioned to play a big role in securing those workflows.

### ORAS

Representative: Toddy Mladenov

The ORAS kiosk was staffed by maintainers from Microsoft, NVIDIA, and Red Hat, a good reflection of the healthy multi-vendor community the project has built. Attendees engaged with maintainers on ORAS use cases and adoption, with conversations ranging from how artifacts are tagged and packaged to how ORAS fits into broader multi-cloud workflows. One practical takeaway from maintainer conversations was around leveraging the ORAS SDK more often as a substitute for CLI operations when working with container registries, which helps teams build simpler and more robust tooling.
### Radius

Representatives: Sylvain Niles and Will Tsai

The Radius booth, supported by the Microsoft Azure Incubations team, attracted a good mix of enterprise platform teams, prospective adopters, and fellow open source maintainers throughout the pavilion hours. There was strong interest in the extensible Radius Resource Types feature and how it helps teams abstract infrastructure complexity and move workloads across different environments. Conversations also surfaced useful feedback on where the project should focus next, including agent-driven infrastructure workflows and using the Radius application graph to improve observability and operational visibility for cloud-native applications.

## Conclusion

KubeCon EU 2026 was a good reminder of why this community continues to grow. The conversations in the Project Pavilion were substantive, the feedback was honest, and the connections made there will carry forward into the work. Microsoft will be back for KubeCon NA in Salt Lake City this November, and we are already looking forward to it.

If you are interested in getting involved with any of these projects, the best starting point is each project's community directly. You are also welcome to reach out to Lexi Nadolski at lexinadolski@microsoft.com with any questions.

# Agent Governance Toolkit: Architecture Deep Dive, Policy Engines, Trust, and SRE for AI Agents
Last week we announced the Agent Governance Toolkit on the Microsoft Open Source Blog, an open-source project that brings runtime security governance to autonomous AI agents. In that announcement, we covered the why: AI agents are making autonomous decisions in production, and the security patterns that kept systems safe for decades need to be applied to this new class of workload. In this post, we'll go deeper into the how: the architecture, the implementation details, and what it takes to run governed agents in production.

## The Problem: Production Infrastructure Meets Autonomous Agents

If you manage production infrastructure, you already know the playbook: least privilege, mandatory access controls, process isolation, audit logging, and circuit breakers for cascading failures. These patterns have kept production systems safe for decades.

Now imagine a new class of workload arriving on your infrastructure: AI agents that autonomously execute code, call APIs, read databases, and spawn sub-processes. They reason about what to do, select tools, and act in loops. And in many current deployments, they do all of this without the security controls you'd demand of any other production workload.

That gap is what led us to build the Agent Governance Toolkit: an open-source project that applies proven security concepts from operating systems, service meshes, and SRE to the emerging world of autonomous AI agents. To frame this in familiar terms: most AI agent frameworks today are like running every process as root, with no access controls, no isolation, and no audit trail. The Agent Governance Toolkit is the kernel, the service mesh, and the SRE platform for AI agents.

When an agent calls a tool, say, `DELETE FROM users WHERE created_at < NOW()`, there is typically no policy layer checking whether that action is within scope. There is no identity verification when one agent communicates with another. There is no resource limit preventing an agent from making 10,000 API calls in a minute. And there is no circuit breaker to contain cascading failures when things go wrong.

## OWASP Agentic Security Initiative

In December 2025, OWASP published the Agentic AI Top 10: the first formal taxonomy of risks specific to autonomous AI agents. The list reads like a security engineer's nightmare: goal hijacking, tool misuse, identity abuse, memory poisoning, cascading failures, rogue agents, and more. If you've ever hardened a production server, these risks will feel both familiar and urgent.

The Agent Governance Toolkit is designed to help address all 10 of these risks through deterministic policy enforcement, cryptographic identity, execution isolation, and reliability engineering patterns.

Note: The OWASP Agentic Security Initiative has since adopted the ASI 2026 taxonomy (ASI01–ASI10). The toolkit's copilot-governance package now uses these identifiers, with backward compatibility for the original AT numbering.
## Architecture: Nine Packages, One Governance Stack

The toolkit is structured as a v3.0.0 Public Preview monorepo with nine independently installable packages:

| Package | What It Does |
|---|---|
| Agent OS | Stateless policy engine that intercepts agent actions before execution, with configurable pattern matching and semantic intent classification |
| Agent Mesh | Cryptographic identity (DIDs with Ed25519), the Inter-Agent Trust Protocol (IATP), and trust-gated communication between agents |
| Agent Hypervisor | Execution rings inspired by CPU privilege levels, saga orchestration for multi-step transactions, and shared session management |
| Agent Runtime | Runtime supervision with kill switches, dynamic resource allocation, and execution lifecycle management |
| Agent SRE | SLOs, error budgets, circuit breakers, chaos engineering, and progressive delivery: production reliability practices adapted for AI agents |
| Agent Compliance | Automated governance verification with compliance grading and regulatory framework mapping (EU AI Act, NIST AI RMF, HIPAA, SOC 2) |
| Agent Lightning | Reinforcement learning training governance with policy-enforced runners and reward shaping |
| Agent Marketplace | Plugin lifecycle management with Ed25519 signing, trust-tiered capability gating, and SBOM generation |
| Integrations | 20+ framework adapters for LangChain, CrewAI, AutoGen, Semantic Kernel, Google ADK, Microsoft Agent Framework, OpenAI Agents SDK, and more |

## Agent OS: The Policy Engine

Agent OS intercepts agent tool calls before they execute:

```python
from agent_os import StatelessKernel, ExecutionContext, Policy

kernel = StatelessKernel()

ctx = ExecutionContext(
    agent_id="analyst-1",
    policies=[
        Policy.read_only(),            # No write operations
        Policy.rate_limit(100, "1m"),  # Max 100 calls/minute
        Policy.require_approval(
            actions=["delete_*", "write_production_*"],
            min_approvals=2,
            approval_timeout_minutes=30,
        ),
    ],
)

result = await kernel.execute(
    action="delete_user_record",
    params={"user_id": 12345},
    context=ctx,
)
```

The policy engine works in two layers: configurable pattern matching (with sample rule sets for SQL injection, privilege escalation, and prompt injection that users customize for their environment) and a semantic intent classifier that helps detect dangerous goals regardless of phrasing. When an action is classified as `DESTRUCTIVE_DATA`, `DATA_EXFILTRATION`, or `PRIVILEGE_ESCALATION`, the engine blocks it, routes it for human approval, or downgrades the agent's trust level, depending on the configured policy.

Important: All policy rules, detection patterns, and sensitivity thresholds are externalized to YAML configuration files. The toolkit ships with sample configurations in `examples/policies/` that must be reviewed and customized before production deployment. No built-in rule set should be considered exhaustive.

Policy languages supported: YAML, OPA Rego, and Cedar.

The kernel is stateless by design: each request carries its own context. This means you can deploy it behind a load balancer, as a sidecar container in Kubernetes, or in a serverless function, with no shared state to manage. On AKS or any Kubernetes cluster, it fits naturally into existing deployment patterns. Helm charts are available for agent-os, agent-mesh, and agent-sre.
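Conceptually, the two layers described above compose like a short-circuit pipeline. Here is a simplified sketch in plain Python; the patterns, intent labels, and classifier are illustrative stand-ins, not Agent OS's actual rule set or API:

```python
"""Conceptual sketch of a two-layer policy check: fast pattern matching first,
then semantic intent classification. Patterns and labels are illustrative
stand-ins, not Agent OS's actual rules or API."""
import re

BLOCK_PATTERNS = [
    re.compile(r"(?i)\bdrop\s+table\b"),          # sample SQL-injection-style rule
    re.compile(r"(?i)\bsudo\b|\bchmod\s+777\b"),  # sample privilege-escalation rule
]

def classify_intent(action: str, params: dict) -> str:
    """Stand-in for the semantic classifier; returns an intent label."""
    if action.startswith(("delete_", "drop_", "truncate_")):
        return "DESTRUCTIVE_DATA"
    if "export" in action and str(params.get("destination", "")).startswith("http"):
        return "DATA_EXFILTRATION"
    return "BENIGN"

def decide(action: str, params: dict) -> str:
    text = f"{action} {params}"
    if any(p.search(text) for p in BLOCK_PATTERNS):
        return "block"                  # layer 1: pattern match fires first
    if classify_intent(action, params) != "BENIGN":
        return "require_approval"       # layer 2: semantic intent catches the rest
    return "allow"

print(decide("delete_user_record", {"user_id": 12345}))  # -> require_approval
```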
## Agent Mesh: Zero-Trust Identity for Agents

In service mesh architectures, services prove their identity via mTLS certificates before communicating. AgentMesh applies the same principle to AI agents, using decentralized identifiers (DIDs) with Ed25519 cryptography and the Inter-Agent Trust Protocol (IATP):

```python
from agentmesh import AgentIdentity, TrustBridge

identity = AgentIdentity.create(
    name="data-analyst",
    sponsor="alice@company.com",  # Human accountability
    capabilities=["read:data", "write:reports"],
)
# identity.did -> "did:mesh:data-analyst:a7f3b2..."

bridge = TrustBridge()
verification = await bridge.verify_peer(
    peer_id="did:mesh:other-agent",
    required_trust_score=700,  # Must score >= 700/1000
)
```

A critical feature is trust decay: an agent's trust score decreases over time without positive signals. An agent trusted last week but silent since then gradually becomes untrusted, modeling the reality that trust requires ongoing demonstration, not a one-time grant. Delegation chains enforce scope narrowing: a parent agent with read+write permissions can delegate only read access to a child agent, never escalate.

## Agent Hypervisor: Execution Rings

CPU architectures use privilege rings (Ring 0 for kernel, Ring 3 for userspace) to isolate workloads. The Agent Hypervisor applies this model to AI agents:

| Ring | Trust Level | Capabilities |
|---|---|---|
| Ring 0 (Kernel) | Score ≥ 900 | Full system access, can modify policies |
| Ring 1 (Supervisor) | Score ≥ 700 | Cross-agent coordination, elevated tool access |
| Ring 2 (User) | Score ≥ 400 | Standard tool access within assigned scope |
| Ring 3 (Untrusted) | Score < 400 | Read-only, sandboxed execution only |

New and untrusted agents start in Ring 3 and earn their way up, exactly the principle of least privilege that production engineers apply to every other workload. Each ring enforces per-agent resource limits: maximum execution time, memory caps, CPU throttling, and request rate limits. If a Ring 2 agent attempts a Ring 1 operation, it gets blocked, just like a userspace process trying to access kernel memory.

These ring definitions and their associated trust score thresholds are fully configurable via policy. Organizations can define custom ring structures, adjust the number of rings, set different trust score thresholds for transitions, and configure per-ring resource limits to match their security requirements.

The hypervisor also provides saga orchestration for multi-step operations. When an agent executes a sequence, draft email → send → update CRM, and the final step fails, compensating actions fire in reverse. Borrowed from distributed transaction patterns, this ensures multi-agent workflows maintain consistency even when individual steps fail.
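The compensation idea is easy to see in miniature. Here is a framework-free Python sketch with hypothetical step names; it shows the pattern, not the Agent Hypervisor's actual orchestration API:

```python
"""Framework-free sketch of the saga pattern: run steps in order and, on
failure, fire compensating actions in reverse. Step names are hypothetical;
this illustrates the pattern, not the Agent Hypervisor's API."""

def run_saga(steps):
    """steps: list of (name, action, compensate) tuples."""
    completed = []  # (name, compensate) for every step that succeeded
    for name, action, compensate in steps:
        try:
            action()
            completed.append((name, compensate))
        except Exception as exc:
            print(f"step '{name}' failed ({exc}); compensating in reverse")
            for done_name, undo in reversed(completed):
                undo()
                print(f"compensated '{done_name}'")
            raise

def fail_update_crm():
    raise RuntimeError("CRM down")  # simulated failure in the final step

try:
    run_saga([
        ("draft_email", lambda: print("drafted"), lambda: print("discard draft")),
        ("send_email", lambda: print("sent"), lambda: print("recall email")),
        ("update_crm", fail_update_crm, lambda: None),
    ])
except RuntimeError:
    pass  # the failure still surfaces to the caller after compensation
```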
## Agent SRE: SLOs and Circuit Breakers for Agents

If you practice SRE, you measure services by SLOs and manage risk through error budgets. Agent SRE extends this to AI agents. When an agent's safety SLI drops below 99 percent, meaning more than 1 percent of its actions violate policy, the system automatically restricts the agent's capabilities until it recovers. This is the same error-budget model that SRE teams use for production services, applied to agent behavior.

We also built nine chaos engineering fault injection templates: network delays, LLM provider failures, tool timeouts, trust score manipulation, memory corruption, and concurrent access races. Because the only way to know whether your agent system is resilient is to break it intentionally.

Agent SRE integrates with your existing observability stack through adapters for Datadog, PagerDuty, Prometheus, OpenTelemetry, Langfuse, LangSmith, Arize, MLflow, and more. Message broker adapters support Kafka, Redis, NATS, Azure Service Bus, AWS SQS, and RabbitMQ.

## Compliance and Observability

If your organization already maps to CIS Benchmarks, NIST AI RMF, or other frameworks for infrastructure compliance, the OWASP Agentic Top 10 is the equivalent standard for AI agent workloads. The toolkit's agent-compliance package provides automated governance grading against these frameworks.

The toolkit is framework-agnostic, with 20+ adapters that hook into each framework's native extension points, so adding governance to an existing agent is typically a few lines of configuration, not a rewrite.

The toolkit exports metrics to any OpenTelemetry-compatible platform: Prometheus, Grafana, Datadog, Arize, or Langfuse. If you're already running an observability stack for your infrastructure, agent governance metrics flow through the same pipeline. Key metrics include policy decisions per second, trust score distributions, ring transitions, SLO burn rates, circuit breaker state, and governance workflow latency.

## Getting Started

```bash
# Install all packages
pip install agent-governance-toolkit[full]

# Or individual packages
pip install agent-os-kernel agent-mesh agent-sre
```

The toolkit is available across language ecosystems: Python, TypeScript (`@microsoft/agentmesh-sdk` on npm), Rust, Go, and .NET (`Microsoft.AgentGovernance` on NuGet).

## Azure Integrations

While the toolkit is platform-agnostic, we've included integrations that help enable the fastest path to production on Azure:

- **Azure Kubernetes Service (AKS):** Deploy the policy engine as a sidecar container alongside your agents. Helm charts provide production-ready manifests for agent-os, agent-mesh, and agent-sre.
- **Azure AI Foundry Agent Service:** Use the built-in middleware integration for agents deployed through Azure AI Foundry.
- **OpenClaw Sidecar:** One compelling deployment scenario is running OpenClaw, the open-source autonomous agent, inside a container with the Agent Governance Toolkit deployed as a sidecar. This gives you policy enforcement, identity verification, and SLO monitoring over OpenClaw's autonomous operations. On Azure Kubernetes Service (AKS), the deployment is a standard pod with two containers: OpenClaw as the primary workload and the governance toolkit as the sidecar, communicating over localhost. We have a reference architecture and Helm chart available in the repository.

The same sidecar pattern works with any containerized agent; OpenClaw is a particularly compelling example because of the interest in autonomous agent safety.

## Tutorials and Resources

34+ step-by-step tutorials covering policy engines, trust, compliance, MCP security, observability, and cross-platform SDK usage are available in the repository.

```bash
git clone https://github.com/microsoft/agent-governance-toolkit
cd agent-governance-toolkit
pip install -e "packages/agent-os[dev]" -e "packages/agent-mesh[dev]" -e "packages/agent-sre[dev]"

# Run the demo
python -m agent_os.demo
```
We're building this in the open because agent security is too important for any single organization to solve alone: Security research: Adversarial testing, red-team results, and vulnerability reports strengthen the toolkit for everyone. Community contributions: Framework adapters, detection rules, and compliance mappings from the community expand coverage across ecosystems. We are committed to open governance. We're releasing this project under Microsoft today, and we aspire to move it into a foundation home, such as the AI and Data Foundation (AAIF), where it can benefit from cross-industry stewardship. We're actively engaging with foundation partners on this path. The Agent Governance Toolkit is open source under the MIT license. Contributions welcome at github.com/microsoft/agent-governance-toolkit.1.1KViews0likes0CommentsRetina 1.0 Is Now Available
We are excited to announce the first major release of Retina, a significant milestone for the project. This version brings many new features, enhancements, and bug fixes. The Retina maintainer team would like to thank all contributors, community members, and early adopters who helped make this 1.0 release possible.

## What is Retina?

Retina is an open-source Kubernetes network observability platform. It enables you to continuously observe and measure network health, and to investigate network issues on demand with integrated Kubernetes-native workflows.

## Why Retina?

Kubernetes networking failures are rarely isolated or easy to reproduce. Pods are ephemeral, services span multiple nodes, and network traffic crosses multiple layers (CNI, kube-proxy, node networking, policies), making crucial evidence difficult to capture. Manually connecting to nodes and stitching together logs or packet captures simply does not scale as clusters grow in size and complexity. A modern approach to observability must automate and centralize data collection while exposing rich, actionable insights.

Retina represents a major step forward in solving the complexities of Kubernetes observability by leveraging the power of eBPF. Its cloud-agnostic design, deep integration with Hubble, and support for both real-time metrics and on-demand packet captures make it an invaluable tool for DevOps, SecOps, and compliance teams across diverse environments.

## What Does It Do?

Retina can collect two types of telemetry: metrics and packet captures. The Retina shell enables ad-hoc troubleshooting via pre-installed networking tools.

### Metrics

Metrics provide continuous observability. They can be exported to multiple storage options such as Prometheus or Azure Monitor, and visualized in a variety of ways, including Grafana or Azure Log Analytics.

Retina supports two control planes: Hubble and Standard. Both are supported regardless of the underlying CNI. The choice of control plane affects which metrics are collected (see the Hubble metrics and Standard metrics documentation). You can customize which metrics are collected by enabling or disabling their corresponding plugins. Examples of metrics include:

- Incoming/outgoing traffic
- Dropped packets
- TCP/UDP
- DNS
- API server latency
- Node/interface statistics

### Packet Captures

Captures provide on-demand observability. They allow users to perform distributed packet captures across the cluster, based on specified Nodes/Pods and other supported filters. They can be triggered via the CLI or through the capture CRD, and may be output to persistent storage options such as the host filesystem, a PVC, or a storage blob.

The result of a capture contains more than just a .pcap file. Retina also captures networking metadata such as iptables rules, socket statistics, kernel network information from /proc/net, and more.

### Shell

The Retina shell enables deep ad-hoc troubleshooting by providing a suite of networking tools. The CLI command starts an interactive shell on a Kubernetes node that runs a container image which includes standard tools such as ping or curl, as well as specialized tools like bpftool, pwru, Inspektor Gadget, and more. The Retina shell is currently only available on Linux. Note that some tools require particular capabilities to execute; these can be passed as parameters through the CLI.
## Use Cases

- **Debugging Pod Connectivity Issues:** When services can't communicate, Retina enables rapid, automated distributed packet capture and drop metrics, drastically reducing troubleshooting time. The Retina shell also brings specialized tools for deep manual investigations.
- **Continuous Monitoring of Network Health:** Operators can set up alerts and dashboards for DNS failures, API server latency, or packet drops, gaining ongoing visibility into cluster networking.
- **Security Auditing and Compliance:** Flow logs (in Hubble mode) and metrics support security investigations and compliance reporting, enabling quick identification of unexpected connections or data transfers.
- **Multi-Cluster / Multi-Cloud Visibility:** Retina standardizes network observability across clouds, supporting unified dashboards and processes for SRE teams.

## Where Does It Run?

Retina is designed for broad compatibility across Kubernetes distributions, cloud providers, and operating systems. There are no Azure-specific dependencies; Retina runs anywhere Kubernetes does.

- **Operating Systems:** Both Linux and Windows nodes are supported.
- **Kubernetes Distributions:** Retina is distribution-agnostic, deployable on managed services (AKS, EKS, GKE) or self-managed clusters.
- **CNI / Network Stack:** Retina works with any CNI, focusing on kernel-level events rather than CNI-specific logs.
- **Cloud Integration:** Retina exports metrics to Azure Monitor and Log Analytics, with pre-built Grafana dashboards for AKS. Integration with AWS CloudWatch or GCP Stackdriver is possible via Prometheus.
- **Observability Stacks:** Retina integrates with Prometheus and Grafana, Cilium Hubble (for flow logs and UI), and can be extended to other exporters.

## Design Overview

Retina's architecture consists of two layers: a data collection layer in kernel space, and a processing layer that converts low-level signals into Kubernetes-aware telemetry in user space. When Retina is installed, each node in the cluster runs a Retina agent which collects raw network telemetry from the host kernel, backed by eBPF on Linux and HNS/VFP on Windows. The agent processes the raw network data and enriches it with Kubernetes metadata, which is then exported for consumption by monitoring tools such as Prometheus, Grafana, or Hubble UI.

Modularity and extensibility are central to the design philosophy. Retina's plugin model lets you enable only the telemetry you need, and add new sources by implementing a common plugin interface. Built-in plugins include Drop Reason, DNS, Packet Forward, and more. Check out our architecture docs for a deeper dive into Retina's design.

## Get Started

Thanks to Helm charts, deploying Retina is streamlined across all environments and can be done with one configurable command. For complete documentation, visit our installation docs.

To install Retina with the Standard control plane and Basic metrics mode:

```bash
VERSION=$(curl -sL https://api.github.com/repos/microsoft/retina/releases/latest | jq -r .name)
helm upgrade --install retina oci://ghcr.io/microsoft/retina/charts/retina \
  --version $VERSION \
  --namespace kube-system \
  --set image.tag=$VERSION \
  --set operator.tag=$VERSION \
  --set logLevel=info \
  --set operator.enabled=true \
  --set enabledPlugin_linux="\[dropreason\,packetforward\,linuxutil\,dns\]"
```

Once Retina is running in your cluster, you can configure Prometheus and Grafana to scrape and visualize your metrics.

Install the Retina CLI with Krew:

```bash
kubectl krew install retina
```
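Once Prometheus is scraping the agents, a quick sanity check is to query its HTTP API directly. Here is a minimal Python sketch; it assumes a port-forwarded Prometheus at localhost:9090 and a drop-count metric under Retina's networkobservability_* prefix, so verify the exact metric names against your own deployment:

```python
"""Quick sanity check that Retina metrics are reaching Prometheus.
Assumes Prometheus is port-forwarded to localhost:9090 and that the
deployment exposes a networkobservability_* drop-count metric; verify
the exact metric names against your Retina configuration."""
import json
import urllib.parse
import urllib.request

PROM = "http://localhost:9090/api/v1/query"
QUERY = "sum by (reason) (rate(networkobservability_drop_count[5m]))"

url = f"{PROM}?{urllib.parse.urlencode({'query': QUERY})}"
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

for series in data["data"]["result"]:
    # Each series carries its label set and the latest sampled value
    print(series["metric"].get("reason", "unknown"), "=>", series["value"][1])
```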
## Get Involved

Retina is open source under the MIT License and welcomes community contributions. Since its announcement in early 2024, the project has gained significant traction, with contributors from multiple organizations helping to expand its capabilities. The project is hosted on GitHub at microsoft/retina, and documentation is available at retina.sh. If you would like to contribute to Retina, you can follow our contributor guide.

## What's Next?

Retina 1.1, of course! We are also discussing the future roadmap and exploring the possibility of moving the project to community ownership. Stay tuned! In the meantime, we welcome you to raise an issue if you find any bugs, or start a discussion if you have any questions or suggestions. You can also reach out to the Retina team via email; we would love to hear from you!

## References

- Retina
- Deep Dive into Retina
- Open-Source Kubernetes Network Observability
- Troubleshooting Network Issues with Retina
- Retina: Bridging Kubernetes Observability and eBPF Across the Clouds

# Project Pavilion Presence at KubeCon NA 2025
Project Pavilion Presence at KubeCon NA 2025

KubeCon + CloudNativeCon NA took place in Atlanta, Georgia, from 10-13 November, and continued to highlight the ongoing growth of the open source, cloud-native community. Microsoft participated throughout the event and supported several open source projects in the Project Pavilion. Microsoft's involvement reflected our commitment to upstream collaboration, open governance, and enabling developers to build secure, scalable, and portable applications across the ecosystem.

The Project Pavilion serves as a dedicated, vendor-neutral space on the KubeCon show floor reserved for CNCF projects. Unlike the corporate booths, it focuses entirely on open source collaboration. It brings maintainers and contributors together with end users for hands-on demos, technical discussions, and roadmap insights. This space helps attendees discover emerging technologies and understand how different projects fit into the cloud-native ecosystem. It plays a critical role in exchanging ideas, resolving challenges, and strengthening collaboration across CNCF-approved technologies.

Why Our Presence Matters

KubeCon NA remains one of the most influential gatherings for developers and organizations shaping the future of cloud-native computing. For Microsoft, participating in the Project Pavilion helps advance our goals of:

- Open governance and community-driven innovation
- Scaling vital cloud-native technologies
- Secure and sustainable operations
- Learning from practitioners and adopters
- Enabling developers across clouds and platforms

Many of Microsoft's products and cloud services are built on or aligned with CNCF and open-source technologies. Being active within these communities ensures that we are contributing back to the ecosystem we depend on and designing by collaborating with the community, not just for it.

Microsoft-Supported Pavilion Projects

containerd

Representative: Wei Fu

The containerd team engaged with project maintainers and ecosystem partners to explore solutions for improving AI model workflows. A key focus was the challenge of handling large OCI artifacts (often 500+ GiB) used in AI training workloads. Current image-pulling flows require containerd to fetch and fully unpack blobs, which significantly delays pod startup for large models. Collaborators from Docker, NTT, and ModelPack discussed a non-unpacking workflow that would allow training workloads to consume model data directly. The team plans to prototype this behavior as an experimental feature in containerd. Additional discussions included updates related to nerdbox and next steps for the erofs snapshotter.

Copacetic

Representative: Joshua Duffney

The Copa booth attracted roughly 75 attendees, with strong representation from federal agencies and financial institutions - a sign of growing adoption in regulated industries. A lightning talk delivered at the conference significantly boosted traffic and engagement. Key feedback and insights included:

- High interest in customizable package update sources
- Demand for application-level patching beyond OS-level updates
- Need for clearer CI/CD integration patterns
- Expectations around in-cluster image patching
- Questions about runtime support, including Podman

The conversations revealed several documentation gaps and feature opportunities that will inform Copa's roadmap and future enablement efforts.

Drasi

Representative: Nandita Valsan

KubeCon NA 2025 marked Drasi's first in-person presence since its launch in October 2024 and its entry into the CNCF Sandbox in early 2025.
With multiple kiosk slots, the team interacted with ~70 visitors across shifts. Engagement highlights included:

- New community members joining the Drasi Discord and starring GitHub repositories
- Meaningful discussions with observability and incident management vendors interested in change-driven architectures
- Positive reception to Aman Singh's conference talk, which led attendees back to the booth for deeper technical conversations

Post-event follow-ups are underway with several sponsors and partners to explore collaboration opportunities.

Flatcar Container Linux

Representatives: Sudhanva Huruli and Vamsi Kavuru

The Flatcar project had some fantastic conversations at the pavilion. Attendees were eager to learn about bare metal provisioning, GPU support for AI workloads, and how Flatcar's fully automated build and test process keeps things simple and developer friendly. Questions around Talos vs. Flatcar and CoreOS sparked lively discussions, with the team emphasizing Flatcar's usability and independence from an OS-level API. Interest came from government agencies and financial institutions, and the preview of Flatcar on AKS opened the door to deeper conversations about real-world adoption. The Project Pavilion proved to be the perfect venue for authentic, technical exchanges.

Flux

Representative: Dipti Pai

The Flux booth was active throughout all three days of the Project Pavilion, where Microsoft joined other maintainers to highlight new capabilities in Flux 2.7, including improved multi-tenancy, enhanced observability, and streamlined cloud-native integrations. Visitors shared real-world GitOps experiences - both successes and challenges - which provided valuable insights for the project's ongoing development. Microsoft's involvement reinforced strong collaboration within the Flux community and a continued commitment to advancing GitOps practices.

Headlamp

Representatives: Joaquim Rocha, Will Case, and Oleksandr Dubenko

Headlamp had a booth for all three days of the conference, engaging with both longstanding users and first-time attendees. The increased visibility from becoming a Kubernetes sub-project was evident, with many attendees sharing their usage patterns across large tech organizations and smaller industrial teams. The booth enabled maintainers to:

- Gather insights into how teams use Headlamp in different environments
- Introduce Headlamp to new users discovering it via talks or hallway conversations
- Build stronger connections with the community and understand evolving needs

Inspektor Gadget

Representatives: Jose Blanquicet and Mauricio Vásquez Bernal

Hosting a half-day kiosk session, Inspektor Gadget welcomed approximately 25 visitors. Attendees included newcomers interested in learning the basics and existing users looking for updates. The team showcased new capabilities, including the tcpdump gadget and Prometheus metrics export, and invited visitors to the upcoming contribfest to encourage participation.

Istio

Representatives: Keith Mattix, Jackie Maertens, Steven Jin Xuan, Niranjan Shankar, and Mike Morris

The Istio booth continued to attract a mix of experienced adopters and newcomers seeking guidance.
Technical discussions focused on:

- Enhancements to multicluster support in ambient mode
- Migration paths from sidecars to ambient
- Improvements in Gateway API availability and usage
- Performance and operational benefits for large-scale deployments

Users, including several Azure customers, expressed appreciation for Microsoft's sustained investment in Istio as part of their service mesh infrastructure.

Notary Project

Representatives: Feynman Zhou and Toddy Mladenov

The Notary Project booth saw significant interest from practitioners concerned with software supply chain security. Attendees discussed signing, verification workflows, and integrations with Azure services and Kubernetes clusters. The conversations will influence upcoming improvements across Notary Project and Ratify, reinforcing Microsoft's commitment to secure artifacts and verifiable software distribution.

Open Policy Agent (OPA) - Gatekeeper

Representative: Jaydip Gabani

The OPA/Gatekeeper booth enabled maintainers to connect with both new and existing users to explore use cases around policy enforcement, Rego/CEL authoring, and managing large policy sets. Many conversations surfaced opportunities around simplifying best practices and reducing management complexity. The team also promoted participation in an ongoing Gatekeeper/OPA survey to guide future improvements.

ORAS

Representatives: Feynman Zhou and Toddy Mladenov

ORAS engaged developers interested in OCI artifacts beyond container images, including AI/ML models, metadata, backups, and multi-cloud artifact workflows. Attendees appreciated ORAS's ecosystem integrations and found the booth examples useful for understanding how artifacts are tagged, packaged, and distributed. Many users shared how they leverage ORAS with Azure Container Registry and other OCI-compatible registries.

Radius

Representative: Zach Casper

The Radius booth attracted the attention of platform engineers looking for ways to simplify their developers' experience while enforcing enterprise-grade infrastructure and security best practices. Attendees saw demos on deploying a database to Kubernetes and using managed databases from AWS and Azure without modifying the application deployment logic. They also saw a preview of Radius integration with GitHub Copilot, enabling AI coding agents to autonomously deploy and test applications in the cloud.

Conclusion

KubeCon + CloudNativeCon North America 2025 reinforced the essential role of open source communities in driving innovation across cloud-native technologies. Through the Project Pavilion, Microsoft teams were able to exchange knowledge with other maintainers, gather user feedback, and support projects that form foundational components of modern cloud infrastructure. Microsoft remains committed to building alongside the community and strengthening the ecosystem that powers so much of today's cloud-native development.

For anyone interested in exploring or contributing to these open source efforts, please reach out directly to each project's community to get involved, or contact Lexi Nadolski at lexinadolski@microsoft.com for more information.
Beyond the Chat Window: How Change-Driven Architecture Enables Ambient AI Agents

AI agents are everywhere now - powering chat interfaces, answering questions, helping with code. We've gotten remarkably good at this conversational paradigm. But while the world has been focused on chat experiences, something new is quietly emerging: ambient agents. These aren't replacements for chat; they're an entirely new category of AI system that operates in the background, sensing, processing, and responding to the world in real time.

And here's the thing: this is a new frontier. The infrastructure we need to build these systems barely exists yet. Or at least, it didn't until now.

Two Worlds: Conversational and Ambient

Let me paint you a picture of the conversational AI paradigm we know well. You open a chat window. You type a question. You wait. The AI responds. Rinse and repeat. It's the digital equivalent of having a brilliant assistant sitting at a desk, ready to help when you tap them on the shoulder.

Now imagine a completely different kind of assistant - one that watches for important changes, anticipates needs, and springs into action without being asked. That's the promise of ambient agents: AI systems that, as LangChain puts it, "listen to an event stream and act on it accordingly, potentially acting on multiple events at a time."

This isn't an evolution of chat; it's a fundamentally different interaction paradigm. Both have their place. Chat is great for collaboration and back-and-forth reasoning. Ambient agents excel at continuous monitoring and autonomous response. Instead of human-initiated conversations, ambient agents operate by detecting changes in upstream systems and maintaining context across time without constant prompting.

The use cases are compelling and distinct from chat. Imagine a project management assistant that operates in two modes: you can chat with it to ask "summarize project status", but it also runs in the background, constantly monitoring new tickets that are created or deployment pipelines that fail, automatically reassigning tasks. Or consider a DevOps agent that you can query conversationally ("what's our current CPU usage?") but that also monitors your infrastructure continuously, detecting anomalies and starting remediation before you even know there's a problem.

The Challenge: Real-Time Change Detection

Here's where building ambient agents gets tricky. While chat-based agents work perfectly within the request-response paradigm, ambient agents need something entirely different: continuous monitoring and real-time change detection. How do you efficiently detect changes across multiple data sources? How do you avoid the performance nightmare of constant polling? How do you ensure your agent reacts instantly when something critical happens?

Developers trying to build ambient agents hit the same wall: creating a reliable, scalable change detection system is hard. You either end up with:

- Polling hell: Constantly querying databases, burning through resources, and still missing changes between polls
- Legacy system rewrites: Massive, expensive, multi-year projects to rewrite legacy systems so that they produce domain events
- Webhook spaghetti: Managing dozens of event sources, each with different formats and reliability guarantees

This is where the story takes an interesting turn.

Enter Drasi: The Change Detection Engine You Didn't Know You Needed

Drasi is not another AI framework. Instead, it solves the problem that ambient agents need solved: intelligent change detection.
Think of it as the sensory system for your AI agents - the infrastructure that lets them perceive changes in the world. Drasi is built around three simple components:

- Sources: Connectivity to the systems that Drasi can observe as sources of change (PostgreSQL, MySQL, Cosmos DB, Kubernetes, EventHub)
- Continuous Queries: Graph-based queries (using Cypher/GQL) that monitor for specific change patterns
- Reactions: What happens when a continuous query detects changes - or the lack of them

But here's the killer feature: Drasi doesn't just detect that something changed. It understands what changed and why it matters - and even whether something should have changed but did not. Using continuous queries, you can define complex conditions that your agents care about, and Drasi handles all the plumbing to deliver those insights in real time.

The Bridge: langchain-drasi Integration

Now, detecting changes is only part of the challenge. You need to connect those changes to your AI agents in a way that makes sense. That's where langchain-drasi comes in - a purpose-built integration that bridges Drasi's change detection with LangChain's agent frameworks. It achieves this by leveraging the Drasi MCP Reaction, which exposes Drasi continuous queries as MCP resources.

The integration provides a simple Tool that agents can use to:

- Discover available queries automatically
- Read current query results on demand
- Subscribe to real-time updates that flow directly into agent memory and workflow

Here's what this looks like in practice:

```python
from langchain_drasi import create_drasi_tool, MCPConnectionConfig

# Configure connection to Drasi MCP server
mcp_config = MCPConnectionConfig(server_url="http://localhost:8083")

# Create the tool with notification handlers
drasi_tool = create_drasi_tool(
    mcp_config=mcp_config,
    notification_handlers=[buffer_handler, console_handler]
)

# Now your agent can discover and subscribe to data changes
# No more polling, no more webhooks, just reactive intelligence
```

The beauty is in the notification handlers: pre-built components that determine how changes flow into your agent's consciousness:

- BufferHandler: Queues changes for sequential processing
- LangGraphMemoryHandler: Automatically integrates changes into agent checkpoints
- LoggingHandler: Integrates with standard logging infrastructure

This isn't just plumbing; it's the foundation for what we might call "change-driven architecture" for AI systems.
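Because the handlers are just pluggable components, you can also roll your own. The sketch below shows the general shape of a custom handler - note that the `handle()` signature and the absence of a required base class are illustrative assumptions, not the documented langchain-drasi API; check the repository for the real interface:

```python
from dataclasses import dataclass, field

@dataclass
class AlertingHandler:
    """Hypothetical handler that collects changes from selected queries."""
    watched_queries: set[str]
    alerts: list[dict] = field(default_factory=list)

    def handle(self, query_id: str, change: dict) -> None:
        # Only react to the continuous queries this handler cares about.
        if query_id in self.watched_queries:
            self.alerts.append({"query": query_id, "change": change})

# Wiring it in alongside the built-in handlers (mirroring the snippet above):
# drasi_tool = create_drasi_tool(
#     mcp_config=mcp_config,
#     notification_handlers=[buffer_handler, AlertingHandler({"critical-cpu"})],
# )
```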
Example: The Seeker Agent Has Entered the Chat

Let's make this concrete with my favorite example from the langchain-drasi repository: a hide-and-seek-inspired non-player character (NPC) AI agent that seeks human players in a multi-player game environment.

The Scenario

Imagine a game where players move around a 2D map, updating their positions in a PostgreSQL database. But here's the twist: the NPC agent doesn't have omniscient vision. It can only detect players under specific conditions:

- Stationary targets: When a player doesn't move for more than 3 seconds (they're exposed)
- Frantic movement: When a player moves more than once in less than a second (panicking reveals your position)

This creates interesting strategic gameplay - players must balance staying still (safe from detection but vulnerable if found) with moving carefully (one move per second is the sweet spot). The NPC agent seeks based on these glimpses of player activity. These detection rules are defined as Drasi continuous queries that monitor the player positions table. For reference, these are the two continuous queries we will use.

When a player doesn't move for more than 3 seconds - a great example of detecting the absence of change using the trueLater function:

```
MATCH (p:player { type: 'human' })
WHERE drasi.trueLater(
    drasi.changeDateTime(p) <= (datetime.realtime() - duration( { seconds: 3 } )),
    drasi.changeDateTime(p) + duration( { seconds: 3 } )
)
RETURN p.id, p.x, p.y
```

When a player moves more than once in less than a second - an example of using the previousValue function to compare the current state with a prior state:

```
MATCH (p:player { type: 'human' })
WHERE drasi.changeDateTime(p).epochMillis - drasi.previousValue(drasi.changeDateTime(p).epochMillis) < 1000
RETURN p.id, p.x, p.y
```

Here's the neat part: you can dynamically adjust the game's difficulty by adding or removing queries with different conditions - no code changes required, just deploy new Drasi queries.

The traditional approach would have your agent constantly polling the data source checking these conditions: "Any player moves? How about now? Now? Now?"

The Workflow in Action

The agent operates through a LangGraph-based state machine with two distinct phases:

1. Setup Phase (First Run Only)

- Setup queries prompt - Prompts the AI model to discover available Drasi queries
- Setup queries call model - The AI model calls the Drasi tool with the discover operation
- Setup queries tools - Executes the Drasi tool calls to subscribe to relevant queries

This phase loops until the AI model has discovered and subscribed to all relevant queries.

2. Main Seeking Loop (Continuous)

- Check sensors - Consumes any new Drasi notifications from the buffer into the workflow state
- Evaluate targets - Uses the AI model to parse sensor data and extract target positions
- Select and plan - Selects the closest target and plans a path
- Execute move - Executes the next move via the game API

The loop continues indefinitely, reacting to new notifications. No polling. No delays. No wasted resources checking positions that don't meet the detection criteria. Just pure, reactive intelligence flowing from meaningful data changes to agent actions. The continuous queries act as intelligent filters, only alerting the agent when relevant changes occur.

Click here for the full implementation.
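For a feel of how those steps translate into code, here is a drastically simplified sketch of the seeking loop. The function names and the buffer-draining helper are illustrative assumptions - the real implementation in the repository is structured as LangGraph nodes rather than a plain loop:

```python
import time

def seeking_loop(buffer_handler, game_api):
    """Hypothetical distilled version of the NPC's main loop."""
    while True:
        # Check sensors: drain any notifications Drasi pushed since the
        # last iteration (empty if nothing relevant changed).
        sightings = buffer_handler.drain()  # assumed helper; see the repo

        if sightings:
            # Evaluate targets: each notification carries the player's id
            # and position from the continuous query's RETURN clause.
            targets = [(s["p.id"], s["p.x"], s["p.y"]) for s in sightings]

            # Select and plan: chase the closest target (Manhattan distance).
            npc_x, npc_y = game_api.current_position()
            _, x, y = min(targets, key=lambda t: abs(t[1] - npc_x) + abs(t[2] - npc_y))

            # Execute move: one step toward the target via the game API.
            game_api.move_toward(x, y)

        time.sleep(0.1)  # yield briefly; no database polling happens here
```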
The Bigger Picture: Change-Driven Architecture

What we're seeing with Drasi and ambient agents isn't just a new tool - it's a new architectural pattern for AI systems. The core idea is profound: AI agents can react to the world changing, not just wait to be asked about it. This pattern enables entirely new categories of applications that complement traditional chat interfaces.

The example might seem playful, but it demonstrates that AI agents can perceive and react to their environment in real time. Today it's seeking players in a game. Tomorrow it could be:

- Managing city traffic flows based on real-time sensor data
- Coordinating disaster response as situations evolve
- Optimizing supply chains as demand patterns shift
- Protecting networks as threats emerge

The change detection infrastructure is here. The patterns are emerging. The only question is: what will you build?

Where to Go from Here

Ready to dive deeper? Here are your next steps:

- Explore Drasi: Head to drasi.io and discover the power of the change detection platform
- Try langchain-drasi: Clone the GitHub repository and run the Hide-and-Seek example yourself
- Join the conversation: The space is new and needs diverse perspectives. Join the community on Discord, and let us know if you have built ambient agents and what challenges you faced with real-time change detection.
From Policy to Practice: Built-In CIS Benchmarks on Azure - Flexible, Hybrid-Ready

Security is more important than ever, and the industry standard for secure machine configuration is the Center for Internet Security (CIS) Benchmarks. These benchmarks provide consensus-based, prescriptive guidance to help organizations harden diverse systems, reduce risk, and streamline compliance with major regulatory frameworks and industry standards like NIST, HIPAA, and PCI DSS.

In our previous post, we outlined our plans to improve the Linux server compliance and hardening experience on Azure and shared a vision for integrating CIS Benchmarks. Today, that vision has turned into reality. We're now announcing the next phase of this work: Center for Internet Security (CIS) Benchmarks are now available on Azure for all Azure endorsed distros, at no additional cost to Azure and Azure Arc customers.

With today's announcement, you get access to the CIS Benchmarks on Azure with full parity to what's published by the Center for Internet Security (CIS). You can adjust parameters or define exceptions, tailoring security to your needs and applying consistent controls across cloud, hybrid, and on-premises environments - without having to implement every control manually. Thanks to this flexible architecture, you can truly manage compliance as code.

How we achieve parity

To ensure accuracy and trust, we rely on and ingest CIS machine-readable Benchmark content (OVAL/XCCDF files) as the source of truth. This guarantees that the controls and rules you apply in Azure match the official CIS specifications, reducing drift and ensuring compliance confidence.

What's new under the hood

At the core of this update is azure-osconfig's new compliance engine - a lightweight, open-source module developed by the Azure Core Linux team. It evaluates Linux systems directly against industry-standard benchmarks like CIS, supporting both audit and, in the future, auto-remediation. This enables accurate, scalable compliance checks across large Linux fleets. Here you can read more about azure-osconfig.

Dynamic rule evaluation

The new compliance engine supports simple fact-checking operations, the evaluation of logic operations on them (e.g., anyOf, allOf), and Lua-based scripting, which makes it possible to express the complex checks required by the CIS Critical Security Controls - all evaluated natively, without external scripts.

Scalable architecture for large fleets

When the assignment is created, the Azure control plane instructs the machine to pull the latest policy package via the Machine Configuration agent. Azure-osconfig's compliance engine is integrated as a lightweight library into the package and called by the Machine Configuration agent for evaluation, which happens every 15-30 minutes. This ensures near real-time compliance state without overwhelming resources and enables consistent evaluation across thousands of VMs and Azure Arc-enabled servers.

Future-ready for remediation and enforcement

While the Public Preview starts with audit-only mode, the roadmap includes per-rule remediation and enforcement using technologies like eBPF for kernel-level controls. This will allow proactive prevention of configuration drift and runtime hardening at scale. Please reach out if you are interested in auto-remediation or enforcement.

Extensibility beyond CIS Benchmarks

The architecture was designed to support other security and compliance standards as well and isn't limited to CIS Benchmarks. The compliance engine is modular, and we plan to extend the platform with STIG and other relevant industry benchmarks.
This positions Azure as a place where you can manage your compliance from a single control plane without duplicating effort elsewhere.

Collaboration with the CIS

This milestone reflects a close collaboration between Microsoft and the CIS to bring industry-standard security guidance into Azure as a built-in capability. Our shared goal is to make cloud-native compliance practical and consistent, while giving customers the flexibility to meet their unique requirements. We are committed to continuously supporting new Benchmark releases, expanding coverage with new distributions, and easing adoption through built-in workflows, such as moving from your current Benchmark version to a new version while preserving your custom configurations.

Certification and trust

We can proudly announce that azure-osconfig has met all the requirements and is officially certified by the CIS for Benchmark assessment, so you can trust compliance results as authoritative. Minor benchmark updates will be applied automatically, while major versions will be released separately. We will include workflows to help migrate customizations seamlessly across versions.

Key Highlights

- Built-in CIS Benchmarks for Azure Endorsed Linux distributions
- Full parity with official CIS Benchmarks content, certified by the CIS for Benchmark Assessment
- Flexible configuration: adjust parameters, define exceptions, tune severity
- Hybrid support: enforce the same baseline across Azure, on-prem, and multi-cloud with Azure Arc
- Reporting format in CIS tooling style

Supported use cases

- Certified CIS Benchmarks for all Azure Endorsed Distros - audit only (L1/L2 server profiles)
- Hybrid / on-premises and other cloud machines with Azure Arc for the supported distros
- Compliance as Code (for example, via GitHub-to-Azure OIDC auth and API integration)
- Compatible with the GuestConfig workbook

What's next?

Our next mission is to bring the previously announced auto-remediation capability into this experience, expand the distribution coverage, and elevate our workflows even further. We're focused on empowering you to resolve issues while honoring the unique operational complexity of your environments. Stay tuned!

Get Started

- Documentation link for this capability
- Enable CIS Benchmarks in Machine Configuration and select the "Official Center for Internet Security (CIS) Benchmarks for Linux Workloads", then select the distributions for your assignment, and customize as needed.
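Once assignments are in place, compliance state can be queried at fleet scale through Azure Resource Graph, which surfaces guest configuration results in the GuestConfigurationResources table. A minimal sketch - the assignment name is a placeholder and the property paths may differ slightly in your tenant, so verify against the Resource Graph schema:

```bash
# Requires the resource-graph extension: az extension add --name resource-graph
# "MyCISAssignment" is a hypothetical machine configuration assignment name.
az graph query -q "
GuestConfigurationResources
| where name contains 'MyCISAssignment'
| extend status = tostring(properties.complianceStatus)
| summarize machines = count() by status
"
```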
If you would like an additional distribution supported or have any feedback for azure-osconfig, please open an Azure support case or a GitHub issue here.

Relevant Ignite 2025 session: Hybrid workload compliance from policy to practice on Azure

Connect with us at Ignite

Meet the Linux team and stop by the Linux on Azure booth to see these innovations in action:

| Session Type | Session Code | Session Name | Date/Time (PST) |
| --- | --- | --- | --- |
| Theatre | THR 712 | Hybrid workload compliance from policy to practice on Azure | Tue, Nov 18, 3:15 PM - 3:45 PM |
| Breakout | BRK 143 | Optimizing performance, deployments, and security for Linux on Azure | Thu, Nov 20, 1:00 PM - 1:45 PM |
| Breakout | BRK 144 | Build, modernize, and secure AKS workloads with Azure Linux | Wed, Nov 19, 1:30 PM - 2:15 PM |
| Breakout | BRK 104 | From VMs and containers to AI apps with Azure Red Hat OpenShift | Thu, Nov 20, 8:30 AM - 9:15 AM |
| Theatre | THR 701 | From Container to Node: Building Minimal-CVE Solutions with Azure Linux | Wed, Nov 19, 3:30 PM - 4:00 PM |
| Lab | Lab 505 | Fast track your Linux and PostgreSQL migration with Azure Migrate | Tue, Nov 18, 4:30 PM - 5:45 PM; Wed, Nov 19, 3:45 PM - 5:00 PM; Thu, Nov 20, 9:00 AM - 10:15 AM |
eBPF-Powered Observability Beyond Azure: A Multi-Cloud Perspective with Retina

Kubernetes simplifies container orchestration but introduces observability challenges due to dynamic pod lifecycles and complex inter-service communication. eBPF technology addresses these issues by providing deep system insights and efficient monitoring. The open-source Retina project leverages eBPF for comprehensive, cloud-agnostic network observability across AKS, GKE, and EKS, enhancing troubleshooting and optimization through real-world demo scenarios.
Automating the Linux Quality Assurance with LISA on Azure

Introduction

Building on the insights from our previous blog on how Microsoft ensures the quality of Linux images, this article elaborates on the open-source tools that are instrumental in securing exceptional performance, reliability, and overall excellence of virtual machines on Azure.

While numerous testing tools are available for validating Linux kernels, guest OS images, and user-space packages across various cloud platforms, finding a comprehensive testing framework that addresses the entire platform stack remains a significant challenge. A robust framework is essential - one that seamlessly integrates with Azure's environment while providing coverage for major testing tools, such as LTP and kselftest, and covers critical areas like networking, storage, and specialized workloads, including Confidential VMs, HPC, and GPU scenarios. This unified testing framework is invaluable for developers, Linux distribution providers, and customers who build custom kernels and images.

This is where LISA (Linux Integration Services Automation) comes into play. LISA is an open-source tool specifically designed to automate and enhance the testing and validation processes for Linux kernels and guest OS images on Azure. In this blog, we will provide the history of LISA, its key advantages, the wide range of test cases it supports, and why it is an indispensable resource for the open-source community. Moreover, LISA is available under the MIT License, making it free to use, modify, and contribute to.

History of LISA

LISA was initially developed as an internal tool by Microsoft to streamline the testing of Linux images and kernel validation on Azure. Recognizing the value it could bring to the broader community, Microsoft open-sourced LISA, inviting developers and organizations worldwide to leverage and enhance its capabilities. This move aligned with Microsoft's growing commitment to open-source collaboration, fostering innovation and shared growth within the industry.

LISA serves as a robust solution to validate and certify that Linux images meet the stringent requirements of modern cloud environments. By integrating LISA into the development and deployment pipeline, teams can:

- Enhance Quality Assurance: Catch and resolve issues early in the development cycle.
- Reduce Time to Market: Accelerate deployment by automating repetitive testing tasks.
- Build Trust with Users: Deliver stable and secure applications, bolstering user confidence.
- Collaborate and Innovate: Leverage community-driven improvements and share insights.

Benefits of Using LISA

- Scalability: Designed to run large-scale test passes - from one test case to 10,000 test cases in a single command.
- Multi-platform orchestration: LISA's modular design supports running the same test cases on various platforms, including Microsoft Azure, Windows Hyper-V, bare metal, and other cloud-based platforms.
- Customization: Users can customize test cases, workflows, and other components to fit specific needs, allowing for targeted testing strategies - for example, building kernels on the fly or sending results to a custom database.
- Community Collaboration: Being open source under the MIT License, LISA encourages community contributions, fostering continuous improvement and shared expertise.
- Extensive Test Coverage: It offers a rich suite of test cases covering various aspects of Azure and Linux VM compatibility, from kernel, storage, and networking to middleware.
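To make these benefits concrete, here is a sketch of what a typical run against Azure looks like, using the Azure runbook shipped in the repository. The runbook path and variable names follow the upstream docs, but treat the exact flags as assumptions and verify them against `lisa --help`:

```bash
# Run the default Azure runbook against your subscription.
# The SSH key is used by LISA to reach the test VMs it deploys.
lisa -r ./microsoft/runbook/azure.yml \
  -v subscription_id:<your-subscription-id> \
  -v admin_private_key_file:~/.ssh/id_rsa
```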
How it works

Infrastructure

LISA is designed to be componentized and to maximize compatibility with different distros:

- Test cases can focus only on test logic. Once test requirements (machines, CPU, memory, etc.) are defined, you just write the test logic without worrying about environment setup or stopping services on different distributions.
- Orchestration: LISA uses platform APIs to create, modify, and delete VMs. For example, LISA uses the Azure API to create VMs, run test cases, and delete VMs. While a test case runs, LISA uses the Azure API to collect the serial log and can hot add/remove data disks. If other platforms implement the same serial log and data disk APIs, the test cases run on those platforms seamlessly.
- Distro compatibility is ensured by abstracting over 100 commands used in test cases, allowing authors to focus on validation logic rather than distro differences.
- A pre-processing workflow assists in building the kernel on the fly, installing the kernel from package repositories, or modifying all test environments.
- A test matrix lets a single run test everything. For example, one run can test different VM sizes on Azure, or different images, or even different VM sizes and different images together. Anything parameterizable can be tested in a matrix.
- Customizable notifiers enable saving test results and files to any type of storage and database.

Agentless and low dependency

LISA operates test systems via SSH without requiring additional dependencies, ensuring compatibility with any system that supports SSH. Although some test cases require installing extra dependencies, LISA itself does not. This allows LISA to perform tests on systems with limited resources or even different operating systems. For instance, LISA can run on Linux, FreeBSD, Windows, and ESXi.

Getting Started with LISA

Ready to dive in? Visit the LISA project at aka.ms/lisa to access the documentation.

- Install: Follow the installation guide provided in the repository to set up LISA in your testing environment.
- Run: Follow the instructions to run LISA on a local machine, on Azure, or on existing systems.
- Extend: Follow the documents to extend LISA with test cases, data sources, tools, platforms, workflows, etc.
- Join the Community: Engage with other users and contributors through forums and discussions to share experiences and best practices.
- Contribute: Modify existing test cases or create new ones to suit your needs. Share your contributions with the community to enhance LISA's capabilities.

Conclusion

LISA offers open-source, collaborative testing solutions designed to operate across diverse environments and scenarios, effectively narrowing the gap between enterprise demands and community-led innovation. By leveraging LISA, customers can ensure their Linux deployments are reliable and optimized for performance. Its comprehensive testing capabilities, combined with the flexibility and support of an active community, make LISA an indispensable tool for anyone involved in Linux quality assurance and testing. Your feedback is invaluable, and we would greatly appreciate your insights.