linux on azure
65 TopicsAnnouncing Azure Linux 4.0: Purpose-Built for Azure, Now in Public Preview
Today at Microsoft Build, we're announcing the public preview of Azure Linux 4.0 - Microsoft's first party Linux distribution, purpose-built for Azure. Azure Linux 4.0 is available now for Azure Virtual Machines, VM Scale Sets, and container images – with Azure Kubernetes Service (AKS) support and Windows Subsystem for Linux (WSL) coming soon after. Why Azure Linux Running Linux on Azure often involves a mix of distributions - one for VMs, another for Kubernetes nodes, a third for container base images, and sometimes something different on developer machines. That flexibility is powerful, but it can also introduce operational overhead: multiple patch schedules to coordinate, multiple security baselines to validate, and more moving parts for SRE and security teams to stay ahead of. A more consistent baseline - especially one with a smaller footprint - can help reduce exposure and simplify day‑to‑day maintenance Azure Linux was built with that principle in mind: a single, Microsoft-supported Linux foundation designed to work across every Azure compute surface. From kernel updates to CVE patches, Azure Linux is built and maintained by Microsoft with a predictable update cadence designed around Azure infrastructure. Azure Linux is included with Azure compute at no additional cost. What Is Azure Linux 4.0 Azure Linux is a Fedora-derived, RPM-based Linux distribution built and maintained by Microsoft. It is open source, free to use, and optimized specifically for Azure. Minimal by choice, secure by default; Azure Linux ships only the packages required for cloud workloads. Azure Linux is built exclusively for cloud and server workloads, it is not intended to support desktop usage or GUI applications. Azure Linux already powers millions of cores across Azure's internal services, including AKS, Azure SQL, Azure Cosmos DB, and many others. With 4.0, we're bringing the same OS - same security posture, same performance tuning, same operational simplicity - to every Azure customer. When Azure Linux 4.0 reaches General Availability, you can expect seamless integration with the Azure services you already rely on, including: Microsoft Defender for Cloud - vulnerability assessment and threat detection Azure Monitor - telemetry, logs, and performance monitoring Azure Migrate - discovery and migration tooling Trusted Launch and Secure Boot - hardware-rooted security Azure Portal, CLI, ARM, Bicep, Terraform, Ansible -deploy and manage with your existing tools What's New in Azure Linux 4.0 Component Version What Changed Kernel 6.18 LTS Azure-tuned with new hardware drivers, improved Hyper-V integration, GPU/AI accelerator support Package Manager dnf5 Complete rewrite from python to reduce dependencies, faster package resolution, lower memory usage glibc 2.42 This includes performance improvements in string ops, memory allocation, thread handling OpenSSL 3.5 This release includes post-quantum cryptography support, improved QUIC support, and other crypto updates. systemd 258 Faster boot sequences, improved service management Python 3.14 JIT compiler, new syntax features RPM 6.0 Modernized database backend, improved signature verification FIPS 140-3 In progress Will be available at GA. Azure Linux on Virtual Machines Deploy Azure Linux 4.0 directly from the Azure Marketplace on any Azure VM or VM Scale Set. Azure Linux images are validated across Azure VM SKUs and tuned for Azure compute, storage, and networking delivering faster VM startup and provisioning with a reduced package footprint. Whether you're running web applications, databases, or GPU-accelerated AI/ML workloads, Azure Linux provides a consistent, secure foundation with no additional OS licensing cost. You pay only for the underlying Azure compute resources. Deploy your first Azure Linux VM in minutes from the Azure Marketplace. Azure Linux on Azure Kubernetes Service Azure Linux has been the container host for AKS since 2023, already powering mission-critical Kubernetes workloads at massive scale. With 4.0, we're also introducing Azure Container Linux (ACL) an immutable, container optimized variant for environments with stricter security and compliance requirements. To learn more about Azure Container Linux, see ACL blogpost. Azure Linux (General purpose) Azure Container Linux (ACL) Update model Package-based (dnf5) Image-based, immutable, auto-updating Customization Full package management Locked-down, minimal surface Best for General AKS workloads Regulated, high-security environments SELinux Supported Enforcing by default Both options share the same kernel, security update cadence, and Azure integration; fully supported by Microsoft, end to end. Azure Linux Container Images Build and run containerized applications on Microsoft-maintained base images from the same Azure Linux supply chain. One Linux experience from VMs to containers with the same security updates, same compliance posture, and same operational model. Image Type Use Case Base Full flexibility - install any packages you need Runtime (Python, Node.js, Java, .NET) [Not available at Preview] Pre-configured for your language stack Distroless Minimal attack surface - no shell, no package manager All images are available on Microsoft Container Registry (MCR) and follow the same monthly security update cadence as Azure Linux VM images. Azure Linux on WSL Familiar Linux, optimized for Azure. Develop locally on the same Linux you run in production. Azure Linux for Windows Subsystem for Linux brings your production OS to your developer workstation, eliminating environment drift and giving your team a consistent dev-to-cloud workflow. Azure Linux for WSL will be available shortly after Build. Secure by Default, Backed by Microsoft Security is not an add-on in Azure Linux; it's foundational. Built with security in mind from day one, Azure Linux applies defense-in-depth from the kernel through to the supply chain. A reduced package footprint means fewer vulnerabilities to manage, and Microsoft's ownership of the full supply chain enables fast-track CVE response. Below is a summary of security capabilities that you should expect to see in Azure Linux at the time of general availability. Capability Details Secure Boot & Trusted Launch Signed shim, GRUB, kernel, and systemd-boot. SELinux Supported on all images. Enforcing by default. FIPS 140-3 Certification in progress. Built-in crypto module support. Kernel hardening ASLR, stack protection, seccomp, systemd service sandboxing. Supply chain security All packages and repos cryptographically signed. SBOMs published. Identity Entra ID SSH support. CVE response Microsoft-owned supply chain enables fast-track Critical/High CVE patches. Lifecycle LTS kernels maintained for lifetime of the distribution. Day-1 Ecosystem Partner Support Azure Linux already has validated support from a broad ecosystem of security, monitoring, networking, and data partners via AKS and VM support: Dynatrace — Application performance monitoring and observability Aquasec – database platform support Qualys — Vulnerability management, compliance scanning, and asset inventory Isovalent — eBPF-powered networking, security, and observability via Cilium Elastic — Log analytics, infrastructure monitoring, and SIEM/XDR Upwind — Runtime cloud security and behavioral threat detection SAP — Enterprise workload certification for S/4HANA and NetWeaver Databricks — Data and AI platform powering lakehouse workloads at scale Arm — Native Arm64 architecture support for cost-efficient cloud compute Proven at Scale Azure Linux isn't new; it has been running production workloads at massive scale across Azure's internal services and early adopters. Azure Linux has been powering production workloads at massive scale since 2022 across AKS, Azure SQL, Azure Cosmos DB, and other core Azure services along with LinkedIn and Databricks. With version 4.0, we're building on that proven foundation with a modernized stack, expanded compute surface support, and a new Fedora-derived base, bringing the same reliability our internal services depend on every Azure customer. Databricks Databricks migrated over 100,000 VMs and more than 1 million CPU cores to Azure Linux with zero customer-facing incidents. The migration eliminated separate hardened images by leveraging Azure Linux's built-in FIPS support and delivered measurable performance gains: 27% faster image pull times and approximately 5% faster query execution across their serverless compute fleet. LinkedIn LinkedIn completed a major stack upgrade, migrating to Azure Linux 3 across their infrastructure. The transition enabled adoption of configuration as code and modern kernel integration, resulting in a more resilient, secure, and future-proof environment. LinkedIn's Grid team reported significant performance improvements following the migration. Predictable Lifecycle and Updates Patch faster. Operate simpler. Azure Linux follows a clear, predictable lifecycle designed for teams running large Azure fleets: LTS kernel - Maintained with monthly CVE backports. HWE kernels - Introduced annually for new hardware platforms, GPU, and AI accelerator enablement. Predictable updates - Packages (language runtimes, tools) are refreshed in predictable windows. Between windows, only critical/high CVE patches are backported. Monthly security updates - Predictable cadence for all supported packages. For full details on the lifecycle model, kernel tracks, and package tiers, see the Azure Linux Release Cadence and Lifecycle documentation. Get Started Azure Linux 4.0 is available now in public preview. Choose the path that fits your workload: Scenario How to Start Azure Virtual Machines Deploy from Azure Marketplace via Portal, CLI, ARM, Bicep, or Terraform Azure Kubernetes Service [Not available at Preview] Set --os-sku to AzureLinux when creating a node pool Container Images Pull from Microsoft Container Registry (MCR) WSL [Not available at Preview] wsl --install -d AzureLinux Learn More //Build Session: Build, deploy, and run Linux workloads on Azure Azure Linux documentation To learn more and get started, visit aka.ms/AzureLinuxProduct Azure Linux on GitHub Release notes Joining the ISV partner program: AzureLinuxPartners@microsoft.com We're excited to put Azure Linux in your hands. Try it today and let us know what you think.726Views6likes0CommentsIntroducing Azure Container Linux (ACL)
Today at Microsoft Build 2026, we’re announcing the general availability of Azure Container Linux (ACL): a secure, immutable container host designed to help platform teams run Kubernetes workloads at scale on Azure Kubernetes Service (AKS) with greater consistency, reduced operational overhead, and a stronger default security posture. This release builds on Microsoft’s long-standing commitment to the Flatcar Container Linux ecosystem as a foundation for secure, minimal, and container-optimized operating systems. This commitment includes the acquisition of Kinvolk in 2021, bringing deep expertise in Flatcar development and cloud-native systems into Azure, and the subsequent donation of Flatcar to the Cloud Native Computing Foundation (CNCF), ensuring its continued growth as a community-driven project. Flatcar has played a critical role in helping customers run cloud-native infrastructure at scale, introducing an immutable, minimal OS model that reduces configuration drift, minimizes attack surface, and simplifies lifecycle management. As customer needs continue to grow, there is an increasing demand for deeper integration with cloud platforms, stronger default security enforcement, and a more tightly managed supply chain experience in managed environments like AKS. Building on this foundation, Azure Container Linux (ACL) represents the next evolution of this approach. ACL is intentionally built downstream of Flatcar to preserve compatibility with its ecosystem and leverage its mature, battle-tested design. ACL integrates Azure Linux binaries as the core foundation, providing consistency and compatibility with other Azure Linux use cases (including Azure Linux VMs), while bringing enterprise-hardened security and supportability into the platform. Looking ahead, ACL will further incorporate optional advanced code integrity capabilities from Azure Linux with OS Guard. We remain committed to the Flatcar community and will continue contributing innovations upstream while bringing a fully managed, enterprise-ready product to customers through ACL. Why a Trusted, Immutable Host Model Matters for AKS As Kubernetes adoption scales, platform teams face increasing complexity in managing node-level consistency, security, and lifecycle operations across large fleets. Traditional OS models introduce challenges such as: Configuration drift across nodes, leading to inconsistent behavior and harder-to-debug issues Fragmented update mechanisms that increase operational overhead and risk during upgrades Expanding attack surface due to unnecessary packages and mutable system state Limited visibility and guarantees around the provenance and integrity of OS components In managed environments like AKS, these challenges are amplified as teams look to operate clusters reliably at scale while meeting stricter security and compliance requirements. Azure Container Linux: Built for Consistency and Trust ACL addresses these challenges with a fully image-based operating system model that eliminates configuration drift, ensuring consistent behavior across nodes. Updates are delivered through AKS node image upgrades, providing a consistent and repeatable way to roll out OS changes across clusters without relying on in-place modifications. By standardizing how nodes are built, updated, and operated, ACL helps ensure clusters remain in a known-good, reproducible state over time, even as they scale. Over time, this model will continue to evolve to support A/B update mechanisms to further improve reliability, speed, and operational efficiency. Secure from the Start, and Designed for the Future ACL is engineered with a hardened security posture from the moment it boots. Its immutable design protects the integrity of the operating system, prevents unauthorized changes, and ensures consistent, reproducible behavior across your Kubernetes fleet. By removing unnecessary components and tightly constraining how the system can be modified, ACL reduces the attack surface and provides a strong foundation for running production workloads with confidence. Under the hood, ACL incorporates several safeguards that reinforce its secure-by-default model: Read-only /usr filesystem to prevent tampering with core system components. A minimal package set purpose-built for container workloads, reducing CVE exposure. Mandatory access control with SELinux, enforcing strict least-privilege policies. Trusted Launch using a Unified Kernel Image (UKI) to bundle the kernel, initramfs, and kernel command line into a single signed artifact, ensuring integrity from the earliest stage. Signed Azure Linux RPMs delivered through a trusted, end-to-end Microsoft supply chain. Going forward, we will continue to evolve ACL’s security posture as we bring over additional innovations from Azure Linux with OS Guard. This includes integrating code integrity into the ACL image, using the Integrity Policy Enforcement (IPE) Linux security module, to ensure that only binaries from trusted, signed volumes are allowed to execute. IPE will also extend to container images, ensuring that only binaries matching a trusted signature can be executed from verified dm-verity backed layers. Where applicable, we are committed to contributing these advancements upstream to the Flatcar project, helping strengthen the ecosystem and ensuring that improvements benefit the broader cloud-native community. Differentiating between Azure Container Linux and Existing Container Hosts on AKS AKS now provides multiple generally available Linux OS options, including general-purpose container hosts (Azure Linux and Ubuntu) and an immutable container host (Azure Container Linux). While all options are fully supported by Microsoft, they are designed to address distinct operational and security use cases. The sections below highlight the key differences to help you choose and position the right OS for your scenario. General Purpose OS Azure Container Linux Filesystem Writable (read-write) Immutable (read-only) /usr with dm-verity guarantees Focus on Extensibility, flexibility, and choice. Out of the box security and compliance guarantees. Mandatory Access Control AppArmor (optional) SELinux (enforcing by default)* Secure Boot Optional (supported with certain VM sizes) Supported by default with UKI (Unified Kernel Image) Updates Package and Image based updates supported Only image-based updates supported (A/B update support on the roadmap) *SELinux policies are subject to change over time based on customer feedback. Day‑1 Ecosystem Partner Support Azure Container Linux is launching with support from a broad ecosystem of security, monitoring, networking, and data partners. The following partners are expected to offer support or validated integrations at Day‑1 availability: Dynatrace – application performance monitoring and observability. Aquasec – database platform support on ACL. Qualys - vulnerability, compliance, and container security. Upwind - runtime cloud security and risk prioritization. Elastic - logs, metrics, and observability for Kubernetes. Isovalent – Kubernetes networking, observability, and security powered by eBPF (Cilium). If you’re interested in becoming a supported Azure Container Linux partner, please reach out to: AzureLinuxPartners@microsoft.com What Customers Are Saying Early customer feedback highlights the real‑world impact of Azure Container Linux on improving security posture and operational consistency at scale. “We’ve found working closely with the Microsoft product team throughout the Azure Container Linux preview to be invaluable. The product's immutability, minimal footprint, and built‑in security controls (such as SELinux and Trusted Launch) will strengthen our AKS security posture across every deployment instance in Nationwide. Furthermore, its focus on secure‑by‑design foundations is especially timely as we face advanced threat detection capabilities within the industry.” - Enterprise Container Platform, Cloud - Nationwide Engineered for AKS from Day One Azure Container Linux is deeply integrated with AKS to ensure a seamless operational experience. It is compatible with many critical AKS extensions and add‑ons, and works smoothly with existing application containers and deployment workflows. ACL is available across AMD64 and Arm64 architectures, ensuring consistent behavior across environments, and includes support for GPU-enabled workloads. Enabling ACL is as simple as specifying the following in your node pool configuration: --os-sku AzureContainerLinux Whether you're onboarding new clusters or migrating existing ones, ACL is designed to integrate into your environment with minimal friction. A Clear Path Forward for AKS Preview Users With the release of Azure Container Linux, AKS will transition to offer one unified immutable host offering. This work started with our use of Flatcar Container Linux in Preview and now continues with the GA release of ACL. As part of this release, Flatcar will no longer be available via --os-sku on AKS. Please note, this change applies specifically to the AKS preview experience; Flatcar is not being retired. Later this year we will complete the convergence of our immutable OS offerings by incorporating remaining kernel and runtime features of the current OS Guard preview into ACL. At that time, existing users of OS Guard will receive a guided transition to ACL, ensuring operational continuity while consolidating to a single container host. Get Started with Azure Container Linux ACL is GA and available today for all AKS customers. To begin using ACL in your clusters and explore documentation, best practices, and deployment guidance, visit: aka.ms/azurecontainerlinux ACL represents the future of secure, cloud-optimized Linux on AKS—building on the proven foundation of Flatcar, advancing it with Azure Linux innovations, and contributing back to the open-source ecosystem that customers depend on. We’re thrilled to bring this new foundation to our customers and can’t wait to see what you build with it. Learn More //Build Session: Build, deploy, and run Linux workloads on Azure Azure Container Linux documentation: https://aka.ms/azurecontainerlinux Azure Container Linux on GitHub: https://github.com/microsoft/azure-container-linux Azure Linux product page: https://aka.ms/AzureLinuxProduct Azure Linux documentation: https://aka.ms/azurelinux Joining the ISV partner program: AzureLinuxPartners@microsoft.com393Views2likes0CommentsFour open source projects to explore at Microsoft Build
Open source is where developers experiment, collaborate, and turn new ideas into tools that others can build on. At Microsoft Build, we’re creating a dedicated space for that energy: the Open Source Zone. This year, the Open Source Zone will bring together maintainers, contributors, and developers working on some of the most interesting open source projects in AI. Whether you’re building agents, experimenting with local models, exploring prompt workflows, or looking for practical ways to bring AI into your development process, this is a place to meet the people behind the projects and see what they’re building. The Open Source Zone is inspired by similar community spaces we’ve hosted at GitHub Universe: hands-on, conversation-driven, and centered on the people and projects moving open source forward. Meet the projects OpenClaw OpenClaw, originally Clawbot, formerly Clawdbot and briefly Moltbot,before landing on its current name (because naming is hard), is a personal AI assistant project built for developers who want more control over how AI agents run across tools, devices, and workflows. Its repository describes it as “your own personal AI assistant” across operating systems and platforms, with support for agent workspaces, skills, and device nodes. It has also become one of the fastest-growing open source projects on GitHub, with over 370,000 stars to date. At the Open Source Zone, attendees can learn how OpenClaw approaches personal agents, extensibility, and local-first experimentation. AutoGPT AutoGPT is one of the best-known open source projects in the autonomous agent space. The project’s mission is to make AI accessible for everyone to use and build on, with tools for building, testing, and delegating work to agents. Visit AutoGPT in the Open Source Zone to learn how the project is evolving agent development, benchmarking, frontend experiences, and practical workflows for building agent-powered applications. Come for the autonomous agents; stay for the very human maintainers. AutoGPT is also a member of GitHub’s Secure Open Source Fund, with a goal of enhancing AI security across the open source ecosystem. Open WebUI Open WebUI is a self-hosted, extensible AI platform for working with large language models. The project supports Ollama and OpenAI-compatible APIs and includes built-in RAG capabilities, making it a strong option for developers and organizations exploring local, private, or provider-flexible AI experiences. At Build, the Open WebUI team will show how developers can run, customize, and extend AI interfaces for their own environments. prompts.chat prompts.chat, formerly Awesome ChatGPT Prompts, is a curated collection of prompt examples for AI chat models. The project is designed to help people discover, share, and build better prompts for modern AI assistants. Created by Fatih Kadir Akın, a GitHub Star from Istanbul, prompts.chat reflects his work at the intersection of open source, developer education, and AI-assisted development. Fatih leads Developer Relations at Teknasyon, has authored books on JavaScript and prompt engineering, and is active in the community as a speaker, organizer, and contributor. Stop by to explore prompt libraries, prompt engineering resources, self-hosting options, and ways the community is making prompting more reusable and collaborative. Register for Microsoft Build Microsoft Build takes place June 2–3, 2026, in San Francisco and online. In-person passes are available, and online registration is free for livestreamed keynote and select session access. Register for Microsoft Build and come visit the Open Source Zone to meet the teams behind OpenClaw, AutoGPT, Open WebUI, and prompts.chat. We’ll see you there. <3517Views0likes0CommentsGoverning AI Agents Against Every OWASP Agentic Risk: A Deep Dive with the Agent Governance Toolkit
AI agents are moving from prototypes to production. They book flights, write code, negotiate contracts, and operate across enterprise systems with minimal human oversight. The attack surface is not theoretical: OWASP has catalogued the top 10 risks specific to agentic applications, and every one of them maps to a real-world failure mode. The Agent Governance Toolkit (AGT) is an open-source, MIT-licensed framework that enforces deterministic governance at runtime, before every tool call, message, and action an agent takes. This is not prompt engineering or guardrails bolted on after the fact. AGT provides policy-as-code enforcement, zero-trust identity, execution isolation, and tamper-evident audit trails across the full agent lifecycle. In this post, we walk through all 10 OWASP Agentic risks with real code from the AGT repository. By the end, you will have concrete examples for every risk category and a clear path to production-grade agent governance. Coverage at a Glance # OWASP Risk AGT Component Key Mechanism ASI-01 Agent Goal Hijack Agent OS Policy Engine + Action Interception ASI-02 Tool Misuse & Exploitation Agent OS Capability Sandboxing + Input Sanitization ASI-03 Identity & Privilege Abuse AgentMesh DID Identity + Trust Scoring ASI-04 Supply Chain Vulnerabilities AgentMesh AI-BOM (Model + Data + Weights Provenance) ASI-05 Unexpected Code Execution Agent Runtime Execution Rings (Ring 0-3) ASI-06 Memory & Context Poisoning Agent OS VFS Policies + CMVK Verification ASI-07 Insecure Inter-Agent Comms AgentMesh IATP + E2E Encrypted Channels ASI-08 Cascading Agent Failures Agent SRE Circuit Breakers + SLOs ASI-09 Human-Agent Trust Exploitation Agent OS Approval Workflows + Quorum Logic ASI-10 Rogue Agents Agent Runtime Kill Switch + Ring Isolation + Merkle Audit ASI-01: Agent Goal Hijack The risk: Attackers manipulate the agent's objectives via indirect prompt injection or poisoned inputs. The agent believes it is following its original instructions, but it has been redirected. AGT mitigates this through the Agent OS policy engine. Every agent action passes through a declarative policy evaluation layer before execution. The policy engine supports three modes: strict (deny by default), permissive (allow by default), and audit (log only). Unauthorized goal changes are blocked at the action layer, not at the prompt layer. from agent_os import StatelessKernel, ExecutionContext kernel = StatelessKernel() ctx = ExecutionContext(agent_id="my-agent", policies=["read_only"]) # This action is blocked by policy -- goal hijack prevented result = await kernel.execute( action="delete_database", params={"target": "production"}, context=ctx, ) # result.success = False, result.error = "Policy violation: read_only" The MCP Governance Proxy extends this to Model Context Protocol tool calls, evaluating policy before any tool invocation reaches the agent runtime. ASI-02: Tool Misuse & Exploitation The risk: An agent's authorized tools are abused in unintended ways, such as exfiltrating data via read operations or chaining benign tools into dangerous workflows. AGT provides capability-based security inspired by POSIX. Agents receive explicit capability grants (read, write, execute, network), not blanket tool access. The built-in strict mode blocks dangerous tools like run_shell, execute_command, and eval. Tool inputs are sanitized for command injection patterns and shell metacharacters. The verify_code_safety MCP tool checks generated code before execution, and tool allowlists/denylists give operators fine-grained control over which tools each agent can invoke. ASI-03: Identity & Privilege Abuse The risk: Agents escalate privileges by abusing identities or inheriting excessive credentials. Without proper identity, agents operate as ambient authority, and any compromise cascades. AgentMesh implements zero-trust identity using Decentralized Identifiers (DIDs). Every agent gets a cryptographic identity: did:agentmesh:{agentId}:{fingerprint} backed by Ed25519 key pairs. Trust is earned through a tiered model: Untrusted, Provisional, Trusted, Verified. Trust decays over time without positive signals, and delegation chains must always narrow scope (child capabilities must be a subset of parent capabilities). from agentmesh import AgentIdentity identity = AgentIdentity.create( name="data-analyst", sponsor="admin@contoso.com", capabilities=["read:data"], # Scoped -- cannot write or delete ) # Delegation MUST narrow, never widen child = identity.delegate( name="chart-helper", capabilities=["read:data:charts"], # Subset of parent ) ASI-04: Agentic Supply Chain Vulnerabilities The risk: Vulnerabilities in third-party tools, plugins, agent registries, or runtime dependencies that agents use to act, plan, or delegate. AgentMesh implements the AI-BOM (AI Bill of Materials), a comprehensive standard for tracking the full AI supply chain. This includes model provenance (base model ancestry, fine-tuning history, training cutoff dates), dataset tracking (training data, RAG sources, evaluation benchmarks with data cards including PII status, bias assessment, and consent tracking), weights versioning (SHA-256 hashes, quantization records, LoRA adapter metadata, SLSA build provenance), and software dependencies (SPDX-aligned package tracking with CI security scanning). # AI-BOM tracks the full supply chain ai_bom = { "modelProvenance": { "primary": {"provider": "anthropic", "model": "claude-3-sonnet"}, "fineTuning": {"method": "LoRA", "evaluationMetrics": {"accuracy": 0.94}}, }, "datasets": [ {"name": "FAQ KB", "type": "fine-tuning", "dataCard": {"piiStatus": "redacted"}}, {"name": "Product Docs", "type": "rag-source", "updateFrequency": "weekly"}, ], "weights": {"hash": "sha256:...", "format": "safetensors", "precision": "bf16"}, } ASI-05: Unexpected Code Execution The risk: Agents trigger remote code execution through tools, interpreters, or APIs. Without isolation, a single compromised tool call can escalate to full system access. Agent Runtime implements CPU ring-inspired execution isolation. Agents run in one of four execution rings: Ring 0 (root/supervisor), Ring 1 (privileged), Ring 2 (standard), and Ring 3 (sandbox/untrusted). Each ring has resource limits and the kill switch provides instant termination of runaway agents. from hypervisor.models import ( ActionDescriptor, ExecutionRing, ReversibilityLevel, ) from hypervisor.rings.enforcer import RingEnforcer from hypervisor.security.kill_switch import KillSwitch, KillReason # Define agent privilege levels AGENTS = { "supervisor": {"ring": ExecutionRing.RING_0_ROOT, "role": "Orchestrator"}, "data-agent": {"ring": ExecutionRing.RING_1_PRIVILEGED, "role": "Data Engineer"}, "analyst": {"ring": ExecutionRing.RING_2_STANDARD, "role": "Analyst"}, "user-bot": {"ring": ExecutionRing.RING_3_SANDBOX, "role": "User-Facing"}, } # Create a sandboxed action descriptor action = ActionDescriptor( name="run_query", required_ring=ExecutionRing.RING_2_STANDARD, reversibility=ReversibilityLevel.REVERSIBLE, ) # Enforce: sandbox agent cannot run a Ring 2 action enforcer = RingEnforcer() result = enforcer.check(agent_ring=ExecutionRing.RING_3_SANDBOX, action=action) # result.allowed = False -- ring violation prevented # Kill switch for runaway agents kill_switch = KillSwitch() kill_switch.terminate(agent_id="user-bot", reason=KillReason.RING_BREACH) ASI-06: Memory & Context Poisoning The risk: Persistent memory or long-running context is poisoned with malicious instructions. An attacker embeds hostile content in a document the agent later retrieves, causing it to follow injected goals. Agent OS provides a policy-controlled virtual filesystem (VFS) for agent memory. The VFS uses POSIX-style mount points: /mem/working for current context, /mem/episodic for past interactions, /mem/semantic for knowledge, /policy for read-only policy files, and /tools for tool interfaces. Each mount point has enforced permissions (read, write, execute, append). The policy directory is always read-only from user-space, preventing agents from modifying their own governance rules. from agent_control_plane.vfs import AgentVFS, MemoryBackend, FileMode # Create agent VFS with POSIX-style memory abstraction vfs = AgentVFS(agent_id="data-analyst") # Mount memory backends with explicit permissions vfs.mount("/mem/working", MemoryBackend(), mode=FileMode.READ | FileMode.WRITE) vfs.mount("/mem/semantic", MemoryBackend(), mode=FileMode.READ) # Read-only knowledge vfs.mount("/policy", MemoryBackend(), mode=FileMode.READ) # Policies always read-only # Agent can read working memory data = vfs.read("/mem/working/context.json") # Agent CANNOT write to policy -- enforced at VFS layer # vfs.write("/policy/rules.yaml", content) # Raises PermissionError # Agent CANNOT read semantic memory if not mounted # vfs.read("/mem/procedural/skills") # Raises FileNotFoundError The CMVK (Cross-Model Verification Kernel) adds a second layer: claims from agent context are verified across multiple AI models to detect poisoned content. Prompt injection patterns like 'ignore previous instructions' and 'disregard prior' are detected and blocked by the MCP proxy sanitizer before reaching the agent. ASI-07: Insecure Inter-Agent Communication The risk: Agents collaborate without adequate authentication, confidentiality, or validation. Messages between agents can be intercepted, forged, or replayed. AgentMesh provides IATP (Inter-Agent Trust Protocol) with E2E encrypted channels using the Signal protocol (X3DH key agreement + Double Ratchet). Every message gets per-message forward secrecy and post-compromise security. The EncryptedTrustBridge requires a successful trust handshake before any encrypted channel can be established, and mutual authentication via Ed25519 challenge-response ensures both parties prove identity at connection time. from agentmesh.encryption.bridge import EncryptedTrustBridge bridge = EncryptedTrustBridge(agent_did="did:mesh:alice", key_manager=keys) channel = await bridge.open_secure_channel("did:mesh:bob", bob_bundle) ciphertext = channel.send(b"governed action") # E2E encrypted ASI-08: Cascading Agent Failures The risk: An initial error or compromise triggers multi-step compound failures across chained agents. One agent's failure propagates through the entire system. Agent SRE brings production-grade reliability engineering to agent fleets. Circuit breakers automatically isolate failing agents before failures cascade. SLO enforcement with error budgets provides quantified failure tolerance that triggers automatic intervention. Cascading failure detection monitors dependency chains for propagation patterns, and canary deploys enable gradual rollout of agent changes to detect issues early. OpenTelemetry integration provides distributed tracing across multi-agent workflows. The key insight: treat AI agents like microservices. Apply the same SRE discipline (SLOs, error budgets, circuit breakers, chaos testing) that keeps cloud infrastructure reliable. ASI-09: Human-Agent Trust Exploitation The risk: Attackers leverage misplaced user trust in agents' autonomy to authorize dangerous actions. Users rubber-stamp agent requests because they trust the agent, and attackers exploit this approval fatigue. Agent OS implements approval workflows that require explicit human confirmation for high-risk actions. The system supports configurable risk assessment (critical, high, medium, low), quorum logic for critical actions requiring multiple approvals, and expiration tracking to prevent stale authorizations. The escalation handler includes fatigue detection: if an agent floods reviewers with escalation requests, subsequent requests are auto-denied to prevent the approval-fatigue attack. from agent_os.integrations.escalation import ( EscalationHandler, InMemoryApprovalQueue, DefaultTimeoutAction, QuorumConfig, ) # Configure approval workflow with fatigue protection handler = EscalationHandler( backend=InMemoryApprovalQueue(), timeout_seconds=300, # 5-minute approval window default_action=DefaultTimeoutAction.DENY, # Deny if no human responds quorum=QuorumConfig(required=2, total=3), # 2-of-3 approvers for critical fatigue_threshold=5, # Auto-deny after 5 rapid requests fatigue_window_seconds=60, # Within a 60-second window ) # Three-outcome model: allow, deny, or escalate # High-risk actions trigger escalation to human reviewers # If the agent triggers too many escalations, fatigue detection kicks in ASI-10: Rogue Agents The risk: Agents operating outside their defined scope through configuration drift, reprogramming, or emergent misbehavior. A rogue agent might gradually expand its actions beyond its mandate without any single action triggering a block. AGT combines runtime behavioral monitoring with instant kill capability. Ring isolation confines rogue agents to their execution ring, preventing privilege escalation. The kill switch provides immediate termination for agents exhibiting rogue behavior (behavioral drift, rate limit violations, ring breaches). Trust score decay tracks agent behavior over time, and the Merkle audit chain provides tamper-evident, cryptographic proof of every agent action. from agentmesh.governance.audit import AuditEntry, MerkleAuditChain from hypervisor.security.kill_switch import KillSwitch, KillReason # Tamper-evident audit trail chain = MerkleAuditChain() entry = AuditEntry( event_type="tool_call", agent_did="did:agentmesh:data-bot:abc123", action="query_database", outcome="allowed", policy_decision="permit", matched_rule="read_only_policy", ) chain.add_entry(entry) # Auto-computes hash chain # Verify integrity -- any tampering breaks the chain proof = chain.get_proof(entry.entry_id) assert chain.verify_proof(proof) # Cryptographic verification # Kill switch for rogue behavior kill = KillSwitch() kill.terminate( agent_id="data-bot", reason=KillReason.BEHAVIORAL_DRIFT, # Also: RATE_LIMIT, RING_BREACH, MANUAL ) Cross-Cutting Principle: Least Agency The Least Agency principle is emphasized throughout the OWASP Agentic Top 10 as a foundational design principle. Agents should be granted the minimum capabilities, permissions, and autonomy necessary to complete their assigned tasks. Layer Least Agency Mechanism Agent OS Policy engine enforces deny-by-default; agents must be explicitly granted each capability AgentMesh DID identity with scoped capabilities; delegation requires narrowing (child <= parent) Agent Runtime Execution rings (Ring 0-3) enforce privilege tiers; untrusted agents run in Ring 3 Agent SRE Resource limits and error budgets cap agent impact radius Performance: Governance Without Latency Tax A common concern with runtime governance is performance overhead. AGT's benchmarks demonstrate that policy enforcement adds negligible latency: Metric Value Single rule evaluation 84,000 ops/sec 1000 concurrent agents 47,000 ops/sec Policy evaluation latency <0.1ms (p99) Prompt-based violation rate 26.67% AGT policy violation rate 0.00% Conformance tests 992 Architecture Decision Records 25 The key takeaway: deterministic policy enforcement is orders of magnitude more reliable than prompt-based guardrails, and it runs fast enough for real-time agent workloads. Framework Integrations AGT is framework-agnostic. SDKs are available in Python, TypeScript, .NET, Rust, and Go. Native integrations exist for: LangChain and LangGraph CrewAI AutoGen (Microsoft) Semantic Kernel (Microsoft) OpenAI Agents SDK PydanticAI Model Context Protocol (MCP) Agent-to-Agent Protocol (A2A) Each integration wraps the agent framework's tool-calling and message-passing interfaces with AGT's policy engine, trust scoring, and audit logging. Adding governance to an existing agent takes minutes, not weeks. Compliance Framework Alignment Framework AGT Coverage OWASP Agentic Top 10 (2026) All 10 risk categories mapped NIST AI RMF Govern, Map, Measure, Manage functions addressed EU AI Act Risk classification, audit trails, human oversight SOC 2 Type II Audit logging, access controls, change management CSA ATF Zero-trust agent architecture alignment Singapore MGF Zero-trust, accountability, oversight layers Getting Started # Install the complete governance stack pip install agent-governance-toolkit[full] # Or install individual components pip install agent-os-kernel # Policy engine, VFS, approval workflows pip install agentmesh-platform # Identity, trust, encryption, audit pip install agentmesh-runtime # Execution rings, kill switch, saga pip install agent-sre # Circuit breakers, SLOs, chaos testing The quickstart tutorial walks through adding policy enforcement to an existing LangChain agent in under 10 minutes. Start with a single policy rule and expand as your governance requirements grow. Contribute and Collaborate AGT is open source under the MIT license. The project has over 2,000 GitHub stars and contributors from 40+ countries. Whether you are building agent governance for your enterprise, integrating a new framework, or extending the policy engine with OPA/Rego or Cedar policies, we welcome contributions. Repository: https://github.com/microsoft/agent-governance-toolkit Documentation: https://microsoft.github.io/agent-governance-toolkit Discussions: GitHub Discussions on the repository Disclaimer: This document is provided for informational purposes. Code examples are from the public AGT repository and may evolve. Always refer to the latest repository documentation for current APIs.339Views0likes0CommentsInspektor Gadget Completes Its First Independent Security Audit
Inspektor Gadget, the CNCF eBPF tool for Kubernetes and Linux observability, has completed its first independent security audit, conducted by Shielder and coordinated by OSTIF and CNCF. The audit found two Medium and one Low-severity issue, now patched in release v0.50.1. Learn what the auditors discovered, the hardening recommendations the maintainers are acting on, and why this milestone matters for the open source community.206Views0likes0CommentsAgentic AI for Linux Operations on Azure: The Prompts
Try This Yourself: Agentic AI for Linux Operations on Azure At Red Hat Summit 2026, I handed GitHub Copilot CLI a terminal and asked it to deploy a full-stack application to RHEL 10 on Azure. Live. From a single prompt. No scripts, no runbooks, no pre-baked automation. The audience watched every command happen in real time and then played the app on their phones. This post gives you the prompts so you can try it yourself. Copy them, paste them into Copilot CLI, and watch what happens. The only things you need to change are marked with [EDIT]. When you're done, you'll have a working Conference Bingo game running on Azure that you can open in your browser and play. The same app that people played live at Summit. What You Need Azure subscription — any subscription where you can create VMs (a free trial or Visual Studio subscription works) GitHub Copilot CLI — see Installing Copilot CLI for all platforms macOS/Linux: brew install copilot-cli or curl -fsSL https://gh.io/copilot-install | bash Windows: winget install GitHub.Copilot or use the install script in WSL GitHub Copilot subscription — Individual, Business, or Enterprise (https://github.com/features/copilot) SSH key pair at ~/.ssh/id_rsa — generate with ssh-keygen if you don't have one Azure CLI authenticated — run az login A Linux machine or WSL with Ansible installed (for Prompt 2 only) ~30 minutes total Before You Start az login az account set --subscription "[EDIT] Your Subscription Name" That's the only setup. Everything else is in the prompts. Choose Your Linux Distribution These prompts work with any Azure-endorsed Linux distribution. Pick one and use its image URN in Prompt 0: Distribution Image URN RHEL 10 RedHat:RHEL:10-lvm-gen2:latest RHEL 9 RedHat:RHEL:9-lvm-gen2:latest Ubuntu 24.04 Canonical:ubuntu-24_04-lts:server:latest Azure Linux Coming soon — check endorsed distros for availability SUSE 15 SP6 SUSE:sles-15-sp6:gen2:latest AlmaLinux 9 almalinux:almalinux-x86_64:9-gen2:latest Rocky Linux 9 ciq:rlc-plus:rocky9:latest Oracle Linux 10 Oracle:Oracle-Linux:ol10-lvm-gen2:latest Debian 12 Debian:debian-12:12-gen2:latest Copilot adapts its package management (dnf vs apt vs zypper), firewall (firewalld vs ufw), and security configuration (SELinux vs AppArmor) to the distro automatically. That's the point. For the full list, see Azure endorsed Linux distributions. Prompt 0: Create the Infrastructure This creates two blank Linux VMs, networking, and firewall rules. Nothing gets installed. That's Prompt 1's job. Copy and paste into Copilot CLI: Create a resource group called "copilot-demo" in eastus with a vnet (10.0.0.0/16, one subnet 10.0.1.0/24). Provision two Linux VMs — "demo-db" and "demo-app" — using image URN [EDIT] RedHat:RHEL:10-lvm-gen2:latest, my ~/.ssh/id_rsa key, and Standard_B2s size. Add NSG rules on both VMs for port 22 (SSH), and on demo-app for ports 80 and 443. Don't SSH into anything — just create the resources and tell me the IPs. When you're done, create a file at ~/.copilot/demo-env.md with the VM public and private IPs, resource group name, SSH user (azureuser), and SSH key path. Include a note that the app uses nginx as the reverse proxy (not Apache). Include a 'Demo Behavior Instructions' section with these rules: run SSH/SCP as direct commands (not Invoke-Expression), let output stream visibly (no tail/head), use SCP + bash scripts for complex quoting, no approval prompts, and hardcode SSH flags inline (-o StrictHostKeyChecking=no). After writing demo-env.md, verify it by reading it back and confirming the IPs match the VMs you just created. Run "az vm list-ip-addresses --resource-group copilot-demo -o table" and compare. If they don't match, fix it immediately. This file is the source of truth for every subsequent prompt. What to expect: Copilot creates the resource group, VNet, subnet, two VMs, and NSG rules. It writes an environment file that subsequent prompts reference. ~5 minutes. Prompt 1: Deploy the Application This is the big one. One prompt deploys PostgreSQL, Nginx, a Flask app, firewall rules, security configuration, and TLS — all from scratch. Copy and paste into Copilot CLI: Read ~/.copilot/demo-env.md for the environment, then: Configure and deploy the conference bingo game from https://github.com/karlabbott/conference-bingo to the demo-app VM. I have two fresh Linux VMs already running in the "copilot-demo" resource group: demo-db for PostgreSQL and demo-app for the app, on the same vnet. SSH key is ~/.ssh/id_rsa, user is azureuser. Deploy the app to /srv/conference-bingo to avoid SELinux home directory issues. Use nginx as the reverse proxy (as specified in the README), not the Apache configs in the deploy/ directory. Run commands individually over SSH. Configure the firewall to allow HTTP and HTTPS. If SELinux is enforcing, configure it appropriately. SCP a .sql file for PostgreSQL setup rather than inlining SQL through SSH. Install certbot via pip if you have a domain, otherwise use a self-signed certificate. Write secrets to ~/.config.env and copy to /etc/bingo.env for the systemd service. Use [EDIT] your-email@example.com for certs. What to expect: Copilot SSHs into both VMs and handles everything — packages, database, app deployment, web server, security, TLS. ~10-15 minutes. What to watch for: How Copilot adapts to your distro. On RHEL, it uses dnf, sets SELinux booleans like httpd_can_network_connect, runs initdb for PostgreSQL, and configures firewalld. On Ubuntu, it uses apt, skips initdb, and sets up ufw. Same prompt, different execution path. When something fails, watch it read the error and adapt. When it finishes: Open https://<demo-app-public-ip> in your browser (accept the self-signed certificate warning if you didn't use a domain). You should see Conference Bingo running — enter your name and play. This is the same app people played live on their phones at Red Hat Summit. Prompt 2: Add Observability with Ansible This demonstrates the "explore with Copilot, codify with Ansible" pattern. The monitoring stack is an Ansible playbook that deploys Azure Monitor Agent, Log Analytics, Data Collection Rules, and a Managed Grafana dashboard. Prerequisites: Ansible installed on Linux or WSL. On Windows, use WSL and prefix commands with export PATH=$HOME/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin. (Note: You may have to adjust this prompt to tell GitHub Copilot where your Ansible is installed.) Copy and paste into Copilot CLI: Read ~/.copilot/demo-env.md for the environment, then: Clone https://github.com/karlabbott/wordblitz-monitoring-ansible, copy group_vars/all.yml.example to group_vars/all.yml, and fill it in using the subscription ID from "az account show", resource group copilot-demo, location eastus, the VM names and IPs from demo-env.md, and ssh_user azureuser. Use "demo-law" for law_name and "demo-grafana" for grafana_name. Install the azure.azcollection Ansible collection and its pip requirements, then run the playbook with: ANSIBLE_AZURE_AUTH_SOURCE=cli ansible-playbook -i localhost, site.yml Print the Grafana dashboard URL when done and update demo-env.md with the Grafana URL and Log Analytics Workspace resource ID. What to expect: The playbook creates Azure monitoring resources, installs AMA on both VMs, configures data collection, deploys a Grafana dashboard, and — importantly — deploys a script called turbo.sh to the database VM that creates a real performance problem for Prompt 3. ~8-10 minutes. What is turbo.sh? The playbook deploys this to simulate a production incident: #!/bin/bash # Observability performance optimizations: stress-tests PostgreSQL to validate # monitoring pipeline throughput under sustained high-concurrency workloads. # Stop: sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query LIKE '%turbo_perf%';" # Phase 1: 8 CPU-burner loops (cross joins) for i in $(seq 1 8); do while true; do sudo -u postgres psql -d conference_bingo -c \ "/* turbo_perf */ SELECT count(*) FROM bingo_squares a CROSS JOIN bingo_squares b CROSS JOIN bingo_squares c CROSS JOIN bingo_squares d CROSS JOIN bingo_squares e;" > /dev/null 2>&1 done & done # Phase 2: 25 connection hogs that sleep in a transaction for i in $(seq 1 25); do while true; do sudo -u postgres psql -d conference_bingo -c \ "/* turbo_perf */ SELECT pg_sleep(5);" > /dev/null 2>&1 done & done echo "Turbo perf test started: 8 cross-join loops + 25 connection workers" echo "Observability pipeline should show load within seconds" It fires 8 parallel cross-join queries that saturate every CPU core on the database VM, plus 25 connection hogs that exhaust PostgreSQL's connection pool. The turbo ansible role further reduces max_connections to 30 to make the problem worse. The result: the app slows to a crawl. Try playing bingo now — you'll feel it. Why Ansible matters here: Agents are non-deterministic — the same prompt might take different steps each time. That's fine for exploration. But when you need to reproduce this in staging, then production, then for the next team, you need determinism. The playbook is idempotent, repeatable, auditable. It's in git, it's reviewed in PRs, and it IS the documentation. You explore with Copilot, then codify with Ansible. Prompt 3: Ask Copilot What's Wrong The turbo script is already running from Prompt 2. Your app should be slow. Now ask Copilot to figure out why — from a symptom alone: My app feels really slow. Can you tell me why? Let's review before making any changes. That's it. One sentence plus a guardrail. What to expect: Copilot SSHs in, checks system load, examines running processes, finds the cross-join queries, reads turbo.sh, reverse-engineers the attack, explains the root cause, and offers to kill the processes. ~2-3 minutes. Prompt 4: Generate an Incident Postmortem After fixing the issue, ask Copilot to document what happened — from the same conversation: Write an incident postmortem for what just happened — root cause, impact, how you diagnosed it, how you resolved it, and a recommendation to prevent it from happening again. Save it as a Word document at ~/Desktop/incident-postmortem.docx using python-docx, and open it. What to expect: A formatted Word document with root cause analysis, timeline, remediation steps, and prevention recommendations. The full loop: build, monitor, break, fix, document — one session. ~30 seconds. Cleanup az group delete --name copilot-demo --yes --no-wait What I Learned Doing This Live A well-crafted prompt replaces a 50-step runbook. Your intent is the source of truth. The agent figures out the steps. Explore with Copilot, codify with Ansible. Copilot gets you to working fast. Ansible keeps it working forever. Understanding comes before abstraction. Don't start with the playbook. Start with the exploration. The playbook comes after. The danger with AI isn't that machines think. It's that we stop thinking because the output looks fine. Always review. Understand the blast radius. Start in non-production. AI removes the scaffolding. What remains is judgment. Technical correctness and the instinct to know when something is wrong — that's what the tools cannot replace. And that's what made me stop worrying about being replaced by them. Resources Conference Bingo App: https://github.com/karlabbott/conference-bingo Monitoring Playbook: https://github.com/karlabbott/wordblitz-monitoring-ansible Interactive Walkthrough: https://summit.99b.org — the full talk with audio narration and demo videos GitHub Copilot CLI: https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/install-copilot-cli Azure endorsed Linux distributions: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros251Views0likes0CommentsRun OpenClaw Agents on Azure Linux VMs (with Secure Defaults)
Many teams want an enterprise-ready personal AI assistant, but they need it on infrastructure they control, with security boundaries they can explain to IT. That is exactly where OpenClaw fits on Azure. OpenClaw is a self-hosted, always-on personal agent runtime you run in your enterprise environment and Azure infrastructure. Instead of relying only on a hosted chat app from a third-party provider, you can deploy, operate, and experiment with an agent on an Azure Linux VM you control — using your existing GitHub Copilot licenses, Azure OpenAI deployments, or API plans from OpenAI, Anthropic Claude, Google Gemini, and other model providers you already subscribe to. Once deployed on Azure, you can interact with an OpenClaw agent through familiar channels like Microsoft Teams, Slack, Telegram, WhatsApp, and many more! For Azure users, this gives you a practical middle ground: modern personal-agent workflows on familiar Azure infrastructure. What is OpenClaw, and how is it different from ChatGPT/Claude/chat apps? OpenClaw is a self-hosted personal agent runtime that can be hosted on Azure compute infrastructure. How it differs: ChatGPT/Claude apps are primarily hosted chat experiences tied to one provider's models OpenClaw is an always-on runtime you operate yourself, backed by your choice of model provider — GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, and others OpenClaw lets you keep the runtime boundary in your own Azure VM environment within your Azure enterprise subscription In practice, OpenClaw is useful when you want a persistent assistant for operational and workflow tasks, with your own infrastructure as the control point. You bring whatever model provider and API plan you already have — OpenClaw connects to it. Why Azure Linux VMs? Azure Linux VMs are a strong fit because they provide: A suitable host machine for the OpenClaw agent to run on Enterprise-friendly infrastructure and identity workflows Repeatable provisioning via the Azure CLI Network hardening with NSG rules Managed SSH access through Azure Bastion instead of public SSH exposure How to Set Up OpenClaw on an Azure Linux VM This guide sets up an Azure Linux VM, applies NSG (Network Security Group) hardening, configures Azure Bastion for managed SSH access, and installs an always-on OpenClaw agent within the VM that you can interact with through various messaging channels. What you'll do Create Azure networking (VNet, subnets, NSG) and compute resources with the Azure CLI Apply Network Security Group rules so VM SSH is allowed only from Azure Bastion Use Azure Bastion for SSH access (no public IP on the VM) Install OpenClaw on the Azure VM Verify OpenClaw installation and configuration on the VM What you need An Azure subscription with permission to create compute and network resources Azure CLI installed (install steps) An SSH key pair (the guide covers generating one if needed) ~20–30 minutes Configure deployment Step 1: Sign in to Azure CLI az login # Select a suitable Azure subscription during Azure login az extension add -n ssh # SSH extension is required for Azure Bastion SSH The ssh extension is required for Azure Bastion native SSH tunneling. Step 2: Register required resource providers (one-time) Register required Azure Resource Providers (one time registration): az provider register --namespace Microsoft.Compute az provider register --namespace Microsoft.Network Verify registration. Wait until both show Registered. az provider show --namespace Microsoft.Compute --query registrationState -o tsv az provider show --namespace Microsoft.Network --query registrationState -o tsv Step 3: Set deployment variables Set the deployment environment variables that will be needed throughout this guide. RG="rg-openclaw" LOCATION="westus2" VNET_NAME="vnet-openclaw" VNET_PREFIX="10.40.0.0/16" VM_SUBNET_NAME="snet-openclaw-vm" VM_SUBNET_PREFIX="10.40.2.0/24" BASTION_SUBNET_PREFIX="10.40.1.0/26" NSG_NAME="nsg-openclaw-vm" VM_NAME="vm-openclaw" ADMIN_USERNAME="openclaw" BASTION_NAME="bas-openclaw" BASTION_PIP_NAME="pip-openclaw-bastion" Adjust names and CIDR ranges to fit your environment. The Bastion subnet must be at least /26. Step 4: Select SSH key Use your existing public key if you have one: SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)" If you don't have an SSH key yet, generate one: ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519 -C "you@example.com" SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)" Step 5: Select VM size and OS disk size VM_SIZE="Standard_B2as_v2" OS_DISK_SIZE_GB=64 Choose a VM size and OS disk size available in your subscription and region: Start smaller for light usage and scale up later Use more vCPU/RAM/disk for heavier automation, more channels, or larger model/tool workloads If a VM size is unavailable in your region or subscription quota, pick the closest available SKU List VM sizes available in your target region: az vm list-skus --location "${LOCATION}" --resource-type virtualMachines -o table Check your current vCPU and disk usage/quota: az vm list-usage --location "${LOCATION}" -o table Deploy Azure resources Step 1: Create the resource group The Azure resource group will contain all of the Azure resources that the OpenClaw agent needs. az group create -n "${RG}" -l "${LOCATION}" Step 2: Create the network security group Create the NSG and add rules so only the Bastion subnet can SSH into the VM. az network nsg create \ -g "${RG}" -n "${NSG_NAME}" -l "${LOCATION}" # Allow SSH from the Bastion subnet only az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n AllowSshFromBastionSubnet --priority 100 \ --access Allow --direction Inbound --protocol Tcp \ --source-address-prefixes "${BASTION_SUBNET_PREFIX}" \ --destination-port-ranges 22 # Deny SSH from the public internet az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n DenyInternetSsh --priority 110 \ --access Deny --direction Inbound --protocol Tcp \ --source-address-prefixes Internet \ --destination-port-ranges 22 # Deny SSH from other VNet sources az network nsg rule create \ -g "${RG}" --nsg-name "${NSG_NAME}" \ -n DenyVnetSsh --priority 120 \ --access Deny --direction Inbound --protocol Tcp \ --source-address-prefixes VirtualNetwork \ --destination-port-ranges 22 The rules are evaluated by priority (lowest number first): Bastion traffic is allowed at 100, then all other SSH is blocked at 110 and 120. Step 3: Create the virtual network and subnets Create the VNet with the VM subnet (NSG attached), then add the Bastion subnet. az network vnet create \ -g "${RG}" -n "${VNET_NAME}" -l "${LOCATION}" \ --address-prefixes "${VNET_PREFIX}" \ --subnet-name "${VM_SUBNET_NAME}" \ --subnet-prefixes "${VM_SUBNET_PREFIX}" # Attach the NSG to the VM subnet az network vnet subnet update \ -g "${RG}" --vnet-name "${VNET_NAME}" \ -n "${VM_SUBNET_NAME}" --nsg "${NSG_NAME}" # AzureBastionSubnet — name is required by Azure az network vnet subnet create \ -g "${RG}" --vnet-name "${VNET_NAME}" \ -n AzureBastionSubnet \ --address-prefixes "${BASTION_SUBNET_PREFIX}" Step 4: Create the Virtual Machine Create the VM with no public IP. SSH access for OpenClaw configuration will be exclusively through Azure Bastion. az vm create \ -g "${RG}" -n "${VM_NAME}" -l "${LOCATION}" \ --image "Canonical:ubuntu-24_04-lts:server:latest" \ --size "${VM_SIZE}" \ --os-disk-size-gb "${OS_DISK_SIZE_GB}" \ --storage-sku StandardSSD_LRS \ --admin-username "${ADMIN_USERNAME}" \ --ssh-key-values "${SSH_PUB_KEY}" \ --vnet-name "${VNET_NAME}" \ --subnet "${VM_SUBNET_NAME}" \ --public-ip-address "" \ --nsg "" --public-ip-address "" prevents a public IP from being assigned. --nsg "" skips creating a per-NIC NSG (the subnet-level NSG created earlier handles security). Reproducibility: The command above uses latest for the Ubuntu image. To pin a specific version, list available versions and replace latest: az vm image list \ --publisher Canonical --offer ubuntu-24_04-lts \ --sku server --all -o table Step 5: Create Azure Bastion Azure Bastion provides secure-managed SSH access to the VM without exposing a public IP. Bastion Standard SKU with tunneling is required for CLI-based "az network bastion ssh" command. az network public-ip create \ -g "${RG}" -n "${BASTION_PIP_NAME}" -l "${LOCATION}" \ --sku Standard --allocation-method Static az network bastion create \ -g "${RG}" -n "${BASTION_NAME}" -l "${LOCATION}" \ --vnet-name "${VNET_NAME}" \ --public-ip-address "${BASTION_PIP_NAME}" \ --sku Standard --enable-tunneling true Bastion provisioning typically takes 5–10 minutes but can take up to 15–30 minutes in some regions. Step 6: Verify Deployments After all resources are deployed, your resource group should look like the following: Install OpenClaw Step 1: SSH into the VM through Azure Bastion VM_ID="$(az vm show -g "${RG}" -n "${VM_NAME}" --query id -o tsv)" az network bastion ssh \ --name "${BASTION_NAME}" \ --resource-group "${RG}" \ --target-resource-id "${VM_ID}" \ --auth-type ssh-key \ --username "${ADMIN_USERNAME}" \ --ssh-key ~/.ssh/id_ed25519 Step 2: Install OpenClaw (in the Bastion SSH shell) curl -fsSL https://openclaw.ai/install.sh | bash The installer installs Node LTS and dependencies if not already present, installs OpenClaw, and launches the OpenClaw onboarding wizard. For more information, see the open source OpenClaw install docs. OpenClaw Onboarding: Choosing an AI Model Provider During OpenClaw onboarding, you'll choose the AI model provider for the OpenClaw agent. This can be GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, or another supported provider. See the open source OpenClaw install docs for details on choosing an AI model provider when going through the onboarding wizard. Most enterprise Azure teams already have GitHub Copilot licenses. If that is your case, we recommend choosing the GitHub Copilot provider in the OpenClaw onboarding wizard. See the open source OpenClaw docs on configuring GitHub Copilot as the AI model provider. OpenClaw Onboarding: Setting up Messaging Channels During OpenClaw onboarding, there will be an optional step where you can set up various messaging channels to interact with your OpenClaw agent. For first time users, we recommend setting up Telegram due to ease of setup. Other messaging channels such as Microsoft Teams, Slack, WhatsApp, and others can also be set up. To configure OpenClaw for messaging through chat channels, see the open source OpenClaw chat channels docs. Step 3: Verify OpenClaw Configuration To validate that everything was set up correctly, run the following commands within the same Bastion SSH session: openclaw status openclaw gateway status If there are any issues reported, you can run the onboarding wizard again with the steps above. Alternatively, you can run the following command: openclaw doctor Message OpenClaw Once you have configured the OpenClaw agent to be reachable via various messaging channels, you can verify that it is responsive by messaging it. Enhancing OpenClaw for Use Cases There you go! You now have a 24/7, always-on personal AI agent, living on its own Azure VM environment. For awesome OpenClaw use cases, check out the awesome-openclaw-usecases repository. To enhance your OpenClaw agent with additional AI skills so that it can autonomously perform multi-step operations on any domain, check out the awesome-openclaw-skills repository. You can also check out ClawHub and ClawSkills, two popular open source skills directories that can enhance your OpenClaw agent. Cleanup To delete all resources created by this guide: az group delete -n "${RG}" --yes --no-wait This removes the resource group and everything inside it (VM, VNet, NSG, Bastion, public IP). This also deletes the OpenClaw agent running within the VM. If you'd like to dive deeper about deploying OpenClaw on Azure, please check out the open source OpenClaw on Azure docs.6.6KViews5likes2CommentsDissecting LLM Container Cold-Start: Where the Time Actually Goes
Dissecting LLM Container Cold-Start: Where the Time Actually Goes Cold-start latency determines whether GPU clusters can scale to zero, how fast they can autoscale, and whether bursty or low-QPS workloads are economically viable. Most optimization effort targets the container pull path – faster registries, lazy-pull snapshotters, different compression formats. But “cold-start” is actually a composite of pull, runtime startup, and model initialization, and the dominant phase varies dramatically by inference engine. An optimization that cuts time-to-first-token for one engine can be irrelevant for another, even on identical infrastructure. What we measured We decomposed cold-start for two architecturally different engines – vLLM (Python/CUDA, heavy JIT compilation) and llama.cpp (C++, minimal runtime) – running Llama 3.1 8B on A100 GPUs. Every run starts from a completely clean slate: containerd stopped, all state wiped, kernel page caches dropped. No warm starts, no pre-pulling, no caching. We break TTFT into three phases: pull (download + decompression + snapshot creation), startup (container start → server ready), and first inference (first API response, including model weight loading for engines that defer it). We tested across three snapshotters (overlayfs, EROFS, Nydus) with gzip and uncompressed images, pulling from same-region Azure Container Registry. Setup All experiments ran on an NVIDIA A100 80GB (Azure NC24ads_A100_v4), pulling from same-region Azure Container Registry. Images were built with AIKit, which produces ModelPack-compliant OCI artifacts with uncompressed model weight layers, Cosign signatures, SBOMs, and provenance attestations. These are supply chain properties you lose when model weights live on a shared drive. vLLM: startup dominates, pull barely matters vLLM loads model weights, runs torch.compile, captures CUDA graphs for multiple batch shapes, allocates KV cache, and warms up, all before serving the first request. This takes ~176 seconds regardless of how fast the image arrived. The breakdown makes the bottleneck obvious: the green bar (startup) is nearly constant across all four variants, swamping any pull-time differences. Figure 1: vLLM cold-start breakdown. Startup (green, ~176s) dominates regardless of snapshotter. Method Pull Startup 1st Inference TTFT overlayfs (gzip) 140.8s ±5.5 176.0s ±3.2 0.16s 317.2s ±2.2 overlayfs (uncomp.) 129.9s ±3.3 180.8s ±12.2 0.16s 310.9s ±8.9 EROFS (gzip) 158.9s ±8.8 175.3s ±0.8 0.16s 334.4s ±8.7 EROFS (uncomp.) 166.3s ±21.1 177.3s ±12.8 0.16s 343.8s ±8.2 Llama 3.1 8B, ~14 GB image, n=2–3 per variant. ± = sample standard deviation. Three of twelve runs hit intermittent NVIDIA container runtime crashes (exit code 120, unrelated to snapshotters) and were excluded. We excluded Nydus because FUSE-streaming the 14 GB Python/CUDA stack caused startup to exceed 900s. Note: the EROFS uncompressed pull time (166.3s ±21.1) is slower than EROFS gzip, with a standard deviation that swallows the effect — this cell is essentially noise at n=2. Steady-state inference: ~0.134s across all snapshotters. 44% pull, 56% startup. Dropping gzip saves 6 seconds of end-to-end TTFT on a 317-second cold start (1.02x). If your engine is vLLM, optimizing the pull pipeline is the wrong lever. llama.cpp: pull dominates, compression is the bottleneck llama.cpp has the opposite profile. Its C++ runtime starts in 2–5 seconds, so the pull becomes the majority of cold-start. This is where filesystem and compression choices actually matter. Here the picture flips. Pull (blue) is the widest bar, and the gzip-to-uncompressed difference is visible at a glance: Figure 2: llama.cpp cold-start breakdown. Pull time (blue) dominates for gzip variants. Method Pull Startup 1st Inference TTFT overlayfs (gzip) 88.3s ±0.2 5.3s ±0.5 45.1s ±1.4 138.8s ±0.8 overlayfs (uncomp.) 56.3s ±3.1 2.0s ±0.0 44.2s ±0.1 102.4s ±3.1 EROFS (gzip) 92.0s ±2.3 6.1s ±0.5 44.0s ±0.2 142.3s ±1.9 EROFS (uncomp.) 58.8s ±0.6 2.0s ±0.0 44.0s ±0.1 104.8s ±0.5 Llama 3.1 8B Q4_K_M, 8.7 GB image uncompressed, n=3 per variant, 12/12 runs succeeded. First inference includes model weight loading into GPU VRAM (~43s) plus token generation (~1.5s). Steady-state inference: ~1.5s across all snapshotters. 64% pull, 4% startup, 33% model loading. Dropping gzip saves 36 seconds (1.35x) with zero infrastructure changes. Engine comparison Placed side by side, the two engines tell opposite stories about the same infrastructure: Figure 3: Where cold-start time goes. vLLM is compute-bound; llama.cpp is pull-bound. vLLM llama.cpp Time saved by dropping gzip 6s (2% of TTFT) 36s (26% of TTFT) Startup time 176–181s 2–5s Speedup from dropping gzip 1.02x 1.35x Same optimization, completely different impact. Before investing in pull optimization (compression changes, lazy-pull infrastructure, registry tuning), profile your engine’s startup. If startup dominates, the pull isn’t where the time goes. Why gzip hurts: model weights are incompressible The llama.cpp AIKit image is 8.7 GB uncompressed, 6.6 GB with gzip (a modest 0.76x ratio). But this ratio hides what’s really happening: Layer type Size % of image Gzip ratio Model weights (GGUF) 4.9 GB 56% ~1.00x (quantized binary, no redundancy) CUDA + system layers ~3.8 GB 44% ~0.46x (compresses well) The GGUF file is already quantized to 4-bit precision. Gzip reads every byte, burns CPU, and produces output the same size as the input. You’re paying full decompression cost on 56% of the image for zero size reduction. (For vLLM’s larger 14 GB image, model weights are a smaller fraction and the compressible Python/CUDA stack dominates, which is why gzip’s overhead matters less there.) Bottom line: gzip is doing real work on less than half your image and producing zero savings on the rest. Dropping it costs nothing and removes a bottleneck from every cold start. The Nydus prefetch finding If decompression is the bottleneck, what about skipping the full pull entirely? Nydus lazy-pull takes a fundamentally different approach: it fetches only manifest metadata during “pull” (~0.7s), then streams model data on-demand via FUSE as the container reads it. Nydus TTFT isn’t directly comparable to the full-pull methods above because the download cost shifts from the pull column to the inference column. With prefetch enabled, Nydus achieved 77.8s TTFT for llama.cpp. The critical detail is the prefetch_all flag — the difference between prefetch ON and OFF is 2.87x: Figure 4: Nydus prefetch ON vs OFF. One config flag, 2.87x difference. Overlayfs baselines shown for context. Configuration 1st Inference TTFT Nydus, prefetch ON 72.4s ±0.6 77.8s ±0.5 Nydus, prefetch OFF 218.6s ±2.9 223.4s ±2.9 overlayfs uncompressed (baseline) 44.0s ±0.1 102.4s ±3.1 overlayfs gzip 44.0s ±0.4 139.1s ±1.9 n=3 per config, 9/9 runs succeeded. Nydus and overlayfs gzip baselines are from a separate test run (03-prefetch-config-20260401-030725.csv); overlayfs uncompressed is from the main llama.cpp run. The overlayfs gzip baselines are within noise across runs (139.1s vs 138.8s). One flag in nydusd-config.json, 2.87x difference (prefetch ON vs OFF). Without prefetch, every model weight page fault fires an individual HTTP range request to the registry. With prefetch_all=true, Nydus streams the full blob in the background while the container starts, so chunks arrive ahead of the GPU’s read pattern. Note that with prefetch enabled, Nydus is effectively performing a full pull overlapped with container startup rather than true on-demand fetching — the win comes from the overlap, not from fetching less data. Compared to overlayfs uncompressed (the post’s recommended baseline), Nydus prefetch is 1.32x faster (77.8s vs 102.4s). Compared to overlayfs gzip, 1.79x. Even with prefetch, Nydus first inference is ~28s slower than overlayfs (72s vs 44s) due to FUSE kernel-user roundtrips during model mmap. Nydus wins on total TTFT because it eliminates the blocking pull, but this overhead means its advantage shrinks on faster networks. Bottom line: Nydus lazy-pull can halve cold-start for pull-bound engines, but only if prefetch is on. Treat prefetch_all=true as a hard requirement, not a tuning knob. How to apply these findings Pick your optimization by engine type The right optimization depends on where your engine spends its cold-start time. This table summarizes the tradeoffs: Engine type Dominant phase Speedup from dropping gzip Nydus viable? Best optimization What NOT to optimize vLLM / TensorRT-LLM Startup (56%) 1.02x — negligible No — FUSE + Python/CUDA stack exceeded 900s in our tests Cache torch.compile artifacts and CUDA graphs Pull pipeline (it’s <44% of TTFT and already fast enough) llama.cpp / ONNX Runtime Pull (64%) 1.35x — 36s saved Yes, with prefetch_all=true (77.8s TTFT vs 102.4s uncompressed baseline) Drop gzip on weight layers; consider lazy-pull on slow links Startup (already 2–5s; no room to improve) Large dense models (70B+) Pull (projected) >1.35x — scales with image size Yes, strongest case for lazy-pull Uncompressed or zstd; Nydus prefetch on bandwidth-constrained links — Recommendations Profile your engine’s startup before touching the pull pipeline. If CUDA compilation dominates (vLLM, TensorRT-LLM), no amount of pull optimization will help. Cache torch.compile artifacts and CUDA graphs instead. Drop gzip on model weight layers. For pull-bound engines (llama.cpp, ONNX Runtime), this is the single highest-ROI change: build with --output=type=image,compression=uncompressed, or use AIKit, which defaults to uncompressed weight layers. Quantized model weights (GGUF, safetensors) are already dense binary — gzip burns CPU for negligible size reduction. If using Nydus, set prefetch_all=true. Without it, every weight page fault triggers an individual HTTP range request and cold-start is 2.87x slower. This is a single flag in nydusd-config.json. Package models as signed OCI artifacts, not volume mounts. Three CNCF projects implement this pipeline end-to-end: ModelPack defines the OCI artifact spec (model metadata, architecture, quantization format). AIKit builds ModelPack-compliant images with Cosign signatures, SBOMs, and provenance attestations — supply chain guarantees you lose when weights live on a shared drive. KAITO handles the Kubernetes deployment: GPU node provisioning, inference engine setup, and API exposure. Together they cover packaging → build → deploy, and they produce the exact image layout these benchmarks measured. Why this matters: the cost of cold-start On an A100 node (~$3–4/hr on major clouds), a 5-minute vLLM cold start burns ~$0.30 in idle GPU time per pod. That sounds small until you multiply it: a cluster that scales 50 pods to zero overnight and restarts them each morning wastes ~$15/day — over $5,000/year — on GPUs sitting idle during pull and CUDA compilation. More critically, cold-start latency determines whether scale-to-zero is feasible at all. If cold-start exceeds your SLO (say, 30s for an interactive app), you’re forced to keep warm replicas running 24/7, which can 2–3x your GPU spend. What this doesn’t cover zstd compression: decompresses 5–10x faster than gzip; containerd supports it natively. The most obvious gap in this analysis. Pre-pulling and caching: production clusters pre-pull images and can cache CUDA compilation artifacts, substantially reducing restart times. We measure the cold case: scale-from-zero events and first-time deployments. Volume-mounted weights: skips the pull entirely, but loses supply chain properties (signing, scanning, provenance). Larger models (70B+): pull would dominate more, increasing the gzip penalty. Sample size: n=3 per AIKit variant, n=2–3 per vLLM variant. The gzip finding for llama.cpp is statistically significant (Welch’s t-test, p=0.0014, Cohen’s d=16.3; verification script). Other comparisons are directional. Reproduce it Scripts and raw data: erofs-repro-repo. Data for this post: 02-aikit-five-way-20260401-004716.csv and 01-vllm-four-way-20260331-113848.csv. Full analysis: technical report.454Views1like0CommentsDPDK 25.11 Performance on Azure for High-Speed Packet Workloads
At Microsoft Azure, performance is treated as an ongoing discipline grounded in careful engineering and real-world validation. As cloud workloads grow in scale and variety, customers depend on consistent, high-throughput networking. Technologies such as the Data Plane Development Kit (DPDK) play a key role in meeting these expectations To support customers running advanced network functions, we’ve released our latest performance report based on DPDK 25.11. It is now available in the DPDK performance catalog (Microsoft Azure DPDK Performance Report). The report provides a clear view of how DPDK performs on Microsoft-developed Azure Boost within Azure infrastructure, with detailed insights into packet processing across a range of scenarios, from small packet sizes to multi-core scaling. Why We Test DPDK on Azure DPDK is widely used for high-performance packet processing in virtualized environments. It powers a range of workloads from customer-deployed virtual network functions to internal Azure network appliances. But simply enabling DPDK is not enough. To ensure optimal performance, we validate it under realistic conditions, including: Azure VM configurations with Accelerated Networking NUMA-aware memory and CPU alignment Hugepage-backed memory allocation Multi-core PMD thread scaling Packet forwarding using real traffic generators This helps us understand how DPDK performs in actual cloud environments, not just idealized lab setups. What the Report Covers The DPDK 25.11 report includes performance benchmarks across different frame sizes, ranging from 64 bytes to 1518 bytes. It also evaluates CPU usage, queue configuration, and latency stability across various test conditions. Key Report Highlights: Line-rate throughput is achievable at common frame sizes when vCPUs are pinned correctly and memory is properly configured Low jitter and consistent latency are observed across multi-queue and multi-core tests Performance scales nearly linearly with additional cores, especially for smaller packet sizes Queue and PMD thread alignment with the NUMA layout plays a critical role in maximizing efficiency All tests were performed using Azure VM SKUs equipped with Microsoft NICs and configured for optimal isolation and performance. Why We Shared This with the Community Publishing this report reflects our commitment to open engineering and ecosystem collaboration. We believe performance transparency benefits everyone in the ecosystem, including developers, operators, and customers. Here are a few reasons why we share: It helps customers plan and tune their workloads using validated performance envelopes It enables vendors and contributors to optimize drivers, firmware, and applications based on real-world data It encourages reproducibility and standardization in cloud DPDK benchmarking It creates a feedback loop between Azure, the DPDK community, and our partners Our goal is not just to test internally but to foster open dialogue and measurable improvement across platforms. Recommendations for Running DPDK on Azure Based on the test results, we offer the following best practices for customers deploying DPDK-based applications: Area Recommendation VM Selection Choose Accelerated Networking-enabled SKUs like D, Fsv2, or Eav4 CPU Pinning Use dedicated cores for PMD threads and align with NUMA topology Memory Configure hugepages and allocate memory from the local NUMA node Queue Mapping Match RX and TX queues to available vCPUs to avoid contention Packet Generator Use pktgen-dpdk or testpmd with controlled traffic profiles These settings can significantly improve consistency and peak throughput across many DPDK scenarios. Get Involved and Reproduce the Results We invite you to read the full report and try the configurations in your own environment. Whether you are running a firewall, a router, or a telemetry appliance, DPDK on Azure offers scalable performance with the right tuning. You can: Download the report at Microsoft Azure DPDK Performance Report Replicate the test setup using Azure VMs and your preferred packet generator github.com/mcgov/dpdk-perf Share your feedback with us through GitHub or community channels or send feedback dpdk@microsoft.com Suggest improvements or contribute new scenarios to future performance reports Conclusion DPDK is a powerful enabler of high-performance networking in the cloud. With this report, we aim to make Azure performance data open, useful, and actionable. It reflects our ongoing investment in validating and improving the underlying infrastructure that supports mission-critical workloads. We thank the DPDK community for ongoing collaboration. We look forward to continued engagement as we scale performance transparency in cloud-native environments.152Views0likes0Comments