linux on azure
64 Topics
Four open source projects to explore at Microsoft Build
Open source is where developers experiment, collaborate, and turn new ideas into tools that others can build on. At Microsoft Build, we’re creating a dedicated space for that energy: the Open Source Zone.

This year, the Open Source Zone will bring together maintainers, contributors, and developers working on some of the most interesting open source projects in AI. Whether you’re building agents, experimenting with local models, exploring prompt workflows, or looking for practical ways to bring AI into your development process, this is a place to meet the people behind the projects and see what they’re building.

The Open Source Zone is inspired by similar community spaces we’ve hosted at GitHub Universe: hands-on, conversation-driven, and centered on the people and projects moving open source forward.

Meet the projects

OpenClaw

OpenClaw (originally Clawbot, formerly Clawdbot, and briefly Moltbot before landing on its current name, because naming is hard) is a personal AI assistant project built for developers who want more control over how AI agents run across tools, devices, and workflows. Its repository describes it as “your own personal AI assistant” across operating systems and platforms, with support for agent workspaces, skills, and device nodes. It has also become one of the fastest-growing open source projects on GitHub, with over 370,000 stars to date. At the Open Source Zone, attendees can learn how OpenClaw approaches personal agents, extensibility, and local-first experimentation.

AutoGPT

AutoGPT is one of the best-known open source projects in the autonomous agent space. The project’s mission is to make AI accessible for everyone to use and build on, with tools for building, testing, and delegating work to agents. Visit AutoGPT in the Open Source Zone to learn how the project is evolving agent development, benchmarking, frontend experiences, and practical workflows for building agent-powered applications.
Come for the autonomous agents; stay for the very human maintainers. AutoGPT is also a member of GitHub’s Secure Open Source Fund, with a goal of enhancing AI security across the open source ecosystem.

Open WebUI

Open WebUI is a self-hosted, extensible AI platform for working with large language models. The project supports Ollama and OpenAI-compatible APIs and includes built-in RAG capabilities, making it a strong option for developers and organizations exploring local, private, or provider-flexible AI experiences. At Build, the Open WebUI team will show how developers can run, customize, and extend AI interfaces for their own environments.

prompts.chat

prompts.chat, formerly Awesome ChatGPT Prompts, is a curated collection of prompt examples for AI chat models. The project is designed to help people discover, share, and build better prompts for modern AI assistants. Created by Fatih Kadir Akın, a GitHub Star from Istanbul, prompts.chat reflects his work at the intersection of open source, developer education, and AI-assisted development. Fatih leads Developer Relations at Teknasyon, has authored books on JavaScript and prompt engineering, and is active in the community as a speaker, organizer, and contributor. Stop by to explore prompt libraries, prompt engineering resources, self-hosting options, and ways the community is making prompting more reusable and collaborative.

Register for Microsoft Build

Microsoft Build takes place June 2–3, 2026, in San Francisco and online. In-person passes are available, and online registration is free for access to the livestreamed keynote and select sessions. Register for Microsoft Build and come visit the Open Source Zone to meet the teams behind OpenClaw, AutoGPT, Open WebUI, and prompts.chat. We’ll see you there.

Governing AI Agents Against Every OWASP Agentic Risk: A Deep Dive with the Agent Governance Toolkit
AI agents are moving from prototypes to production. They book flights, write code, negotiate contracts, and operate across enterprise systems with minimal human oversight. The attack surface is not theoretical: OWASP has catalogued the top 10 risks specific to agentic applications, and every one of them maps to a real-world failure mode.

The Agent Governance Toolkit (AGT) is an open-source, MIT-licensed framework that enforces deterministic governance at runtime, before every tool call, message, and action an agent takes. This is not prompt engineering or guardrails bolted on after the fact. AGT provides policy-as-code enforcement, zero-trust identity, execution isolation, and tamper-evident audit trails across the full agent lifecycle.

In this post, we walk through all 10 OWASP Agentic risks with real code from the AGT repository. By the end, you will have concrete examples for every risk category and a clear path to production-grade agent governance.

Coverage at a Glance

# | OWASP Risk | AGT Component | Key Mechanism
ASI-01 | Agent Goal Hijack | Agent OS | Policy Engine + Action Interception
ASI-02 | Tool Misuse & Exploitation | Agent OS | Capability Sandboxing + Input Sanitization
ASI-03 | Identity & Privilege Abuse | AgentMesh | DID Identity + Trust Scoring
ASI-04 | Supply Chain Vulnerabilities | AgentMesh | AI-BOM (Model + Data + Weights Provenance)
ASI-05 | Unexpected Code Execution | Agent Runtime | Execution Rings (Ring 0-3)
ASI-06 | Memory & Context Poisoning | Agent OS | VFS Policies + CMVK Verification
ASI-07 | Insecure Inter-Agent Comms | AgentMesh | IATP + E2E Encrypted Channels
ASI-08 | Cascading Agent Failures | Agent SRE | Circuit Breakers + SLOs
ASI-09 | Human-Agent Trust Exploitation | Agent OS | Approval Workflows + Quorum Logic
ASI-10 | Rogue Agents | Agent Runtime | Kill Switch + Ring Isolation + Merkle Audit

ASI-01: Agent Goal Hijack

The risk: Attackers manipulate the agent's objectives via indirect prompt injection or poisoned inputs.
The agent believes it is following its original instructions, but it has been redirected.

AGT mitigates this through the Agent OS policy engine. Every agent action passes through a declarative policy evaluation layer before execution. The policy engine supports three modes: strict (deny by default), permissive (allow by default), and audit (log only). Unauthorized goal changes are blocked at the action layer, not at the prompt layer.

    from agent_os import StatelessKernel, ExecutionContext

    kernel = StatelessKernel()
    ctx = ExecutionContext(agent_id="my-agent", policies=["read_only"])

    # This action is blocked by policy -- goal hijack prevented
    result = await kernel.execute(
        action="delete_database",
        params={"target": "production"},
        context=ctx,
    )
    # result.success = False, result.error = "Policy violation: read_only"

The MCP Governance Proxy extends this to Model Context Protocol tool calls, evaluating policy before any tool invocation reaches the agent runtime.

ASI-02: Tool Misuse & Exploitation

The risk: An agent's authorized tools are abused in unintended ways, such as exfiltrating data via read operations or chaining benign tools into dangerous workflows.

AGT provides capability-based security inspired by POSIX. Agents receive explicit capability grants (read, write, execute, network), not blanket tool access. The built-in strict mode blocks dangerous tools like run_shell, execute_command, and eval. Tool inputs are sanitized for command injection patterns and shell metacharacters. The verify_code_safety MCP tool checks generated code before execution, and tool allowlists/denylists give operators fine-grained control over which tools each agent can invoke.

ASI-03: Identity & Privilege Abuse

The risk: Agents escalate privileges by abusing identities or inheriting excessive credentials. Without proper identity, agents operate as ambient authority, and any compromise cascades.

AgentMesh implements zero-trust identity using Decentralized Identifiers (DIDs).
Every agent gets a cryptographic identity, did:agentmesh:{agentId}:{fingerprint}, backed by Ed25519 key pairs. Trust is earned through a tiered model: Untrusted, Provisional, Trusted, Verified. Trust decays over time without positive signals, and delegation chains must always narrow scope (child capabilities must be a subset of parent capabilities).

    from agentmesh import AgentIdentity

    identity = AgentIdentity.create(
        name="data-analyst",
        sponsor="admin@contoso.com",
        capabilities=["read:data"],  # Scoped -- cannot write or delete
    )

    # Delegation MUST narrow, never widen
    child = identity.delegate(
        name="chart-helper",
        capabilities=["read:data:charts"],  # Subset of parent
    )

ASI-04: Agentic Supply Chain Vulnerabilities

The risk: Vulnerabilities in third-party tools, plugins, agent registries, or runtime dependencies that agents use to act, plan, or delegate.

AgentMesh implements the AI-BOM (AI Bill of Materials), a comprehensive standard for tracking the full AI supply chain. This includes:

- Model provenance: base model ancestry, fine-tuning history, training cutoff dates
- Dataset tracking: training data, RAG sources, evaluation benchmarks with data cards including PII status, bias assessment, and consent tracking
- Weights versioning: SHA-256 hashes, quantization records, LoRA adapter metadata, SLSA build provenance
- Software dependencies: SPDX-aligned package tracking with CI security scanning
    # AI-BOM tracks the full supply chain
    ai_bom = {
        "modelProvenance": {
            "primary": {"provider": "anthropic", "model": "claude-3-sonnet"},
            "fineTuning": {"method": "LoRA", "evaluationMetrics": {"accuracy": 0.94}},
        },
        "datasets": [
            {"name": "FAQ KB", "type": "fine-tuning", "dataCard": {"piiStatus": "redacted"}},
            {"name": "Product Docs", "type": "rag-source", "updateFrequency": "weekly"},
        ],
        "weights": {"hash": "sha256:...", "format": "safetensors", "precision": "bf16"},
    }

ASI-05: Unexpected Code Execution

The risk: Agents trigger remote code execution through tools, interpreters, or APIs. Without isolation, a single compromised tool call can escalate to full system access.

Agent Runtime implements CPU ring-inspired execution isolation. Agents run in one of four execution rings: Ring 0 (root/supervisor), Ring 1 (privileged), Ring 2 (standard), and Ring 3 (sandbox/untrusted). Each ring has resource limits, and the kill switch provides instant termination of runaway agents.

    from hypervisor.models import (
        ActionDescriptor,
        ExecutionRing,
        ReversibilityLevel,
    )
    from hypervisor.rings.enforcer import RingEnforcer
    from hypervisor.security.kill_switch import KillSwitch, KillReason

    # Define agent privilege levels
    AGENTS = {
        "supervisor": {"ring": ExecutionRing.RING_0_ROOT, "role": "Orchestrator"},
        "data-agent": {"ring": ExecutionRing.RING_1_PRIVILEGED, "role": "Data Engineer"},
        "analyst": {"ring": ExecutionRing.RING_2_STANDARD, "role": "Analyst"},
        "user-bot": {"ring": ExecutionRing.RING_3_SANDBOX, "role": "User-Facing"},
    }

    # Describe an action that requires Ring 2 (standard) privileges
    action = ActionDescriptor(
        name="run_query",
        required_ring=ExecutionRing.RING_2_STANDARD,
        reversibility=ReversibilityLevel.REVERSIBLE,
    )

    # Enforce: sandbox agent cannot run a Ring 2 action
    enforcer = RingEnforcer()
    result = enforcer.check(agent_ring=ExecutionRing.RING_3_SANDBOX, action=action)
    # result.allowed = False -- ring violation prevented

    # Kill switch for runaway agents
    kill_switch = KillSwitch()
    kill_switch.terminate(agent_id="user-bot", reason=KillReason.RING_BREACH)

ASI-06: Memory & Context Poisoning

The risk: Persistent memory or long-running context is poisoned with malicious instructions. An attacker embeds hostile content in a document the agent later retrieves, causing it to follow injected goals.

Agent OS provides a policy-controlled virtual filesystem (VFS) for agent memory. The VFS uses POSIX-style mount points: /mem/working for current context, /mem/episodic for past interactions, /mem/semantic for knowledge, /policy for read-only policy files, and /tools for tool interfaces. Each mount point has enforced permissions (read, write, execute, append). The policy directory is always read-only from user-space, preventing agents from modifying their own governance rules.

    from agent_control_plane.vfs import AgentVFS, MemoryBackend, FileMode

    # Create agent VFS with POSIX-style memory abstraction
    vfs = AgentVFS(agent_id="data-analyst")

    # Mount memory backends with explicit permissions
    vfs.mount("/mem/working", MemoryBackend(), mode=FileMode.READ | FileMode.WRITE)
    vfs.mount("/mem/semantic", MemoryBackend(), mode=FileMode.READ)  # Read-only knowledge
    vfs.mount("/policy", MemoryBackend(), mode=FileMode.READ)  # Policies always read-only

    # Agent can read working memory
    data = vfs.read("/mem/working/context.json")

    # Agent CANNOT write to policy -- enforced at VFS layer
    # vfs.write("/policy/rules.yaml", content)  # Raises PermissionError

    # Agent CANNOT read a path that was never mounted (here, procedural memory)
    # vfs.read("/mem/procedural/skills")  # Raises FileNotFoundError

The CMVK (Cross-Model Verification Kernel) adds a second layer: claims from agent context are verified across multiple AI models to detect poisoned content. Prompt injection patterns like 'ignore previous instructions' and 'disregard prior' are detected and blocked by the MCP proxy sanitizer before reaching the agent.
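The pattern-matching half of that defense can be pictured with a short sketch. This is not AGT's sanitizer: the names `INJECTION_PATTERNS`, `find_injection_patterns`, and `sanitize_retrieved_document` are hypothetical, and only the two phrases quoted above come from the post. It simply shows the general idea of scanning retrieved content before it reaches an agent.

```python
import re

# Hypothetical pattern list (an assumption for illustration): it covers only
# the two phrases quoted in the post, not a production rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(all\s+)?prior", re.IGNORECASE),
]


def find_injection_patterns(text: str) -> list[str]:
    """Return the first matched substring for each known injection pattern."""
    hits = []
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(text)
        if match:
            hits.append(match.group(0))
    return hits


def sanitize_retrieved_document(text: str) -> str:
    """Pass clean content through unchanged; raise if a pattern matches."""
    hits = find_injection_patterns(text)
    if hits:
        raise ValueError(f"Blocked retrieved content; matched: {hits}")
    return text
```

Phrase lists like this are easy to evade with rewording, which is presumably why the post pairs pattern detection with a second layer such as cross-model verification.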
ASI-07: Insecure Inter-Agent Communication

The risk: Agents collaborate without adequate authentication, confidentiality, or validation. Messages between agents can be intercepted, forged, or replayed.

AgentMesh provides IATP (Inter-Agent Trust Protocol) with E2E encrypted channels using the Signal protocol (X3DH key agreement + Double Ratchet). Every message gets per-message forward secrecy and post-compromise security. The EncryptedTrustBridge requires a successful trust handshake before any encrypted channel can be established, and mutual authentication via Ed25519 challenge-response ensures both parties prove identity at connection time.

    from agentmesh.encryption.bridge import EncryptedTrustBridge

    # `keys` (a key manager) and `bob_bundle` (the peer's key bundle) are not
    # defined in this excerpt and are assumed to be created elsewhere.
    bridge = EncryptedTrustBridge(agent_did="did:mesh:alice", key_manager=keys)
    channel = await bridge.open_secure_channel("did:mesh:bob", bob_bundle)
    ciphertext = channel.send(b"governed action")  # E2E encrypted

ASI-08: Cascading Agent Failures

The risk: An initial error or compromise triggers multi-step compound failures across chained agents. One agent's failure propagates through the entire system.

Agent SRE brings production-grade reliability engineering to agent fleets. Circuit breakers automatically isolate failing agents before failures cascade. SLO enforcement with error budgets provides quantified failure tolerance that triggers automatic intervention. Cascading failure detection monitors dependency chains for propagation patterns, and canary deploys enable gradual rollout of agent changes to detect issues early. OpenTelemetry integration provides distributed tracing across multi-agent workflows.

The key insight: treat AI agents like microservices. Apply the same SRE discipline (SLOs, error budgets, circuit breakers, chaos testing) that keeps cloud infrastructure reliable.

ASI-09: Human-Agent Trust Exploitation

The risk: Attackers leverage misplaced user trust in agents' autonomy to authorize dangerous actions.
Users rubber-stamp agent requests because they trust the agent, and attackers exploit this approval fatigue.

Agent OS implements approval workflows that require explicit human confirmation for high-risk actions. The system supports configurable risk assessment (critical, high, medium, low), quorum logic for critical actions requiring multiple approvals, and expiration tracking to prevent stale authorizations. The escalation handler includes fatigue detection: if an agent floods reviewers with escalation requests, subsequent requests are auto-denied to prevent the approval-fatigue attack.

    from agent_os.integrations.escalation import (
        EscalationHandler,
        InMemoryApprovalQueue,
        DefaultTimeoutAction,
        QuorumConfig,
    )

    # Configure approval workflow with fatigue protection
    handler = EscalationHandler(
        backend=InMemoryApprovalQueue(),
        timeout_seconds=300,  # 5-minute approval window
        default_action=DefaultTimeoutAction.DENY,  # Deny if no human responds
        quorum=QuorumConfig(required=2, total=3),  # 2-of-3 approvers for critical
        fatigue_threshold=5,  # Auto-deny after 5 rapid requests
        fatigue_window_seconds=60,  # Within a 60-second window
    )

    # Three-outcome model: allow, deny, or escalate
    # High-risk actions trigger escalation to human reviewers
    # If the agent triggers too many escalations, fatigue detection kicks in

ASI-10: Rogue Agents

The risk: Agents operating outside their defined scope through configuration drift, reprogramming, or emergent misbehavior. A rogue agent might gradually expand its actions beyond its mandate without any single action triggering a block.

AGT combines runtime behavioral monitoring with instant kill capability. Ring isolation confines rogue agents to their execution ring, preventing privilege escalation. The kill switch provides immediate termination for agents exhibiting rogue behavior (behavioral drift, rate limit violations, ring breaches).
Trust score decay tracks agent behavior over time, and the Merkle audit chain provides tamper-evident, cryptographic proof of every agent action.

    from agentmesh.governance.audit import AuditEntry, MerkleAuditChain
    from hypervisor.security.kill_switch import KillSwitch, KillReason

    # Tamper-evident audit trail
    chain = MerkleAuditChain()
    entry = AuditEntry(
        event_type="tool_call",
        agent_did="did:agentmesh:data-bot:abc123",
        action="query_database",
        outcome="allowed",
        policy_decision="permit",
        matched_rule="read_only_policy",
    )
    chain.add_entry(entry)  # Auto-computes hash chain

    # Verify integrity -- any tampering breaks the chain
    proof = chain.get_proof(entry.entry_id)
    assert chain.verify_proof(proof)  # Cryptographic verification

    # Kill switch for rogue behavior
    kill = KillSwitch()
    kill.terminate(
        agent_id="data-bot",
        reason=KillReason.BEHAVIORAL_DRIFT,  # Also: RATE_LIMIT, RING_BREACH, MANUAL
    )

Cross-Cutting Principle: Least Agency

The Least Agency principle is emphasized throughout the OWASP Agentic Top 10 as a foundational design principle. Agents should be granted the minimum capabilities, permissions, and autonomy necessary to complete their assigned tasks.

Layer | Least Agency Mechanism
Agent OS | Policy engine enforces deny-by-default; agents must be explicitly granted each capability
AgentMesh | DID identity with scoped capabilities; delegation requires narrowing (child <= parent)
Agent Runtime | Execution rings (Ring 0-3) enforce privilege tiers; untrusted agents run in Ring 3
Agent SRE | Resource limits and error budgets cap agent impact radius

Performance: Governance Without Latency Tax

A common concern with runtime governance is performance overhead.
AGT's benchmarks demonstrate that policy enforcement adds negligible latency:

Metric | Value
Single rule evaluation | 84,000 ops/sec
1,000 concurrent agents | 47,000 ops/sec
Policy evaluation latency | <0.1ms (p99)
Prompt-based violation rate | 26.67%
AGT policy violation rate | 0.00%
Conformance tests | 992
Architecture Decision Records | 25

The key takeaway: deterministic policy enforcement is orders of magnitude more reliable than prompt-based guardrails, and it runs fast enough for real-time agent workloads.

Framework Integrations

AGT is framework-agnostic. SDKs are available in Python, TypeScript, .NET, Rust, and Go. Native integrations exist for:

- LangChain and LangGraph
- CrewAI
- AutoGen (Microsoft)
- Semantic Kernel (Microsoft)
- OpenAI Agents SDK
- PydanticAI
- Model Context Protocol (MCP)
- Agent-to-Agent Protocol (A2A)

Each integration wraps the agent framework's tool-calling and message-passing interfaces with AGT's policy engine, trust scoring, and audit logging. Adding governance to an existing agent takes minutes, not weeks.

Compliance Framework Alignment

Framework | AGT Coverage
OWASP Agentic Top 10 (2026) | All 10 risk categories mapped
NIST AI RMF | Govern, Map, Measure, Manage functions addressed
EU AI Act | Risk classification, audit trails, human oversight
SOC 2 Type II | Audit logging, access controls, change management
CSA ATF | Zero-trust agent architecture alignment
Singapore MGF | Zero-trust, accountability, oversight layers

Getting Started

    # Install the complete governance stack
    # (quoted so shells such as zsh do not expand the brackets)
    pip install "agent-governance-toolkit[full]"

    # Or install individual components
    pip install agent-os-kernel      # Policy engine, VFS, approval workflows
    pip install agentmesh-platform   # Identity, trust, encryption, audit
    pip install agentmesh-runtime    # Execution rings, kill switch, saga
    pip install agent-sre            # Circuit breakers, SLOs, chaos testing

The quickstart tutorial walks through adding policy enforcement to an existing LangChain agent in under 10 minutes.
Start with a single policy rule and expand as your governance requirements grow.

Contribute and Collaborate

AGT is open source under the MIT license. The project has over 2,000 GitHub stars and contributors from 40+ countries. Whether you are building agent governance for your enterprise, integrating a new framework, or extending the policy engine with OPA/Rego or Cedar policies, we welcome contributions.

Repository: https://github.com/microsoft/agent-governance-toolkit
Documentation: https://microsoft.github.io/agent-governance-toolkit
Discussions: GitHub Discussions on the repository

Disclaimer: This document is provided for informational purposes. Code examples are from the public AGT repository and may evolve. Always refer to the latest repository documentation for current APIs.

Inspektor Gadget Completes Its First Independent Security Audit
Inspektor Gadget, the CNCF eBPF tool for Kubernetes and Linux observability, has completed its first independent security audit, conducted by Shielder and coordinated by OSTIF and CNCF. The audit found two Medium-severity issues and one Low-severity issue, all now patched in release v0.50.1. Learn what the auditors discovered, the hardening recommendations the maintainers are acting on, and why this milestone matters for the open source community.

Agentic AI for Linux Operations on Azure: The Prompts
Try This Yourself: Agentic AI for Linux Operations on Azure

At Red Hat Summit 2026, I handed GitHub Copilot CLI a terminal and asked it to deploy a full-stack application to RHEL 10 on Azure. Live. From a single prompt. No scripts, no runbooks, no pre-baked automation. The audience watched every command happen in real time and then played the app on their phones.

This post gives you the prompts so you can try it yourself. Copy them, paste them into Copilot CLI, and watch what happens. The only things you need to change are marked with [EDIT].

When you're done, you'll have a working Conference Bingo game running on Azure that you can open in your browser and play. The same app that people played live at Summit.

What You Need

- Azure subscription: any subscription where you can create VMs (a free trial or Visual Studio subscription works)
- GitHub Copilot CLI: see Installing Copilot CLI for all platforms
  - macOS/Linux: brew install copilot-cli or curl -fsSL https://gh.io/copilot-install | bash
  - Windows: winget install GitHub.Copilot or use the install script in WSL
- GitHub Copilot subscription: Individual, Business, or Enterprise (https://github.com/features/copilot)
- SSH key pair at ~/.ssh/id_rsa: generate with ssh-keygen if you don't have one
- Azure CLI authenticated: run az login
- A Linux machine or WSL with Ansible installed (for Prompt 2 only)
- ~30 minutes total

Before You Start

    az login
    az account set --subscription "[EDIT] Your Subscription Name"

That's the only setup. Everything else is in the prompts.

Choose Your Linux Distribution

These prompts work with any Azure-endorsed Linux distribution.
Pick one and use its image URN in Prompt 0:

Distribution | Image URN
RHEL 10 | RedHat:RHEL:10-lvm-gen2:latest
RHEL 9 | RedHat:RHEL:9-lvm-gen2:latest
Ubuntu 24.04 | Canonical:ubuntu-24_04-lts:server:latest
Azure Linux | Coming soon (check endorsed distros for availability)
SUSE 15 SP6 | SUSE:sles-15-sp6:gen2:latest
AlmaLinux 9 | almalinux:almalinux-x86_64:9-gen2:latest
Rocky Linux 9 | ciq:rlc-plus:rocky9:latest
Oracle Linux 10 | Oracle:Oracle-Linux:ol10-lvm-gen2:latest
Debian 12 | Debian:debian-12:12-gen2:latest

Copilot adapts its package management (dnf vs apt vs zypper), firewall (firewalld vs ufw), and security configuration (SELinux vs AppArmor) to the distro automatically. That's the point. For the full list, see Azure endorsed Linux distributions.

Prompt 0: Create the Infrastructure

This creates two blank Linux VMs, networking, and firewall rules. Nothing gets installed. That's Prompt 1's job.

Copy and paste into Copilot CLI:

Create a resource group called "copilot-demo" in eastus with a vnet (10.0.0.0/16, one subnet 10.0.1.0/24). Provision two Linux VMs, "demo-db" and "demo-app", using image URN [EDIT] RedHat:RHEL:10-lvm-gen2:latest, my ~/.ssh/id_rsa key, and Standard_B2s size. Add NSG rules on both VMs for port 22 (SSH), and on demo-app for ports 80 and 443. Don't SSH into anything; just create the resources and tell me the IPs.

When you're done, create a file at ~/.copilot/demo-env.md with the VM public and private IPs, resource group name, SSH user (azureuser), and SSH key path. Include a note that the app uses nginx as the reverse proxy (not Apache). Include a 'Demo Behavior Instructions' section with these rules: run SSH/SCP as direct commands (not Invoke-Expression), let output stream visibly (no tail/head), use SCP + bash scripts for complex quoting, no approval prompts, and hardcode SSH flags inline (-o StrictHostKeyChecking=no).

After writing demo-env.md, verify it by reading it back and confirming the IPs match the VMs you just created.
Run "az vm list-ip-addresses --resource-group copilot-demo -o table" and compare. If they don't match, fix it immediately. This file is the source of truth for every subsequent prompt.

What to expect: Copilot creates the resource group, VNet, subnet, two VMs, and NSG rules. It writes an environment file that subsequent prompts reference. ~5 minutes.

Prompt 1: Deploy the Application

This is the big one. One prompt deploys PostgreSQL, Nginx, a Flask app, firewall rules, security configuration, and TLS, all from scratch.

Copy and paste into Copilot CLI:

Read ~/.copilot/demo-env.md for the environment, then: Configure and deploy the conference bingo game from https://github.com/karlabbott/conference-bingo to the demo-app VM. I have two fresh Linux VMs already running in the "copilot-demo" resource group: demo-db for PostgreSQL and demo-app for the app, on the same vnet. SSH key is ~/.ssh/id_rsa, user is azureuser. Deploy the app to /srv/conference-bingo to avoid SELinux home directory issues. Use nginx as the reverse proxy (as specified in the README), not the Apache configs in the deploy/ directory. Run commands individually over SSH. Configure the firewall to allow HTTP and HTTPS. If SELinux is enforcing, configure it appropriately. SCP a .sql file for PostgreSQL setup rather than inlining SQL through SSH. Install certbot via pip if you have a domain, otherwise use a self-signed certificate. Write secrets to ~/.config.env and copy to /etc/bingo.env for the systemd service. Use [EDIT] your-email@example.com for certs.

What to expect: Copilot SSHs into both VMs and handles everything: packages, database, app deployment, web server, security, TLS. ~10-15 minutes.

What to watch for: How Copilot adapts to your distro. On RHEL, it uses dnf, sets SELinux booleans like httpd_can_network_connect, runs initdb for PostgreSQL, and configures firewalld. On Ubuntu, it uses apt, skips initdb, and sets up ufw. Same prompt, different execution path.
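To make that branching concrete, here is a rough sketch of the kind of decision involved. It is not how Copilot CLI works internally; the `TOOLING` table, `FAMILY_ALIASES`, and both function names are hypothetical, and the mapping only restates the tools named in this post. It reads the standard `ID` and `ID_LIKE` fields from the contents of /etc/os-release.

```python
# Hypothetical mapping from distro family to the tools named in the post.
TOOLING = {
    "rhel": {"packages": "dnf", "firewall": "firewalld", "mac": "SELinux"},
    "debian": {"packages": "apt", "firewall": "ufw", "mac": "AppArmor"},
}

# Assumed aliases so that derivative distros resolve to a family above.
FAMILY_ALIASES = {"ubuntu": "debian", "fedora": "rhel", "centos": "rhel"}


def parse_os_release(content: str) -> dict[str, str]:
    """Parse KEY=value lines from /etc/os-release text into a dict."""
    fields = {}
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def pick_tooling(content: str) -> dict[str, str]:
    """Choose tooling from ID first, then from each ID_LIKE entry in order."""
    fields = parse_os_release(content)
    candidates = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    for name in candidates:
        family = FAMILY_ALIASES.get(name, name)
        if family in TOOLING:
            return TOOLING[family]
    raise ValueError(f"Unrecognized distribution: {candidates}")
```

For example, `pick_tooling('ID=ubuntu\nID_LIKE=debian\n')` selects the apt/ufw/AppArmor entry, while an AlmaLinux file with `ID_LIKE="rhel centos fedora"` falls through to the dnf/firewalld/SELinux entry. An agent makes the same call from live command output and can recover when a guess turns out wrong.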
When something fails, watch it read the error and adapt.

When it finishes: Open https://<demo-app-public-ip> in your browser (accept the self-signed certificate warning if you didn't use a domain). You should see Conference Bingo running. Enter your name and play. This is the same app people played live on their phones at Red Hat Summit.

Prompt 2: Add Observability with Ansible

This demonstrates the "explore with Copilot, codify with Ansible" pattern. The monitoring stack is an Ansible playbook that deploys Azure Monitor Agent, Log Analytics, Data Collection Rules, and a Managed Grafana dashboard.

Prerequisites: Ansible installed on Linux or WSL. On Windows, use WSL and prefix commands with export PATH=$HOME/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin. (Note: You may have to adjust this prompt to tell GitHub Copilot where your Ansible is installed.)

Copy and paste into Copilot CLI:

Read ~/.copilot/demo-env.md for the environment, then: Clone https://github.com/karlabbott/wordblitz-monitoring-ansible, copy group_vars/all.yml.example to group_vars/all.yml, and fill it in using the subscription ID from "az account show", resource group copilot-demo, location eastus, the VM names and IPs from demo-env.md, and ssh_user azureuser. Use "demo-law" for law_name and "demo-grafana" for grafana_name. Install the azure.azcollection Ansible collection and its pip requirements, then run the playbook with: ANSIBLE_AZURE_AUTH_SOURCE=cli ansible-playbook -i localhost, site.yml Print the Grafana dashboard URL when done and update demo-env.md with the Grafana URL and Log Analytics Workspace resource ID.

What to expect: The playbook creates Azure monitoring resources, installs AMA on both VMs, configures data collection, deploys a Grafana dashboard, and, importantly, deploys a script called turbo.sh to the database VM that creates a real performance problem for Prompt 3. ~8-10 minutes.

What is turbo.sh?
The playbook deploys this to simulate a production incident:

    #!/bin/bash
    # Observability performance optimizations: stress-tests PostgreSQL to validate
    # monitoring pipeline throughput under sustained high-concurrency workloads.
    # Stop: sudo -u postgres psql -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE query LIKE '%turbo_perf%';"

    # Phase 1: 8 CPU-burner loops (cross joins)
    for i in $(seq 1 8); do
      while true; do
        sudo -u postgres psql -d conference_bingo -c \
          "/* turbo_perf */ SELECT count(*) FROM bingo_squares a CROSS JOIN bingo_squares b CROSS JOIN bingo_squares c CROSS JOIN bingo_squares d CROSS JOIN bingo_squares e;" > /dev/null 2>&1
      done &
    done

    # Phase 2: 25 connection hogs that sleep in a transaction
    for i in $(seq 1 25); do
      while true; do
        sudo -u postgres psql -d conference_bingo -c \
          "/* turbo_perf */ SELECT pg_sleep(5);" > /dev/null 2>&1
      done &
    done

    echo "Turbo perf test started: 8 cross-join loops + 25 connection workers"
    echo "Observability pipeline should show load within seconds"

It fires 8 parallel cross-join queries that saturate every CPU core on the database VM, plus 25 connection hogs that exhaust PostgreSQL's connection pool. The turbo Ansible role further reduces max_connections to 30 to make the problem worse. The result: the app slows to a crawl. Try playing bingo now and you'll feel it.

Why Ansible matters here: Agents are non-deterministic, so the same prompt might take different steps each time. That's fine for exploration. But when you need to reproduce this in staging, then production, then for the next team, you need determinism. The playbook is idempotent, repeatable, auditable. It's in git, it's reviewed in PRs, and it IS the documentation. You explore with Copilot, then codify with Ansible.

Prompt 3: Ask Copilot What's Wrong

The turbo script is already running from Prompt 2. Your app should be slow. Now ask Copilot to figure out why, from a symptom alone:

My app feels really slow. Can you tell me why? Let's review before making any changes.

That's it. One sentence plus a guardrail.

What to expect: Copilot SSHs in, checks system load, examines running processes, finds the cross-join queries, reads turbo.sh, reverse-engineers the attack, explains the root cause, and offers to kill the processes. ~2-3 minutes.

Prompt 4: Generate an Incident Postmortem

After fixing the issue, ask Copilot to document what happened, from the same conversation:

Write an incident postmortem for what just happened: root cause, impact, how you diagnosed it, how you resolved it, and a recommendation to prevent it from happening again. Save it as a Word document at ~/Desktop/incident-postmortem.docx using python-docx, and open it.

What to expect: A formatted Word document with root cause analysis, timeline, remediation steps, and prevention recommendations. The full loop (build, monitor, break, fix, document) in one session. ~30 seconds.

Cleanup

    az group delete --name copilot-demo --yes --no-wait

What I Learned Doing This Live

A well-crafted prompt replaces a 50-step runbook. Your intent is the source of truth. The agent figures out the steps.

Explore with Copilot, codify with Ansible. Copilot gets you to working fast. Ansible keeps it working forever.

Understanding comes before abstraction. Don't start with the playbook. Start with the exploration. The playbook comes after.

The danger with AI isn't that machines think. It's that we stop thinking because the output looks fine. Always review. Understand the blast radius. Start in non-production.

AI removes the scaffolding. What remains is judgment. Technical correctness and the instinct to know when something is wrong: that's what the tools cannot replace. And that's what made me stop worrying about being replaced by them.
Resources
- Conference Bingo App: https://github.com/karlabbott/conference-bingo
- Monitoring Playbook: https://github.com/karlabbott/wordblitz-monitoring-ansible
- Interactive Walkthrough: https://summit.99b.org (the full talk with audio narration and demo videos)
- GitHub Copilot CLI: https://docs.github.com/en/copilot/how-tos/copilot-cli/set-up-copilot-cli/install-copilot-cli
- Azure endorsed Linux distributions: https://learn.microsoft.com/en-us/azure/virtual-machines/linux/endorsed-distros

Run OpenClaw Agents on Azure Linux VMs (with Secure Defaults)
Many teams want an enterprise-ready personal AI assistant, but they need it on infrastructure they control, with security boundaries they can explain to IT. That is exactly where OpenClaw fits on Azure.

OpenClaw is a self-hosted, always-on personal agent runtime you run in your enterprise environment and Azure infrastructure. Instead of relying only on a hosted chat app from a third-party provider, you can deploy, operate, and experiment with an agent on an Azure Linux VM you control, using your existing GitHub Copilot licenses, Azure OpenAI deployments, or API plans from OpenAI, Anthropic Claude, Google Gemini, and other model providers you already subscribe to. Once deployed on Azure, you can interact with an OpenClaw agent through familiar channels like Microsoft Teams, Slack, Telegram, WhatsApp, and many more!

For Azure users, this gives you a practical middle ground: modern personal-agent workflows on familiar Azure infrastructure.

What is OpenClaw, and how is it different from ChatGPT/Claude/chat apps?

OpenClaw is a self-hosted personal agent runtime that can be hosted on Azure compute infrastructure.

How it differs:
- ChatGPT/Claude apps are primarily hosted chat experiences tied to one provider's models
- OpenClaw is an always-on runtime you operate yourself, backed by your choice of model provider: GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, and others
- OpenClaw lets you keep the runtime boundary in your own Azure VM environment within your Azure enterprise subscription

In practice, OpenClaw is useful when you want a persistent assistant for operational and workflow tasks, with your own infrastructure as the control point. You bring whatever model provider and API plan you already have, and OpenClaw connects to it.

Why Azure Linux VMs?
Azure Linux VMs are a strong fit because they provide:
- A suitable host machine for the OpenClaw agent to run on
- Enterprise-friendly infrastructure and identity workflows
- Repeatable provisioning via the Azure CLI
- Network hardening with NSG rules
- Managed SSH access through Azure Bastion instead of public SSH exposure

How to Set Up OpenClaw on an Azure Linux VM

This guide sets up an Azure Linux VM, applies NSG (Network Security Group) hardening, configures Azure Bastion for managed SSH access, and installs an always-on OpenClaw agent within the VM that you can interact with through various messaging channels.

What you'll do
- Create Azure networking (VNet, subnets, NSG) and compute resources with the Azure CLI
- Apply Network Security Group rules so VM SSH is allowed only from Azure Bastion
- Use Azure Bastion for SSH access (no public IP on the VM)
- Install OpenClaw on the Azure VM
- Verify OpenClaw installation and configuration on the VM

What you need
- An Azure subscription with permission to create compute and network resources
- Azure CLI installed (install steps)
- An SSH key pair (the guide covers generating one if needed)
- ~20–30 minutes

Configure deployment

Step 1: Sign in to Azure CLI

    az login # Select a suitable Azure subscription during Azure login
    az extension add -n ssh # SSH extension is required for Azure Bastion SSH

The ssh extension is required for Azure Bastion native SSH tunneling.

Step 2: Register required resource providers (one-time)

Register required Azure Resource Providers (one-time registration):

    az provider register --namespace Microsoft.Compute
    az provider register --namespace Microsoft.Network

Verify registration. Wait until both show Registered.

    az provider show --namespace Microsoft.Compute --query registrationState -o tsv
    az provider show --namespace Microsoft.Network --query registrationState -o tsv

Step 3: Set deployment variables

Set the deployment environment variables that will be needed throughout this guide.
    RG="rg-openclaw"
    LOCATION="westus2"
    VNET_NAME="vnet-openclaw"
    VNET_PREFIX="10.40.0.0/16"
    VM_SUBNET_NAME="snet-openclaw-vm"
    VM_SUBNET_PREFIX="10.40.2.0/24"
    BASTION_SUBNET_PREFIX="10.40.1.0/26"
    NSG_NAME="nsg-openclaw-vm"
    VM_NAME="vm-openclaw"
    ADMIN_USERNAME="openclaw"
    BASTION_NAME="bas-openclaw"
    BASTION_PIP_NAME="pip-openclaw-bastion"

Adjust names and CIDR ranges to fit your environment. The Bastion subnet must be at least /26.

Step 4: Select SSH key

Use your existing public key if you have one:

    SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"

If you don't have an SSH key yet, generate one:

    ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519 -C "you@example.com"
    SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"

Step 5: Select VM size and OS disk size

    VM_SIZE="Standard_B2as_v2"
    OS_DISK_SIZE_GB=64

Choose a VM size and OS disk size available in your subscription and region:
- Start smaller for light usage and scale up later
- Use more vCPU/RAM/disk for heavier automation, more channels, or larger model/tool workloads
- If a VM size is unavailable in your region or subscription quota, pick the closest available SKU

List VM sizes available in your target region:

    az vm list-skus --location "${LOCATION}" --resource-type virtualMachines -o table

Check your current vCPU and disk usage/quota:

    az vm list-usage --location "${LOCATION}" -o table

Deploy Azure resources

Step 1: Create the resource group

The Azure resource group will contain all of the Azure resources that the OpenClaw agent needs.

    az group create -n "${RG}" -l "${LOCATION}"

Step 2: Create the network security group

Create the NSG and add rules so only the Bastion subnet can SSH into the VM.
    az network nsg create \
      -g "${RG}" -n "${NSG_NAME}" -l "${LOCATION}"

    # Allow SSH from the Bastion subnet only
    az network nsg rule create \
      -g "${RG}" --nsg-name "${NSG_NAME}" \
      -n AllowSshFromBastionSubnet --priority 100 \
      --access Allow --direction Inbound --protocol Tcp \
      --source-address-prefixes "${BASTION_SUBNET_PREFIX}" \
      --destination-port-ranges 22

    # Deny SSH from the public internet
    az network nsg rule create \
      -g "${RG}" --nsg-name "${NSG_NAME}" \
      -n DenyInternetSsh --priority 110 \
      --access Deny --direction Inbound --protocol Tcp \
      --source-address-prefixes Internet \
      --destination-port-ranges 22

    # Deny SSH from other VNet sources
    az network nsg rule create \
      -g "${RG}" --nsg-name "${NSG_NAME}" \
      -n DenyVnetSsh --priority 120 \
      --access Deny --direction Inbound --protocol Tcp \
      --source-address-prefixes VirtualNetwork \
      --destination-port-ranges 22

The rules are evaluated by priority (lowest number first): Bastion traffic is allowed at 100, then all other SSH is blocked at 110 and 120.

Step 3: Create the virtual network and subnets

Create the VNet with the VM subnet (NSG attached), then add the Bastion subnet.

    az network vnet create \
      -g "${RG}" -n "${VNET_NAME}" -l "${LOCATION}" \
      --address-prefixes "${VNET_PREFIX}" \
      --subnet-name "${VM_SUBNET_NAME}" \
      --subnet-prefixes "${VM_SUBNET_PREFIX}"

    # Attach the NSG to the VM subnet
    az network vnet subnet update \
      -g "${RG}" --vnet-name "${VNET_NAME}" \
      -n "${VM_SUBNET_NAME}" --nsg "${NSG_NAME}"

    # AzureBastionSubnet: this name is required by Azure
    az network vnet subnet create \
      -g "${RG}" --vnet-name "${VNET_NAME}" \
      -n AzureBastionSubnet \
      --address-prefixes "${BASTION_SUBNET_PREFIX}"

Step 4: Create the Virtual Machine

Create the VM with no public IP. SSH access for OpenClaw configuration will be exclusively through Azure Bastion.
    az vm create \
      -g "${RG}" -n "${VM_NAME}" -l "${LOCATION}" \
      --image "Canonical:ubuntu-24_04-lts:server:latest" \
      --size "${VM_SIZE}" \
      --os-disk-size-gb "${OS_DISK_SIZE_GB}" \
      --storage-sku StandardSSD_LRS \
      --admin-username "${ADMIN_USERNAME}" \
      --ssh-key-values "${SSH_PUB_KEY}" \
      --vnet-name "${VNET_NAME}" \
      --subnet "${VM_SUBNET_NAME}" \
      --public-ip-address "" \
      --nsg ""

--public-ip-address "" prevents a public IP from being assigned. --nsg "" skips creating a per-NIC NSG (the subnet-level NSG created earlier handles security).

Reproducibility: The command above uses latest for the Ubuntu image. To pin a specific version, list available versions and replace latest:

    az vm image list \
      --publisher Canonical --offer ubuntu-24_04-lts \
      --sku server --all -o table

Step 5: Create Azure Bastion

Azure Bastion provides secure, managed SSH access to the VM without exposing a public IP. The Bastion Standard SKU with tunneling enabled is required for the CLI-based az network bastion ssh command.

    az network public-ip create \
      -g "${RG}" -n "${BASTION_PIP_NAME}" -l "${LOCATION}" \
      --sku Standard --allocation-method Static

    az network bastion create \
      -g "${RG}" -n "${BASTION_NAME}" -l "${LOCATION}" \
      --vnet-name "${VNET_NAME}" \
      --public-ip-address "${BASTION_PIP_NAME}" \
      --sku Standard --enable-tunneling true

Bastion provisioning typically takes 5–10 minutes but can take up to 15–30 minutes in some regions.
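Because provisioning can take a while, it may help to confirm that the Bastion host has finished before attempting SSH. The sketch below is a hypothetical helper, not part of the original guide: wait_for_succeeded is an illustrative name, and it assumes the command passed to it prints an Azure provisioningState value such as Succeeded or Failed.

```shell
# wait_for_succeeded ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Hypothetical helper: re-runs CMD until it prints "Succeeded" (returns 0),
# prints "Failed" (returns 1), or ATTEMPTS is exhausted (returns 2).
wait_for_succeeded() {
  local attempts="$1" delay="$2" state i
  shift 2
  for i in $(seq 1 "$attempts"); do
    state="$("$@" 2>/dev/null)"
    case "$state" in
      Succeeded) return 0 ;;
      Failed)    return 1 ;;
    esac
    sleep "$delay"
  done
  return 2
}

# Example call (not run here; needs the Azure CLI and this guide's variables):
#   wait_for_succeeded 60 30 az network bastion show \
#     -g "${RG}" -n "${BASTION_NAME}" --query provisioningState -o tsv
```

Passing the query command as arguments keeps the helper independent of any one resource type, so the same loop could check the VM or public IP if needed.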
Step 6: Verify Deployments

After all resources are deployed, your resource group should look like the following:

[Screenshot of the deployed resource group]

Install OpenClaw

Step 1: SSH into the VM through Azure Bastion

    VM_ID="$(az vm show -g "${RG}" -n "${VM_NAME}" --query id -o tsv)"

    az network bastion ssh \
      --name "${BASTION_NAME}" \
      --resource-group "${RG}" \
      --target-resource-id "${VM_ID}" \
      --auth-type ssh-key \
      --username "${ADMIN_USERNAME}" \
      --ssh-key ~/.ssh/id_ed25519

Step 2: Install OpenClaw (in the Bastion SSH shell)

    curl -fsSL https://openclaw.ai/install.sh | bash

The installer installs Node LTS and dependencies if not already present, installs OpenClaw, and launches the OpenClaw onboarding wizard. For more information, see the open source OpenClaw install docs.

OpenClaw Onboarding: Choosing an AI Model Provider

During OpenClaw onboarding, you'll choose the AI model provider for the OpenClaw agent. This can be GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, or another supported provider. See the open source OpenClaw install docs for details on choosing an AI model provider when going through the onboarding wizard.

Most enterprise Azure teams already have GitHub Copilot licenses. If that is your case, we recommend choosing the GitHub Copilot provider in the OpenClaw onboarding wizard. See the open source OpenClaw docs on configuring GitHub Copilot as the AI model provider.

OpenClaw Onboarding: Setting up Messaging Channels

During OpenClaw onboarding, there will be an optional step where you can set up various messaging channels to interact with your OpenClaw agent. For first-time users, we recommend setting up Telegram due to ease of setup. Other messaging channels such as Microsoft Teams, Slack, WhatsApp, and others can also be set up. To configure OpenClaw for messaging through chat channels, see the open source OpenClaw chat channels docs.
Step 3: Verify OpenClaw Configuration

To validate that everything was set up correctly, run the following commands within the same Bastion SSH session:

    openclaw status
    openclaw gateway status

If there are any issues reported, you can run the onboarding wizard again with the steps above. Alternatively, you can run the following command:

    openclaw doctor

Message OpenClaw

Once you have configured the OpenClaw agent to be reachable via various messaging channels, you can verify that it is responsive by messaging it.

Enhancing OpenClaw for Use Cases

There you go! You now have a 24/7, always-on personal AI agent, living on its own Azure VM environment. For awesome OpenClaw use cases, check out the awesome-openclaw-usecases repository.

To enhance your OpenClaw agent with additional AI skills so that it can autonomously perform multi-step operations on any domain, check out the awesome-openclaw-skills repository. You can also check out ClawHub and ClawSkills, two popular open source skills directories that can enhance your OpenClaw agent.

Cleanup

To delete all resources created by this guide:

    az group delete -n "${RG}" --yes --no-wait

This removes the resource group and everything inside it (VM, VNet, NSG, Bastion, public IP). This also deletes the OpenClaw agent running within the VM.

If you'd like to dive deeper into deploying OpenClaw on Azure, please check out the open source OpenClaw on Azure docs.

Dissecting LLM Container Cold-Start: Where the Time Actually Goes
Cold-start latency determines whether GPU clusters can scale to zero, how fast they can autoscale, and whether bursty or low-QPS workloads are economically viable. Most optimization effort targets the container pull path – faster registries, lazy-pull snapshotters, different compression formats. But “cold-start” is actually a composite of pull, runtime startup, and model initialization, and the dominant phase varies dramatically by inference engine. An optimization that cuts time-to-first-token for one engine can be irrelevant for another, even on identical infrastructure.

What we measured

We decomposed cold-start for two architecturally different engines – vLLM (Python/CUDA, heavy JIT compilation) and llama.cpp (C++, minimal runtime) – running Llama 3.1 8B on A100 GPUs. Every run starts from a completely clean slate: containerd stopped, all state wiped, kernel page caches dropped. No warm starts, no pre-pulling, no caching.

We break TTFT into three phases: pull (download + decompression + snapshot creation), startup (container start → server ready), and first inference (first API response, including model weight loading for engines that defer it). We tested across three snapshotters (overlayfs, EROFS, Nydus) with gzip and uncompressed images, pulling from same-region Azure Container Registry.

Setup

All experiments ran on an NVIDIA A100 80GB (Azure NC24ads_A100_v4), pulling from same-region Azure Container Registry. Images were built with AIKit, which produces ModelPack-compliant OCI artifacts with uncompressed model weight layers, Cosign signatures, SBOMs, and provenance attestations. These are supply chain properties you lose when model weights live on a shared drive.

vLLM: startup dominates, pull barely matters

vLLM loads model weights, runs torch.compile, captures CUDA graphs for multiple batch shapes, allocates KV cache, and warms up, all before serving the first request.
This takes ~176 seconds regardless of how fast the image arrived.

The breakdown makes the bottleneck obvious: the green bar (startup) is nearly constant across all four variants, swamping any pull-time differences.

Figure 1: vLLM cold-start breakdown. Startup (green, ~176s) dominates regardless of snapshotter.

| Method | Pull | Startup | 1st Inference | TTFT |
| --- | --- | --- | --- | --- |
| overlayfs (gzip) | 140.8s ±5.5 | 176.0s ±3.2 | 0.16s | 317.2s ±2.2 |
| overlayfs (uncomp.) | 129.9s ±3.3 | 180.8s ±12.2 | 0.16s | 310.9s ±8.9 |
| EROFS (gzip) | 158.9s ±8.8 | 175.3s ±0.8 | 0.16s | 334.4s ±8.7 |
| EROFS (uncomp.) | 166.3s ±21.1 | 177.3s ±12.8 | 0.16s | 343.8s ±8.2 |

Llama 3.1 8B, ~14 GB image, n=2–3 per variant. ± = sample standard deviation. Three of twelve runs hit intermittent NVIDIA container runtime crashes (exit code 120, unrelated to snapshotters) and were excluded. We excluded Nydus because FUSE-streaming the 14 GB Python/CUDA stack caused startup to exceed 900s. Note: the EROFS uncompressed pull time (166.3s ±21.1) is slower than EROFS gzip, with a standard deviation that swallows the effect; this cell is essentially noise at n=2. Steady-state inference: ~0.134s across all snapshotters.

44% pull, 56% startup. Dropping gzip saves 6 seconds of end-to-end TTFT on a 317-second cold start (1.02x). If your engine is vLLM, optimizing the pull pipeline is the wrong lever.

llama.cpp: pull dominates, compression is the bottleneck

llama.cpp has the opposite profile. Its C++ runtime starts in 2–5 seconds, so the pull becomes the majority of cold-start. This is where filesystem and compression choices actually matter.

Here the picture flips. Pull (blue) is the widest bar, and the gzip-to-uncompressed difference is visible at a glance:

Figure 2: llama.cpp cold-start breakdown. Pull time (blue) dominates for gzip variants.

| Method | Pull | Startup | 1st Inference | TTFT |
| --- | --- | --- | --- | --- |
| overlayfs (gzip) | 88.3s ±0.2 | 5.3s ±0.5 | 45.1s ±1.4 | 138.8s ±0.8 |
| overlayfs (uncomp.) | 56.3s ±3.1 | 2.0s ±0.0 | 44.2s ±0.1 | 102.4s ±3.1 |
| EROFS (gzip) | 92.0s ±2.3 | 6.1s ±0.5 | 44.0s ±0.2 | 142.3s ±1.9 |
| EROFS (uncomp.) | 58.8s ±0.6 | 2.0s ±0.0 | 44.0s ±0.1 | 104.8s ±0.5 |

Llama 3.1 8B Q4_K_M, 8.7 GB image uncompressed, n=3 per variant, 12/12 runs succeeded. First inference includes model weight loading into GPU VRAM (~43s) plus token generation (~1.5s). Steady-state inference: ~1.5s across all snapshotters.

64% pull, 4% startup, 33% model loading. Dropping gzip saves 36 seconds (1.35x) with zero infrastructure changes.

Engine comparison

Placed side by side, the two engines tell opposite stories about the same infrastructure:

Figure 3: Where cold-start time goes. vLLM is compute-bound; llama.cpp is pull-bound.

| | vLLM | llama.cpp |
| --- | --- | --- |
| Time saved by dropping gzip | 6s (2% of TTFT) | 36s (26% of TTFT) |
| Startup time | 176–181s | 2–5s |
| Speedup from dropping gzip | 1.02x | 1.35x |

Same optimization, completely different impact. Before investing in pull optimization (compression changes, lazy-pull infrastructure, registry tuning), profile your engine’s startup. If startup dominates, the pull isn’t where the time goes.

Why gzip hurts: model weights are incompressible

The llama.cpp AIKit image is 8.7 GB uncompressed, 6.6 GB with gzip (a modest 0.76x ratio). But this ratio hides what’s really happening:

| Layer type | Size | % of image | Gzip ratio |
| --- | --- | --- | --- |
| Model weights (GGUF) | 4.9 GB | 56% | ~1.00x (quantized binary, no redundancy) |
| CUDA + system layers | ~3.8 GB | 44% | ~0.46x (compresses well) |

The GGUF file is already quantized to 4-bit precision. Gzip reads every byte, burns CPU, and produces output the same size as the input. You’re paying full decompression cost on 56% of the image for zero size reduction. (For vLLM’s larger 14 GB image, model weights are a smaller fraction and the compressible Python/CUDA stack dominates, which is why gzip’s overhead matters less there.)

Bottom line: gzip is doing real work on less than half your image and producing zero savings on the rest. Dropping it costs nothing and removes a bottleneck from every cold start.
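The overall 0.76x ratio can be recovered from the per-layer figures quoted above. The sketch below is only a sanity check of that arithmetic, assuming the weight layer compresses at exactly 1.00x; blended_gzip is an illustrative helper name, not something from the benchmark scripts.

```shell
# blended_gzip WEIGHTS_GB OTHER_GB OTHER_RATIO
# Prints the estimated gzip image size (GB) and the overall gzip ratio,
# treating the model weight layer as incompressible (ratio 1.00).
blended_gzip() {
  LC_ALL=C awk -v w="$1" -v o="$2" -v r="$3" 'BEGIN {
    gz = w * 1.00 + o * r              # weights unchanged, other layers shrink
    printf "%.1f %.2f\n", gz, gz / (w + o)
  }'
}

# 4.9 GB of GGUF weights plus ~3.8 GB of CUDA/system layers at ~0.46x
blended_gzip 4.9 3.8 0.46   # prints "6.6 0.76"
```

The result matches the reported 6.6 GB compressed size, which shows that essentially all of the savings come from the CUDA and system layers.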
The Nydus prefetch finding

If decompression is the bottleneck, what about skipping the full pull entirely? Nydus lazy-pull takes a fundamentally different approach: it fetches only manifest metadata during “pull” (~0.7s), then streams model data on-demand via FUSE as the container reads it. Nydus TTFT isn’t directly comparable to the full-pull methods above because the download cost shifts from the pull column to the inference column.

With prefetch enabled, Nydus achieved 77.8s TTFT for llama.cpp. The critical detail is the prefetch_all flag: the difference between prefetch ON and OFF is 2.87x.

Figure 4: Nydus prefetch ON vs OFF. One config flag, 2.87x difference. Overlayfs baselines shown for context.

| Configuration | 1st Inference | TTFT |
| --- | --- | --- |
| Nydus, prefetch ON | 72.4s ±0.6 | 77.8s ±0.5 |
| Nydus, prefetch OFF | 218.6s ±2.9 | 223.4s ±2.9 |
| overlayfs uncompressed (baseline) | 44.0s ±0.1 | 102.4s ±3.1 |
| overlayfs gzip | 44.0s ±0.4 | 139.1s ±1.9 |

n=3 per config, 9/9 runs succeeded. Nydus and overlayfs gzip baselines are from a separate test run (03-prefetch-config-20260401-030725.csv); overlayfs uncompressed is from the main llama.cpp run. The overlayfs gzip baselines are within noise across runs (139.1s vs 138.8s).

One flag in nydusd-config.json, 2.87x difference (prefetch ON vs OFF). Without prefetch, every model weight page fault fires an individual HTTP range request to the registry. With prefetch_all=true, Nydus streams the full blob in the background while the container starts, so chunks arrive ahead of the GPU’s read pattern. Note that with prefetch enabled, Nydus is effectively performing a full pull overlapped with container startup rather than true on-demand fetching; the win comes from the overlap, not from fetching less data.

Compared to overlayfs uncompressed (the post’s recommended baseline), Nydus prefetch is 1.32x faster (77.8s vs 102.4s). Compared to overlayfs gzip, 1.79x.
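The speedup figures quoted in this section are plain ratios of the TTFT values in the table. A minimal sketch that reproduces them, where speedup is an illustrative helper name:

```shell
# speedup BASELINE_SECONDS CANDIDATE_SECONDS
# Prints how many times faster the candidate is than the baseline.
speedup() {
  LC_ALL=C awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.2fx\n", base / cand }'
}

speedup 223.4 77.8   # Nydus prefetch OFF vs ON          -> 2.87x
speedup 102.4 77.8   # overlayfs uncompressed vs Nydus ON -> 1.32x
speedup 139.1 77.8   # overlayfs gzip vs Nydus ON         -> 1.79x
```

Since the overlayfs uncompressed row comes from a different run than the Nydus rows, the 1.32x figure is best read as an approximate comparison.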
Even with prefetch, Nydus first inference is ~28s slower than overlayfs (72s vs 44s) due to FUSE kernel-user roundtrips during model mmap. Nydus wins on total TTFT because it eliminates the blocking pull, but this overhead means its advantage shrinks on faster networks.

Bottom line: Nydus lazy-pull can halve cold-start for pull-bound engines, but only if prefetch is on. Treat prefetch_all=true as a hard requirement, not a tuning knob.

How to apply these findings

Pick your optimization by engine type

The right optimization depends on where your engine spends its cold-start time. This table summarizes the tradeoffs:

| Engine type | Dominant phase | Speedup from dropping gzip | Nydus viable? | Best optimization | What NOT to optimize |
| --- | --- | --- | --- | --- | --- |
| vLLM / TensorRT-LLM | Startup (56%) | 1.02x (negligible) | No: FUSE + Python/CUDA stack exceeded 900s in our tests | Cache torch.compile artifacts and CUDA graphs | Pull pipeline (it’s ~44% of TTFT and already fast enough) |
| llama.cpp / ONNX Runtime | Pull (64%) | 1.35x (36s saved) | Yes, with prefetch_all=true (77.8s TTFT vs 102.4s uncompressed baseline) | Drop gzip on weight layers; consider lazy-pull on slow links | Startup (already 2–5s; no room to improve) |
| Large dense models (70B+) | Pull (projected) | >1.35x (scales with image size) | Yes, strongest case for lazy-pull | Uncompressed or zstd; Nydus prefetch on bandwidth-constrained links | n/a |

Recommendations

- Profile your engine’s startup before touching the pull pipeline. If CUDA compilation dominates (vLLM, TensorRT-LLM), no amount of pull optimization will help. Cache torch.compile artifacts and CUDA graphs instead.
- Drop gzip on model weight layers. For pull-bound engines (llama.cpp, ONNX Runtime), this is the single highest-ROI change: build with --output=type=image,compression=uncompressed, or use AIKit, which defaults to uncompressed weight layers. Quantized model weights (GGUF, safetensors) are already dense binary, so gzip burns CPU for negligible size reduction.
- If using Nydus, set prefetch_all=true. Without it, every weight page fault triggers an individual HTTP range request and cold-start is 2.87x slower. This is a single flag in nydusd-config.json.
- Package models as signed OCI artifacts, not volume mounts. Three CNCF projects implement this pipeline end-to-end:
  - ModelPack defines the OCI artifact spec (model metadata, architecture, quantization format).
  - AIKit builds ModelPack-compliant images with Cosign signatures, SBOMs, and provenance attestations: supply chain guarantees you lose when weights live on a shared drive.
  - KAITO handles the Kubernetes deployment: GPU node provisioning, inference engine setup, and API exposure.
  Together they cover packaging → build → deploy, and they produce the exact image layout these benchmarks measured.

Why this matters: the cost of cold-start

On an A100 node (~$3–4/hr on major clouds), a 5-minute vLLM cold start burns ~$0.30 in idle GPU time per pod. That sounds small until you multiply it: a cluster that scales 50 pods to zero overnight and restarts them each morning wastes ~$15/day (over $5,000/year) on GPUs sitting idle during pull and CUDA compilation.

More critically, cold-start latency determines whether scale-to-zero is feasible at all. If cold-start exceeds your SLO (say, 30s for an interactive app), you’re forced to keep warm replicas running 24/7, which can 2–3x your GPU spend.

What this doesn’t cover

- zstd compression: decompresses 5–10x faster than gzip; containerd supports it natively. The most obvious gap in this analysis.
- Pre-pulling and caching: production clusters pre-pull images and can cache CUDA compilation artifacts, substantially reducing restart times. We measure the cold case: scale-from-zero events and first-time deployments.
- Volume-mounted weights: skips the pull entirely, but loses supply chain properties (signing, scanning, provenance).
- Larger models (70B+): pull would dominate more, increasing the gzip penalty.
- Sample size: n=3 per AIKit variant, n=2–3 per vLLM variant. The gzip finding for llama.cpp is statistically significant (Welch’s t-test, p=0.0014, Cohen’s d=16.3; verification script). Other comparisons are directional.

Reproduce it

Scripts and raw data: erofs-repro-repo. Data for this post: 02-aikit-five-way-20260401-004716.csv and 01-vllm-four-way-20260331-113848.csv. Full analysis: technical report.

DPDK 25.11 Performance on Azure for High-Speed Packet Workloads
At Microsoft Azure, performance is treated as an ongoing discipline grounded in careful engineering and real-world validation. As cloud workloads grow in scale and variety, customers depend on consistent, high-throughput networking. Technologies such as the Data Plane Development Kit (DPDK) play a key role in meeting these expectations.

To support customers running advanced network functions, we’ve released our latest performance report based on DPDK 25.11. It is now available in the DPDK performance catalog (Microsoft Azure DPDK Performance Report). The report provides a clear view of how DPDK performs on Microsoft-developed Azure Boost within Azure infrastructure, with detailed insights into packet processing across a range of scenarios, from small packet sizes to multi-core scaling.

Why We Test DPDK on Azure

DPDK is widely used for high-performance packet processing in virtualized environments. It powers a range of workloads from customer-deployed virtual network functions to internal Azure network appliances. But simply enabling DPDK is not enough. To ensure optimal performance, we validate it under realistic conditions, including:
- Azure VM configurations with Accelerated Networking
- NUMA-aware memory and CPU alignment
- Hugepage-backed memory allocation
- Multi-core PMD thread scaling
- Packet forwarding using real traffic generators

This helps us understand how DPDK performs in actual cloud environments, not just idealized lab setups.

What the Report Covers

The DPDK 25.11 report includes performance benchmarks across different frame sizes, ranging from 64 bytes to 1518 bytes. It also evaluates CPU usage, queue configuration, and latency stability across various test conditions.
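Frame size matters because the packet rate needed to fill a link rises sharply as frames shrink. The sketch below works through the standard Ethernet line-rate formula, which adds 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, and inter-frame gap). The 10 Gbit/s link speed is a hypothetical value chosen for illustration, not a figure from the report, and line_rate_mpps is an illustrative helper name.

```shell
# line_rate_mpps LINK_GBPS FRAME_BYTES
# Prints the theoretical maximum packet rate in millions of packets per second.
line_rate_mpps() {
  LC_ALL=C awk -v gbps="$1" -v frame="$2" 'BEGIN {
    wire_bits = (frame + 20) * 8        # 8 B preamble/SFD + 12 B inter-frame gap
    printf "%.2f\n", gbps * 1000 / wire_bits
  }'
}

line_rate_mpps 10 64     # smallest frame in the report range -> 14.88
line_rate_mpps 10 1518   # largest frame in the report range  -> 0.81
```

At the same link speed, 64-byte frames require roughly 18 times the packet rate of 1518-byte frames, which is why small-packet results are usually the most demanding part of a DPDK benchmark.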
Key Report Highlights:
- Line-rate throughput is achievable at common frame sizes when vCPUs are pinned correctly and memory is properly configured
- Low jitter and consistent latency are observed across multi-queue and multi-core tests
- Performance scales nearly linearly with additional cores, especially for smaller packet sizes
- Queue and PMD thread alignment with the NUMA layout plays a critical role in maximizing efficiency

All tests were performed using Azure VM SKUs equipped with Microsoft NICs and configured for optimal isolation and performance.

Why We Shared This with the Community

Publishing this report reflects our commitment to open engineering and ecosystem collaboration. We believe performance transparency benefits everyone in the ecosystem, including developers, operators, and customers. Here are a few reasons why we share:
- It helps customers plan and tune their workloads using validated performance envelopes
- It enables vendors and contributors to optimize drivers, firmware, and applications based on real-world data
- It encourages reproducibility and standardization in cloud DPDK benchmarking
- It creates a feedback loop between Azure, the DPDK community, and our partners

Our goal is not just to test internally but to foster open dialogue and measurable improvement across platforms.

Recommendations for Running DPDK on Azure

Based on the test results, we offer the following best practices for customers deploying DPDK-based applications:

| Area | Recommendation |
| --- | --- |
| VM Selection | Choose Accelerated Networking-enabled SKUs like D, Fsv2, or Eav4 |
| CPU Pinning | Use dedicated cores for PMD threads and align with NUMA topology |
| Memory | Configure hugepages and allocate memory from the local NUMA node |
| Queue Mapping | Match RX and TX queues to available vCPUs to avoid contention |
| Packet Generator | Use pktgen-dpdk or testpmd with controlled traffic profiles |

These settings can significantly improve consistency and peak throughput across many DPDK scenarios.
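For the Memory row, one quick way to see whether hugepages are actually reserved on a Linux VM is to read the hugepage counters in /proc/meminfo. The following is a minimal sketch rather than a step from the report; hugepages_summary is an illustrative helper name, and the optional file argument exists only so the parsing can be tried against a saved copy.

```shell
# hugepages_summary [MEMINFO_FILE]
# Prints the total and free hugepage counts and the hugepage size in kB,
# read from /proc/meminfo unless another file is given.
hugepages_summary() {
  awk '/^HugePages_Total:/ { total = $2 }
       /^HugePages_Free:/  { free = $2 }
       /^Hugepagesize:/    { size_kb = $2 }
       END { printf "total=%d free=%d size_kb=%d\n", total, free, size_kb }' \
    "${1:-/proc/meminfo}"
}

# Example (run on the Linux VM itself):
#   hugepages_summary
```

A total of 0 would mean no hugepages are reserved yet, in which case a DPDK application that expects hugepage-backed memory is unlikely to start.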
Get Involved and Reproduce the Results

We invite you to read the full report and try the configurations in your own environment. Whether you are running a firewall, a router, or a telemetry appliance, DPDK on Azure offers scalable performance with the right tuning.

You can:
- Download the report at Microsoft Azure DPDK Performance Report
- Replicate the test setup using Azure VMs and your preferred packet generator (github.com/mcgov/dpdk-perf)
- Share your feedback with us through GitHub or community channels, or send feedback to dpdk@microsoft.com
- Suggest improvements or contribute new scenarios to future performance reports

Conclusion

DPDK is a powerful enabler of high-performance networking in the cloud. With this report, we aim to make Azure performance data open, useful, and actionable. It reflects our ongoing investment in validating and improving the underlying infrastructure that supports mission-critical workloads.

We thank the DPDK community for ongoing collaboration. We look forward to continued engagement as we scale performance transparency in cloud-native environments.

Building Bridges: Microsoft’s Participation in the Fedora Linux Community
At Microsoft, we believe that meaningful open source participation is driven by people, not corporations. But companies can - and should - create the conditions that empower individuals to contribute. Over the past year, our Community Linux Engineering team has been doing just that, focusing on Fedora Linux and working closely with the community to improve infrastructure, tooling, and collaboration. This post shares some of the highlights of that work and outlines where we’re headed next.

Modernizing Fedora Cloud Image Delivery

One of our most impactful contributions this year has been expanding the availability of Fedora Cloud images across major cloud platforms. We introduced support for publishing images to both the Azure Community Gallery and Google Cloud Platform, capabilities that didn’t exist before. At the same time, we modernized the existing AWS image publishing process by migrating it to a new, OpenShift-hosted automation framework. This new system, developed by our team and led by engineer Jeremy Cline, streamlines image delivery across all three platforms and positions the project to scale and adapt more easily in the future.

We partnered with Adam Williamson in Fedora QE to extend this tooling to support container image uploads, replacing fragile shell scripts with a robust, maintainable system. Nightly Fedora builds are now uploaded to Azure, with one periodically promoted to “latest” after manual validation and basic functionality testing. This ensures cloud users get up-to-date, ready-to-run images - critical for workloads that demand fast boot times and minimal setup. As you’ll see, we have ideas for improving this testing.

Enabling Secure Boot on ARM with Sigul

Secure Boot is essential for trusted cloud workloads across architectures. Our current focus includes enabling it on ARM-based systems. Fedora currently signs most artifacts with Sigul, but UEFI applications are handled separately via a dedicated x86_64 builder with a smart card.
We’re working to enable Sigul-based signing for UEFI applications across architectures, but Sigul is a complex project with unmaintained dependencies. We’ve stepped in to help modernize Sigul, starting with a Rust-based client and a roadmap to re-architect the code and structure for easier maintenance and improved performance. This work is about more than just Microsoft’s needs - it’s about enabling Secure Boot support out of the box, like what users expect on x86_64 systems.

Bringing Inspektor Gadget to Fedora

Inspektor Gadget is an eBPF-based toolkit for kernel instrumentation, enabling powerful observability use cases like performance profiling and syscall tracing. The Community Linux Engineering team consulted with the Inspektor Gadget maintainers at Microsoft about putting the project in Fedora. This led to the maintainers natively packaging it for Fedora and assuming ongoing maintenance of the package.

We are encouraging teams to become active Fedora participants, to maintain their own packages, and to engage directly with the community. We believe in bi-directional feedback: upstream contributions should benefit both the project and the contributors.

Azure VM Utils: Simplifying Cloud Enablement

To streamline Fedora’s compatibility with Azure, we’ve introduced a package called azure-vm-utils. It consolidates Udev rules and low-level utilities that make Fedora work better on Azure infrastructure, particularly with NVMe devices. This package is a step toward greater transparency and maintainability and could serve as a model for other cloud providers.

Fedora WSL: A Layer 9 Success

Fedora is now officially available in the Windows Subsystem for Linux (WSL) catalog - a milestone that required both technical and organizational effort. While the engineering work was substantial, the real challenge was navigating the legal and governance landscape. This success reflects deep collaboration between Fedora leadership, Red Hat, and Microsoft.

Looking Ahead: Strategic Participation and Testing
We’re not stopping here. Our roadmap includes:
- Replacing Sigul with a modern, maintainable signing infrastructure.
- Expanding participation in Fedora SIGs (Cloud, Go, Rust) where Microsoft has relevant expertise.
- Improving automated testing using Microsoft’s open source LISA framework to validate Fedora images at cloud scale.
- Enhancing the Fedora-on-Azure experience, including exploring mirrors within Azure and expanding agent/extension support.
We’re also working closely with the Azure Linux team, which is aligning its development model with Fedora - much like RHEL does. While Azure Linux has used some Fedora sources in the past, its upcoming 4.0 release is intended to align much more closely with Fedora as an upstream.

A Call for Collaboration
While contributing patches is a good start, we intend to do much more. We aim to be a deeply involved member of the Fedora community - participating in SIGs, maintaining packages, and listening to feedback. If you have ideas for where Microsoft can make strategic investments that benefit Fedora, we want to hear them. You’ll find us alongside you in Fedora meetings, forums, and at conferences like Flock.
Open source thrives when contributors bring their whole selves to the table. At Microsoft, we’re working to ensure our engineers can do just that - by aligning company goals with community value.
(This post is based on a talk delivered at Flock to Fedora 2025.)

Azure Linux: Driving Security in the Era of AI Innovation
Microsoft is advancing cloud and AI innovation with a clear focus on security, quality, and responsible practices. At Ignite 2025, Azure Linux reflects that commitment. As Microsoft’s ubiquitous Linux OS, it powers critical services and serves as the hub for security innovation. This year’s announcements, Azure Linux with OS Guard public preview and GA of pod sandboxing, reinforce security as one of our core priorities, helping customers build and run workloads with confidence in an increasingly complex threat landscape.

Announcing OS Guard Public Preview
We’re excited to announce the public preview of Azure Linux with OS Guard at Ignite 2025! OS Guard delivers a hardened, immutable container host built on the FedRAMP-certified Azure Linux base image. It introduces a significantly streamlined footprint with approximately 100 fewer packages than the standard Azure Linux image, reducing the attack surface and improving performance. FIPS mode is enforced by default, ensuring compliance for regulated workloads right out of the box. Additional security features include dm-verity for filesystem immutability, Trusted Launch backed by vTPM-secured keys, and seamless integration with AKS for container workloads. Built with upstream transparency and active Microsoft contributions, OS Guard provides a secure foundation for containerized applications while maintaining operational simplicity. During the preview period, code integrity and mandatory access control (SELinux) are enabled in audit mode, allowing customers to validate policies and prepare for enforcement without impacting workloads.

General Availability: Pod Sandboxing for stronger isolation on AKS
We’re also announcing the GA of pod sandboxing on AKS, delivering stronger workload isolation for multi-tenant and regulated environments.
Based on the open source Kata Containers project, Pod Sandboxing introduces VM-level isolation for containerized workloads by running each pod inside its own lightweight virtual machine, providing a stronger security boundary than traditional containers.

Connect with us at Ignite
Meet the Azure Linux team and see these innovations in action:
Ignite: Join us at our breakout session (https://ignite.microsoft.com/en-US/sessions/BRK144) and visit the Linux on Azure Booth for live demos and deep dives.

| Session Type | Session Code | Session Name | Date/Time (PST) |
| Breakout | BRK 143 | Optimizing performance, deployments, and security for Linux on Azure | Thu, Nov 20 / 1:00 PM – 1:45 PM |
| Breakout | BRK 144 | Build, modernize, and secure AKS workloads with Azure Linux | Wed, Nov 19 / 1:30 PM – 2:15 PM |
| Breakout | BRK 104 | From VMs and containers to AI apps with Azure Red Hat OpenShift | Thu, Nov 20 / 8:30 AM – 9:15 AM |
| Theatre | THR 712 | Hybrid workload compliance from policy to practice on Azure | Tue, Nov 18 / 3:15 PM – 3:45 PM |
| Theatre | THR 701 | From Container to Node: Building Minimal-CVE Solutions with Azure Linux | Wed, Nov 19 / 3:30 PM – 4:00 PM |
| Lab | Lab 505 | Fast track your Linux and PostgreSQL migration with Azure Migrate | Tue, Nov 18 / 4:30 PM – 5:45 PM; Wed, Nov 19 / 3:45 PM – 5:00 PM; Thu, Nov 20 / 9:00 AM – 10:15 AM |

Whether you’re migrating workloads, exploring security features, or looking to engage with our engineering team, we’re eager to connect and help you succeed with Azure Linux.

Resources to get started
Azure Linux OS Guard Overview & QuickStart: https://aka.ms/osguard
Pod Sandboxing Overview & QuickStart: https://aka.ms/podsandboxing
Azure Linux Documentation: https://learn.microsoft.com/en-us/azure/azure-linux/
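
For readers trying the OS Guard preview described earlier in this post, the following is a minimal sketch of how one might confirm two of the stated properties from inside a Linux host: FIPS mode being on, and SELinux running in a non-enforcing ("audit") mode. It is not taken from the OS Guard documentation. It assumes the host exposes the standard kernel interfaces /proc/sys/crypto/fips_enabled and /sys/fs/selinux/enforce, and it assumes that "audit mode" corresponds to what SELinux calls permissive mode. The function names are hypothetical and chosen only for illustration.

```python
from pathlib import Path

# Standard Linux kernel interfaces. That an OS Guard host exposes them
# in the usual way is an assumption, not something the post states.
FIPS_FLAG = Path("/proc/sys/crypto/fips_enabled")
SELINUX_ENFORCE = Path("/sys/fs/selinux/enforce")


def parse_fips(raw):
    """Return True when the kernel flag reads "1" (FIPS mode on)."""
    return raw.strip() == "1"


def parse_selinux_mode(raw):
    """Map the selinuxfs 'enforce' value to a mode name.

    raw is None when selinuxfs is not mounted, i.e. SELinux is disabled.
    "0" is treated as permissive, assumed here to match "audit mode".
    """
    if raw is None:
        return "disabled"
    return "enforcing" if raw.strip() == "1" else "permissive"


def read_or_none(path):
    """Read a small text file, returning None if it is absent or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


def host_report():
    """Collect both settings for the machine this script runs on."""
    fips_raw = read_or_none(FIPS_FLAG)
    return {
        "fips_enabled": parse_fips(fips_raw) if fips_raw is not None else False,
        "selinux_mode": parse_selinux_mode(read_or_none(SELINUX_ENFORCE)),
    }


if __name__ == "__main__":
    print(host_report())
```

On a host that matches the preview description, one would expect `fips_enabled` to be true and `selinux_mode` to be `permissive`. Treat any other result as a prompt to check the official QuickStart, not as a definitive finding.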