# Dissecting LLM Container Cold-Start: Where the Time Actually Goes
Cold-start latency determines whether GPU clusters can scale to zero, how fast they can autoscale, and whether bursty or low-QPS workloads are economically viable. Most optimization effort targets the container pull path – faster registries, lazy-pull snapshotters, different compression formats. But “cold-start” is actually a composite of pull, runtime startup, and model initialization, and the dominant phase varies dramatically by inference engine. An optimization that cuts time-to-first-token for one engine can be irrelevant for another, even on identical infrastructure.

## What we measured

We decomposed cold-start for two architecturally different engines – vLLM (Python/CUDA, heavy JIT compilation) and llama.cpp (C++, minimal runtime) – running Llama 3.1 8B on A100 GPUs. Every run starts from a completely clean slate: containerd stopped, all state wiped, kernel page caches dropped. No warm starts, no pre-pulling, no caching.

We break TTFT into three phases: pull (download + decompression + snapshot creation), startup (container start → server ready), and first inference (first API response, including model weight loading for engines that defer it). We tested across three snapshotters (overlayfs, EROFS, Nydus) with gzip and uncompressed images, pulling from same-region Azure Container Registry.

## Setup

All experiments ran on an NVIDIA A100 80GB (Azure NC24ads_A100_v4), pulling from same-region Azure Container Registry. Images were built with AIKit, which produces ModelPack-compliant OCI artifacts with uncompressed model weight layers, Cosign signatures, SBOMs, and provenance attestations. These are supply chain properties you lose when model weights live on a shared drive.

## vLLM: startup dominates, pull barely matters

vLLM loads model weights, runs torch.compile, captures CUDA graphs for multiple batch shapes, allocates KV cache, and warms up, all before serving the first request. This takes ~176 seconds regardless of how fast the image arrived.
The breakdown makes the bottleneck obvious: the green bar (startup) is nearly constant across all four variants, swamping any pull-time differences.

Figure 1: vLLM cold-start breakdown. Startup (green, ~176s) dominates regardless of snapshotter.

| Method | Pull | Startup | 1st Inference | TTFT |
|---|---|---|---|---|
| overlayfs (gzip) | 140.8s ±5.5 | 176.0s ±3.2 | 0.16s | 317.2s ±2.2 |
| overlayfs (uncomp.) | 129.9s ±3.3 | 180.8s ±12.2 | 0.16s | 310.9s ±8.9 |
| EROFS (gzip) | 158.9s ±8.8 | 175.3s ±0.8 | 0.16s | 334.4s ±8.7 |
| EROFS (uncomp.) | 166.3s ±21.1 | 177.3s ±12.8 | 0.16s | 343.8s ±8.2 |

Llama 3.1 8B Q4_K_M, ~14 GB image, n=2–3 per variant. ± = sample standard deviation. Three of twelve runs hit intermittent NVIDIA container runtime crashes (exit code 120, unrelated to snapshotters) and were excluded. We excluded Nydus because FUSE-streaming the 14 GB Python/CUDA stack caused startup to exceed 900s. Steady-state inference: ~0.134s across all snapshotters.

44% pull, 56% startup. Dropping gzip saves 11 seconds on a 317-second cold start (1.02x). If your engine is vLLM, optimizing the pull pipeline is the wrong lever.

## llama.cpp: pull dominates, compression is the bottleneck

llama.cpp has the opposite profile. Its C++ runtime starts in 2–5 seconds, so the pull becomes the majority of cold-start. This is where filesystem and compression choices actually matter. Here the picture flips. Pull (blue) is the widest bar, and the gzip-to-uncompressed difference is visible at a glance:

Figure 2: llama.cpp cold-start breakdown. Pull time (blue) dominates for gzip variants.

| Method | Pull | Startup | 1st Inference | TTFT |
|---|---|---|---|---|
| overlayfs (gzip) | 88.3s ±0.2 | 5.3s ±0.5 | 45.1s ±1.4 | 138.8s ±0.8 |
| overlayfs (uncomp.) | 56.3s ±3.1 | 2.0s ±0.0 | 44.2s ±0.1 | 102.4s ±3.1 |
| EROFS (gzip) | 92.0s ±2.3 | 6.1s ±0.5 | 44.0s ±0.2 | 142.3s ±1.9 |
| EROFS (uncomp.) | 58.8s ±0.6 | 2.0s ±0.0 | 44.0s ±0.1 | 104.8s ±0.5 |

Llama 3.1 8B Q4_K_M, ~8 GB image, n=3 per variant, 12/12 runs succeeded. First inference includes model weight loading into GPU VRAM (~43s) plus token generation (~1.5s).
Steady-state inference: ~1.5s across all snapshotters.

64% pull, 4% startup, 33% model loading. Dropping gzip saves 32 seconds (1.35x) with zero infrastructure changes.

## Engine comparison

Placed side by side, the two engines tell opposite stories about the same infrastructure:

Figure 3: Where cold-start time goes. vLLM is compute-bound; llama.cpp is pull-bound.

| | vLLM | llama.cpp |
|---|---|---|
| Time saved by dropping gzip | 11s (3% of TTFT) | 32s (23% of TTFT) |
| Startup time | 176–181s | 2–5s |
| Speedup from dropping gzip | 1.02x | 1.35x |

Same optimization, completely different impact. Before investing in pull optimization (compression changes, lazy-pull infrastructure, registry tuning), profile your engine’s startup. If startup dominates, the pull isn’t where the time goes.

## Why gzip hurts: model weights are incompressible

The AIKit image is 8.7 GB uncompressed, 6.6 GB with gzip (a modest 0.76x ratio). But this ratio hides what’s really happening:

| Layer type | Size | % of image | Gzip ratio |
|---|---|---|---|
| Model weights (GGUF) | 4.9 GB | 56% | ~1.00x (quantized binary, no redundancy) |
| CUDA + system layers | ~3.8 GB | 44% | ~0.46x (compresses well) |

The GGUF file is already quantized to 4-bit precision. Gzip reads every byte, burns CPU, and produces output the same size as the input. You’re paying full decompression cost on 56% of the image for zero size reduction.

Bottom line: gzip is doing real work on less than half your image and producing zero savings on the rest. Dropping it costs nothing and removes a bottleneck from every cold start.

## The Nydus prefetch finding

If decompression is the bottleneck, what about skipping the full pull entirely? Nydus lazy-pull takes a fundamentally different approach: it fetches only manifest metadata during “pull” (~0.7s), then streams model data on-demand via FUSE as the container reads it. Nydus TTFT isn’t directly comparable to the full-pull methods above because the download cost shifts from the pull column to the inference column.
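For reference, prefetch is configured in the nydusd daemon configuration rather than per-image. A minimal sketch of such a file, written to a temporary path for illustration (field names follow the nydus RAFS `fs_prefetch` config, but treat the exact schema, values, and paths as assumptions to verify against your nydus version):

```shell
# Write a minimal nydusd-config.json with background prefetch enabled.
# Field names follow the nydus RAFS config format; verify the exact schema
# against your nydus version before deploying.
cat > /tmp/nydusd-config.json <<'EOF'
{
  "device": {
    "backend": {
      "type": "registry",
      "config": { "scheme": "https", "timeout": 5 }
    }
  },
  "mode": "direct",
  "fs_prefetch": {
    "enable": true,
    "threads_count": 8,
    "prefetch_all": true
  }
}
EOF
```

In a real deployment this file would live wherever your nydusd snapshotter expects its config, not in /tmp.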
With prefetch enabled, Nydus achieved 77.8s TTFT for llama.cpp vs 139.1s for overlayfs gzip. The critical detail is the prefetch_all flag:

Figure 4: Nydus prefetch ON vs OFF. One config flag, 2.87x difference. Overlayfs gzip shown as baseline.

| Configuration | 1st Inference | TTFT |
|---|---|---|
| Nydus, prefetch ON | 72.4s ±0.6 | 77.8s ±0.5 |
| Nydus, prefetch OFF | 218.6s ±2.9 | 223.4s ±2.9 |
| overlayfs gzip (baseline) | 44.0s ±0.4 | 139.1s ±1.9 |

n=3 per config, 9/9 runs succeeded. Data: 03-prefetch-config-20260401-030725.csv

One flag in nydusd-config.json, 2.87x difference. Without prefetch, every model weight page fault fires an individual HTTP range request to the registry. With prefetch_all=true, Nydus streams the full blob in the background while the container starts, so chunks arrive ahead of the GPU’s read pattern. Even with prefetch, Nydus first inference is ~28s slower than overlayfs (72s vs 44s) due to FUSE kernel-user roundtrips during model mmap. Nydus wins on total TTFT because it eliminates the blocking pull, but this overhead means its advantage shrinks on faster networks.

Bottom line: Nydus lazy-pull can halve cold-start for pull-bound engines, but only if prefetch is on. Treat prefetch_all=true as a hard requirement, not a tuning knob.

## How to apply these findings

### Pick your optimization by engine type

The right optimization depends on where your engine spends its cold-start time. This table summarizes the tradeoffs:

| Engine type | Dominant phase | Speedup from dropping gzip | Nydus viable? | Best optimization | What NOT to optimize |
|---|---|---|---|---|---|
| vLLM / TensorRT-LLM | Startup (56%) | 1.02x — negligible | No — FUSE + Python/CUDA stack exceeded 900s in our tests | Cache torch.compile artifacts and CUDA graphs | Pull pipeline (it’s <44% of TTFT and already fast enough) |
| llama.cpp / ONNX Runtime | Pull (64%) | 1.35x — 32s saved | Yes, with prefetch_all=true (77.8s TTFT vs 139s gzip baseline) | Drop gzip on weight layers; consider lazy-pull on slow links | Startup (already 2–5s; no room to improve) |
| Large dense models (70B+) | Pull (projected) | >1.35x — scales with image size | Yes, strongest case for lazy-pull | Uncompressed or zstd; Nydus prefetch on bandwidth-constrained links | — |

### Recommendations

Profile your engine’s startup before touching the pull pipeline. If CUDA compilation dominates (vLLM, TensorRT-LLM), no amount of pull optimization will help. Cache torch.compile artifacts and CUDA graphs instead — production clusters that do this reduce vLLM restarts to ~45–60s.

Drop gzip on model weight layers. For pull-bound engines (llama.cpp, ONNX Runtime), this is the single highest-ROI change: build with --output=type=image,compression=uncompressed, or use AIKit, which defaults to uncompressed weight layers. Quantized model weights (GGUF, safetensors) are already dense binary — gzip burns CPU for a ~1.00x compression ratio on 56% of the image.

If using Nydus, set prefetch_all=true. Without it, every weight page fault triggers an individual HTTP range request and cold-start is 2.87x slower. This is a single flag in nydusd-config.json.

Package models as signed OCI artifacts, not volume mounts. Three CNCF projects implement this pipeline end-to-end: ModelPack defines the OCI artifact spec (model metadata, architecture, quantization format). AIKit builds ModelPack-compliant images with Cosign signatures, SBOMs, and provenance attestations — supply chain guarantees you lose when weights live on a shared drive.
KAITO handles the Kubernetes deployment: GPU node provisioning, inference engine setup, and API exposure. Together they cover packaging → build → deploy, and they produce the exact image layout these benchmarks measured.

## Why this matters: the cost of cold-start

On an A100 node (~$3–4/hr on major clouds), a 5-minute vLLM cold start burns ~$0.30 in idle GPU time per pod. That sounds small until you multiply it: a cluster that scales 50 pods to zero overnight and restarts them each morning wastes ~$15/day — over $5,000/year — on GPUs sitting idle during pull and CUDA compilation.

More critically, cold-start latency determines whether scale-to-zero is feasible at all. If cold-start exceeds your SLO (say, 30s for an interactive app), you’re forced to keep warm replicas running 24/7, which can 2–3x your GPU spend. Cutting llama.cpp cold-start from 139s to 103s by dropping gzip doesn’t just save 36 seconds — it moves the needle on whether autoscaling is viable for your workload.

## What this doesn’t cover

- zstd compression: decompresses 5–10x faster than gzip; containerd supports it natively. The most obvious gap in this analysis.
- Pre-pulling and caching: production clusters pre-pull images and cache CUDA graphs, reducing vLLM restarts to ~45–60s. We measure the cold case: scale-from-zero events and first-time deployments.
- Volume-mounted weights: skips the pull entirely, but loses supply chain properties (signing, scanning, provenance).
- Larger models (70B+): pull would dominate more, increasing the gzip penalty.
- Sample size: n=3 per AIKit variant, n=2–3 per vLLM variant. The gzip finding for llama.cpp is statistically significant (Welch’s t-test, p=0.0014, Cohen’s d=16.3; verification script). Other comparisons are directional.

## Reproduce it

Scripts and raw data: erofs-repro-repo. Data for this post: 02-aikit-five-way-20260401-004716.csv and 01-vllm-four-way-20260331-113848.csv.
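If you want to apply the "drop gzip" recommendation to your own images, the build-side change is small. A sketch using BuildKit's buildx (the registry and image name are placeholders; AIKit users get uncompressed weight layers by default):

```shell
# Sketch: build an image whose layers are stored uncompressed rather than
# gzip-compressed. The registry/image name below is a placeholder.
IMAGE="registry.example.com/llama-server:uncompressed"
BUILD_CMD="docker buildx build . --push --output=type=image,name=${IMAGE},compression=uncompressed"

# Run the command on a machine with docker buildx configured; shown here
# as a string so the sketch is inspectable without a Docker daemon.
echo "${BUILD_CMD}"
```

BuildKit also accepts compression=zstd in the same output option, which is the middle ground this post flags as its most obvious untested gap.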
Full analysis: technical report.

# Run OpenClaw Agents on Azure Linux VMs (with Secure Defaults)
Many teams want an enterprise-ready personal AI assistant, but they need it on infrastructure they control, with security boundaries they can explain to IT. That is exactly where OpenClaw fits on Azure. OpenClaw is a self-hosted, always-on personal agent runtime you run in your enterprise environment and Azure infrastructure. Instead of relying only on a hosted chat app from a third-party provider, you can deploy, operate, and experiment with an agent on an Azure Linux VM you control — using your existing GitHub Copilot licenses, Azure OpenAI deployments, or API plans from OpenAI, Anthropic Claude, Google Gemini, and other model providers you already subscribe to.

Once deployed on Azure, you can interact with an OpenClaw agent through familiar channels like Microsoft Teams, Slack, Telegram, WhatsApp, and many more! For Azure users, this gives you a practical middle ground: modern personal-agent workflows on familiar Azure infrastructure.

## What is OpenClaw, and how is it different from ChatGPT/Claude/chat apps?

OpenClaw is a self-hosted personal agent runtime that can be hosted on Azure compute infrastructure. How it differs:

- ChatGPT/Claude apps are primarily hosted chat experiences tied to one provider's models
- OpenClaw is an always-on runtime you operate yourself, backed by your choice of model provider — GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, and others
- OpenClaw lets you keep the runtime boundary in your own Azure VM environment within your Azure enterprise subscription

In practice, OpenClaw is useful when you want a persistent assistant for operational and workflow tasks, with your own infrastructure as the control point. You bring whatever model provider and API plan you already have — OpenClaw connects to it.

## Why Azure Linux VMs?
Azure Linux VMs are a strong fit because they provide:

- A suitable host machine for the OpenClaw agent to run on
- Enterprise-friendly infrastructure and identity workflows
- Repeatable provisioning via the Azure CLI
- Network hardening with NSG rules
- Managed SSH access through Azure Bastion instead of public SSH exposure

## How to Set Up OpenClaw on an Azure Linux VM

This guide sets up an Azure Linux VM, applies NSG (Network Security Group) hardening, configures Azure Bastion for managed SSH access, and installs an always-on OpenClaw agent within the VM that you can interact with through various messaging channels.

What you'll do:

- Create Azure networking (VNet, subnets, NSG) and compute resources with the Azure CLI
- Apply Network Security Group rules so VM SSH is allowed only from Azure Bastion
- Use Azure Bastion for SSH access (no public IP on the VM)
- Install OpenClaw on the Azure VM
- Verify OpenClaw installation and configuration on the VM

What you need:

- An Azure subscription with permission to create compute and network resources
- Azure CLI installed (install steps)
- An SSH key pair (the guide covers generating one if needed)
- ~20–30 minutes

## Configure deployment

### Step 1: Sign in to Azure CLI

```shell
az login                 # select a suitable Azure subscription during login
az extension add -n ssh  # SSH extension is required for Azure Bastion SSH
```

The ssh extension is required for Azure Bastion native SSH tunneling.

### Step 2: Register required resource providers (one-time)

Register the required Azure Resource Providers (a one-time registration):

```shell
az provider register --namespace Microsoft.Compute
az provider register --namespace Microsoft.Network
```

Verify registration. Wait until both show Registered.

```shell
az provider show --namespace Microsoft.Compute --query registrationState -o tsv
az provider show --namespace Microsoft.Network --query registrationState -o tsv
```

### Step 3: Set deployment variables

Set the deployment environment variables that will be needed throughout this guide.
```shell
RG="rg-openclaw"
LOCATION="westus2"
VNET_NAME="vnet-openclaw"
VNET_PREFIX="10.40.0.0/16"
VM_SUBNET_NAME="snet-openclaw-vm"
VM_SUBNET_PREFIX="10.40.2.0/24"
BASTION_SUBNET_PREFIX="10.40.1.0/26"
NSG_NAME="nsg-openclaw-vm"
VM_NAME="vm-openclaw"
ADMIN_USERNAME="openclaw"
BASTION_NAME="bas-openclaw"
BASTION_PIP_NAME="pip-openclaw-bastion"
```

Adjust names and CIDR ranges to fit your environment. The Bastion subnet must be at least /26.

### Step 4: Select SSH key

Use your existing public key if you have one:

```shell
SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"
```

If you don't have an SSH key yet, generate one:

```shell
ssh-keygen -t ed25519 -a 100 -f ~/.ssh/id_ed25519 -C "you@example.com"
SSH_PUB_KEY="$(cat ~/.ssh/id_ed25519.pub)"
```

### Step 5: Select VM size and OS disk size

```shell
VM_SIZE="Standard_B2as_v2"
OS_DISK_SIZE_GB=64
```

Choose a VM size and OS disk size available in your subscription and region:

- Start smaller for light usage and scale up later
- Use more vCPU/RAM/disk for heavier automation, more channels, or larger model/tool workloads
- If a VM size is unavailable in your region or subscription quota, pick the closest available SKU

List VM sizes available in your target region:

```shell
az vm list-skus --location "${LOCATION}" --resource-type virtualMachines -o table
```

Check your current vCPU and disk usage/quota:

```shell
az vm list-usage --location "${LOCATION}" -o table
```

## Deploy Azure resources

### Step 1: Create the resource group

The Azure resource group will contain all of the Azure resources that the OpenClaw agent needs.

```shell
az group create -n "${RG}" -l "${LOCATION}"
```

### Step 2: Create the network security group

Create the NSG and add rules so only the Bastion subnet can SSH into the VM.
```shell
az network nsg create \
  -g "${RG}" -n "${NSG_NAME}" -l "${LOCATION}"

# Allow SSH from the Bastion subnet only
az network nsg rule create \
  -g "${RG}" --nsg-name "${NSG_NAME}" \
  -n AllowSshFromBastionSubnet --priority 100 \
  --access Allow --direction Inbound --protocol Tcp \
  --source-address-prefixes "${BASTION_SUBNET_PREFIX}" \
  --destination-port-ranges 22

# Deny SSH from the public internet
az network nsg rule create \
  -g "${RG}" --nsg-name "${NSG_NAME}" \
  -n DenyInternetSsh --priority 110 \
  --access Deny --direction Inbound --protocol Tcp \
  --source-address-prefixes Internet \
  --destination-port-ranges 22

# Deny SSH from other VNet sources
az network nsg rule create \
  -g "${RG}" --nsg-name "${NSG_NAME}" \
  -n DenyVnetSsh --priority 120 \
  --access Deny --direction Inbound --protocol Tcp \
  --source-address-prefixes VirtualNetwork \
  --destination-port-ranges 22
```

The rules are evaluated by priority (lowest number first): Bastion traffic is allowed at 100, then all other SSH is blocked at 110 and 120.

### Step 3: Create the virtual network and subnets

Create the VNet with the VM subnet (NSG attached), then add the Bastion subnet.

```shell
az network vnet create \
  -g "${RG}" -n "${VNET_NAME}" -l "${LOCATION}" \
  --address-prefixes "${VNET_PREFIX}" \
  --subnet-name "${VM_SUBNET_NAME}" \
  --subnet-prefixes "${VM_SUBNET_PREFIX}"

# Attach the NSG to the VM subnet
az network vnet subnet update \
  -g "${RG}" --vnet-name "${VNET_NAME}" \
  -n "${VM_SUBNET_NAME}" --nsg "${NSG_NAME}"

# AzureBastionSubnet — name is required by Azure
az network vnet subnet create \
  -g "${RG}" --vnet-name "${VNET_NAME}" \
  -n AzureBastionSubnet \
  --address-prefixes "${BASTION_SUBNET_PREFIX}"
```

### Step 4: Create the Virtual Machine

Create the VM with no public IP. SSH access for OpenClaw configuration will be exclusively through Azure Bastion.
```shell
az vm create \
  -g "${RG}" -n "${VM_NAME}" -l "${LOCATION}" \
  --image "Canonical:ubuntu-24_04-lts:server:latest" \
  --size "${VM_SIZE}" \
  --os-disk-size-gb "${OS_DISK_SIZE_GB}" \
  --storage-sku StandardSSD_LRS \
  --admin-username "${ADMIN_USERNAME}" \
  --ssh-key-values "${SSH_PUB_KEY}" \
  --vnet-name "${VNET_NAME}" \
  --subnet "${VM_SUBNET_NAME}" \
  --public-ip-address "" \
  --nsg ""
```

--public-ip-address "" prevents a public IP from being assigned. --nsg "" skips creating a per-NIC NSG (the subnet-level NSG created earlier handles security).

Reproducibility: the command above uses latest for the Ubuntu image. To pin a specific version, list available versions and replace latest:

```shell
az vm image list \
  --publisher Canonical --offer ubuntu-24_04-lts \
  --sku server --all -o table
```

### Step 5: Create Azure Bastion

Azure Bastion provides secure, managed SSH access to the VM without exposing a public IP. The Bastion Standard SKU with tunneling is required for the CLI-based "az network bastion ssh" command.

```shell
az network public-ip create \
  -g "${RG}" -n "${BASTION_PIP_NAME}" -l "${LOCATION}" \
  --sku Standard --allocation-method Static

az network bastion create \
  -g "${RG}" -n "${BASTION_NAME}" -l "${LOCATION}" \
  --vnet-name "${VNET_NAME}" \
  --public-ip-address "${BASTION_PIP_NAME}" \
  --sku Standard --enable-tunneling true
```

Bastion provisioning typically takes 5–10 minutes but can take up to 15–30 minutes in some regions.
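Because Bastion provisioning can take that long, one option is to poll its state before moving on to the SSH step. A small helper sketch (assumes the RG and BASTION_NAME variables set earlier; az network bastion show reports the provisioning state):

```shell
# Poll the Bastion resource until Azure reports it as Succeeded.
# Uses RG and BASTION_NAME from the variables set earlier in this guide.
wait_for_bastion() {
  until [ "$(az network bastion show -g "${RG}" -n "${BASTION_NAME}" \
             --query provisioningState -o tsv)" = "Succeeded" ]; do
    echo "Bastion still provisioning; checking again in 30s..."
    sleep 30
  done
  echo "Bastion is ready."
}

# Invoke after the create command from Step 5 has been issued:
# wait_for_bastion
```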
### Step 6: Verify Deployments

After all resources are deployed, your resource group should look like the following:

## Install OpenClaw

### Step 1: SSH into the VM through Azure Bastion

```shell
VM_ID="$(az vm show -g "${RG}" -n "${VM_NAME}" --query id -o tsv)"

az network bastion ssh \
  --name "${BASTION_NAME}" \
  --resource-group "${RG}" \
  --target-resource-id "${VM_ID}" \
  --auth-type ssh-key \
  --username "${ADMIN_USERNAME}" \
  --ssh-key ~/.ssh/id_ed25519
```

### Step 2: Install OpenClaw (in the Bastion SSH shell)

```shell
curl -fsSL https://openclaw.ai/install.sh | bash
```

The installer installs Node LTS and dependencies if not already present, installs OpenClaw, and launches the OpenClaw onboarding wizard. For more information, see the open source OpenClaw install docs.

### OpenClaw Onboarding: Choosing an AI Model Provider

During OpenClaw onboarding, you'll choose the AI model provider for the OpenClaw agent. This can be GitHub Copilot, Azure OpenAI, OpenAI, Anthropic Claude, Google Gemini, or another supported provider. See the open source OpenClaw install docs for details on choosing an AI model provider when going through the onboarding wizard.

Most enterprise Azure teams already have GitHub Copilot licenses. If that is your case, we recommend choosing the GitHub Copilot provider in the OpenClaw onboarding wizard. See the open source OpenClaw docs on configuring GitHub Copilot as the AI model provider.

### OpenClaw Onboarding: Setting up Messaging Channels

During OpenClaw onboarding, there will be an optional step where you can set up various messaging channels to interact with your OpenClaw agent. For first-time users, we recommend setting up Telegram due to ease of setup. Other messaging channels such as Microsoft Teams, Slack, WhatsApp, and others can also be set up. To configure OpenClaw for messaging through chat channels, see the open source OpenClaw chat channels docs.
### Step 3: Verify OpenClaw Configuration

To validate that everything was set up correctly, run the following commands within the same Bastion SSH session:

```shell
openclaw status
openclaw gateway status
```

If there are any issues reported, you can run the onboarding wizard again with the steps above. Alternatively, you can run the following command:

```shell
openclaw doctor
```

## Message OpenClaw

Once you have configured the OpenClaw agent to be reachable via various messaging channels, you can verify that it is responsive by messaging it.

## Enhancing OpenClaw for Use Cases

There you go! You now have a 24/7, always-on personal AI agent, living in its own Azure VM environment. For awesome OpenClaw use cases, check out the awesome-openclaw-usecases repository. To enhance your OpenClaw agent with additional AI skills so that it can autonomously perform multi-step operations in any domain, check out the awesome-openclaw-skills repository. You can also check out ClawHub and ClawSkills, two popular open source skills directories that can enhance your OpenClaw agent.

## Cleanup

To delete all resources created by this guide:

```shell
az group delete -n "${RG}" --yes --no-wait
```

This removes the resource group and everything inside it (VM, VNet, NSG, Bastion, public IP). This also deletes the OpenClaw agent running within the VM. If you'd like to dive deeper into deploying OpenClaw on Azure, please check out the open source OpenClaw on Azure docs.

# DPDK 25.11 Performance on Azure for High-Speed Packet Workloads
At Microsoft Azure, performance is treated as an ongoing discipline grounded in careful engineering and real-world validation. As cloud workloads grow in scale and variety, customers depend on consistent, high-throughput networking. Technologies such as the Data Plane Development Kit (DPDK) play a key role in meeting these expectations.

To support customers running advanced network functions, we’ve released our latest performance report based on DPDK 25.11. It is now available in the DPDK performance catalog (Microsoft Azure DPDK Performance Report). The report provides a clear view of how DPDK performs on Microsoft-developed Azure Boost within Azure infrastructure, with detailed insights into packet processing across a range of scenarios, from small packet sizes to multi-core scaling.

## Why We Test DPDK on Azure

DPDK is widely used for high-performance packet processing in virtualized environments. It powers a range of workloads from customer-deployed virtual network functions to internal Azure network appliances. But simply enabling DPDK is not enough. To ensure optimal performance, we validate it under realistic conditions, including:

- Azure VM configurations with Accelerated Networking
- NUMA-aware memory and CPU alignment
- Hugepage-backed memory allocation
- Multi-core PMD thread scaling
- Packet forwarding using real traffic generators

This helps us understand how DPDK performs in actual cloud environments, not just idealized lab setups.

## What the Report Covers

The DPDK 25.11 report includes performance benchmarks across different frame sizes, ranging from 64 bytes to 1518 bytes. It also evaluates CPU usage, queue configuration, and latency stability across various test conditions.
Key report highlights:

- Line-rate throughput is achievable at common frame sizes when vCPUs are pinned correctly and memory is properly configured
- Low jitter and consistent latency are observed across multi-queue and multi-core tests
- Performance scales nearly linearly with additional cores, especially for smaller packet sizes
- Queue and PMD thread alignment with the NUMA layout plays a critical role in maximizing efficiency

All tests were performed using Azure VM SKUs equipped with Microsoft NICs and configured for optimal isolation and performance.

## Why We Shared This with the Community

Publishing this report reflects our commitment to open engineering and ecosystem collaboration. We believe performance transparency benefits everyone in the ecosystem, including developers, operators, and customers. Here are a few reasons why we share:

- It helps customers plan and tune their workloads using validated performance envelopes
- It enables vendors and contributors to optimize drivers, firmware, and applications based on real-world data
- It encourages reproducibility and standardization in cloud DPDK benchmarking
- It creates a feedback loop between Azure, the DPDK community, and our partners

Our goal is not just to test internally but to foster open dialogue and measurable improvement across platforms.

## Recommendations for Running DPDK on Azure

Based on the test results, we offer the following best practices for customers deploying DPDK-based applications:

| Area | Recommendation |
|---|---|
| VM Selection | Choose Accelerated Networking-enabled SKUs like D, Fsv2, or Eav4 |
| CPU Pinning | Use dedicated cores for PMD threads and align with NUMA topology |
| Memory | Configure hugepages and allocate memory from the local NUMA node |
| Queue Mapping | Match RX and TX queues to available vCPUs to avoid contention |
| Packet Generator | Use pktgen-dpdk or testpmd with controlled traffic profiles |

These settings can significantly improve consistency and peak throughput across many DPDK scenarios.
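As a concrete starting point for the table above, a typical testpmd invocation might look like the following sketch. The core list, queue counts, and stats period are placeholders to adapt to your VM, and this is not the report's exact harness; pick PMD cores on the NIC's NUMA node and match queue counts to those cores:

```shell
# Sketch: run testpmd in io-forwarding mode with 4 RX/TX queues on 4 PMD cores.
# -l selects lcores (choose cores local to the NIC's NUMA node),
# -n is the number of memory channels; app options follow the "--" separator.
run_testpmd() {
  sudo dpdk-testpmd -l 2-5 -n 4 -- \
    --forward-mode=io --rxq=4 --txq=4 --nb-cores=4 --stats-period 1
}

# Invoke on a DPDK-enabled VM with hugepages and the NIC bound for DPDK:
# run_testpmd
```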
## Get Involved and Reproduce the Results

We invite you to read the full report and try the configurations in your own environment. Whether you are running a firewall, a router, or a telemetry appliance, DPDK on Azure offers scalable performance with the right tuning. You can:

- Download the report at Microsoft Azure DPDK Performance Report
- Replicate the test setup using Azure VMs and your preferred packet generator: github.com/mcgov/dpdk-perf
- Share your feedback with us through GitHub or community channels, or send feedback to dpdk@microsoft.com
- Suggest improvements or contribute new scenarios to future performance reports

## Conclusion

DPDK is a powerful enabler of high-performance networking in the cloud. With this report, we aim to make Azure performance data open, useful, and actionable. It reflects our ongoing investment in validating and improving the underlying infrastructure that supports mission-critical workloads. We thank the DPDK community for ongoing collaboration. We look forward to continued engagement as we scale performance transparency in cloud-native environments.

# Building Bridges: Microsoft’s Participation in the Fedora Linux Community
At Microsoft, we believe that meaningful open source participation is driven by people, not corporations. But companies can - and should - create the conditions that empower individuals to contribute. Over the past year, our Community Linux Engineering team has been doing just that, focusing on Fedora Linux and working closely with the community to improve infrastructure, tooling, and collaboration. This post shares some of the highlights of that work and outlines where we’re headed next.

## Modernizing Fedora Cloud Image Delivery

One of our most impactful contributions this year has been expanding the availability of Fedora Cloud images across major cloud platforms. We introduced support for publishing images to both the Azure Community Gallery and Google Cloud Platform—capabilities that didn’t exist before. At the same time, we modernized the existing AWS image publishing process by migrating it to a new, OpenShift-hosted automation framework. This new system, developed by our team and led by engineer Jeremy Cline, streamlines image delivery across all three platforms and positions the project to scale and adapt more easily in the future.

We partnered with Adam Williamson in Fedora QE to extend this tooling to support container image uploads, replacing fragile shell scripts with a robust, maintainable system. Nightly Fedora builds are now uploaded to Azure, with one periodically promoted to “latest” after manual validation and basic functionality testing. This ensures cloud users get up-to-date, ready-to-run images - critical for workloads that demand fast boot times and minimal setup. As you’ll see, we have ideas for improving this testing.

## Enabling Secure Boot on ARM with Sigul

Secure Boot is essential for trusted cloud workloads across architectures. Our current focus includes enabling it on ARM-based systems. Fedora currently signs most artifacts with Sigul, but UEFI applications are handled separately via a dedicated x86_64 builder with a smart card.
We’re working to enable Sigul-based signing for UEFI applications across architectures, but Sigul is a complex project with unmaintained dependencies. We’ve stepped in to help modernize Sigul, starting with a Rust-based client and a roadmap to re-architect the code and structure for easier maintenance and improved performance. This work is about more than just Microsoft’s needs - it’s about enabling Secure Boot support out of the box, like what users expect on x86_64 systems.

## Bringing Inspektor Gadget to Fedora

Inspektor Gadget is an eBPF-based toolkit for kernel instrumentation, enabling powerful observability use cases like performance profiling and syscall tracing. The Community Linux Engineering team consulted with the Inspektor Gadget maintainers at Microsoft about putting the project in Fedora. This led to the maintainers natively packaging it for Fedora and assuming ongoing maintenance of the package. We are encouraging teams to become active Fedora participants, to maintain their own packages, and to engage directly with the community. We believe in bi-directional feedback: upstream contributions should benefit both the project and the contributors.

## Azure VM Utils: Simplifying Cloud Enablement

To streamline Fedora’s compatibility with Azure, we’ve introduced a package called azure-vm-utils. It consolidates udev rules and low-level utilities that make Fedora work better on Azure infrastructure, particularly with NVMe devices. This package is a step toward greater transparency and maintainability and could serve as a model for other cloud providers.

## Fedora WSL: A Layer 9 Success

Fedora is now officially available in the Windows Subsystem for Linux (WSL) catalog - a milestone that required both technical and organizational effort. While the engineering work was substantial, the real challenge was navigating the legal and governance landscape. This success reflects deep collaboration between Fedora leadership, Red Hat, and Microsoft.
Looking Ahead: Strategic Participation and Testing

We're not stopping here. Our roadmap includes:

- Replacing Sigul with a modern, maintainable signing infrastructure.
- Expanding participation in Fedora SIGs (Cloud, Go, Rust) where Microsoft has relevant expertise.
- Improving automated testing using Microsoft's open source LISA framework to validate Fedora images at cloud scale.
- Enhancing the Fedora-on-Azure experience, including exploring mirrors within Azure and expanding agent/extension support.

We're also working closely with the Azure Linux team, which is aligning its development model with Fedora - much like RHEL does. While Azure Linux has used some Fedora sources in the past, its upcoming 4.0 release is intended to align much more closely with Fedora as an upstream.

A Call for Collaboration

While contributing patches is a good start, we intend to do much more. We aim to be a deeply involved member of the Fedora community - participating in SIGs, maintaining packages, and listening to feedback. If you have ideas for where Microsoft can make strategic investments that benefit Fedora, we want to hear them. You'll find us alongside you in Fedora meetings, forums, and at conferences like Flock. Open source thrives when contributors bring their whole selves to the table. At Microsoft, we're working to ensure our engineers can do just that - by aligning company goals with community value.

(This post is based on a talk delivered at Flock to Fedora 2025.)
Microsoft is advancing cloud and AI innovation with a clear focus on security, quality, and responsible practices. At Ignite 2025, Azure Linux reflects that commitment. As Microsoft's ubiquitous Linux OS, it powers critical services and serves as the hub for security innovation. This year's announcements - the Azure Linux with OS Guard public preview and the GA of pod sandboxing - reinforce security as one of our core priorities, helping customers build and run workloads with confidence in an increasingly complex threat landscape.

Announcing OS Guard Public Preview

We're excited to announce the public preview of Azure Linux with OS Guard at Ignite 2025! OS Guard delivers a hardened, immutable container host built on the FedRAMP-certified Azure Linux base image. It introduces a significantly streamlined footprint with approximately 100 fewer packages than the standard Azure Linux image, reducing the attack surface and improving performance. FIPS mode is enforced by default, ensuring compliance for regulated workloads right out of the box. Additional security features include dm-verity for filesystem immutability, Trusted Launch backed by vTPM-secured keys, and seamless integration with AKS for container workloads. Built with upstream transparency and active Microsoft contributions, OS Guard provides a secure foundation for containerized applications while maintaining operational simplicity. During the preview period, code integrity and mandatory access control (SELinux) are enabled in audit mode, allowing customers to validate policies and prepare for enforcement without impacting workloads.

General Availability: Pod Sandboxing for stronger isolation on AKS

We're also announcing the GA of pod sandboxing on AKS, delivering stronger workload isolation for multi-tenant and regulated environments.
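In practice, workloads opt into sandboxing per pod through a Kubernetes RuntimeClass. A minimal sketch follows - the `kata-mshv-vm-isolation` runtime class name is the one used by AKS pod sandboxing, but verify it against your cluster and its documentation before relying on it:

```yaml
# Pod that requests VM-level isolation via Kata Containers on AKS.
# runtimeClassName is assumed from the AKS pod sandboxing docs -
# confirm it matches your node pool configuration.
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-demo
spec:
  runtimeClassName: kata-mshv-vm-isolation
  containers:
    - name: app
      image: nginx
      resources:
        limits:
          memory: 256Mi
          cpu: 250m
```

Pods that omit the runtimeClassName continue to run as ordinary containers, so sandboxing can be adopted selectively for sensitive workloads on the same cluster.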
Based on the open source Kata project, Pod Sandboxing introduces VM-level isolation for containerized workloads by running each pod inside its own lightweight virtual machine using Kata Containers, providing a stronger security boundary compared to traditional containers.

Connect with us at Ignite

Meet the Azure Linux team and see these innovations in action. Join us at our breakout session (https://ignite.microsoft.com/en-US/sessions/BRK144) and visit the Linux on Azure Booth for live demos and deep dives.

| Session Type | Session Code | Session Name | Date/Time (PST) |
| --- | --- | --- | --- |
| Breakout | BRK 143 | Optimizing performance, deployments, and security for Linux on Azure | Thu, Nov 20, 1:00 PM – 1:45 PM |
| Breakout | BRK 144 | Build, modernize, and secure AKS workloads with Azure Linux | Wed, Nov 19, 1:30 PM – 2:15 PM |
| Breakout | BRK 104 | From VMs and containers to AI apps with Azure Red Hat OpenShift | Thu, Nov 20, 8:30 AM – 9:15 AM |
| Theatre | THR 712 | Hybrid workload compliance from policy to practice on Azure | Tue, Nov 18, 3:15 PM – 3:45 PM |
| Theatre | THR 701 | From Container to Node: Building Minimal-CVE Solutions with Azure Linux | Wed, Nov 19, 3:30 PM – 4:00 PM |
| Lab | Lab 505 | Fast track your Linux and PostgreSQL migration with Azure Migrate | Tue, Nov 18, 4:30 PM – 5:45 PM; Wed, Nov 19, 3:45 PM – 5:00 PM; Thu, Nov 20, 9:00 AM – 10:15 AM |

Whether you're migrating workloads, exploring security features, or looking to engage with our engineering team, we're eager to connect and help you succeed with Azure Linux.

Resources to get started

- Azure Linux OS Guard Overview & QuickStart: https://aka.ms/osguard
- Pod Sandboxing Overview & QuickStart: https://aka.ms/podsandboxing
- Azure Linux Documentation: https://learn.microsoft.com/en-us/azure/azure-linux/
Security is more important than ever. The industry standard for secure machine configuration is the Center for Internet Security (CIS) Benchmarks. These benchmarks provide consensus-based, prescriptive guidance to help organizations harden diverse systems, reduce risk, and streamline compliance with major regulatory frameworks and industry standards like NIST, HIPAA, and PCI DSS.

In our previous post, we outlined our plans to improve the Linux server compliance and hardening experience on Azure and shared a vision for integrating CIS Benchmarks. Today, that vision has turned into reality. We're now announcing the next phase of this work: Center for Internet Security (CIS) Benchmarks are now available on Azure for all Azure endorsed distros, at no additional cost to Azure and Azure Arc customers.

With today's announcement, you get access to the CIS Benchmarks on Azure with full parity to what's published by the Center for Internet Security (CIS). You can adjust parameters or define exceptions, tailoring security to your needs and applying consistent controls across cloud, hybrid, and on-premises environments - without having to implement every control manually. Thanks to this flexible architecture, you can truly manage compliance as code.

How we achieve parity

To ensure accuracy and trust, we rely on and ingest CIS machine-readable Benchmark content (OVAL/XCCDF files) as the source of truth. This guarantees that the controls and rules you apply in Azure match the official CIS specifications, reducing drift and ensuring compliance confidence.

What's new under the hood

At the core of this update is azure-osconfig's new compliance engine - a lightweight, open-source module developed by the Azure Core Linux team. It evaluates Linux systems directly against industry-standard benchmarks like CIS, supporting both audit and, in the future, auto-remediation. This enables accurate, scalable compliance checks across large Linux fleets.
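Purely as an illustration of the kind of declarative checks such an engine evaluates - this is a hypothetical sketch, not the actual azure-osconfig rule schema - a hardening rule might be expressed as:

```yaml
# Hypothetical rule sketch - NOT the real azure-osconfig format.
# Illustrates composing simple fact checks with allOf/anyOf logic,
# as described in the next section.
rule: ensure-ssh-root-login-disabled
severity: high
audit:
  allOf:
    - fileExists: /etc/ssh/sshd_config
    - anyOf:
        - fileContains:
            path: /etc/ssh/sshd_config
            pattern: "^PermitRootLogin\\s+no"
        - scriptReturnsTrue: check_sshd_effective_config.lua
```

The real rule definitions live in the azure-osconfig repository; the point here is only the shape - simple facts composed with logic operators, evaluated natively on the machine.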
Here you can read more about azure-osconfig.

Dynamic rule evaluation

The new compliance engine supports simple fact-checking operations, evaluation of logic operations on them (e.g., anyOf, allOf), and Lua-based scripting, which makes it possible to express the complex checks required by the CIS Critical Security Controls - all evaluated natively without external scripts.

Scalable architecture for large fleets

When the assignment is created, the Azure control plane instructs the machine to pull the latest policy package via the Machine Configuration agent. azure-osconfig's compliance engine is integrated as a lightweight library in the package and called by the Machine Configuration agent for evaluation, which happens every 15-30 minutes. This ensures near real-time compliance state without overwhelming resources and enables consistent evaluation across thousands of VMs and Azure Arc-enabled servers.

Future-ready for remediation and enforcement

While the Public Preview starts with audit-only mode, the roadmap includes per-rule remediation and enforcement using technologies like eBPF for kernel-level controls. This will allow proactive prevention of configuration drift and runtime hardening at scale. Please reach out if you're interested in auto-remediation or enforcement.

Extensibility beyond CIS Benchmarks

The architecture was designed to support other security and compliance standards as well and isn't limited to CIS Benchmarks. The compliance engine is modular, and we plan to extend the platform with STIG and other relevant industry benchmarks. This positions Azure as a place where you can manage your compliance from a single control plane without duplicating efforts elsewhere.

Collaboration with the CIS

This milestone reflects a close collaboration between Microsoft and the CIS to bring industry-standard security guidance into Azure as a built-in capability.
Our shared goal is to make cloud-native compliance practical and consistent, while giving customers the flexibility to meet their unique requirements. We are committed to continuously supporting new Benchmark releases, expanding coverage with new distributions, and easing adoption through built-in workflows, such as moving from your current Benchmark version to a new version while preserving your custom configurations.

Certification and trust

We can proudly announce that azure-osconfig has met all the requirements and is officially certified by the CIS for Benchmark assessment, so you can trust compliance results as authoritative. Minor benchmark updates will be applied automatically, while major versions will be released separately. We will include workflows to help migrate customizations seamlessly across versions.

Key Highlights

- Built-in CIS Benchmarks for Azure Endorsed Linux distributions
- Full parity with official CIS Benchmarks content, certified by the CIS for Benchmark Assessment
- Flexible configuration: adjust parameters, define exceptions, tune severity
- Hybrid support: enforce the same baseline across Azure, on-prem, and multi-cloud with Azure Arc
- Reporting format in CIS tooling style

Supported use cases

- Certified CIS Benchmarks for all Azure Endorsed Distros - audit only (L1/L2 server profiles)
- Hybrid / on-premises and other cloud machines with Azure Arc for the supported distros
- Compliance as Code (example via GitHub -> Azure OIDC auth and API integration)
- Compatible with the GuestConfig workbook

What's next?

Our next mission is to bring the previously announced auto-remediation capability into this experience, expand the distribution coverage, and elevate our workflows even further. We're focused on empowering you to resolve issues while honoring the unique operational complexity of your environments. Stay tuned!
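The "Compliance as Code via GitHub -> Azure OIDC" use case could be wired up roughly like this: a scheduled GitHub Actions job that authenticates to Azure with federated OIDC credentials (no stored secrets) and queries compliance state with the Azure CLI. The workflow shape and azure/login inputs are standard; the policy assignment name is a placeholder for your own:

```yaml
# Scheduled compliance check via GitHub OIDC federation.
# "cis-linux-benchmark" is a placeholder assignment name.
name: linux-compliance-audit
on:
  schedule:
    - cron: "0 6 * * *"
permissions:
  id-token: write   # required for OIDC token exchange
  contents: read
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Summarize policy compliance
        run: |
          az policy state summarize \
            --policy-assignment "cis-linux-benchmark" \
            --output table
```

From here you can fail the job on non-compliant resources, open issues, or feed the results into your existing reporting pipeline.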
Get Started

- Documentation link for this capability
- Enable CIS Benchmarks in Machine Configuration, select the "Official Center for Internet Security (CIS) Benchmarks for Linux Workloads", then select the distributions for your assignment and customize as needed.
- If you want any additional distribution supported or have any feedback for azure-osconfig, please open an Azure support case or a GitHub issue here.
- Relevant Ignite 2025 session: Hybrid workload compliance from policy to practice on Azure

Connect with us at Ignite

Meet the Linux team and stop by the Linux on Azure booth to see these innovations in action:

| Session Type | Session Code | Session Name | Date/Time (PST) |
| --- | --- | --- | --- |
| Theatre | THR 712 | Hybrid workload compliance from policy to practice on Azure | Tue, Nov 18, 3:15 PM – 3:45 PM |
| Breakout | BRK 143 | Optimizing performance, deployments, and security for Linux on Azure | Thu, Nov 20, 1:00 PM – 1:45 PM |
| Breakout | BRK 144 | Build, modernize, and secure AKS workloads with Azure Linux | Wed, Nov 19, 1:30 PM – 2:15 PM |
| Breakout | BRK 104 | From VMs and containers to AI apps with Azure Red Hat OpenShift | Thu, Nov 20, 8:30 AM – 9:15 AM |
| Theatre | THR 701 | From Container to Node: Building Minimal-CVE Solutions with Azure Linux | Wed, Nov 19, 3:30 PM – 4:00 PM |
| Lab | Lab 505 | Fast track your Linux and PostgreSQL migration with Azure Migrate | Tue, Nov 18, 4:30 PM – 5:45 PM; Wed, Nov 19, 3:45 PM – 5:00 PM; Thu, Nov 20, 9:00 AM – 10:15 AM |
The Linux Systems Group (LSG) at Microsoft is the team building OS innovations in Azure, enabling secure and high-performance platforms that power millions of workloads worldwide. From providing the OS for Boost, to optimizing Linux kernels for hyperscale environments, to contributing to open-source projects like Rust-VMM and Cloud Hypervisor, LSG ensures customers get the best of Linux on Azure. Our work spans performance tuning, security hardening, and feature enablement for new silicon and cutting-edge technologies such as Confidential Computing, ARM64, and NVIDIA Grace Blackwell, all while strengthening the global open-source ecosystem.

Our philosophy is simple: we develop in the open and upstream first, integrating improvements into our products after they've been accepted by the community. At Ignite we'd like to highlight a few key open-source contributions in 2025 that are the foundations for many product offerings and innovations you will see during the whole week. We helped bring seamless kernel update features (Kexec HandOver) to the Linux kernel, improved networking paths for AI platforms, strengthened container orchestration and security efforts, and shared engineering insights with global communities and conferences.

This work reflects Microsoft's long-standing commitment to open source, grounded in active upstream participation and close collaboration with partners across the ecosystem. Our engineers work side-by-side with maintainers, Linux distro partners, and silicon providers to ensure contributions land where they help the most, from kernel updates to improvements that support new silicon platforms.

Linux Kernel Contributions

Enabling Seamless Kernel Updates: Persistent uptime for critical services is a top priority. This year, Microsoft engineer Mike Rapoport successfully merged Kexec HandOver (KHO) into Linux 6.16.
KHO is a kernel mechanism that preserves memory state across a reboot (kexec), allowing systems to carry over important data when loading a new kernel. In practice, this means Microsoft can apply security patches or kernel updates to the Azure platform and customer VMs without rebooting, or with significantly reduced downtime. It's a technical achievement with real impact: cloud providers and enterprises can update Linux on the fly, enhancing security and reliability for services that demand continuous availability.

Optimizing Network Drivers for AI Scale: Massive AI models require massive bandwidth. Working closely with our partners deploying large AI workloads on Azure, LSG engineers delivered a breakthrough in Linux networking performance. The team rearchitected the receive path of the MANA network driver (used by our smart NICs) to eliminate wasted memory and enable recycling of buffers:

- 2x higher effective network throughput on 64 KB page systems
- 35% better memory efficiency for RX buffers
- 15% higher throughput and roughly half the memory use even on standard x86_64 VMs

References

- MANA RX optimization patch: net: mana: Use page pool fragments for RX buffers (LKML)
- Linux Plumbers 2025 talk: Optimizing traffic receive (RX) path in Linux kernel MANA Driver for larger PAGE_SIZE systems

Improving Reliability for Cloud Networking: In addition to raw performance, reliability got a boost. One critical fix addressed a race condition in the Hyper-V hv_netvsc driver that sometimes caused packet loss when a VM's network channel initialized. By patching this upstream, we improved network stability for all Linux guests running on Hyper-V, keeping customer VMs running smoothly during dynamic operations like scale-out or live migrations. Our engineers also upstreamed numerous improvements to Hyper-V device drivers (covering storage, memory, and general virtualization). We fixed interrupt handling bugs, eliminated outdated patches, and resolved issues affecting ARM64 architectures.
Each of these fixes was contributed to the mainline kernel, ensuring that any Linux distribution running on Hyper-V or Azure benefits from the enhanced stability and performance.

References

- Upstream fix for the hv_netvsc race on early receive events: kernel.org commit referenced by the Ubuntu bug on Launchpad
- Ubuntu Azure backport write-up: Bug 2127705 - hv_netvsc: fix loss of early receive events from host during channel open (Launchpad)
- Older background on hv_netvsc packet-loss issues: kernel.org bug 81061

Strengthening Core Linux Infrastructure: Several of our contributions targeted fundamental kernel subsystems that all Linux users rely on. For example, we led significant enhancements to the Virtual File System (VFS) layer, reworking how Linux handles process core dumps and expanding file management capabilities. These changes improve how Linux handles files and memory under the hood, benefiting scenarios from large-scale cloud storage to local development. We also continued upstream efforts to support advanced virtualization features. Our team is actively upstreaming the mshv_vtl driver (for managing secure partitions on Hyper-V) and improving Linux's compatibility with nested virtualization on Azure's Microsoft Hypervisor (MSHV). All this low-level work adds up to a more robust and feature-rich kernel for everyone.

References

- Example VFS coredump work: split file coredumping into coredump_file()
- mshv_vtl driver patchset: Drivers: hv: Introduce new driver - mshv_vtl (v10) and the v12 patch series on patchew

Bolstering Linux Security in the Cloud: Security has been a major thread across our upstream contributions. One focus area is making container workloads easier to verify and control. Microsoft engineers proposed an approach for code integrity in containers built on containerd's EROFS snapshotter, shared as an open RFC in the containerd project on GitHub.
The idea is to use read-only images plus integrity metadata so that container file systems can be measured and checked against policy before they run. We also engaged deeply with industry partners on kernel vulnerability handling. Through the Cloud-LTS Linux CVE workgroup, cloud providers and vendors collaborate in the open on a shared analysis of Linux CVEs. The group maintains a public repository that records how each CVE affects various kernels and configurations, which helps reduce duplicated triage work and speeds up security responses. On the platform side, our engineers contributed fixes to the OP-TEE secure OS used in trusted execution and secure-boot scenarios, making sure that the cryptographic primitives required by Azure's Linux boot flows behave correctly across supported devices. These changes help ensure that Linux verified boot chains remain reliable on Azure hardware.

References

- containerd RFC: Code Integrity for OCI/containerd Containers using erofs-snapshotter (GitHub)
- Cloud-LTS public CVE analysis repo: cloud-lts/linux-cve-analysis
- Linux CVE workgroup session at Linux Plumbers 2025: Linux CVE workgroup
- OP-TEE project docs: OP-TEE documentation

Developer Tools & Experience

Smoother OS Management with Systemd: Ensuring Linux works seamlessly at Azure scale. The core init system, systemd, saw important improvements from our team this year. LSG contributed and merged upstream support for disk quota controls in systemd services. With new directives (like StateDirectoryQuota and CacheDirectoryQuota), administrators can easily enforce storage limits for service data, which is especially useful in scenarios like IoT devices with eMMC storage on Azure's custom SoCs. In addition, our team added an auto-reload feature to systemd-journald, allowing log configuration changes to apply at runtime without restarting the logging service.
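As a sketch of what the quota directives enable - the service binary, paths, and sizes here are illustrative, and the exact accepted value syntax should be checked against systemd.exec(5) for your systemd version:

```ini
# Example unit bounding its on-disk state and cache with the new
# quota directives. ExecStart and the sizes are made-up values.
[Unit]
Description=Telemetry agent with bounded on-disk footprint

[Service]
ExecStart=/usr/bin/telemetry-agent
StateDirectory=telemetry
StateDirectoryQuota=200M
CacheDirectory=telemetry
CacheDirectoryQuota=50M

[Install]
WantedBy=multi-user.target
```

On constrained eMMC devices this keeps a misbehaving service from filling the disk, without deploying any external quota tooling.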
These improvements, now part of upstream systemd, help Azure and other Linux environments perform updates or maintenance with minimal disruption to running services, and roll out configuration updates with less impact on running workloads.

References

- systemd quota directives: systemd.exec(5) - StateDirectoryQuota and related options
- systemd journald reload behavior: systemd-journald.service(8)

Empowering Linux Quality at Scale: Running Linux on Azure at global scale requires extensive, repeatable testing. Microsoft continues to invest in LISA (Linux Integration Services Automation), an open-source framework that validates Linux kernels and distributions on Azure and other Hyper-V-based environments. Over the past year we expanded LISA with:

- New stress tests for rapid reboot sequences to catch elusive timing bugs
- Better failure diagnostics to make complex issues easier to root-cause
- Extended coverage for ARM64 scenarios and technologies like InfiniBand networking
- Integration of Azure VM SKU metadata and policy checks so that image validation can automatically confirm conformance to Azure requirements

These changes help us qualify new kernels, distributions, and VM SKUs before they are shipped to customers. Because LISA is open source, partners and Linux vendors can run the same tests and share results, which raises quality across the ecosystem.

References

- LISA GitHub repo: microsoft/lisa
- LISA documentation: Welcome to Linux Integration Services Automation

Community Engagement and Leadership

Sharing Knowledge Globally: Open-source contribution is not just about code - it's about people and knowledge exchange. Our team members took active roles in community events worldwide, reflecting Microsoft's growing leadership in the Linux community.
We were proud to be a Platinum Sponsor of the inaugural Open Source Summit India 2025 in Hyderabad, where LSG engineers served on the program committee and hosted technical sessions. At Linux Security Summit Europe 2025, Microsoft's security experts shaped the agenda as program committee members, delivered talks (such as "The State of SELinux"), and even led panel discussions alongside colleagues from Intel, Arm, and others. And in Paris at Kernel Recipes 2025, our own SMEs shared kernel insights with fellow developers. By engaging in these events, Microsoft not only contributes code but also helps guide the conversation on the future of Linux. These relationships and public interactions build mutual trust and ensure that we remain closely aligned with community priorities.

References

- Event: Open Source Summit India 2025 - Linux Foundation
- Paul Moore's talk archive: LSS-EU 2025
- Conference: Kernel Recipes 2025 and the Kernel Recipes 2025 schedule

Closing Thoughts

Microsoft's long-term commitment to open source remains strong, and the Linux Systems Group will continue contributing upstream, collaborating across the industry, and supporting the upstream communities that shape the technologies we rely on. Our work begins in upstream projects such as the Linux kernel, Kubernetes, and systemd, where improvements are shared openly before they reach Azure. The progress highlighted in this blog was made possible by the wider Linux community, whose feedback, reviews, and shared ideas help refine every contribution. As we move ahead, we welcome maintainers, developers, and enterprise teams to engage with our projects, offer input, and collaborate with us. We will continue contributing code, sharing knowledge, and supporting the open-source technologies that power modern computing, working with the community to strengthen the foundation and shape a future that benefits everyone.
References & Resources:

- Microsoft's Open-Source Journey - Azure Blog: https://techcommunity.microsoft.com/t5/linux-and-open-source-blog/linux-and-open-source-on-azure-quarterly-update-february-2025/ba-p/4382722
- Cloud Hypervisor Project
- Rust-VMM Community
- Microsoft LISA (Linux Integration Services Automation) Repository
- Cloud-LTS Linux CVE Analysis Project
Microsoft Ignite 2025 is almost here, and we're heading back to San Francisco from November 17-21, with a full digital experience for those joining online. Every year, Ignite brings together IT pros, developers, security teams, and technology leaders from around the world to explore the future of cloud, AI, and infrastructure. This year, Linux takes center stage in a big way. From new security innovations in Azure Linux to deeper AKS modernization capabilities and hands-on learning opportunities, Ignite 2025 is packed with content for anyone building, running, or securing Linux-based workloads in Azure. Below is your quick guide to the biggest Linux announcements and the must-see sessions.

Major Linux Announcements at Ignite 2025

Public Preview: Built-in CIS Benchmarks for Azure Endorsed Linux Distributions

CIS Benchmarks are now integrated directly into Azure Machine Configuration, giving you automated and customizable compliance monitoring across Azure, hybrid, and on-prem environments. This makes it easier to continuously govern your Linux estate at scale with no external tooling required.

Public Preview: Azure Linux OS Guard

Azure Linux OS Guard introduces a hardened, immutable Linux container host for AKS with FIPS mode enforced by default, a reduced attack surface, and tight AKS integration. It is ideal for highly regulated or sensitive workloads and brings stronger default security with less operational complexity.

General Availability: Pod Sandboxing for AKS (Kata Containers)

Pod Sandboxing with fully managed Kata Containers is now GA, delivering VM-level isolation for AKS workloads. This provides stronger separation of CPU, memory, and networking and is well-suited for multi-tenant applications or organizations with strict compliance boundaries.

Linux Sessions at Ignite

Whether you are optimizing performance, modernizing with containers, or exploring new security scenarios, there is something for every Linux practitioner.
Breakout Sessions

| Session Code | Session Name | Date and Time (PST) |
| --- | --- | --- |
| BRK143 | Optimizing performance, deployments, and security for Linux on Azure | Thu Nov 20, 1:00 PM to 1:45 PM |
| BRK144 | Build, modernize, and secure AKS workloads with Azure Linux | Wed Nov 19, 1:30 PM to 2:15 PM |
| BRK104 | From VMs and containers to AI apps with Azure Red Hat OpenShift | Thu Nov 20, 8:30 AM to 9:15 AM |
| BRK137 | Nasdaq Boardvantage: AI-driven governance on PostgreSQL and AI Foundry | Wed Nov 19, 11:30 AM to 12:15 PM |

Theatre Sessions

| Session Code | Session Name | Date and Time (PST) |
| --- | --- | --- |
| THR712 | Hybrid workload compliance from policy to practice on Azure | Tue Nov 18, 3:15 PM to 3:45 PM |
| THR701 | From Container to Node: Building Minimal-CVE Solutions with Azure Linux | Wed Nov 19, 3:30 PM to 4:00 PM |

Hands-on Lab

Lab 505: Fast track your Linux and PostgreSQL migration with Azure Migrate

- Tue Nov 18, 4:30 PM to 5:45 PM
- Wed Nov 19, 3:45 PM to 5:00 PM
- Thu Nov 20, 9:00 AM to 10:15 AM

This interactive lab helps you assess, plan, and execute Linux and PostgreSQL migrations at scale using Azure Migrate's end-to-end tooling.

Meet the Linux on Azure Team at Ignite

If you are attending in person, come say hello. Visit the Linux on Azure Expert Meetup stations inside the Microsoft Hub. You can ask questions directly to Microsoft's Linux engineering and product experts, explore demos across Azure Linux, compliance, and migration, and get recommendations tailored to your workloads. We always love meeting customers and partners.
Build once, deploy everywhere. From a single YAML specification, Dalec produces native Linux packages (RPM, DEB) and container images - no Dockerfiles, no complex RPM spec or control files, just declarative configuration. Dalec, a Cloud Native Computing Foundation (CNCF) Sandbox project, is a Docker BuildKit frontend that enables users to build system packages and container images from declarative YAML specifications. As a BuildKit frontend, Dalec integrates directly into the Docker build process, requiring no additional tools beyond Docker itself.496Views0likes0Comments
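A spec is plain YAML. The sketch below follows the general shape described in the Dalec documentation (metadata, sources, build steps, artifacts), but treat the field names and the frontend image tag as things to verify against the current docs; the repository URL is a placeholder:

```yaml
# syntax=ghcr.io/azure/dalec/frontend:latest
# Illustrative Dalec spec sketch - verify schema details against the
# Dalec documentation; the source URL below is a placeholder.
name: hello-dalec
version: 0.1.0
revision: 1
license: MIT
description: Example package and image built declaratively with Dalec
packager: Example Maintainers

sources:
  src:
    git:
      url: https://github.com/example/hello.git
      commit: v0.1.0

dependencies:
  build:
    gcc: {}
    make: {}

build:
  steps:
    - command: make -C src

artifacts:
  binaries:
    src/hello: {}
```

From a single spec like this, `docker build` can emit an RPM, a DEB, or a container image depending on the build target selected, since Dalec runs as a BuildKit frontend inside the normal Docker build flow.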