observability
2 TopicsHow Microsoft Ensures the Quality of Linux VM Images and Platform Experiences on Azure?
In the continuously evolving landscape of cloud computing and AI, the quality and reliability of virtual machines (VMs) plays vital role for businesses running mission-critical workloads. With over 65% of Azure workloads running Linux our commitment to delivering high-quality Linux VM images and platforms remains unwavering. This involves overcoming unique challenges and implementing rigorous validation processes to ensure that every Linux VM image offered on Azure meets the high standards of quality and reliability. Ensuring the quality of Linux images and the overall platform experience on Azure involves addressing the challenges posed by a unique platform stack and the complexity of managing and validating multiple independent release cycles. High-quality Linux VMs are essential for ensuring consistent performance, minimizing downtime and regressions, and enhancing security by addressing vulnerabilities with timely updates. Figure 1: Complexity of Linux VMs in Azure VM Image Updates: Azure's Marketplace offers a diverse array of Linux distributions, each maintained by its respective publishers. These distributions release updates on their own schedules, independent of Azure's infrastructure updates. Package Updates: Within each Linux distribution, numerous packages are maintained and updated separately, adding another layer of complexity to the update and validation process. Extension and Agent Updates: Azure provides over 75+ guest VM extensions to enhance operating system capabilities, security, recovery etc. These extensions are updated independently, requiring careful validation to ensure compatibility and stability. Azure Infrastructure Updates: Azure regularly updates its underlying infrastructure, including components like Azure Boost, to improve reliability, performance, and security. VM SKUs and Sizes: Azure provides thousands of VM sizes with various combinations of CPU, memory, disk, and network configurations to meet diverse customer needs. Managing concurrent updates across all VMs poses significant QA challenges. To address this, Azure uses rigorous testing, gating and validation processes to ensure all components function reliably and meet customer expectations. Azure’s Approach to Overcoming Challenges To address these challenges, we have implemented a comprehensive validation strategy that involves testing at every stage of the image and kernel lifecycle. By adopting a shift-left approach, we execute Linux VM-specific test cases as early as possible. This strategy helps us catch failures close to the source of changes before they are deployed to Azure fleet. Our validation gates integrate with various entry points and provide coverage for a wide variety of scenarios on Azure. Upstream Kernel Validation: As a founding member of Kernel CI, Microsoft validates commits from Linux next and stable trees using Linux VMs in Azure and shares results with the community via Kernel CI DB. This enables us to detect regressions at early stages. Azure-Tuned Kernel Validation: Azure-Tuned Kernels provided by our endorsed distribution partners are thoroughly validated and signed off by Microsoft before it is released to the Azure fleet. Linux Guest Image Validation: The quality team works with endorsed distribution partners for major releases to conduct thorough validation. Each refreshed image, including those from third-party publishers, is validated and certified before being added to the marketplace. Automated pipelines are in place to validate the images once they are available in the Marketplace. Package Validation: Unattended Update: We conduct validation of packages updates with target distro to prevent regression and ensure that only tested snapshots are utilized for updating Linux VM in Azure. Guest Extension Validation: Every Azure-provided extensions undergoes Basic Validation Testing (BVT) across all images and kernel versions to ensure compatibility and functionality amidst any changes. Additionally, comprehensive release testing is conducted for major releases to maintain reliability and compatibility. New VM SKU Validation: Any new VM SKU undergoes validation to confirm it supports Linux before its release to the Azure fleet. This process includes functionality, performance and stress testing across various Linux distributions, and compatibility tests with existing Linux images in the fleet. Azure HostOS & Host Agent Validation: Updates to the Azure Host OS & Agents are thoroughly tested from the Linux guest OS perspective to confirm that changes in the Azure host environment do not result in regressions in compatibility, performance, or stability for Linux VMs. At any stage where regressions or bugs are identified, we block those releases to ensure they never reach customers. All issues are resolved and rigorously retested before images, kernels, or extension updates are made available. Through these robust validation processes, Azure ensures that Linux VMs consistently deliver to customer expectations, delivering a reliable, secure, and high-performance environment for mission-critical workloads. Validation Tools for VM Guest Images and Kernel To ensure the quality and reliability of Linux VM images and kernels on Azure, we leverage open-source kernel testing frameworks like LTP, kselftest, and fstest, along with extensive Azure-specific test cases available in LISA, to comprehensively validate all aspects of the platforms. LISA (Linux Integration Services Automation): Microsoft is committed to open source and that is no different with our testing framework LISA. LISA is an open-source core testing framework designed to meet all Linux validation needs. It includes over 400 tests covering performance, features and security, ensuring comprehensive validation of Linux images on Azure. By automating diverse test scenarios, LISA enables early detection and resolution of issues, enhancing the stability and performance of Linux VMs. Conclusion At Azure, Linux quality is a fundamental aspect of our commitment to delivering reliable VM images and platforms. Through comprehensive testing and strong collaboration with Linux distribution partners, we ensure quality and reliability of VMs while proactively identifying and resolving potential issues. This approach allows us to continually refine our processes and maintain the quality that customers expect from Azure. Quality is a core focus, and we remain dedicated to continuous improvement, delivering world-class Linux environments to businesses and customers. For us, quality is not just a priority—it’s our standard. Your feedback is invaluable, and we would greatly appreciate your insights.435Views0likes0CommentsEnhancing Observability with Inspektor Gadget
Thorough observability is essential to a pain free cloud experience. Azure provides many general-purpose observability tools, but you may want to create custom tooling . Inspektor Gadget is an open-source framework that makes customizable data collection easy. Microsoft recently contributed new features to Inspektor Gadget that further enhance its modular framework, making it even easier to meet your specific systems inspection needs. Of course, we also made it easy for Azure Kubernetes Service (AKS) users to use.953Views0likes0Comments