virtual machines
Announcing preview of new Azure Dasv7, Easv7, Fasv7-series VMs based on AMD EPYC™ ‘Turin’ processor
Today, Microsoft is announcing preview of the new Azure AMD-based Virtual Machines (VMs), powered by 5th Generation AMD EPYC™ (Turin) processors. The preview includes general purpose (Dasv7 & Dalsv7 series), memory-optimized (Easv7 series) and compute-optimized (Fasv7, Falsv7, Famsv7 series) VMs, available with and without local disks. These VMs are in preview in the following Azure regions: East US 2, North Europe, and West US 3. To request access to the preview, please fill out the Preview-Signup.

The latest Azure AMD-based VMs deliver significant enhancements over the previous generation (v6) AMD-based VMs: improved CPU performance, greater scalability, and expanded configuration options to meet the needs of a wide range of workloads. Key improvements include:

- Up to 35% CPU performance improvement compared to equivalent sized (v6) AMD-based VMs.
- Significant performance gains on other workloads:
  - Up to 25% for Java-based workloads
  - Up to 65% for in-memory cache applications
  - Up to 80% for crypto workloads
  - Up to 130% for web server applications
- Maximum boost CPU frequency of 4.5 GHz, enabling faster operations for compute-intensive workloads.
- Expanded VM sizes: Dasv7-series, Dalsv7-series and Easv7-series now scale up to 160 vCPUs. Fasv7-series supports up to 80 vCPUs, with a new 1-core size.
- Increased memory capacity: Dasv7-series now offers up to 640 GiB of memory. Easv7-series scales up to 1280 GiB and is ideal for memory-intensive applications.
- Enhanced remote storage performance: VMs offer up to 20% higher IOPS and up to 50% greater throughput compared to similar sized previous generation (v6) VMs.
- New VM families introduced: Fadsv7, Faldsv7, and Famdsv7 are now available with local disk support.
- Expanded constrained-core offerings: New constrained-core sizes for Easv7 and Famsv7, available with and without local disks, helping to optimize licensing costs for core-based software licensing.

These enhancements make these latest VMs a compelling choice for customers seeking high performance, cost efficiency, and workload flexibility on Azure. Additionally, these VMs leverage the latest Azure Boost technology to enhance performance and security. The new VMs utilize the Microsoft Azure Network Adapter (MANA), a next-generation network interface that provides stable, forward-compatible drivers for Windows and Linux operating systems. These VMs also support the NVMe protocol for both local and remote disks.

The 5th Generation AMD EPYC™ processor family, based on the newest ‘Zen 5’ core, provides enhanced capabilities for these new Azure AMD-based VM series, such as AVX-512 with a full 512-bit data path for vector and floating-point operations, higher memory bandwidth, and improved instructions per clock compared to the previous generation. These updates provide increased throughput and the ability to scale for compute-intensive tasks like AI and machine learning, scientific simulations, and financial analytics, among others. AMD Infinity Guard hardware-based security features, such as Transparent Secure Memory Encryption (TSME), continue in this generation to ensure sensitive information remains secure.

These VMs support three memory (GiB)-to-vCPU ratios: 2:1 (Dalsv7-series, Daldsv7-series, Falsv7-series and Faldsv7-series), 4:1 (Dasv7-series, Dadsv7-series, Fasv7-series and Fadsv7-series), and 8:1 (Easv7-series, Eadsv7-series, Famsv7-series and Famdsv7-series). The Dalsv7-series VMs are ideal for workloads that require less RAM per vCPU, which can reduce costs when running non-memory-intensive applications such as web servers, video encoding, and batch processing. The Dasv7-series VMs work well for many general computing workloads, such as e-commerce systems, web front ends, desktop virtualization solutions, customer relationship management applications, entry-level and mid-range databases, application servers, and more.
The Easv7-series VMs are ideal for workloads such as memory-intensive enterprise applications, data warehousing, business intelligence, in-memory analytics, and financial transactions. The new Falsv7-series, Fasv7-series and Famsv7-series VMs do not have Simultaneous Multithreading (SMT), meaning a vCPU equals a full core, which makes these VMs well-suited for compute-intensive workloads needing the highest CPU performance, such as scientific simulations, financial modeling and risk analysis, gaming, and more. In addition to the standard sizes, the latest VM series are available in constrained-core sizes, with vCPU count constrained to one-half or one-quarter of the original VM size, giving you the flexibility to select the core and memory configuration that best fits your workloads.

In addition to the new VM capabilities, the previously announced Azure Integrated HSM (Hardware Security Module) will be in preview soon with the latest Azure AMD-based VMs. Azure Integrated HSM is an ephemeral HSM cache that enables secure key management within Azure virtual machines by ensuring that cryptographic keys remain protected inside a FIPS 140-3 Level 3-compliant boundary throughout their lifecycle. To explore this new feature, please sign up using the form provided below.

These latest Azure AMD-based VMs will be charged during preview; pricing information will be shared with access to the VMs. Eligible new Azure customers can sign up for a free account and receive a $200 Azure credit. The new VMs support all remote disk types. To learn more about the disk types and their regional availability, please refer to Azure managed disk type. Disk storage is billed separately from virtual machines. You can learn more about these latest Azure AMD-based VMs by visiting the specification pages at Dasv7-series, Dadsv7-series, Dalsv7-series, Daldsv7-series, Easv7-series, Eadsv7-series, Fasv7-series, Fadsv7-series, Falsv7-series, Faldsv7-series, Famsv7-series and Famdsv7-series.
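As a rough illustration of the memory-to-vCPU ratios and constrained-core sizes described above, here is a minimal Python sketch. The ratios are the ones listed in the announcement; the function names are hypothetical, and the sketch assumes every size follows its series' nominal ratio exactly and that a constrained-core size keeps the memory of its parent size, so check the specification pages for real values.

```python
# Nominal GiB of memory per vCPU for each series, as listed in the announcement.
MEMORY_GIB_PER_VCPU = {
    "Dalsv7": 2, "Daldsv7": 2, "Falsv7": 2, "Faldsv7": 2,
    "Dasv7": 4, "Dadsv7": 4, "Fasv7": 4, "Fadsv7": 4,
    "Easv7": 8, "Eadsv7": 8, "Famsv7": 8, "Famdsv7": 8,
}


def nominal_memory_gib(series: str, vcpus: int) -> int:
    """Memory implied by the series ratio (assumption: the ratio holds exactly)."""
    return MEMORY_GIB_PER_VCPU[series] * vcpus


def constrained_core_options(series: str, base_vcpus: int) -> list:
    """Half- and quarter-vCPU variants of a base size.

    Assumption: memory stays at the parent size's value, so the
    effective memory-per-vCPU ratio doubles or quadruples.
    """
    memory = nominal_memory_gib(series, base_vcpus)
    return [
        {"vcpus": base_vcpus // divisor, "memory_gib": memory}
        for divisor in (2, 4)
    ]


if __name__ == "__main__":
    print(nominal_memory_gib("Dasv7", 160))   # matches the stated 640 GiB maximum
    print(nominal_memory_gib("Easv7", 160))   # matches the stated 1280 GiB maximum
    print(constrained_core_options("Easv7", 32))
```

The two maximums quoted in the announcement (640 GiB for Dasv7 and 1280 GiB for Easv7 at 160 vCPUs) are consistent with the 4:1 and 8:1 ratios, which is why the sketch uses them as a sanity check.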
The latest Azure AMD-based VMs provide options for your wide range of computing needs. Explore the new VMs today and discover how these VMs can enhance your workload performance and lower your costs. To request access to the preview, please fill out the Preview-Signup form. Have questions? Please reach us at Azure Support and our experts will be there to help you with your Azure journey.

Announcing Preview of New Azure Dnl/Dn/En v6 VMs powered by Intel 5th Gen processor & Azure Boost
We are thrilled to announce the public preview of Azure's first Network Optimized VMs, powered by the latest 5th Gen Intel® Xeon® processor and offering unparalleled performance and flexibility. The network optimized VMs are relevant for workloads such as network virtual appliances, large-scale e-commerce applications, ExpressRoute, Application Gateway, central DNS and monitoring servers, firewalls, media processing tasks that involve transferring large amounts of data quickly, and any workload that must handle a high number of user connections and data transfers.

Network Optimized VMs enhance networking performance by providing hardware acceleration for initial connection setup for certain traffic types, a task previously performed in software. These VMs will have lower end-to-end latency when initially establishing a connection or initial packet flow, and will allow a VM to scale up the number of connections it manages more quickly.

These Intel-based VMs come with three different memory-to-core ratios and offer options with and without local SSD across the VM families: Dnsv6, Dndsv6, Dnlsv6, Dnldsv6, Ensv6 and Endsv6 series. There are 55 VM sizes in total, ranging from 2 to 192 vCPUs and up to 1.8 TB of memory. The new Network Optimized VMs have higher network bandwidth per vCPU, more vNICs per vCPU, and more connections per second.
What’s New

Compared to the current Intel Dl/D/Ev6 VMs, the network optimized VMs have:

- Up to 3x improvement in network bandwidth per vCPU
- 2x vNIC allocation on smaller vCPU sizes
- Up to 200 Gbps VM network bandwidth
- Up to 8x improvement in connections per second (CPS) across sizes
- Up to 192 vCPUs and more than 1,800 GiB of memory
- Azure Boost, which enables:
  - Up to 400k IOPS and 12 GB/s remote storage throughput
  - Up to 200 Gbps VM network bandwidth
  - NVMe interface for local and remote disks
- Enhanced security through Total Memory Encryption (TME) technology

Customers are excited about the new Azure Dnl/Dn/Ensv6 VMs

“Palo Alto Networks, the global cybersecurity leader, is working with Microsoft to bring best-in-class Network Virtual Appliance performance capabilities to their customers. As the performance needs of customers on Azure continue to grow, innovations like Network Optimized VMs, Azure Boost, and Microsoft Azure Network Adapter (MANA) technology will help ensure that both our VM Series network virtual appliance and Cloud NGFW, our Azure native firewall service, can scale efficiently and cost-effectively,” said Rich Campagna, SVP Products, Palo Alto Networks. “We look forward to continuing our partnership with Microsoft to bring these innovations to life.”

General Purpose Workloads - Dnlsv6, Dnldsv6, Dnsv6, Dndsv6

The new Network Optimized Dnlsv6-series and Dnsv6-series VMs offer a balance of memory to CPU performance with increased scalability of up to 128 vCPUs and 512 GiB of RAM. Below is an overview of the specifications offered by the Dnlsv6-series and Dnsv6-series VMs.
| Series | vCPU | vNIC | Network Bandwidth (Gbps) | CPS | Memory (GiB) | Local Disk (GiB) | Max Data Disks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dnlsv6-series | 2 – 128 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 4 – 256 | n/a | 8 – 64 |
| Dnldsv6-series | 2 – 128 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 4 – 256 | 110 – 7,040 | 8 – 64 |
| Dnsv6-series | 2 – 128 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 8 – 512 | n/a | 8 – 64 |
| Dndsv6-series | 2 – 128 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 8 – 512 | 110 – 7,040 | 8 – 64 |

Memory Intensive Workloads - Ensv6 and Endsv6

The new Network Optimized Ensv6-series and Endsv6-series virtual machines are ideal for memory-intensive workloads, offering up to 192 vCPUs and 1.8 TiB of RAM. Below is an overview of specifications offered by the Ensv6-series and Endsv6-series VMs.

| Series | vCPU | vNIC | Network Bandwidth (Gbps) | CPS | Memory (GiB) | Local Disk (GiB) | Max Data Disks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ensv6-series | 2 – 128 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 16 – >1800 | n/a | 8 – 64 |
| Endsv6-series | 2 – 192 | 4 – 15 | 25.0 – 200.0 | 30K – 400K | 16 – >1800 | 110 – 10,560 | 8 – 64 |

The Dnlv6, Dnv6, and Env6-series Azure Virtual Machines will offer options with and without local disk storage. These VMs are also compatible with remote persistent disk options including Premium SSD, Premium SSD v2, and Ultra Disk.

Join the Preview

Dnlv6, Dnv6, and Env6 series VMs are now available for preview in East US. VMs above 96 vCPUs and the VM series with local disk will be supported later in the preview. To request access to the preview, please fill out the survey form here. We look forward to hearing from you.

Announcing General Availability of Azure Da/Ea/Fasv7-series VMs based on AMD ‘Turin’ processors
Today, Microsoft is announcing the general availability of Azure’s new AMD-based Virtual Machines (VMs) powered by 5th Gen AMD EPYC™ (Turin) processors. These VMs include general-purpose (Dasv7, Dalsv7), memory-optimized (Easv7), and compute-optimized (Fasv7, Falsv7, Famsv7) series, available with or without local disks. Azure’s latest AMD-based VMs offer faster CPU performance, greater scalability, and flexible configurations, making them the ideal choice for high performance, cost efficiency, and diverse workloads.

Key improvements include up to 35% better CPU performance and price-performance compared to equivalent v6 AMD-based VMs. Workload-specific gains are significant: up to 25% for Java applications, up to 65% for in-memory cache applications, up to 80% for crypto workloads, and up to 130% for web server applications, to name a few.

Dalsv7-series VMs are cost-efficient for low-memory workloads like web servers, video encoding, and batch processing. Dasv7-series VMs suit general computing tasks such as e-commerce, web front ends, virtualization, customer relationship management (CRM) applications, and entry to mid-range databases. Easv7-series VMs target memory-heavy workloads like enterprise applications, data warehousing, business intelligence, in-memory analytics and more. Falsv7-, Fasv7-, and Famsv7-series VMs deliver full-core performance without Simultaneous Multithreading (SMT) for compute-intensive tasks like scientific simulations, financial modeling, gaming and more.

You can now choose constrained-core VM sizes, which reduce the vCPU total by 50% or 75% while maintaining the other resources. Dasv7, Dalsv7, and Easv7 VMs now scale up to 160 vCPUs, an increase from 96 vCPUs in the previous generation. The Fasv7, Falsv7, and Famsv7 VMs, which do not include Simultaneous Multithreading (SMT), support up to 80 vCPUs (up from 64 vCPUs in the prior generation) and introduce a new 1-core option.
These VMs offer a maximum boost CPU frequency of up to 4.5 GHz for faster compute-intensive operations. The new VMs deliver increased memory capacity (up to 640 GiB for Dasv7 and 1280 GiB for Easv7), making them ideal for memory-intensive workloads. They also support three memory (GiB)-to-vCPU ratios: 2:1 (Dalsv7-series, Daldsv7-series, Falsv7-series and Faldsv7-series), 4:1 (Dasv7-series, Dadsv7-series, Fasv7-series and Fadsv7-series), and 8:1 (Easv7-series, Eadsv7-series, Famsv7-series and Famdsv7-series).

Remote storage performance improves with up to 20% higher IOPS and up to 50% greater throughput, while local storage offers up to 55% higher throughput. Network performance is also enhanced by up to 75% compared to corresponding D-series and E-series v6 VMs. The new VM series Fadsv7, Faldsv7, and Famdsv7 introduce local disk support. The new VMs leverage Azure Boost technology to enhance performance and security, utilize the Microsoft Azure Network Adapter (MANA), and support the NVMe protocol for both local and remote disks.

The 5th Generation AMD EPYC™ processor family, based on the newest ‘Zen 5’ core, provides enhanced capabilities for these new Azure AMD-based VM series, such as AVX-512 with a full 512-bit data path for vector and floating-point operations, higher memory bandwidth, and improved instructions per clock compared to the previous generation. These updates provide the ability to handle compute-intensive tasks for AI and machine learning, scientific simulations, and financial analytics, among others. AMD Infinity Guard hardware-based security features, such as Transparent Secure Memory Encryption (TSME), continue in this generation to ensure sensitive information remains secure.

These VMs are available in the following Azure regions: Australia East, Central US, Germany West Central, Japan East, North Europe, South Central US, Southeast Asia, UK South, West Europe, West US 2, and West US 3.
The large 160 vCPU Easv7-series and Eadsv7-series sizes are available in North Europe, South Central US, West Europe, and West US 2. More regions are coming in 2026. Refer to Product Availability by Region for the latest information.

Our customers have shared the benefits they’ve observed with these new VMs:

“Elastic enables customers to drive innovation and cost-efficiency with our observability, security, and search solutions on Azure. In our testing, Azure’s latest Daldsv7 VMs provided up to 13% better indexing throughput compared to previous generation Daldsv6 VMs, and we are looking forward to the improved performance for Elasticsearch users deploying on Azure.”
– Yuvraj Gupta, Director, Product Management, Elastic

“The Easv7 series of Azure VMs offers a balanced mix of CPU, memory, storage, and network performance that suits the majority of Oracle Database configurations very well. The 80 Gbps network with the jumbo frame capability is especially helpful for efficient operation of FlashGrid Cluster with Oracle RAC on Azure VMs.”
– Art Danielov, CEO, FlashGrid

“Our analysis indicates that Azure’s new AMD based v7 series Virtual Machines demonstrate significantly higher performance compared to the v6 series, particularly in single-thread ratings. This advancement is highly beneficial, as several of our critical applications, such as ArcGIS Enterprise, are single-threaded and CPU-bound. Consequently, these faster v7 series VMs have resulted in improved performance with the same number of users, evidenced by lower server utilization and faster client-side response times.”
– Thomas Buchmann, Senior Cloud Architect, VertiGIS

Here’s what our technology partners are saying:

“AMD and Microsoft have built one of the industry’s most successful cloud partnerships, bringing over 60 VM series to market through years of deep engineering collaboration.
With the new v7 Azure VMs powered by 5th Gen AMD EPYC processors, we’re setting a new benchmark for performance, efficiency, and scalability, giving customers the proven, leadership compute they expect from AMD in the world’s most demanding cloud environments.”
– Steve Berg, Corporate Vice President and General Manager of the Server CPU Cloud Business Group at AMD

“Our collaboration with Microsoft continues to empower developers and enterprises alike. The new AMD based v7-series VMs on Azure offer a powerful foundation for the full spectrum of modern workloads, from development to production AI/ML pipelines. We are excited to support this launch, ensuring every user gets a seamless experience on Ubuntu, with the enterprise security and long-term stability of Ubuntu Pro available for their most critical systems.”
– Jehudi Castro-Sierra, Public Cloud Alliances Director

“The new Azure Da/Ea/Fa v7-series AMD Turin-based instances running SUSE Linux Enterprise Server provide a significant performance uplift during initial tests. They show an impressive 20%-40% increase with typical Linux kernel compilation tasks compared to the same instance sizes of the v6 series. This demonstrates the enhanced capabilities the v7 series brings to our joint customers seeking maximum efficiency and performance for their critical applications.”
– Peter Schinagl, Sr. Technical Architect, SUSE

You can learn more about these latest Azure AMD-based VMs by visiting the specification pages at Dasv7-series, Dadsv7-series, Dalsv7-series, Daldsv7-series, Easv7-series, Eadsv7-series, Fasv7-series, Fadsv7-series, Falsv7-series, Faldsv7-series, Famsv7-series, Famdsv7-series, and constrained-core sizes. For pricing details, visit the Azure Virtual Machines pricing page. These VMs support all remote disk types. See Azure managed disk type for additional details. Disk storage is billed separately.

Azure Integrated HSM (Hardware Security Module) will continue to be in preview with these VMs.
Azure Integrated HSM is an ephemeral HSM cache that enables secure key management within Azure VMs by ensuring that cryptographic keys remain protected inside a FIPS 140-3 Level 3-compliant boundary throughout their lifecycle. To explore this new feature, please sign up using the form. Have questions? Please reach us at Azure Support and our experts will be there to help you with your Azure journey.

Your guide to Azure Compute at Microsoft Ignite 2025
The countdown to Microsoft Ignite 2025 (November 18–21, 2025) is almost over! Whether you’ll be joining us in person or tuning in virtually, this guide is your essential resource for everything Azure Compute. Explore the latest advancements, connect with product experts, and expand your cloud skills through curated sessions and interactive experiences. Attendees will have the opportunity to dive deep into new product capabilities and solutions, including ways to boost virtual machine performance, enhance resiliency, and optimize cloud operations. Be sure to add these sessions to your schedule for a personalized, can’t-miss Ignite experience. Bookmark this guide for quick access to all the latest Azure Compute news and updates throughout Ignite 2025!

Featured sessions

Tuesday

BRK171: What's new and what's next in Azure IaaS
Level: Intermediate 200
In this session, we’ll introduce the latest capabilities across compute, storage, and networking. Uncover the advancements in Azure IaaS, driving performance, resiliency, and cost efficiency. We will present how Azure’s global backbone, enhanced capabilities, and expanding portfolio can support mission-critical, cloud native and AI workloads, while built-in security and flexible tiering help right-size app deployments and accelerate modernization.
Tuesday, November 18, 2:30 PM-3:15 PM PST

Wednesday

BRK430: Inside Azure Innovations with Mark Russinovich
Level: Advanced 300
Join Mark Russinovich, CTO and Technical Fellow of Microsoft Azure. Mark will take you on a tour of the latest innovations in Azure architecture and explain how Azure enables intelligent, modern, and innovative applications at scale in the cloud, on-premises, and on the edge. The session features some of the latest Compute announcements with Azure Boost.
Wednesday, November 19, 2:45 PM PST

Other related IaaS sessions

Use the following as a guide to build your session schedule with an emphasis on our Azure Compute topics.
These sessions will be in person and recorded. Sessions Tuesday-Thursday will be live streamed.

Thursday

BRK176: Driving efficiency and cost optimization for Azure IaaS deployments
Level: Intermediate 200
Control cloud spend without compromising performance. This session shows how Azure IaaS helps IT leaders optimize costs through flexible pricing, built-in tools, and smart resource planning. Learn how to align infrastructure choices with workload requirements, reduce TCO, and make informed decisions that support growth and innovation. You will gain a deeper understanding of how Azure delivers a comprehensive set of services, tools, and financial instruments to optimize your cloud costs at scale.
Thursday, November 20, 9:45 AM PST

BRK217: Resilience by design: Secure, scalable, AI-ready cloud with Azure
Level: Advanced 300
Resiliency is foundational. Explore how resiliency on Azure enables secure, scalable, AI-ready cloud architectures. Learn to set resilience goals, simulate failures, and orchestrate recovery. See live demos and discover how shared responsibility empowers teams to deliver trusted, resilient outcomes.
Thursday, November 20, 1:00 PM PST

BRK178: Architecting for resiliency on Azure Infrastructure
Level: Intermediate 200
Discover how to build resilient cloud solutions on Azure by leveraging availability zones, multi-region deployments, and fungible products. This session explores architectural patterns, platform capabilities, and best practices to ensure high availability, fault tolerance, and business continuity for mission-critical workloads in dynamic and distributed environments.
Thursday, November 20, 1:00 PM PST

BRK148: Architect resilient apps with Azure backup and reliability features
Level: Advanced 300
Learn to use self-serve tools to strengthen zonal resiliency for critical workloads. Assess and validate resilience across VMs, DBs, and containers.
Explore enhanced data and cyber resiliency with immutability and threat detection to guard against ransomware. Discover expanded workload coverage and real-time insights to proactively protect your applications and infrastructure.
Thursday, November 20, 3:30 PM PST

Friday

BRK146: Resiliency and recovery with Azure Backup and Site Recovery
Level: Advanced 300
This session will show how to secure critical workloads, detect threats, and recover quickly across Azure environments using advanced backup and disaster recovery solutions. It covers modern techniques like threat-aware backups, container protection, and seamless disaster recovery to help meet compliance and recovery objectives.
Friday, November 21, 9:00 AM PST

BRK149: Unlock cloud-scale observability and optimization with Azure
Level: Advanced 300
In this session, we'll deep dive into how Azure Monitor delivers end-to-end observability across your cloud and hybrid environments, helping you detect issues early and reduce mean time to recovery. We'll also share how new Copilot in Azure agents can extend this visibility into actionable cost and carbon efficiency insights, helping you identify optimization opportunities, validate recommendations, and streamline resource performance for business impact.
Friday, November 21, 10:15 AM PST

BRK173: Azure IaaS best practices to enhance performance and scale
Level: Advanced 300
Azure IaaS can deliver excellent performance and scalability across a broad range of workloads. With high-throughput storage, low-latency networking, and intelligent auto-scaling, Azure supports demanding apps with precision. Learn how to optimize compute, storage, and network resources to meet performance goals, reduce costs, and scale confidently across global regions. Dive into the latest capabilities that Azure Boost, Compute Fleet, Azure Virtual Machines, Azure Storage and Networking offer.
Friday, November 21, 10:15 AM PST

BRK172: Powering modern cloud workloads with Azure Compute
Level: Advanced 300
Uncover new VM offering announcements and explore innovations like Azure Boost. Dive into the latest compute innovation at the core of Azure IaaS. Whether you're running mission-critical enterprise apps or scaling cloud-native services, discover how these innovations are unlocking new value for customers and get a preview of what’s coming next.
Friday, November 21, 11:30 AM PST

BRK168: Azure IaaS platform security deep dive
Level: Advanced 300
As organizations accelerate their cloud adoption, robust security for your Infrastructure as a Service platform is more critical than ever. This session will provide a comprehensive exploration of Azure’s security architecture, best practices, and innovations across four pillars: foundational security, compute security, network security, and storage security. Attendees will gain actionable insights to strengthen their cloud posture, ensure compliance, and protect sensitive workloads.
Friday, November 21, 11:30 AM PST

Upskill yourself with hands-on labs

Live demos and hands-on labs are available exclusively to in-person attendees, giving them direct, firsthand experience.

Tuesday

LAB500: Attain unified observability and optimization in Azure
Level: Intermediate 200
Get an AI-powered view of your Azure workload health and performance while uncovering cost and carbon savings. In this lab, use AI to investigate anomalies, correlate telemetry, and drive optimization. Apply FinOps and sustainability insights, align health with SLI/SLO targets, and improve monitoring posture for lasting efficiency. Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.
Tuesday, November 18, 2:45 PM PST

LAB520: Start, Get and Stay Resilient with Azure
Level: Intermediate 200
Understand the Start, Get, and Stay Resilient journey. Get equipped with tools & insights to architect mission critical applications with Azure’s Resiliency and Configuration experiences. Assess your resiliency posture, apply recommendations, validate your posture and orchestrate recovery. With the Essentials Machine Management bundle from Azure, manage and maintain the state of your resources, enforce configurations across devices and ensure resilience is not a one-time goal but an ongoing state. Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.
Tuesday, November 18, 4:30 PM PST

Kernel Dump based Online Repair
Introduction

In the ever-evolving landscape of cloud computing, reliability remains paramount. As workloads scale and businesses depend on uninterrupted service, Azure continues to invest in technologies that enhance system resilience and minimize customer impact when failures occur. Azure Compute infrastructure operates at an unmatched scale, with certain Availability Zones (AZs) hosting nearly a million Azure Virtual Machines (Azure VMs) that run customer workloads. These Azure VMs depend on a sophisticated ecosystem of physical machines, networking infrastructure, storage systems, and other essential components. When failures occur at any of these layers, whether from hardware malfunctions, kernel issues, or network disruptions, customers may experience service interruptions. To address these challenges, the Azure Compute Repair Platform plays a vital role in identifying, diagnosing, and applying mitigation strategies to resolve issues as quickly as possible.

To further improve our ability to diagnose and resolve failures swiftly and accurately, we present a novel approach: a real-time kernel dump analysis technology aimed at identifying the root cause of issues and facilitating precise, data-driven repairs. This is an addition to the gamut of detection and mitigation strategies we already leverage. This capability is generally available in all Azure regions and helps all our customers, including our most critical ones.

This project would not have been possible without the invaluable support and contributions of Binit Mishra, Dhruv Joshi, Abhay Ketkar, Gaurav Jagtiani, Mukhtar Ahmed, Siamak Ahari, Rajeev Acharya, Deepak Venkatesh, Abhinav Dua, Alvina Putri, Emma Montalvo, and Chantale Ninah. My heartfelt thanks to each of you.
Real-Time Failure Diagnosis and Repair

We have developed a novel approach to diagnosing and mitigating failures in Azure Compute infrastructure by understanding the state of the kernel on the Azure Host Machine through real-time collection and analysis of Live Kernel Dumps (LKD). This enables us to pinpoint the exact issue with the kernel and use that insight for precise repair actions, rather than applying a broad set of mitigation strategies. By reducing trial-and-error repair attempts, we significantly minimize downtime and accelerate issue resolution.

Kernel dumps can help detect critical issues such as kernel panics, memory leaks, and driver failures. Kernel panics occur when the system encounters a fatal error, causing the kernel to stop functioning. Memory leaks, where memory is not properly released, can lead to system instability over time. Driver failures, often caused by faulty or incompatible drivers, can also be identified through kernel dump analysis. Importantly, it is the Repair Platform that triggers LKD collection and then consumes the LKD analysis to make informed decisions.

By incorporating live kernel dump analysis into our mitigation workflows, we enhance Azure’s ability to quickly diagnose, categorize, and resolve infrastructure issues, ultimately reducing system downtime and improving overall performance.

Architecture

How does this system work?

- Dump Collection: When an issue is detected, the Repair Platform triggers the collection of a Live Kernel Dump (LKD) on the machine hosting the affected Azure VM.
- Dump Upload: An agent running on the machine monitors a designated storage location for newly generated dumps. When a dump is detected, the agent uploads it from the Azure Host Machine to an online Analysis Service.
- Failure Classification: The Analysis Service processes the uploaded Live Kernel Dump (LKD), diagnoses the root cause of the failure, and categorizes it accordingly, for example by identifying a networking switch in a hung state.
- Persistence: The Analysis Service generates a detailed failure message and stores it in an Azure Table for tracking and retrieval.
- Automated Repair Decisions: The Repair Platform continuously monitors the Azure Table for failure messages. Once a failure is recorded, it retrieves the data and makes an informed repair decision.

Impact

By leveraging this approach, the Azure Compute Repair Platform achieves both a better repair strategy and significant downtime savings.

(A) Better Repair Strategy

By precisely identifying failures, the Repair Platform can classify issues accurately and apply the most effective resolution method, minimizing unnecessary disruptions and enhancing long-term infrastructure stability.

For instance, in the case of a VM Switch Hung issue, the Repair Platform first attempts to mitigate the problem on the affected Azure Host Machine. If that is unsuccessful, it migrates the customer's workload to a more stable machine and initiates aggressive repairs on the faulty Azure Host Machine. While this restores service, it does not address the underlying cause, leaving the Azure Host Machine vulnerable to repeated VM Switch Hung failures.

With real-time failure classification enabled, the Repair Platform can instead hold a subset of affected Azure Host Machines in a restricted state, preventing new Azure VMs from being assigned to them. This approach allows Azure’s hardware and network partners to run diagnostics, gain deeper insights into the failure, and implement targeted fixes. As a result, Azure has reduced recurring failures, minimized customer impact, and improved overall infrastructure reliability.

While the VM Switch Hung issue serves as an example, this data-driven repair strategy can be extended to various failure scenarios, enabling faster recovery, fewer disruptions, and a more resilient platform.

(B) Downtime Reduction

The longer it takes to resolve an issue, the longer a customer workload may experience interruptions.
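The classify, persist, and decide steps from the Architecture section can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Azure's implementation: every class, function, category name, and dump signature below is hypothetical, the in-memory dictionary merely stands in for the Azure Table, and the decision rule simplifies the VM Switch Hung example (the post says only a subset of affected hosts is held in a restricted state).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FailureRecord:
    """Failure message produced by the (hypothetical) analysis step."""
    host_id: str
    category: str
    message: str


class FailureTable:
    """In-memory stand-in for the Azure Table that stores failure messages."""

    def __init__(self) -> None:
        self._rows: Dict[str, FailureRecord] = {}

    def put(self, record: FailureRecord) -> None:
        self._rows[record.host_id] = record

    def get(self, host_id: str) -> Optional[FailureRecord]:
        return self._rows.get(host_id)


def classify_dump(host_id: str, dump: bytes) -> FailureRecord:
    """Toy classifier: matches a made-up signature in the dump bytes."""
    if b"vmswitch_hung" in dump:
        return FailureRecord(host_id, "VmSwitchHung",
                             "networking switch in a hung state")
    return FailureRecord(host_id, "Unknown", "no known signature matched")


def decide_repair(record: Optional[FailureRecord]) -> str:
    """Repair decision driven by the persisted classification."""
    if record is None:
        return "wait"  # no analysis result recorded yet
    if record.category == "VmSwitchHung":
        # Simplification: hold the host so partners can run diagnostics.
        return "restrict-host-for-diagnostics"
    return "broad-mitigation"  # fall back to the generic repair path


if __name__ == "__main__":
    table = FailureTable()
    table.put(classify_dump("host-01", b"... vmswitch_hung ..."))
    print(decide_repair(table.get("host-01")))
```

Keeping classification and the repair decision decoupled through a table mirrors the described design, in which the Repair Platform only reads persisted failure messages rather than calling the Analysis Service directly.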
As a result, downtime reduction is one of the key metrics we prioritize. We significantly reduce time to resolution by providing an early signal that pinpoints the exact issue. This allows the Repair Platform to perform targeted repairs rather than relying on time-consuming, broad mitigation strategies. Sample scenario: When a customer faces issues stopping or destroying an Azure VM, and the problem is severe enough that all repair attempts fail, the only option may be to migrate the customer's workload to a different Azure Host Machine. Today, this process can take up to 26 minutes before the decision to move the customer workload is reached. However, with this new approach, we are optimizing to detect the failure and surface the issue within 3 minutes, enabling a decision much earlier and reducing customer downtime by 23 minutes—a significant improvement in downtime reduction and customer resolution. Conclusion Online kernel dump analysis for machine issue resolution marks a significant advancement in Azure’s commitment to reliability, bringing us closer to a future where failures are not just detected but proactively mitigated in real time. By enabling real-time diagnostics and automated repair strategies, this approach is redefining Compute reliability—drastically reducing mitigation times, enhancing repair accuracy, and ensuring customers experience seamless service continuity. As we continue refining it, our focus remains on expanding its capabilities, enhancing kernel analysis, reducing analysis time, and strengthening the entire pipeline for greater efficiency and resilience. Stay tuned for further updates as we push the boundaries of intelligent cloud reliability.2.5KViews0likes0CommentsEnhancing Resiliency in Azure Compute Gallery
In today's cloud-driven world, ensuring the resiliency and recoverability of critical resources is top of mind for organizations of all sizes. Azure Compute Gallery (ACG) continues to evolve, introducing robust features that safeguard your virtual machine (VM) images and application artifacts. In this blog post, we'll explore two key resiliency innovations: the new Soft Delete feature (currently in preview) and Zone-Redundant Storage (ZRS) as the default storage type for image versions. Together, these features significantly reduce the risk of data loss and improve business continuity for Azure users.

The Soft Delete Feature in Preview: A safety net for your Images

Many Azure customers have struggled with accidental deletion of VM images, which disrupts workflows and causes unrecoverable data loss, often requiring users to rebuild images from scratch. Previously, removing an image from the Azure Compute Gallery was permanent, and the resulting service disruption and lengthy process of recreating the image led to customer dissatisfaction. Now, with Soft Delete (currently available in public preview), Azure introduces a safeguard that makes it easy to recover deleted images within a specified retention period.

How Soft Delete Works

When Soft Delete is enabled on a gallery, deleting an image doesn't immediately remove it from the system. Instead, the image enters a "soft-deleted" state, where it remains recoverable for up to 7 days. During this grace period, administrators can review and restore images that may have been deleted by mistake, preventing permanent loss. After the retention period expires, the platform automatically performs a hard (permanent) delete, at which point recovery is no longer possible.

Key Capabilities and User Experience

Recovery period: Images are retained for a default period of 7 days, giving users time to identify and restore any resources deleted in error.
Seamless Recovery: Recover soft-deleted images directly from the Azure Portal or via the REST API, making it easy to integrate with automation and CI/CD pipelines.
Role-Based Access: Only owners or users with the Compute Gallery Sharing Admin role at the subscription or gallery level can manage soft-deleted images, ensuring tight control over recovery and deletion operations.
No Additional Cost: The Soft Delete feature is provided at no extra charge. After deletion, only one replica per region is retained, and standard storage charges apply until the image is permanently deleted.
Comprehensive Support: Soft Delete is available for Private, Direct Shared, and Community Galleries. New and existing galleries can be configured to support the feature.

To enable Soft Delete, you can update your gallery settings via the Azure Portal or use the Azure CLI. Once enabled, the "delete" operation will soft-delete images, and you can view, list, restore, or permanently remove these images as needed. Learn more about the Soft Delete feature at https://aka.ms/sigsoftdelete

Zone-Redundant Storage (ZRS) by Default

Another major resiliency enhancement in Azure Compute Gallery is the default use of Zone-Redundant Storage (ZRS) for image versions. ZRS replicates your images across multiple availability zones within a region, ensuring that your resources remain available even if a zone experiences an outage. By defaulting to ZRS, Azure raises the baseline for image durability and access, reducing the risk of disruptions due to zone-level failures.

Automatic Redundancy: All new image versions are stored using ZRS by default, without requiring manual configuration.
High Availability: Your VM images are protected against the failure of any single availability zone within the region.
Simplified Management: Users benefit from resilient storage without the need to explicitly set up or manage storage account redundancy settings.
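Returning to the soft-delete lifecycle described earlier, the retention rule can be made concrete with a small Python model. This is only an illustrative sketch, not the Azure SDK or REST API: `ImageRecord`, `soft_delete`, `is_recoverable`, and `restore` are hypothetical names, and treating the 7-day boundary as exclusive is an assumption, since the post only says images remain recoverable "for up to 7 days".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Retention period stated in the post; the exclusive boundary is an assumption.
RETENTION = timedelta(days=7)


@dataclass
class ImageRecord:
    """Hypothetical in-memory stand-in for a gallery image (not an Azure type)."""
    name: str
    deleted_at: Optional[datetime] = None  # None means the image is active

    def soft_delete(self, now: datetime) -> None:
        # Deleting only marks the image; nothing is removed yet.
        self.deleted_at = now

    def is_recoverable(self, now: datetime) -> bool:
        # Recoverable only while soft-deleted and inside the retention window.
        return self.deleted_at is not None and now - self.deleted_at < RETENTION

    def restore(self, now: datetime) -> bool:
        # Returns True if the image was brought back, False if it is too late
        # (the platform would already have hard-deleted it) or it was never deleted.
        if self.is_recoverable(now):
            self.deleted_at = None
            return True
        return False
```

In this model, a restore attempted six days after deletion succeeds, while one attempted eight days after deletion fails, mirroring the automatic hard delete once the retention period expires.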
Default ZRS capability starts with API version 2025-03-03; Portal/SDK support will be added later.

Why These Features Matter

The combination of Soft Delete and ZRS by default provides Azure customers with enhanced operational reliability and data protection. Whether you oversee a suite of VM images for development and testing or coordinate production deployments across multiple teams, these features offer the following benefits:

Mitigate operational risks associated with accidental deletions or zone-level outages.
Minimize downtime and reduce manual recovery processes.
Promote compliance and security through advanced access controls and transparent recovery procedures.

To evaluate the Soft Delete feature, you may register for the preview and activate it on your galleries through the Azure Portal or the REST API. Please note that, during its preview phase, this capability is recommended for assessment and testing rather than for production environments. ZRS by default is already available out of the box starting with API version 2025-03-03. For comprehensive details and step-by-step guidance on enabling and utilizing Soft Delete, please review the public specification document at https://aka.ms/sigsoftdelete

Conclusion

Azure Compute Gallery continues to push the envelope on resource resiliency. With Soft Delete (preview) offering a reliable recovery mechanism for deleted images, and ZRS by default protecting your assets against zonal failures, Azure empowers you to build and manage VM deployments with peace of mind. Stay tuned for future updates as these features evolve toward general availability.

Revolutionizing Reliability: Introducing the Azure Failure Prediction and Detection (AFPD) system
As part of the journey to consistently improve Azure reliability and platform stability, we launched Azure Failure Prediction & Detection (AFPD), Azure’s premier shift-left reliability solution. AFPD became operational in 2024, unifying failure prediction, detection, mitigation, and remediation services into a single end-to-end system with the goal of preventing Azure Compute customer workload interruptions and repairing nodes at scale. AFPD builds upon previous reliability solutions such as Project Narya, adding new best practices and fleet health management capabilities on top of pre-existing failure prediction and mitigation capabilities. The end-to-end AFPD system has been shown to further reduce the overall number of reboots by over 36% and allows for a proactive approach to maintaining the cloud. This system operates for all Azure Compute General Purpose, Specialized Compute, and High Performance Computing (HPC)/Artificial Intelligence (AI) workloads, and for select Azure Storage scenarios. For a deeper dive, you can read the whitepaper, which won the Best Paper Award at the 2025 IEEE Cloud Summit!

Increase security for Azure VMs: Trusted launch in-place upgrade support now available!
Introduction

We’re excited to announce that Trusted launch in-place upgrade support is now available to help you strengthen the security of your Azure virtual machines and scale set resources, without the need for complex migrations or rebuilds. It is:

Generally available for existing Gen1 & Gen2 virtual machines (VMs), and for Gen1 & Gen2 VM Uniform scale sets
In private preview for Gen1 & Gen2 VM Flex scale sets

Trusted launch is strongly recommended by Microsoft as the secure path from the Unified Extensible Firmware Interface (UEFI) through the Windows kernel Trusted Boot sequence. It helps prevent bootkit malware in the boot process, ensuring your workloads start in a verified and uncompromised state. Disabling Trusted launch puts your infrastructure at risk of bootkit infections, making this upgrade not just beneficial but essential.

By leveraging in-place upgrade support, you can seamlessly enhance foundational security for your existing virtual machine and scale set resources with Trusted launch at no additional cost, ensuring protection against modern threats and readiness for future compliance needs.

What is Trusted launch?

Trusted launch is a built-in Azure virtual machine and scale set capability that helps protect your virtual machines from advanced threats, right from the moment they start. It adds a layer of foundational security to your VMs by enabling:

Secure Boot: Prevents unauthorized code like rootkits and bootkits from loading during startup.
vTPM: Acts as a secure vault for encryption keys and boot measurements, enabling attestation of your VM’s integrity.
Boot Integrity Monitoring: The guest attestation extension continuously checks that your VM boots into a trusted, uncompromised state.

Trusted launch enhances the security posture of a VM through cryptographic verification and ensures the VM boots to a desired secure state, protecting it from attacks that modify operating system processes. This maintains the trust of the guest OS and adds defense-in-depth. It is essential for maintaining compliance with various regulatory requirements, including Azure Security Benchmark, FedRAMP, Cloud Computing SRG (STIG), HIPAA, PCI-DSS, and others. It’s a simple yet powerful way to enhance the foundational security of your virtual machine and scale set resources, without changing how you deploy or manage your workloads.

Upgrade security of existing VMs and Scale sets to Trusted launch

The following table summarizes the high-level steps for upgrading Gen1 and Gen2 VMs and Scale sets to Trusted launch, with links to the public documentation that contains the detailed steps.

Resource type | High-level steps
Gen1 virtual machine | Learn more: Upgrade existing Azure Gen1 VMs to Trusted launch
Gen2 virtual machine | Learn more: Enable Trusted launch on existing Azure Gen2 VMs
Virtual machine scale set | Learn more: Upgrade existing Azure Scale set to Trusted launch

Conclusion

We treat the security of our cloud computing platform as a priority, and this change is an important step towards ensuring that Azure VMs provide a more secure environment for your applications and services. Upgrading your Azure VMs and Scale sets to Trusted launch is a simple yet powerful way to strengthen foundational infrastructure security without disrupting your existing workloads. With in-place upgrade support now available, you can take advantage of foundational security features like Secure Boot and vTPM to protect against modern threats and meet compliance requirements, all at no additional cost.

Next steps

Whether you're running Gen1 (BIOS) or Gen2 (UEFI) VM resources, don’t wait to secure your infrastructure: upgrade your VMs and Scale sets to Trusted launch today. This upgrade can be completed with minimal effort and downtime.

Upgrade your Gen1 VMs to Trusted launch using generally available upgrade support with the step-by-step guide.
Upgrade your Gen2 VMs to Trusted launch using generally available upgrade support with the step-by-step guide.
Upgrade your Gen1 or Gen2 Uniform Scale sets to Trusted launch using generally available upgrade support with the step-by-step guide.
For Gen1 or Gen2 Flex Scale sets, private preview access is now open: sign up for the preview and get early access to the Trusted launch upgrade experience for Flex scale sets.

Trusted launch is your first line of defense against bootkit malware, and upgrading ensures your VMs meet modern security and compliance standards. Act now to protect your workloads and make them resilient against future threats.

Frequently Asked Questions

Are all upgrade features generally available?

The following table summarizes the status of each upgrade feature:

Trusted launch upgrade support for resource type | Status | Learn more
Gen1 virtual machine | Generally available | Upgrade existing Azure Gen1 VMs to Trusted launch
Gen2-only virtual machine | Generally available | Enable Trusted launch on existing Azure Gen2 VMs
Scale set (Uniform) | Generally available | Upgrade existing Azure Scale set to Trusted launch
Scale set (Flex) | Private preview | Sign up for the preview at Enable Trusted Launch on Existing Flex Scale Sets (PREVIEW)

What are the prerequisites to enable Trusted launch?

Before planning the upgrade of an existing VM or Scale set to Trusted launch, ensure that:

The VM size of the VM or Scale set is supported for Trusted launch. Change the VM size to a Trusted launch supported VM size if needed to support the upgrade.
The VM or Scale set is running an operating system supported with Trusted launch. For Scale set resources, you can change the OS image reference to a supported OS version along with the Trusted launch upgrade.
The VM or Scale set is not dependent on Azure features currently not supported with Trusted launch.
Azure Backup, if enabled for VMs, is configured with the Enhanced Backup policy. An existing Azure VM backup can be migrated from the Standard to the Enhanced policy.
Azure Site Recovery (ASR), if enabled for VMs, is disabled prior to the upgrade. You can re-enable ASR replication after the Trusted launch upgrade completes.

What are the best practices to consider before upgrade?

We recommend the following best practices before you execute the upgrade to Trusted launch for VMs and Scale sets hosting production workloads:

Review the step-by-step guides published for Gen1 and Gen2 VMs and Scale sets, including known limitations, issues, and roll-back steps.
Enable Trusted launch on a test VM or Scale set and determine if any changes are required to meet the prerequisites.
Create restore points for VMs associated with production workloads before you enable the Trusted launch security type. You can use the restore points to re-create the disks and VM in the previous well-known state.

Can I enable Trusted launch without changing OS from Gen1 (BIOS) to Gen2 (UEFI)?

No. Trusted launch security capabilities (Secure Boot, vTPM) can be enabled only for a Gen2 UEFI-based operating system; they cannot be enabled for a Gen1 BIOS-based operating system.

How will my new or other VMs or Scale sets be affected?

The upgrade is executed only on the specific VM or Scale set resource. It does not impact new or other existing Azure VMs or Scale sets already running in your environment.

Can I roll back Trusted launch upgrade to Gen1 (BIOS) configuration?

For virtual machines, you can roll back the Trusted launch upgrade to a Gen2 VM without Trusted launch. You cannot roll back in place from Trusted launch to a Gen1 VM. To restore the Gen1 configuration, you’ll need to restore the entire VM and its disks from the backup or restore point of the VM taken prior to the upgrade. For scale sets, you can roll back the changes to the previous known good configuration, including the Gen1 configuration.
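The prerequisites and best practices from the FAQ above can be turned into a simple pre-upgrade checklist. The Python sketch below is one possible way to encode them and makes no Azure API calls: `UpgradeCandidate`, `blocking_issues`, and `recommendations` are hypothetical names, and the field values are assumed to be filled in by hand or from your own inventory tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UpgradeCandidate:
    """Hypothetical hand-filled summary of one VM (not an Azure SDK type)."""
    size_supported: bool            # VM size supports Trusted launch
    os_supported: bool              # OS is supported with Trusted launch
    uses_unsupported_feature: bool  # depends on a feature Trusted launch lacks
    backup_policy: Optional[str]    # None (no Azure Backup), "standard", "enhanced"
    asr_enabled: bool               # Azure Site Recovery replication is on
    has_restore_point: bool         # best practice rather than a prerequisite


def blocking_issues(vm: UpgradeCandidate) -> List[str]:
    """List the FAQ prerequisites that still need action before upgrading."""
    issues: List[str] = []
    if not vm.size_supported:
        issues.append("change to a VM size that supports Trusted launch")
    if not vm.os_supported:
        issues.append("move to an operating system supported with Trusted launch")
    if vm.uses_unsupported_feature:
        issues.append("remove the dependency on unsupported Azure features")
    if vm.backup_policy == "standard":
        issues.append("migrate Azure Backup from the Standard to the Enhanced policy")
    if vm.asr_enabled:
        issues.append("disable ASR before the upgrade and re-enable it afterwards")
    return issues


def recommendations(vm: UpgradeCandidate) -> List[str]:
    """List best-practice steps that are advised but do not block the upgrade."""
    if vm.has_restore_point:
        return []
    return ["create a restore point before enabling Trusted launch"]
```

Keeping prerequisites and recommendations in separate functions mirrors the FAQ, which treats restore points as a best practice rather than a hard requirement.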