Performance at Scale: The Role of Interconnects in Azure HPC & AI Infrastructure
Microsoft Azure’s high-performance computing (HPC) and AI infrastructure is designed from the ground up to support the world’s most demanding workloads. High-performance AI workloads are bandwidth-hungry and latency-sensitive. As models scale in size and complexity, the efficiency of the interconnect fabric—how CPUs, GPUs, and storage communicate—becomes a critical factor in overall system performance. Even with the fastest GPUs, poor interconnect design can lead to bottlenecks, underutilized hardware, and extended time-to-results. In this blog post, we highlight one of the key enabling features for running large-scale distributed workloads on Azure: a highly tuned HPC-class interconnect. Azure has invested years of system-level engineering in the InfiniBand interconnect, delivering it in ready-to-use configurations on Azure’s HB-series and N-series virtual machines (VMs).

Announcing Azure HBv5 Virtual Machines: A Breakthrough in Memory Bandwidth for HPC
Discover the new Azure HBv5 Virtual Machines, unveiled at Microsoft Ignite, designed for high-performance computing applications. With up to 7 TB/s of memory bandwidth and custom 4th Generation EPYC processors, these VMs are optimized for the most memory-intensive HPC workloads. Sign up for the preview starting in the first half of 2025 and see them in action at Supercomputing 2024 in Atlanta.

Introducing NVads V710 v5 series VMs
Cost-optimized AI inference, virtual workstations, and cloud gaming

AI inferencing and graphics-intensive applications continue to demand cost-effective, low-power, high-performance GPUs with more GPU memory and faster CPUs. Today we are thrilled to announce the General Availability of NVads V710 v5-series virtual machines (VMs), expanding our NV VM lineup to meet these needs. As we mentioned in our previous preview announcement, NVads V710 v5-series VMs are built for customers running small-to-medium AI/ML inferencing, Virtual Desktop Infrastructure (VDI), visualization, and cloud gaming workloads. Each VM is powered by an AMD Radeon™ Pro V710 GPU with up to 28 GB of GPU memory, which easily serves popular small open-source language models and handles the most demanding visualization scenarios. On top of this, the vCPU cores are backed by high-frequency 4th Generation AMD EPYC™ CPUs (3.9 GHz base and 4.3 GHz max frequency) for compute-intensive workloads that demand both CPU and GPU performance.

Right-size the VM for your workload needs

With NVads V710 v5 VMs, you only pay for the GPU/CPU compute you need. GPU partitioning capabilities enable customers to allocate fractions of the GPU – as small as 1/6th of a V710 GPU – according to their workload requirements. This flexibility is ideal for customers that need to support a variety of inferencing and graphical workloads efficiently without requiring a full GPU for each application. The series provides several options, ranging from 1/6 of a GPU with 4 GB of memory, perfect for lightweight virtual desktop experiences, to a full V710 GPU with a massive 28 GB for graphics-intensive engineering applications or AI.

Out-of-the-box performance with ROCm and Radeon PRO Graphics

AMD Radeon PRO Graphics provides a seamless and reliable experience for visualization-focused workloads. The GPU and drivers are optimized or certified for all the major ISV solutions from vendors such as Adobe and Autodesk. They also support the latest ROCm releases and are designed to integrate seamlessly with popular machine learning frameworks like PyTorch, Triton, ONNX, and vLLM to serve small to medium language models.

Upgrade to get up to a 2.5x boost

Our own internal benchmarks of popular VDI, rendering, and visualization tests show NVads V710 v5 VMs are up to 2.5x faster than NV v4 VMs. This means you can get more work done, faster, and have an overall better experience.

Customer Momentum

At Azure, we work closely with our partners and customers so they can take full advantage of these new VMs and accelerate their applications. Listen to what our partners at Dizzion, Cognata, and NAIO had to say.

"The new Azure NVads V710 instances, powered by AMD Radeon Pro V710 GPUs, offer exceptional performance and flexibility at competitive prices. Dizzion Desktop as a Service customers delivering CAD, BIM, edge AI, and other high-performance workloads have eagerly awaited this addition to the market.” – Ruben Spruijt, Field CTO, Dizzion

“In our experience, the V710 delivers excellent performance across both CPU and GPU workloads, making it a highly capable platform for a wide range of use cases. It offers a robust and reliable software stack, particularly well-suited for OpenGL and machine learning applications.” – Danny Atsmon, CEO, Cognata
“We’ve tested the V710 thoroughly across a range of AI workloads, and the performance has really impressed us. It’s fast, stable, and scales well across different scenarios. It’s become a reliable, cost-effective part of our stack, and we’ll keep building on top of it as we expand our projects.” – Dr.-Ing. Timo Sämann, Chief AI Scientist, NAIO

Product Details

| Resource | Details |
|---|---|
| vCPUs | 4th Generation AMD EPYC™ CPU; configurations from 4 to 28 vCPUs (3.95 GHz base, 4.3 GHz max) |
| Memory | 16 GB to 160 GB |
| GPU | AMD Radeon PRO V710 GPU with 28 GB GDDR6 memory; 1/6, 1/3, 1/2, or full GPU |
| Storage | Up to 1 TB temporary disk |
| Networking | Up to 80 Gbps Azure Accelerated Networking |

NVads V710 v5 VMs, now available in 5 Azure regions

We are happy to announce NVads V710 v5 VMs are now available in East US, North Central US, South Central US, West US, and West Europe. To learn more about our VMs and how to get started, please visit our documentation page.
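If you want to check which NVads V710 v5 sizes (and their vCPU and memory configurations) a given region currently offers before deploying, a quick Azure CLI sketch is below; the region and the grep pattern are illustrative, so adjust them to your subscription.

```bash
# List the VM sizes available in a region and filter for the V710 series
az vm list-sizes --location eastus --output table | grep -i v710
```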
Fusing Simulation with Deep Learning: Technical Insights from the Frontlines on Azure

In the evolving landscape of engineering, manufacturing, and scientific research, two worlds are colliding to create groundbreaking opportunities: physics-based simulation and data-driven deep learning. Traditionally, simulations have driven innovation in areas such as structural mechanics, fluid dynamics, and electromagnetics. Meanwhile, deep learning has revolutionized fields like computer vision and natural language processing. The intersection of these two disciplines—Simulation and Deep Learning—represents a powerful new paradigm, one that Azure is uniquely positioned to support at scale.

Achieving Optimal Performance for DeepSeek Expert Parallelism (DeepEP) on Azure
DeepEP

DeepEP is a high-performance communication library developed by DeepSeek AI to optimize Mixture-of-Experts (MoE) and expert parallelism (EP) in large-scale AI models. It provides high-throughput, low-latency all-to-all GPU kernels for MoE dispatch and combine operations, which are critical for efficiently routing data between expert modules during training and inference. DeepEP includes specialized kernels for asymmetric-domain bandwidth forwarding—such as transfers between NVLink and RDMA/InfiniBand domains—and requires only 20 Streaming Multiprocessors (SMs) to saturate both. Tokens are first transmitted via IB to GPUs with matching in-node indices, then forwarded via NVLink to target experts, fully overlapping both communication paths. It leverages NVSHMEM for efficient one-sided communication, enabling low-latency data movement without host involvement. With its network-aware design and deep integration with MoE algorithms, DeepEP is a foundational component for scalable, high-performance expert model training and inference.

The Importance of NUMA Affinity

NUMA affinity refers to how well a process or thread is aligned with the memory and hardware resources—such as CPUs, GPUs, or NICs—within a Non-Uniform Memory Access (NUMA) system. In a NUMA architecture, the system’s memory is divided among multiple nodes (often corresponding to CPU sockets), and each node can access its local memory faster than the memory attached to other nodes. NUMA affinity is about ensuring that a process runs on a CPU (or accesses a device) that is physically close to the memory or network resources it needs, minimizing latency and maximizing bandwidth.

NUMA affinity is particularly critical in multi-GPU and multi-node systems where GPUs communicate with each other or with the network through NICs. If a GPU is not NUMA-affined to the NIC it uses, data may be routed across additional interconnects like PCIe switches or CPU sockets, increasing communication latency and reducing throughput. By maintaining proper NUMA affinity—ensuring, for example, that a GPU communicates through a NIC on the same NUMA node—systems can achieve significantly better performance, especially in communication-heavy workloads like distributed deep learning, MoE expert dispatch, or all-to-all collective operations.

NVIDIA DGX H100 system topology (courtesy: https://docs.nvidia.com/dgx/dgxh100-user-guide/introduction-to-dgxh100.html)

Affinity Considerations on Azure NDv5 VMs (H100)

The lscpu command can be used to get information about NUMA-to-core binding. The following output from lscpu on an NVIDIA DGX H100 system shows that the system has two NUMA nodes: cores 0–47 belong to NUMA node 0, and cores 48–95 belong to NUMA node 1.

```
NUMA:
  NUMA node(s):       2
  NUMA node0 CPU(s):  0-47
  NUMA node1 CPU(s):  48-95
```

In addition, the lstopo command, along with the bus IDs of the GPUs and HCA (Host Channel Adapter) cards, can be used to find the mapping between NUMA nodes, CPU cores, GPUs, and HCAs.

Affinity-Aware Code Adjustments to Boost DeepEP Performance

To improve DeepEP performance on Azure, we introduced code changes that explicitly bind processes to the right set of cores, GPU, and HCA, ensuring alignment with the system's NUMA topology. These modifications reduce cross-NUMA communication overhead and improve data locality, which is crucial for communication-heavy workloads like expert parallelism.
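As an aside, before applying the code changes described next it is worth confirming the HCA/GPU/NUMA mapping on your own VM. A minimal sketch, assuming the standard Linux sysfs layout and that the NVIDIA driver tools (nvidia-smi) are installed, as they are on the Azure HPC images:

```bash
# Print the NUMA node each InfiniBand HCA is attached to
for dev in /sys/class/infiniband/mlx5_*; do
  echo "$(basename "$dev"): NUMA node $(cat "$dev"/device/numa_node)"
done

# nvidia-smi can also print the full GPU/NIC/CPU affinity matrix in one shot
nvidia-smi topo -m
```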
For this, we integrated the libnuma library using ctypes to enable memory binding to specific NUMA nodes, ensuring that memory allocations are local to the process’s assigned CPU cores. We also used the psutil library to explicitly set CPU affinity, binding each process to a distinct range of cores based on its rank. This reduces cross-node traffic and improves cache locality. As mentioned earlier, the NVIDIA DGX H100 system has two NUMA nodes with 48 cores each; with 8 processes per node, we can assign 12 cores to each process on this system. These settings are applied early in the init_dist() function, ensuring that compute and communication operations benefit from optimal CPU and memory placement.

```diff
diff --git a/tests/utils.py b/tests/utils.py
index a574366..fffa905 100644
--- a/tests/utils.py
+++ b/tests/utils.py
@@ -1,10 +1,34 @@
 import os
 import sys
+import psutil
 import numpy as np
 import torch
 import torch.distributed as dist
 from typing import Optional
-
+import ctypes
+
+# Load libnuma
+libnuma = ctypes.CDLL("libnuma.so")
+libnuma.numa_available.restype = ctypes.c_int
+libnuma.numa_run_on_node.argtypes = [ctypes.c_int]
+libnuma.numa_set_preferred.argtypes = [ctypes.c_int]
+
+def set_numa_affinity(rank):
+    cores_per_rank = 12
+    numa_node = rank // 4
+    core_start = rank * cores_per_rank
+    core_end = core_start + cores_per_rank
+    p = psutil.Process(os.getpid())
+    p.cpu_affinity(list(range(core_start, core_end)))
+    print(f"Rank {rank} numa node {numa_node} bound to cores {core_start}-{core_end - 1}")
+
+    # Bind memory to NUMA node
+    if libnuma.numa_available() != -1:
+        libnuma.numa_set_preferred(numa_node)
+        print(f"Rank {rank}: CPU affinity → cores {core_start}-{core_end - 1}, memory NUMA → node {numa_node}")
+    else:
+        print(f"Rank {rank}: libnuma not available")

 def init_dist(local_rank: int, num_local_ranks: int):
     # NOTES: you may rewrite this function with your own cluster settings
@@ -20,8 +44,10 @@ def init_dist(local_rank: int, num_local_ranks: int):
         world_size=num_nodes * num_local_ranks,
         rank=node_rank * num_local_ranks + local_rank
     )
+    set_numa_affinity(local_rank)
     torch.set_default_dtype(torch.bfloat16)
     torch.set_default_device('cuda')
+    torch.cuda.set_device(local_rank)
     return dist.get_rank(), dist.get_world_size(), dist.new_group(list(range(num_local_ranks * num_nodes)))
```

Additionally, as noted earlier, DeepEP leverages NVSHMEM for inter-GPU communication. To ensure each process uses the correct set of Host Channel Adapters (HCAs), we set the NVSHMEM_HCA_LIST environment variable to a comma-separated list of HCAs. For this setting to take effect, the NVSHMEM_ENABLE_NIC_PE_MAPPING variable must also be set to 1.

```diff
diff --git a/deep_ep/buffer.py b/deep_ep/buffer.py
index feeb386..d81130e 100644
--- a/deep_ep/buffer.py
+++ b/deep_ep/buffer.py
@@ -72,6 +72,8 @@ class Buffer:
         os.environ['NVSHMEM_IB_ENABLE_IBGDA'] = '1'
         os.environ['NVSHMEM_IBGDA_NIC_HANDLER'] = 'gpu'
         os.environ['NVSHMEM_IBGDA_NUM_RC_PER_PE'] = f'{num_qps_per_rank}'
+        os.environ['NVSHMEM_ENABLE_NIC_PE_MAPPING'] = '1'
+        os.environ['NVSHMEM_HCA_LIST'] = 'mlx5_ib0:1,mlx5_ib1:1,mlx5_ib2:1,mlx5_ib3:1,mlx5_ib4:1,mlx5_ib5:1,mlx5_ib6:1,mlx5_ib7:1'
         # Make sure QP depth is always larger than the number of on-flight WRs, so that we can skip WQ slot check
         os.environ['NVSHMEM_QP_DEPTH'] = '1024'
         # NOTES: NVSHMEM initialization requires at least 256 MiB
```
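The device names used in NVSHMEM_HCA_LIST can be read directly off the VM; a quick sketch, assuming the standard sysfs layout:

```bash
# List the RDMA devices visible to the VM
ls /sys/class/infiniband
# On a Standard_ND96isr_H100_v5 VM this typically shows mlx5_ib0 ... mlx5_ib7
# plus mlx5_an0 (the Accelerated Networking NIC, which we want to exclude)
```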
AtomicOps Support: A Prerequisite for Running DeepEP

DeepEP requires network controllers to support atomic operations. You may face the following error when running DeepEP if the network controllers do not support atomic operations:

```
WARN: device mlx5_an0 does not support all necessary atomic operations. You may want to check the PCI_ATOMIC_MODE value in the NIC firmware. Skipping...
/root/nvshmem_src/src/modules/transport/ibrc/ibrc.cpp:340: NULL value qp creation failed
/root/nvshmem_src/src/modules/transport/ibrc/ibrc.cpp:1395: non-zero status: 7 ep_create failed
h100jcjah000000:16173:16378 [4] NCCL INFO [Service thread] Connection closed by localRank 3
h100jcjah000000:16171:16390 [2] NCCL INFO [Service thread] Connection closed by localRank 3
h100jcjah000000:16169:16382 [0] NCCL INFO [Service thread] Connection closed by localRank 3
W0425 05:06:08.343000 16157 torch/multiprocessing/spawn.py:169] Terminating process 16169 via signal SIGTERM
```

In this section, we discuss how to verify that the network controllers support atomic operations so you can avoid this failure. We can use the following command to get the list of network controllers:

```
lspci | grep Mell
0101:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0102:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0103:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0104:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0105:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0106:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0107:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
0108:00:00.0 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
14a7:00:02.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] (rev 80)
```

The above output shows that we have eight InfiniBand controllers and one Ethernet controller on the VM. To check whether the Ethernet controller supports atomic operations, we can use the following command:

```
lspci -s 14a7:00:02.0 -vvv | grep AtomicOpsCap
        AtomicOpsCap: 32bit- 64bit- 128bitCAS-
```

Similarly, for one of the InfiniBand controllers we have:

```
lspci -s 0108:00:00.0 -vvv | grep AtomicOpsCap
        AtomicOpsCap: 32bit+ 64bit+ 128bitCAS+
```

As can be seen from the outputs, the Ethernet controller (14a7:00:02.0) does not support atomic operations (AtomicOpsCap: 32bit- 64bit- 128bitCAS-), while the InfiniBand controller (0108:00:00.0) does (AtomicOpsCap: 32bit+ 64bit+ 128bitCAS+). One way to avoid using the Ethernet controller is to provide the list of HCAs through NVSHMEM_HCA_LIST and bypass mlx5_an0, as discussed earlier. Another option is to disable Accelerated Networking (AN) on the VM; this can be done through the Azure portal while the VM is deallocated. Disabling AN is not recommended, though, as it impacts the overall infrastructure, particularly for customers using AMLFS.

Performance Experiments

After applying the above changes, we got the following performance numbers for test_internode.py on two Standard_ND96isr_H100_v5 VMs with 8 processes per node (16 total processes). This benchmark evaluates the performance of dispatch and combine operations in a multi-node setting, where intranode communication is overlapped with internode communication.
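For reference, the two-node benchmark is typically driven with torch-style rendezvous environment variables read by init_dist() in tests/utils.py; the sketch below is illustrative, so check your DeepEP checkout for the exact variable names it expects.

```bash
# On the first VM (repeat on the second VM with RANK=1)
export MASTER_ADDR=<NODE0_IP>   # reachable IP of the rank-0 node
export MASTER_PORT=<PORT>
export WORLD_SIZE=2             # number of nodes
export RANK=0                   # this node's index
python tests/test_internode.py
```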
Please note that the benchmark reports the algorithm bandwidth, so the total time of both communication and computation is considered in the performance results. The experimental results on Standard_ND96isr_H100_v5 VMs show that we are reaching and exceeding the performance claimed in the DeepEP repository.

| Item | Best Reported RDMA BW | Best Reported NVL BW |
|---|---|---|
| Dispatch (FP8) | 45.9 GB/s | 149.82 GB/s |
| Dispatch (BF16) | 60.32 GB/s | 196.89 GB/s |
| Combine | 61.34 GB/s | 200.22 GB/s |

Computer-Aided Engineering “CAE” on Azure
Table of Contents:

1. What is Computer-Aided Engineering (CAE)?
2. Why Move CAE to Cloud? Cloud vs. On-Premises
3. What Makes Azure Special for CAE Workloads?
4. What Makes Azure Stand out Among Public Cloud Providers? “InfiniBand Interconnect”
5. Key CAE Workloads on Azure
6. Azure HPC VM Series for CAE Workloads
7. CAE Software Partnership “ISVs”
8. Robust Ecosystem of System Integrator “SI” Partners
9. Real-World Use Case: Automotive Sector
10. The Future of CAE is Cloud-Native
11. Final Thoughts

--------------------------------------------------------------------------------------------------------

1. What is Computer-Aided Engineering “CAE”?

Computer-Aided Engineering (CAE) is a broad term that refers to the use of computer software to aid in engineering tasks. This includes simulation, validation, and optimization of products, processes, and manufacturing tools. CAE is integral to modern engineering, allowing engineers to explore ideas, validate concepts, and optimize designs before building physical prototypes. CAE encompasses various fields such as finite element analysis (FEA), computational fluid dynamics (CFD), and multibody dynamics (MBD).

CAE tools are widely used in industries like automotive, aerospace, and manufacturing to improve product design and performance. For example, in the automotive industry, CAE tools help reduce product development costs and time while enhancing the safety, comfort, and durability of vehicles. CAE tools are often used to analyze and optimize designs created within CAD (Computer-Aided Design) software.

CAE systems typically involve three phases:

- Pre-processing: defining the model and the environmental factors to be applied to it.
- Analysis solver: performing the analysis, usually on high-powered computers.
- Post-processing: visualizing the results.

In a world where product innovation moves faster than ever, Computer-Aided Engineering (CAE) has become a cornerstone of modern design and manufacturing. From simulating airflow over an F1 car to predicting stress in an aircraft fuselage, CAE allows engineers to explore ideas, validate concepts, and optimize designs—before a single prototype is built.

--------------------------------------------------------------------------------------------------------

2. Why Move CAE to Cloud? Cloud vs. On-Premises

Historically, CAE workloads were run on-premises due to their compute-intensive nature and large data requirements. Traditional CAE methods—dependent on expensive, on-premises HPC clusters—are facing a tipping point, and many organizations are now embracing cloud-based CAE. When considering whether to use cloud or on-premises solutions, there are several factors to weigh:

- Cost and Maintenance: On-premises solutions require a large upfront investment in hardware and ongoing costs for maintenance and upgrades. Cloud solutions, on the other hand, spread costs over time and often result in lower total cost of ownership.
- Security and Privacy: On-premises solutions offer control over security but require significant resources to manage. Cloud providers offer advanced security features and compliance certifications, often surpassing what individual companies can achieve on their own.
- Scalability and Flexibility: Cloud solutions provide unmatched scalability and flexibility, allowing businesses to quickly adjust resources based on demand. On-premises solutions can be more rigid and require additional investments to scale.
- Reliability and Availability: Cloud providers offer high availability and disaster recovery options, often with service level agreements (SLAs) guaranteeing uptime. On-premises solutions depend on the company's infrastructure and may require additional investments for redundancy and disaster recovery.
- Integration and Innovation: Cloud solutions often integrate seamlessly with other cloud services and offer continuous innovation through regular updates and new features, and they make it possible to run more simulations in parallel, reducing time-to-solution, accelerating the product development cycle, and shortening time to market. On-premises solutions may lag in terms of innovation and require manual integration efforts.
- Global Access: Teams can collaborate and access data and models from anywhere. Cloud gives you global, on-demand supercomputing access without the physical, financial, and operational burden of traditional on-premises clusters.

In summary, the choice between cloud and on-premises solutions depends on various factors including cost, performance, security, maintenance, flexibility, and specific business needs. Cloud provides customers with global scalability, high availability, and a broad range of capabilities within a secure, integrated platform. It enables organizations to concentrate on core product innovation, accelerating their journey to market.

The following table shows Azure vs. on-premises for CAE workloads:

| Aspect | Cloud (Azure) | On-Premises |
|---|---|---|
| Global Reach | 60+ regions worldwide — deploy compute close to users, customers, or engineers. | Limited to where physical hardware is located (one or few sites). |
| Access Flexibility | Access from anywhere with secure authentication (VPN/SSO/Conditional Access). | Access generally restricted to the internal corporate network or VPN. |
| Collaboration | Teams across continents can work on shared HPC clusters easily. | Remote collaboration can be slow and complex; security risks are higher. |
| Elastic Scaling | Instantly scale resources up/down globally based on demand. Start small, grow big, then shrink when needed. | Scaling requires buying, installing, and maintaining new hardware. |
| Time to Deploy | No wait for procurement; minutes to spin up a new HPC cluster in a new region. | Weeks or months to procure, rack, and configure hardware in a new location. |
| Disaster Recovery | Built-in regional redundancy, backup options, and replication across regions. | Disaster recovery requires manual setup and physical duplication. |
| Compliance & Data Residency | Choose specific Azure regions to meet compliance (GDPR, HIPAA, ITAR, etc.). | Need to build compliant infrastructure manually. |
| Network Latency | Optimize by deploying close to users; fast backbone network across regions. | Bound by physical proximity; long-distance remote work suffers latency. |
| Maintenance | Azure handles hardware upgrades, security patches, and downtime minimization. | In-house IT teams are responsible for all hardware, software, and patching. |
| Security at Scale | Microsoft has committed to investing $20B in cybersecurity over five years and invests over $1B annually; ISO, SOC, and GDPR certified globally. | Requires dedicated resources to manage security protocols and maintain visibility across all systems; this can be more complex and resource-intensive than cloud solutions. |
| Cost Optimization | Operates on a pay-as-you-go model, enabling businesses to scale usage and costs as needed and avoiding the capital expenditure of purchasing hardware. Azure also offers various pricing options and discounts, such as reserved capacity, spot pricing, and Azure Hybrid Benefit, which can significantly reduce costs. | Requires significant upfront capital investment in hardware, software licenses, and infrastructure setup. These costs include purchasing and maintaining physical servers, which are subject to technological obsolescence. Ongoing expenses include system maintenance, support, power consumption, and cooling. |
| Innovation | Access to the latest GPUs and CPUs (H100, H200, GB200, AMD MI300X, HBv3, HBv4, HBv5). | Needs investment in hardware refresh cycles. |
| Managed Storage | Agility with instant provisioning; virtually unlimited scalability with automatic scale up or down; fully managed (updates, patches, backup, etc.); high availability and DR through redundancy, geo-replication, and automated DR options; enterprise-grade security with encryption at rest and in transit plus compliance certifications; pay-as-you-go or reserved pricing with no upfront hardware cost (CapEx); global access over the internet; continuous improvements with AI-driven optimization. | Offers control but demands heavy hardware investment and time-consuming deployment; scaling is limited by physical hardware capacity; must be managed by in-house IT teams, requiring significant time, expertise, and resources; redundancy and DR must be designed, funded, and maintained manually; security depends on in-house capabilities and requires investment; high upfront capital expenditure (CapEx); access limited to local networks unless extended with complex remote-access solutions; innovation depends on hardware refresh cycles limited by expense and infrequency. |
| Software Images & Marketplace | Instant access to thousands of pre-built software images via the Marketplace; complete environments deploy in minutes from ready-to-use templates; huge, constantly updated ecosystem of Microsoft, open-source, and third-party vendor solutions; automated maintenance, since Marketplace software often comes with built-in update capabilities, auto-patching, and cloud-optimized versions; cost flexibility through pay-as-you-go (PAYG) licensing, bring-your-own-license (BYOL), or subscription models; early access to beta, cloud-native, and AI-enhanced software from top vendors; Marketplace images are verified against cloud provider security and compliance standards. | Software must be sourced, manually installed, and configured; deployment, installation, environment setup, and configuration can take days or weeks; limited by licensing agreements, internal vendor contracts, and physical hardware compatibility; manual updates required, with IT monitoring, downloading, testing, and applying patches individually; large upfront license purchases are often needed, and renewal and true-up costs can be complex and expensive; new software adoption is delayed by procurement, budgeting, and testing cycles; security assurance depends on internal vetting processes and manual hardening. |

--------------------------------------------------------------------------------------------------------

3. What Makes Azure Special for CAE Workloads?

Microsoft Azure is a cloud platform enabling scalable, secure, and high-performance CAE workflows across industries. Our goal in Azure is to provide the CAE field with a one-stop, best-in-class technology platform, rich with solution offerings and supported by a robust ecosystem of partners.
Azure offers several unique features and benefits that make it particularly well suited for Computer-Aided Engineering (CAE) workloads:

- GPU Acceleration: Azure provides powerful GPU options, such as NVIDIA GPUs, which significantly enhance the performance of leading CAE tools. This results in improved turnaround times, reduced power consumption, and lower hardware costs. For example, tools like Ansys Speos for lighting simulation and CPFD's Barracuda Virtual Reactor have been optimized to take advantage of these GPUs.
- High-Performance Computing (HPC): Azure offers specialized HPC solutions, such as the HBv3 and HBv4/HX series, which are designed for high-performance workloads. These solutions provide the computational power needed for complex simulations and analyses.
- Scalability and Flexibility: Azure's cloud infrastructure allows for easy scaling of resources to meet the demands of CAE workloads. This flexibility ensures that you can handle varying levels of computational intensity without significant upfront investment in hardware.
- Integration with Industry Tools: Azure supports a wide range of CAE software and tools, making it easier to integrate existing workflows into the cloud environment. This includes certification and optimization of CAE tools on Azure.
- Support for Hybrid Environments: Azure provides solutions for hybrid cloud environments, allowing you to seamlessly integrate on-premises resources with cloud resources. This is particularly useful for organizations transitioning to the cloud or requiring a hybrid setup for specific workloads.
- Global Reach: As of April 2025, Microsoft Azure operates over 60 announced regions and more than 300 data centers worldwide, making it the most expansive cloud infrastructure among major providers. Azure ensures low latency and high availability for CAE workloads, regardless of where your team is located.

These features collectively make Azure a powerful and flexible platform for running CAE workloads, providing the computational power, scalability, and security needed to handle complex engineering simulations and analyses.

--------------------------------------------------------------------------------------------------------

4. What Makes Azure Stand out Among Public Cloud Providers? “InfiniBand Interconnect”

The InfiniBand interconnect is one of the key differentiators that makes Microsoft Azure stand out among public cloud providers, especially for high-performance computing (HPC) and CAE workloads. Here is what makes InfiniBand on Azure a game changer:

a) Ultra-Low Latency & High Memory Bandwidth

InfiniBand on Azure delivers 200 Gbps interconnect speeds (up to 400 Gbps with HDR/NDR in some cases, and 800 Gbps for the latest SKU, "HBv5", currently in preview). This ultra-low-latency, high-throughput network is ideal for tightly coupled parallel workloads such as CFD, FEA, weather simulations, and molecular modeling.

When the newly added AMD SKU, HBv5, transitions from preview to general availability (GA), memory bandwidth will no longer be a limitation for workloads such as CFD and weather simulations. The HBv5 offers an impressive 7 TB/s of memory bandwidth, which is 8 times greater than the latest bare-metal and cloud alternatives. It also provides nearly 20 times more bandwidth than Azure HBv3 and Azure HBv2, which utilize the 3rd Gen EPYC™ with 3D V-Cache “Milan-X” and the 2nd Gen EPYC™ “Rome” respectively.
Additionally, the HBv5 delivers up to 35 times more memory bandwidth compared to a 4–5-year-old HPC server nearing the end of its hardware lifecycle.

b) RDMA (Remote Direct Memory Access) Support

RDMA enables direct memory access between VMs, bypassing the CPU, which drastically reduces latency and increases application efficiency — a must for HPC workloads.

c) True HPC Fabric in the Cloud

Azure is the only major public cloud provider that offers InfiniBand across multiple VM families, such as:

- HBv3/HBv4 (for CFD, FEA, Multiphysics, Molecular Dynamics)
- HX-series (structural analysis)
- ND-series (GPU + MPI)

It allows scaling MPI workloads across thousands of cores — something typically limited to on-premises supercomputers.

d) Production-Grade Performance for CAE

Solvers like Ansys Fluent, STAR-CCM+, Abaqus, and MSC Nastran have benchmarked extremely well on Azure, thanks in large part to the InfiniBand-enabled infrastructure. If you’re building CAE, HPC, or AI workloads that rely on ultra-fast communication between nodes, Azure’s InfiniBand-powered VM SKUs offer the best cloud-native alternative to on-premises HPC clusters.

--------------------------------------------------------------------------------------------------------

5. Key CAE Workloads on Azure

CAE isn’t a one-size-fits-all domain. Azure supports a broad spectrum of CAE applications, such as:

- Computational Fluid Dynamics (CFD): Ansys Fluent, Ansys CFX, Siemens Simcenter STAR-CCM+, Convergent Science CONVERGE CFD, Autodesk CFD, OpenFOAM, NUMECA Fine/Open, Altair AcuSolve, Simerics MP+, Cadence Fidelity CFD, COMSOL Multiphysics (CFD Module), Dassault Systèmes XFlow, etc.
- Finite Element Analysis (FEA): Ansys Mechanical, Dassault Systèmes Abaqus, Altair OptiStruct, Siemens Simcenter 3D, MSC Nastran, Autodesk Fusion 360 Simulation, COMSOL Multiphysics (Structural Module), etc.
- Thermal & Electromagnetic Simulation: COMSOL Multiphysics, Ansys HFSS, CST Studio Suite, Ansys Mechanical (Thermal Module), Siemens Simcenter 3D Thermal, Dassault Systèmes Abaqus Thermal, etc.
- Crash & Impact Testing: Ansys LS-DYNA, Altair Radioss, ESI PAM-Crash, Siemens Simcenter Madymo, Dassault Systèmes Abaqus (Explicit), Ansys Autodyn, etc.

These applications require a combination of powerful CPUs, a large memory footprint, high memory bandwidth, and low-latency interconnects; some also offer GPU-accelerated versions. All of these capabilities are available in Azure’s purpose-built HPC VM families.

--------------------------------------------------------------------------------------------------------

6. Azure HPC VM Series for CAE Workloads

Azure offers specialized VM series tailored for CAE applications. These VMs support RDMA-enabled InfiniBand networking, critical for scaling CAE workloads across nodes in parallel simulations.

CPU:
- HBv3, HBv4 Series: Ideal for memory-intensive workloads like CFD and FEA, offering high memory bandwidth and low-latency interconnects.
- HX Series: Optimized for structural analysis applications, providing significant performance boosts for solvers like MSC Nastran and others.

GPU:
- ND Series: GPU-accelerated VMs optimized for CAE workloads, offering high double-precision compute, large memory bandwidth, and scalable performance with NVIDIA H100, H200, and GB200 GPUs and AMD MI300X GPUs.

The highest-performing compute-optimized CPU offering in Azure today is the HBv4/HX series, featuring 176 cores of 4th Gen AMD EPYC processors with 3D V-Cache technology (“Genoa-X”).
Below is a sample performance comparison of four different AMD SKU generations against the Intel “HCv1-Skylake” SKU, using the Ansys Fluent F1 Racecar (140M cells) model. The full performance and scalability results for HBv4 and HX-series VMs with Genoa-X CPUs are available HERE.

--------------------------------------------------------------------------------------------------------

7. CAE Software Partnership “ISVs”

Independent Software Vendors (ISVs) play a critical role on Azure by bringing trusted, industry-leading applications to the platform. Their solutions — spanning CAE, CFD, FEA, data analytics, AI, and more — are optimized to run efficiently on Azure’s scalable infrastructure. ISVs ensure that customers can seamlessly move their workloads to the cloud without sacrificing performance, compatibility, or technical support. They also drive innovation by collaborating with Azure engineering teams to deliver cloud-native, HPC-ready, and AI-enhanced capabilities, helping businesses accelerate product development, simulations, and decision-making. Below is a partial list of these ISVs and their offerings on Azure:

- Ansys Access: SaaS platform built on Azure, offering native cloud experiences for Fluent, Mechanical, LS-DYNA, HFSS, etc.
- Altair One: SaaS platform on Azure supporting Altair solvers such as HyperWorks, OptiStruct, Radioss, AcuSolve, etc.
- Siemens Simcenter: Validated on Azure for fluid, structural, and thermal simulation with solvers such as STAR-CCM+, NX, and Femap.
- Dassault Systèmes: Solvers such as Abaqus, CATIA, SIMULIA, and XFlow.
- COMSOL: For its flagship solver, COMSOL Multiphysics.
- CPFD Software: CPFD has optimized its simulation tool Barracuda Virtual Reactor for Azure, enabling engineers to perform particle-fluid simulations efficiently.

--------------------------------------------------------------------------------------------------------

8. Robust Ecosystem of System Integrator “SI” Partners

Azure CAE System Integrators (SIs) are specialized partners that assist organizations in deploying and managing CAE workloads on Microsoft Azure. These SIs provide expertise in cloud migration, HPC optimization, and integration of CAE applications, enabling businesses to leverage Azure’s scalable infrastructure for engineering simulations and analyses.

a) What Do Azure CAE System Integrators Offer?

Azure CAE SIs deliver a range of services tailored to the unique demands of engineering and simulation workloads:

- Cloud Migration: Transitioning on-premises CAE applications and data to Azure’s cloud environment.
- HPC Optimization: Configuring Azure’s HPC resources to maximize performance for CAE tasks.
- Application Integration: Ensuring compatibility and optimal performance of CAE software (e.g., Ansys, Siemens, Altair, Abaqus) on Azure.
- Managed Services: Ongoing support, monitoring, and maintenance of CAE environments on Azure.

b) Leading Azure CAE System Integrators

Several SIs have been recognized for their capabilities in deploying CAE solutions on Azure. A partial list: Rescale, TotalCAE, Oakwood Systems, UberCloud (SIMR), Capgemini, Accenture, and Hexagon Manufacturing Intelligence.

c) Benefits of Collaborating with Azure CAE SIs

By partnering with Azure CAE System Integrators, organizations can effectively harness the power of cloud computing to enhance their engineering and simulation capabilities. Engaging with Azure CAE SIs can provide:

- Expertise: Access to professionals experienced in both CAE applications and Azure infrastructure.
- Efficiency: Accelerated deployment and optimization of CAE workloads.
- Scalability: Ability to scale resources up or down based on project requirements.
- Cost Management: Optimized resource usage leading to potential cost savings.

--------------------------------------------------------------------------------------------------------

9. Real-World Use Case: Automotive Sector

Rimac used Azure cloud computing to help with the design, testing, and manufacturing of its next-generation components and sportscars, and it is gaining even greater scale and speed in its product development processes with a boost from Microsoft Azure HPC.

Rimac’s Azure HPC environment uses Azure CycleCloud to organize and orchestrate clusters—putting together different cluster types and sizes flexibly and as necessary. The solution includes Azure Virtual Machines, running containers on Azure HBv3 virtual machines with 3rd Gen AMD EPYC™ “Milan” processors with AMD 3D V-Cache, which are much faster than previous-generation Azure virtual machines for explicit calculations. Rimac’s solution takes full advantage of the power of AMD, which offers the highest-performing x86 CPU for technical computing.

“We’ve gained a significant increase in computational speed with AMD, which leads to lower utilization of HPC licenses and faster iterations,” says Ivan Krajinović, Head of Simulations, Rimac Technology.

“However complex the model we need to create, we know that we can manage it with Azure HPC. We now produce more highly complex models that simply wouldn’t have been possible on our old infrastructure.” – Ivan Krajinović

--------------------------------------------------------------------------------------------------------

10. The Future of CAE is Cloud-Native

The next frontier in CAE is not just lifting and shifting legacy solvers into the cloud, but enabling cloud-native simulation pipelines, including:

- AI-assisted simulation tuning
- Serverless pre/post-processing workflows
- Digital twins integrated with IoT data on Azure
- Cloud-based visualization with NVIDIA Omniverse

With advances in GPU acceleration, parallel file systems (like Azure Managed Lustre File System, AMLFS), and intelligent job schedulers, Azure is enabling this next-gen CAE transformation today.

--------------------------------------------------------------------------------------------------------

11. Final Thoughts

Moving CAE to Azure is more than a tech upgrade—it’s a shift in mindset. It empowers engineering teams to simulate more, iterate faster, and design better—without being held back by hardware constraints. If you’re still running CAE workloads on aging, capacity-constrained systems, now is the time to explore what Azure HPC can offer. Let the cloud be your wind tunnel, your test track, your proving ground.

--------------------------------------------------------------------------------------------------------

Let’s Connect

Have questions or want to share how you’re using CAE in the cloud? Let’s start a conversation! We'd love to hear your thoughts! Leave a comment below and join the conversation. 👇

#CAE #HPC #AzureHPC #EngineeringSimulation #CFD #FEA #CloudComputing #DigitalEngineering #MicrosoftAzure

Running DeepSeek-R1 on a single NDv5 MI300X VM
Contributors: Davide Vanzo, Yuval Mazor, Jesse Lopez

DeepSeek-R1 is an open-weights reasoning model built on DeepSeek-V3, designed for conversational AI, coding, and complex problem-solving. It has gained significant attention beyond the AI/ML community due to its strong reasoning capabilities, often competing with OpenAI’s models. One of its key advantages is that it can be run locally, giving users full control over their data.

The NDv5 MI300X VM features 8x AMD Instinct MI300X GPUs, each equipped with 192 GB of HBM3 and interconnected via Infinity Fabric 3.0. With up to 5.2 TB/s of memory bandwidth per GPU, the MI300X provides the necessary capacity and speed to process large models efficiently, enabling users to run DeepSeek-R1 at full precision on a single VM.

In this blog post, we’ll walk you through the steps to provision an NDv5 MI300X instance on Azure and run DeepSeek-R1 for inference using the SGLang inference framework.

Launching an NDv5 MI300X VM

Prerequisites

- Check that your subscription has sufficient vCPU quota for the VM family “StandardNDI Sv 5MI300X” (see the Quota documentation). If needed, contact your Microsoft account representative to request a quota increase.
- A Bash terminal with the Azure CLI installed and logged into the appropriate tenant. Alternatively, Azure Cloud Shell can also be used.

Provision the VM

1. Using the Azure CLI, create an Ubuntu 22.04 VM on ND_MI300x_v5:

```
az group create --location <REGION> -n <RESOURCE_GROUP_NAME>

az vm create --name mi300x \
  --resource-group <RESOURCE_GROUP_NAME> \
  --location <REGION> \
  --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 \
  --size Standard_ND96isr_MI300X_v5 \
  --security-type Standard \
  --os-disk-size-gb 256 \
  --os-disk-delete-option Delete \
  --admin-username azureadmin \
  --ssh-key-values <PUBLIC_SSH_PATH>
```

Optionally, the deployment can use a cloud-init.yaml file, specified as --custom-data <CLOUD_INIT_FILE_PATH>, to automate the additional preparation described below:

```
az vm create --name mi300x \
  --resource-group <RESOURCE_GROUP_NAME> \
  --location <REGION> \
  --image microsoft-dsvm:ubuntu-hpc:2204-rocm:22.04.2025030701 \
  --size Standard_ND96isr_MI300X_v5 \
  --security-type Standard \
  --os-disk-size-gb 256 \
  --os-disk-delete-option Delete \
  --admin-username azureadmin \
  --ssh-key-values <PUBLIC_SSH_PATH> \
  --custom-data <CLOUD_INIT_FILE_PATH>
```

Note: The GPU drivers may take a couple of minutes to completely load after the VM has been initially created.

Additional preparation

Beyond provisioning the VM, there are additional steps to prepare the environment to optimally run DeepSeek (or other AI workloads), including setting up the 8 NVMe disks on the node in a RAID-0 configuration to act as the cache location for Docker and Hugging Face. The following steps assume you have connected to the VM and are working in a Bash shell.

1. Prepare the NVMe disks in a RAID-0 configuration:

```
mkdir -p /mnt/resource_nvme/
sudo mdadm --create /dev/md128 -f --run --level 0 --raid-devices 8 $(ls /dev/nvme*n1)
sudo mkfs.xfs -f /dev/md128
sudo mount /dev/md128 /mnt/resource_nvme
sudo chmod 1777 /mnt/resource_nvme
```

2. Configure Hugging Face to use the RAID-0. This environment variable should also be propagated to any containers pulling images or data from Hugging Face.

```
mkdir -p /mnt/resource_nvme/hf_cache
export HF_HOME=/mnt/resource_nvme/hf_cache
```
3. Configure Docker to use the RAID-0:

```
mkdir -p /mnt/resource_nvme/docker
sudo tee /etc/docker/daemon.json > /dev/null <<EOF
{
  "data-root": "/mnt/resource_nvme/docker"
}
EOF
sudo chmod 0644 /etc/docker/daemon.json
sudo systemctl restart docker
```

All of these additional preparation steps can be automated at VM creation using cloud-init. The example cloud-init.yaml file below can be used when provisioning the VM as described above.

```yaml
#cloud-config
package_update: true

write_files:
  - path: /opt/setup_nvme.sh
    permissions: '0755'
    owner: root:root
    content: |
      #!/bin/bash
      NVME_DISKS_NAME=`ls /dev/nvme*n1`
      NVME_DISKS=`ls -latr /dev/nvme*n1 | wc -l`
      echo "Number of NVMe Disks: $NVME_DISKS"
      if [ "$NVME_DISKS" == "0" ]
      then
          exit 0
      else
          mkdir -p /mnt/resource_nvme
          # Needed in case something did not unmount as expected. This will delete any data that may be left behind
          mdadm --stop /dev/md*
          mdadm --create /dev/md128 -f --run --level 0 --raid-devices $NVME_DISKS $NVME_DISKS_NAME
          mkfs.xfs -f /dev/md128
          mount /dev/md128 /mnt/resource_nvme
      fi
      chmod 1777 /mnt/resource_nvme
  - path: /etc/profile.d/hf_home.sh
    permissions: '0755'
    content: |
      export HF_HOME=/mnt/resource_nvme/hf_cache
  - path: /etc/docker/daemon.json
    permissions: '0644'
    content: |
      {
        "data-root": "/mnt/resource_nvme/docker"
      }

runcmd:
  - ["/bin/bash", "/opt/setup_nvme.sh"]
  - mkdir -p /mnt/resource_nvme/docker
  - mkdir -p /mnt/resource_nvme/hf_cache
  # PAM group not working for docker group, so this will add all users to docker group
  - bash -c 'for USER in $(ls /home); do usermod -aG docker $USER; done'
  - systemctl restart docker
```

Using MI300X

If you are familiar with the Nvidia and CUDA tools and environment, AMD provides equivalents as part of the ROCm stack.

| MI300X + ROCm | Nvidia + CUDA | Description |
|---|---|---|
| rocm-smi | nvidia-smi | CLI for monitoring the system and making changes |
| rccl | nccl | Library for communication between GPUs |

Running DeepSeek-R1

1. Pull the container image. It is O(10) GB in size, so it may take a few minutes to download.

```
docker pull rocm/sgl-dev:upstream_20250312_v1
```

2. Start the SGLang server. The model (~642 GB) is downloaded the first time the server is launched, which will take some time. Once the application outputs “The server is fired up and ready to roll!”, you can begin making queries to the model.

```
docker run \
  --device=/dev/kfd \
  --device=/dev/dri \
  --security-opt seccomp=unconfined \
  --cap-add=SYS_PTRACE \
  --group-add video \
  --privileged \
  --shm-size 32g \
  --ipc=host \
  -p 30000:30000 \
  -v /mnt/resource_nvme:/mnt/resource_nvme \
  -e HF_HOME=/mnt/resource_nvme/hf_cache \
  -e HSA_NO_SCRATCH_RECLAIM=1 \
  -e GPU_FORCE_BLIT_COPY_SIZE=64 \
  -e DEBUG_HIP_BLOCK_SYN=1024 \
  rocm/sgl-dev:upstream_20250312_v1 \
  python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1 --tp 8 --trust-remote-code --chunked-prefill-size 131072 --enable-torch-compile --torch-compile-max-bs 256 --host 0.0.0.0
```

3. You can now make queries to DeepSeek-R1. For example, these requests to the model from another shell on the same host return model metadata and generate a sample response.

```
curl http://localhost:30000/get_model_info
{"model_path":"deepseek-ai/DeepSeek-R1","tokenizer_path":"deepseek-ai/DeepSeek-R1","is_generation":true}

curl http://localhost:30000/generate -H "Content-Type: application/json" -d '{
  "text": "Once upon a time,",
  "sampling_params": {
    "max_new_tokens": 16,
    "temperature": 0.6
  }
}'
```
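Because the model download happens on the first launch, it can be handy to poll the server before sending real traffic. A small sketch using the /get_model_info endpoint shown above (it assumes the server is published on localhost:30000, as in the docker run command):

```bash
# Wait until the SGLang server responds, then print the model info
until curl -sf http://localhost:30000/get_model_info > /dev/null; do
  echo "Waiting for the SGLang server to come up..."
  sleep 30
done
curl -s http://localhost:30000/get_model_info
```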
Conclusion

In this post, we detail how to run the full-size 671B DeepSeek-R1 model on a single Azure NDv5 MI300X instance. This includes setting up the machine, installing the necessary drivers, and executing the model. Happy inferencing!

References

- https://github.com/deepseek-ai/DeepSeek-R1
- https://github.com/deepseek-ai/DeepSeek-V3
- https://www.amd.com/en/developer/resources/technical-articles/amd-instinct-gpus-power-deepseek-v3-revolutionizing-ai-development-with-sglang.html
- https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/azure-announces-new-ai-optimized-vm-series-featuring-amd%e2%80%99s-flagship-mi300x-gpu/3980770
- https://docs.sglang.ai/index.html
- https://rocm.blogs.amd.com/artificial-intelligence/DeepSeekR1-Part2/README.html

New Linux VDI Solution: Deploy Cendio ThinLinc as a Standalone Image with Direct Connectivity
Prerequisites

- Microsoft Azure Subscription: Ensure you have an active Microsoft Azure subscription.
- Virtual Network: Ensure you have a VNet with connectivity to your corporate network (for example, over VPN).
- Resource Group: Choose an existing resource group or create a new one.
- SSH Key (Optional): Generate an SSH key pair if you don't already have one.

Step 1: Deploy the Standalone ThinLinc VM Image

1. Navigate to the Microsoft Azure Marketplace
   - Go to the ThinLinc page on the Microsoft Azure Marketplace: ThinLinc by Cendio.
   - Click “Get It Now” to start the deployment process.
   - Choose between Alma Linux 9 and Ubuntu 22.04.
   - Click “Create” in the new Microsoft Azure Portal tab.
2. Configure Basic Settings
   - Subscription: Select your Microsoft Azure subscription.
   - Resource Group: Choose an existing resource group or create a new one.
   - VM Name: Provide a name for the virtual machine (VM).
   - Region: Select the Azure region where the VM will be deployed.
   - Availability Zone: Select an Availability Zone as you prefer.
   - Image: Ensure the proper ThinLinc image is selected.
3. Select VM Size
   - Click "Change size" and choose a VM that has sufficient resources for your workload. We highly recommend a GPU-enabled VM series (e.g. NV-series), although it is possible to do this without a GPU machine.
4. Configure Administrator Account
   - Choose the authentication type: we recommend using an SSH public key, although you can use a password as well.
   - If using an SSH public key, upload your SSH public key or generate a new key pair.
   - Set a username for the admin account.
5. Inbound Port Rules
   - Permit only essential ports (for example, 22 for SSH if required for admin maintenance).
6. Networking Tab
   - Virtual Network: Select your existing VNet with direct connectivity.
   - Subnet: Choose a subnet that can route traffic to your corporate network.
   - Public IP: Disable (since direct connectivity is used).
7. Review + Create
   - Review all your information, then click “Create” to deploy the VM.

Step 2: Configure the ThinLinc Server

After deploying the VM, configure the ThinLinc server to ensure it is accessible and running properly. This step verifies the installation and retrieves the private IP for connection.

1. Navigate to your deployment in the Microsoft Azure Portal.
2. Select your deployed VM, go to the Networking tab, and note its private IP address.
3. Connect to the VM from a machine within your private network using SSH: ssh <USERNAME>@<PRIVATE_IP>

Step 3: Connect Using the ThinLinc Client

1. Download the ThinLinc client from Cendio's website.
2. Open the ThinLinc client and enter the VM's private IP address in the “Server” field.
3. Enter your username and SSH key (or password).
4. Click Connect to launch your ThinLinc session.
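Tip: if you prefer the CLI to the portal for Step 2, the VM's private IP can also be retrieved with the Azure CLI. A quick sketch; use the resource group and VM name you chose at deployment:

```bash
# Look up the private IP address of the deployed ThinLinc VM
az vm show --resource-group <RESOURCE_GROUP_NAME> --name <VM_NAME> --show-details --query privateIps --output tsv
```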
Conclusion: You have successfully deployed a standalone Cendio ThinLinc VM on Microsoft Azure with direct connectivity! This setup allows seamless remote Linux desktop access while leveraging existing VPNs and private networking. Stay tuned for more updates on our ThinLinc integration with CycleCloud Workspaces for Slurm, which will support advanced AI and HPC workloads.

Automating Ubuntu Pro Integration with Azure CycleCloud HPC Clusters

Ubuntu Pro is a premium subscription service offered by Canonical. It is designed to provide additional features, tools, and extended support for Ubuntu users, particularly those in enterprise or production environments. A key feature of Ubuntu Pro is that it offers extended security updates and patches for older versions of Ubuntu that are no longer supported under standard updates. This helps ensure that businesses can continue using older Ubuntu LTS (Long Term Support) releases securely even after their standard support period ends.

Ubuntu Pro can be purchased through a private offer, which allows enterprises to achieve more efficient cost management. However, to use a private offer in Azure, one must ensure that a VM is 'attached' to an Ubuntu Pro subscription at creation. Furthermore, while VMs do get detached from a subscription when deleted, it is best practice to 'detach' from the subscription when the VM no longer needs the Pro features. In this blog post, we explore the process of automating Ubuntu Pro subscription management in an Azure CycleCloud HPC cluster.

Azure CycleCloud is an Azure service tailored for managing high-performance computing (HPC) and AI workloads at scale. With CycleCloud, users can seamlessly execute jobs of any magnitude while automating the creation, scaling, and configuration of infrastructure for HPC+AI systems. It also integrates with familiar HPC+AI schedulers such as Slurm, PBSPro, and Grid Engine, ensuring a streamlined experience for advanced computing environments.

The cyclecloud-ubuntupro project helps ensure that the compute infrastructure deployed via Azure CycleCloud is automatically attached to an existing Ubuntu Pro subscription at resource creation and detached when deallocated.

Pre-Requisites

- An active Ubuntu Pro subscription
- Azure CycleCloud installed and running (CycleCloud 8.0 or later)
- Supported Templates: Slurm

Configuring the project

1. Open a terminal session on the CycleCloud server with the CycleCloud CLI enabled.
2. Clone the cyclecloud-ubuntupro repo:

```
git clone https://github.com/egmsft/cyclecloud-ubuntupro.git
```

3. Switch to the cyclecloud-ubuntupro project directory and upload the project to the CycleCloud locker:

```
cd cyclecloud-ubuntupro/
cyclecloud project upload <locker name>
```

4. Import the required template:

```
cyclecloud import_template -f templates/slurm-ubuntupro.txt -c Slurm slurm-ubuntupro
```

Creating the Cluster

After the import, you should see an entry for slurm-ubuntupro under the list of supported schedulers. Select the template to build the cluster and follow the steps to configure it. Once you reach the Ubuntu Pro tab, enter the token for your Ubuntu Pro subscription.

Testing the Cluster

After configuring the cluster, you can start it and validate that the VMs are getting attached to your subscription by running this command:

```
pro status
```

The output of that command should confirm that the machine is attached to your Ubuntu Pro subscription. Note that when VMs get terminated, they detach from your subscription before they are deallocated. The detachment is done using scheduled events, and CycleCloud's jetpack will log the successful execution of the onTerminate.sh script that gets applied during the VM's initial configuration.

Resources

- Ubuntu Pro Documentation
- CycleCloud Documentation