<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Azure High Performance Computing (HPC) Blog articles</title>
    <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/bg-p/AzureHighPerformanceComputingBlog</link>
    <description>Azure High Performance Computing (HPC) Blog articles</description>
    <pubDate>Sun, 14 Jun 2026 12:54:49 GMT</pubDate>
    <dc:creator>AzureHighPerformanceComputingBlog</dc:creator>
    <dc:date>2026-06-14T12:54:49Z</dc:date>
    <item>
      <title>Real-Time Simulation with Ansys Discovery on Azure AMD Radeon V710 GPUs</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/real-time-simulation-with-ansys-discovery-on-azure-amd-radeon/ba-p/4527323</link>
      <description>&lt;H2&gt;Overview of Synopsys-Ansys Discovery&lt;/H2&gt;
&lt;P&gt;Ansys Discovery is an interactive engineering application that unifies 3D geometry modeling, real‑time simulation, and visual analysis into a single, immersive workspace. It is purpose‑built to accelerate early‑stage engineering exploration while maintaining continuity into high‑fidelity validation workflows.&lt;/P&gt;
&lt;P&gt;Discovery enables engineers and designers to create, modify, simulate, and iterate on designs in real time, dramatically reducing the time required to assess design feasibility, physics behavior, and performance tradeoffs. Unlike traditional sequential CAD‑then‑simulation workflows, Discovery collapses these steps into a continuous feedback loop that supports faster decision‑making and more informed design outcomes.&lt;/P&gt;
&lt;P&gt;Ansys Discovery is organized into three continuously accessible stages:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Model – Focused on direct 3D geometry creation and modification, without simulation overhead&lt;/LI&gt;
&lt;LI&gt;Explore – Provides instant, real‑time simulation results using GPU‑accelerated meshing and solvers for rapid insight&lt;/LI&gt;
&lt;LI&gt;Refine – Enables higher‑fidelity simulation using body‑fitted meshing and CPU/GPU solvers for design validation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Running Discovery on Azure extends these capabilities by enabling scalable, cloud-based engineering workflows.&lt;/P&gt;
&lt;H2&gt;Key benefits&lt;/H2&gt;
&lt;P&gt;Running Ansys Discovery on Azure provides a scalable, cloud-based platform for simulation-driven design with the following advantages:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Access high-performance engineering workstations from anywhere: Azure provides global access to GPU-enabled virtual machines, so engineers can run Discovery from any location without depending on local hardware.&lt;/LI&gt;
&lt;LI&gt;Engineers no longer need high-end local GPUs, as compute and visualization are delivered through Azure-hosted environments. This reduces hardware refresh cycles and IT overhead.&lt;/LI&gt;
&lt;LI&gt;Azure NVads V710 v5 supports fractional GPU allocation (1/6 to full GPU), allowing organizations to right-size cost and performance based on workload requirements.&lt;/LI&gt;
&lt;LI&gt;Discovery’s GPU-accelerated Explore mode benefits from Azure GPU infrastructure, enabling instant physics feedback and faster design iteration cycles.&lt;/LI&gt;
&lt;LI&gt;CAD models, simulation results, and design artifacts can be stored in Azure Files or Azure NetApp Files, reducing data duplication and improving collaboration across teams.&lt;/LI&gt;
&lt;LI&gt;Azure allows rapid provisioning and scaling of GPU resources to match peak design cycles or project demands.&lt;/LI&gt;
&lt;LI&gt;Intellectual property remains within enterprise-managed Azure environments, with access controlled using Microsoft Entra ID and Azure security services.&lt;/LI&gt;
&lt;LI&gt;Cloud-hosted Discovery environments enable geographically distributed teams to collaborate on shared designs with consistent performance and user experience.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;To support these capabilities, Azure provides a reference architecture optimized for GPU-accelerated simulation.&lt;/P&gt;
&lt;H2&gt;Architecture&lt;/H2&gt;
&lt;P&gt;The architecture for running Ansys Discovery on Azure is designed to deliver interactive, GPU‑accelerated simulation in a scalable and cloud-native environment. As illustrated in the diagram, engineers connect to Azure-hosted virtual workstations powered by NVads V710 v5 GPUs, enabling real-time geometry modeling and simulation through remote visualization. Design data and simulation outputs are stored in shared cloud storage services such as Azure Files or Azure NetApp Files, ensuring consistency and accessibility across distributed teams. This setup allows users to seamlessly move between interactive exploration and higher-fidelity validation workflows, while benefiting from on-demand scaling, centralized security, and reduced dependency on local high-performance hardware.&lt;/P&gt;
&lt;img /&gt;
&lt;H2&gt;Azure GPU SKU Mapping&lt;/H2&gt;
&lt;P&gt;Architectural Mapping Overview&lt;/P&gt;
&lt;P&gt;Ansys Discovery workloads on Azure naturally align to two primary GPU VM categories:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Graphics‑optimized, fractional GPU -&amp;gt; Design exploration&lt;/LI&gt;
&lt;LI&gt;Full‑GPU -&amp;gt; Design Refinement&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This mapping enables right‑sizing of cost, performance, and user experience on the following recommended Azure VM Series.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://www.bing.com/ck/a?!&amp;amp;&amp;amp;p=d6fcdb877e4df4e4c968b98bf49cec1b2090995aba4023f5b4c1a407d379a588JmltdHM9MTc3NjgxNjAwMA&amp;amp;ptn=3&amp;amp;ver=2&amp;amp;hsh=4&amp;amp;fclid=0f1f8fc5-0fed-6525-250f-98f90ed164f6&amp;amp;psq=nvads+v710+v5+series+azure&amp;amp;u=a1aHR0cHM6Ly9sZWFybi5taWNyb3NvZnQuY29tL2VuLXVzL2F6dXJlL3ZpcnR1YWwtbWFjaGluZXMvc2l6ZXMvZ3B1LWFjY2VsZXJhdGVkL252YWRzdjcxMC12NS1zZXJpZXM" target="_blank" rel="noopener"&gt;Azure NVads V710 v5series&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;NVads V710 v5 VMs are designed for graphics‑intensive, interactive workloads and provide:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;AMD Radeon™ Pro V710 GPUs&lt;/LI&gt;
&lt;LI&gt;Fractional GPU support (1⁄6 → full GPU)&lt;/LI&gt;
&lt;LI&gt;Cost‑optimized GPU visualization&lt;/LI&gt;
&lt;LI&gt;High‑frame‑rate remote graphics&lt;/LI&gt;
&lt;LI&gt;No additional GPU licensing required&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These characteristics align directly with Discovery’s real‑time, GPU‑accelerated Explore mode, where instant feedback matters more than absolute solver depth.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Typical Discovery Workloads on NVads V710 v5&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Interactive geometry creation and modification&lt;/LI&gt;
&lt;LI&gt;Real‑time physics feedback in Explore mode&lt;/LI&gt;
&lt;LI&gt;Concept‑level CFD, structural, and thermal studies&lt;/LI&gt;
&lt;LI&gt;Design review sessions and rapid iteration loops&lt;/LI&gt;
&lt;LI&gt;Early‑stage topology and shape experimentation&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P&gt;Recommended Sizes (Right‑Sizing Guidance)&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Discovery Use Case&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Azure VM Size&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;GPU Allocation&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Single engineer, light models&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Standard_NV4ads_V710_v5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1⁄6 GPU&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Moderate assemblies, daily use&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Standard_NV8ads_V710_v5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1⁄3 GPU&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Large CAD models, advanced explore&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Standard_NV12ads_V710_v5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1⁄2 GPU&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Power users, complex scenes&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Standard_NV24ads_V710_v5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Full GPU&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;H2&gt;Test Scenarios&lt;/H2&gt;
&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Test Name&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Simulation Analysis Type&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;External Aerodynamics Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Heat Transfer Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Thermal&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Conjugate Heat Transfer Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid-Thermal&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Conjugate Heat Transfer Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Fluid-Thermal&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Structural&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Structural&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Assembly Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Structural&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Assembly Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Structural&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Natural Frequency Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Modal&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Natural Frequency Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Modal&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;H2&gt;Results&lt;/H2&gt;
&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Test Name&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Solve Time (s)&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Element Count&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Test Result&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;External Aerodynamics Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;47.68&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.44E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;26.70&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;7.90E+05&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;9.34&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;9.64E+05&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Internal Flow Test 3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;18.00&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.39E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Heat Transfer Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.55&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2.51E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Conjugate Heat Transfer Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;12.34&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.34E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Conjugate Heat Transfer Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;30.89&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.54E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.76&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3.06E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;4.78&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;3.04E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Assembly Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;12.98&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;2.97E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Static Structural Assembly Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;14.16&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.88E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Natural Frequency Test 1&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;36.04&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.45E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Natural Frequency Test 2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;16.10&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;1.11E+06&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Pass&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;col style="width: 25.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;
&lt;img /&gt;
&lt;P&gt;These results demonstrate consistent performance across fluid, thermal, structural, and modal simulations: every test solved in under 48 seconds on models ranging from roughly 0.8 million to 3.1 million elements.&lt;/P&gt;
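&lt;P&gt;To make the result table easier to compare across tests, the short Python sketch below derives an elements-per-second figure from the solve times and element counts listed above. The metric and the function names are illustrative assumptions and are not reported in the original validation.&lt;/P&gt;

```python
# (solve time in seconds, element count), copied from the result table above.
RESULTS = {
    "External Aerodynamics Test 2": (47.68, 1.44e6),
    "Internal Flow Test 1": (26.70, 7.90e5),
    "Internal Flow Test 2": (9.34, 9.64e5),
    "Internal Flow Test 3": (18.00, 1.39e6),
    "Heat Transfer Test 1": (4.55, 2.51e6),
    "Conjugate Heat Transfer Test 1": (12.34, 1.34e6),
    "Conjugate Heat Transfer Test 2": (30.89, 1.54e6),
    "Static Structural Test 1": (4.76, 3.06e6),
    "Static Structural Test 2": (4.78, 3.04e6),
    "Static Structural Assembly Test 1": (12.98, 2.97e6),
    "Static Structural Assembly Test 2": (14.16, 1.88e6),
    "Natural Frequency Test 1": (36.04, 1.45e6),
    "Natural Frequency Test 2": (16.10, 1.11e6),
}


def elements_per_second(solve_time_s, element_count):
    """Derived metric (an assumption, not in the article): elements solved per second."""
    return element_count / solve_time_s


def slowest_test(results):
    """Return the name of the test with the longest solve time."""
    return max(results, key=lambda name: results[name][0])


if __name__ == "__main__":
    for name, (solve_time_s, element_count) in RESULTS.items():
        rate = elements_per_second(solve_time_s, element_count)
        print(f"{name}: {rate:,.0f} elements/s")
    print("Slowest test:", slowest_test(RESULTS))
```

&lt;P&gt;Because the tests cover different physics and solvers, the derived rates are best read within one analysis type and not as a ranking across types.&lt;/P&gt;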
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;/DIV&gt;
&lt;P&gt;The validation confirms that Azure NVads V710 v5 delivers consistent, scalable performance across diverse simulation workloads. With fractional GPU flexibility and strong visualization capabilities, it enables organizations to adopt real-time simulation without overprovisioning infrastructure.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;
&lt;P&gt;All workloads successfully passed, demonstrating consistent performance across simulation types. Importantly, we’re seeing rapid solve times even at multi-million element scale, which reinforces the ability to use Discovery for real-time engineering exploration at enterprise level. Azure NVads V710 v5 is uniquely optimized for Discovery’s real-time simulation workflows. It provides GPU acceleration tuned for interactive engineering, not just batch compute.&lt;/P&gt;
&lt;P&gt;The fractional GPU model allows organizations to right-size cost and performance based on user needs—from individual engineers to power users. This ensures maximum flexibility while maintaining high-quality visualization and responsiveness.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 11 Jun 2026 14:09:17 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/real-time-simulation-with-ansys-discovery-on-azure-amd-radeon/ba-p/4527323</guid>
      <dc:creator>Sunita_AZ0708</dc:creator>
      <dc:date>2026-06-11T14:09:17Z</dc:date>
    </item>
    <item>
      <title>Training 100B+ Models on a Single GPU: What MegaTrain Changes - and What It Means for Azure</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/training-100b-models-on-a-single-gpu-what-megatrain-changes-and/ba-p/4519562</link>
      <description>&lt;H1&gt;The Paradigm Shift in Model Training&lt;/H1&gt;
&lt;P&gt;The conventional wisdom in deep learning has been simple: bigger models require bigger infrastructure. Training a 100-billion parameter language model traditionally demands massive GPU clusters with terabytes of combined memory, where each GPU holds portions of the model simultaneously. This assumption has shaped the entire AI infrastructure landscape, driving demand for high-memory accelerators and complex distributed training frameworks.&lt;/P&gt;
&lt;P&gt;A new paper, &lt;A href="https://arxiv.org/html/2604.05091" target="_blank"&gt;MegaTrain&lt;/A&gt; (Yuan et al.), by research teams at Notre Dame and Lehigh universities, changes this paradigm by demonstrating that model size need not be limited by GPU memory capacity. Instead of treating memory as a container that must hold the entire model, MegaTrain treats it as a cache through which model components flow during computation. This architectural inversion enables training models orders of magnitude larger than available GPU memory, transforming what was thought to be a hardware limitation into a software scheduling problem.&lt;/P&gt;
&lt;P&gt;The implications extend beyond academic curiosity to practical infrastructure decisions, particularly for cloud platforms like Azure where GPU configurations, interconnect topologies, and cost structures create distinct optimization landscapes.&lt;/P&gt;
&lt;H1&gt;The Memory Bottleneck in Traditional LLM Training&lt;/H1&gt;
&lt;P&gt;Large language model training faces a fundamental memory constraint that grows with model scale. A typical 100-billion parameter model trained in mixed precision may require more than 1 terabyte of memory once weights and optimizer states are taken into account. Activation memory adds another substantial overhead, particularly for long sequences where intermediate layer outputs must be retained for backpropagation.&lt;/P&gt;
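&lt;P&gt;The terabyte-scale figure can be reproduced with a rough back-of-the-envelope calculation. The sketch below assumes a common mixed-precision Adam accounting (FP16 weights and gradients plus FP32 master weights, momentum, and variance, about 16 bytes per parameter). This byte breakdown is an assumption chosen for illustration, not a figure taken from the MegaTrain paper, and it ignores activations.&lt;/P&gt;

```python
# Assumed per-parameter storage for mixed-precision Adam training (illustrative).
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,
    "fp32_adam_momentum": 4,
    "fp32_adam_variance": 4,
}


def training_state_bytes(num_params, include_gradients=True):
    """Bytes needed for weights, optimizer states, and optionally gradients."""
    per_param = sum(
        size
        for name, size in BYTES_PER_PARAM.items()
        if include_gradients or name != "fp16_gradients"
    )
    return num_params * per_param


if __name__ == "__main__":
    params = 100_000_000_000  # 100-billion parameter model
    with_grads_tb = training_state_bytes(params) / 1e12
    no_grads_tb = training_state_bytes(params, include_gradients=False) / 1e12
    print(f"Weights + optimizer states + gradients: {with_grads_tb:.1f} TB")
    print(f"Weights + optimizer states only: {no_grads_tb:.1f} TB")
```

&lt;P&gt;Under these assumptions the state alone lands between 1.4 and 1.6 TB, consistent with the "more than 1 terabyte" estimate before any activation memory is counted.&lt;/P&gt;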
&lt;P&gt;Traditional training approaches keep all parameters, gradients, and optimizer states resident in GPU memory simultaneously, creating a hard ceiling on trainable model size. Standard datacenter GPUs like the NVIDIA A100 with 80GB memory can accommodate models of at most roughly 20 billion parameters with reasonable batch sizes, while consumer-grade GPUs with 16-24GB memory are restricted to models under 5 billion parameters.&lt;/P&gt;
&lt;P&gt;This memory wall has forced the AI community toward distributed training strategies like pipeline parallelism, tensor parallelism, and data parallelism, each adding communication overhead, synchronization complexity, and infrastructure costs. The memory bottleneck becomes particularly acute during gradient accumulation and optimizer updates, where transient memory peaks can trigger out-of-memory errors even when average utilization suggests sufficient capacity.&lt;/P&gt;
&lt;H1&gt;The Key Insight: Inverting the Memory Hierarchy&lt;/H1&gt;
&lt;P&gt;MegaTrain's core innovation inverts the traditional memory hierarchy by treating GPU memory as a streaming cache rather than a static container. In conventional training, the model resides in GPU memory while slower storage sits idle except for checkpoint loading. MegaTrain reverses this relationship: the full model lives in high-bandwidth storage like NVMe SSDs, with only the actively computing layer residing in GPU memory at any moment.&lt;/P&gt;
&lt;P&gt;This architectural inversion exploits the sequential nature of neural network computation: forward and backward passes process layers in deterministic order, creating predictable access patterns amenable to prefetching and streaming. The approach transforms the memory constraint from a capacity problem into a bandwidth problem where success depends on streaming layers between storage and GPU fast enough to keep compute units saturated.&lt;/P&gt;
&lt;P&gt;Modern NVMe SSDs provide 7-14 GB/s sequential read bandwidth, while PCIe 4.0 x16 offers 32 GB/s bidirectional throughput, creating sufficient headroom for layer streaming when properly pipelined. This memory hierarchy is the basis for a fundamental architectural shift, through which MegaTrain moves from memory-bound scaling to bandwidth-bound scaling.&lt;/P&gt;
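&lt;P&gt;In a streaming design the slowest link sets the pace. The sketch below models the effective streaming rate as the minimum of storage and PCIe bandwidth and estimates how long one full pass over a model's weights would take. The 200 GB example size (100 billion parameters at 2 bytes each in FP16) and the function names are illustrative assumptions.&lt;/P&gt;

```python
def effective_rate_gb_per_s(storage_gb_per_s, pcie_gb_per_s):
    """The slower of the two links bounds the end-to-end streaming rate."""
    return min(storage_gb_per_s, pcie_gb_per_s)


def seconds_to_stream(size_gb, storage_gb_per_s, pcie_gb_per_s):
    """Time to move size_gb once from storage to the GPU at the bounded rate."""
    return size_gb / effective_rate_gb_per_s(storage_gb_per_s, pcie_gb_per_s)


if __name__ == "__main__":
    model_gb = 200.0  # assumed: 100B parameters x 2 bytes (FP16)
    pcie = 32.0       # PCIe 4.0 x16 figure quoted in the text
    for nvme in (7.0, 14.0):  # NVMe range quoted in the text
        t = seconds_to_stream(model_gb, nvme, pcie)
        print(f"NVMe {nvme:.0f} GB/s: one full weight pass takes {t:.1f} s")
```

&lt;P&gt;With the numbers quoted above, storage and not PCIe is the limiting link whenever weights are read from NVMe, which is why holding hot layers in host memory matters.&lt;/P&gt;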
&lt;H1&gt;MegaTrain Architecture and Core Mechanisms&lt;/H1&gt;
&lt;P&gt;MegaTrain implements memory hierarchy inversion through four interlocking mechanisms that together enable efficient single-GPU training of massive models. The architecture maintains model parameters, optimizer states, and gradients in host memory or NVMe storage, streaming only the actively computing layer into GPU memory. Each training iteration processes layers sequentially: during the forward pass, layers stream from storage to GPU, compute activations, then stream back to make room for the next layer. The backward pass reverses this flow, streaming layers in reverse order to compute gradients. Critically, MegaTrain maintains minimal GPU memory footprint by immediately evicting each layer after computation rather than accumulating them.&lt;/P&gt;
&lt;P&gt;The system achieves this through careful orchestration of four core techniques that transform naive layer streaming from a theoretical possibility into a practical training method.&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;Layer-wise streaming&lt;/STRONG&gt; provides the foundation by decomposing model computation into independent sequential operations.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Pipelined execution&lt;/STRONG&gt; overlaps data movement with computation to hide transfer latency.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Block-wise recomputation&lt;/STRONG&gt; trades redundant forward passes for reduced activation memory.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Stateless execution&lt;/STRONG&gt; eliminates persistent GPU state to maximize available memory for active computation.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Together, these techniques enable training throughput within 2-5x of conventional in-memory training while supporting models 10-100x larger than GPU memory capacity. Let’s look at each of these techniques in detail.&lt;/P&gt;
&lt;H1&gt;Layer-Wise Streaming: Sequential Model Decomposition&lt;/H1&gt;
&lt;P&gt;Layer-wise streaming exploits the inherently sequential structure of neural network computation to decompose model execution into memory-independent stages. Each transformer layer performs a self-contained computation: it receives input activations, applies attention and feedforward transformations, and produces output activations for the next layer. MegaTrain leverages this modularity by loading exactly one layer's parameters into GPU memory, computing its forward pass, storing the output activations, then immediately evicting the layer to make room for the next. During backpropagation, the process reverses: layers stream in reverse order, recompute their forward pass from stored input activations, compute gradients, and stream back to storage.&lt;/P&gt;
&lt;P&gt;This approach reduces peak GPU memory from the sum of all layer parameters to the size of the single largest layer plus activation storage. For a 100-billion parameter model with 80 transformer layers, each layer contains approximately 1.25 billion parameters requiring 2.5GB in FP16, compared to 200GB for the full model. The memory savings enable training on GPUs that would otherwise be incapable of holding even a fraction of the model.&lt;/P&gt;
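&lt;P&gt;The per-layer arithmetic above can be checked with a few lines of Python. This is a simplified sketch that assumes parameters are spread evenly across layers and ignores embeddings and other non-layer weights.&lt;/P&gt;

```python
def layer_footprint(total_params, num_layers, bytes_per_param=2):
    """Return (params per layer, layer size in GB, full model size in GB).

    Assumes evenly sized layers; bytes_per_param=2 corresponds to FP16.
    """
    params_per_layer = total_params / num_layers
    layer_gb = params_per_layer * bytes_per_param / 1e9
    model_gb = total_params * bytes_per_param / 1e9
    return params_per_layer, layer_gb, model_gb


if __name__ == "__main__":
    per_layer, layer_gb, model_gb = layer_footprint(100e9, 80)
    print(f"Parameters per layer: {per_layer / 1e9:.2f} billion")
    print(f"Resident layer: {layer_gb:.1f} GB, full model: {model_gb:.0f} GB")
```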
&lt;P&gt;Layer-wise streaming does introduce computational overhead: each layer must be loaded and evicted during both forward and backward passes, creating 4x the parameter transfer volume compared to conventional training. However, modern interconnect bandwidth and careful prefetching largely mitigate this overhead when properly pipelined.&lt;/P&gt;
&lt;H1&gt;Pipelined Execution: Overlapping Transfer and Compute&lt;/H1&gt;
&lt;P&gt;Pipelined execution hides layer streaming latency by overlapping data transfers with GPU computation, ensuring compute units remain saturated despite continuous parameter movement. MegaTrain maintains a three-stage pipeline: while the GPU computes layer N, the system prefetches layer N+1 from storage to host memory and simultaneously evicts layer N-1 back to storage. This overlapping execution exploits the independence of CPU-GPU data transfers and GPU computation in modern architectures.&lt;/P&gt;
&lt;P&gt;PCIe transfers and GPU kernels execute concurrently when properly scheduled using asynchronous CUDA streams, allowing near-complete latency hiding if transfer time does not exceed computation time. For typical transformer layers, forward pass computation takes 50-200ms depending on batch size and sequence length, while transferring a 2.5GB layer over PCIe 4.0 x16 at 16 GB/s requires about 156ms. The transfer is therefore fully hidden only when batch size and sequence length push computation time toward the upper end of that range; with smaller batches the GPU waits on the link.&lt;/P&gt;
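&lt;P&gt;A simple timing model makes the overlap condition concrete. The sketch below assumes perfect overlap between one layer's transfer and another layer's computation, so the per-layer step time is the larger of the two. It is an idealized model for illustration, not a measurement.&lt;/P&gt;

```python
def transfer_ms(layer_gb, link_gb_per_s):
    """Time to move one layer across the link, in milliseconds."""
    return layer_gb / link_gb_per_s * 1000.0


def step_ms(compute_ms, xfer_ms, overlapped=True):
    """Per-layer time: max() when transfers overlap compute, sum when serialized."""
    if overlapped:
        return max(compute_ms, xfer_ms)
    return compute_ms + xfer_ms


def stall_ms(compute_ms, xfer_ms):
    """Time the GPU idles waiting for data under perfect overlap."""
    return max(0.0, xfer_ms - compute_ms)


if __name__ == "__main__":
    xfer = transfer_ms(2.5, 16.0)  # figures from the text: 2.5 GB at 16 GB/s
    for compute in (50.0, 200.0):  # range of forward-pass times from the text
        print(
            f"compute {compute:.0f} ms, transfer {xfer:.2f} ms: "
            f"step {step_ms(compute, xfer):.2f} ms, "
            f"stall {stall_ms(compute, xfer):.2f} ms"
        )
```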
&lt;P&gt;The pipeline achieves optimal efficiency when transfer bandwidth and computation throughput are balanced, creating a predictable performance model based on layer size, batch configuration, and interconnect speed. The three pipeline stages - prefetch, compute, and evict - interleave to maintain continuous GPU utilization, as illustrated in the pipeline diagram below.&lt;/P&gt;
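&lt;P&gt;The prefetch, compute, and evict pattern can be sketched in plain Python. In this toy version a background thread stands in for an asynchronous copy stream, and the hypothetical load_layer and compute_layer functions stand in for the storage read and the layer's forward pass. It illustrates the scheduling idea only and is not MegaTrain's implementation.&lt;/P&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def load_layer(index):
    """Hypothetical stand-in for a storage-to-GPU copy; returns fake weights."""
    return [float(index)] * 4


def compute_layer(weights, activations):
    """Hypothetical stand-in for a layer's forward pass."""
    return [a + w for a, w in zip(activations, weights)]


def streamed_forward(num_layers, activations):
    """Run layers in order while prefetching the next layer in the background."""
    order = []
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(load_layer, 0)
        for index in range(num_layers):
            weights = pending.result()  # wait until layer N has arrived
            if index + 1 != num_layers:
                # Start fetching layer N+1 while layer N computes.
                pending = prefetcher.submit(load_layer, index + 1)
            activations = compute_layer(weights, activations)
            order.append(index)
            del weights  # "evict" layer N so only one layer stays resident
    return activations, order


if __name__ == "__main__":
    out, order = streamed_forward(3, [0.0] * 4)
    print("layer order:", order)
    print("output activations:", out)
```

&lt;P&gt;A real implementation would issue the copies on separate CUDA streams with pinned host buffers; the control flow, however, follows the same wait, prefetch, compute, evict loop.&lt;/P&gt;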
&lt;H1&gt;Block-Wise Recomputation&lt;/H1&gt;
&lt;P&gt;Block-wise recomputation addresses the activation memory bottleneck by trading computation for memory through selective forward pass recalculation. In standard training, all intermediate activations from the forward pass must be retained for gradient computation during backpropagation, consuming memory proportional to model depth and batch size. MegaTrain instead stores only a subset of checkpoint activations at regular intervals, recomputing intermediate values on-demand during the backward pass. For a model with 80 layers, storing checkpoints every 10 layers reduces activation memory by 90 percent at the cost of recomputing 9 out of 10 layers during backpropagation. This trade-off proves favorable because forward pass computation is relatively cheap compared to the memory savings, and recomputation can be pipelined with layer streaming.&lt;/P&gt;
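&lt;P&gt;The checkpoint arithmetic in the 80-layer example can be verified with a short sketch. It assumes every layer produces activations of the same size and that a checkpoint is kept at the start of each block, which is a simplification for illustration.&lt;/P&gt;

```python
def checkpoint_plan(num_layers, interval):
    """Return (checkpoint layer indices, layers recomputed, fraction of memory saved).

    Assumes equal activation size per layer and one checkpoint per block.
    """
    checkpoints = list(range(0, num_layers, interval))
    stored = len(checkpoints)
    recomputed = num_layers - stored
    memory_saved = 1 - stored / num_layers
    return checkpoints, recomputed, memory_saved


if __name__ == "__main__":
    checkpoints, recomputed, saved = checkpoint_plan(80, 10)
    print("checkpoints kept at layers:", checkpoints)
    print(f"layers recomputed in the backward pass: {recomputed} of 80")
    print(f"activation memory saved: {saved:.0%}")
```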
&lt;H1&gt;Stateless Execution&lt;/H1&gt;
&lt;P&gt;Stateless execution complements recomputation by eliminating all persistent GPU state between layer computations. After processing each layer, MegaTrain immediately evicts not only parameters but also optimizer states like momentum buffers and variance estimates, keeping only the minimal activations needed for gradient flow. This aggressive state eviction maximizes available GPU memory for the active layer's computation, enabling larger batch sizes and longer sequences within fixed memory budgets. Together, block-wise recomputation and stateless execution transform MegaTrain's memory footprint from &lt;EM&gt;O&lt;/EM&gt;(model_size) to &lt;EM&gt;O&lt;/EM&gt;(largest_layer + checkpoint_activations), typically reducing requirements by 50-100x for models in the 100-billion parameter range.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;W – Weight prefetch, F/R/B – Computation, G – Gradient offload&lt;/P&gt;
&lt;P&gt;Source: Yuan, Zhengqing, et al. &lt;A href="https://arxiv.org/html/2604.05091v1" target="_blank"&gt;MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU&lt;/A&gt;&lt;/P&gt;
&lt;H1&gt;Performance Results and Achievements&lt;/H1&gt;
&lt;P&gt;MegaTrain demonstrates practical viability by training models far exceeding GPU memory capacity with acceptable performance overhead. The paper reports an implementation that successfully trained a 175-billion parameter model on a single NVIDIA A100 GPU with 80GB memory. Using conventional techniques, the same hardware would support only 15-20 billion parameters. Measured against conventional in-memory training as the baseline, MegaTrain throughput reached 45 percent of baseline speed, which translates to approximately 2.2x the wallclock time for equivalent training steps. For smaller models at 30 billion parameters, throughput exceeded 65 percent of baseline as the ratio of computation to transfer improved.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The chart demonstrates the inverse relationship between model size and throughput efficiency. As model parameters increase from 30 billion to 175 billion, throughput degrades linearly from 65% to 45% of baseline, validating the predictable performance characteristics of the streaming approach. This degradation reflects the increasing ratio of layer count to computation time as models grow larger.&lt;/P&gt;
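&lt;P&gt;If the degradation is taken to be linear between the two reported endpoints, intermediate model sizes can be estimated by straight-line interpolation. The sketch below does exactly that. The intermediate values it produces are estimates under that linearity assumption, not measured results from the paper.&lt;/P&gt;

```python
def interpolate_efficiency(params_b, low=(30.0, 0.65), high=(175.0, 0.45)):
    """Linearly interpolate throughput efficiency between two reported points.

    params_b is model size in billions of parameters; low and high are
    (size, efficiency) pairs taken from the text.
    """
    (x0, y0), (x1, y1) = low, high
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (params_b - x0)


if __name__ == "__main__":
    for size in (30.0, 70.0, 100.0, 175.0):
        estimate = interpolate_efficiency(size)
        print(f"{size:.0f}B parameters: about {estimate:.0%} of baseline (estimate)")
```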
&lt;P&gt;Memory efficiency gains are substantial: peak GPU memory utilization remained under 40GB throughout training regardless of model size, with the remainder allocated to batch size expansion that partially offset streaming overhead. The system sustained 12-14 GB/s effective storage bandwidth during training, approaching theoretical NVMe limits and validating the pipelined streaming approach.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;The memory utilization chart reveals MegaTrain's fundamental breakthrough: GPU memory consumption remains nearly constant across vastly different model sizes. While conventional training would require proportionally more GPU memory as models grow, MegaTrain keeps peak utilization between 35GB and 40GB regardless of whether the model has 30 billion or 175 billion parameters. This flat memory profile enables single-GPU training of models that would otherwise require expensive multi-GPU configurations.&lt;/P&gt;
&lt;P&gt;Scaling analysis revealed predictable performance characteristics: throughput degraded linearly with model size as layer count increased, while improving with batch size as computation became more dominant relative to fixed transfer costs. Energy efficiency comparisons showed single-GPU MegaTrain consuming 85 percent less power than equivalent 8-GPU distributed training for the same model, though at 2.2x longer duration resulting in 40 percent total energy savings.&lt;/P&gt;
&lt;H1&gt;Running MegaTrain on Azure NC-Series GPUs&lt;/H1&gt;
&lt;P&gt;Azure's NC-series GPU virtual machines provide diverse configurations for MegaTrain deployment, each with distinct performance characteristics shaped by GPU type, interconnect topology, and storage options. The NC A100 v4 series offers NVIDIA A100 GPUs with 80GB memory over PCIe 4.0 interfaces, directly matching the original MegaTrain research configuration. These instances provide 32 GB/s bidirectional PCIe bandwidth, sufficient for layer streaming without bottlenecking on parameter transfers for models up to 200 billion parameters.&lt;/P&gt;
&lt;P&gt;Storage configuration critically impacts MegaTrain performance: Azure Premium SSD v2 delivers up to 16 GB/s throughput with appropriate provisioning, while local NVMe SSDs on storage-optimized instances can reach 7-14 GB/s depending on VM size. The NC series of VMs combines a large amount of system memory, local NVMe storage, and GPUs, which suits MegaTrain's host memory and storage requirements.&lt;/P&gt;
&lt;P&gt;Importantly, Azure's NC-series instances use PCIe-connected GPUs rather than NVLink-connected configurations, making interconnect bandwidth the primary performance constraint rather than GPU compute capacity.&lt;/P&gt;
&lt;P&gt;Cost analysis favors MegaTrain for experimentation and medium-scale training: a single NC A100 v4 instance costs ~4 dollars per hour compared to ~30 dollars per hour for an 8-GPU NDv4 instance, a roughly 7.5x lower hourly rate for workloads that can tolerate 2-3x longer training duration.&lt;/P&gt;
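&lt;P&gt;Hourly price is only part of the picture, because a slower run is billed for more hours. The sketch below combines the approximate hourly rates quoted above with a slowdown factor to estimate the net saving per run. It assumes the 2-3x slowdown is measured against the 8-GPU run and ignores storage and data-transfer charges; both are simplifying assumptions.&lt;/P&gt;

```python
def run_cost_usd(hourly_rate_usd, baseline_hours, duration_multiplier=1.0):
    """Total compute cost of a run that takes baseline_hours x duration_multiplier."""
    return hourly_rate_usd * baseline_hours * duration_multiplier


def net_saving_factor(single_gpu_rate, cluster_rate, slowdown):
    """How many times cheaper the single-GPU run is once the slowdown is included."""
    return cluster_rate / (single_gpu_rate * slowdown)


if __name__ == "__main__":
    single_rate, cluster_rate = 4.0, 30.0  # approximate hourly rates from the text
    print(f"hourly rate ratio: {cluster_rate / single_rate:.1f}x")
    for slowdown in (2.0, 3.0):
        factor = net_saving_factor(single_rate, cluster_rate, slowdown)
        print(f"slowdown {slowdown:.0f}x: net cost advantage {factor:.2f}x per run")
```

&lt;P&gt;Under these assumptions the per-run advantage is smaller than the hourly ratio suggests, so the break-even point depends on how much extra wallclock time a project can absorb.&lt;/P&gt;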
&lt;P&gt;Code for MegaTrain is available on the project’s GitHub repository at &lt;A href="https://github.com/DLYuanGod/MegaTrain" target="_blank"&gt;https://github.com/DLYuanGod/MegaTrain&lt;/A&gt;. Supported models at the time of writing include Qwen 2/2.5/3/3.5/3.5 MoE, Llama 2/3/3.1/3.2/3.3/4, Mistral, Mixtral, DeepSeek Code/R1, Phi-3/4, and Gemma 2/3, among others.&lt;/P&gt;
&lt;H1&gt;Critical Constraint: PCIe Bandwidth vs NVLink&lt;/H1&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The interconnect bandwidth hierarchy creates fundamentally different performance envelopes for MegaTrain on Azure compared to specialized research hardware. PCIe 4.0 x16 connections on Azure NC-series instances provide 32 GB/s bidirectional bandwidth between CPU and GPU.&lt;/P&gt;
&lt;P&gt;For MegaTrain's layer streaming workload, CPU-GPU bandwidth matters most, since model parameters flow from host memory or NVMe storage through PCIe to GPU memory. PCIe 4.0's 32 GB/s bidirectional capacity proves sufficient for models up to 200 billion parameters when combined with aggressive prefetching and eviction pipelining, but it becomes a hard bottleneck for larger models where layer transfer time exceeds computation time.&lt;/P&gt;
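&lt;P&gt;As a rough illustration of why the 32 GB/s figure matters, the sketch below estimates how long one full pass of parameters takes to cross the PCIe link. The function name, the 2 bytes per parameter (a 16-bit format), and the assumption that the link sustains its full rated bandwidth are illustrative assumptions, not figures from the MegaTrain work.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def stream_seconds(num_params, bytes_per_param=2, link_gb_per_s=32.0):
    """Seconds to move every parameter across the link once (idealized)."""
    total_gb = num_params * bytes_per_param / 1e9
    return total_gb / link_gb_per_s


for billions in (7, 70, 200):
    seconds = stream_seconds(billions * 1e9)
    print(f"{billions}B parameters: about {seconds:.2f} s per full pass")&lt;/LI-CODE&gt;
&lt;P&gt;Prefetching can hide this transfer only while each layer's transfer time stays below its compute time; beyond that point the link sets the pace.&lt;/P&gt;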
&lt;H1&gt;Trade-Offs and Limitations&lt;/H1&gt;
&lt;P&gt;MegaTrain introduces several trade-offs that practitioners must evaluate against infrastructure constraints and training objectives. Training throughput degrades to 40-65 percent of baseline depending on model size, extending wall-clock training time by 1.5-2.5x for equivalent iteration counts. This slowdown proves acceptable for research experimentation and low-frequency retraining but may be prohibitive for production pipelines requiring rapid iteration.&lt;/P&gt;
&lt;P&gt;Memory efficiency gains come at the cost of increased storage I/O: a full training run generates 4x the parameter transfer volume compared to conventional training due to bidirectional streaming during forward and backward passes. This I/O amplification accelerates SSD wear and may impact cost for cloud storage with throughput-based pricing.&lt;/P&gt;
&lt;H1&gt;Implications for AI Infrastructure and Azure Strategy&lt;/H1&gt;
&lt;P&gt;MegaTrain's viability reshapes infrastructure planning assumptions for AI training workloads, particularly in cost-constrained environments where model size exceeds available GPU memory. Organizations can defer expensive hardware upgrades by exploiting existing GPU capacity more fully through memory hierarchy inversion, extending the useful life of current hardware generations. Cloud platforms like Azure benefit from increased flexibility in instance sizing: users can select GPU types based on compute requirements rather than being forced into high-memory configurations solely for capacity. This decoupling enables better price-performance optimization by matching compute intensity to GPU type while relying on storage for capacity scaling. The approach particularly suits Azure's NC-series positioning as a cost-effective alternative to specialized AI instances, turning PCIe bandwidth into an acceptable constraint rather than a disqualifying limitation. Research teams gain the ability to prototype and validate large model architectures without provisioning expensive multi-GPU clusters, accelerating iteration during early development phases.&lt;/P&gt;
&lt;P&gt;However, production deployment strategies should carefully evaluate MegaTrain's throughput trade-offs against distributed training alternatives: the 2-3x slowdown may be unacceptable for latency-sensitive pipelines despite cost savings. Infrastructure teams should prioritize NVMe storage provisioning and PCIe 4.0/5.0 support (for example, by using newer GPU types such as the H100) when deploying MegaTrain-compatible environments, as storage bandwidth directly determines achievable performance. Long-term, the techniques validate a broader trend toward memory-disaggregated computing where storage, memory, and compute scale independently rather than in fixed ratios determined by hardware packages.&lt;/P&gt;</description>
      <pubDate>Fri, 29 May 2026 06:08:50 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/training-100b-models-on-a-single-gpu-what-megatrain-changes-and/ba-p/4519562</guid>
      <dc:creator>yuvmaz</dc:creator>
      <dc:date>2026-05-29T06:08:50Z</dc:date>
    </item>
    <item>
      <title>AI Infrastructure Preflight at User space: Validating Multi Node, Multi GPU Slurm Clusters</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/ai-infrastructure-preflight-at-user-space-validating-multi-node/ba-p/4522284</link>
      <description>&lt;P&gt;Every team that operates GPU clusters for AI has seen this pattern. The cluster boots, GPUs are visible, and scheduling works at a basic level. Then the first distributed training run stalls in NCCL initialization, fails during rank rendezvous, or silently maps ranks to the wrong devices. The issue is often not in training code. It is in infrastructure consistency across scheduler, runtime, drivers, networking, and process topology.&lt;/P&gt;
&lt;P&gt;The goal of ai-infra-validator is straightforward:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Run a fast user space preflight before expensive training jobs.&lt;/LI&gt;
&lt;LI&gt;Validate distributed initialization for multi node, multi GPU workloads.&lt;/LI&gt;
&lt;LI&gt;Confirm GPU affinity and rank mapping are correct.&lt;/LI&gt;
&lt;LI&gt;Verify NCCL communication fabric can complete a collective ring under Slurm.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This post walks through the implementation in detail, explains why each part exists, and shows how to operationalize it in real HPC AI environments.&lt;/P&gt;
&lt;H5&gt;&lt;STRONG&gt;What the project validates&lt;/STRONG&gt;&lt;/H5&gt;
&lt;P&gt;The project is a zero-dependency, user-space smoke test for AI clusters. It validates multi-node PyTorch DDP initialization, GPU affinity, and NCCL fabric connectivity under Slurm orchestration.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Git Repo: &lt;A class="lia-external-url" href="https://github.com/vinil-v/ai-cluster-validator" target="_blank"&gt;ai-cluster-validator&lt;/A&gt;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;In practical terms, this checks that:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Slurm launches the expected number of ranks per node.&lt;/LI&gt;
&lt;LI&gt;Distributed process group creation with NCCL succeeds.&lt;/LI&gt;
&lt;LI&gt;Each rank binds to the expected local GPU.&lt;/LI&gt;
&lt;LI&gt;Cross-rank all-reduce completes and converges.&lt;/LI&gt;
&lt;LI&gt;Node level telemetry confirms software and fabric state.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;This is not a performance benchmark. It is a correctness and readiness gate.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Tested platform profile&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;th&gt;Component&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CycleCloud&lt;/td&gt;&lt;td&gt;8.8.3-3667&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Slurm&lt;/td&gt;&lt;td&gt;25.05.5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Slurm partition&lt;/td&gt;&lt;td&gt;hpc&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Scheduler VM SKU&lt;/td&gt;&lt;td&gt;Standard_D8s_v6&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Compute VM SKU&lt;/td&gt;&lt;td&gt;Standard_ND96asr_v4&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;OS images&lt;/td&gt;&lt;td&gt;microsoft-dsvm:ubuntu-hpc:2204:latest and microsoft-dsvm:ubuntu-hpc:2404:latest&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;PyTorch&lt;/td&gt;&lt;td&gt;2.12.0+cu130&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;CUDA runtime&lt;/td&gt;&lt;td&gt;13.0&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;NCCL target&lt;/td&gt;&lt;td&gt;2.29.7&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;This profile represents a common enterprise scenario where scheduler and compute nodes have different roles, and the training fleet depends on correct multi node orchestration.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 1: Minimal user space bootstrap&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;The bootstrap script creates a shared Python environment at &lt;STRONG&gt;/shared/apps/pytorch_env&lt;/STRONG&gt; and installs the required packages:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;torch&lt;/LI&gt;
&lt;LI&gt;torchvision&lt;/LI&gt;
&lt;LI&gt;torchaudio&lt;/LI&gt;
&lt;LI&gt;psutil&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This choice is intentional:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No dependency on containers for first-pass validation.&lt;/LI&gt;
&lt;LI&gt;Single environment path visible to all compute nodes.&lt;/LI&gt;
&lt;LI&gt;Rapid setup and repeatability for cluster operators.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Command sequence:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;LI-CODE lang="bash"&gt;git clone https://github.com/vinil-v/ai-cluster-validator.git
cd ai-cluster-validator
sudo bash bootstrap_env.sh&lt;/LI-CODE&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 2: Slurm job defines deterministic distributed topology&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;The Slurm script expresses a clear topology contract:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;nodes=2&lt;/LI&gt;
&lt;LI&gt;ntasks-per-node=8&lt;/LI&gt;
&lt;LI&gt;gpus-per-node=8&lt;/LI&gt;
&lt;LI&gt;cpus-per-task=12&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;From this, world size is derived as:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;WORLD_SIZE = SLURM_NTASKS = 2 x 8 = 16&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The script also configures network and NCCL behavior:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;NCCL_DEBUG=WARN&lt;/LI&gt;
&lt;LI&gt;NCCL_IB_DISABLE=0&lt;/LI&gt;
&lt;LI&gt;NCCL_P2P_DISABLE=0&lt;/LI&gt;
&lt;LI&gt;NCCL_IGNORE_CPU_AFFINITY=1&lt;/LI&gt;
&lt;LI&gt;GLOO_SOCKET_IFNAME=eth0&lt;/LI&gt;
&lt;LI&gt;NCCL_SOCKET_IFNAME=eth0&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Important implementation detail:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;MASTER_ADDR is set to the first host in SLURM_JOB_NODELIST.&lt;/LI&gt;
&lt;LI&gt;MASTER_PORT is selected dynamically from the ephemeral range 49152-65535 and falls back to 29500 if needed.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Why this matters:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Reduces port collision risk when jobs run frequently.&lt;/LI&gt;
&lt;LI&gt;Avoids hardcoded rendezvous values that may fail in shared clusters.&lt;/LI&gt;
&lt;/UL&gt;
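&lt;P&gt;The port selection described above can be sketched as follows. This is a Python illustration of the idea, not the repository's Slurm script; the function name, the number of attempts, and the bind-based availability check are assumptions made for the example.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import random
import socket

EPHEMERAL_RANGE = (49152, 65535)
FALLBACK_PORT = 29500


def pick_master_port(attempts=20, rng=random):
    """Return a port from the ephemeral range that can be bound right now."""
    for _ in range(attempts):
        port = rng.randint(*EPHEMERAL_RANGE)
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("", port))
            except OSError:
                continue  # port is in use, try another candidate
            return port
    return FALLBACK_PORT  # no free candidate found within the attempt budget


print(pick_master_port())&lt;/LI-CODE&gt;
&lt;P&gt;The check is only meaningful on the host that will act as MASTER_ADDR, and the chosen value has to be exported once so that every rank sees the same MASTER_PORT. A small window remains between the check and the actual rendezvous, so this reduces collisions rather than eliminating them.&lt;/P&gt;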
&lt;P&gt;&lt;STRONG&gt;Launch path:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;LI-CODE lang="bash"&gt;srun --cpu-bind=none bash -c '
source /shared/apps/pytorch_env/bin/activate
# Single quotes defer expansion, so each task reads its own Slurm-provided IDs.
export RANK=$SLURM_PROCID
export LOCAL_RANK=$SLURM_LOCALID
python3 ddp_mesh_ping.py
'&lt;/LI-CODE&gt;
&lt;P&gt;The LOCAL_RANK handoff is critical for stable GPU affinity inside each node.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;Step 3: DDP initialization and rank to GPU affinity&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Inside ddp_mesh_ping.py, each process executes:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Parse WORLD_SIZE, RANK, LOCAL_RANK, MASTER_ADDR, MASTER_PORT.&lt;/LI&gt;
&lt;LI&gt;Initialize torch.distributed with backend nccl and TCP init method.&lt;/LI&gt;
&lt;LI&gt;Set CUDA device using LOCAL_RANK.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&lt;STRONG&gt;Core initialization path:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
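&lt;LI-CODE lang="python"&gt;# Join the job-wide process group; every rank must use the same address and port.
dist.init_process_group(
    backend="nccl",
    init_method=f"tcp://{master_addr}:{master_port}",
    world_size=world_size,
    rank=rank,
)
# Pin this process to its node-local GPU so collectives run on the intended device.
torch.cuda.set_device(local_rank)&lt;/LI-CODE&gt;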
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;P&gt;This validates the minimum distributed contract required by real model training jobs.&lt;/P&gt;
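&lt;P&gt;The first item in the list above, parsing the rendezvous variables, is worth making strict so that a misconfigured launch fails with a clear message before NCCL is involved. The following is a minimal sketch under that assumption; the function name and the specific checks are illustrative and are not taken from ddp_mesh_ping.py.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os

REQUIRED = ("WORLD_SIZE", "RANK", "LOCAL_RANK", "MASTER_ADDR", "MASTER_PORT")


def read_dist_env(env=None):
    """Read and sanity-check the rendezvous variables for one rank."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    cfg = {
        "world_size": int(env["WORLD_SIZE"]),
        "rank": int(env["RANK"]),
        "local_rank": int(env["LOCAL_RANK"]),
        "master_addr": env["MASTER_ADDR"],
        "master_port": int(env["MASTER_PORT"]),
    }
    if cfg["rank"] not in range(cfg["world_size"]):
        raise RuntimeError(f"rank {cfg['rank']} is outside world size {cfg['world_size']}")
    return cfg


# Values mirror the 2-node, 16-rank layout used in this post.
example = {"WORLD_SIZE": "16", "RANK": "9", "LOCAL_RANK": "1",
           "MASTER_ADDR": "ddpcluster-hpc-1", "MASTER_PORT": "53593"}
print(read_dist_env(example))&lt;/LI-CODE&gt;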
&lt;H6&gt;&lt;STRONG&gt;Step 4: Rich node and fabric telemetry in user space&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Each rank collects detailed metadata before the collective test:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Node identity from Slurm and hostname.&lt;/LI&gt;
&lt;LI&gt;GPU model and VRAM from CUDA properties.&lt;/LI&gt;
&lt;LI&gt;System memory via psutil.&lt;/LI&gt;
&lt;LI&gt;CPU model from /proc/cpuinfo.&lt;/LI&gt;
&lt;LI&gt;OS and kernel versions.&lt;/LI&gt;
&lt;LI&gt;NVIDIA driver version from /proc/driver/nvidia/version.&lt;/LI&gt;
&lt;LI&gt;PyTorch, CUDA, and NCCL runtime versions.&lt;/LI&gt;
&lt;LI&gt;InfiniBand device state and link rate from /sys/class/infiniband.&lt;/LI&gt;
&lt;LI&gt;Basic GPU peer access capability via torch.cuda.can_device_access_peer.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;All rank payloads are gathered on rank 0 using dist.gather_object and printed as:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Cluster hardware topology report.&lt;/LI&gt;
&lt;LI&gt;Node environment deep dive.&lt;/LI&gt;
&lt;LI&gt;Network interconnect and fabric status.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This design gives platform teams one artifact that is both operational and diagnostic.&lt;/P&gt;
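&lt;P&gt;As an illustration of how the InfiniBand part of this telemetry can be collected without extra tooling, the sketch below reads the per-port state and rate files that the Linux kernel exposes under /sys/class/infiniband. The function name and output format are assumptions for the example; the repository's own collection code may differ.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from pathlib import Path


def read_ib_ports(root="/sys/class/infiniband"):
    """Return one summary line per port, e.g. 'mlx5_ib0:1 (4: ACTIVE - 200 Gb/sec (4X HDR))'."""
    lines = []
    base = Path(root)
    if not base.is_dir():
        return lines  # no InfiniBand devices visible on this host
    for dev in sorted(base.iterdir()):
        for port in sorted((dev / "ports").glob("*")):
            state = (port / "state").read_text().strip()
            rate = (port / "rate").read_text().strip()
            lines.append(f"{dev.name}:{port.name} ({state} - {rate})")
    return lines


for line in read_ib_ports():
    print(line)&lt;/LI-CODE&gt;
&lt;P&gt;On a host without InfiniBand devices the function returns an empty list, which is itself a useful signal on a node that is expected to have HCAs.&lt;/P&gt;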
&lt;H6&gt;&lt;STRONG&gt;Step 5: Functional collective validation&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;After telemetry, each rank executes a lightweight DDP compute path:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Build nn.Linear(10,10) on local GPU.&lt;/LI&gt;
&lt;LI&gt;Wrap with DistributedDataParallel.&lt;/LI&gt;
&lt;LI&gt;Perform forward, loss, backward.&lt;/LI&gt;
&lt;LI&gt;Run all_reduce on loss tensor.&lt;/LI&gt;
&lt;LI&gt;Compute global average loss.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Pass condition is explicit in log output:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;SUCCESS: DDP Multi-Node AllReduce Ring Complete!&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;This confirms that process group initialization and collective communication both completed successfully.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;What a successful run looks like&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;&lt;STRONG&gt;Submission:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;LI-CODE lang="bash"&gt;sbatch ddp_smoke_test.slurm
squeue&lt;/LI-CODE&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;Representative outcomes in the log:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Total Execution Ranks: 16&lt;/LI&gt;
&lt;LI&gt;Two nodes with local ranks 0 through 7 on each node&lt;/LI&gt;
&lt;LI&gt;GPU inventory aligned with expected A100 topology&lt;/LI&gt;
&lt;LI&gt;Active InfiniBand HCAs discovered per host&lt;/LI&gt;
&lt;LI&gt;NCCL socket interface set to eth0&lt;/LI&gt;
&lt;LI&gt;Final success marker and computed convergence loss&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;When these markers are present and coherent with expected hardware shape, the cluster is typically ready for distributed training bring-up.&lt;/P&gt;
&lt;H6&gt;&lt;STRONG&gt;How to check the output file&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;The Slurm script writes two artifacts per job:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;ai_infra_smoke_test_&amp;lt;jobid&amp;gt;.log&lt;/LI&gt;
&lt;LI&gt;ai_infra_smoke_test_&amp;lt;jobid&amp;gt;.err&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Use this exact workflow after submission:&lt;/STRONG&gt;&lt;/P&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;LI-CODE lang="bash"&gt;# 1. Submit and capture the job id (--parsable prints the id without the surrounding text)
jobid=$(sbatch --parsable ddp_smoke_test.slurm)

# 2. Check job state
squeue -j "$jobid"

# 3. Read standard output log
cat "ai_infra_smoke_test_${jobid}.log"

# 4. Read standard error log
cat "ai_infra_smoke_test_${jobid}.err"&lt;/LI-CODE&gt;
&lt;PRE&gt;&amp;nbsp;&lt;/PRE&gt;
&lt;P&gt;&lt;STRONG&gt;For stronger validation in automation, also check:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;CODE&gt;Total Execution Ranks&lt;/CODE&gt; equals expected world size.&lt;/LI&gt;
&lt;LI&gt;Both nodes appear in the topology table with local ranks 0 through 7.&lt;/LI&gt;
&lt;LI&gt;NCCL/CUDA/PyTorch versions are present in the node environment section.&lt;/LI&gt;
&lt;/OL&gt;
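&lt;P&gt;The checks above can be automated with a small log parser. The sketch below is one possible approach, assuming the log layout shown in the reference output later in this post; the function name, the messages, and the regular expressions are illustrative and would need adjusting if the log format changes.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import re

SUCCESS_MARKER = "SUCCESS: DDP Multi-Node AllReduce Ring Complete!"
VERSION_LABELS = ("PyTorch Environment", "CUDA Runtime Version", "NCCL Fabric Target")


def check_log(text, expected_world_size, gpus_per_node=8):
    """Return a list of problems found in the smoke-test log (empty if none)."""
    problems = []
    match = re.search(r"Total Execution Ranks:\s*(\d+)", text)
    if match is None or int(match.group(1)) != expected_world_size:
        problems.append("world size missing or unexpected")
    if SUCCESS_MARKER not in text:
        problems.append("success marker missing")
    for label in VERSION_LABELS:
        if label not in text:
            problems.append(f"{label} not reported")
    # Topology rows look like: | rank | node name | local id | ...
    row = r"^\|\s*\d+\s*\|\s*(\S+)\s*\|\s*(\d+)\s*\|"
    local_ids = {}
    for node, local_id in re.findall(row, text, re.M):
        local_ids.setdefault(node, set()).add(int(local_id))
    if len(local_ids) != expected_world_size // gpus_per_node:
        problems.append("unexpected node count in topology table")
    for node, ids in sorted(local_ids.items()):
        if ids != set(range(gpus_per_node)):
            problems.append(f"{node}: local ranks incomplete")
    return problems&lt;/LI-CODE&gt;
&lt;P&gt;A wrapper could call check_log on the contents of ai_infra_smoke_test_&amp;lt;jobid&amp;gt;.log with the expected world size and fail the pipeline whenever the returned list is not empty.&lt;/P&gt;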
&lt;H6&gt;&lt;STRONG&gt;Complete reference output&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Use the following full log as a known-good reference from a successful 2-node ND96asr_v4 run.&lt;/P&gt;
&lt;LI-CODE lang="tex"&gt;Master Node IP/Hostname: ddpcluster-hpc-1
Dynamically Assigned Port: 53593
Total Execution Ranks: 16
===============================================================================================
	HPC CLUSTER INTERACTION MONITOR
===============================================================================================
--&amp;gt; Initializing DDP on Master Node : ddpcluster-hpc-1
--&amp;gt; Dynamic Coordination Port     : 53593
--&amp;gt; Target World Cluster Size      : 16 GPUs
-----------------------------------------------------------------------------------------------

===============================================================================================
											 CLUSTER HARDWARE TOPOLOGY REPORT
===============================================================================================
| Rank | Node Name      | Local ID | GPU Model          | VRAM     | Sys Mem   | CPU Cores |
-----------------------------------------------------------------------------------------------
| 0    | ddpcluster-hpc-1 | 0        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 1    | ddpcluster-hpc-1 | 1        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 2    | ddpcluster-hpc-1 | 2        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 3    | ddpcluster-hpc-1 | 3        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 4    | ddpcluster-hpc-1 | 4        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 5    | ddpcluster-hpc-1 | 5        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 6    | ddpcluster-hpc-1 | 6        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 7    | ddpcluster-hpc-1 | 7        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 8    | ddpcluster-hpc-2 | 0        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 9    | ddpcluster-hpc-2 | 1        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 10   | ddpcluster-hpc-2 | 2        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 11   | ddpcluster-hpc-2 | 3        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 12   | ddpcluster-hpc-2 | 4        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 13   | ddpcluster-hpc-2 | 5        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 14   | ddpcluster-hpc-2 | 6        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
| 15   | ddpcluster-hpc-2 | 7        | NVIDIA A100-SXM4-4 | 39.5 GB  | 885.8 GB  | 96 Cores  |
===============================================================================================
											 NODE ENVIRONMENT DEEP DIVE
-----------------------------------------------------------------------------------------------
[ddpcluster-hpc-1] Details:
	--&amp;gt; CPU Microarchitecture : AMD EPYC 7V12 64-Core Processor
	--&amp;gt; Operating System      : Ubuntu 22.04.5 LTS
	--&amp;gt; Kernel Base Version   : 5.15.0-1110-azure
	--&amp;gt; Nvidia Driver Loaded  : 580.126.20
	--&amp;gt; PyTorch Environment   : v2.12.0+cu130
	--&amp;gt; CUDA Runtime Version  : v13.0
	--&amp;gt; NCCL Fabric Target    : v2.29.7
	--&amp;gt; Discovered InfiniBand HCAs:
				- mlx5_an0:1 (4: ACTIVE - 40 Gb/sec (4X QDR))
				- mlx5_ib0:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib1:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib2:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib3:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib4:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib5:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib6:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib7:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
-----------------------------------------------------------------------------------------------
[ddpcluster-hpc-2] Details:
	--&amp;gt; CPU Microarchitecture : AMD EPYC 7V12 64-Core Processor
	--&amp;gt; Operating System      : Ubuntu 22.04.5 LTS
	--&amp;gt; Kernel Base Version   : 5.15.0-1110-azure
	--&amp;gt; Nvidia Driver Loaded  : 580.126.20
	--&amp;gt; PyTorch Environment   : v2.12.0+cu130
	--&amp;gt; CUDA Runtime Version  : v13.0
	--&amp;gt; NCCL Fabric Target    : v2.29.7
	--&amp;gt; Discovered InfiniBand HCAs:
				- mlx5_an0:1 (4: ACTIVE - 40 Gb/sec (4X QDR))
				- mlx5_ib0:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib1:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib2:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib3:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib4:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib5:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib6:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
				- mlx5_ib7:1 (4: ACTIVE - 200 Gb/sec (4X HDR))
-----------------------------------------------------------------------------------------------
										 NETWORK INTERCONNECT &amp;amp; FABRIC STATUS
-----------------------------------------------------------------------------------------------
--&amp;gt; Target Communication Interface (NCCL_SOCKET_IFNAME) : eth0
--&amp;gt; Active Telemetry Tracking Level (NCCL_DEBUG)       : WARN
--&amp;gt; Inter-GPU Topo Link Verification                 : Active (P2P/NVLink Capable)
-----------------------------------------------------------------------------------------------
 SUCCESS: DDP Multi-Node AllReduce Ring Complete!
--&amp;gt; Computed System Verification Convergence Loss    : 1.398719
===============================================================================================&lt;/LI-CODE&gt;
&lt;H6&gt;&lt;STRONG&gt;Why this is effective for platform operations&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;For AI infrastructure teams, this pattern is highly effective because it is:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Fast: can be run after every change window.&lt;/LI&gt;
&lt;LI&gt;Deterministic: same topology contracts every run.&lt;/LI&gt;
&lt;LI&gt;Actionable: output includes enough context for first-level triage.&lt;/LI&gt;
&lt;LI&gt;Low friction: user space only, no heavy control plane dependencies.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;This supports common operating workflows:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Day-0 cluster acceptance.&lt;/LI&gt;
&lt;LI&gt;Day-1 patch validation after driver, kernel, or image changes.&lt;/LI&gt;
&lt;LI&gt;Regression gate in golden image pipelines.&lt;/LI&gt;
&lt;LI&gt;Preflight before large multi node model training jobs.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H6&gt;&lt;STRONG&gt;Practical guidance for extending to larger clusters&lt;/STRONG&gt;&lt;/H6&gt;
&lt;OL&gt;
&lt;LI&gt;Adjust Slurm directives for nodes and tasks per node.&lt;/LI&gt;
&lt;LI&gt;Keep one rank per GPU unless validating alternate placement policy.&lt;/LI&gt;
&lt;LI&gt;Set NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME according to your network policy.&lt;/LI&gt;
&lt;LI&gt;Preserve the dynamic MASTER_PORT logic to avoid static collisions.&lt;/LI&gt;
&lt;LI&gt;Keep the success marker string stable so automation can parse it.&lt;/LI&gt;
&lt;/OL&gt;
&lt;H6&gt;&lt;STRONG&gt;Closing perspective&lt;/STRONG&gt;&lt;/H6&gt;
&lt;P&gt;Most distributed training failures are expensive because they are discovered late. A user-space preflight that validates scheduler topology, rank rendezvous, GPU affinity, and NCCL collectives provides a high-value guardrail before production starts.&lt;/P&gt;
&lt;P&gt;ai-infra-validator is a practical implementation of that guardrail. It is compact, transparent, and aligned with how real Slurm-based AI clusters operate. For teams running multi-node, multi-GPU training at scale, this kind of preflight should be a standard operational gate.&lt;/P&gt;</description>
      <pubDate>Fri, 22 May 2026 12:29:15 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/ai-infrastructure-preflight-at-user-space-validating-multi-node/ba-p/4522284</guid>
      <dc:creator>vinilv</dc:creator>
      <dc:date>2026-05-22T12:29:15Z</dc:date>
    </item>
    <item>
      <title>Distributing model weights to your AI cluster: a faster pre-flight on AKS and Slurm</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/distributing-model-weights-to-your-ai-cluster-a-faster-pre/ba-p/4517294</link>
      <description>&lt;H2 data-line="10"&gt;Why this exists&lt;/H2&gt;
&lt;P data-line="12"&gt;You've provisioned a multi-node GPU cluster on Azure. NDR InfiniBand. The training script is ready. Before any GPU does any useful work, every node needs the&amp;nbsp;&lt;STRONG&gt;same 400 GB model checkpoint&lt;/STRONG&gt;&amp;nbsp;on its local NVMe.&lt;/P&gt;
&lt;P data-line="16"&gt;The naive approach —&amp;nbsp;azcopy&amp;nbsp;(or our Rust equivalent,&amp;nbsp;azcp) on every node, in parallel — has three problems:&lt;/P&gt;
&lt;OL data-line="19"&gt;
&lt;LI data-line="19"&gt;&lt;STRONG&gt;You pay Azure egress N times.&lt;/STRONG&gt;&amp;nbsp;Same bytes, same source account, N separate downloads. Microsoft's published&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/storage/common/scalability-targets-standard-account" data-href="https://learn.microsoft.com/en-us/azure/storage/common/scalability-targets-standard-account" target="_blank"&gt;scalability target for a standard storage account&lt;/A&gt;&amp;nbsp;is 200 Gbps egress (region-dependent). It's a target, not a hard cap — our sharded download peaked at 236 Gb/s on a 16-node run — but if every node is hammering the same account independently you start seeing&amp;nbsp;503 ServerBusy&amp;nbsp;well before the cluster fills up.&lt;/LI&gt;
&lt;LI data-line="26"&gt;&lt;STRONG&gt;It's slower than your fabric should allow.&lt;/STRONG&gt;&amp;nbsp;Per-node Azure throughput tops out around 19-25 Gb/s on a default AKS pod (overlay network, single connection pool). Meanwhile the InfiniBand fabric between those same nodes is sitting idle at&amp;nbsp;&lt;STRONG&gt;400 Gb/s NDR&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;LI data-line="30"&gt;&lt;STRONG&gt;Your job blocks on the slowest node.&lt;/STRONG&gt;&amp;nbsp;Distributed training won't start until rank 0 says "everyone has the data." With per-node downloads, that's whichever node had the unluckiest TCP retry.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="34"&gt;You're paying for a 400 Gb/s fabric and using a 25 Gb/s pipe.&lt;/P&gt;
&lt;H2 data-line="36"&gt;How&amp;nbsp;azcp-cluster&amp;nbsp;does it differently&lt;/H2&gt;
&lt;P data-line="38"&gt;The shape is obvious once you say it:&amp;nbsp;&lt;STRONG&gt;download&amp;nbsp;1/N&amp;nbsp;of the dataset per rank from Azure, then&amp;nbsp;MPI_Ibcast&amp;nbsp;the rest over the fabric&lt;/STRONG&gt;. A sharded download across N nodes — each byte leaves Azure exactly once, instead of once per node — then fan out at fabric speed.&lt;/P&gt;
&lt;P data-line="43"&gt;azcp-cluster is a small MPI binary that does exactly that:&lt;/P&gt;
&lt;img /&gt;
&lt;P data-line="64"&gt;Sharding uses&amp;nbsp;&lt;STRONG&gt;deterministic LPT bin-packing&lt;/STRONG&gt;&amp;nbsp;— files sorted by size descending, greedily assigned to the least-loaded rank — so rank load spread is typically &amp;lt;2% even on lopsided file-size distributions (measured 1.3% on the DeepSeek-R1 checkpoint). Bcast is&amp;nbsp;&lt;STRONG&gt;pipelined&lt;/STRONG&gt;&amp;nbsp;(non-blocking&amp;nbsp;MPI_Ibcast&amp;nbsp;with multiple chunks in flight per file, sized via&amp;nbsp;--bcast-chunk&amp;nbsp;/&amp;nbsp;--bcast-pipeline) and writes are&amp;nbsp;&lt;STRONG&gt;asynchronous&lt;/STRONG&gt;&amp;nbsp;(a dedicated writer thread per rank, decoupled from the MPI receive loop) so NVMe write speed and fabric receive overlap.&lt;/P&gt;
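&lt;P&gt;The LPT (longest processing time first) assignment described above can be sketched in a few lines. azcp-cluster itself is not written in Python, so this is an illustration of the technique rather than the tool's code; the function name, the name-based tie-break, and the sample file sizes are assumptions.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import heapq


def lpt_assign(file_sizes, num_ranks):
    """Assign files to ranks: largest first, each to the least-loaded rank."""
    heap = [(0, rank) for rank in range(num_ranks)]  # (load, rank) pairs
    shards = {rank: [] for rank in range(num_ranks)}
    # Sort by size descending, then by name, so every rank computes the same plan.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        shards[rank].append(name)
        heapq.heappush(heap, (load + size, rank))
    return shards


sizes = {"a.bin": 9, "b.bin": 7, "c.bin": 6, "d.bin": 5, "e.bin": 4}
print(lpt_assign(sizes, 2))&lt;/LI-CODE&gt;
&lt;P&gt;Because the ordering is fully determined by the file list, every rank can compute the same plan independently, with no coordination step needed to agree on who downloads what.&lt;/P&gt;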
&lt;P data-line="73"&gt;&lt;STRONG&gt;Measured on a 16-node Azure GB300 cluster&lt;/STRONG&gt;, NDR 400 Gb/s, 4× HCA per node, 413 GiB / 524-file checkpoint to per-node NVMe (single example run; reproducer script&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/tests/cluster_bench.sh" data-href="https://github.com/edwardsp/azcp/blob/main/tests/cluster_bench.sh" target="_blank"&gt;tests/cluster_bench.sh&lt;/A&gt;):&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Stage&lt;/th&gt;&lt;th&gt;Throughput&lt;/th&gt;&lt;th&gt;Wall-clock&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;[download]&amp;nbsp;(Azure → ranks, aggregate)&lt;/td&gt;&lt;td&gt;236 Gb/s&lt;/td&gt;&lt;td&gt;14 s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;[bcast]&amp;nbsp;(per receiver, RDMA/UCX)&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;99.2 Gb/s&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;33 s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;STRONG&gt;[total]&amp;nbsp;end-to-end&lt;/STRONG&gt;&lt;/td&gt;&lt;td&gt;—&lt;/td&gt;&lt;td&gt;&lt;STRONG&gt;48 s&lt;/STRONG&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;col style="width: 33.33%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P data-line="84"&gt;The two stages run sequentially (bcast starts after download completes); the extra ~1 s in&amp;nbsp;[total]&amp;nbsp;is&amp;nbsp;[list]&amp;nbsp;+&amp;nbsp;[diff]&amp;nbsp;+&amp;nbsp;[filelist]&amp;nbsp;overhead. Per-node&amp;nbsp;azcp&amp;nbsp;for comparison: ~3-5 minutes wall-clock and&amp;nbsp;&lt;STRONG&gt;16× the egress bill&lt;/STRONG&gt;&amp;nbsp;(the whole 413 GiB transferred 16 times instead of once). It's not just the bill — it's also 16× the demand on the source storage account, which pushes you past the 200 Gbps target into&amp;nbsp;503 ServerBusy&amp;nbsp;territory and the retry storm that comes with it. And that multiplier scales linearly with cluster size: at 64 nodes it's 64×, at 256 nodes it's 256×, while the broadcast version still pays Azure egress exactly once regardless of N.&lt;/P&gt;
&lt;P data-line="96"&gt;Two things had to be true to get there: receivers parallelised the NVMe write path with&amp;nbsp;&lt;STRONG&gt;multiple writer threads&lt;/STRONG&gt;&amp;nbsp;(default 2, flag&amp;nbsp;--bcast-writers), and they opened destination files with&amp;nbsp;&lt;STRONG&gt;O_DIRECT&lt;/STRONG&gt;&amp;nbsp;(default on, opt-out via&amp;nbsp;--no-bcast-direct). With a single buffered writer the path collapses to ~28 Gb/s — page-cache contention, not the fabric, becomes the ceiling. Bypassing the page cache with O_DIRECT and parallelising across two writers per rank lifts that to 99 Gb/s; bumping to tmpfs (no disk in the path at all) shows the fabric ceiling is ~140 Gb/s on this hardware. Full sweep including a&amp;nbsp;--verify&amp;nbsp;end-to-end MD5 check is in&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/docs/cluster-benchmarks.md" data-href="https://github.com/edwardsp/azcp/blob/main/docs/cluster-benchmarks.md" target="_blank"&gt;docs/cluster-benchmarks.md&lt;/A&gt;.&lt;/P&gt;
&lt;H2 data-line="108"&gt;After the download: the handoff&lt;/H2&gt;
&lt;P data-line="110"&gt;azcp-cluster&amp;nbsp;finishes and every node has the full dataset on local NVMe.&amp;nbsp;&lt;STRONG&gt;Now your training/inference job needs to start, on those same nodes, and read from those same disks.&lt;/STRONG&gt;&amp;nbsp;This is where the platform story diverges.&lt;/P&gt;
&lt;P data-line="115"&gt;On&amp;nbsp;&lt;STRONG&gt;Slurm&lt;/STRONG&gt;, this is one sbatch with two&amp;nbsp;srun&amp;nbsp;steps. Done.&lt;/P&gt;
&lt;P data-line="117"&gt;On&amp;nbsp;&lt;STRONG&gt;AKS&lt;/STRONG&gt;, you have to think about three things (storage backend, sequencing, and quota), and none of them has a single obvious answer yet. Both platforms are covered below.&lt;/P&gt;
&lt;H2 data-line="123"&gt;Practical: Slurm&lt;/H2&gt;
&lt;P data-line="125"&gt;Slurm makes this easy because the whole pipeline runs inside&amp;nbsp;&lt;STRONG&gt;one sbatch allocation&lt;/STRONG&gt;: nodes don't move between stages, per-node&amp;nbsp;/mnt/nvme&amp;nbsp;persists, and&amp;nbsp;srun&amp;nbsp;is a stage sequencer for free.&lt;/P&gt;
&lt;P data-line="129"&gt;The full example is at&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" data-href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" target="_blank"&gt;examples/slurm/download-then-train.sbatch&lt;/A&gt;. The shape:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;#SBATCH --nodes=4
#SBATCH --gres=gpu:8
#SBATCH --time=04:00:00

srun --ntasks-per-node=1 mkdir -p /mnt/nvme/dataset

# Stage 1: download + bcast (azcp-cluster image)
srun --mpi=pmix \
     --container-image=/shared/images/azcp-cluster.sqsh \
     --container-mounts=/dev/infiniband:/dev/infiniband,/mnt/nvme:/mnt/nvme \
     azcp-cluster "$SOURCE_URL" /mnt/nvme/dataset \
       --bcast-chunk 512M --bcast-pipeline 16 --compare size

# Stage 2: training (different image, same allocation, same nodes)
srun --container-image=/shared/images/training.sqsh \
     --container-mounts=/mnt/nvme:/mnt/nvme \
     torchrun --nnodes=$SLURM_NNODES --nproc-per-node=8 \
              train.py --model-path /mnt/nvme/dataset
&lt;/LI-CODE&gt;
&lt;P data-line="154"&gt;The snippet above is trimmed for readability. The full launcher (&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" data-href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" target="_blank"&gt;examples/slurm/download-then-train.sbatch&lt;/A&gt;) adds the UCX environment (UCX_TLS,&amp;nbsp;UCX_NET_DEVICES,&amp;nbsp;UCX_IB_GID_INDEX),&amp;nbsp;--container-writable,&amp;nbsp;--no-container-entrypoint, and the per-rank&amp;nbsp;torchrun&amp;nbsp;glue — drop those and you'll either fail to launch or silently fall back to TCP.&lt;/P&gt;
&lt;P data-line="162"&gt;Two different containers, two different stages, one allocation. Slurm holds the node reservation across both&amp;nbsp;srun&amp;nbsp;calls, so the per-node NVMe state written by stage 1 is naturally available to stage 2 — no sentinel files, no priority classes, no operator coordination needed.&lt;/P&gt;
&lt;P data-line="167"&gt;For sites without pyxis, swap&amp;nbsp;--container-image=&amp;nbsp;for&amp;nbsp;apptainer exec; see&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/apptainer.sbatch" data-href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/apptainer.sbatch" target="_blank"&gt;examples/slurm/apptainer.sbatch&lt;/A&gt;.&lt;/P&gt;
&lt;H2 data-line="171"&gt;Practical: AKS&lt;/H2&gt;
&lt;P data-line="173"&gt;Kubernetes has no Slurm-allocation analog out of the box — pods come and go, and the scheduler treats nodes as fungible resource buckets. So the question is: how do you keep the training job on the same nodes the download just populated?&lt;/P&gt;
&lt;P data-line="178"&gt;The cleanest answer, by a wide margin, is to&amp;nbsp;&lt;STRONG&gt;not separate them in the first place&lt;/STRONG&gt;: run both phases inside one MPIJob, with one merged worker image that contains both&amp;nbsp;azcp-cluster&amp;nbsp;and your training stack. The launcher script becomes two&amp;nbsp;mpirun&amp;nbsp;calls, one after the other. The node allocation is held by the MPIJob across both phases — no cross-workload sequencing, no node-pinning glue, no extra controllers.&lt;/P&gt;
&lt;H3 data-line="185"&gt;Adding azcp-cluster to your existing image&lt;/H3&gt;
&lt;P data-line="187"&gt;You almost certainly already have a curated training or inference image — NGC PyTorch, NGC TensorFlow, vLLM, your team's fine-tuning image, whatever. The merged image is just&amp;nbsp;&lt;STRONG&gt;your image plus four&amp;nbsp;COPY&amp;nbsp;lines and one&amp;nbsp;apt install&lt;/STRONG&gt;:&lt;/P&gt;
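&lt;LI-CODE lang="docker"&gt;# Base image: substitute whatever you already use.
# (Dockerfile comments must start a line; a trailing comment after FROM is parsed as extra arguments.)
FROM nvcr.io/nvidia/pytorch:24.10-py3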

ARG AZCP_VERSION=v0.3.0

COPY --from=ghcr.io/edwardsp/azcp/azcp-cluster:${AZCP_VERSION} \
     /opt/openmpi /opt/azcp/openmpi
COPY --from=ghcr.io/edwardsp/azcp/azcp-cluster:${AZCP_VERSION} \
     /opt/ucx /opt/azcp/ucx
COPY --from=ghcr.io/edwardsp/azcp/azcp-cluster:${AZCP_VERSION} \
     /usr/local/bin/azcp-cluster /usr/local/bin/
COPY --from=ghcr.io/edwardsp/azcp/azcp-cluster:${AZCP_VERSION} \
     /usr/local/bin/azcp /usr/local/bin/
RUN apt-get update &amp;amp;&amp;amp; apt-get install -y openssh-server &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/*&lt;/LI-CODE&gt;
&lt;P data-line="209"&gt;Three notes on what's in those lines:&lt;/P&gt;
&lt;UL data-line="211"&gt;
&lt;LI data-line="211"&gt;We copy&amp;nbsp;azcp-cluster's bundled Open MPI 4.1.6 + UCX 1.15.0 under&amp;nbsp;/opt/azcp/,&amp;nbsp;&lt;STRONG&gt;not&lt;/STRONG&gt;&amp;nbsp;the default&amp;nbsp;/opt/openmpi. NGC images already ship HPC-X at&amp;nbsp;/opt/hpcx/; vLLM and other inference images sometimes bring their own MPI too. Putting ours at a separate path means we don't clash, and your training stack continues using whatever MPI it was built against.&lt;/LI&gt;
&lt;LI data-line="217"&gt;The launcher script for phase 1 explicitly invokes&amp;nbsp;/opt/azcp/openmpi/bin/mpirun, so&amp;nbsp;azcp-cluster&amp;nbsp;runs against the same Open MPI build it was tested with.&lt;/LI&gt;
&lt;LI data-line="220"&gt;mpi-operator's launcher SSHes into worker pods to bootstrap MPI; most training images ship&amp;nbsp;openssh-client&amp;nbsp;but not&amp;nbsp;openssh-server, hence the one apt install. Skip it if your base image already has sshd.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="224"&gt;That's the whole "merged image" cost. A couple of hundred MB of files copied (Open MPI + UCX + the two binaries), no Rust toolchain, no source rebuild, no rebase of your training image.&lt;/P&gt;
&lt;H3 data-line="228"&gt;The two-phase launcher&lt;/H3&gt;
&lt;P data-line="230"&gt;The MPIJob launcher script runs both phases sequentially:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;# Phase 1: download + broadcast
mpirun -np "$N" -hostfile /etc/mpi/hostfile \
  -mca pml ucx -x UCX_NET_DEVICES=mlx5_0:1,mlx5_1:1,mlx5_2:1,mlx5_3:1 \
  azcp-cluster "$SOURCE_URL" /mnt/nvme/dataset \
    --bcast-chunk 512M --bcast-pipeline 16 --compare size

# Phase 2: training (mpirun launches torchrun on each node — NCCL, not
# MPI, handles the actual training collectives)
MASTER_ADDR=$(head -1 /etc/mpi/hostfile | awk '{print $1}')
mpirun -np "$N" -hostfile /etc/mpi/hostfile \
  -x MASTER_ADDR="$MASTER_ADDR" -x MASTER_PORT=29500 \
  bash -c '
    torchrun \
      --nnodes='"$N"' --node_rank=$OMPI_COMM_WORLD_RANK \
      --nproc_per_node=8 \
      --master_addr=$MASTER_ADDR --master_port=$MASTER_PORT \
      /opt/train.py --model-path /mnt/nvme/dataset
  '
&lt;/LI-CODE&gt;
&lt;P data-line="253"&gt;The&amp;nbsp;mpirun&amp;nbsp;lines are trimmed to the load-bearing arguments. The real launcher in&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpijob-download-then-train.yaml" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpijob-download-then-train.yaml" target="_blank"&gt;examples/aks/mpijob-download-then-train.yaml&lt;/A&gt;&amp;nbsp;adds&amp;nbsp;--allow-run-as-root,&amp;nbsp;--prefix /opt/azcp/openmpi,&amp;nbsp;-mca plm_rsh_agent ssh,&amp;nbsp;-mca osc ucx,&amp;nbsp;-mca routed direct,&amp;nbsp;-mca coll_hcoll_enable 0, the full UCX environment (UCX_TLS=rc,sm,self,&amp;nbsp;UCX_IB_GID_INDEX=3, …) and the worker pod's&amp;nbsp;IPC_LOCK&amp;nbsp;capability. Without those you either fail to launch or silently fall back to TCP — copy the full manifest, don't retype the snippet.&lt;/P&gt;
&lt;P data-line="264"&gt;Phase 1 uses MPI for what MPI is good at — collective broadcast over RDMA. Phase 2 uses&amp;nbsp;mpirun&amp;nbsp;purely as a process launcher to start&amp;nbsp;torchrun&amp;nbsp;on every node with the right rank derived from&amp;nbsp;$OMPI_COMM_WORLD_RANK; the actual training collectives go through NCCL, same as a plain&amp;nbsp;torchrun&amp;nbsp;Job. If your workload is genuinely MPI-native (Megatron-MPI, Horovod, DeepSpeed-MPI), drop the&amp;nbsp;torchrun&amp;nbsp;wrapper and&amp;nbsp;mpirun&amp;nbsp;your binary directly.&lt;/P&gt;
&lt;P data-line="272"&gt;Authentication uses whatever&amp;nbsp;azcp&amp;nbsp;already supports (workload identity on AKS, IMDS on plain VMs, az-cli token, SAS, shared key); the MPIJob example wires up workload identity by setting&amp;nbsp;AZURE_CLIENT_ID&amp;nbsp;to a user-assigned managed identity that has been granted&amp;nbsp;&lt;STRONG&gt;Storage Blob Data Reader&lt;/STRONG&gt;&amp;nbsp;on the source account. Adding the identity and the role assignment is a one-time setup step (az identity create&amp;nbsp;+&amp;nbsp;az role assignment create) and is the same work you'd do for any other AKS workload that talks to blob storage.&lt;/P&gt;
&lt;P data-line="281"&gt;Dockerfile:&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/Dockerfile.training" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/Dockerfile.training" target="_blank"&gt;examples/aks/Dockerfile.training&lt;/A&gt;. Storage is&amp;nbsp;hostPath&amp;nbsp;on the node's local NVMe — it persists naturally across the two phases (and across re-runs, where&amp;nbsp;azcp-cluster --compare size&amp;nbsp;skips already-present files for free).&lt;/P&gt;
&lt;H3 data-line="287"&gt;When this isn't the right pattern&lt;/H3&gt;
&lt;P data-line="289"&gt;The merged image works for distributed training, fine-tuning, eval, batch inference, and anything else you'd normally launch with&amp;nbsp;torchrun&amp;nbsp;or&amp;nbsp;mpirun. It does not fit two cases:&lt;/P&gt;
&lt;UL data-line="293"&gt;
&lt;LI data-line="293"&gt;&lt;STRONG&gt;Long-running serving&lt;/STRONG&gt;&amp;nbsp;(vLLM, Ray, Deployment-shaped consumers). These load weights once and serve for hours or days; the per-node download cost is negligible amortized over the lifetime. Use a plain init container with&amp;nbsp;azcp&amp;nbsp;per pod and don't bother with broadcast.&lt;/LI&gt;
&lt;LI data-line="297"&gt;&lt;STRONG&gt;The consumer must be a different CRD entirely&lt;/STRONG&gt;, owned by another team, not changeable. Then you really do need two separate workloads, and you have a scheduling problem.&lt;/LI&gt;
&lt;/UL&gt;
&lt;H3 data-line="301"&gt;If you can't merge images&lt;/H3&gt;
&lt;P data-line="303"&gt;If the consumer genuinely can't share an image with&amp;nbsp;azcp-cluster, the Kubernetes scheduler is suddenly responsible for two non-trivial things:&lt;/P&gt;
&lt;P data-line="307"&gt;&lt;STRONG&gt;Same-nodes pinning.&lt;/STRONG&gt;&amp;nbsp;Kubernetes has no native "place this workload on the same nodes another workload ran on" primitive. You either dedicate a nodepool sized exactly to the job (so there's only one valid placement and the pool sits idle between runs) or you label nodes from the download pods, which needs cluster-level RBAC and per-run cleanup.&lt;/P&gt;
&lt;P data-line="314"&gt;&lt;STRONG&gt;Gap-window starvation.&lt;/STRONG&gt;&amp;nbsp;When the download workload completes, its GPU and InfiniBand quota is released. On a busy multi-tenant cluster, another tenant's workload can race in and grab those nodes before your consumer is admitted — at which point the consumer queues indefinitely waiting for nodes it can never get back. Workload-level controllers like Kueue admit one workload at a time and don't reserve quota for "the next workload from the same submitter."&lt;/P&gt;
&lt;P data-line="322"&gt;&lt;STRONG&gt;Reach for an initContainer first.&lt;/STRONG&gt;&amp;nbsp;Both problems vanish if you collapse the two workloads back into one Kubernetes object without merging images. Run a plain Indexed&amp;nbsp;batch/v1.Job&amp;nbsp;where each pod has an&amp;nbsp;&lt;STRONG&gt;initContainer&lt;/STRONG&gt;&amp;nbsp;running&amp;nbsp;azcp-cluster&amp;nbsp;(broadcasting the dataset to per-node NVMe over RDMA) and a&amp;nbsp;&lt;STRONG&gt;main container&lt;/STRONG&gt;&amp;nbsp;running the consumer against&amp;nbsp;/mnt/nvme. Two different images, same pod, init-then-main ordering enforced structurally by Kubernetes. No node pinning needed (same pod ⇒ same node). No gap window (same pod ⇒ shared admission). No new CRDs. The cost is ~80 lines of bash to hand-roll the MPI bootstrap inside the initContainer (sshd, hostfile, mpirun) — see&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/init-container-download-then-train.yaml" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/init-container-download-then-train.yaml" target="_blank"&gt;examples/aks/init-container-download-then-train.yaml&lt;/A&gt;&amp;nbsp;for a validated reference. This is the second-best AKS option after the merged image, and it's the one to try before any of the heavier options below.&lt;/P&gt;
&lt;P data-line="338"&gt;&lt;STRONG&gt;Heavier options if even that doesn't fit&lt;/STRONG&gt;&amp;nbsp;(multi-stage pipelines, long-running consumers, cross-team CRDs):&lt;/P&gt;
&lt;UL data-line="341"&gt;
&lt;LI data-line="341"&gt;&lt;STRONG&gt;Argo Workflows&lt;/STRONG&gt;&amp;nbsp;can capture the download MPIJob's worker node names as an output parameter and inject them into the consumer's&amp;nbsp;nodeAffinity. Doesn't fully close the gap window — pair with Kueue priorities to minimize it.&lt;/LI&gt;
&lt;LI data-line="345"&gt;&lt;STRONG&gt;Volcano&lt;/STRONG&gt;&amp;nbsp;takes a different approach: a single Volcano Job CRD with internal task dependencies and gang scheduling, sequencing both stages in one admission unit. Cleanest model technically; not in the AKS managed-experience comfort zone.&lt;/LI&gt;
&lt;LI data-line="349"&gt;&lt;STRONG&gt;JobSet + Kueue&lt;/STRONG&gt;&amp;nbsp;wraps both stages in one JobSet (single Kueue admission) with manual node-pinning via shared nodepool exhaustion. Requires installing JobSet (alpha CRD) on top of Kueue and hand-rolling an MPI bootstrap (sshd, hostfile, mpirun) inside an indexed Job, since JobSet wraps&amp;nbsp;batch/v1 Jobs, not&amp;nbsp;MPIJobs.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="355"&gt;Each of these adds at least one extension to the cluster and substantially more YAML. For most distributed training and fine-tuning, the merged-image pattern is strictly simpler — the consumer image grows by four&amp;nbsp;COPY&amp;nbsp;lines and one&amp;nbsp;apt install, and the entire scheduling problem disappears.&lt;/P&gt;
&lt;H2 data-line="363"&gt;Try it&lt;/H2&gt;
&lt;LI-CODE lang="shell"&gt;docker pull ghcr.io/edwardsp/azcp/azcp-cluster:v0.3.0&lt;/LI-CODE&gt;
&lt;P data-line="369"&gt;Multi-arch,&amp;nbsp;linux/amd64&amp;nbsp;and&amp;nbsp;linux/arm64. Source and full docs at&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp" data-href="https://github.com/edwardsp/azcp" target="_blank"&gt;github.com/edwardsp/azcp&lt;/A&gt;.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;If you want to...&lt;/th&gt;&lt;th&gt;Start here&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Run the standalone download on AKS (no downstream sequencing)&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpi-operator-job.yaml" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpi-operator-job.yaml" target="_blank"&gt;examples/aks/mpi-operator-job.yaml&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Wire download + training together on AKS (merged image)&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpijob-download-then-train.yaml" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/mpijob-download-then-train.yaml" target="_blank"&gt;examples/aks/mpijob-download-then-train.yaml&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Wire download + training together on AKS (separate images, no operator)&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/aks/init-container-download-then-train.yaml" data-href="https://github.com/edwardsp/azcp/blob/main/examples/aks/init-container-download-then-train.yaml" target="_blank"&gt;examples/aks/init-container-download-then-train.yaml&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Run the standalone download on Slurm&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/azcp-cluster.sbatch" data-href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/azcp-cluster.sbatch" target="_blank"&gt;examples/slurm/azcp-cluster.sbatch&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Wire download + training together on Slurm&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" 
data-href="https://github.com/edwardsp/azcp/blob/main/examples/slurm/download-then-train.sbatch" target="_blank"&gt;examples/slurm/download-then-train.sbatch&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Reproduce the benchmark on your own cluster&lt;/td&gt;&lt;td&gt;&lt;A href="https://github.com/edwardsp/azcp/blob/main/tests/cluster_bench.sh" data-href="https://github.com/edwardsp/azcp/blob/main/tests/cluster_bench.sh" target="_blank"&gt;tests/cluster_bench.sh&lt;/A&gt;&amp;nbsp;+&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp/blob/main/docs/cluster-benchmarks.md" data-href="https://github.com/edwardsp/azcp/blob/main/docs/cluster-benchmarks.md" target="_blank"&gt;docs/cluster-benchmarks.md&lt;/A&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 50.00%" /&gt;&lt;col style="width: 50.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;H2 data-line="381"&gt;What's next&lt;/H2&gt;
&lt;P data-line="383"&gt;A few things still on the list:&lt;/P&gt;
&lt;UL data-line="385"&gt;
&lt;LI data-line="385"&gt;&lt;STRONG&gt;Multi-rail UCX bcast.&lt;/STRONG&gt;&amp;nbsp;The 99 Gb/s per-receiver number is the single-rail&amp;nbsp;MPI_Ibcast&amp;nbsp;ceiling; the GB300 nodes have 4 HCAs sitting there. Splitting the broadcast across rails (parallel&amp;nbsp;Ibcasts on disjoint rail subsets, or a custom multi-tree implementation — not just a tuning flag) would push past the ~140 Gb/s tmpfs ceiling we see today.&lt;/LI&gt;
&lt;LI data-line="391"&gt;&lt;STRONG&gt;Bcast tuning autodetect.&lt;/STRONG&gt;&amp;nbsp;Chunk/pipeline defaults are conservative; on NDR fabrics the right answer is&amp;nbsp;--bcast-chunk 512M --bcast-pipeline 16. Probing at startup would be more friendly than asking users to read the tuning doc.&lt;/LI&gt;
&lt;LI data-line="395"&gt;&lt;STRONG&gt;A first-class AKS sequencing story.&lt;/STRONG&gt;&amp;nbsp;Something that closes the same-nodes pinning and gap-window problems described above without extra controllers.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="397"&gt;Issues, PRs, and "hey we tried this and it broke on $cluster" reports all welcome at&amp;nbsp;&lt;A href="https://github.com/edwardsp/azcp" data-href="https://github.com/edwardsp/azcp" target="_blank"&gt;github.com/edwardsp/azcp&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2026 18:30:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/distributing-model-weights-to-your-ai-cluster-a-faster-pre/ba-p/4517294</guid>
      <dc:creator>pauledwards</dc:creator>
      <dc:date>2026-05-06T18:30:00Z</dc:date>
    </item>
    <item>
      <title>Building resilient networks for AI supercomputers</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/building-resilient-networks-for-ai-supercomputers/ba-p/4516919</link>
      <description>&lt;P&gt;&lt;EM&gt;By &lt;A class="lia-external-url" href="http://www.linkedin.com/in/valeriecutts" target="_blank" rel="noopener"&gt;Valerie Cutts&lt;/A&gt; and &lt;A class="lia-external-url" href="https://www.linkedin.com/in/jithinjosepkl/" target="_blank" rel="noopener"&gt;Jithin Jose&lt;/A&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Last fall we introduced Fairwater, the world’s most powerful AI datacenter. Delivering a system of this scale required rethinking how Azure designs supercomputers, especially the scale-out network. Today, we are sharing more about the networking innovations that have made Fairwater possible.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this post, we share what’s unique about networking at extreme GPU scale and the system-level design choices required to enable large synchronous training jobs to run reliably, even during network failures. We are also publishing, in partnership with others, the open-source&amp;nbsp;&lt;A class="lia-external-url" href="https://www.opencompute.org/documents/ocp-mrc-1-0-pdf" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;Multipath Reliable Connection (MRC)&lt;/STRONG&gt;&lt;/A&gt; specification and software interfaces, and open-sourcing the associated libraries.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Fault-tolerant scale-out networking&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;At extreme scale, synchronous training amplifies the impact of routine network faults, turning packet drops, slow links, and partial failures into stalls, restarts, and wasted GPU time. As we describe in our &lt;A class="lia-external-url" href="https://cdn.openai.com/pdf/resilient-ai-supercomputer-networking-using-mrc-and-srv6.pdf" target="_blank" rel="noopener"&gt;MRC paper&lt;/A&gt;, the path forward is to treat failure as normal and design the network as an integrated system that:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Scales to 100K+ GPUs&lt;/STRONG&gt; using a &lt;STRONG&gt;two-level, multi-path topology&lt;/STRONG&gt; that provides enough redundancy&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Balances load evenly &lt;/STRONG&gt;across the fabric to prevent congestion&lt;/LI&gt;
&lt;LI&gt;Recovers &lt;STRONG&gt;predictably and gracefully &lt;/STRONG&gt;during failures&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Uses less power&lt;/STRONG&gt; than three- or four-layer single-plane topologies&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As outlined in the &lt;A class="lia-external-url" href="https://www.opencompute.org/documents/ocp-mrc-1-0-pdf" target="_blank" rel="noopener"&gt;Multipath Reliable Connection Specification&lt;/A&gt;, Microsoft partnered with &lt;A class="lia-external-url" href="https://www.amd.com/en/blogs/2026/amd-advances-ai-networking-at-scale-with-mrc.html" target="_blank"&gt;AMD&lt;/A&gt;, &lt;A class="lia-external-url" href="https://www.broadcom.com/blog/enabling-ai-networking-scale-with-multi-path-reliable-connections-mrc-" target="_blank" rel="noopener"&gt;Broadcom&lt;/A&gt;, Intel, &lt;A class="lia-external-url" href="https://blogs.nvidia.com/blog/spectrum-x-ethernet-mrc" target="_blank" rel="noopener"&gt;NVIDIA&lt;/A&gt; and &lt;A class="lia-external-url" href="https://openai.com/index/mrc-supercomputer-networking" target="_blank" rel="noopener"&gt;OpenAI&lt;/A&gt; to jointly address this problem, focusing on changes to transport and network design needed to support training at extreme scale. Instead of relying on lossless fabrics and dynamic routing, we collectively designed, built, and deployed Multipath Reliable Connection (MRC), which draws upon lessons from the &lt;A class="lia-external-url" href="https://ultraethernet.org/wp-content/uploads/sites/20/2026/01/UE-Specification-1.0.2-1.pdf" target="_blank" rel="noopener"&gt;Ultra Ethernet Consortium (UEC)&lt;/A&gt;, and paired it with a multiplane network topology, enabling reliable training jobs even when links, switches, or paths fail. The endpoint-driven transport created a simpler, more resilient network that delivers:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;More resilient, predictable training at very large scale&lt;/STRONG&gt;&lt;BR /&gt;Large training jobs continue making steady forward progress despite routine network faults, reducing stalls and restarts and improving time-to-train as cluster size increases.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Better utilization of expensive GPU infrastructure&lt;/STRONG&gt;&lt;BR /&gt;By avoiding tail latency amplification and repeated recovery cycles, GPUs spend more time doing useful work instead of waiting on synchronization or replaying lost computation, improving overall efficiency and cost effectiveness.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Automatic adaptation at machine timescales&lt;/STRONG&gt;&lt;BR /&gt;Failure detection, load balancing, and recovery happen fast enough to keep up with the rate and complexity of faults in 100K+ GPU systems, well beyond what manual intervention or control-plane convergence can achieve, allowing the system to remain stable as scale increases.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In the Fairwater supercomputer, enabling graceful degradation in the scale-out network improves training throughput versus traditional transports and architectures. In combination with a multi-plane topology design, MRC increases the time that installed NVIDIA GPUs perform useful computation.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;A shift in philosophy: End-to-end control&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The central design decision behind MRC is to shift responsibility for load balancing and failure handling from complex network switch control planes to the network endpoints with end-to-end controls. The network endpoint controls the path selection and can optimally use a set of paths based on feedback from the network.&lt;/P&gt;
&lt;P&gt;MRC extends the RoCE Reliable Connection (RC) transport to support true multipath operation. Instead of binding a queue pair to a single path, MRC sprays packets across many paths simultaneously, making performance far less sensitive to any single slow or failed link. Several design elements are critical to enable end-to-end control.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Every packet carries enough information for the receiver to place it directly into memory, even if packets arrive out of order.&lt;/LI&gt;
&lt;LI&gt;Selective acknowledgments enable rapid retransmission of only the packets that were lost.&lt;/LI&gt;
&lt;LI&gt;Packet trimming signals network congestion swiftly without forcing full packet drops, enabling efficient congestion control.&lt;/LI&gt;
&lt;LI&gt;MRC disables Priority Flow Control (PFC) entirely and runs Ethernet in best-effort mode, avoiding global pauses that can devastate tail latency or lead to fabric-wide deadlock behavior.&lt;/LI&gt;
&lt;LI&gt;The system enables seamless self-recovery from network hardware failures.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The result is a transport protocol that expects loss, adapts quickly, and continues making progress even when parts of the fabric misbehave.&lt;/P&gt;
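&lt;P&gt;To make the first two design elements concrete, here is a toy Python model of packet spraying with direct placement and selective acknowledgment. It is only a sketch: it does not reflect the MRC wire format or any NIC implementation, and the class, function, and parameter names are hypothetical. Packets are sprayed round-robin across paths, delivered out of order, and dropped on any failed path; the receiver places each packet by its own sequence number and reports only the gaps for retransmission.&lt;/P&gt;

```python
class Receiver:
    """Toy receiver: places each packet by its own offset and tracks gaps."""

    def __init__(self, total_packets, payload_size):
        self.payload_size = payload_size
        self.buffer = bytearray(total_packets * payload_size)
        self.received = [False] * total_packets

    def on_packet(self, seq, payload):
        # The packet carries its sequence number, so no reordering queue is needed.
        start = seq * self.payload_size
        self.buffer[start:start + len(payload)] = payload
        self.received[seq] = True

    def missing(self):
        # Selective acknowledgment: report exactly which packets are still absent.
        return [seq for seq, ok in enumerate(self.received) if not ok]


def transfer(message, payload_size, num_paths, failed_paths):
    """Spray a message over num_paths, then resend only what was lost."""
    healthy = [p for p in range(num_paths) if p not in failed_paths]
    if not healthy:
        raise RuntimeError("no healthy path left")
    packets = [message[i:i + payload_size]
               for i in range(0, len(message), payload_size)]
    rx = Receiver(len(packets), payload_size)
    # Packet seq travels on path seq % num_paths; reverse order mimics reordering.
    for seq in reversed(range(len(packets))):
        if seq % num_paths not in failed_paths:
            rx.on_packet(seq, packets[seq])
    lost = rx.missing()
    for seq in lost:
        # Only the gaps are resent; in this toy model any healthy path delivers them.
        rx.on_packet(seq, packets[seq])
    return bytes(rx.buffer[:len(message)]), lost


if __name__ == "__main__":
    data, lost = transfer(bytes(range(256)) * 4, 64, num_paths=8, failed_paths={3})
    print("retransmitted packets:", lost)
```

&lt;P&gt;Only the packets that were assigned to the failed path are resent; everything that arrived, in whatever order, is already in its final position.&lt;/P&gt;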
&lt;H4&gt;&lt;STRONG&gt;Rethinking topology: Multi‑plane design&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Transport alone is not enough. To complement MRC, we implemented a two-tier, multi-plane network topology in Fairwater using high-radix switches. Our network design splits each NIC into multiple lower-speed ports (i.e., eight x 100 Gbps) and builds multiple parallel network planes. This multi-plane design enables a more compact topology than a traditional three-tier Clos network running at 800 Gbps per port. Our two-tier multi-plane topology offers several advantages:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Enables connecting &lt;STRONG&gt;100K+ GPUs with just two tiers of network.&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Lower latency&lt;/STRONG&gt; since packets traverse fewer switches.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reduced impact of network issues on overall job completion.&lt;/STRONG&gt; Although each switch connects to more servers, a switch failure is spread across all of them, so the impact on any individual server is smaller and the performance impact on the overall job decreases.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Reduced hardware and power costs compared to designs with an additional network layer&lt;/STRONG&gt;, without compromising on GPU scale.&lt;/LI&gt;
&lt;LI&gt;Most importantly, the &lt;STRONG&gt;network becomes more tolerant&lt;/STRONG&gt; of partial failures, so jobs continue with slightly reduced bandwidth rather than failing outright.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Multi-plane networks work efficiently only when traffic is evenly distributed across all planes and paths.&lt;/STRONG&gt; This is where MRC’s packet spraying and path-aware congestion response are essential.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Figure 1: Example of two-tier multi-plane topology using 512 x 100 GbE switches: 512 T0s x 256 NICs = 131,072 NICs&lt;/P&gt;
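&lt;P&gt;The arithmetic behind Figure 1 can be sketched in Python. This is an illustrative model, not a description of the deployed fabric: it assumes every switch has the same radix, each T0 splits its ports evenly between NICs and T1 uplinks, and each NIC places one port in each plane. The function names and the derived switch count are our own.&lt;/P&gt;

```python
def two_tier_plane(radix):
    """Size one two-tier plane built from identical switches of the given radix."""
    down = radix // 2   # T0 ports facing NICs
    up = radix - down   # T0 ports facing T1 switches
    t0_count = radix    # each T1 port reaches a distinct T0
    t1_count = up       # each T0 uplink reaches a distinct T1
    return {"t0": t0_count, "t1": t1_count, "nic_ports": t0_count * down}


def fabric(radix, planes, port_gbps):
    """Combine identical planes, with one port of every NIC in each plane."""
    plane = two_tier_plane(radix)
    return {
        "nics": plane["nic_ports"],
        "nic_gbps": planes * port_gbps,
        "switches": planes * (plane["t0"] + plane["t1"]),
    }


if __name__ == "__main__":
    # 512-port 100 GbE switches and eight planes, as in the example above.
    print(fabric(radix=512, planes=8, port_gbps=100))
```

&lt;P&gt;With a radix of 512, one plane reaches 512 x 256 = 131,072 NIC ports, matching the figure; eight such planes give each of those NICs 800 Gbps in aggregate.&lt;/P&gt;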
&lt;H4&gt;&lt;STRONG&gt;Static SRv6: Fewer moving parts, more predictability&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;In many data center networks, switches rely on Border Gateway Protocol (BGP) or other dynamic routing protocols. We removed dynamic routing from our design; instead, packets are source-routed using IPv6 Segment Routing (SRv6). Each packet encodes the end-to-end network path using compact microsegment IDs (uSIDs).&lt;/P&gt;
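&lt;P&gt;As a rough illustration of how a path fits into a single address, the Python sketch below packs a list of uSIDs into an IPv6 destination address. It assumes the common layout of a 32-bit uSID block followed by 16-bit uSIDs; the block value and hop identifiers are made up for the example and say nothing about the addressing plan actually used in Fairwater.&lt;/P&gt;

```python
import ipaddress


def encode_usid_path(block, usids):
    """Pack a 32-bit uSID block and up to six 16-bit uSIDs into an IPv6 address."""
    if len(usids) not in range(1, 7):
        raise ValueError("this layout carries between one and six uSIDs")
    raw = block.to_bytes(4, "big") + b"".join(u.to_bytes(2, "big") for u in usids)
    # Unused trailing slots stay zero, which marks the end of the uSID list.
    return ipaddress.IPv6Address(raw.ljust(16, b"\x00"))


if __name__ == "__main__":
    # Hypothetical block and hop identifiers for a three-hop path.
    print(encode_usid_path(0xFCBBBB00, [0x0100, 0x0200, 0x0300]))
```

&lt;P&gt;Because the whole path lives in the packet header, a sender can choose a different path simply by writing a different uSID list, with no routing state change in the switches.&lt;/P&gt;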
&lt;P&gt;At first glance, static routing seems counterintuitive in failure-prone environments. At extreme scale, however, dynamic routing is more of a liability than an asset. Namely, if two or more switches try to reroute packets at the same time, network behavior becomes unpredictable and harder to diagnose. Interactions between adaptive routing and adaptive transport can be hard to resolve and harder to debug at larger scale.&lt;/P&gt;
&lt;P&gt;MRC, on the other hand, handles path health and rapid failover at the transport layer. Static routing gives each network endpoint precise health feedback for every path it uses. Because probe (test) packets follow the same paths as data packets, operators gain accurate, ground-truth insight into fabric health without depending on switch control planes, which are themselves a common source of failures.&lt;/P&gt;
&lt;P&gt;Additionally, SRv6 routing allows network operators to utilize out-of-band monitoring frameworks to accurately identify link failures and device faults, which has been particularly valuable in managing large-scale AI clusters. &lt;STRONG&gt;Static SRv6 ensures paths are deterministic, making problems easier to reproduce and debug and, ultimately, making the network more stable over time.&lt;/STRONG&gt;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Failure as a normal operating condition&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;In production, failures are expected by design—link flaps, partial failures, and even switch reboots are routine at this scale. With MRC, many of these events no longer impact training workloads. Repair actions proceed in parallel, while MRC dynamically routes around failed paths. As repairs complete, MRC discovers and validates the restored paths before seamlessly reintegrating them—entirely transparent to the training application. In summary:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Systems degrade gracefully&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Losing a NIC port reduces available bandwidth proportional to the lost port but &lt;STRONG&gt;does not crash jobs&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Flapping T0–T1 links often go unnoticed&lt;/STRONG&gt; by applications&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Switches can be rebooted&lt;/STRONG&gt; without coordinated drain or rerouting of the system&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;For massive-scale training runs, this &lt;STRONG&gt;translates into higher effective uptime, fewer interrupted jobs, and more training throughput&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Figure 2: Bidirectional bandwidth measured with point-to-point RDMA Perftest while a T0 switch was taken down. Overall bandwidth dropped in proportion to the T0 switch bandwidth, but the job did not fail.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Figure 3: Bidirectional bandwidth measured with RDMA Perftest while a T1 switch was failed and restored. Results indicate no impact on performance, as MRC was able to route around the failed switch.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Measured results at scale&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;This is not just a thought exercise: &lt;STRONG&gt;Microsoft and OpenAI have both run extensive experiments &lt;/STRONG&gt;&lt;STRONG&gt;and record-scale training jobs,&lt;/STRONG&gt; showing only brief, bounded performance dips during significant network faults, followed by rapid recovery. Microbenchmarks demonstrate near line-rate bandwidth and predictable latency, even under injected loss. &lt;A class="lia-external-url" href="https://openai.com/index/mrc-supercomputer-networking" target="_blank" rel="noopener"&gt;OpenAI describes their scale results in a recent blog post&lt;/A&gt;, consistent with what we observe. Taken together, multi-plane MRC with SRv6 delivers better load balancing with fewer queue pairs and substantially higher resilience to packet loss, enabling millions of networking links to connect hundreds of thousands of GPUs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Figure 4: NCCL Send-Recv Benchmark Results with 42,020 GPUs each with 800 Gbps MRC NIC showing up to 92% of theoretical peak bandwidth for large message sizes&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;What this enables&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Taken together, MRC, multiplane topologies, and static SRv6 form a coherent strategy for building AI supercomputer scale-out networks that keep large synchronous training jobs moving forward under real-world fault conditions. Instead of treating loss, link flaps, and partial failures as events that trigger stalls or restarts, the system is designed to fail gracefully and reach 100K+ GPU scale at high utilization. This design approach has been deployed in Fairwater and elsewhere to train state-of-the-art models, where the result is more predictable performance for large jobs with higher effective GPU utilization. The core takeaway is simple: by assuming failures will happen and designing for them explicitly, events that would otherwise be catastrophic become minor, manageable perturbations.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Join us in advancing resilient AI infrastructure&lt;/STRONG&gt;&lt;BR /&gt;To help the broader ecosystem adopt these capabilities, Microsoft is joining key partners in releasing the MRC specification to the Open Compute Project and open sourcing key components:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/opencomputeproject/OCP-Multipath-Reliable-Connection" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;libMRC&lt;/STRONG&gt;&lt;/A&gt;: MRC transport APIs&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mrc-nccl-plugin" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;NCCL MRC plugin&lt;/STRONG&gt;&lt;/A&gt;: enables NCCL to run over MRC transport&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mrc-verbs-shim-lib" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;MRC shim library&lt;/STRONG&gt;&lt;/A&gt;: enables compatible verbs applications to run over MRC with no code changes&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mscclpp" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;MSCCL++ with MRC support&lt;/STRONG&gt;&lt;/A&gt;: MSCCL++ library with MRC support&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://aka.ms/sonic" target="_blank" rel="noopener"&gt;&lt;STRONG&gt;SONiC SRv6&lt;/STRONG&gt;&lt;/A&gt;:&lt;STRONG&gt; &lt;/STRONG&gt;enhance SRv6 with open NOS for high performance AI Ethernet&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;We encourage others to review these public contributions, share feedback, and ultimately adopt these capabilities within the broader ecosystem of AI networking products, infrastructures, and workloads.&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Acknowledgements&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-teams="true"&gt; Advancing AI at this scale requires collaboration across the industry. At Microsoft, we value our partnerships with AMD, Broadcom, Intel, NVIDIA, and OpenAI, and our shared commitment to continuing to evolve MRC alongside the broader community.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;References: &lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://cdn.openai.com/pdf/resilient-ai-supercomputer-networking-using-mrc-and-srv6.pdf" target="_blank" rel="noopener"&gt;MRC paper: Resilient AI Supercomputer Networking using MRC and SRv6&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://www.opencompute.org/documents/ocp-mrc-1-0-pdf" target="_blank" rel="noopener"&gt;Multipath Reliable Connection Specification&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://openai.com/index/mrc-supercomputer-networking" target="_blank" rel="noopener"&gt;OpenAI MRC blog&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://www.amd.com/en/blogs/2026/amd-advances-ai-networking-at-scale-with-mrc.html" target="_blank" rel="noopener"&gt;AMD MRC blog&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://www.broadcom.com/blog/enabling-ai-networking-scale-with-multi-path-reliable-connections-mrc-" target="_blank" rel="noopener"&gt;Broadcom MRC blog&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://blogs.nvidia.com/blog/spectrum-x-ethernet-mrc" target="_blank" rel="noopener"&gt;NVIDIA MRC Blog&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/opencomputeproject/OCP-Multipath-Reliable-Connection" target="_blank" rel="noopener"&gt;libMRC APIs&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mrc-verbs-shim-lib" target="_blank" rel="noopener"&gt;microsoft/mrc-verbs-shim-lib: shim library to translate ibverbs to libmrc interfaces&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mrc-nccl-plugin" target="_blank" rel="noopener"&gt;microsoft/mrc-nccl-plugin: MRC plugin for NCCL&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A class="lia-external-url" href="https://github.com/microsoft/mscclpp" target="_blank" rel="noopener"&gt;microsoft/mscclpp: MSCCL++: A GPU-driven communication stack for scalable AI applications (with MRC support)&lt;/A&gt;&lt;/LI&gt;
&lt;/OL&gt;</description>
      <pubDate>Wed, 06 May 2026 16:25:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/building-resilient-networks-for-ai-supercomputers/ba-p/4516919</guid>
      <dc:creator>jithinjose</dc:creator>
      <dc:date>2026-05-06T16:25:26Z</dc:date>
    </item>
    <item>
      <title>Simplify troubleshooting at scale - Centralized Log Management for CycleCloud Workspace for Slurm</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/simplify-troubleshooting-at-scale-centralized-log-management-for/ba-p/4470658</link>
      <description>&lt;P&gt;Training large AI models on hundreds or thousands of nodes introduces a critical operational challenge: when a distributed job fails, quickly identifying the root cause across scattered logs can become incredibly time-consuming. This manual process delays recovery and reduces cluster utilization. The ability to quickly parse centralized cluster logs from a single interface is critical to ensure job failure root causes are swiftly identified and mitigated to maintain high cluster utilization.&lt;/P&gt;
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Solution Architecture&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;This is a turnkey, customizable log forwarding solution for &lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/cyclecloud/overview-ccws?view=cyclecloud-8" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;CycleCloud Workspace for Slurm&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; that centralizes all cluster logs into &lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview?tabs=simple" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Monitor Log Analytics&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;. The architecture uses &lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/agents/azure-monitor-agent-overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Monitor Agent (AMA)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; deployed on every VM and Virtual Machine Scale Set (VMSS) to stream logs defined by &lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/data-collection/data-collection-rule-overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Data Collection Rules&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; (DCR) to dedicated tables in a&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-workspace-overview" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Log Analytics workspace&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; where they can be queried from a single interface.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The turnkey solution captures three categories of logs essential for troubleshooting distributed workloads, but can be extended for any other logs:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Slurm logs&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; including slurmctld, slurmd, etc., plus archived job artifacts (job submission scripts, environmental variables, stdout/stderr) collected via prolog/epilog scripts.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Infrastructure logs&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; from CycleCloud, including the CycleCloud Healthagent, which automatically tests nodes for hardware health and drains nodes that fail tests.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="1" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Operating system logs&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; from syslog and dmesg capturing kernel events, network state changes, and hardware issues.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;Each log source flows through its own DCR into a dedicated table following a consistent schema. The solution automatically associates scheduler-specific DCRs with the Slurm scheduler node and compute-specific DCRs with compute nodes, handling dynamic node scaling transparently.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The solution is purpose-built for &lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/cyclecloud/overview-ccws?view=cyclecloud-8" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;CycleCloud Workspace for Slurm&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;, but is designed in a modular fashion so that it can be easily extended with new data sources (e.g., new log formats) and processing (e.g., Data Collection Rules) to support log forwarding and analysis of other required logs.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Key Benefits&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Time-series correlation: Azure Monitor's time-based indexing enables rapid identification of cascading failures. For example, trace a network carrier flap detected in syslog to the corresponding &lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;slurmd&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt; communication errors and then to specific job failures, all within seconds.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Centralized visibility: Query logs from thousands of nodes through a single interface instead of SSH-ing to individual machines. Correlate Slurm controller decisions with node-level errors and system events in one query.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Log persistence: Logs survive node deallocations and reimaging.&amp;nbsp; Critical in cloud environments where compute nodes are ephemeral.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="4" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Powerful query language: KQL (Kusto Query Language) allows parsing raw logs into structured fields, filtering across multiple sources, and building operational dashboards. Example queries detect patterns like repeated job failures, network instability, or resource exhaustion.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="9" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="5" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Production-ready scalability: User-assigned managed identities automatically propagate to new VMSS instances, and DCR associations handle thousands of nodes without manual configuration.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
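&lt;P&gt;To make the time-series correlation idea concrete, here is a minimal sketch in plain Python. In the solution itself this kind of correlation would be expressed as a KQL query against the Log Analytics tables; the record fields (&lt;CODE&gt;node&lt;/CODE&gt;, &lt;CODE&gt;time&lt;/CODE&gt;, &lt;CODE&gt;message&lt;/CODE&gt;), the sample messages, and the 60-second window below are assumptions for illustration and do not reflect the repository's actual schema.&lt;/P&gt;

```python
# Illustrative sketch only: field names and sample records are hypothetical and
# do not reflect the table schema used by the slurm-log-collection repository.
import math
from datetime import datetime


def correlate(syslog_events, slurmd_events, window_seconds=60):
    """Pair each syslog event with slurmd events from the same node that
    occur from 0 up to (not including) window_seconds after it."""
    pairs = []
    for sys_ev in syslog_events:
        for slurm_ev in slurmd_events:
            if slurm_ev["node"] != sys_ev["node"]:
                continue
            lag = (slurm_ev["time"] - sys_ev["time"]).total_seconds()
            # floor() keeps events that precede the syslog event out of the window
            if math.floor(lag) in range(0, window_seconds):
                pairs.append((sys_ev["message"], slurm_ev["message"], lag))
    return pairs


syslog = [
    {"node": "gpu-001", "time": datetime(2026, 3, 1, 12, 0, 0),
     "message": "eth0: carrier lost"},
]
slurmd = [
    {"node": "gpu-001", "time": datetime(2026, 3, 1, 12, 0, 20),
     "message": "error: unable to contact controller"},
    {"node": "gpu-002", "time": datetime(2026, 3, 1, 12, 0, 25),
     "message": "error: unable to contact controller"},
]

for sys_msg, slurm_msg, lag in correlate(syslog, slurmd):
    print(f"{sys_msg} | {slurm_msg} | +{lag:.0f}s")
```

&lt;P&gt;Only the event on the same node inside the window is paired, so the unrelated error on the second node is ignored. The nested loop is fine for a sketch; at cluster scale the equivalent join would run inside Azure Monitor rather than on a client.&lt;/P&gt;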
&lt;H2 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Getting Started&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H2&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;The complete solution is available on &lt;/SPAN&gt;&lt;A href="https://github.com/yosoyjay/slurm-log-collection" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;GitHub (slurm-log-collection)&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt; with deployment scripts that:&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Create all required Log Analytics tables&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Deploy pre-configured DCRs for Slurm, CycleCloud, and OS logs&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="5" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;SPAN data-contrast="auto"&gt;Automatically associate DCRs with scheduler and compute resources&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;After you configure environment variables and run the setup scripts, logs begin flowing to Azure Monitor and will populate within 15 minutes, although &lt;/SPAN&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;normal log ingestion latency is ~30 seconds to 3 minutes&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-contrast="auto"&gt;. The repository includes sample KQL queries for common troubleshooting scenarios to accelerate time-to-resolution, as well as queries for non-troubleshooting analysis of cluster usage.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Mar 2026 21:17:04 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/simplify-troubleshooting-at-scale-centralized-log-management-for/ba-p/4470658</guid>
      <dc:creator>jesselopez</dc:creator>
      <dc:date>2026-03-31T21:17:04Z</dc:date>
    </item>
    <item>
      <title>Azure NCv6 Virtual Machines: Enhancements and GA Transition</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-ncv6-virtual-machines-enhancements-and-ga-transition/ba-p/4503578</link>
      <description>&lt;DIV style="font-family: 'Segoe UI', Arial, sans-serif; color: #242424; line-height: 1.6; max-width: 1000px; margin: 0; text-align: left; background-color: #ffffff;"&gt;
&lt;P style="max-width: 900px; font-size: 16px; margin-top: 20px;"&gt;NCv6 Virtual Machines are Azure's flexible, next generation platform enabling both leading-edge graphics and generative AI compute workloads. Featuring NVIDIA RTX PRO 6000 Blackwell Server Edition (BSE) GPUs, Intel Xeon™ 6 "Granite Rapids" 6900P series CPUs, and a suite of Microsoft Azure technologies, NCv6 VMs are available now in &lt;A class="lia-external-url" href="https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR9s7orOb3OJJnwABCNj_8JdUMzlLSzJFTTdRRE8yU0UxWFFYQlpYV1hDVy4u" target="_blank" rel="noopener"&gt;Preview&lt;/A&gt;.&lt;/P&gt;
&lt;P style="font-size: 16px; margin-bottom: 20px;"&gt;Today, we are pleased to share a series of exciting updates coming soon to Azure NCv6 that will:&lt;/P&gt;
&lt;UL style="font-size: 16px; margin-bottom: 30px;"&gt;
&lt;LI&gt;Enhance VM performance and capabilities&lt;/LI&gt;
&lt;LI&gt;Provide more VM sizes for customers to "right size" their usage&lt;/LI&gt;
&lt;LI&gt;Bring NCv6 to production readiness with a transition to General Availability, and&lt;/LI&gt;
&lt;LI&gt;Expand accessibility across the global Azure cloud&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;New VM Sizes, Features, and Performance Enhancements&lt;/H2&gt;
&lt;P style="margin-bottom: 20px;"&gt;In the coming weeks, Azure will debut seven new NCv6-series VM sizes and two different sub-families for customers to choose from. The standout features introduced with the new VM sizes include:&lt;/P&gt;
&lt;DIV style="display: flex; flex-direction: column; gap: 16px; margin-bottom: 32px;"&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 60px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;&lt;SPAN style="font-size: 20px;"&gt;🧩&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444;"&gt;&lt;STRONG&gt;Fractional GPU support&lt;/STRONG&gt;, enabling graphics workload customers to deploy VMs with as little as 1/2 or 1/4 of an RTX PRO™ 6000. VMs with fractional GPU support also feature reduced vCPU, memory, SSD, and networking to help customers optimize costs and right size their VMs to their workloads.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 60px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;&lt;SPAN style="font-size: 20px;"&gt;⚡&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444;"&gt;&lt;STRONG&gt;Increased vCPU per VM size&lt;/STRONG&gt; (e.g. 288 vCPU instead of 256) to provide more performance for high-end VDI workstations and better align with the Intel Xeon 6900P's triple compute tile architecture.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 60px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;&lt;SPAN style="font-size: 20px;"&gt;🛠️&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444;"&gt;&lt;STRONG&gt;General Purpose and Compute Optimized VM sizes.&lt;/STRONG&gt; The former provides larger amounts of CPU memory for demanding generative AI inference and ISV CAD/CAE simulations, while the latter offers reduced memory to enable customers with less memory intensive workloads to cost optimize their deployments.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P style="margin-bottom: 24px;"&gt;The new VM sizes will &lt;STRONG&gt;replace&lt;/STRONG&gt; the existing three VM sizes offered in Preview, and be available as follows:&lt;/P&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; font-size: 18px;"&gt;NCv6 - General Purpose VM sizes:&lt;/H3&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; margin-bottom: 30px;"&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Size Name&lt;/td&gt;&lt;td style="padding: 10px;"&gt;vCPUs&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Memory (GB)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Networking (Mb/s)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;GPUs&lt;/td&gt;&lt;td style="padding: 10px;"&gt;GPU Mem (GB)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Temp Disk&lt;/td&gt;&lt;td style="padding: 10px;"&gt;NVMe Disk&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC36ds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;36&lt;/td&gt;&lt;td style="padding: 10px;"&gt;132&lt;/td&gt;&lt;td style="padding: 10px;"&gt;22500&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1/4&lt;/td&gt;&lt;td style="padding: 10px;"&gt;24&lt;/td&gt;&lt;td style="padding: 10px;"&gt;256&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1600&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC72ds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;72&lt;/td&gt;&lt;td style="padding: 10px;"&gt;264&lt;/td&gt;&lt;td style="padding: 10px;"&gt;45000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1/2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;48&lt;/td&gt;&lt;td style="padding: 10px;"&gt;512&lt;/td&gt;&lt;td style="padding: 10px;"&gt;3200&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC144ds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;144&lt;/td&gt;&lt;td style="padding: 10px;"&gt;516&lt;/td&gt;&lt;td style="padding: 10px;"&gt;90000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1&lt;/td&gt;&lt;td style="padding: 10px;"&gt;96&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1024&lt;/td&gt;&lt;td style="padding: 10px;"&gt;6400&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 
10px;"&gt;Standard_NC288ds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;288&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1032&lt;/td&gt;&lt;td style="padding: 10px;"&gt;180000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;192&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2048&lt;/td&gt;&lt;td style="padding: 10px;"&gt;12800&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC324ds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;324&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1284&lt;/td&gt;&lt;td style="padding: 10px;"&gt;180000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;192&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2048&lt;/td&gt;&lt;td style="padding: 10px;"&gt;12800&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 28.6574%" /&gt;&lt;col style="width: 10.1201%" /&gt;&lt;col style="width: 10.6212%" /&gt;&lt;col style="width: 11.6232%" /&gt;&lt;col style="width: 9.71944%" /&gt;&lt;col style="width: 9.61924%" /&gt;&lt;col style="width: 9.71944%" /&gt;&lt;col style="width: 10.02%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; font-size: 18px;"&gt;NCv6 - Compute Optimized VM sizes:&lt;/H3&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; margin-bottom: 30px;"&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 100%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Size Name&lt;/td&gt;&lt;td style="padding: 10px;"&gt;vCPUs&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Memory (GB)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Networking (Mbps)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;GPUs&lt;/td&gt;&lt;td style="padding: 10px;"&gt;GPU Mem (GB)&lt;/td&gt;&lt;td style="padding: 10px;"&gt;Temp Disk&lt;/td&gt;&lt;td style="padding: 10px;"&gt;NVMe Disk&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC24lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;24&lt;/td&gt;&lt;td style="padding: 10px;"&gt;72&lt;/td&gt;&lt;td style="padding: 10px;"&gt;22500&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1/4&lt;/td&gt;&lt;td style="padding: 10px;"&gt;24&lt;/td&gt;&lt;td style="padding: 10px;"&gt;256&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1600&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC36lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;36&lt;/td&gt;&lt;td style="padding: 10px;"&gt;72&lt;/td&gt;&lt;td style="padding: 10px;"&gt;22500&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1/4&lt;/td&gt;&lt;td style="padding: 10px;"&gt;24&lt;/td&gt;&lt;td style="padding: 10px;"&gt;256&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1600&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC72lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;72&lt;/td&gt;&lt;td style="padding: 10px;"&gt;132&lt;/td&gt;&lt;td style="padding: 10px;"&gt;45000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1/2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;48&lt;/td&gt;&lt;td style="padding: 10px;"&gt;512&lt;/td&gt;&lt;td style="padding: 10px;"&gt;3200&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 
10px;"&gt;Standard_NC144lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;144&lt;/td&gt;&lt;td style="padding: 10px;"&gt;264&lt;/td&gt;&lt;td style="padding: 10px;"&gt;90000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1&lt;/td&gt;&lt;td style="padding: 10px;"&gt;96&lt;/td&gt;&lt;td style="padding: 10px;"&gt;1024&lt;/td&gt;&lt;td style="padding: 10px;"&gt;6400&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC288lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;288&lt;/td&gt;&lt;td style="padding: 10px;"&gt;516&lt;/td&gt;&lt;td style="padding: 10px;"&gt;180000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;192&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2048&lt;/td&gt;&lt;td style="padding: 10px;"&gt;12800&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="padding: 10px;"&gt;Standard_NC324lds_xl_RTXPro6000_v6&lt;/td&gt;&lt;td style="padding: 10px;"&gt;324&lt;/td&gt;&lt;td style="padding: 10px;"&gt;648&lt;/td&gt;&lt;td style="padding: 10px;"&gt;180000&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2&lt;/td&gt;&lt;td style="padding: 10px;"&gt;192&lt;/td&gt;&lt;td style="padding: 10px;"&gt;2048&lt;/td&gt;&lt;td style="padding: 10px;"&gt;12800&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;col style="width: 12.50%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;P style="font-style: italic; margin-bottom: 48px; font-size: 14px; color: #666;"&gt;Note that, until the new VM sizes are available, Microsoft Learn resources will continue to reflect the currently offered VM sizes and technical specifications.&lt;/P&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Transition to General Availability&lt;/H2&gt;
&lt;P style="margin-bottom: 20px;"&gt;In the coming weeks, Azure will transition NCv6-series from Preview to General Availability (GA) status. With this transition, NCv6 VMs will become covered by the Azure Service Level Agreement (SLA) and thus ready to support production-grade deployments by customers, partners, and service providers.&lt;/P&gt;
&lt;P style="margin-bottom: 40px;"&gt;When the transition to GA occurs, NCv6 VMs will be available in the Azure &lt;STRONG&gt;West US 2&lt;/STRONG&gt; and &lt;STRONG&gt;Southeast Asia&lt;/STRONG&gt; regions. Information on the availability timing of additional regions is provided below.&lt;/P&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Regional Expansion Across the Azure Cloud&lt;/H2&gt;
&lt;P style="margin-bottom: 20px;"&gt;At the beginning of Preview, NCv6 VMs debuted in the West US 2 region. Since then, we have also added NCv6 VMs to the Southeast Asia region. Both regions will be part of the transition to GA status.&lt;/P&gt;
&lt;P style="margin-bottom: 20px;"&gt;We are pleased to share that in the coming months, during &lt;STRONG&gt;Q3 of 2026&lt;/STRONG&gt;, NCv6 VMs will also become available in the following Azure regions:&lt;/P&gt;
&lt;DIV style="display: grid; grid-template-columns: 1fr 1fr; gap: 10px; font-size: 15px; margin-bottom: 48px; background: #f9f9f9; padding: 25px; border-radius: 12px; border: 1px solid #e1e8f0;"&gt;
&lt;DIV&gt;• East US&lt;/DIV&gt;
&lt;DIV&gt;• West Europe&lt;/DIV&gt;
&lt;DIV&gt;• East US 2&lt;/DIV&gt;
&lt;DIV&gt;• North Europe&lt;/DIV&gt;
&lt;DIV&gt;• South Central US&lt;/DIV&gt;
&lt;DIV&gt;• Germany West Central&lt;/DIV&gt;
&lt;DIV&gt;• West US&lt;/DIV&gt;
&lt;DIV&gt;• Korea Central&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="margin-top: 48px; padding: 40px; background: #0078d4; border-radius: 12px; color: #ffffff; text-align: center; box-shadow: 0 4px 12px rgba(0,0,0,0.1);"&gt;
&lt;H2 style="margin: 0 0 10px 0; color: #ffffff; font-size: 24px; border: none;"&gt;Ready to build for the future with Azure NCv6?&lt;/H2&gt;
&lt;P style="margin: 0 0 25px 0; font-size: 16px; opacity: 0.95; max-width: 700px; margin-left: auto; margin-right: auto;"&gt;NCv6 Virtual Machines are available now in Preview. Start your production-grade AI journey today and explore the next frontier of Azure AI infrastructure.&lt;/P&gt;
&lt;DIV style="display: flex; justify-content: center; gap: 15px; flex-wrap: wrap;"&gt;&lt;A class="lia-external-url" style="display: inline-block; background: #ffffff; color: #0078d4; padding: 12px 28px; border-radius: 4px; text-decoration: none; font-weight: bold; font-size: 15px; transition: background 0.2s;" href="https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR9s7orOb3OJJnwABCNj_8JdUMzlLSzJFTTdRRE8yU0UxWFFYQlpYV1hDVy4u" target="_blank" rel="noopener"&gt;&amp;nbsp;Join the Preview &lt;/A&gt;&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 30 Apr 2026 18:00:24 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-ncv6-virtual-machines-enhancements-and-ga-transition/ba-p/4503578</guid>
      <dc:creator>Fernando_Aznar</dc:creator>
      <dc:date>2026-04-30T18:00:24Z</dc:date>
    </item>
    <item>
      <title>AI Inferencing in Air-Gapped Environments</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/ai-inferencing-in-air-gapped-environments/ba-p/4498594</link>
      <description>&lt;P&gt;If you had to name the top trends in IT these days, two strong candidates would be Generative AI and Cybersecurity. On the latter, the sophistication, reach, and volume of cyberattacks have increased significantly in recent years, with added ingredients such as advanced persistent threats, state actors, and “crime-as-a-service” providers.&lt;/P&gt;
&lt;P&gt;Interestingly enough, both trends go hand in hand: Artificial Intelligence extracts value from your data, and cyber criminals are after exactly the same thing: your data. It is not surprising that organizations have taken steps to protect themselves against data theft, or data exfiltration, as it is often described.&lt;/P&gt;
&lt;P&gt;In this post we will explore how to deploy a Hugging Face-hosted model and an NVIDIA NIM™ microservice (a prebuilt, optimized inference container for rapidly deploying the latest AI models) in a Kubernetes cluster while protecting your infrastructure against data theft. You can find more information about NVIDIA NIM here:&amp;nbsp;&lt;A class="lia-external-url" href="https://developer.nvidia.com/nim" target="_blank" rel="noopener"&gt;https://developer.nvidia.com/nim&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;We will outline the process for deploying a Kubernetes cluster in Azure with the highest level of network security to prevent data exfiltration, and we will also demonstrate the deployment of container images and required model parameters for both options.&lt;/P&gt;
&lt;H2&gt;Why Kubernetes clusters&lt;/H2&gt;
&lt;P&gt;Unless you have been living under a rock, you are probably aware that Kubernetes has taken the IT world by storm, and the AI ecosystem is no exception. Kubernetes makes it easy to package and deploy applications on any infrastructure, and hence it has become one of the most popular platforms for running AI workloads, especially AI inferencing.&lt;/P&gt;
&lt;P&gt;Azure Kubernetes Service (AKS) is an Azure service that makes it easy to run Kubernetes clusters in Azure. Over time, AKS has introduced multiple deployment options to meet increasingly stringent requirements, particularly around security. One such option is the private cluster, where no public IP addresses are assigned to the Kubernetes control plane or nodes.&lt;/P&gt;
&lt;P&gt;To understand this evolution, let’s have a look at what a “public” AKS cluster looks like:&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 82.8704%; height: 435px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr style="height: 592px;"&gt;&lt;td style="height: 592px;"&gt;&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 1- public AKS API enabled cluster&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As the previous figure shows, there are multiple traffic flows that go over the public Internet:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;In the bootstrap phase, the nodes get images from Microsoft Container Registry, as well as potentially from other repositories such as Ubuntu.&lt;/LI&gt;
&lt;LI&gt;The Kubernetes administrator operates the cluster accessing the Kubernetes API provided by Microsoft with a public IP address.&lt;/LI&gt;
&lt;LI&gt;When pulling container images, cluster nodes can get them from publicly available repositories such as Docker Hub or Azure Container Registry (if configured to be publicly accessible).&lt;/LI&gt;
&lt;LI&gt;Lastly, administrators are allowed to expose applications that run in the cluster via public IP addresses, so that users will access them over the Internet too.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The first evolution of this concept towards a more restrictive environment was a commonly used pattern consisting of a combination of private clusters (&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/aks/private-clusters" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/aks/private-clusters&lt;/A&gt;) and Azure Firewall to limit egress traffic (&lt;A class="lia-external-url" href="https://learn.microsoft.com/azure/aks/limit-egress-traffic" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/aks/limit-egress-traffic&lt;/A&gt;) and prevent data exfiltration.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this model, there are no longer any inbound connections to the cluster:&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;&lt;EM&gt;Figure 2- AKS private cluster&lt;/EM&gt;&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;The AKS API control plane is fully integrated in the virtual network.&lt;/LI&gt;
&lt;LI&gt;Azure Container Registry and other Azure services such as Azure Storage or Azure Key Vault are also integrated with the virtual network through the Private Link technology (&lt;A href="https://aka.ms/privatelink" target="_blank" rel="noopener"&gt;https://aka.ms/privatelink&lt;/A&gt;).&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;However, there are still outbound flows from the cluster nodes to the Internet, for example during the cluster creation process or the deployment of images stored in public repositories, which need to be explicitly allowed by the egress firewall.&lt;/P&gt;
&lt;H2&gt;Air-gapped clusters&lt;/H2&gt;
&lt;P&gt;It can be argued that using private clusters only provides security up to the robustness of your firewall ruleset: it essentially acts as a fail-open mechanism. If there’s a misconfiguration in the firewall rules, you may unintentionally allow data exfiltration or theft.&lt;/P&gt;
&lt;P&gt;To address this, AKS offers an even greater degree of isolation with network-isolated clusters (&lt;A href="https://learn.microsoft.com/azure/aks/concepts-network-isolated" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/aks/concepts-network-isolated&lt;/A&gt;), where all outbound connections are completely blocked without the need for a firewall:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img&gt;&lt;EM&gt;Figure 3- AKS isolated cluster&lt;/EM&gt;&lt;/img&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In this mode, AKS nodes are configured so that no outbound flows to the Internet can exist.&lt;/P&gt;
&lt;P&gt;If you are curious about what is required to guarantee that in Azure, here is the list:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;No public Azure load balancer attached to the AKS nodes.&lt;/LI&gt;
&lt;LI&gt;No NAT gateway attached to the AKS node subnet.&lt;/LI&gt;
&lt;LI&gt;No public IP address attached to the AKS nodes.&lt;/LI&gt;
&lt;LI&gt;The AKS node subnet configured for no default outbound access (&lt;A href="https://learn.microsoft.com/azure/virtual-network/ip-services/default-outbound-access" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/virtual-network/ip-services/default-outbound-access&lt;/A&gt;).&lt;/LI&gt;
&lt;/UL&gt;
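&lt;P&gt;The four conditions above can be read as a checklist. The following Python sketch is illustrative only: the dictionary keys are hypothetical names chosen for this example and do not correspond to any Azure SDK or CLI schema. It simply reports which outbound paths would still be open for a given description of the node subnet.&lt;/P&gt;

```python
# Hypothetical description of an AKS node subnet; the field names are
# illustrative and do not correspond to an Azure SDK or CLI schema.
ISOLATION_CHECKS = {
    "public_load_balancer": "a public Azure load balancer is attached to the nodes",
    "nat_gateway": "a NAT gateway is attached to the node subnet",
    "node_public_ip": "a public IP address is attached to the nodes",
    "default_outbound_access": "default outbound access is enabled on the node subnet",
}


def find_egress_paths(subnet):
    """Return a description of every outbound path that is still open."""
    return [
        reason
        for key, reason in ISOLATION_CHECKS.items()
        # A missing key is treated as open, so an incomplete description
        # is never reported as isolated.
        if subnet.get(key, True)
    ]


isolated = {
    "public_load_balancer": False,
    "nat_gateway": False,
    "node_public_ip": False,
    "default_outbound_access": False,
}
leaky = dict(isolated, nat_gateway=True)

print(find_egress_paths(isolated))
print(find_egress_paths(leaky))
```

&lt;P&gt;An empty result means none of the four outbound paths is present. Treating a missing key as open is a deliberate choice in this sketch, so that an incomplete description is flagged rather than silently accepted.&lt;/P&gt;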
&lt;P&gt;An important consideration is understanding how AKS nodes receive updates or how images are retrieved from public repositories (e.g. docker.io or nvcr.io). This is achieved through an Azure Container Registry feature known as “artifact cache”: &lt;A href="https://learn.microsoft.com/azure/container-registry/artifact-cache-overview" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/container-registry/artifact-cache-overview&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;However, a challenge arises when considering Large Language Models (LLMs): LLM container images sourced from Hugging Face or NVIDIA (or any other source) typically include the inference runtime (for example vLLM) but not the model weights. Instead, model artifacts are downloaded dynamically when the container starts.&lt;/P&gt;
&lt;P&gt;Consequently, Azure Container Registry cannot cache these assets. The question then becomes: how can these model weights be made available within an air-gapped Kubernetes environment?&lt;/P&gt;
&lt;H2&gt;The Model Weights Challenge&lt;/H2&gt;
&lt;P&gt;While (re)loading the model weights on container startup is a flexible approach in connected environments, it fails in air-gapped clusters where outbound network access is blocked.&lt;/P&gt;
&lt;P&gt;To address this, we consider two viable strategies:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Constructing a container image that includes all required components and pushing it to the container registry accessible by the isolated cluster.&lt;/LI&gt;
&lt;LI&gt;Pre-downloading model artifacts to a private file share connected to the virtual network of the isolated cluster, and accessing these resources as needed.&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;Both methods will be demonstrated in detail. Before proceeding, however, we will outline the example scenario. To provide context aligned with current priorities among our financial clients and organizations operating in regulated sectors, this demonstration focuses on model deployment and the configuration of an isolated cluster for LLM inferencing.&lt;/P&gt;
&lt;P&gt;For inferencing we use Llama-3.1-8B-Instruct-FP8 served by vLLM, a high-performance inference runtime designed specifically for large language models. In simple terms, vLLM is responsible for efficiently loading the model onto the GPU and handling incoming inference requests with very low latency and high throughput. vLLM is typically packaged as a container image, which can be sourced either from Hugging Face or from NVIDIA (in our examples), the latter being highly optimized for NVIDIA GPUs and CUDA®. As described earlier, these images usually contain the inference runtime and dependencies, but not the model weights themselves.&lt;/P&gt;
&lt;P&gt;Instead, the model weights and other model-specific artifacts are downloaded dynamically when the container starts, allowing the same container image to be reused across different models and versions while keeping the image size small and deployment flexible. This approach is not suitable in isolated AKS clusters, where network traffic flowing outside of the deployed virtual network is not permitted.&lt;/P&gt;
&lt;P&gt;From an architectural perspective, model serving is only one part of the overall inferencing platform, and the design of the underlying GPU infrastructure plays a critical role, especially in isolated AKS clusters. In such environments, challenges are not limited to downloading model weights at container startup; for example, setting up the GPU node pool is another important consideration.&lt;/P&gt;
&lt;P&gt;Traditionally, enabling GPUs on AKS requires installing the NVIDIA device plugin for Kubernetes as well as the NVIDIA GPU drivers, most commonly by deploying the NVIDIA GPU Operator, which takes care of both. While the device plugin itself can be installed relatively easily via the artifact cache of an attached container registry, driver installation is more involved, especially in air-gapped or isolated environments:&amp;nbsp; &lt;A href="https://learn.microsoft.com/azure/aks/use-nvidia-gpu" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/azure/aks/use-nvidia-gpu&lt;/A&gt;. NVIDIA also provides detailed guidance on how to deploy the GPU Operator in such scenarios in their documentation: &lt;A href="https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/install-gpu-operator-air-gapped.html" target="_blank" rel="noopener"&gt;https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/install-gpu-operator-air-gapped.html&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;While the required procedures are clearly outlined and well documented, configuring the NVIDIA AKS GPU node pool within an air-gapped, isolated cluster continues to be a complex and time-consuming process.&lt;/P&gt;
&lt;P&gt;Microsoft has recently unveiled a preview feature that allows users to create fully managed NVIDIA GPU node pools on AKS. With this option, all necessary NVIDIA components including drivers, device plugins, and other supporting software are pre-installed and maintained by Microsoft throughout their lifecycle.&lt;/P&gt;
&lt;P&gt;This functionality is supported on isolated AKS clusters running Kubernetes v1.34.0 or later. It substantially decreases operational complexity, streamlining the deployment and maintenance of NVIDIA-accelerated, GPU-based AI inferencing solutions in restricted environments:&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/aks/aks-managed-gpu-nodes?tabs=add-ubuntu-gpu-node-pool" target="_blank" rel="noopener"&gt;https://learn.microsoft.com/en-us/azure/aks/aks-managed-gpu-nodes?tabs=add-ubuntu-gpu-node-pool&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;For example, this Azure CLI command would deploy a managed GPU node pool to an existing AKS cluster:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;az aks nodepool add \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--resource-group MyResourceGroup \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--cluster-name MyAKSCluster \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--name gpunp \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--node-count 1 \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--node-vm-size &amp;lt;GPU_SKU&amp;gt; \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--node-taints sku=gpu:NoSchedule \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--enable-cluster-autoscaler \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--min-count 1 \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--max-count 3 \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--tags &lt;STRONG&gt;EnableManagedGPUExperience=true&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Keep in mind that if you want complete control over driver versions and related settings, you should follow the guidelines for deploying the NVIDIA GPU Operator in an air-gapped environment. With the managed option, Microsoft takes care of maintaining driver versions for you.&lt;/P&gt;
&lt;H2&gt;Container and model weight deployment&lt;/H2&gt;
&lt;P&gt;With the managed GPU node pool set up, we can begin implementing both inferencing scenarios.&lt;/P&gt;
&lt;H3&gt;Scenario 1: Baking Model Weights into the Container Image&lt;/H3&gt;
&lt;P&gt;Let’s start with the first one, where the model weights are downloaded from Hugging Face, baked into a container image and then pushed to the attached Container Registry which is reachable from the isolated AKS cluster:&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;&lt;EM&gt;Figure 4- Baking Model Weights into Container Image&lt;/EM&gt;&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The easiest way to achieve this is to trigger the container build directly from the container registry, which takes the local Dockerfile, pulls the image and data needed from Hugging Face, and tags and pushes the baked image to the container registry.&lt;/P&gt;
&lt;P&gt;Make sure you have acquired a Hugging Face API Key. If you are using a gated model, access must be requested before building the image.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;az acr build \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--registry &amp;lt;ACR_NAME&amp;gt; \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--image llama3-vllm-fat:8b-instruct \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--build-arg HF_TOKEN=$HF_TOKEN \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;.&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Note: When you use the “az acr build” command instead of running docker build yourself, it automatically tags your image and pushes it to the Azure Container Registry.&lt;/P&gt;
&lt;P&gt;Once this container image is available in the container registry, we can create a simple pod and an internal load balancer to expose the service endpoint to the user. The detailed instructions and code are available here:&amp;nbsp;&lt;A href="https://github.com/mocelj/aks-air-gap-vllm-deployment" target="_blank" rel="noopener"&gt;https://github.com/mocelj/aks-air-gap-vllm-deployment&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;You can test the deployment by querying the external IP of the deployed service and interacting with the endpoint in an OpenAI-compatible way. Note that since this is an isolated cluster, you need to connect to the cluster’s network via VPN or run the curl command in a pod inside the cluster. See&amp;nbsp;&lt;A href="https://github.com/mocelj/aks-air-gap-vllm-deployment/blob/main/aks_isolated.sh#L256" target="_blank" rel="noopener"&gt;aks-air-gap-vllm-deployment/aks_isolated.sh at main · mocelj/aks-air-gap-vllm-deployment&lt;/A&gt; for more details about how to set up a point-to-site VPN in Azure. Here you can see how to get the service IP address and query the chat completions API:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 1103px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;# Get the Service IP&lt;/P&gt;
&lt;P&gt;svc_ip=$(kubectl get svc vllm-llama3-8b -o jsonpath='{.status.loadBalancer.ingress[0].ip}')&lt;/P&gt;
&lt;P&gt;curl -X POST "http://${svc_ip}:8000/v1/chat/completions" \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-H "accept: application/json" \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-H "Content-Type: application/json" \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-d '{&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"messages": [&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "system", "content": "You are a polite and respectful chatbot."},&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;{"role": "user", "content": "Where should I go for lunch close to the Microsoft office in Pratteln?"}&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;],&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"model": "meta/llama3-8b-instruct",&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"max_tokens": 512,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"top_p": 1,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"n": 1,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"stream": false,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;"frequency_penalty": 0.0&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;}'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
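&lt;P&gt;If curl is not available inside the cluster network, the same request can be issued from Python with only the standard library. This is a minimal sketch, assuming the service listens on port 8000 and returns a response in the usual chat completions shape with a choices list; the function names and the placeholder IP address are ours and are not part of the referenced repository.&lt;/P&gt;

```python
import json
import urllib.request


def build_chat_request(service_ip, user_prompt, model="meta/llama3-8b-instruct"):
    """Build the POST request for the chat completions endpoint."""
    payload = {
        "messages": [
            {"role": "system", "content": "You are a polite and respectful chatbot."},
            {"role": "user", "content": user_prompt},
        ],
        "model": model,
        "max_tokens": 512,
        "top_p": 1,
        "n": 1,
        "stream": False,
        "frequency_penalty": 0.0,
    }
    return urllib.request.Request(
        url=f"http://{service_ip}:8000/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )


def ask(service_ip, user_prompt, timeout=60):
    """Send the request and return the text of the first choice.

    Assumes the response body contains choices[0].message.content.
    """
    request = build_chat_request(service_ip, user_prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# 10.0.0.10 is a placeholder; use the IP reported by kubectl get svc.
request = build_chat_request("10.0.0.10", "Where should I go for lunch?")
print(request.full_url)
```

&lt;P&gt;Building the request separately from sending it makes it possible to inspect the URL and payload before anything touches the network, which is convenient when the endpoint is only reachable over VPN.&lt;/P&gt;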
&lt;H3&gt;Scenario 2: Using a Shared File System for Model Artifacts&lt;/H3&gt;
&lt;P&gt;In the second scenario, where we are downloading the model weights and other artifacts used by the NVIDIA NIM to a shared NFS drive, we must follow a slightly different strategy.&lt;/P&gt;
&lt;img&gt;
&lt;P&gt;&lt;EM&gt;Figure 5- Pre-download model weights to a private file share&lt;/EM&gt;&lt;/P&gt;
&lt;/img&gt;
&lt;P&gt;For simplicity, we have used a virtual machine capable of downloading artifacts from the Internet and reaching the internal container registry as well as the shared NFS volume deployed in a virtual network. In this example, we have created a simple NFS share using Azure Files (the Azure CLI code to create the share and the endpoint is available here:&amp;nbsp;&lt;A href="https://github.com/mocelj/aks-air-gap-vllm-deployment/blob/main/aks_isolated.sh#L352" target="_blank" rel="noopener"&gt;https://github.com/mocelj/aks-air-gap-vllm-deployment/blob/main/aks_isolated.sh#L352&lt;/A&gt;). For large-scale inferencing scenarios, you might want to consider other storage options to ensure reasonable startup times, given that the weights can be of considerable size.&lt;/P&gt;
&lt;P&gt;To facilitate model deployment on NVIDIA A100 GPUs, we have provisioned a jump box equipped with the same GPU type. If you start with a fresh virtual machine, you may need to install the appropriate GPU driver as well as NVIDIA’s container runtime (&lt;A href="https://developer.nvidia.com/container-runtime" target="_blank" rel="noopener"&gt;https://developer.nvidia.com/container-runtime&lt;/A&gt;). You can alternatively deploy a Linux virtual machine using DSVM Linux images, where only the container runtime needs to be added to ensure readiness for operation.&lt;/P&gt;
&lt;P&gt;To download the container image from nvcr.io we can leverage the caching rules in the container registry and pull the image via our connected container registry. Once the container image is pulled locally, we can download the model profile with the appropriate artifacts and copy everything to the shared folder, e.g. by using rsync.&lt;/P&gt;
&lt;P&gt;The artifacts can be downloaded by using the Utilities for NVIDIA NIM for LLMs:&amp;nbsp;&lt;A href="https://docs.nvidia.com/nim/large-language-models/latest/utilities.html" target="_blank" rel="noopener"&gt;https://docs.nvidia.com/nim/large-language-models/latest/utilities.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 1131px; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;docker run --rm \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--runtime=nvidia \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;--gpus all \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-v $LOCAL_NIM_CACHE:/opt/nim/.cache \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-u $(id -u) \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;-e NGC_API_KEY \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;$TARGET_IMAGE \&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;download-to-cache&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;colgroup&gt;&lt;col style="width: 100.00%" /&gt;&lt;/colgroup&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;Important: The NVIDIA API key must not be included in the AKS deployment manifests. Otherwise, it would trigger an outbound connection to pull from the nvcr.io registry, which will fail in an air-gapped environment. The key is only required on the jump box during the download of the model artifacts.&lt;/P&gt;
&lt;P&gt;Since this shared folder is reachable from both the jump box network and the isolated, air-gapped AKS cluster, the only thing left to do is point the NVIDIA NIM container to the model weights found in the shared folder, so that it does not try to download them when the pod starts.&lt;/P&gt;
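&lt;P&gt;To make the idea concrete, the following Python sketch builds a pod manifest that mounts the NFS share at the cache path used in the download step and leaves out the NVIDIA API key. It is an illustration under stated assumptions, not the manifest from the referenced repository: the storage host, export path, image name, pod name, and container port are hypothetical placeholders, and additional NIM settings may be required in practice.&lt;/P&gt;

```python
import json

# Placeholder values: the storage host, export path, and image name are
# hypothetical and must be replaced with your own.
NFS_SERVER = "mystorageaccount.file.core.windows.net"
NFS_PATH = "/mystorageaccount/nim-cache"
NIM_IMAGE = "myregistry.azurecr.io/nim/meta/llama3-8b-instruct:latest"


def build_nim_pod(name="vllm-nim-llama3"):
    """Return a pod manifest that reads the NIM cache from the NFS share."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Tolerates the sku=gpu:NoSchedule taint set on the GPU node pool.
            "tolerations": [
                {"key": "sku", "operator": "Equal", "value": "gpu", "effect": "NoSchedule"}
            ],
            "containers": [
                {
                    "name": "nim",
                    "image": NIM_IMAGE,
                    # Port 8000 is an assumption carried over from scenario 1.
                    "ports": [{"containerPort": 8000}],
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                    # No NGC_API_KEY here: the weights are expected in the cache.
                    "env": [],
                    "volumeMounts": [{"name": "nim-cache", "mountPath": "/opt/nim/.cache"}],
                }
            ],
            "volumes": [
                {"name": "nim-cache", "nfs": {"server": NFS_SERVER, "path": NFS_PATH}}
            ],
        },
    }


print(json.dumps(build_nim_pod(), indent=2))
```

&lt;P&gt;Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and applied with kubectl apply -f once the placeholders are replaced.&lt;/P&gt;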
&lt;P&gt;The service can be tested in a similar way as before. First, we need to find out the external service IP, which we will get via “kubectl get svc vllm-nim-llama3-service -o wide” and then interact with the service in the same way as before, adjusting to the new service IP address.&lt;/P&gt;
&lt;P&gt;A more detailed description of the implementation steps can be found in the attached repository:&amp;nbsp;&lt;A href="https://github.com/mocelj/aks-air-gap-vllm-deployment" target="_blank" rel="noopener"&gt;https://github.com/mocelj/aks-air-gap-vllm-deployment&lt;/A&gt;.&lt;/P&gt;
&lt;H2&gt;Summary&lt;/H2&gt;
&lt;P&gt;This document presents a practical guide for deploying LLM inferencing solutions in isolated Azure Kubernetes Service (AKS) clusters.&lt;/P&gt;
&lt;P&gt;It outlines two deployment approaches: one where the model weights are baked into the container image and pushed to a container registry reachable from the isolated cluster, and another where model weights and artifacts are pre-downloaded to a shared NFS drive accessible by both the jump box and the cluster.&lt;/P&gt;
&lt;P&gt;Both strategies enable secure, air-gapped deployments without relying on outbound internet access. For step-by-step instructions and further technical details, consult the referenced GitHub repository&amp;nbsp;&lt;A href="https://github.com/mocelj/aks-air-gap-vllm-deployment" target="_blank" rel="noopener"&gt;https://github.com/mocelj/aks-air-gap-vllm-deployment&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;For large‑scale or production deployments, more performant storage options—such as local NVMe or other high‑throughput solutions—can be explored; the services used in this guide are intentionally chosen to maximize clarity and reproducibility.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Mar 2026 09:18:31 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/ai-inferencing-in-air-gapped-environments/ba-p/4498594</guid>
      <dc:creator>damocelj</dc:creator>
      <dc:date>2026-03-09T09:18:31Z</dc:date>
    </item>
    <item>
      <title>Microsoft at NVIDIA GTC 2026</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/microsoft-at-nvidia-gtc-2026/ba-p/4497670</link>
      <description>&lt;DIV style="font-family: 'Segoe UI', Arial, sans-serif; color: #242424; line-height: 1.6; max-width: 1000px; margin: 0; text-align: left; background-color: #ffffff;"&gt;
&lt;P&gt;Microsoft returns to NVIDIA GTC 2026 in San Jose with a strong presence across conference sessions, in‑booth theater talks, live demos, and executive‑level ancillary events. Together with NVIDIA and our partner ecosystem, Microsoft is showcasing how Azure AI infrastructure enables AI training, inference, and production at global scale. Visit us at &lt;STRONG&gt;Booth #521&lt;/STRONG&gt; to see the latest innovations in action and connect with Azure and NVIDIA experts.&lt;/P&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Exclusive GTC Experiences&lt;/H2&gt;
&lt;DIV style="display: flex; flex-wrap: wrap; gap: 20px; margin: 24px 0;"&gt;
&lt;DIV style="flex: 1; min-width: 230px; border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; background: #ffffff;"&gt;
&lt;DIV style="height: 140px; background: #f0f4f8; border-bottom: 1px solid #e1e8f0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/1763445319496.jpeg?raw=true" /&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 15px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; color: #0078d4; font-size: 16px;"&gt;LEGO® Datacenter Model&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 13px;"&gt;Explore Azure AI infrastructure at the &lt;STRONG&gt;Park Container&lt;/STRONG&gt;.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 230px; border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; background: #ffffff;"&gt;
&lt;DIV style="height: 140px; background: #fdf6f0; border-bottom: 1px solid #e1e8f0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/candy%20lounge.png?raw=true" /&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 15px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; color: #d83b01; font-size: 16px;"&gt;Candy Lounge&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 13px;"&gt;Visit the high-traffic candy wall for co-branded treats all day long.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 230px; border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; background: #ffffff;"&gt;
&lt;DIV style="height: 140px; background: #fcfaff; border-bottom: 1px solid #e1e8f0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/networking%20lounge.png?raw=true" /&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 15px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; color: #5c2d91; font-size: 16px;"&gt;Networking Lounge&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 13px;"&gt;Relax and recharge with comfy seating and vital charging options.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 230px; border: 1px solid #e1e8f0; border-radius: 12px; overflow: hidden; background: #ffffff;"&gt;
&lt;DIV style="height: 140px; background: #f5fcf5; border-bottom: 1px solid #e1e8f0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/juivr.png?raw=true" /&gt;&lt;/DIV&gt;
&lt;DIV style="padding: 15px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; color: #107c10; font-size: 16px;"&gt;Outdoor Juice Truck&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 13px;"&gt;Free, refreshing beverages served during outdoor park hours.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Sponsored Breakout Sessions&lt;/H2&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 16px; padding: 24px; margin-bottom: 24px; background: #ffffff;"&gt;
&lt;DIV style="display: inline-block; background: #0078d4; color: white; padding: 4px 12px; border-radius: 4px; font-size: 11px; font-weight: bold; margin-bottom: 15px; text-transform: uppercase;"&gt;Microsoft Featured&lt;/DIV&gt;
&lt;H3 style="margin-top: 0; color: #0078d4; font-size: 20px;"&gt;Reinventing Semiconductor Design with Microsoft Discovery&lt;/H3&gt;
&lt;P style="font-size: 14px; margin-bottom: 20px;"&gt;&lt;STRONG&gt;S82398&lt;/STRONG&gt; · Mon, Mar 16 · 4:00 PM&lt;/P&gt;
&lt;DIV style="display: flex; align-items: center; gap: 15px; border-top: 1px solid #f0f0f0; padding: 15px 0;"&gt;
&lt;DIV style="width: 54px; height: 54px; border-radius: 50%; border: 3px solid #0078d4; overflow: hidden; background: #f0f4f8; flex-shrink: 0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/1764627786112.jpeg?raw=true" alt="Prashant Varshney" /&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV style="font-weight: 600; font-size: 15px;"&gt;Prashant Varshney&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #555;"&gt;Microsoft · Semiconductor &amp;amp; AI Engineering&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border-top: 1px solid #f0f0f0; padding-top: 15px;"&gt;
&lt;P style="font-size: 14px; line-height: 1.6; color: #444; margin: 0;"&gt;&lt;STRONG&gt;Abstract:&lt;/STRONG&gt; Semiconductor teams face exploding design complexity and shrinking verification windows. This session shows how the Microsoft Discovery AI for Science platform, combined with Synopsys Agent Engineers, introduces an agentic approach to EDA that automates routine steps and accelerates expert decision-making on Azure.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 16px; padding: 24px; margin-bottom: 24px; background: #ffffff;"&gt;
&lt;DIV style="display: inline-block; background: #0078d4; color: white; padding: 4px 12px; border-radius: 4px; font-size: 11px; font-weight: bold; margin-bottom: 15px; text-transform: uppercase;"&gt;Microsoft Featured&lt;/DIV&gt;
&lt;H3 style="margin-top: 0; color: #0078d4; font-size: 20px;"&gt;Operationalizing Agentic AI at Hyperscale&lt;/H3&gt;
&lt;P style="font-size: 14px; margin-bottom: 20px;"&gt;&lt;STRONG&gt;S82399&lt;/STRONG&gt; · Tue, Mar 17 · 1:00 PM&lt;/P&gt;
&lt;DIV style="display: grid; grid-template-columns: repeat(auto-fit, minmax(240px, 1fr)); gap: 20px; border-top: 1px solid #f0f0f0; padding: 15px 0;"&gt;
&lt;DIV style="display: flex; align-items: center; gap: 12px;"&gt;
&lt;DIV style="width: 50px; height: 50px; border-radius: 50%; border: 3px solid #0078d4; overflow: hidden; background: #f0f4f8; flex-shrink: 0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/1517681891648.jpeg?raw=true" alt="Nitin Nagarkatte" /&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV style="font-weight: 600; font-size: 14px;"&gt;Nitin Nagarkatte&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #555;"&gt;Microsoft · Azure AI Infrastructure&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; align-items: center; gap: 12px;"&gt;
&lt;DIV style="width: 50px; height: 50px; border-radius: 50%; border: 3px solid #0078d4; overflow: hidden; background: #f0f4f8; flex-shrink: 0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/1517769589184.jpeg?raw=true" alt="Anand Raman" /&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV style="font-weight: 600; font-size: 14px;"&gt;Anand Raman&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #555;"&gt;Microsoft · Azure AI&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; align-items: center; gap: 12px;"&gt;
&lt;DIV style="width: 50px; height: 50px; border-radius: 50%; border: 3px solid #0078d4; overflow: hidden; background: #f0f4f8; flex-shrink: 0;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/1713380746429.jpeg?raw=true" alt="Vipul Modi" /&gt;&lt;/DIV&gt;
&lt;DIV&gt;
&lt;DIV style="font-weight: 600; font-size: 14px;"&gt;Vipul Modi&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #555;"&gt;Microsoft · AI Systems&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border-top: 1px solid #f0f0f0; padding-top: 15px;"&gt;
&lt;P style="font-size: 14px; line-height: 1.6; color: #444; margin: 0;"&gt;&lt;STRONG&gt;Abstract:&lt;/STRONG&gt; As enterprises move to agentic systems, the challenge shifts to operating intelligent agents reliably at scale. This session demonstrates how Microsoft builds AI Factories on Azure using NVIDIA technology and explores Microsoft Foundry as the control plane for deploying and operating coordinated AI agents.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Live from GTC: AI Podcast&lt;/H2&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 16px; padding: 32px; margin-bottom: 24px; background: #ffffff; display: flex; flex-wrap: wrap; gap: 30px; align-items: center;"&gt;
&lt;DIV style="flex: 0 0 auto; display: flex; gap: 20px; align-items: center;"&gt;
&lt;DIV style="text-align: center;"&gt;
&lt;DIV style="width: 120px; height: 120px; border-radius: 50%; border: 4px solid #0078d4; overflow: hidden; background: #f0f4f8; margin-bottom: 10px;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/dayan.jpeg?raw=true" alt="Speaker Name" /&gt;&lt;/DIV&gt;
&lt;DIV style="font-weight: bold; font-size: 14px; color: #242424;"&gt;Dayan Rodriguez&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #555;"&gt;
&lt;P&gt;Corporate Vice President&lt;BR /&gt;Global Manufacturing &lt;BR /&gt;and Mobility&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="text-align: center;"&gt;
&lt;DIV style="width: 120px; height: 120px; border-radius: 50%; border: 4px solid #0078d4; overflow: hidden; background: #f0f4f8; margin-bottom: 10px;"&gt;&lt;IMG style="width: 100%; height: 100%; object-fit: cover;" src="https://github.com/zucaina2000/GTC26/blob/main/alistair.jpeg?raw=true" alt="Speaker Name" /&gt;&lt;/DIV&gt;
&lt;DIV style="font-weight: bold; font-size: 14px; color: #242424;"&gt;Alistair Spiers&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #555;"&gt;
&lt;P&gt;General Manager&lt;BR /&gt;Azure Infrastructure&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 2; min-width: 250px;"&gt;
&lt;DIV style="display: inline-block; background: #5c2d91; color: white; padding: 4px 12px; border-radius: 4px; font-size: 11px; font-weight: bold; margin-bottom: 12px; text-transform: uppercase;"&gt;Live Special Feature&lt;/DIV&gt;
&lt;H3 style="margin-top: 0; color: #0078d4; font-size: 24px; margin-bottom: 8px;"&gt;A conversation with Microsoft Azure&lt;/H3&gt;
&lt;P style="font-size: 16px; color: #d83b01; font-weight: bold; margin: 0 0 20px 0;"&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV style="background: #f8f9fa; padding: 15px; border-radius: 8px; border-left: 4px solid #0078d4;"&gt;
&lt;DIV style="font-weight: 600; font-size: 13px; margin-bottom: 5px; color: #666;"&gt;Listen &amp;amp; Subscribe:&lt;/DIV&gt;
&lt;A style="color: #0078d4; text-decoration: none; font-weight: bold; font-size: 18px;" href="https://aka.ms/YourPodcastLink" target="_blank" rel="noopener"&gt;aka.ms/GTC2026Podcast&lt;/A&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 0 0 140px; text-align: center; border: 1px solid #e1e8f0; border-radius: 12px; padding: 15px; background: #ffffff;"&gt;&lt;IMG style="width: 110px; height: 110px; margin-bottom: 10px;" src="https://github.com/zucaina2000/GTC26/blob/main/GTC26.png?raw=true" alt="Scan to Listen" /&gt;
&lt;DIV style="font-size: 10px; color: #666; font-weight: bold; text-transform: uppercase; letter-spacing: 0.5px;"&gt;Scan to Listen&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Earned Conference Sessions&lt;/H2&gt;
&lt;P style="margin-bottom: 24px;"&gt;Don't miss these high-impact sessions where Microsoft and NVIDIA leaders discuss the future of AI factories and infrastructure.&lt;/P&gt;
&lt;DIV style="display: flex; flex-wrap: wrap; gap: 20px; margin: 24px 0;"&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 100px; display: flex; flex-direction: column; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; text-align: center; padding: 10px;"&gt;
&lt;DIV style="font-size: 11px; font-weight: bold; color: #0078d4; text-transform: uppercase;"&gt;Mon · Mar 16&lt;/DIV&gt;
&lt;DIV style="font-size: 16px; font-weight: bold; color: #0078d4; margin-top: 4px;"&gt;5:00 PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 17px; color: #0078d4; line-height: 1.4;"&gt;Drive Optimal Tokens per Watt on AI Infrastructure Using Benchmarking Recipes&lt;/H3&gt;
&lt;DIV style="font-size: 13px; color: #242424; margin-bottom: 4px;"&gt;&lt;STRONG&gt;Speakers:&lt;/STRONG&gt; Paul Edwards, Emily Potyraj&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #666; font-style: italic;"&gt;Microsoft, NVIDIA&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 100px; display: flex; flex-direction: column; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; text-align: center; padding: 10px;"&gt;
&lt;DIV style="font-size: 11px; font-weight: bold; color: #0078d4; text-transform: uppercase;"&gt;Tue · Mar 17&lt;/DIV&gt;
&lt;DIV style="font-size: 16px; font-weight: bold; color: #0078d4; margin-top: 4px;"&gt;9:00 AM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 17px; color: #0078d4; line-height: 1.4;"&gt;Autonomous AI Factories: Technical Preview of Agent-Native Production&lt;/H3&gt;
&lt;DIV style="font-size: 13px; color: #242424; margin-bottom: 4px;"&gt;&lt;STRONG&gt;Speakers:&lt;/STRONG&gt; JP Vasseur, César Martinez Spessot&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #666; font-style: italic;"&gt;NVIDIA, Microsoft Research&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 100px; display: flex; flex-direction: column; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; text-align: center; padding: 10px;"&gt;
&lt;DIV style="font-size: 11px; font-weight: bold; color: #0078d4; text-transform: uppercase;"&gt;Tue · Mar 17&lt;/DIV&gt;
&lt;DIV style="font-size: 16px; font-weight: bold; color: #0078d4; margin-top: 4px;"&gt;4:00 PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 17px; color: #0078d4; line-height: 1.4;"&gt;The Road to Intelligent Mobility: Vehicle GenAI&lt;/H3&gt;
&lt;DIV style="font-size: 13px; color: #242424; margin-bottom: 4px;"&gt;&lt;STRONG&gt;Speakers:&lt;/STRONG&gt; Raj Paul, Thomas Evans, Bryan Goodman&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #666; font-style: italic;"&gt;Microsoft, NVIDIA, Bosch&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 100px; display: flex; flex-direction: column; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; text-align: center; padding: 10px;"&gt;
&lt;DIV style="font-size: 11px; font-weight: bold; color: #0078d4; text-transform: uppercase;"&gt;Wed · Mar 18&lt;/DIV&gt;
&lt;DIV style="font-size: 16px; font-weight: bold; color: #0078d4; margin-top: 4px;"&gt;9:00 AM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 17px; color: #0078d4; line-height: 1.4;"&gt;Supercharging AI with Multi-Gigawatt AI Factories&lt;/H3&gt;
&lt;DIV style="font-size: 13px; color: #242424; margin-bottom: 4px;"&gt;&lt;STRONG&gt;Speakers:&lt;/STRONG&gt; Gilad Shainer, Peter Salanki, Evan Burness&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; color: #666; font-style: italic;"&gt;NVIDIA, CoreWeave, Meta, Microsoft&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Daily Booth Theater Schedule&lt;/H2&gt;
&lt;P style="margin-bottom: 24px;"&gt;Visit the Microsoft Theater for lightning talks from engineering leaders and partners.&lt;/P&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; border-left: 4px solid #0078d4; padding-left: 10px;"&gt;Monday, March 16&lt;/H3&gt;
&lt;DIV style="display: flex; flex-direction: column; gap: 12px; margin-bottom: 40px;"&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH208 · NVIDIA&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Accelerate AI Innovation on Azure with NVIDIA Run:ai &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Rob Magno&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH202 · General Robotics&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Models to Machines: Deploying Agentic AI in Real-World Robotics &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Dinesh Narayanan&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;3:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH200 · Fractal Analytics&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;From Generalist to Enterprise-Ready: Fractal Builds Domain AI &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— C. Chaudhuri&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;3:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH109 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Agentic cloud ops - Smarter Operations with Azure Copilot &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Jyoti Sharma&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH103 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Build a Deep Research Agent for Enterprise Data &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— D. Casati, A. Slutsky, H. Alkemade&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH205 · NetApp&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Azure NetApp Files: Powering Your Data for AI Capabilities &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Andy Chan&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH207 · NVIDIA&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;The Agentic Commerce Stack: Open Models on Azure &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Antonio Martinez&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH217 · OPAQUE&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Confidential AI on Azure Unlocks Sovereign AI at Scale &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Aaron Fulkerson&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;6:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH218 · Simplismart&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Making BYOC work at scale with modular inference &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Amritanshu Jain&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #fcfcfc; display: flex; overflow: hidden; min-height: 50px;"&gt;
&lt;DIV style="background: #eee; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #666; font-size: 13px;"&gt;6:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1; display: flex; align-items: center;"&gt;
&lt;DIV style="font-size: 15px; font-weight: bold; color: #666; text-transform: uppercase; letter-spacing: 1px;"&gt;Expo Reception&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; border-left: 4px solid #0078d4; padding-left: 10px;"&gt;Tuesday, March 17&lt;/H3&gt;
&lt;DIV style="display: flex; flex-direction: column; gap: 12px; margin-bottom: 40px;"&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;1:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH100 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;From Open Weights to Enterprise Scale: Open-Source Models &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Sharmila Chockalingam&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH212 · Personal AI&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Unlocking the power of memory in Teams with Personal AI &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Sam Harkness&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH111 · Microsoft / NVIDIA&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Scalable LLM Inference on AKS Using NVIDIA Dynamo &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Mohamad Al jazaery, Anton Slutsky&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;3:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH204 · Mistral AI&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Innovate with Mistral AI on Microsoft Foundry &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Ian Mathew&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;3:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH104 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;GPU-Accelerated CFD at Scale: Star-CCM+ on Azure &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Jason Scheffelmaer&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH206 · NeuBird AI&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Agentic AI for Incident Response on Microsoft Azure &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Grant Griffiths&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH101 · GitHub&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Agentic DevOps: Evolving software with GitHub Copilot &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Glenn Wester&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH209 · Rescale&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Real-World AI Physics: GM &amp;amp; NVIDIA on Rescale &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Dinal Perera&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH107 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Intro to LoRA Fine-Tuning on Azure &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Christin Pohl&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #fcfcfc; display: flex; overflow: hidden; min-height: 50px;"&gt;
&lt;DIV style="background: #eee; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #666; font-size: 13px;"&gt;6:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1; display: flex; align-items: center;"&gt;
&lt;DIV style="font-size: 15px; font-weight: bold; color: #666; text-transform: uppercase; letter-spacing: 1px;"&gt;Raffle&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; border-left: 4px solid #0078d4; padding-left: 10px;"&gt;Wednesday, March 18&lt;/H3&gt;
&lt;DIV style="display: flex; flex-direction: column; gap: 12px; margin-bottom: 40px;"&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;1:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH219 · VAST Data&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Scaling AI Infrastructure on Azure with VAST Data &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Jason Vallery&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;1:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH110 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Physical AI and Robotics: The Next Frontier &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— F. Miller, C. Souche, D. Narayanan&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH105 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Sovereign AI options with Azure Local &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Kim Lam&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;2:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH108 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Automating HPC Workflows with Copilot Agents &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Param Shah&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;3:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH102 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Trustworthy Multi-Agent Workflows with Microsoft Foundry &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Brian Benz&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH106 · Microsoft&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Scaling Enterprise AI on ARO with NVIDIA H100 &amp;amp; H200 &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Lachie Evenson&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;4:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH211 · WEKA&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Hybrid AI Data Orchestration with WEKA NeuralMesh™ &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Desiree Campbell&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH202 · Hammerspace&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;NVIDIA AI Enterprise Software with NIM &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Mike Bloom&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;5:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH203 · Kinaxis&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Reimagining Global Supply Planning with Azure &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Dane Henshall&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;6:00 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH214 · AT&amp;amp;T&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Connected AI on Azure for Manufacturing &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Brad Pritchett&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #fcfcfc; display: flex; overflow: hidden; min-height: 50px;"&gt;
&lt;DIV style="background: #eee; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #666; font-size: 13px;"&gt;6:30 PM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1; display: flex; align-items: center;"&gt;
&lt;DIV style="font-size: 15px; font-weight: bold; color: #666; text-transform: uppercase; letter-spacing: 1px;"&gt;Raffle&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H3 style="color: #0078d4; margin-bottom: 15px; border-left: 4px solid #0078d4; padding-left: 10px;"&gt;Thursday, March 19&lt;/H3&gt;
&lt;DIV style="display: flex; flex-direction: column; gap: 12px; margin-bottom: 40px;"&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden; min-height: 70px;"&gt;
&lt;DIV style="background: #f5f9fd; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #0078d4; font-size: 13px;"&gt;11:00 AM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1;"&gt;
&lt;DIV style="font-size: 11px; color: #666; font-weight: 600; text-transform: uppercase; margin-bottom: 2px;"&gt;BTH210 · Wandelbots&lt;/DIV&gt;
&lt;DIV style="font-size: 15px; font-weight: 600; color: #242424;"&gt;Physical AI: Powering Software-Defined Automation in Robotics &lt;SPAN style="font-weight: 400; color: #666; font-size: 13px;"&gt;— Marwin Kunz, Martin George&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="border: 1px solid #e1e8f0; border-radius: 12px; background: #fcfcfc; display: flex; overflow: hidden; min-height: 50px;"&gt;
&lt;DIV style="background: #eee; width: 90px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0; font-weight: bold; color: #666; font-size: 13px;"&gt;11:30 AM&lt;/DIV&gt;
&lt;DIV style="padding: 12px 20px; flex-grow: 1; display: flex; align-items: center;"&gt;
&lt;DIV style="font-size: 15px; font-weight: bold; color: #666; text-transform: uppercase; letter-spacing: 1px;"&gt;Raffle&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Explore Our Demo Pods&lt;/H2&gt;
&lt;P style="margin-bottom: 24px;"&gt;Visit the Microsoft booth to see our technology in action with live demonstrations across four dedicated pod areas.&lt;/P&gt;
&lt;DIV style="display: flex; flex-wrap: wrap; gap: 20px; margin: 24px 0;"&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 80px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;
&lt;DIV style="font-weight: bold; color: #0078d4; font-size: 14px;"&gt;POD 1&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Azure AI Infrastructure&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444; line-height: 1.5;"&gt;End‑to‑end AI infrastructure for training and inference at scale, featuring the latest NVIDIA GPU integrations on Azure.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 80px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;
&lt;DIV style="font-weight: bold; color: #0078d4; font-size: 14px;"&gt;POD 2&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Microsoft Foundry&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444; line-height: 1.5;"&gt;Our comprehensive platform for building, deploying, and operating agentic AI systems with enterprise reliability.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 80px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;
&lt;DIV style="font-weight: bold; color: #0078d4; font-size: 14px;"&gt;POD 3&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Building AI Together&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444; line-height: 1.5;"&gt;Showcasing joint Microsoft and NVIDIA solutions across diverse industries, from manufacturing to retail.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="flex: 1; min-width: 450px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; display: flex; overflow: hidden;"&gt;
&lt;DIV style="background: #f5f9fd; width: 80px; display: flex; align-items: center; justify-content: center; border-right: 1px solid #e1e8f0; flex-shrink: 0;"&gt;
&lt;DIV style="font-weight: bold; color: #0078d4; font-size: 14px;"&gt;POD 4&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Startups Powering AI&lt;/H3&gt;
&lt;P style="margin: 0; font-size: 14px; color: #444; line-height: 1.5;"&gt;Discover how innovative startups are running next‑generation AI workloads on the Azure platform.&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;H2 style="margin-top: 48px; color: #0078d4;"&gt;Ancillary Events &amp;amp; Networking&lt;/H2&gt;
&lt;P style="margin-bottom: 32px; color: #444;"&gt;Join Microsoft leadership and our partner ecosystem at these curated networking experiences. Click the location to view on Bing Maps.&lt;/P&gt;
&lt;DIV style="max-width: 900px;"&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Sun · Mar 15&lt;/DIV&gt;
&lt;DIV style="font-size: 20px; font-weight: bold;"&gt;6:00&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Microsoft for Startups Executive Leadership Dinner&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=Morton%27s+The+Steakhouse+San+Jose" target="_blank" rel="noopener"&gt; Morton’s Steakhouse, San Jose &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;Exclusive gathering for startup leaders and Microsoft executives.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Mon · Mar 16&lt;/DIV&gt;
&lt;DIV style="font-size: 20px; font-weight: bold;"&gt;1:30&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Microsoft × NVIDIA Open Meet&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=Signia+by+Hilton+San+Jose" target="_blank" rel="noopener"&gt; Signia by Hilton · International Suite &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;Strategic alignment session for Microsoft and NVIDIA executives.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Mon · Mar 16&lt;/DIV&gt;
&lt;DIV style="font-size: 20px; font-weight: bold;"&gt;7:30&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Microsoft + NVIDIA Executive Dinner&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=Il+Fornaio+San+Jose" target="_blank" rel="noopener"&gt; Il Fornaio, San Jose &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;Executive dinner for key customers and leadership teams.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Tue · Mar 17&lt;/DIV&gt;
&lt;DIV style="font-size: 18px; font-weight: bold;"&gt;11:00 AM&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;to 1:00 PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Microsoft AI Luncheon: Research, Robotics, &amp;amp; Real‑World AI&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=Signia+by+Hilton+San+Jose" target="_blank" rel="noopener"&gt; Signia by Hilton · International Suite &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;&lt;STRONG&gt;Invite-only:&lt;/STRONG&gt; A curated executive lunch exploring the journey from AI research to physical enterprise deployments in robotics and manufacturing.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Tue · Mar 17&lt;/DIV&gt;
&lt;DIV style="font-size: 20px; font-weight: bold;"&gt;7:30&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;Networking in AI &amp;amp; Tech&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=San+Pedro+Square+Market+San+Jose" target="_blank" rel="noopener"&gt; San Pedro Square Market &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;Community networking mixer for Microsoft teams, partners, and customers.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="display: flex; gap: 20px; margin-bottom: 24px; border: 1px solid #e1e8f0; border-radius: 12px; background: #ffffff; overflow: hidden;"&gt;
&lt;DIV style="background: #0078d4; color: #ffffff; width: 120px; display: flex; flex-direction: column; align-items: center; justify-content: center; text-align: center; padding: 15px; flex-shrink: 0;"&gt;
&lt;DIV style="font-size: 12px; font-weight: 600; text-transform: uppercase; opacity: 0.9;"&gt;Wed · Mar 18&lt;/DIV&gt;
&lt;DIV style="font-size: 18px; font-weight: bold;"&gt;10:00 AM&lt;/DIV&gt;
&lt;DIV style="font-size: 12px; font-weight: 600;"&gt;to 1:00 PM&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV style="padding: 20px; flex-grow: 1;"&gt;
&lt;H3 style="margin: 0 0 8px 0; font-size: 18px; color: #0078d4;"&gt;AI Innovator’s Circle Brunch: Powering Intelligent Systems Across the Ecosystem&lt;/H3&gt;
&lt;DIV style="display: flex; align-items: center; gap: 8px; font-size: 14px; color: #444;"&gt;&lt;SPAN style="font-size: 16px;"&gt;📍&lt;/SPAN&gt; &lt;A style="color: #0078d4; text-decoration: none; font-weight: 600;" href="https://www.bing.com/maps?q=Il+Fornaio+San+Jose" target="_blank" rel="noopener"&gt; Il Fornaio, San Jose &lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="font-size: 13px; color: #666; margin-top: 6px;"&gt;Hosted by Microsoft &amp;amp; NVIDIA at GTC. Join us for an exclusive brunch and discussion on the intelligent ecosystem.&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Fri, 06 Mar 2026 23:44:35 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/microsoft-at-nvidia-gtc-2026/ba-p/4497670</guid>
      <dc:creator>Fernando_Aznar</dc:creator>
      <dc:date>2026-03-06T23:44:35Z</dc:date>
    </item>
    <item>
      <title>Azure Recognized as an NVIDIA Cloud Exemplar, Setting the Bar for AI Performance in the Cloud</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-recognized-as-an-nvidia-cloud-exemplar-setting-the-bar-for/ba-p/4495747</link>
      <description>&lt;P&gt;As AI models continue to scale in size and complexity, cloud infrastructure must deliver more than theoretical peak performance. What matters in practice is reliable, end-to-end, workload-level AI performance—where compute, networking, system software, and optimization work together to deliver predictable, repeatable results at scale. This directly translates to business value: efficient full-stack infrastructure accelerates time-to-market, maximizes ROI on GPU and cloud investments, and enables organizations to scale AI from proof-of-concept to revenue-generating products with predictable economics.&lt;/P&gt;
&lt;P&gt;Today, Microsoft is proud to share an important milestone in partnership with NVIDIA: &lt;STRONG&gt;Azure has been validated as an NVIDIA Exemplar Cloud&lt;/STRONG&gt;, becoming the &lt;STRONG&gt;first cloud provider recognized for Exemplar-class AI performance aligned with GB300-class (Blackwell generation) systems&lt;/STRONG&gt;.&lt;BR /&gt;This recognition builds on Azure’s previously validated Exemplar status for &lt;STRONG&gt;H100 training workloads&lt;/STRONG&gt; and reflects NVIDIA’s confidence in Azure’s ability to extend that rigor and performance discipline into the next generation of AI platforms.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What Is NVIDIA Exemplar Cloud?&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;The &lt;STRONG&gt;NVIDIA Exemplar Cloud&lt;/STRONG&gt; initiative recognizes cloud platforms that demonstrate &lt;STRONG&gt;robust end-to-end AI workload performance&lt;/STRONG&gt; using NVIDIA’s &lt;STRONG&gt;Performance Benchmarking&lt;/STRONG&gt; suite.&lt;/P&gt;
&lt;P&gt;Rather than relying on synthetic microbenchmarks, Performance Benchmarking evaluates real AI training workloads using:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Large-scale LLM training scenarios&lt;/LI&gt;
&lt;LI&gt;Production-grade software stacks&lt;/LI&gt;
&lt;LI&gt;Optimized system and network configurations&lt;/LI&gt;
&lt;LI&gt;Workload-centric metrics such as throughput and time-to-train&lt;/LI&gt;
&lt;/UL&gt;
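&lt;P&gt;As a rough illustration of the workload-centric metrics listed above, the following Python sketch derives training throughput (tokens per second) and an estimated time-to-train from per-step timing. The function names and all input numbers are hypothetical and chosen only for illustration; this is not NVIDIA’s Performance Benchmarking code, and the formulas assume a steady step time.&lt;/P&gt;

```python
# Minimal sketch with hypothetical helper names and made-up inputs.
# It is not taken from NVIDIA's benchmarking recipes.

def tokens_per_second(global_batch_size, sequence_length, step_time_s):
    """Training throughput: tokens processed per second of wall-clock time."""
    if step_time_s == 0:
        raise ValueError("step_time_s must be non-zero")
    return global_batch_size * sequence_length / step_time_s


def time_to_train_days(total_tokens, throughput_tokens_per_s):
    """Wall-clock days needed to process total_tokens at a steady throughput."""
    seconds = total_tokens / throughput_tokens_per_s
    return seconds / 86_400  # 86,400 seconds per day


# Illustrative inputs only (assumed, not measured on any system).
throughput = tokens_per_second(
    global_batch_size=512, sequence_length=4096, step_time_s=8.0
)
days = time_to_train_days(
    total_tokens=1_000_000_000_000, throughput_tokens_per_s=throughput
)
print(f"{throughput:,.0f} tokens/s, {days:.1f} days")
```

&lt;P&gt;Under these assumed inputs, halving the step time doubles throughput and halves the estimated time-to-train, which is why per-step efficiency is tracked at the workload level.&lt;/P&gt;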
&lt;P&gt;Achieving Exemplar validation signals that a provider can &lt;STRONG&gt;consistently deliver world-class AI performance in the cloud&lt;/STRONG&gt;, so end users get optimized performance and value by default.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Proven Exemplar Validation on H100&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Azure’s Exemplar Cloud journey began with &lt;STRONG&gt;publicly shared benchmarking results for H100-based training workloads&lt;/STRONG&gt;, where Azure ND GPU clusters demonstrated &lt;STRONG&gt;exemplar performance using NVIDIA Performance Benchmarking recipes&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Those results—published previously and validated through NVIDIA’s benchmarking framework—established a &lt;STRONG&gt;proven foundation of end-to-end AI performance&lt;/STRONG&gt; for large-scale, production workloads running on Azure today.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Extending Exemplar-Class AI Performance to GB300-Class Platforms&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Building on the rigor and learnings from H100 validation, Microsoft has now been &lt;STRONG&gt;recognized by NVIDIA as the first cloud provider to achieve Exemplar-class performance and readiness aligned with GB300-class systems&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;This designation reflects NVIDIA’s assessment that the same principles applied to H100—including end-to-end system tuning, networking optimization, and software alignment—are being successfully carried forward into the &lt;STRONG&gt;Blackwell generation&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;Rather than treating GB300 as a point solution, Azure approaches it as a &lt;STRONG&gt;continuation of a proven performance model&lt;/STRONG&gt;: delivering consistent world-class AI performance in the cloud while preserving the flexibility, elasticity, and global scale customers expect.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What Enables Exemplar-Class AI Performance on Azure&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;Delivering Exemplar-class AI performance requires optimization across the full AI stack:&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Infrastructure and Networking&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;High-performance Azure ND GPU clusters with NVIDIA InfiniBand&lt;/LI&gt;
&lt;LI&gt;NUMA-aware CPU, GPU, and NIC alignment to minimize latency&lt;/LI&gt;
&lt;LI&gt;Tuned NCCL communication paths for efficient multi-GPU scaling&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Software and System Optimization&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Tight integration with NVIDIA software, including Performance Benchmarking recipes and NVIDIA AI Enterprise&lt;/LI&gt;
&lt;LI&gt;Parallelism strategies aligned with large-scale LLM training&lt;/LI&gt;
&lt;LI&gt;Continuous tuning as models, workloads, and system architectures evolve&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;End-to-End Workload Focus&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Measuring real training performance, not isolated component metrics&lt;/LI&gt;
&lt;LI&gt;Driving repeatable improvements in application-level throughput and efficiency&lt;/LI&gt;
&lt;LI&gt;Closing the performance gap between cloud and on-premises systems—without sacrificing manageability&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Together, these capabilities enabled Azure to deliver &lt;STRONG&gt;consistent Exemplar-class AI performance&lt;/STRONG&gt; across generations of NVIDIA platforms.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;What This Means for Customers&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;For customers training and deploying advanced AI models, this milestone delivers clear benefits:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;World-class AI performance&lt;/STRONG&gt; in a fully managed cloud environment&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Predictable scaling&lt;/STRONG&gt; from small clusters to thousands of GPUs&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Faster time to train&lt;/STRONG&gt; and improved performance per dollar&lt;/LI&gt;
&lt;LI&gt;Confidence that Azure is &lt;STRONG&gt;ready for Blackwell-class and GB300-class AI workloads&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As AI workloads become more complex and reasoning-heavy, infrastructure performance increasingly determines outcomes. Azure’s NVIDIA Cloud Exemplar recognition provides a clear signal: customers can build and scale next-generation AI systems on Azure &lt;STRONG&gt;without compromising on performance&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Learn More&lt;/STRONG&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;DGX Cloud Benchmarking on Azure&lt;/STRONG&gt;&lt;BR /&gt;&lt;A class="lia-internal-link lia-internal-url lia-internal-url-content-type-blog" href="https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/dgx-cloud-benchmarking-on-azure/4410826" data-lia-auto-title="DGX Cloud Benchmarking on Azure | Microsoft Community Hub" data-lia-auto-title-active="0" target="_blank"&gt;DGX Cloud Benchmarking on Azure | Microsoft Community Hub&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Wed, 18 Feb 2026 22:31:25 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-recognized-as-an-nvidia-cloud-exemplar-setting-the-bar-for/ba-p/4495747</guid>
      <dc:creator>Fernando_Aznar</dc:creator>
      <dc:date>2026-02-18T22:31:25Z</dc:date>
    </item>
    <item>
      <title>Centralized cluster performance metrics with ReFrame HPC and Azure Log Analytics</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/centralized-cluster-performance-metrics-with-reframe-hpc-and/ba-p/4488077</link>
      <description>&lt;P&gt;Imagine managing several clusters across different environments (dev, test and prod), planning a migration between PBS and Slurm, or porting codes to a different system. Each of these can seem like a daunting task.&lt;/P&gt;
&lt;P&gt;This is where the combination of ReFrame HPC, a powerful and feature-rich testing framework, and Azure Log Analytics can help improve confidence in the performance and accuracy of a system.&lt;/P&gt;
&lt;P&gt;Here we will look at how to configure ReFrame HPC specifically for Azure: deploying the required Azure resources, running a test, and capturing the results in Log Analytics for analysis.&lt;/P&gt;
&lt;H2&gt;Deploying the required Azure Resources&lt;/H2&gt;
&lt;P&gt;First, deploy the required resources in Azure using this &lt;A class="lia-external-url" href="https://github.com/JimPaine/reframe-azure-perflog-handler" target="_blank" rel="noopener"&gt;Bicep template&lt;/A&gt; from GitHub. The deployment creates and configures everything required for ReFrame HPC: a &lt;A class="lia-external-url" href="https://docs.azure.cn/en-us/azure-monitor/data-collection/data-collection-endpoint-overview?tabs=portal" target="_blank" rel="noopener"&gt;data collection endpoint&lt;/A&gt;, a &lt;A class="lia-external-url" href="https://docs.azure.cn/en-us/azure-monitor/data-collection/data-collection-rule-overview" target="_blank" rel="noopener"&gt;data collection rule&lt;/A&gt; and a &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-workspace-overview" target="_blank" rel="noopener"&gt;Log Analytics workspace&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img&gt;Azure icons for a Data Collection Endpoint, Data Collection Rule with an arrow pointing from them to the icon for Log Analytics Workspace.&lt;/img&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-SPOILER label="⚠️Make sure to capture the URL from the Bicep output."&gt;
&lt;P&gt;The structure of the endpoint that is needed later is complex, but the&amp;nbsp;&lt;A class="lia-external-url" href="https://github.com/JimPaine/reframe-azure-perflog-handler/blob/main/main.bicep" target="_blank" rel="noopener"&gt;Bicep template&lt;/A&gt; generates it and outputs it at the end, so make sure to capture it now.&lt;/P&gt;
&lt;/LI-SPOILER&gt;
&lt;H2&gt;Running ior via ReFrame HPC&lt;/H2&gt;
&lt;P&gt;For the purpose of demonstrating a running test and capturing the results in Azure from start to finish, here is a simple ior test which will run both a read and a write operation against the shared storage.&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import reframe as rfm import reframe.utility.sanity as sn @rfm.simple_test class SimplePerfTest(rfm.RunOnlyRegressionTest): valid_systems = ["*"] valid_prog_environs = ["+ior"] executable = 'ior' executable_opts = [ '-a POSIX -w -r -C -e -g -F -b 2M -t 2M -s 25600 -o /data/demo/test.bin -D 300' ] reference = { 'tst:hbv4': { 'write_bandwidth_mib': (500, -0.05, 0.1, 'MiB/s'), 'read_bandwidth_mib': (350, -0.05, 0.5, 'MiB/s'), } } @sanity_function def validate_run(self): return sn.assert_found(r'Summary of all tests:', self.stdout) @performance_function('MiB/s') def write_bandwidth_mib(self): return sn.extractsingle(r'^write\s+([0-9]+\.?[0-9]*)', self.stdout, 1, float) @performance_function('MiB/s') def read_bandwidth_mib(self): return sn.extractsingle(r'^read\s+([0-9]+\.?[0-9]*)', self.stdout, 1, float)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Test explanation&lt;/H3&gt;
&lt;P&gt;Set the binary to be executed to ior, along with its arguments.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;executable = 'ior' executable_opts = [ '-a POSIX -w -r -C -e -g -F -b 2M -t 2M -s 25600 -o /data/demo/test.bin -D 300' ]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Specify which systems the test should run on. In this case, any system/cluster that is known to have ior available will be selected. See the ReFrame HPC &lt;A class="lia-external-url" href="https://reframe-hpc.readthedocs.io/en/stable/tutorial.html#systems-and-environments" target="_blank" rel="noopener"&gt;documentation&lt;/A&gt; for a better understanding of the available options.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;valid_systems = ["*"]
valid_prog_environs = ["+ior"]&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Verify the stdout of the job by searching for a specific value to assert that it ran successfully.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;@sanity_function def validate_run(self): return sn.assert_found(r'Summary of all tests:', self.stdout)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the sanity function passes, ReFrame HPC then extracts the performance metrics from the stdout of the job. The naming of the methods is important, as the method names are stored as the metric names in the results later.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;@performance_function('MiB/s')
def write_bandwidth_mib(self):
    return sn.extractsingle(r'^write\s+([0-9]+\.?[0-9]*)', self.stdout, 1, float)

@performance_function('MiB/s')
def read_bandwidth_mib(self):
    return sn.extractsingle(r'^read\s+([0-9]+\.?[0-9]*)', self.stdout, 1, float)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
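&lt;P&gt;To see what these patterns capture, here is a minimal sketch that applies the same regular expressions with Python's standard re module. It is not ReFrame HPC code: the sample output is a made-up, shortened stand-in for ior's summary, the helper name extract_bandwidth is hypothetical, and it assumes the pattern is applied with multi-line anchoring so that ^ matches at the start of each line.&lt;/P&gt;

```python
import re

# Hypothetical, shortened stand-in for ior's summary output (not real results).
sample_stdout = """Summary of all tests:
Operation   Max(MiB)   Min(MiB)  Mean(MiB)
write         512.30     498.10     505.20
read          361.75     349.90     355.10
"""

# Same shape as the patterns used in the test above.
PATTERN = r'^{op}\s+([0-9]+\.?[0-9]*)'


def extract_bandwidth(stdout: str, operation: str) -> float:
    """Return the first number that follows `operation` at the start of a line."""
    match = re.search(PATTERN.format(op=operation), stdout, re.MULTILINE)
    if match is None:
        raise ValueError(f'no {operation} line found')
    return float(match.group(1))


print(extract_bandwidth(sample_stdout, 'write'))
print(extract_bandwidth(sample_stdout, 'read'))
```

&lt;P&gt;In this sample the captured value is the first numeric column after the operation name. Check the layout of your own ior output before relying on a particular column.&lt;/P&gt;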
&lt;P&gt;&lt;A class="lia-external-url" href="https://reframe-hpc.readthedocs.io/en/stable/tutorial.html#adding-performance-references" target="_blank" rel="noopener"&gt;Performance references&lt;/A&gt;&amp;nbsp; are used to determine if the current cluster has met the requirement or not. It also allows margins to be specified in either direction.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;reference = { 'tst:hbv4': { 'write_bandwidth_mib': (500, -0.05, 0.1, 'MiB/s'), 'read_bandwidth_mib': (350, -0.05, 0.5, 'MiB/s'), } }&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
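&lt;P&gt;As a rough illustration of how these margins translate into absolute limits, the sketch below converts each reference tuple into the range of values that would pass. The helper reference_bounds is hypothetical and only mirrors the fractional-threshold arithmetic for positive reference values; ReFrame HPC performs the actual check internally.&lt;/P&gt;

```python
def reference_bounds(ref, lower, upper):
    """Translate (value, lower, upper) fractional thresholds into absolute limits.

    A threshold of None is treated as unbounded on that side.
    """
    low = ref * (1 + lower) if lower is not None else float('-inf')
    high = ref * (1 + upper) if upper is not None else float('inf')
    return low, high


references = {
    'write_bandwidth_mib': (500, -0.05, 0.1, 'MiB/s'),
    'read_bandwidth_mib': (350, -0.05, 0.5, 'MiB/s'),
}

for name, (ref, lower, upper, unit) in references.items():
    low, high = reference_bounds(ref, lower, upper)
    print(f'{name}: passes between {low:.1f} and {high:.1f} {unit}')
```

&lt;P&gt;With the values from the test, a write bandwidth between 475 and 550 MiB/s and a read bandwidth between 332.5 and 525 MiB/s would be within the reference.&lt;/P&gt;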
&lt;H2&gt;ReFrame HPC Configuration&lt;/H2&gt;
&lt;P&gt;The ReFrame HPC configuration is key to determining how and where the test will run. It is also where the logic that allows ReFrame HPC to use Azure for centralized logging is defined. The full configuration file is extensive and is covered in detail in the&amp;nbsp;&lt;A class="lia-external-url" href="https://reframe-hpc.readthedocs.io/en/stable/tutorial.html#systems-and-environments" target="_blank" rel="noopener"&gt;ReFrame HPC documentation&lt;/A&gt;. For the purpose of this test, an example can be found &lt;A class="lia-external-url" href="https://github.com/JimPaine/reframe-azure-perflog-handler/blob/main/config.py" target="_blank" rel="noopener"&gt;on GitHub&lt;/A&gt;. Below is a breakdown of the key parts that allow ReFrame HPC to push its results into Azure Log Analytics.&lt;/P&gt;
&lt;H3&gt;Logging Handler&lt;/H3&gt;
&lt;P&gt;The most important part of this configuration is the logging section; without it, ReFrame HPC will not attempt to log the results. An entry of type httpjson is added to handlers_perflog so that the logs are sent to an HTTP endpoint, using the specific values covered below.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;'logging': [
    {
        'perflog_multiline': True,
        'handlers_perflog': [
            {
                'type': 'httpjson',
                'url': 'REDACTED',
                'level': 'info',
                'debug': False,
                # The token and timestamp are evaluated once, when the configuration is loaded
                'extra_headers': {'Authorization': f'Bearer {_get_token()}'},
                'extras': {
                    'TimeGenerated': f'{datetime.now(timezone.utc).isoformat()}',
                    'facility': 'reframe',
                    'reframe_azure_data_version': '1.0',
                },
                'ignore_keys': ['check_perfvalues'],
                'json_formatter': _format_record
            }
        ]
    }
]&lt;/LI-CODE&gt;
&lt;H3&gt;Multiline Perflog&lt;/H3&gt;
&lt;P&gt;To ensure this works with Azure, enable &lt;A class="lia-external-url" href="https://reframe-hpc.readthedocs.io/en/stable/config_reference.html#config.logging.perflog_multiline" target="_blank" rel="noopener"&gt;perflog_multiline&lt;/A&gt;. This will ensure a single record per metric is sent to Log Analytics. This is the cleanest way to output the results. Having this set to False will move the metric names into column names, which means that the schema will be different for each test and will become hard to maintain.&lt;/P&gt;
&lt;H3&gt;Extra Headers&lt;/H3&gt;
&lt;P&gt;A bearer token is required to authenticate the request. ReFrame HPC allows headers to be added via the extra_headers property, so a simple Python function can obtain a scoped token and place it in the Authorization header.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from azure.identity import DefaultAzureCredential


def _get_token(scope='https://monitor.azure.com/.default') -&amp;gt; str:
    credential = DefaultAzureCredential()
    token = credential.get_token(scope)
    return token.token&lt;/LI-CODE&gt;
&lt;H3&gt;URL Structure&lt;/H3&gt;
&lt;P&gt;The URL can be found in the output of the &lt;A class="lia-external-url" href="https://github.com/JimPaine/reframe-azure-perflog-handler/blob/main/main.bicep" target="_blank" rel="noopener"&gt;Bicep template&lt;/A&gt; that was deployed previously. It can also be obtained via the portal. Here is the structure of the URL for reference.&lt;/P&gt;
&lt;LI-CODE lang=""&gt;'${dce.properties.logsIngestion.endpoint}/dataCollectionRules/${dcr.properties.immutableId}/streams/Custom-${table.name}?api-version=2023-01-01'&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
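&lt;P&gt;If the URL ever needs to be assembled by hand, the same structure can be sketched in Python. The function name and every value passed to it below are placeholders for illustration, not real resource identifiers; the three parts come from the data collection endpoint, the data collection rule and the custom table.&lt;/P&gt;

```python
def build_ingestion_url(endpoint, dcr_immutable_id, table_name,
                        api_version='2023-01-01'):
    """Assemble the ingestion URL from its three Azure-provided parts."""
    base = endpoint.rstrip('/')
    return (f'{base}/dataCollectionRules/{dcr_immutable_id}'
            f'/streams/Custom-{table_name}?api-version={api_version}')


# Placeholder values only; substitute the outputs of your own deployment.
url = build_ingestion_url(
    endpoint='https://example-dce.ingest.monitor.azure.com/',
    dcr_immutable_id='dcr-00000000000000000000000000000000',
    table_name='example_CL',
)
print(url)
```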
&lt;H3&gt;JSON Formatter&lt;/H3&gt;
&lt;P&gt;A small workaround is needed because the Data Collection Rule expects an array of items, while ReFrame HPC outputs a single record. To resolve this, another Python function can be used that simply wraps the record in an array. In this example it also tidies up the record by removing items that are not required and would cause issues with the JSON serialization.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json


def _format_record(record, extras, ignore_keys):
    data = {}
    for attr, val in record.__dict__.items():
        # Skip ignored keys and private attributes of the log record
        if attr in ignore_keys or attr.startswith('_'):
            continue
        data[attr] = val
    data.update(extras)
    # Wrap the single record in a list, as the Data Collection Rule expects an array
    return json.dumps([data])&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
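&lt;P&gt;To make the effect of the formatter concrete, the sketch below calls it with a stand-in record. The SimpleNamespace object and the attribute values are illustrative assumptions, not a real ReFrame HPC log record, which carries many more fields.&lt;/P&gt;

```python
import json
from types import SimpleNamespace


def _format_record(record, extras, ignore_keys):
    data = {}
    for attr, val in record.__dict__.items():
        if attr in ignore_keys or attr.startswith('_'):
            continue
        data[attr] = val
    data.update(extras)
    return json.dumps([data])


# Stand-in for a log record; the attribute values are made up for illustration.
record = SimpleNamespace(
    check_name='SimplePerfTest',
    check_perf_var='write_bandwidth_mib',
    check_perf_value=505.2,
    check_perfvalues={'dropped': 'listed in ignore_keys'},
    _internal='dropped because it starts with an underscore',
)

payload = _format_record(
    record,
    extras={'facility': 'reframe'},
    ignore_keys=['check_perfvalues'],
)
print(payload)
```

&lt;P&gt;The printed payload is a JSON array containing one object, with the ignored and private attributes removed and the extras merged in.&lt;/P&gt;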
&lt;H2&gt;Running the Test&lt;/H2&gt;
&lt;P&gt;Now that the infrastructure has been deployed, the test has been defined and is correctly configured, we can run the test.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Start by logging in. Here I am using the managed identity of the node, but User auth and User Assigned Managed Identities are also supported.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;$ az login --identity&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ReFrame HPC can be installed via Spack or Python. While I am using Spack for packages on the cluster, I find the simplest approach is to activate a Python environment and install ReFrame HPC along with the test-specific Python dependencies.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;$ python3 -m venv .venv
$ . .venv/bin/activate
$ python -m pip install -U pip
$ pip install -r requirements.txt&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now, using the ReFrame HPC CLI, the test can be run with the configuration file and the test file.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;$ reframe -C config.py -c simple_perf.py --performance-report -r&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ReFrame HPC will now run the test against the system/cluster defined in the configuration. For this example it is a Slurm cluster with a partition of HBv4 nodes, and running squeue confirms that.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;$ squeue
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
  955      hbv4 rfm_Simp jim.pain  R  0:28     1 tst4-hbv4-97&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;Results&lt;/H3&gt;
&lt;P&gt;And there we have it, results are now appearing in Azure! From here we can use KQL to query and filter the results. The view below shows only a subset of the available values; the full dataset includes a wide range of fields that are helpful for analysis.&lt;/P&gt;
&lt;img /&gt;
&lt;H3&gt;Summary&lt;/H3&gt;
&lt;P&gt;By standardizing on the combination of ReFrame HPC and Azure Log Analytics for testing and reporting performance data across your clusters, whether Slurm-based, Azure CycleCloud or existing on-prem clusters, you can gain a level of visibility into, and confidence in, the systems you manage and the codes you deploy that was previously hard to obtain. This enables:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;🔎Fast cross-cluster comparisons&lt;/LI&gt;
&lt;LI&gt;📈Trend analysis over long running periods&lt;/LI&gt;
&lt;LI&gt;📊Standardized metrics regardless of scheduler or system&lt;/LI&gt;
&lt;LI&gt;☁️Unified monitoring and reporting across clusters&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;ReFrame HPC is suitable for a wide range of testing, so if testing is something you have been looking to implement, take a look at&amp;nbsp;&lt;A class="lia-external-url" href="https://reframe-hpc.readthedocs.io/en/stable/index.html" target="_blank" rel="noopener"&gt;ReFrame HPC&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&amp;nbsp;&lt;/DIV&gt;</description>
      <pubDate>Fri, 06 Feb 2026 09:37:24 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/centralized-cluster-performance-metrics-with-reframe-hpc-and/ba-p/4488077</guid>
      <dc:creator>jimpaine</dc:creator>
      <dc:date>2026-02-06T09:37:24Z</dc:date>
    </item>
    <item>
      <title>Scaling physics-based digital twins: Neural Concept on Azure delivers a New Record in Industrial AI</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/scaling-physics-based-digital-twins-neural-concept-on-azure/ba-p/4483403</link>
      <description>&lt;H2&gt;Automotive Design and the DrivAerNet++ Benchmark&lt;/H2&gt;
&lt;P&gt;In automotive design, external aerodynamics have a direct impact on performance, energy efficiency, and development cost. Even small reductions in drag can translate into significant fuel savings or extended EV range. As development timelines accelerate, engineering teams increasingly rely on data-driven methods to augment or replace traditional CFD workflows.&lt;/P&gt;
&lt;P&gt;MIT’s DrivAerNet++ dataset is the largest open multimodal dataset for automotive aerodynamics, offering a large-scale benchmark for evaluating learning-based approaches that capture the physical signals required by engineers. It includes 8,000 vehicle geometries across 3 variants (fastback, notchback and estate-back) and aggregates 39 TB of high-fidelity CFD outputs such as surface pressure, wall shear stress, volumetric flow fields, and drag coefficients.&lt;/P&gt;
&lt;H2&gt;Benchmark Highlights&lt;/H2&gt;
&lt;P&gt;Neural Concept trained its geometry-native Geometric Regressor, designed to handle any type of engineering data. The benchmark was executed on Azure HPC infrastructure to evaluate the capabilities of the geometry-native platform under transparent, scalable, and fully reproducible conditions.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Surface pressure:&lt;/STRONG&gt; Lowest prediction error recorded on the benchmark, revealing where high- and low-pressure zones form.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Wall shear stress:&lt;/STRONG&gt; Outperforming all competing methods to detect flow attachment and separation for drag and stability control.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Volumetric velocity field:&lt;/STRONG&gt; More than 50% lower error than previous best, capturing full flow structure for wake stability analysis.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Drag coefficient Cd:&lt;/STRONG&gt; R² of 0.978 on the test set, accurate enough for early design screening without full CFD runs.&amp;nbsp;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Dataset Scale and Ingestion: &lt;/STRONG&gt;39 TB of data was ingested into Neural Concept’s platform through a parallel conversion task with 128 workers and 5 GB RAM each that finished in about 1 hour and produced a compact 3 TB dataset in the platform’s native format.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Data Pre-Processing: &lt;/STRONG&gt;Pre-processing the dataset required both large-scale parallelization and the application of our domain-specific best practices for handling external aerodynamics workflows.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Model Training and Deployment: &lt;/STRONG&gt;Training completed in 24 hours on 4 A100 GPUs, with the best model obtained after 16 hours. The final model is compact and real-time predictions can be served on a single 16 GB GPU for industrial use.&lt;/LI&gt;
&lt;/UL&gt;
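&lt;P&gt;The ingestion figures above imply a data rate that can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation only: it assumes decimal terabytes and treats "about 1 hour" as exactly 3,600 seconds, neither of which is stated in the benchmark.&lt;/P&gt;

```python
TB = 10**12  # assumption: decimal terabytes (the article does not say TB or TiB)

raw_bytes = 39 * TB        # dataset size before conversion
converted_bytes = 3 * TB   # size in the platform's native format
workers = 128
ram_per_worker_gb = 5
duration_s = 3600          # assumption: "about 1 hour" taken as exactly one hour

aggregate_gb_per_s = raw_bytes / duration_s / 10**9
per_worker_mb_per_s = raw_bytes / workers / duration_s / 10**6
size_reduction = raw_bytes / converted_bytes
total_ram_gb = workers * ram_per_worker_gb

print(f'Aggregate read rate: {aggregate_gb_per_s:.1f} GB/s')
print(f'Per-worker read rate: {per_worker_mb_per_s:.1f} MB/s')
print(f'Size reduction: {size_reduction:.0f}x')
print(f'Total worker RAM: {total_ram_gb} GB')
```

&lt;P&gt;Under these assumptions the conversion sustained roughly 10.8 GB/s in aggregate, or about 85 MB/s per worker, and reduced the dataset to one thirteenth of its original size.&lt;/P&gt;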
&lt;P&gt;Neural Concept outperformed all other competing methods, achieving state-of-the-art prediction performance on all metrics and physical quantities within a week: &lt;BR /&gt;&lt;BR /&gt;“Neural Concept’s breakthrough demonstrates the power of combining advanced AI with the scalability of Microsoft Azure,”&lt;STRONG&gt; said Jack Kabat, Partner, Azure HPC and AI Infrastructure Products, Microsoft&lt;/STRONG&gt;. “By running training and deployment on Azure’s high-performance infrastructure, specifically the &lt;STRONG&gt;NC A100 Virtual Machine&lt;/STRONG&gt;, Neural Concept was able to transform 39 terabytes of data into a production-ready workflow in just &lt;STRONG&gt;one week&lt;/STRONG&gt;. This shows how Azure accelerates innovation and helps automotive manufacturers bring better products to market faster.”&lt;/P&gt;
&lt;P&gt;For additional benchmark metrics and comparisons, please refer to the &lt;EM&gt;Detailed Quantitative Results&lt;/EM&gt; section at the end of the article.&lt;/P&gt;
&lt;H2&gt;From State-Of-The-Art Benchmark Accuracy to Proven Industrial Impact&lt;/H2&gt;
&lt;P&gt;Model accuracy is necessary but not sufficient for industrial impact. Transformative gains at scale and over time are only revealed once high-performing models are deployed into maintainable and repeatable workflows across organizations.&lt;/P&gt;
&lt;P&gt;Customers using Neural Concept’s platform have achieved:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;30% shorter design cycles&lt;/LI&gt;
&lt;LI&gt;$20M in savings on a 100,000-unit vehicle program&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;These outcomes fundamentally result from a transformed, systematic approach to design, unlocking better and faster data-driven decisions. The Design Lab interface, described in the next section, is at the core of this transformation.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Within Neural Concept’s ecosystem, validated geometry and physics models can be deployed directly into the Design Lab - a collaborative environment where aerodynamicists and designers evaluate concepts in real time. AI copilots provide instant performance feedback, geometry-aware improvement suggestions, and live KPI updates, effectively reconnecting aerodynamic analysis with the pace of modern vehicle design.&lt;/P&gt;
&lt;H2&gt;CES 2026: See how OEMs are transforming product development with Engineering Intelligence&lt;/H2&gt;
&lt;P&gt;Neural Concept and Microsoft will showcase how AI-native aerodynamic workflows can reshape vehicle development, from real-time design exploration to enterprise-scale deployment. Visit the Microsoft booth to see DrivAerNet++ running on Azure HPC and meet the teams shaping the future of automotive engineering.&lt;BR /&gt;&lt;BR /&gt;&lt;A class="lia-external-url" href="https://www.microsoft.com/en-us/industry/blog/manufacturing-and-mobility/2026/01/07/ces-2026-powering-the-next-frontier-in-automotive/?msockid=164e8be6b4616e2132909e52b5496f4f" target="_blank" rel="noopener"&gt;Visit the Microsoft booth to find out more&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;Neural Concept’s executive team will also be at CES to share flagship results achieved by leading OEMs and Tier-1 suppliers already using the platform in production. Learn more at &lt;A href="https://www.neuralconcept.com/ces-2026" target="_blank" rel="noopener"&gt;https://www.neuralconcept.com/ces-2026&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;&lt;STRONG&gt;Credits&lt;/STRONG&gt;&lt;BR /&gt;Microsoft: Hugo Meiland (Principal Program Manager), Guy Bursell (Director Business Strategy, Manufacturing), Fernando Aznar Cornejo (Product Marketing Manager) and Dr. Lukasz Miroslaw (Sr. Industry Advisor)&lt;BR /&gt;&lt;BR /&gt;Neural Concept: Theophile Allard (CTO), Benoit Guillard (Senior ML Research Scientist), Alexander Gorgin (Product Marketing Engineer), Konstantinos Samaras-Tsakiris (Software Engineer)&lt;/P&gt;
&lt;H2&gt;Detailed Quantitative Results&lt;/H2&gt;
&lt;P&gt;In the sections that follow, we share the results obtained by applying Neural Concept’s aerodynamics predictive model training template to DrivAerNet++.&lt;BR /&gt;&lt;BR /&gt;We evaluated our model’s prediction errors using the official train/test split and the standard evaluation strategy. For comparison, metrics from other methods were taken from the &lt;A href="https://drivaernet-leaderboard.lovable.app/" target="_blank" rel="noopener"&gt;public leaderboard&lt;/A&gt;. We report both Mean Squared Error (MSE) and Mean Absolute Error (MAE) to quantify prediction accuracy. Lower values for either metric indicate closer agreement with the ground truth simulations, meaning better predictions.&lt;/P&gt;
&lt;H3&gt;1. Surface Field Predictions: Pressure and Wall Shear Stress&lt;/H3&gt;
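&lt;P&gt;For readers less familiar with the two metrics, here is a minimal sketch of how MSE and MAE are computed. The sample values are invented for illustration and are unrelated to the benchmark; the benchmark's own normalization and evaluation protocol are not reproduced here.&lt;/P&gt;

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences; penalizes large errors more heavily."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mean_absolute_error(predictions, targets):
    """Average of the absolute differences; all errors weigh proportionally."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Invented values for illustration only.
ground_truth = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.6]

print(f'MSE: {mean_squared_error(predicted, ground_truth):.3f}')
print(f'MAE: {mean_absolute_error(predicted, ground_truth):.3f}')
```

&lt;P&gt;Because MSE squares each difference, the single largest error in the sample dominates it, whereas MAE treats every error in proportion to its size. Reporting both gives a fuller picture of prediction quality.&lt;/P&gt;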
&lt;P&gt;We began by evaluating predictions for the two physical quantities defined on the vehicle surface.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Surface Pressure&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The Geometric Regressor achieved substantially better performance than all existing methods in predicting surface pressure distribution.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rank&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Deep Learning Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MSE (*10&lt;SUP&gt;-2&lt;/SUP&gt;, lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MAE (*10&lt;SUP&gt;-1&lt;/SUP&gt;, lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#1&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Neural Concept&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;3.98&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;1.08&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;GAOT (May 2025)&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;4.94&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;1.10&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;FIGConvNet (February 2025)&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;4.99&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;1.22&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#4&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;TripNet &lt;BR /&gt;(March 2025)&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;5.14&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;1.25&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;RegDGCNN&lt;/P&gt;
&lt;P&gt;(June 2024)&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;8.29&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;1.61&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;EM&gt;Table 1: Neural Concept’s Geometric Regressor predicts &lt;STRONG&gt;surface pressure&lt;/STRONG&gt; more accurately than previously published state-of-the-art methods. The dates indicate when the competing model architectures were published.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;EM&gt;Figure 1: Side-by-side comparison of the ground truth pressure field (left), Neural Concept model’s prediction (middle), and the corresponding error for a representative test sample (right).&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Wall Shear Stress&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Similarly, the model delivered top-tier results, outperforming all competing methods.&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rank&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Deep Learning Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MSE (*10&lt;SUP&gt;-2&lt;/SUP&gt;, lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MAE (*10&lt;SUP&gt;-1&lt;/SUP&gt;, lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#1&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Neural Concept&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;7.80&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;1.44&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;#2&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://arxiv.org/abs/2505.18781v2" target="_blank" rel="noopener"&gt;&lt;EM&gt;GAOT&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; (May 2025)&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;8.74&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;1.57&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;#3&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://arxiv.org/abs/2503.17400" target="_blank" rel="noopener"&gt;&lt;EM&gt;TripNet&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; (March 2025)&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;9.52&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;2.15&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;#4&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://arxiv.org/abs/2502.04317" target="_blank" rel="noopener"&gt;&lt;EM&gt;FIGConvNet&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;(Feb. 2025)&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;9.86&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;2.22&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;#5&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://arxiv.org/abs/2406.09624" target="_blank" rel="noopener"&gt;&lt;EM&gt;RegDGCNN&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; &lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;(June 2024)&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;13.82&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;3.64&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;EM&gt;Table 2: Neural Concept’s Geometric Regressor predicts &lt;STRONG&gt;wall shear stress&lt;/STRONG&gt; more accurately than previously published state-of-the-art methods.&lt;/EM&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="0" style="width: 100%; border-width: 0px;"&gt;&lt;colgroup&gt;&lt;col style="width: 33.3333%" /&gt;&lt;col style="width: 33.3333%" /&gt;&lt;col style="width: 33.3333%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 2: Side-by-side comparison of the ground truth &lt;STRONG&gt;magnitude of the wall shear stress&lt;/STRONG&gt;, Neural Concept model’s prediction, and the corresponding error for a representative test sample.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Across both surface fields (pressure and wall shear stress), the Geometric Regressor achieved the lowest MSE and MAE. The baseline methods represent several high-quality, recent academic works (the earliest being from June 2024), yet our architecture established a new state of the art in predictive performance.&lt;/P&gt;
&lt;H3&gt;2. Volumetric Predictions: Velocity&lt;/H3&gt;
&lt;P&gt;Beyond surface quantities, DrivAerNet++ provides 3D velocity fields in the flow volume surrounding the vehicle, which we also predicted using the Geometric Regressor.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rank&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Deep Learning Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MSE (lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MAE (*10&lt;SUP&gt;-1&lt;/SUP&gt;, lower = better)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#1&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Neural Concept&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;3.11&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;9.22&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;EM&gt;#2&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;A href="https://arxiv.org/abs/2503.17400" target="_blank" rel="noopener"&gt;&lt;EM&gt;TripNet&lt;/EM&gt;&lt;/A&gt;&lt;EM&gt; (March 2025)&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;6.71&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;EM&gt;15.2&lt;/EM&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;EM&gt;Table 3: Neural Concept’s Geometric Regressor predicts velocity more accurately than the previously published state-of-the-art method.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The illustration below shows the velocity magnitude for two test samples. Note that only a single 2D slice of the 3D volumetric domain is shown here, focusing on the wake region behind the car. In practice, the network predicts velocity at any location within the full 3D domain, not just on this slice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="0" style="width: 100%; border-width: 0px;"&gt;&lt;colgroup&gt;&lt;col style="width: 50%" /&gt;&lt;col style="width: 50%" /&gt;&lt;/colgroup&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;td style="border-width: 0px;"&gt;&lt;img /&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Figure 3: &lt;STRONG&gt;Velocity magnitude&lt;/STRONG&gt; for two test samples, arranged in two columns (left and right). For each sample, the top row displays the simulated velocity field, the middle row shows the prediction from the network, and the bottom row presents the error between the two.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;3. Scalar Predictions: Drag Coefficient&lt;/H3&gt;
&lt;P&gt;The drag coefficient (Cd) is the most critical parameter in automotive aerodynamics, as reducing it directly translates to lower fuel consumption in combustion vehicles and increased range in electric vehicles. Using the same underlying architecture, our model achieved state-of-the-art performance in Cd prediction.&lt;/P&gt;
&lt;P&gt;In addition to MSE and MAE, we reported the Maximum Absolute Error (Max AE) to reflect worst-case accuracy. We also included the Coefficient of Determination (R² score), which measures the proportion of variance explained by the model. An R² value of 1 indicates a perfect fit to the target data.&lt;/P&gt;
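&lt;P&gt;To make the four metrics concrete, here is a minimal sketch of how they can be computed from paired CFD and predicted values. The function name and the sample Cd values are hypothetical and invented for illustration only; they are not taken from the benchmark.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, Max AE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n            # mean squared error
    mae = sum(abs(e) for e in errors) / n           # mean absolute error
    max_ae = max(abs(e) for e in errors)            # worst-case absolute error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot                      # share of variance explained
    return {"mse": mse, "mae": mae, "max_ae": max_ae, "r2": r2}


# Hypothetical Cd values, invented for illustration only.
cd_cfd = [0.250, 0.275, 0.300, 0.325]
cd_pred = [0.252, 0.274, 0.297, 0.326]
metrics = regression_metrics(cd_cfd, cd_pred)
print(metrics)&lt;/LI-CODE&gt;
&lt;P&gt;Feeding the same list in as both arguments gives an R² of exactly 1, the perfect-fit case mentioned above.&lt;/P&gt;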
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 60.6481%; border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Rank&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Deep Learning Model&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MSE (*1e-5)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;MAE (*1e-3)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Max AE (*1e-2)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;R²&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;#1&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Neural Concept&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;0.8&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;2.22&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;1.13&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;&lt;STRONG&gt;0.978&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#2&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;TripNet&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;9.1&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;7.19&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;7.70&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;0.957&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#3&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;PointNet&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;14.9&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;9.60&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;12.45&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;0.643&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#4&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;RegDGCNN&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;14.2&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;9.31&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;12.79&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;0.641&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;#5&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;GCNN&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;17.1&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;10.43&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;15.03&lt;/P&gt;
&lt;/td&gt;&lt;td class="lia-align-center"&gt;
&lt;P&gt;0.596&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;On the official split, the model shows tight agreement with CFD (R² of 0.978) across the test set, which is sufficient for early design screening where engineers need to rank variants confidently and spot meaningful gains without running full simulations for every change.&lt;/P&gt;
&lt;H3&gt;4. Compute Efficiency and Azure HPC &amp;amp; AI Collaboration&lt;/H3&gt;
&lt;P&gt;Running the full DrivAerNet++ benchmark at industrial scale required Neural Concept’s full software and infrastructure stack, integrated with Microsoft Azure to scale computing resources on demand. The entire pipeline runs natively on Azure and can scale within minutes, allowing us to process new industrial datasets containing thousands of geometries without complex capacity planning.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Dataset Scale and Ingestion&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The DrivAerNet++ dataset contains 8,000 car designs along with their corresponding CFD simulations. The raw dataset occupies approximately 39 TB of storage. MIT’s DeCoDE Lab spent about 3 million CPU hours in total generating the simulations.&lt;/P&gt;
&lt;P&gt;Ingestion into Neural Concept’s platform is the first step of the pipeline.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;A Conversion task transforms the raw files into the platform’s optimized native format.&lt;/LI&gt;
&lt;LI&gt;This task was parallelized across 128 workers, each allocated 5 GB of RAM.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;As a result, the entire conversion process completed in approximately one hour. After converting the relevant data (car geometry, wall shear stress, pressure, and velocity), the full dataset occupies approximately 3 TB in Neural Concept’s native format.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Data Pre-Processing&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Pre-processing the dataset required both large-scale parallelization and the application of our domain-specific best practices. During this phase, workloads were distributed across multiple compute nodes, with peak memory usage reaching approximately 1.5 TB of RAM.&lt;/P&gt;
&lt;P&gt;The pre-processing pipeline consists of two main stages. In the first stage, we repaired the car meshes and pre-computed geometric features needed for training. The second stage involved filtering the volumetric domain and re-sampling points to follow a spatial distribution that is more efficient for training our deep learning model.&lt;/P&gt;
&lt;P&gt;We scaled the compute resources so that each of the two stages in the pipeline completes in 1 to 3 hours when processing the full dataset. The first stage is the most computationally intensive. To handle it efficiently, we parallelized the task across 256 independent workers, each allocated 6 GB of RAM.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
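&lt;P&gt;As a quick sanity check on these figures, the aggregate memory requested by each parallel task is simply workers times RAM per worker. The sketch below assumes that all workers run concurrently and that 1 TB is counted as 1024 GB; both are assumptions rather than details stated in the pipeline description, and the function name is hypothetical.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def total_ram_gb(workers, ram_per_worker_gb):
    """Aggregate RAM requested if every worker runs at the same time."""
    return workers * ram_per_worker_gb


conversion_gb = total_ram_gb(128, 5)   # ingestion: Conversion task
stage_one_gb = total_ram_gb(256, 6)    # first pre-processing stage

print(f"Conversion: {conversion_gb} GB")
print(f"Pre-processing stage 1: {stage_one_gb} GB = {stage_one_gb / 1024:.2f} TB")&lt;/LI-CODE&gt;
&lt;P&gt;Under those assumptions the first pre-processing stage requests 1,536 GB in total, which is in line with the peak of roughly 1.5 TB quoted above.&lt;/P&gt;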
&lt;P&gt;&lt;STRONG&gt;Model Training and Deployment&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;While we use state-of-the-art hardware for training, our performance gains come primarily from model design. Once trained, the model remains lightweight and cost-effective to run.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Training was performed on an Azure Standard_NC96ads_A100_v4 node, which provides four A100 GPUs, each with 80 GB of memory.&lt;/LI&gt;
&lt;LI&gt;The model was trained for approximately &lt;STRONG&gt;24 hours&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Neural Concept’s Geometric Regressor achieved the best reported performance on the official benchmark for surface pressure, wall shear stress, volumetric velocity, and drag prediction.&lt;/P&gt;</description>
      <pubDate>Mon, 12 Jan 2026 12:10:23 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/scaling-physics-based-digital-twins-neural-concept-on-azure/ba-p/4483403</guid>
      <dc:creator>lmiroslaw</dc:creator>
      <dc:date>2026-01-12T12:10:23Z</dc:date>
    </item>
    <item>
      <title>mpi-stage: High-Performance File Distribution for HPC Clusters</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/mpi-stage-high-performance-file-distribution-for-hpc-clusters/ba-p/4484366</link>
      <description>&lt;P data-line="6"&gt;When running containerized workloads on HPC clusters, one of the first problems you hit is getting container images onto the nodes quickly and repeatably. A&amp;nbsp;.sqsh&amp;nbsp;is a Squashfs image (commonly used by container runtimes on HPC). In some environments you&amp;nbsp;&lt;EM&gt;can&lt;/EM&gt;&amp;nbsp;run a Squashfs image directly from shared storage, but at scale that often turns the shared filesystem into a hot spot.&lt;/P&gt;
&lt;P data-line="8"&gt;Copying the image to local NVMe keeps startup time predictable and avoids hundreds of nodes hammering the same source during job launch.&lt;/P&gt;
&lt;P data-line="10"&gt;In this post, I'll introduce&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt;, a lightweight tool that uses MPI broadcasts to distribute large files across cluster nodes at speeds that can saturate the backend network.&lt;/P&gt;
&lt;H2 data-line="12"&gt;The Problem: Staging Files at Scale&lt;/H2&gt;
&lt;P data-line="14"&gt;On an&amp;nbsp;&lt;A href="https://github.com/Azure/ai-infrastructure-on-azure/tree/main/infrastructure_references/azure_cyclecloud_workspace_for_slurm" target="_blank" rel="noopener" data-href="https://github.com/Azure/ai-infrastructure-on-azure/tree/main/infrastructure_references/azure_cyclecloud_workspace_for_slurm"&gt;Azure CycleCloud Workspace for Slurm&lt;/A&gt;&amp;nbsp;cluster with&amp;nbsp;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic" target="_blank" rel="noopener" data-href="https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-gb300-v6-series?tabs=sizebasic"&gt;GB300&lt;/A&gt;&amp;nbsp;GPU nodes, I needed to stage a large Squashfs container image from shared storage onto each node's local NVMe storage before launching training jobs.&lt;/P&gt;
&lt;P data-line="16"&gt;At small scale you can often get away with ad-hoc copies, but once hundreds of nodes are all trying to read the same source file, the shared source filesystem quickly becomes the bottleneck.&lt;/P&gt;
&lt;P data-line="18"&gt;I tried several approaches:&lt;/P&gt;
&lt;H3 data-line="20"&gt;Attempt 1: Slurm's sbcast&lt;/H3&gt;
&lt;P data-line="22"&gt;Slurm's built-in&amp;nbsp;&lt;A href="https://slurm.schedmd.com/sbcast.html" target="_blank" rel="noopener" data-href="https://slurm.schedmd.com/sbcast.html"&gt;sbcast&lt;/A&gt;&amp;nbsp;seemed like the natural choice. In my quick testing it was slower than I wanted, and the overwrite/skip-existing behavior didn't match the "fast no-op if already present" workflow I was after. I didn't spend much time exploring all the configuration options before moving on.&lt;/P&gt;
&lt;H3 data-line="24"&gt;Attempt 2: Shell Script Fan-Out&lt;/H3&gt;
&lt;P data-line="26"&gt;I wrote a shell script using a tree-based fan-out approach: copy to N nodes, then each of those copies to N more, and so on. This worked and scaled reasonably, but had some drawbacks:&lt;/P&gt;
&lt;OL data-line="28"&gt;
&lt;LI data-line="28"&gt;&lt;STRONG&gt;Multiple stages&lt;/STRONG&gt;: The script required orchestrating multiple rounds of copy commands, adding complexity&lt;/LI&gt;
&lt;LI data-line="29"&gt;&lt;STRONG&gt;Source filesystem stress&lt;/STRONG&gt;: Even with fan-out, the initial copies still hit the source filesystem simultaneously — a fan-out of 4 meant 4 nodes competing for source bandwidth&lt;/LI&gt;
&lt;LI data-line="30"&gt;&lt;STRONG&gt;Frontend network&lt;/STRONG&gt;: Copies went over the Ethernet network by default — I could have configured IPoIB, but that added more setup&lt;/LI&gt;
&lt;/OL&gt;
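&lt;P&gt;To see why the fan-out script needed several orchestrated rounds, here is a rough sketch of the round count. It assumes a simplified model in which only the nodes that received the file in the previous round copy it onward, each to N new nodes; the function name is hypothetical and the actual script may have scheduled copies differently.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;def fanout_rounds(node_count, fanout):
    """Rounds needed to reach node_count nodes with a tree fan-out."""
    if max(fanout, 0) == 0:
        raise ValueError("fanout must be at least 1")
    covered = 0    # nodes that already hold the file
    frontier = 1   # copiers feeding the next round (round 1: the shared source)
    rounds = 0
    while covered != node_count:
        # Each copier serves `fanout` new nodes, capped by what is left.
        frontier = min(frontier * fanout, node_count - covered)
        covered += frontier
        rounds += 1
    return rounds


print(fanout_rounds(156, 4))&lt;/LI-CODE&gt;
&lt;P&gt;With a fan-out of 4, the first round alone has four nodes reading from the shared source at once, which is the contention described in the second drawback.&lt;/P&gt;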
&lt;H3 data-line="32"&gt;The Solution: MPI Broadcasts&lt;/H3&gt;
&lt;P data-line="34"&gt;The key insight was that MPI's broadcast primitive (MPI_Bcast) is specifically optimized for one-to-many data distribution. Modern MPI implementations like HPC-X use tree-based algorithms that efficiently utilize the high-bandwidth, low-latency InfiniBand network.&lt;/P&gt;
&lt;P data-line="36"&gt;With&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt;:&lt;/P&gt;
&lt;UL data-line="37"&gt;
&lt;LI data-line="37"&gt;&lt;STRONG&gt;Single source read&lt;/STRONG&gt;: Only one node reads from the source filesystem&lt;/LI&gt;
&lt;LI data-line="38"&gt;&lt;STRONG&gt;Backend network utilization&lt;/STRONG&gt;: Data flows over InfiniBand using optimized MPI collectives&lt;/LI&gt;
&lt;LI data-line="39"&gt;&lt;STRONG&gt;Intelligent skipping&lt;/STRONG&gt;: Nodes that already have the file (verified by size or checksum) skip the copy entirely&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="41"&gt;Combined, this keeps the shared source (NFS, Lustre, blobfuse, etc.) from being hammered by many concurrent readers while still taking full advantage of the backend fabric.&lt;/P&gt;
&lt;H2 data-line="43"&gt;How It Works&lt;/H2&gt;
&lt;P data-line="45"&gt;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt; is designed around a simple workflow:&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P data-line="81"&gt;The source node reads the file in chunks and streams each chunk via&amp;nbsp;MPI_Bcast. Destination nodes write each chunk to local storage immediately upon receipt. This streaming approach means the entire file never needs to fit in memory — only a small buffer is required.&lt;/P&gt;
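&lt;P&gt;The streaming idea can be illustrated without MPI. The following is a single-process sketch, not the tool's implementation: a plain loop over destination files stands in for MPI_Bcast followed by each node's local write, and the chunk size, function name, and file names are assumptions chosen for the example.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import os
import tempfile

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical buffer size; the tool's real value may differ


def stream_copy(source_path, dest_paths, chunk_size=CHUNK_SIZE):
    """Copy one source to many destinations, holding one chunk in memory at a time."""
    outputs = [open(path, "wb") for path in dest_paths]
    try:
        with open(source_path, "rb") as source:
            while True:
                chunk = source.read(chunk_size)  # the source side reads one chunk
                if not chunk:
                    break
                for out in outputs:              # stand-in for MPI_Bcast plus local write
                    out.write(chunk)
    finally:
        for out in outputs:
            out.close()


# Demonstration with a small random file and three pretend nodes.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "image.sqsh")
with open(src, "wb") as f:
    f.write(os.urandom(10_000))
dests = [os.path.join(workdir, f"node{i}.sqsh") for i in range(3)]
stream_copy(src, dests, chunk_size=4096)
print([os.path.getsize(d) for d in dests])&lt;/LI-CODE&gt;
&lt;P&gt;Memory use is bounded by the chunk size regardless of how large the source file is, which is the property the paragraph above relies on.&lt;/P&gt;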
&lt;H3 data-line="83"&gt;Key Features&lt;/H3&gt;
&lt;OL&gt;
&lt;LI data-line="85"&gt;&lt;STRONG&gt; Pre-copy Validation&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="87"&gt;Before any data is transferred, each node checks if the destination file already exists and matches the source. You can choose between:&lt;/P&gt;
&lt;UL data-line="89"&gt;
&lt;LI data-line="89"&gt;&lt;STRONG&gt;Size check&lt;/STRONG&gt;&amp;nbsp;(default): Fast comparison of file sizes—sufficient for most use cases&lt;/LI&gt;
&lt;LI data-line="90"&gt;&lt;STRONG&gt;Checksum&lt;/STRONG&gt;: Stronger validation, but requires reading the full file and is therefore slower&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-line="92"&gt;If all nodes already have the correct file,&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt;&amp;nbsp;completes in milliseconds with no data transfer.&lt;/P&gt;
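&lt;P&gt;The trade-off between the two modes can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, SHA-256 is picked arbitrarily because the checksum algorithm is not specified here, and both files are compared locally rather than against metadata shared over MPI.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_required(source, dest, mode="size"):
    """Return True when dest is missing or does not match source."""
    if not os.path.exists(dest):
        return True
    if os.path.getsize(source) != os.path.getsize(dest):
        return True
    if mode == "checksum":
        # Stronger, but both files must be read in full.
        return file_sha256(source) != file_sha256(dest)
    return False


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.sqsh")
stale = os.path.join(workdir, "stale.sqsh")
with open(source, "wb") as handle:
    handle.write(b"A" * 1024)
with open(stale, "wb") as handle:
    handle.write(b"B" * 1024)  # same size, different content

print(copy_required(source, stale, mode="size"))      # size check cannot tell them apart
print(copy_required(source, stale, mode="checksum"))  # checksum detects the difference&lt;/LI-CODE&gt;
&lt;P&gt;The size check only touches file metadata, which is why it is fast; the example also shows the case it cannot catch, a file of the right size with the wrong content.&lt;/P&gt;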
&lt;OL start="2"&gt;
&lt;LI data-line="94"&gt;&lt;STRONG&gt; Double-Buffered Transfers&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="96"&gt;The implementation uses double-buffered, chunked transfers to overlap network communication with disk I/O. While one buffer is being broadcast, the next chunk is being read from the source.&lt;/P&gt;
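&lt;P&gt;One way to picture double buffering is with two reusable buffers passed between a reader and a writer. The sketch below uses Python threads and queues purely for illustration; the function name is hypothetical, the write to an in-memory sink stands in for the broadcast, and the real tool may overlap I/O and communication by other means.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import io
import queue
import threading


def double_buffered_copy(source, sink, chunk_size=4096):
    """Copy source to sink using exactly two reusable buffers."""
    free = queue.Queue()     # buffers ready to be filled
    filled = queue.Queue()   # buffers waiting to be sent
    for _ in range(2):
        free.put(bytearray(chunk_size))

    def reader():
        while True:
            buf = free.get()               # wait for a released buffer
            count = source.readinto(buf)   # read the next chunk while the other is sent
            filled.put((buf, count))
            if count == 0:                 # end of input: signal the writer to stop
                return

    thread = threading.Thread(target=reader)
    thread.start()
    while True:
        buf, count = filled.get()
        if count == 0:
            break
        sink.write(memoryview(buf)[:count])  # stand-in for broadcasting the chunk
        free.put(buf)                        # hand the buffer back to the reader
    thread.join()


payload = bytes(range(256)) * 100
target = io.BytesIO()
double_buffered_copy(io.BytesIO(payload), target)
print(target.getvalue() == payload)&lt;/LI-CODE&gt;
&lt;P&gt;Because only two buffers ever exist, the reader can run at most one chunk ahead of the writer, which keeps memory use fixed while still overlapping the two activities.&lt;/P&gt;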
&lt;OL start="3"&gt;
&lt;LI data-line="98"&gt;&lt;STRONG&gt; Post-copy Validation&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="100"&gt;Optionally verify that all nodes received the file correctly after the copy completes.&lt;/P&gt;
&lt;OL start="4"&gt;
&lt;LI data-line="102"&gt;&lt;STRONG&gt; Single-Writer Per Node&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P data-line="104"&gt;The tool enforces one MPI rank per node to prevent filesystem contention and ensure predictable performance.&lt;/P&gt;
&lt;H2 data-line="106"&gt;Real-World Performance&lt;/H2&gt;
&lt;P data-line="108"&gt;In one run using 156 GPU nodes, distributing a container image achieved approximately&amp;nbsp;&lt;STRONG&gt;3 GB/s effective distribution rate (file_size/time)&lt;/STRONG&gt;, completing in just over 5 seconds:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;[0] Copy required: yes
[0] Starting copy phase (source writes: yes)
[0] Copy complete, Bandwidth: 3007.14 MB/s
[0] Post-validation complete
[0] Timings (s):
  Topology check:    5.22463
  Source metadata:   0.00803746
  Pre-validation:    0.0046786
  Copy phase:        5.21189
  Post-validation:   2.2944e-05
  Total time:        5.2563&lt;/LI-CODE&gt;
&lt;P data-line="124"&gt;Because every node writes the file to its own local NVMe, the&amp;nbsp;&lt;EM&gt;cumulative&lt;/EM&gt;&amp;nbsp;write rate across the cluster is roughly this number times the node count: ~3 GB/s × 156 ≈&amp;nbsp;&lt;STRONG&gt;468 GB/s of total local writes&lt;/STRONG&gt;.&lt;/P&gt;
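&lt;P&gt;The same arithmetic can be redone from the unrounded log values. This sketch assumes the reported bandwidth was measured over the copy phase and that MB and GB are decimal units; neither is confirmed by the log itself.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;copy_seconds = 5.21189   # "Copy phase" from the log above
rate_mb_s = 3007.14      # "Bandwidth" from the log above
nodes = 156

# Effective rate is file_size / time, so the image size follows from the two log values.
file_size_mb = rate_mb_s * copy_seconds
# Every node writes its own copy, so local writes add up across the cluster.
aggregate_gb_s = rate_mb_s * nodes / 1000

print(f"Implied image size: {file_size_mb / 1000:.1f} GB")
print(f"Aggregate local write rate: {aggregate_gb_s:.0f} GB/s")&lt;/LI-CODE&gt;
&lt;P&gt;Under those assumptions the image works out to roughly 15.7 GB, and the aggregate rate to about 469 GB/s, which agrees with the rounded estimate above.&lt;/P&gt;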
&lt;H2 data-line="128"&gt;Workflow: Container Image Distribution&lt;/H2&gt;
&lt;P data-line="130"&gt;The primary use case is distributing Squashfs images to local NVMe before launching containerized workloads. Run&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt; as a job step before your main application:&lt;/P&gt;
&lt;LI-CODE lang="shell"&gt;#!/bin/bash
#SBATCH --job-name=my-training-job
#SBATCH --ntasks-per-node=1
#SBATCH --exclusive

# Stage the container image
srun --mpi=pmix ./mpi_stage \
    --source /shared/images/pytorch.sqsh \
    --dest /nvme/images/pytorch.sqsh \
    --pre-validate size \
    --verbose

# Run the actual job (from local NVMe - much faster!)
srun --container-image=/nvme/images/pytorch.sqsh ...&lt;/LI-CODE&gt;
&lt;P data-line="149"&gt;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt;&amp;nbsp;will create the destination directory if it doesn't exist.&lt;/P&gt;
&lt;P data-line="151"&gt;If your container runtime supports running the image directly from shared storage, you may not strictly need this step—but staging to local NVMe tends to be faster and more predictable at large scale.&lt;/P&gt;
&lt;P data-line="153"&gt;Because of the pre-validation, you can include this step in every job script without penalty—if the image is already present, it completes in milliseconds.&lt;/P&gt;
&lt;H2 data-line="155"&gt;Getting Started&lt;/H2&gt;
&lt;LI-CODE lang="shell"&gt;git clone https://github.com/edwardsp/mpi-stage.git
cd mpi-stage
make&lt;/LI-CODE&gt;
&lt;P data-line="163"&gt;For detailed usage and options, see the&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;README&lt;/A&gt;.&lt;/P&gt;
&lt;H2 data-line="165"&gt;Summary&lt;/H2&gt;
&lt;P data-line="167"&gt;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt;&amp;nbsp;started as a solution to a very specific problem—staging large container images efficiently across a large GPU cluster—but the same pattern may be useful in other scenarios where many nodes need the same large file.&lt;/P&gt;
&lt;P data-line="169"&gt;By using MPI broadcasts, only a single node reads from the source filesystem, while data is distributed over the backend network using optimized collectives. In practice, this can significantly reduce load on shared filesystems and cloud-backed mounts, such as Azure Blob Storage accessed via&amp;nbsp;&lt;A href="https://github.com/Azure/azure-storage-fuse" target="_blank" rel="noopener" data-href="https://github.com/Azure/azure-storage-fuse"&gt;blobfuse2&lt;/A&gt;, where hundreds of concurrent readers can otherwise become a bottleneck.&lt;/P&gt;
&lt;P data-line="171"&gt;While container images were the initial focus, this approach could also be applied to staging training datasets, distributing model checkpoints or pretrained weights, or copying large binaries to local NVMe before a job starts. Anywhere that a “many nodes, same file” pattern exists is a potential fit.&lt;/P&gt;
&lt;P data-line="173"&gt;If you're running large-scale containerized workloads on Azure HPC infrastructure, give it a try. If you use&amp;nbsp;&lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;mpi-stage&lt;/A&gt; in other workflows, I'd love to hear what worked (and what didn't). Feedback and contributions are welcome.&lt;/P&gt;
&lt;P data-line="173"&gt;&lt;EM&gt;Have questions or feedback? Leave a comment below or open an issue on &lt;A href="https://github.com/edwardsp/mpi-stage" target="_blank" rel="noopener" data-href="https://github.com/edwardsp/mpi-stage"&gt;GitHub&lt;/A&gt;.&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 09 Jan 2026 10:24:34 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/mpi-stage-high-performance-file-distribution-for-hpc-clusters/ba-p/4484366</guid>
      <dc:creator>pauledwards</dc:creator>
      <dc:date>2026-01-09T10:24:34Z</dc:date>
    </item>
    <item>
      <title>Azure V710 V5 Series - AMD Radeon GPU - Validation of Siemens CAD - NX</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-v710-v5-series-amd-radeon-gpu-validation-of-siemens-cad-nx/ba-p/4483791</link>
      <description>&lt;H4&gt;&lt;STRONG&gt; Overview of Siemens NX&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Siemens NX is a &lt;STRONG&gt;next-generation integrated CAD/CAM/CAE platform&lt;/STRONG&gt; used by aerospace, automotive, industrial machinery, energy, medical, robotics, and defense manufacturers.&lt;BR /&gt;It spans:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Complex 3D modeling&lt;/LI&gt;
&lt;LI&gt;Assemblies containing thousands to millions of parts&lt;/LI&gt;
&lt;LI&gt;Surfacing and composites&lt;/LI&gt;
&lt;LI&gt;Tolerance engineering&lt;/LI&gt;
&lt;LI&gt;CAM and machining simulation&lt;/LI&gt;
&lt;LI&gt;Integrated multiphysics through Simcenter / NX Nastran&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Because NX is used to design &lt;STRONG&gt;real-world engineered systems&lt;/STRONG&gt; — aircraft structures, automotive platforms, satellites, robotic arms, injection molds — its usability and performance directly affect engineering velocity and product timelines.&lt;/P&gt;
&lt;P&gt;&lt;STRONG style="color: rgb(30, 30, 30); font-size: 24px;"&gt;NX Needs GPU Acceleration&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;NX is highly visual.&lt;BR /&gt;It leans heavily on:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;OpenGL acceleration&lt;/LI&gt;
&lt;LI&gt;Shader-based rendering&lt;/LI&gt;
&lt;LI&gt;Hidden line removal&lt;/LI&gt;
&lt;LI&gt;Real-time shading / material rendering&lt;/LI&gt;
&lt;LI&gt;Ray-Traced Studio for photorealistic output&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI&gt;Switch shading modes → CAD content must stay readable&lt;/LI&gt;
&lt;LI&gt;Zoom, section, annotate → requires stable frame pacing&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;NVads V710 v5-Series on Azure&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;The NVads V710 v5-series virtual machines on Azure are designed for GPU-accelerated workloads and virtual desktop environments. Key highlights:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Hardware Specs:
&lt;UL&gt;
&lt;LI&gt;GPU: AMD Radeon™ Pro V710 (up to 24 GiB frame buffer; fractional GPU options available).&lt;/LI&gt;
&lt;LI&gt;CPU: AMD EPYC™ 9V64 F (Genoa) with SMT, base frequency 3.95 GHz, peak 4.3 GHz.&lt;/LI&gt;
&lt;LI&gt;Memory: 16 GiB to 160 GiB.&lt;/LI&gt;
&lt;LI&gt;Storage: NVMe-based ephemeral local storage supported.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;VM Sizes:
&lt;UL&gt;
&lt;LI&gt;Ranges from Standard_NV4ads_V710_v5 (4 vCPUs, 16 GiB RAM, 1/6 GPU) to Standard_NV28adms_V710_v5 (28 vCPUs, 160 GiB RAM, full GPU).&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;Supported Features:
&lt;UL&gt;
&lt;LI&gt;Premium storage, accelerated networking, ephemeral OS disk.&lt;/LI&gt;
&lt;LI&gt;Both Windows and Linux VMs supported.&lt;/LI&gt;
&lt;LI&gt;No additional GPU licensing is required.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;AMD Radeon™ PRO GPUs offer:
&lt;UL&gt;
&lt;LI&gt;Optimized OpenGL professional driver stack&lt;/LI&gt;
&lt;LI&gt;Stable interactive performance with large assemblies&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;Business Scenarios Enabled by NX + Cloud GPU&lt;/STRONG&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Engineering Anywhere&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-indent-padding-left-60px"&gt;Distributed teams can securely work on the same assemblies from any geographic region.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Supplier Ecosystem Collaboration&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-indent-padding-left-60px"&gt;Tier-1/2 manufacturers and engineering partners can access controlled models without local high-end workstations.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Secure IP Protection&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-indent-padding-left-60px"&gt;Data stays in Azure — files never leave the controlled workspace.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Faster Engineering Cycles&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-indent-padding-left-60px"&gt;Visualization + simulation accelerate design reviews, decision making, and manufacturability evaluations.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Scalable Cost Model&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P class="lia-indent-padding-left-60px"&gt;Pay for compute only when needed — ideal for burst design cycles and testing workloads.&lt;/P&gt;
&lt;P&gt;&lt;STRONG style="color: rgb(30, 30, 30); font-size: 24px;"&gt;Architecture Overview – Siemens NX on Azure NVads_v710&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Key Architecture Elements&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="list-style-type: none;"&gt;
&lt;UL&gt;
&lt;LI&gt;Create an Azure virtual machine (NVads_v710_24)&lt;/LI&gt;
&lt;LI&gt;Install the Azure AMD V710 GPU drivers&lt;/LI&gt;
&lt;LI&gt;Deploy Azure File-based storage to host assemblies, metadata, drawing packages, PMI, and simulation data&lt;/LI&gt;
&lt;LI&gt;Configure a VNet with Accelerated Networking&lt;/LI&gt;
&lt;LI&gt;Install NX licenses and software&lt;/LI&gt;
&lt;LI&gt;Install the NXCP and ATS test suites on the virtual machine&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4&gt;&lt;STRONG&gt;Qualitative Benchmark on Azure NVads_v710_24&lt;/STRONG&gt;&lt;/H4&gt;
&lt;P&gt;Siemens has approved the following qualitative test results. The certification matrix update is currently in progress.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Technical variant:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Complex assemblies with thousands of components maintained smooth rotation, zooming, and selection, even under concurrent session load.&lt;/EM&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;NXCP and ATS test results on NVads_v710_24&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Non-Interactive test results:&lt;/STRONG&gt;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Note: Execution Time (seconds)&lt;/P&gt;
&lt;P&gt;ATS Non‑Interactive Test Results validate the correctness and stability of Siemens NX graphical rendering by comparing generated images against approved reference outputs. The minimal or zero pixel differences confirm deterministic and visually consistent rendering, indicating a stable GPU driver and visualization pipeline. The reported test execution times (in seconds) represent the duration required to complete each automated graphics validation scenario, demonstrating predictable and repeatable processing performance under non‑interactive conditions.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Interactive test results on Azure NVads_v710_24:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;Note: Execution times are reported in seconds.&lt;/P&gt;
&lt;P&gt;ATS Interactive Test Results evaluate Siemens NX graphics behavior during real‑time user interactions such as rotation, zoom, pan, sectioning, and view manipulation. The results demonstrate stable and consistent rendering during interactive workflows, confirming that the GPU driver and visualization stack reliably support user‑driven NX operations.&lt;BR /&gt;The measured execution times (in seconds) reflect the responsiveness of each interactive graphics operation, indicating predictable behavior under live, user‑controlled conditions rather than peak performance tuning.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="2"&gt;
&lt;P&gt;&lt;STRONG&gt;NX CAD functions&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Automatic Tests&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Interactive Tests&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td rowspan="5"&gt;
&lt;P&gt;Grace1 Basic Tests&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_xp64.exe &amp;lt;FILE&amp;gt; Basic_Features.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_xp64.exe &amp;lt;FILE&amp;gt; Fog_Measurement_Clipping.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_xp64.exe &amp;lt;FILE&amp;gt; lighting.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_xp64.exe &amp;lt;FILE&amp;gt; Shadow_Bump_Environment.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_xp64.exe &amp;lt;FILE&amp;gt; Texture_Map.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;Grace2 Graphics Tests&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; GrPlayer_64.exe &amp;lt;FILE&amp;gt; GrACETrace.tgl&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="border-width: 1px;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td colspan="2" rowspan="2"&gt;
&lt;P&gt;&lt;STRONG&gt;NXCP Test Scenarios&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Automatic Tests&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td rowspan="14"&gt;
&lt;P&gt;NXCP Gdat Tests&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_1.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_2.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_4.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_5.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_6.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_7.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_8.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_9.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_10.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_11.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_12.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_13.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_14.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; gdat_leg_xp64.exe -infile &amp;lt;FILE&amp;gt; leg_gfx_cert_15.cgi&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&amp;nbsp;Passed!&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H4&gt;&lt;STRONG&gt;Benefits of Azure NVads_v710 (AMD GPU Platform) for NX&lt;/STRONG&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Workstation-class AMD Radeon PRO graphics drivers baked into Azure&lt;/STRONG&gt;&lt;BR /&gt;Ensures ISV-validated driver pipeline.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Excellent performance for CAD workloads&lt;/STRONG&gt;&lt;BR /&gt;Makes GPU-accelerated NX accessible to wider user bases.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Remote engineering enablement&lt;/STRONG&gt;&lt;BR /&gt;Critical for companies that operate global design teams.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Elastic scale&lt;/STRONG&gt;&lt;BR /&gt;Spin up GPU capacity when development peaks; scale down when idle.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG style="color: rgb(30, 30, 30); font-size: 24px;"&gt;Conclusion:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Siemens NX on Azure NVads_v710 powered by AMD GPUs enables enterprise-class CAD/CAM/CAE experiences in the cloud. NX benefits directly from workstation-grade OpenGL optimization, shading stability, and Ray Traced Studio acceleration, allowing engineers to interact smoothly with large assemblies, run visualization workloads, and perform design reviews without local hardware dependencies.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Right‑sized GPU delivers workstation‑class experience at lower cost&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The family enables fractional GPU allocation (down to 1/6 of a Radeon™ Pro V710), allowing Siemens NX deployments to be right‑sized per user role. This avoids over‑provisioning full GPUs while still delivering ISV‑grade OpenGL and visualization stability, resulting in a lower per‑engineer cost than fixed full‑GPU cloud instances or on‑prem workstations.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Elastic scale improves cost efficiency for burst engineering workloads&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;NVads_V710_v5 instances support &lt;STRONG&gt;on-demand scaling and ephemeral NVMe storage&lt;/STRONG&gt;, allowing NX environments to scale up for design reviews, supplier collaboration, or peak integration cycles and scale down when idle. This consumption model provides a &lt;STRONG&gt;cost advantage over fixed on-prem workstations&lt;/STRONG&gt; that remain underutilized outside peak engineering periods.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;NX visualization pipelines benefit from balanced CPU–GPU architecture&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;The combination of &lt;STRONG&gt;high‑frequency AMD EPYC™ Genoa CPUs (up to 4.3 GHz)&lt;/STRONG&gt; and &lt;STRONG&gt;Radeon™ Pro V710 GPUs&lt;/STRONG&gt; addresses Siemens NX’s mixed CPU–GPU workload profile, where scene graph processing, tessellation, and OpenGL submission are CPU‑sensitive. This balance reduces idle GPU cycles, improving effective utilization and overall cost efficiency when compared with GPU‑heavy but CPU‑constrained configurations.&lt;/P&gt;
&lt;P&gt;The result is a scalable, secure, and cost-efficient engineering platform that supports distributed innovation, supplier collaboration, and digital product development workflows, all backed by the rendering and interaction consistency of AMD GPU virtualization on Azure.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jan 2026 16:38:08 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-v710-v5-series-amd-radeon-gpu-validation-of-siemens-cad-nx/ba-p/4483791</guid>
      <dc:creator>Sunita_AZ0708</dc:creator>
      <dc:date>2026-01-07T16:38:08Z</dc:date>
    </item>
    <item>
      <title>Announcing Azure CycleCloud Workspace for Slurm: Version 2025.12.01 Release</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/announcing-azure-cyclecloud-workspace-for-slurm-version-2025-12/ba-p/4481953</link>
      <description>&lt;P&gt;We are excited to announce the latest release of Azure CycleCloud Workspace for Slurm, now available with the powerful features and enhancements introduced in CycleCloud 8.8.1. This update brings significant improvements to cluster management, monitoring, security, and platform support, empowering technical communities to build and operate scalable HPC environments with greater efficiency and flexibility.&lt;/P&gt;
&lt;H2&gt;Major Feature Updates in CycleCloud Workspace for Slurm 2025.12.01&lt;/H2&gt;
&lt;UL&gt;
&lt;LI&gt;Integrated Monitoring with Prometheus self-agent and managed Grafana&lt;/LI&gt;
&lt;LI&gt;Entra ID Single Sign-On (SSO) for secure and seamless authentication&lt;/LI&gt;
&lt;LI&gt;Support for ARM64 compute nodes&lt;/LI&gt;
&lt;LI&gt;Compatibility with Ubuntu 24.04 and AlmaLinux 9&lt;/LI&gt;
&lt;/UL&gt;
&lt;H2&gt;Enhanced Monitoring: Prometheus Self-Agent and Managed Grafana&lt;/H2&gt;
&lt;P&gt;With CycleCloud 8.8.1, monitoring your Slurm clusters is easier and more powerful than ever. The integration of Prometheus self-agent enables automated collection of metrics from compute nodes and Slurm jobs, providing real-time insights into cluster performance and resource utilization. Coupled with managed Grafana, users can visualize these metrics through customizable dashboards, making it simple to track system health, identify bottlenecks, and optimize workloads. This seamless monitoring solution reduces operational overhead and enhances the reliability of your HPC environment.&lt;/P&gt;
&lt;H3&gt;Create the Managed Monitoring Infrastructure&lt;/H3&gt;
&lt;P&gt;To use this feature, simply set up an Azure Monitor Workspace for Prometheus and an Azure Managed Grafana environment. Follow these steps as outlined here: &lt;A href="https://github.com/Azure/cyclecloud-monitoring?tab=readme-ov-file#build-the-managed-monitoring-infrastructure" target="_blank" rel="noopener"&gt;Azure/cyclecloud-monitoring: Cluster-init project and related tools for adding managed monitoring to a CycleCloud cluster.&lt;/A&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;Create a resource group for the monitoring infrastructure&lt;/LI&gt;
&lt;LI&gt;Deploy with the provided commands&lt;/LI&gt;
&lt;/OL&gt;
&lt;LI-CODE lang="bash"&gt;git clone https://github.com/Azure/cyclecloud-monitoring.git
cd cyclecloud-monitoring
./infra/deploy.sh &amp;lt;monitoring_resource_group&amp;gt;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;After deployment to the specified resource group, you will find an Azure Monitor Workspace called &lt;STRONG&gt;&lt;EM&gt;ccw-mon-xxx&lt;/EM&gt;&lt;/STRONG&gt; and an Azure Managed Grafana named &lt;EM&gt;&lt;STRONG&gt;ccw-graf-xxx&lt;/STRONG&gt;&lt;/EM&gt;. To access the dashboards, go to the Grafana endpoint, enter the Grafana portal, and expand the Dashboards/Azure CycleCloud folder to view the available dashboards.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on the node type, monitoring capabilities include:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;For GPUs: tracking utilization rates, memory copy utilization, various clock speeds, temperature, power consumption, ECC error counts, and NVLink throughput statistics.&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;For InfiniBand: assessing throughput and error occurrences.&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;For other resources: evaluating CPU usage and frequency, memory utilization, disk space usage, network activity, file system capacity, as well as NFS operations and associated throughput.&lt;/LI&gt;
&lt;/UL&gt;
&lt;img /&gt;
&lt;H3&gt;Enable Monitoring&lt;/H3&gt;
&lt;P&gt;Monitoring can be enabled during Azure CycleCloud Workspace for Slurm deployment in the Marketplace UI:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;You can get the “Monitoring ingestion endpoint” and “Data collection rules” from the Azure Monitor Workspace properties.&lt;/P&gt;
&lt;P&gt;Starting with CycleCloud 8.8.1, this option is included in the Slurm default template, so you can enable monitoring directly in the cluster options.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The Client ID to be provided should correspond to the User Managed Identity assigned to the nodes, which has been granted permission to push metrics. For CCWS, this will be &lt;STRONG&gt;ccwLockerManagedIdentity&lt;/STRONG&gt;.&lt;/P&gt;
&lt;H2&gt;Secure and Seamless Authentication: Entra ID SSO&lt;/H2&gt;
&lt;P&gt;The new Entra ID Single Sign-On (SSO) integration streamlines user authentication across your CycleCloud Workspace. By leveraging Microsoft Entra ID, users benefit from centralized identity management, enhanced security, and simplified access control. This feature supports multi-factor authentication and compliance requirements, making it easier for organizations to manage users and permissions while protecting sensitive HPC workloads. Entra ID SSO ensures a frictionless login experience, reducing administrative burden and improving overall security posture.&lt;/P&gt;
&lt;P&gt;Entra ID Single Sign-On (SSO) facilitates authentication for both the CycleCloud user interface and Open OnDemand via OpenID Connect. Mapping to Linux users may be accomplished either through CycleCloud's local user creation process or through LDAP integration with the &lt;A href="https://github.com/xpillons/cc-ldap-auth" target="_blank" rel="noopener"&gt;cc-ldap-auth&lt;/A&gt; CycleCloud cluster-init project. This article will concentrate on the former approach.&lt;/P&gt;
&lt;H3&gt;Pre-deployment Steps&lt;/H3&gt;
&lt;P&gt;Entra ID Single Sign-On (SSO) requires registration of an Entra ID application before deploying a CycleCloud Workspace for Slurm environment. Additionally, a user-managed identity must be created; it replaces a client secret by being added to the application's federated credentials. This User Managed Identity (UMI) will be assigned to the Open OnDemand virtual machine and designated as a trusted authentication source.&lt;/P&gt;
&lt;P&gt;Comprehensive instructions are available in our GitHub repository on the &lt;A href="https://github.com/Azure/cyclecloud-slurm-workspace/blob/main/entra_instructions.md" target="_blank" rel="noopener"&gt;entra_instructions&lt;/A&gt; page.&lt;/P&gt;
&lt;H3&gt;Deployment&lt;/H3&gt;
&lt;P&gt;You can enable Microsoft Entra ID SSO from the Basics tab in the latest Marketplace UI. This is required if you also plan to deploy Open OnDemand.&lt;/P&gt;
&lt;P&gt;The required values may be obtained from the output generated by the pre-deployment script executed previously.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;H3&gt;Post Deployment&lt;/H3&gt;
&lt;P&gt;When you register the Entra ID application, placeholders are initially used for the CycleCloud and Open OnDemand IP addresses. These need to be updated later, either manually or by using this utility &lt;A href="https://github.com/Azure/cyclecloud-slurm-workspace/blob/main/util/entra_postdeploy.sh" target="_blank" rel="noopener"&gt;script&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Once the application is configured, you need to grant permissions to users. To do this, open the app in Enterprise Applications and select Manage/Users and groups.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To add users to the relevant CycleCloud roles, select "Add user/group" and choose one or more of the predefined roles. Assign Global.Node.User to standard users; for users requiring sudo privileges, assign Global.Node.Admin; and for those engaged in cluster administration within CycleCloud, select SuperUser or Administrator as appropriate.&lt;/P&gt;
&lt;P&gt;After roles are assigned, users must first access the CycleCloud UI before they can interact with the cluster or Open OnDemand. This process ensures user profiles are retrieved, and local accounts are created on the nodes within the clusters.&lt;/P&gt;
&lt;H2&gt;Conclusion&lt;/H2&gt;
&lt;P&gt;The 2025.12.01 release of Azure CycleCloud Workspace for Slurm delivers substantial advancements that strengthen performance, security, and usability for HPC environments. With integrated Prometheus self‑agent monitoring, managed Grafana dashboards, support for ARM64 compute architectures, and compatibility with modern Linux distributions, this update empowers teams to operate clusters with greater visibility and efficiency. The addition of Entra ID Single Sign‑On further streamlines user authentication and reinforces security across both CycleCloud and Open OnDemand interfaces.&lt;/P&gt;
&lt;P&gt;Together, these enhancements reflect our ongoing commitment to providing a flexible, scalable, and secure HPC platform that meets the evolving needs of technical and scientific communities. We look forward to seeing how you leverage these capabilities to accelerate innovation and simplify the operation of your HPC workloads.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Jan 2026 09:22:19 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/announcing-azure-cyclecloud-workspace-for-slurm-version-2025-12/ba-p/4481953</guid>
      <dc:creator>xpillons</dc:creator>
      <dc:date>2026-01-07T09:22:19Z</dc:date>
    </item>
    <item>
      <title>Private Preview: Azure Managed Prometheus on VM / VMSS</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/private-preview-azure-managed-prometheus-on-vm-vmss/ba-p/4473472</link>
      <description>&lt;H2 data-start="638" data-end="698"&gt;What’s new — Managed Prometheus now supports VMs &amp;amp; VMSS&lt;/H2&gt;
&lt;P data-start="700" data-end="1235"&gt;Today we are excited to announce the &lt;STRONG data-start="737" data-end="756"&gt;private preview&lt;/STRONG&gt; of Azure Managed Prometheus support for virtual machines (VMs) and virtual machine scale sets (VMSS). Until now, Managed Prometheus on Azure primarily targeted containerized workloads such as Kubernetes (AKS) or Azure Arc–enabled clusters. With this preview, you can extend Prometheus-style monitoring to your IaaS workloads running on VMs/VMSS, giving you unified, scalable, resilient metric collection and observability across both containers and traditional compute, including full support for &lt;STRONG data-start="608" data-end="635"&gt;GPU and InfiniBand (IB)&lt;/STRONG&gt; metric collection for HPC scenarios.&lt;/P&gt;
&lt;P data-start="1237" data-end="1576"&gt;Behind the scenes, Azure Monitor provides the storage, ingestion pipeline, and query engine, while surfacing a fully compatible Prometheus experience — including scraping, PromQL, alerting rules, and dashboards.&lt;/P&gt;
&lt;H2 data-start="1497" data-end="1551"&gt;Why this matters — especially for HPC workloads&lt;/H2&gt;
&lt;P data-start="1490" data-end="1577"&gt;Azure HPC customers running large fleets of GPU-accelerated VMs and VMSS nodes can now:&lt;/P&gt;
&lt;UL data-start="1579" data-end="2067"&gt;
&lt;LI data-start="1579" data-end="1764"&gt;Collect &lt;STRONG data-start="1589" data-end="1611"&gt;node-level metrics&lt;/STRONG&gt; (CPU, memory, disk, frontend NIC, InfiniBand) and &lt;STRONG data-start="1662" data-end="1677"&gt;GPU metrics&lt;/STRONG&gt; (utilization, memory, clocks, ECC, throttling) through standard Prometheus exporters&lt;/LI&gt;
&lt;LI data-start="1765" data-end="1831"&gt;Store all Prometheus metrics in an &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/azure-monitor/metrics/azure-monitor-workspace-overview" target="_blank" rel="noopener"&gt;&lt;STRONG data-start="1802" data-end="1829"&gt;Azure Monitor Workspace&lt;/STRONG&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI data-start="1832" data-end="1928"&gt;Visualize cluster performance using &lt;A class="lia-external-url" href="https://learn.microsoft.com/en-us/azure/managed-grafana/overview" target="_blank" rel="noopener"&gt;&lt;STRONG data-start="1870" data-end="1895"&gt;Azure Managed Grafana&lt;/STRONG&gt;&lt;/A&gt; with out-of-the-box dashboards that include cluster-level views, node-level views, and data links for moving between them&lt;/LI&gt;
&lt;LI data-start="1929" data-end="1986"&gt;Run &lt;STRONG data-start="1935" data-end="1953"&gt;PromQL queries&lt;/STRONG&gt; directly against Azure Monitor&lt;/LI&gt;
&lt;LI data-start="1987" data-end="2067"&gt;Monitor &lt;STRONG data-start="1997" data-end="2013"&gt;mixed fleets&lt;/STRONG&gt; (AKS + VMSS + standalone VMs) in one unified system&lt;/LI&gt;
&lt;/UL&gt;
&lt;P data-start="2069" data-end="2188"&gt;All of this is achieved through a &lt;STRONG data-start="2103" data-end="2139"&gt;fully managed Prometheus backend&lt;/STRONG&gt;, with no servers, scaling, or storage to manage.&lt;/P&gt;
&lt;img /&gt;
&lt;H2 data-start="600" data-end="654"&gt;Access Requirement&lt;/H2&gt;
&lt;P data-start="2182" data-end="2341"&gt;This feature is currently in &lt;STRONG data-start="2211" data-end="2230"&gt;private preview&lt;/STRONG&gt;, and your Azure subscription must be &lt;STRONG data-start="2268" data-end="2283"&gt;allowlisted&lt;/STRONG&gt;&amp;nbsp;before you can use Azure Managed Prometheus for VMs/VMSS.&lt;/P&gt;
&lt;P data-start="2343" data-end="2433"&gt;&lt;A class="lia-external-url" href="https://forms.office.com/r/r5g9gDxayz" target="_blank" rel="noopener"&gt;&lt;STRONG data-start="2346" data-end="2387"&gt;Request access to the private preview&lt;/STRONG&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P data-start="2435" data-end="2538"&gt;Once approved, you will be notified and can proceed with the onboarding steps in the GitHub repository.&lt;/P&gt;
&lt;H2 data-start="600" data-end="654"&gt;Try it yourself&lt;/H2&gt;
&lt;P data-start="338" data-end="552"&gt;We invite you to try it out and share your feedback with us. To get started, follow the &lt;A class="lia-external-url" href="https://github.com/Azure/azhpc-guest-monitoring/blob/main/docs/azure-managed-prometheus-vms.md" target="_blank" rel="noopener"&gt;step-by-step guide&lt;/A&gt; in our GitHub repository to help you onboard to the preview quickly.&lt;/P&gt;
&lt;P data-start="798" data-end="957"&gt;Once you’ve onboarded, you can begin scraping node and GPU metrics, run sample PromQL queries, and import ready-made HPC dashboards into Azure Managed Grafana.&lt;/P&gt;
&lt;P data-start="959" data-end="1189"&gt;We hope you enjoy using Azure Managed Prometheus for VM/VMSS and find the new capabilities valuable for your AI and HPC workloads. As this is a private preview, your feedback is especially important. Please share input by opening an issue in the GitHub repository.&lt;/P&gt;</description>
      <pubDate>Wed, 18 Feb 2026 20:00:31 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/private-preview-azure-managed-prometheus-on-vm-vmss/ba-p/4473472</guid>
      <dc:creator>Daramfon</dc:creator>
      <dc:date>2026-02-18T20:00:31Z</dc:date>
    </item>
    <item>
      <title>Automating HPC Workflows with Copilot Agents</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/automating-hpc-workflows-with-copilot-agents/ba-p/4472610</link>
      <description>&lt;P&gt;&lt;STRONG&gt;Introduction&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;High Performance Computing (HPC) workloads are complex, requiring precise job submission scripts and careful resource management. Manual scripting for platforms like OpenFOAM is time-consuming, error-prone, and often frustrating. At SC25, we showcased how Copilot Agents—powered by AI—are transforming HPC workflows by automating Slurm submission scripts, making scientific computing more efficient and accessible. A full demonstration can be found in the video at the end of this article.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why Automate HPC Workflows?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;High-performance computing workloads are often elaborate, requiring carefully structured job submission scripts to efficiently manage system resources. In applications like OpenFOAM, where precise setup of nodes, tasks, and memory is essential, composing these scripts by hand can be both labor-intensive and susceptible to errors.&lt;/P&gt;
&lt;P&gt;Manually creating Slurm scripts not only consumes valuable time but also raises the likelihood of mistakes, resulting in failed jobs and costly delays that slow research and innovation. For OpenFOAM users, this translates into spending less time on actual simulations and more time resolving script-related problems.&lt;/P&gt;
&lt;P&gt;Automating the creation of these scripts eases the burden on researchers and engineers by accelerating research processes, minimizing errors, and enabling users to dedicate more attention to simulation and analysis instead of debugging submission issues.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;AI-powered Workflow Automation&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Copilot Agents use artificial intelligence to simplify the creation of job submission scripts, helping HPC workflows run smoothly and efficiently. With this system, users can focus less on manual scripting and more on research and analysis.&lt;/P&gt;
&lt;P&gt;Copilot Agent recognizes your workload's context and applies best practices to create precise and optimized Slurm scripts. It interprets specific needs so that each script matches the requirements of individual jobs, which helps with resource allocation and scheduling.&lt;/P&gt;
&lt;P&gt;Key benefits include quicker script creation, fewer mistakes, and greater consistency across HPC tasks. Automating this process speeds up the workflow and maintains standards, resulting in more dependable and repeatable job submissions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Typical Workflow with Copilot Agents&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Defining the Context:&lt;/STRONG&gt; Begin by outlining your workload requirements clearly and thoroughly. Indicate how to load and run the application, specify the number of tasks per node, and detail any special logging or configuration instructions. The more accurate you are with these details, the more effectively the agent can create a reliable script.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Script Generation by AI:&lt;/STRONG&gt; Copilot processes your input and automatically creates a full Slurm submission script. Using AI models, this stage incorporates best practices to save time and prevent errors.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Validation and Submission:&lt;/STRONG&gt; After the script is built, it’s checked for accuracy and submitted to the scheduler. You should always examine the output and error logs and adjust as needed. This ongoing review helps ensure that jobs run smoothly and improves your workflow over time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Best Practices for Defining Context&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Consider context as your guideline: providing more specific and thorough details helps the agent produce a more accurate Slurm script. Always make your instructions straightforward and precise. Add links to relevant documentation when possible, and share example cases that show exactly what you need. Be clear about requirements like how to load applications, set the number of tasks per node, or any special configuration and logging needs. Clear and complete context not only lowers the chance of mistakes but also results in higher-quality scripts, ultimately saving you time and effort.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;&lt;STRONG&gt;Script Generation: Iterative Improvement&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Model Selection:&lt;/STRONG&gt; Advanced models such as GPT-5 are capable of producing highly detailed and comprehensive scripts. Although the initial draft may require additional time to generate, these models typically integrate best practices and sophisticated configuration options, which can be further refined through iterative development.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Iterative Improvement:&lt;/STRONG&gt; The initial script produced by AI generally serves as a starting point for further enhancement. Systematic revisions informed by output logs, error reports, and user feedback contribute to improving the accuracy, efficiency, and customization of the final submission script according to the specific needs of your HPC workload.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Practical Example:&lt;/STRONG&gt; As demonstrated in the video below, a chat-based Copilot Agent facilitates script creation by prompting for the script name and subsequently generating a Bash script that incorporates all requested features. These include leveraging Slurm environment variables, automating task distribution, loading requisite modules, and enabling comprehensive logging. The resulting script is prepared for submission via the &lt;STRONG&gt;&lt;EM&gt;sbatch&lt;/EM&gt;&lt;/STRONG&gt; command.&lt;/P&gt;
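&lt;P&gt;For readers who have not yet watched the demonstration, the following is a minimal sketch of what such a script might look like. It is not the script produced in the video. The job name, node and task counts, the &lt;EM&gt;openfoam&lt;/EM&gt; module name, the &lt;EM&gt;motorBike&lt;/EM&gt; case directory, and the &lt;EM&gt;simpleFoam&lt;/EM&gt; solver are all assumptions chosen for illustration and must be replaced with values that are valid on your cluster.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;#!/bin/bash
#SBATCH --job-name=openfoam-sketch
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Illustrative sketch only. The module name, case directory, and solver
# below are assumptions, not values taken from the demonstration.
OPENFOAM_MODULE="${OPENFOAM_MODULE:-openfoam}"   # hypothetical module name
CASE_DIR="${CASE_DIR:-$PWD/motorBike}"           # hypothetical case directory
SOLVER="${SOLVER:-simpleFoam}"

# Total MPI ranks derived from Slurm environment variables.
# Falls back to 1 node x 1 task when run outside a Slurm allocation.
total_tasks() {
    local nodes="${SLURM_NNODES:-1}"
    local per_node="${SLURM_NTASKS_PER_NODE:-1}"
    echo $(( nodes * per_node ))
}

# Timestamped log line written to the job's output file.
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

run_case() {
    log "Job ${SLURM_JOB_ID} on ${SLURM_NNODES} node(s), $(total_tasks) task(s)"
    module load "$OPENFOAM_MODULE"
    cd "$CASE_DIR" || exit 1
    # numberOfSubdomains in system/decomposeParDict must equal the rank count.
    decomposePar -force
    srun --ntasks="$(total_tasks)" "$SOLVER" -parallel
    log "Solver finished"
}

# Launch the solver only when running inside a Slurm allocation.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    run_case
fi&lt;/LI-CODE&gt;
&lt;P&gt;Saved under a name of your choice, the script is submitted with &lt;EM&gt;sbatch&lt;/EM&gt;. Deriving the rank count from Slurm variables rather than hard-coding it means that changing the &lt;EM&gt;#SBATCH&lt;/EM&gt; directives is enough to rescale the job, provided the case decomposition is updated to match.&lt;/P&gt;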
&lt;img /&gt;
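&lt;P&gt;As a rough illustration of the kind of script described above, the following Python sketch renders a minimal Slurm batch script from a few parameters. The function name render_sbatch_script, the module names, and the solver command are hypothetical placeholders, not output from Copilot Agent. The #SBATCH directives and SLURM_* environment variables are standard Slurm, and the module load lines assume that Environment Modules or Lmod is available on the cluster.&lt;/P&gt;

```python
def render_sbatch_script(job_name, nodes, ntasks_per_node, modules, command,
                         time_limit="01:00:00"):
    """Return the text of a minimal Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time_limit}",
        # In Slurm file name patterns, %x is the job name and %j the job ID.
        "#SBATCH --output=%x-%j.out",
        "#SBATCH --error=%x-%j.err",
        "",
        "# Log the allocation using Slurm environment variables.",
        'echo "Job ${SLURM_JOB_ID} on nodes: ${SLURM_JOB_NODELIST}"',
        'echo "Total tasks: ${SLURM_NTASKS}"',
        "",
    ]
    # One 'module load' line per requested module.
    lines += [f"module load {name}" for name in modules]
    # Inside a batch job, srun uses the task count of the allocation by default.
    lines += ["", f"srun {command}", ""]
    return "\n".join(lines)


script = render_sbatch_script(
    job_name="openfoam-demo",
    nodes=2,
    ntasks_per_node=4,
    modules=["mpi/openmpi", "openfoam"],  # hypothetical module names
    command="simpleFoam -parallel",       # placeholder solver command
)
print(script)
```

&lt;P&gt;Saving the printed text to a file and passing that file to sbatch would submit it. Generating the script from parameters, rather than editing it by hand, keeps each revision in the iterative loop small and easy to compare.&lt;/P&gt;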
&lt;P&gt;&lt;STRONG&gt;Validation and Continuous Improvement&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;Once you have generated your Slurm submission script using Copilot Agent, it is essential to conduct a careful review of the output prior to executing your job. This preliminary assessment is critical for identifying potential issues early and ensuring that the script aligns with your specific workload requirements.&lt;/P&gt;
&lt;P&gt;Submit the job to the scheduler for validation, and diligently monitor both the output and error log files, as these will inform your subsequent actions.&lt;/P&gt;
&lt;P&gt;Should errors arise—such as missing file paths or incorrect module loads—utilize the feedback from the logs to amend your script accordingly. This iterative refinement process is fundamental to optimizing your workflow and achieving reliable job execution.&lt;/P&gt;
&lt;P&gt;The accompanying example illustrates how Copilot Agent can assist in locating and correcting errors, such as updating an OpenFOAM tutorial path. By leveraging AI-enabled feedback, users are able to efficiently address issues and confidently resubmit jobs.&lt;/P&gt;
&lt;P&gt;Continuous validation and revision are paramount to advancing high-performance computing automation. Consistently refer to output and error logs to guide subsequent iterations, thereby enhancing the robustness and dependability of your scripts over time.&lt;/P&gt;
&lt;img /&gt;
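&lt;P&gt;Part of the log-driven loop described above can be mechanized. The Python sketch below scans the text of a job's error log for a few common failure signatures and suggests what to check next. The function name triage_log, the pattern table, and the sample log lines are illustrative assumptions. Exact messages vary with the shell, the module system, and the Slurm version, so treat the list as a starting point.&lt;/P&gt;

```python
# Illustrative substrings and hints; not an exhaustive or authoritative list.
KNOWN_FAILURES = [
    ("No such file or directory",
     "Check input paths, for example the case or tutorial directory."),
    ("command not found",
     "Check that the required module or PATH entry is loaded."),
    ("Unable to locate a modulefile",
     "Check the module name and version with 'module avail'."),
    ("DUE TO TIME LIMIT",
     "Increase --time or reduce the problem size."),
]


def triage_log(log_text):
    """Return (line_number, line, hint) for each line matching a known failure."""
    findings = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for needle, hint in KNOWN_FAILURES:
            if needle in line:
                findings.append((number, line.strip(), hint))
                break  # one hint per log line is enough
    return findings


# Hypothetical log content, for demonstration only.
sample = "\n".join([
    "cp: cannot stat '/opt/openfoam/tutorials/missing': No such file or directory",
    "run.sh: line 12: simpleFoam: command not found",
    "Finished setup",
])
for number, line, hint in triage_log(sample):
    print(f"line {number}: {hint}")
```

&lt;P&gt;Plain substring matching is used here for readability. The matched lines and hints can also be pasted back into the agent as context for the next revision of the script.&lt;/P&gt;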
&lt;P&gt;&lt;STRONG&gt;Key Benefits&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Time Efficiency:&lt;/STRONG&gt; Copilot Agents significantly decrease the time needed to generate job submission scripts. Tasks that previously required hours of manual scripting can now be completed within minutes, enabling researchers and engineers to give more attention to simulation and analysis rather than script troubleshooting.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Error Reduction:&lt;/STRONG&gt; Automation substantially lowers the risk of human error commonly associated with manual script development. By enforcing best practices and standardizing the script generation process, Copilot Agents improve reliability and minimize job failures.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Enhanced Scalability:&lt;/STRONG&gt; Automated workflows facilitate more efficient scaling across high-performance computing (HPC) environments. As workloads increase in complexity and scale, Copilot Agents support consistency and optimal resource utilization, simplifying the management of expansive simulations.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;User-Friendly Automation:&lt;/STRONG&gt; Copilot Agents make HPC scripting more approachable for new users by offering intuitive automation and guidance. This approach ensures adherence to best practices and broadens accessibility, even for individuals with limited prior experience.&lt;/P&gt;</description>
      <pubDate>Wed, 03 Dec 2025 10:43:26 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/automating-hpc-workflows-with-copilot-agents/ba-p/4472610</guid>
      <dc:creator>xpillons</dc:creator>
      <dc:date>2025-12-03T10:43:26Z</dc:date>
    </item>
    <item>
      <title>Azure NCv6 Public Preview: The new Unified Platform for Converged AI and Visual Computing</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-ncv6-public-preview-the-new-unified-platform-for-converged/ba-p/4472704</link>
      <description>&lt;P&gt;As enterprises accelerate adoption of physical AI (AI models interacting with real-world physics), digital twins (virtual replicas of physical systems), LLM inference (running language models for predictions), and agentic workflows (autonomous AI-driven processes), the demand for infrastructure that bridges high-end visualization and generative AI inference has never been higher. Today, we are pleased to announce the Public Preview of the NC RTX PRO 6000 BSE v6 series, powered by the NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs.&lt;/P&gt;
&lt;P&gt;The NCv6 series represents a generational leap in Azure’s visual compute portfolio, designed to be the dual engine for both Industrial Digitalization and cost-effective LLM inference. By leveraging NVIDIA Multi-Instance GPU (MIG) capabilities, the NCv6 platform offers affordable sizing options similar to our legacy NCv3 and NVv5 series. This provides a seamless upgrade path to Blackwell performance, enabling customers to run complex NVIDIA Omniverse simulations and multimodal AI agents with greater efficiency.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Why Choose Azure NCv6?&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;While traditional GPU instances often force a choice between "compute" (AI) and "graphics" (visualization) optimizations, the NCv6 breaks this silo. Built on the NVIDIA Blackwell architecture, it provides a "right-sized" acceleration platform for workloads that demand both ray-traced fidelity and Tensor Core performance.&lt;/P&gt;
&lt;P&gt;As outlined in our &lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nc-rtxpro6000-bse-v6-series?tabs=sizebasicgp%2Csizebasicco%2Csizebasicmo" target="_blank" rel="noopener"&gt;product documentation&lt;/A&gt;, these VMs are ideal for converged AI and visual computing workloads, including:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Real-time digital twin and NVIDIA Omniverse simulation.&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;LLM Inference and RAG (Retrieval-Augmented Generation)&lt;/STRONG&gt; on small to medium AI models.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;High-fidelity 3D rendering,&lt;/STRONG&gt; product design, and video streaming.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Agentic AI&lt;/STRONG&gt; application development and deployment.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Scientific visualization&lt;/STRONG&gt; and High-Performance Computing (HPC).&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Key Features of the NCv6 Platform&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt; The Power of NVIDIA Blackwell&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;At the heart of the NCv6 is the &lt;STRONG&gt;NVIDIA RTX PRO 6000 Blackwell Server Edition GPU&lt;/STRONG&gt;, which features &lt;STRONG&gt;96 GB of ultra-fast GDDR7 memory&lt;/STRONG&gt;. This massive frame buffer can hold complex multimodal AI models and high-resolution textures that previous generations simply could not fit.&lt;/P&gt;
&lt;OL start="2"&gt;
&lt;LI&gt;&lt;STRONG&gt; Host Performance: Intel Granite Rapids&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;To ensure your workloads aren't bottlenecked by the CPU, the VM host is equipped with &lt;STRONG&gt;Intel Xeon Granite Rapids processors&lt;/STRONG&gt;. These provide an all-core turbo frequency of up to &lt;STRONG&gt;4.2 GHz&lt;/STRONG&gt;, ensuring that demanding pre- and post-processing steps—common in rendering and physics simulations—are handled efficiently.&lt;/P&gt;
&lt;OL start="3"&gt;
&lt;LI&gt;&lt;STRONG&gt; Optimized Sizing for Every Workflow&lt;/STRONG&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;We understand that one size does not fit all. The NCv6 series introduces three distinct sizing categories to match your specific unit economics:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;General Purpose:&lt;/STRONG&gt; Balanced CPU-to-GPU ratios (up to 320 vCPUs) for diverse workloads.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Compute Optimized:&lt;/STRONG&gt; Higher vCPU density for heavy simulation and physics tasks.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Memory Optimized:&lt;/STRONG&gt; Massive memory footprints (up to 1,280 GB RAM) for data-intensive applications.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Crucially, for smaller inference jobs or VDI, we will also offer&amp;nbsp;&lt;STRONG&gt;fractional GPU options&lt;/STRONG&gt;, allowing you to right-size your infrastructure and optimize costs.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;NCv6 Technical Specifications&lt;/STRONG&gt;&lt;/P&gt;
&lt;DIV class="styles_lia-table-wrapper__h6Xo9 styles_table-responsive__MW0lN"&gt;&lt;table border="1" style="width: 62.7778%; border-width: 1px;"&gt;&lt;thead&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-21"&gt;&lt;STRONG&gt;Specification&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;&lt;SPAN class="lia-text-color-21"&gt;&lt;STRONG&gt;Details&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;GPU&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;NVIDIA RTX PRO 6000 Blackwell Server Edition (96 GB GDDR7)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Processor&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Intel Xeon Granite Rapids (up to 4.2 GHz Turbo)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;vCPUs&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;16 – 320 vCPUs (Scalable across GP, Compute, and Memory optimized sizes)&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;System Memory&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;64 GB – 1,280 GB DDR5&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Network&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Up to 200,000 Mbps (200 Gbps) Azure Accelerated Networking&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
&lt;P&gt;&lt;STRONG&gt;Storage&lt;/STRONG&gt;&lt;/P&gt;
&lt;/td&gt;&lt;td&gt;
&lt;P&gt;Up to 2 TB local temp storage; support for Premium SSD v2 &amp;amp; Ultra Disk&lt;/P&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/DIV&gt;
&lt;P&gt;&lt;STRONG&gt;Real-World Applications&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The NCv6 is built for versatility, powering everything from pixel-perfect rendering to high-throughput language reasoning:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;STRONG&gt;Production Generative AI &amp;amp; Inference:&lt;/STRONG&gt; Deploy self-hosted LLMs and RAG pipelines with optimized unit economics. The NCv6 is ideal for serving ranking models, recommendation engines, and content generation agents where low latency and cost-efficiency are paramount.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Automotive &amp;amp; Manufacturing:&lt;/STRONG&gt; Validate autonomous driving sensors (LiDAR/Radar) and train physical AI models in high-fidelity simulation environments before they ever touch the real world.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Next-Gen VDI &amp;amp; Azure Virtual Desktop:&lt;/STRONG&gt; Modernize remote workstations with NVIDIA RTX Virtual Workstation capabilities. By leveraging fractional GPU options, organizations can deliver high-fidelity, accelerated desktop experiences to distributed teams—offering a superior, high-density alternative to legacy NVv5 deployments.&lt;/LI&gt;
&lt;LI&gt;&lt;STRONG&gt;Media &amp;amp; Entertainment:&lt;/STRONG&gt; Accelerate render farms for VFX studios requiring burst capacity, while simultaneously running generative AI tools for texture creation and scene optimization.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;STRONG&gt;Conclusion: The Engine for the Era of Converged AI&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;The Azure NCv6 series redefines the boundaries of cloud infrastructure. By combining the raw power of NVIDIA’s Blackwell architecture with the high-frequency performance of Intel Granite Rapids, we are moving beyond just "visual computing." Innovators can now leverage a unified platform to build the industrial metaverse, deploy intelligent agents, and scale production AI—all with the enterprise-grade security and hybrid reach of Azure.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR9s7orOb3OJJnwABCNj_8JdUMzlLSzJFTTdRRE8yU0UxWFFYQlpYV1hDVy4u" target="_blank" rel="noopener"&gt;Ready to experience the next generation? Sign up for the NCv6 Public Preview here.&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 25 Nov 2025 17:22:05 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-ncv6-public-preview-the-new-unified-platform-for-converged/ba-p/4472704</guid>
      <dc:creator>rishabv90</dc:creator>
      <dc:date>2025-11-25T17:22:05Z</dc:date>
    </item>
    <item>
      <title>Azure ND GB300 v6 now Generally Available - Hyper-optimized for Generative and Agentic AI workloads</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-nd-gb300-v6-now-generally-available-hyper-optimized-for/ba-p/4469475</link>
      <description>&lt;P&gt;We are pleased to announce the General Availability (GA) of ND GB300 v6 virtual machines, delivering the next leap in AI infrastructure. On October 9, we &lt;A href="https://azure.microsoft.com/en-us/blog/microsoft-azure-delivers-the-first-large-scale-cluster-with-nvidia-gb300-nvl72-for-openai-workloads/" target="_blank" rel="noopener"&gt;shared&lt;/A&gt; the delivery of the&amp;nbsp;first at-scale production cluster with more than 4,600 NVIDIA GB300 NVL72, featuring NVIDIA Blackwell Ultra GPUs connected through the next-generation NVIDIA InfiniBand network.&amp;nbsp;We have now deployed tens of thousands of GB300 GPUs for production customer workloads and expect to scale to hundreds of thousands. Built on NVIDIA GB300 NVL72 systems, these VMs redefine performance for frontier model training, large-scale inference, multimodal reasoning, and agentic AI.&lt;/P&gt;
&lt;P&gt;The ND GB300 v6 series enables customers to:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Deploy trillion-parameter models with unprecedented throughput.&lt;/LI&gt;
&lt;LI&gt;Accelerate inference for long-context and multimodal workloads.&lt;/LI&gt;
&lt;LI&gt;Scale seamlessly at high bandwidth for large-scale training workloads.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;In recent &lt;A href="https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/breaking-the-million-token-barrier-the-technical-achievement-of-azure-nd-gb300-v/4466080" target="_blank" rel="noopener"&gt;benchmarks&lt;/A&gt;, ND GB300 v6 achieved over 1.1 million tokens per second on Llama 2 70B inference workloads - a 27% uplift over ND GB200 v6. This performance breakthrough enables customers to serve long-context, multimodal, and agentic AI models with unmatched speed and efficiency.&lt;/P&gt;
&lt;img /&gt;
&lt;P&gt;With the general availability of ND GB300 v6 VMs, Microsoft strengthens its long-standing collaboration with NVIDIA by leading the market in delivering the latest GPU innovations, reaffirming our commitment to world-class AI infrastructure.&lt;/P&gt;
&lt;P&gt;The ND GB300 v6 systems use a rack-scale design, with each rack hosting 18 VMs for a total of 72 GPUs interconnected by high-speed NVLink. Each VM has 2 NVIDIA Grace CPUs and 4 Blackwell Ultra GPUs. Each NVLink-connected rack contains:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).&lt;/LI&gt;
&lt;LI&gt;800 gigabits per second (Gb/s) per GPU of cross-rack scale-out bandwidth via next-generation NVIDIA Quantum-X800 InfiniBand (2x ND GB200 v6).&lt;/LI&gt;
&lt;LI&gt;130 terabytes per second (TB/s) of NVIDIA NVLink bandwidth within the rack.&lt;/LI&gt;
&lt;LI&gt;37 TB of fast memory (~20 TB HBM3e + ~17 TB LPDDR).&lt;/LI&gt;
&lt;LI&gt;Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance (1.5x ND GB200 v6).&lt;/LI&gt;
&lt;/UL&gt;
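&lt;P&gt;The per-rack figures above follow from simple arithmetic, which the Python sketch below reproduces using only numbers quoted in this post. The per-GPU and aggregate values are derived here for illustration and are not official specifications; the variable names are arbitrary.&lt;/P&gt;

```python
# Figures quoted in the post.
vms_per_rack = 18
gpus_per_vm = 4
cpus_per_vm = 2
fp4_pflops_per_rack = 1440
scale_out_gbps_per_gpu = 800
hbm_tb, lpddr_tb = 20, 17  # approximate values, as quoted

# Derived values (illustrative only).
gpus_per_rack = vms_per_rack * gpus_per_vm   # 72 GPUs
cpus_per_rack = vms_per_rack * cpus_per_vm   # 36 Grace CPUs
fp4_pflops_per_gpu = fp4_pflops_per_rack / gpus_per_rack
scale_out_tbps_per_rack = scale_out_gbps_per_gpu * gpus_per_rack / 1000
fast_memory_tb = hbm_tb + lpddr_tb           # 37 TB

print(f"GPUs per rack: {gpus_per_rack}, Grace CPUs per rack: {cpus_per_rack}")
print(f"FP4 per GPU: {fp4_pflops_per_gpu:.0f} PFLOPS")
print(f"Aggregate scale-out per rack: {scale_out_tbps_per_rack:.1f} Tb/s")
print(f"Fast memory per rack: about {fast_memory_tb} TB")
```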
&lt;P&gt;Together, NVLink and XDR InfiniBand enable GB300 systems to behave as a unified compute and memory pool, minimizing latency, maximizing bandwidth, and dramatically improving scalability. Within a rack, NVLink enables coherent memory access and fast synchronization for tightly coupled workloads. Across racks, XDR InfiniBand ensures ultra-low latency, high-throughput communication with SHARP offloading—maintaining sub-100 µs latency for cross-node collectives.&lt;/P&gt;
&lt;P&gt;Azure provides an end-to-end AI platform that enables customers to build, deploy, and scale AI workloads efficiently on GB300 infrastructure. Services like Azure CycleCloud and Azure Batch simplify the setup and management of HPC and AI environments, allowing organizations to dynamically adjust resources, integrate leading schedulers, and run containerized workloads at massive scale. With tools such as CycleCloud Workspace for Slurm, users can create and configure clusters without prior expertise, while Azure Batch handles millions of parallel tasks, ensuring cost and resource efficiency for large-scale training.&lt;/P&gt;
&lt;P&gt;For cloud-native AI, Azure Kubernetes Service (AKS) offers rapid deployment and management of containerized workloads, complemented by platform-specific optimizations for observability and reliability. Whether using Kubernetes or custom stacks, Azure delivers a unified suite of services to maximize performance and scalability.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Learn More &amp;amp; Get Started&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;&lt;A href="https://azure.microsoft.com/en-us/blog/microsoft-azure-delivers-the-first-large-scale-cluster-with-nvidia-gb300-nvl72-for-openai-workloads/" target="_blank" rel="noopener"&gt;https://azure.microsoft.com/en-us/blog/microsoft-azure-delivers-the-first-large-scale-cluster-with-nvidia-gb300-nvl72-for-openai-workloads/&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/breaking-the-million-token-barrier-the-technical-achievement-of-azure-nd-gb300-v/4466080" target="_blank" rel="noopener"&gt;https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/breaking-the-million-token-barrier-the-technical-achievement-of-azure-nd-gb300-v/4466080&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://blogs.nvidia.com/blog/microsoft-azure-worlds-first-gb300-nvl72-supercomputing-cluster-openai/" target="_blank" rel="noopener"&gt;NVIDIA Blog: Azure’s GB300 NVL72 Supercomputing Cluster&lt;/A&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/gpu-accelerated/nd-series" target="_blank" rel="noopener"&gt;Azure VM Sizes Overview&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;
</description>
      <pubDate>Wed, 19 Nov 2025 01:13:53 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/azure-nd-gb300-v6-now-generally-available-hyper-optimized-for/ba-p/4469475</guid>
      <dc:creator>Nitin_Nagarkatte</dc:creator>
      <dc:date>2025-11-19T01:13:53Z</dc:date>
    </item>
    <item>
      <title>Announcing the Public Preview of AMLFS 20: Azure Managed Lustre New SKU for Massive AI &amp; HPC Workloads</title>
      <link>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/announcing-the-public-preview-of-amlfs-20-azure-managed-lustre/ba-p/4470665</link>
      <description>&lt;P aria-level="2"&gt;&lt;STRONG&gt;Sachin Sheth&lt;/STRONG&gt; - Principal PDM Manager&lt;/P&gt;
&lt;P aria-level="2"&gt;&lt;STRONG&gt;Brian Barbisch&lt;/STRONG&gt; - Principal Group Software Engineering Manager&lt;/P&gt;
&lt;P aria-level="2"&gt;&lt;STRONG&gt;Matt White&lt;/STRONG&gt; - Principal Group Software Engineering Manager&lt;/P&gt;
&lt;P aria-level="2"&gt;&lt;STRONG&gt;Brian Lepore&lt;/STRONG&gt; - Principal Product Manager&lt;/P&gt;
&lt;P aria-level="2"&gt;&lt;STRONG&gt;Wolfgang De Salvador&lt;/STRONG&gt; - Senior Product Manager&lt;/P&gt;
&lt;P aria-level="2"&gt;&lt;STRONG&gt;Ron Hogue&lt;/STRONG&gt; - Senior Product Manager&lt;/P&gt;
&lt;H4 aria-level="2"&gt;Introduction&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;We are excited to announce the Public Preview of AMLFS Durable Premium 20 (AMLFS 20), a new SKU in Azure Managed&amp;nbsp;Lustre&amp;nbsp;designed to deliver unprecedented performance and scale for demanding AI and HPC workloads.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Key Features&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="13" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Massive Scale:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;Store up to 25 PiB of data in a single namespace, with up to 512 GB/s of total bandwidth.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="13" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Advanced Metadata Performance:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;Multi-MDS (Metadata Server) architecture dramatically improves metadata IOPS. In &lt;EM&gt;mdtest&lt;/EM&gt; benchmarks, AMLFS 20 demonstrated more than 5x improvement in metadata operations. An additional MDS is provided for every 5 PiB of provisioned filesystem.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="13" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;High&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;File&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Capacity&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&lt;STRONG&gt;&amp;nbsp;&lt;/STRONG&gt;Supports up to 20 billion&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;inodes&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;for&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;maximum&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;namespace size.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
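&lt;P&gt;To put the scale figures above in per-capacity terms, the Python sketch below divides the quoted maximum bandwidth and inode count by the quoted maximum capacity. It assumes decimal bandwidth units (1 GB/s = 1,000 MB/s) and binary capacity units (1 PiB = 1,024 TiB). The post does not state how these limits scale at smaller sizes, so the result describes only the maximum configuration.&lt;/P&gt;

```python
# Maximums quoted in the post.
max_capacity_pib = 25
max_bandwidth_gb_s = 512
max_inodes = 20_000_000_000

# Unit conversions (assumptions: 1 PiB = 1024 TiB, 1 GB/s = 1000 MB/s).
capacity_tib = max_capacity_pib * 1024
bandwidth_mb_s = max_bandwidth_gb_s * 1000

mb_s_per_tib = bandwidth_mb_s / capacity_tib
inodes_per_tib = max_inodes / capacity_tib

print(f"{mb_s_per_tib:.1f} MB/s per TiB")
print(f"{inodes_per_tib:,.0f} inodes per TiB")
```

&lt;P&gt;Under these assumptions the bandwidth works out to 20 MB/s per TiB, which is consistent with the "20" in the SKU name, although that reading is an inference rather than something stated here.&lt;/P&gt;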
&lt;H4 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Why AMLFS&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;20 Matters&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="1" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Simplified Architecture:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;Previously, datasets larger than 12.5 PiB&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;required&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;multiple filesystems and complex management.&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;AMLFS 20&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;enables a single, high-performance file system for massive AI and HPC workloads&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;up to 25 PiB&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;, streamlining deployment and administration.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="2" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Accelerated Data Preparation:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;The multi-MDT architecture significantly increases metadata IOPS, which is crucial during the data preparation stage of AI training, where rapid access to millions of files is&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;required&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;UL&gt;
&lt;LI aria-setsize="-1" data-leveltext="" data-font="Symbol" data-listid="2" data-list-defn-props="{&amp;quot;335552541&amp;quot;:1,&amp;quot;335559685&amp;quot;:720,&amp;quot;335559991&amp;quot;:360,&amp;quot;469769226&amp;quot;:&amp;quot;Symbol&amp;quot;,&amp;quot;469769242&amp;quot;:[8226],&amp;quot;469777803&amp;quot;:&amp;quot;left&amp;quot;,&amp;quot;469777804&amp;quot;:&amp;quot;&amp;quot;,&amp;quot;469777815&amp;quot;:&amp;quot;hybridMultilevel&amp;quot;}" data-aria-posinset="3" data-aria-level="1"&gt;&lt;STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;Faster Time-to-Value:&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;SPAN data-contrast="auto"&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;Researchers and engineers&amp;nbsp;&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;benefit&lt;/SPAN&gt;&lt;SPAN data-ccp-parastyle="Normal (Web)"&gt;&amp;nbsp;from easier management, reduced bottlenecks, and faster access to large datasets, accelerating innovation.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134233117&amp;quot;:false,&amp;quot;134233118&amp;quot;:false,&amp;quot;201341983&amp;quot;:0,&amp;quot;335559738&amp;quot;:0,&amp;quot;335559739&amp;quot;:160,&amp;quot;335559740&amp;quot;:279}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;H4 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;Availability&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;AMLFS 20&amp;nbsp;is available in Public Preview alongside the&amp;nbsp;already&amp;nbsp;existing AMLFS SKUs. For more details on other SKUs, visit the&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/azure-managed-lustre/create-file-system-portal#throughput-configurations" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;Azure Managed Lustre documentation&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;H4 aria-level="2"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-parastyle="heading 2"&gt;How to Join the Preview&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{&amp;quot;134245418&amp;quot;:true,&amp;quot;134245529&amp;quot;:true,&amp;quot;335559738&amp;quot;:160,&amp;quot;335559739&amp;quot;:80}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/H4&gt;
&lt;P&gt;&lt;SPAN data-contrast="auto"&gt;If&amp;nbsp;you are&amp;nbsp;working with large-scale AI or HPC workloads and&amp;nbsp;would like&amp;nbsp;early access&amp;nbsp;to&amp;nbsp;AMLFS 20, we invite you to&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://aka.ms/AMLFS20PreviewForm" target="_blank" rel="noopener"&gt;&lt;SPAN data-contrast="none"&gt;&lt;SPAN data-ccp-charstyle="Hyperlink"&gt;fill out this form&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/A&gt;&lt;SPAN data-contrast="auto"&gt;&amp;nbsp;to&amp;nbsp;tell us about your use case. Our&amp;nbsp;team will follow&amp;nbsp;up with&amp;nbsp;onboarding details.&lt;/SPAN&gt;&lt;SPAN data-ccp-props="{}"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Nov 2025 17:00:00 GMT</pubDate>
      <guid>https://techcommunity.microsoft.com/t5/azure-high-performance-computing/announcing-the-public-preview-of-amlfs-20-azure-managed-lustre/ba-p/4470665</guid>
      <dc:creator>wolfgangdesalvador</dc:creator>
      <dc:date>2025-11-18T17:00:00Z</dc:date>
    </item>
  </channel>
</rss>

