virtual machines

73 Topics

Microsoft at NVIDIA GTC 2026
Microsoft returns to NVIDIA GTC 2026 in San Jose with a strong presence across conference sessions, in‑booth theater talks, live demos, and executive‑level ancillary events. Together with NVIDIA and our partner ecosystem, Microsoft is showcasing how Azure AI infrastructure enables AI training, inference, and production at global scale. Visit us at Booth #521 to see the latest innovations in action and connect with Azure and NVIDIA experts.

Exclusive GTC Experiences

- LEGO® Datacenter Model: Explore Azure AI infrastructure at the Park Container.
- Candy Lounge: Visit the high-traffic candy wall for co-branded treats all day long.
- Networking Lounge: Relax and recharge with comfy seating and vital charging options.
- Outdoor Juice Truck: Free, refreshing beverages served during outdoor park hours.

Sponsored Breakout Sessions

Microsoft Featured: Reinventing Semiconductor Design with Microsoft Discovery
S82398 · Mon, Mar 16 · 4:00 PM
Prashant Varshney (Microsoft · Semiconductor & AI Engineering)
Abstract: Semiconductor teams face exploding design complexity and shrinking verification windows. This session shows how the Microsoft Discovery AI for Science platform, combined with Synopsys Agent Engineers, introduces an agentic approach to EDA that automates routine steps and accelerates expert decision-making on Azure.

Microsoft Featured: Operationalizing Agentic AI at Hyperscale
S82399 · Tue, Mar 17 · 1:00 PM
Nitin Nagarkatte (Microsoft · Azure AI Infrastructure), Anand Raman (Microsoft · Azure AI), Vipul Modi (Microsoft · AI Systems)
Abstract: As enterprises move to agentic systems, the challenge shifts to operating intelligent agents reliably at scale. This session demonstrates how Microsoft builds AI Factories on Azure using NVIDIA technology and explores Microsoft Foundry as the control plane for deploying and operating coordinated AI agents.

Live from GTC: AI Podcast

Live Special Feature: A conversation with Microsoft Azure
Dayan Rodriguez, Corporate Vice President, Global Manufacturing and Mobility
Alistair Spiers, General Manager, Azure Infrastructure
Monday, March 16 @ 2 PM
Listen & Subscribe: aka.ms/GTC2026Podcast

Earned Conference Sessions

Don't miss these high-impact sessions where Microsoft and NVIDIA leaders discuss the future of AI factories and infrastructure.

- Mon · Mar 16 · 5:00 PM · Drive Optimal Tokens per Watt on AI Infrastructure Using Benchmarking Recipes · Speakers: Paul Edwards, Emily Potyraj (Microsoft, NVIDIA)
- Tue · Mar 17 · 9:00 AM · Autonomous AI Factories: Technical Preview of Agent-Native Production · Speakers: JP Vasseur, César Martinez Spessot (NVIDIA, Microsoft Research)
- Tue · Mar 17 · 4:00 PM · The Road to Intelligent Mobility: Vehicle GenAI · Speakers: Raj Paul, Thomas Evans, Bryan Goodman (Microsoft, NVIDIA, Bosch)
- Wed · Mar 18 · 9:00 AM · Supercharging AI with Multi-Gigawatt AI Factories · Speakers: Gilad Shainer, Peter Salanki, Evan Burness (NVIDIA, CoreWeave, Meta, Microsoft)

Daily Booth Theater Schedule

Visit the Microsoft Theater for lightning talks from engineering leaders and partners.

Monday, March 16
- 2:00 PM · BTH208 · NVIDIA · Accelerate AI Innovation on Azure with NVIDIA Run:ai · Rob Magno
- 2:30 PM · BTH202 · General Robotics · Models to Machines: Deploying Agentic AI in Real-World Robotics · Dinesh Narayanan
- 3:00 PM · BTH200 · Fractal Analytics · From Generalist to Enterprise-Ready: Fractal Builds Domain AI · C. Chaudhuri, S. Chakraborty
- 3:30 PM · BTH109 · Microsoft · Agentic cloud ops - Smarter Operations with Azure Copilot · Jyoti Sharma
- 4:00 PM · BTH103 · Microsoft · Build a Deep Research Agent for Enterprise Data · D. Casati, A. Slutsky, H. Alkemade
- 4:30 PM · BTH205 · NetApp · Azure NetApp Files: Powering Your Data for AI Capabilities · Andy Chan
- 5:00 PM · BTH207 · NVIDIA · The Agentic Commerce Stack: Open Models on Azure · Antonio Martinez
- 5:30 PM · BTH217 · OPAQUE · Confidential AI on Azure Unlocks Sovereign AI at Scale · Aaron Fulkerson
- 6:00 PM · BTH218 · Simplismart · Making BYOC work at scale with modular inference · Amritanshu Jain
- 6:30 PM · Expo Reception

Tuesday, March 17
- 1:30 PM · BTH100 · Microsoft · From Open Weights to Enterprise Scale: Open-Source Models · Sharmila Chockalingam
- 2:00 PM · BTH212 · Personal AI · Unlocking the power of memory in Teams with Personal AI · Sam Harkness
- 2:30 PM · BTH111 · Microsoft / NVIDIA · Scalable LLM Inference on AKS Using NVIDIA Dynamo · Mohamad Al jazaery, Anton Slutsky
- 3:00 PM · BTH204 · Mistral AI · Innovate with Mistral AI on Microsoft Foundry · Ian Mathew
- 3:30 PM · BTH104 · Microsoft · GPU-Accelerated CFD at Scale: Star-CCM+ on Azure · Jason Scheffelmaer
- 4:00 PM · BTH206 · NeuBird AI · Agentic AI for Incident Response on Microsoft Azure · Grant Griffiths
- 4:30 PM · BTH101 · GitHub · Agentic DevOps: Evolving software with GitHub Copilot · Glenn Wester
- 5:00 PM · BTH209 · Rescale · Real-World AI Physics: GM & NVIDIA on Rescale · Dinal Perera
- 5:30 PM · BTH107 · Microsoft · Intro to LoRA Fine-Tuning on Azure · Christin Pohl
- 6:30 PM · Raffle

Wednesday, March 18
- 1:00 PM · BTH219 · VAST Data · Scaling AI Infrastructure on Azure with VAST Data · Jason Vallery
- 1:30 PM · BTH110 · Microsoft · Physical AI and Robotics: The Next Frontier · F. Miller, C. Souche, D. Narayanan
- 2:00 PM · BTH105 · Microsoft · Sovereign AI options with Azure Local · Kim Lam
- 2:30 PM · BTH108 · Microsoft · Automating HPC Workflows with Copilot Agents · Param Shah
- 3:00 PM · BTH102 · Microsoft · Trustworthy Multi-Agent Workflows with Microsoft Foundry · Brian Benz
- 4:00 PM · BTH106 · Microsoft · Scaling Enterprise AI on ARO with NVIDIA H100 & H200 · Lachie Evenson
- 4:30 PM · BTH211 · WEKA · Hybrid AI Data Orchestration with WEKA NeuralMesh™ · Desiree Campbell
- 5:00 PM · BTH202 · Hammerspace · NVIDIA AI Enterprise Software with NIM · Mike Bloom
- 5:30 PM · BTH203 · Kinaxis · Reimagining Global Supply Planning with Azure · Dane Henshall
- 6:00 PM · BTH214 · AT&T · Connected AI on Azure for Manufacturing · Brad Pritchett
- 6:30 PM · Raffle

Thursday, March 19
- 11:00 AM · BTH210 · Wandelbots · Physical AI: Powering Software-Defined Automation in Robotics · Marwin Kunz, Martin George
- 11:30 AM · Raffle

Explore Our Demo Pods

Visit the Microsoft booth to see our technology in action with live demonstrations across four dedicated pod areas.

- POD 1 · Azure AI Infrastructure: End‑to‑end AI infrastructure for training and inference at scale, featuring the latest NVIDIA GPU integrations on Azure.
- POD 2 · Microsoft Foundry: Our comprehensive platform for building, deploying, and operating agentic AI systems with enterprise reliability.
- POD 3 · Building AI Together: Showcasing joint Microsoft and NVIDIA solutions across diverse industries, from manufacturing to retail.
- POD 4 · Startups Powering AI: Discover how innovative startups are running next‑generation AI workloads on the Azure platform.

Ancillary Events & Networking

Join Microsoft leadership and our partner ecosystem at these curated networking experiences.

- Sun · Mar 15 · 6:00 PM · Microsoft for Startups Executive Leadership Dinner · 📍 Morton’s Steakhouse, San Jose · Exclusive gathering for startup leaders and Microsoft executives.
- Mon · Mar 16 · 1:30 PM · Microsoft × NVIDIA Open Meet · 📍 Signia by Hilton · International Suite · Strategic alignment session for Microsoft and NVIDIA executives.
- Mon · Mar 16 · 7:30 PM · Microsoft + NVIDIA Executive Dinner · 📍 Il Fornaio, San Jose · Executive dinner for key customers and leadership teams.
- Tue · Mar 17 · 7:30 PM · Networking in AI & Tech · 📍 San Pedro Square Market · Community networking mixer for Microsoft teams, partners, and customers.
- Wed · Mar 18 · 10:00 AM to 1:00 PM · AI Innovator’s Circle Brunch: Powering Intelligent Systems Across the Ecosystem · 📍 Il Fornaio, San Jose · Hosted by Microsoft & NVIDIA at GTC. Join us for an exclusive brunch and discussion on the intelligent ecosystem.
Fernando_Aznar
Feb 28, 2026 · Azure High Performance Computing (HPC) Blog
Azure Recognized as an NVIDIA Cloud Exemplar, Setting the Bar for AI Performance in the Cloud
As AI models continue to scale in size and complexity, cloud infrastructure must deliver more than theoretical peak performance. What matters in practice is reliable, end-to-end, workload-level AI performance, where compute, networking, system software, and optimization work together to deliver predictable, repeatable results at scale. This directly translates to business value: efficient full-stack infrastructure accelerates time-to-market, maximizes ROI on GPU and cloud investments, and enables organizations to scale AI from proof-of-concept to revenue-generating products with predictable economics.

Today, Microsoft is proud to share an important milestone in partnership with NVIDIA: Azure has been validated as an NVIDIA Exemplar Cloud, becoming the first cloud provider recognized for Exemplar-class AI performance aligned with GB300-class (Blackwell generation) systems. This recognition builds on Azure’s previously validated Exemplar status for H100 training workloads and reflects NVIDIA’s confidence in Azure’s ability to extend that rigor and performance discipline into the next generation of AI platforms.

What Is NVIDIA Exemplar Cloud?

The NVIDIA Exemplar Cloud initiative celebrates cloud platforms that demonstrate robust end-to-end AI workload performance using NVIDIA’s Performance Benchmarking suite. Rather than relying on synthetic microbenchmarks, Performance Benchmarking evaluates real AI training workloads using:

- Large-scale LLM training scenarios
- Production-grade software stacks
- Optimized system and network configurations
- Workload-centric metrics such as throughput and time-to-train

Achieving Exemplar validation signals that a provider can consistently deliver world-class AI performance in the cloud, showcasing that end users are getting optimal performance value by default.

Proven Exemplar Validation on H100

Azure’s Exemplar Cloud journey began with publicly shared benchmarking results for H100-based training workloads, where Azure ND GPU clusters demonstrated exemplar performance using NVIDIA Performance Benchmarking recipes. Those results, published previously and validated through NVIDIA’s benchmarking framework, established a proven foundation of end-to-end AI performance for large-scale, production workloads running on Azure today.

Extending Exemplar-Class AI Performance to GB300-Class Platforms

Building on the rigor and learnings from H100 validation, Microsoft has now been recognized by NVIDIA as the first cloud provider to achieve Exemplar-class performance and readiness aligned with GB300-class systems. This designation reflects NVIDIA’s assessment that the same principles applied to H100 (end-to-end system tuning, networking optimization, and software alignment) are being successfully carried forward into the Blackwell generation. Rather than treating GB300 as a point solution, Azure approaches it as a continuation of a proven performance model: delivering consistent world-class AI performance in the cloud while preserving the flexibility, elasticity, and global scale customers expect.

What Enables Exemplar-Class AI Performance on Azure

Delivering Exemplar-class AI performance requires optimization across the full AI stack:

Infrastructure and Networking
- High-performance Azure ND GPU clusters with NVIDIA InfiniBand
- NUMA-aware CPU, GPU, and NIC alignment to minimize latency
- Tuned NCCL communication paths for efficient multi-GPU scaling

Software and System Optimization
- Tight integration with NVIDIA software, including Performance Benchmarking recipes and NVIDIA AI Enterprise
- Parallelism strategies aligned with large-scale LLM training
- Continuous tuning as models, workloads, and system architectures evolve

End-to-End Workload Focus
- Measuring real training performance, not isolated component metrics
- Driving repeatable improvements in application-level throughput and efficiency
- Closing the performance gap between cloud and on-premises systems without sacrificing manageability

Together, these capabilities enabled Azure to deliver consistent Exemplar-class AI performance across generations of NVIDIA platforms.

What This Means for Customers

For customers training and deploying advanced AI models, this milestone delivers clear benefits:

- World-class AI performance in a fully managed cloud environment
- Predictable scaling from small clusters to thousands of GPUs
- Faster time to train and improved performance per dollar
- Confidence that Azure is ready for Blackwell-class and GB300-class AI workloads

As AI workloads become more complex and reasoning-heavy, infrastructure performance increasingly determines outcomes. Azure’s NVIDIA Cloud Exemplar recognition provides a clear signal: customers can build and scale next-generation AI systems on Azure without compromising on performance.

Learn More

DGX Cloud Benchmarking on Azure | Microsoft Community Hub
Fernando_Aznar
Feb 18, 2026 · Azure High Performance Computing (HPC) Blog
Private Preview: Azure Managed Prometheus on VM / VMSS
Announcing private preview support for Azure Managed Prometheus on VM/VMSS, enabling unified monitoring with GPU, InfiniBand, and node-level metrics for HPC workloads.
Daramfon
Feb 18, 2026 · Azure High Performance Computing (HPC) Blog
Azure V710 v5 Series - AMD Radeon GPU - Validation of Siemens CAD - NX
Overview of Siemens NX

Siemens NX is a next-generation integrated CAD/CAM/CAE platform used by aerospace, automotive, industrial machinery, energy, medical, robotics, and defense manufacturers. It spans:

- Complex 3D modeling
- Assemblies containing thousands to millions of parts
- Surfacing and composites
- Tolerance engineering
- CAM and machining simulation
- Integrated multiphysics through Simcenter / NX Nastran

Because NX is used to design real-world engineered systems (aircraft structures, automotive platforms, satellites, robotic arms, injection molds), its usability and performance directly affect engineering velocity and product timelines.

NX Needs GPU Acceleration

NX is highly visual. It leans heavily on:

- OpenGL acceleration
- Shader-based rendering
- Hidden line removal
- Real-time shading / material rendering
- Ray-Traced Studio for photorealistic output

Typical interactions set the bar:

- Switch shading modes → CAD content must stay readable
- Zoom, section, annotate → requires stable frame pacing

NVads V710 v5-Series on Azure

The NVads V710 v5-series virtual machines on Azure are designed for GPU-accelerated workloads and virtual desktop environments. Key highlights:

- Hardware Specs:
  o GPU: AMD Radeon™ Pro V710 (up to 24 GiB frame buffer; fractional GPU options available).
  o CPU: AMD EPYC™ 9V64 F (Genoa) with SMT, base frequency 3.95 GHz, peak 4.3 GHz.
  o Memory: 16 GiB to 160 GiB.
  o Storage: NVMe-based ephemeral local storage supported.
- VM Sizes:
  o Ranges from Standard_NV4ads_V710_v5 (4 vCPUs, 16 GiB RAM, 1/6 GPU) to Standard_NV28adms_V710_v5 (28 vCPUs, 160 GiB RAM, full GPU).
- Supported Features:
  o Premium storage, accelerated networking, ephemeral OS disk.
  o Both Windows and Linux VMs supported.
  o No additional GPU licensing is required.
- AMD Radeon™ PRO GPUs offer:
  o Optimized OpenGL professional driver stack
  o Stable interactive performance with large assemblies

Business Scenarios Enabled by NX + Cloud GPU

- Engineering Anywhere: Distributed teams can securely work on the same assemblies from any geographic region.
- Supplier Ecosystem Collaboration: Tier-1/2 manufacturers and engineering partners can access controlled models without local high-end workstations.
- Secure IP Protection: Data stays in Azure; files never leave the controlled workspace.
- Faster Engineering Cycles: Visualization + simulation accelerate design reviews, decision making, and manufacturability evaluations.
- Scalable Cost Model: Pay for compute only when needed, ideal for burst design cycles and testing workloads.

Architecture Overview – Siemens NX on Azure NVads_v710

Key Architecture Elements

- Create the Azure virtual machine: NVads_v710_24
- Install the Azure AMD V710 GPU drivers
- Deploy Azure File-based storage for hosting assemblies, metadata, drawing packages, PMI, and simulation data
- Configure the VNet with Accelerated Networking
- Install NX licenses and software
- Install the NXCP & ATS test suites on the virtual machine

Qualitative Benchmark on Azure NVads_v710_24

Siemens has approved the following qualitative test results. The certification matrix update is currently in progress.

Technical variant: Complex assemblies with thousands of components maintained smooth rotation, zooming, and selection, even under concurrent session load.

NXCP and ATS test results on NVads_v710_24

Non-Interactive test results (Note: execution time in seconds)

ATS Non‑Interactive Test Results validate the correctness and stability of Siemens NX graphical rendering by comparing generated images against approved reference outputs. The minimal or zero pixel differences confirm deterministic and visually consistent rendering, indicating a stable GPU driver and visualization pipeline. The reported test execution times (in seconds) represent the duration required to complete each automated graphics validation scenario, demonstrating predictable and repeatable processing performance under non‑interactive conditions.

Interactive test results on Azure NVads_v710_24 (Note: execution time in seconds)

ATS Interactive Test Results evaluate Siemens NX graphics behavior during real‑time user interactions such as rotation, zoom, pan, sectioning, and view manipulation. The results demonstrate stable and consistent rendering during interactive workflows, confirming that the GPU driver and visualization stack reliably support user‑driven NX operations. The measured execution times (in seconds) reflect the responsiveness of each interactive graphics operation, indicating predictable behavior under live, user‑controlled conditions rather than peak performance tuning.

NX CAD functions | Automatic Tests | Interactive Tests
Grace1 Basic Tests
GrPlayer_xp64.exe <FILE> Basic_Features.tgl | Passed! | Passed!
GrPlayer_xp64.exe <FILE> Fog_Measurement_Clipping.tgl | Passed! | Passed!
GrPlayer_xp64.exe <FILE> lighting.tgl | Passed! | Passed!
GrPlayer_xp64.exe <FILE> Shadow_Bump_Environment.tgl | Passed! | Passed!
GrPlayer_xp64.exe <FILE> Texture_Map.tgl | Passed! | Passed!
Grace2 Graphics Tests
GrPlayer_64.exe <FILE> GrACETrace.tgl | Passed! | Passed!

NXCP Test Scenarios | Automatic Tests
NXCP Gdat Tests
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_1.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_2.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_4.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_5.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_6.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_7.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_8.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_9.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_10.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_11.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_12.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_13.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_14.cgi | Passed!
gdat_leg_xp64.exe -infile <FILE> leg_gfx_cert_15.cgi | Passed!

Benefits of Azure NVads_v710 (AMD GPU Platform for NX)

- Workstation-class AMD Radeon PRO graphics drivers baked into Azure: Ensures an ISV-validated driver pipeline.
- Excellent performance for CAD workloads: Makes GPU-accelerated NX accessible to wider user bases.
- Remote engineering enablement: Critical for companies that now operate global design teams.
- Elastic scale: Spin up GPUs when development peaks; scale down when idle.

Conclusion

Siemens NX on Azure NVads_v710 powered by AMD GPUs enables enterprise-class CAD/CAM/CAE experiences in the cloud. NX benefits directly from workstation-grade OpenGL optimization, shading stability, and Ray Traced Studio acceleration, allowing engineers to interact smoothly with large assemblies, run visualization workloads, and perform design reviews without local hardware dependencies.

Right‑sized GPU delivers workstation‑class experience at lower cost
The family enables fractional GPU allocation (down to 1/6 of a Radeon™ Pro V710), allowing Siemens NX deployments to be right‑sized per user role. This avoids over‑provisioning full GPUs while still delivering ISV‑grade OpenGL and visualization stability, resulting in a lower per‑engineer cost compared to fixed full‑GPU cloud or on‑prem workstations.

Elastic scale improves cost efficiency for burst engineering workloads
NVads_V710_v5 instances support on-demand scaling and ephemeral NVMe storage, allowing NX environments to scale up for design reviews, supplier collaboration, or peak integration cycles and scale down when idle. This consumption model provides a cost advantage over fixed on-prem workstations that remain underutilized outside peak engineering periods.

NX visualization pipelines benefit from balanced CPU–GPU architecture
The combination of high‑frequency AMD EPYC™ Genoa CPUs (up to 4.3 GHz) and Radeon™ Pro V710 GPUs addresses Siemens NX’s mixed CPU–GPU workload profile, where scene graph processing, tessellation, and OpenGL submission are CPU‑sensitive. This balance reduces idle GPU cycles, improving effective utilization and overall cost efficiency when compared with GPU‑heavy but CPU‑constrained configurations.

The result is a scalable, secure, and cost-efficient engineering platform that supports distributed innovation, supplier collaboration, and digital product development workflows, all backed by the rendering and interaction consistency of AMD GPU virtualization on Azure.
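The fractional GPU sizing described above can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the frame buffer is divided in proportion to the GPU share, and the function names `frame_buffer_gib` and `vms_per_gpu` are hypothetical helpers, not part of any Azure tool. The 24 GiB figure and the 1/6 share come from the specifications listed in this post.

```python
from fractions import Fraction

# Full frame buffer of one Radeon Pro V710, as listed in the hardware specs.
FULL_FRAME_BUFFER_GIB = 24


def frame_buffer_gib(gpu_fraction: Fraction) -> Fraction:
    """Frame buffer for a VM holding the given share of one GPU.

    Assumption: memory is split in proportion to the GPU share.
    """
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in (0, 1]")
    return FULL_FRAME_BUFFER_GIB * gpu_fraction


def vms_per_gpu(gpu_fraction: Fraction) -> int:
    """Number of equally sized VMs that can share one physical GPU."""
    if not 0 < gpu_fraction <= 1:
        raise ValueError("gpu_fraction must be in (0, 1]")
    return int(1 / gpu_fraction)


if __name__ == "__main__":
    share = Fraction(1, 6)  # smallest share mentioned in the post
    print(f"{share} GPU -> {frame_buffer_gib(share)} GiB frame buffer, "
          f"{vms_per_gpu(share)} VMs per physical GPU")
```

Under that proportional-split assumption, a 1/6 share works out to 4 GiB of frame buffer, with six such VMs per physical GPU. That is the kind of per-role sizing check an NX administrator might run before choosing a VM size.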
Sunita_AZ0708
Jan 07, 2026 · Azure High Performance Computing (HPC) Blog
Monitoring HPC & AI Workloads on Azure H/N VMs Using Telegraf and Azure Monitor (GPU & InfiniBand)
As HPC & AI workloads continue to scale in complexity and performance demands, ensuring visibility into the underlying infrastructure becomes critical. This guide presents an essential monitoring solution for AI infrastructure deployed on Azure RDMA-enabled virtual machines (VMs), focusing on NVIDIA GPUs and Mellanox InfiniBand devices. By leveraging the Telegraf agent and Azure Monitor, this setup enables real-time collection and visualization of key hardware metrics, including GPU utilization, GPU memory usage, InfiniBand port errors, and link flaps. It provides operational insights vital for debugging, performance tuning, and capacity planning in high-performance AI environments.

In this blog, we'll walk through the process of configuring Telegraf to collect and send GPU and InfiniBand monitoring metrics to Azure Monitor. This end-to-end guide covers all the essential steps to enable robust monitoring for NVIDIA GPUs and Mellanox InfiniBand devices, empowering you to track, analyze, and optimize performance across your HPC & AI infrastructure on Azure.

DISCLAIMER: This is an unofficial configuration guide and is not supported by Microsoft. Please use it at your own discretion. The setup is provided "as-is" without any warranties, guarantees, or official support.

While Azure Monitor offers robust monitoring capabilities for CPU, memory, storage, and networking, it does not natively support GPU or InfiniBand metrics for Azure H- or N-series VMs. To monitor GPU and InfiniBand performance, additional configuration using third-party tools such as Telegraf is required. As of the time of writing, Azure Monitor does not include built-in support for these metrics without external integrations.

🔔 Update: Supported Monitoring Option Now Available

Update (December 2025): At the time this guide was written, monitoring InfiniBand (IB) and GPU metrics on Azure H-series and N-series VMs required a largely unofficial approach using Telegraf and Azure Monitor. Microsoft has since introduced a supported solution: Azure Managed Prometheus on VM / VM Scale Sets (VMSS), currently available in private preview. This new capability provides a native, managed Prometheus experience for collecting infrastructure and accelerator metrics directly from VMs and VMSS. It significantly simplifies deployment, lifecycle management, and long-term support compared to custom Telegraf-based setups. For new deployments, customers are encouraged to evaluate Azure Managed Prometheus on VM / VMSS as the preferred and supported approach for HPC and AI workload monitoring.

Official announcement: Private Preview: Azure Managed Prometheus on VM / VMSS

Step 1: Prepare Azure to Receive GPU and IB Metrics from Telegraf Agents on a VM or VMSS

Register the microsoft.insights resource provider in your Azure subscription.

Refer: Resource providers and resource types - Azure Resource Manager | Microsoft Learn

Step 2: Enable Managed Service Identities to Authenticate an Azure VM or Azure VMSS

In this example we use a managed identity for authentication. You can also use user-assigned managed identities or a service principal to authenticate the VM.

Refer: telegraf/plugins/outputs/azure_monitor at release-1.15 · influxdata/telegraf (github.com)

Step 3: Set Up the Telegraf Agent Inside the VM or VMSS to Send Data to Azure Monitor

In this example, I'll use an Azure Standard_ND96asr_v4 VM with the Ubuntu-HPC 2204 image to configure the environment for VMSS. The Ubuntu-HPC 2204 image comes with pre-installed NVIDIA GPU drivers, CUDA, and InfiniBand drivers. If you opt for a different image, ensure that you manually install the necessary GPU drivers, CUDA toolkit, and InfiniBand driver.

Next, download and run the gpu-ib-mon_setup.sh script to install the Telegraf agent on Ubuntu 22.04. This script also configures the NVIDIA SMI input plugin and the InfiniBand input plugin, and sets up the Telegraf configuration to send data to Azure Monitor.

Note: The gpu-ib-mon_setup.sh script is currently supported and tested only on Ubuntu 22.04.

For a description of the InfiniBand counters collected by Telegraf, see https://enterprise-support.nvidia.com/s/article/understanding-mlx5-linux-counters-and-status-parameters

Run the following commands:

    wget https://raw.githubusercontent.com/vinil-v/gpu-ib-monitoring/refs/heads/main/scripts/gpu-ib-mon_setup.sh -O gpu-ib-mon_setup.sh
    chmod +x gpu-ib-mon_setup.sh
    ./gpu-ib-mon_setup.sh

Test the Telegraf configuration by executing the following command:

    sudo telegraf --config /etc/telegraf/telegraf.conf --test

Step 4: Creating Dashboards in Azure Monitor to Check NVIDIA GPU and InfiniBand Usage

Telegraf includes an output plugin specifically designed for Azure Monitor, allowing custom metrics to be sent directly to the platform. Since Azure Monitor supports a metric resolution of one minute, the Telegraf output plugin aggregates metrics into one-minute intervals and sends them to Azure Monitor at each flush cycle. Metrics from each Telegraf input plugin are stored in a separate Azure Monitor namespace, typically prefixed with Telegraf/ for easy identification.

To visualize NVIDIA GPU usage, go to the Metrics section in the Azure portal:

- Set the scope to your VM.
- Choose the Metric Namespace as Telegraf/nvidia-smi.
- From there, you can select and display various GPU metrics such as utilization, memory usage, temperature, and more. In this example we use the GPU memory_used metric.
- Use filters and splits to analyze data across multiple GPUs or over time.

To monitor InfiniBand performance, repeat the same process:

- In the Metrics section, set the scope to your VM.
- Select the Metric Namespace as Telegraf/infiniband.
- You can visualize metrics such as port status, data transmitted/received, and error counters. In this example, we use the link flap metric (link_downed) to check for InfiniBand link flaps.
- Use filters to break down the data by port or metric type for deeper insights.

Link_downed metric

Note: The link_downed metric with Aggregation: Count returns incorrect values. Use the Max or Min aggregation instead.

Port_rcv_data metric

Creating custom dashboards in Azure Monitor with both the Telegraf/nvidia-smi and Telegraf/infiniband namespaces allows for unified visibility into GPU and InfiniBand.

Testing InfiniBand and GPU Usage

If you're testing GPU metrics and need a reliable way to simulate multi-GPU workloads, especially over InfiniBand, here’s a straightforward solution using the NCCL benchmark suite. This method is ideal for verifying GPU and network monitoring setups.

The NCCL benchmark and OpenMPI are part of the Ubuntu HPC 22.04 image. Update the variables according to your environment, and update the hostfile with your hostnames.

    module load mpi/hpcx-v2.13.1
    export CUDA_VISIBLE_DEVICES=2,3,0,1,6,7,4,5
    mpirun -np 16 --map-by ppr:8:node -hostfile hostfile \
        -mca coll_hcoll_enable 0 --bind-to numa \
        -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
        -x LD_LIBRARY_PATH=/usr/local/nccl-rdma-sharp-plugins/lib:$LD_LIBRARY_PATH \
        -x CUDA_DEVICE_ORDER=PCI_BUS_ID \
        -x NCCL_SOCKET_IFNAME=eth0 \
        -x NCCL_TOPO_FILE=/opt/microsoft/ndv4-topo.xml \
        -x NCCL_DEBUG=WARN \
        /opt/nccl-tests/build/all_reduce_perf -b 8 -e 8G -f 2 -g 1 -c 1

Alternate: GPU Load Simulation Using TensorFlow

If you're looking for a more application-like load (e.g., distributed training), I’ve prepared a script that sets up a multi-GPU TensorFlow training environment using Anaconda. This is a great way to simulate real-world GPU workloads and validate your monitoring pipelines.

To get started, run the following:

    wget -q https://raw.githubusercontent.com/vinil-v/gpu-monitoring/refs/heads/main/scripts/gpu_test_program.sh -O gpu_test_program.sh
    chmod +x gpu_test_program.sh
    ./gpu_test_program.sh

With either method, NCCL benchmarks or TensorFlow training, you’ll be able to simulate realistic GPU usage and validate your GPU and InfiniBand monitoring setup with confidence. Happy testing!

References:

- Ubuntu HPC on Azure
- ND A100 v4-series GPU VM Sizes
- Telegraf Azure Monitor Output Plugin (v1.15)
- Telegraf NVIDIA SMI Input Plugin (v1.15)
- Telegraf InfiniBand Input Plugin Documentation
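The link-flap check from the Link_downed note above can also be approximated directly on the VM, which is handy for cross-checking what the dashboard shows. The sketch below is a minimal illustration, not part of the guide's scripts: it assumes the standard Linux sysfs layout for InfiniBand port counters (`/sys/class/infiniband/<device>/ports/<port>/counters/link_downed`), and the helper names `read_link_downed` and `new_link_flaps` are hypothetical. Because `link_downed` is a cumulative counter, the number of new flaps in a window is estimated as the largest sample minus the smallest, which is one plausible reason the Max and Min aggregations are more useful than Count for this metric.

```python
from pathlib import Path
from typing import Dict, List

# Assumption: standard Linux sysfs location for InfiniBand port counters.
SYSFS_IB_ROOT = Path("/sys/class/infiniband")


def read_link_downed(root: Path = SYSFS_IB_ROOT) -> Dict[str, int]:
    """Return {"<device>/<port>": link_downed} for every port under root."""
    counters: Dict[str, int] = {}
    if not root.is_dir():
        return counters  # no InfiniBand devices visible on this machine
    for counter_file in sorted(root.glob("*/ports/*/counters/link_downed")):
        port_dir = counter_file.parent.parent   # .../<device>/ports/<port>
        device = port_dir.parent.parent.name    # <device>
        key = f"{device}/{port_dir.name}"
        counters[key] = int(counter_file.read_text().strip())
    return counters


def new_link_flaps(samples: List[int]) -> int:
    """Estimate flaps within a window of cumulative link_downed samples.

    Uses max - min rather than the number of samples, so a steady counter
    reports zero new flaps. Assumes the counter did not reset in the window.
    """
    if not samples:
        return 0
    return max(samples) - min(samples)


if __name__ == "__main__":
    for port, value in read_link_downed().items():
        print(f"{port}: link_downed={value}")
```

On a machine without InfiniBand devices the script prints nothing, since `read_link_downed` returns an empty dictionary when the sysfs directory is absent.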
vinilv
Dec 18, 2025 · Azure High Performance Computing (HPC) Blog
Azure NCv6 Public Preview: The new Unified Platform for Converged AI and Visual Computing
As enterprises accelerate adoption of physical AI (AI models interacting with real-world physics), digital twins (virtual replicas of physical systems), LLM inference (running language models for predictions), and agentic workflows (autonomous AI-driven processes), the demand for infrastructure that bridges high-end visualization and generative AI inference has never been higher. Today, we are pleased to announce the Public Preview of the NC RTX PRO 6000 BSE v6 series, powered by the NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs. The NCv6 series represents a generational leap in Azure’s visual compute portfolio, designed to be the dual engine for both Industrial Digitalization and cost-effective LLM inference. By leveraging NVIDIA Multi-Instance GPU (MIG) capabilities, the NCv6 platform offers affordable sizing options similar to our legacy NCv3 and NVv5 series. This provides a seamless upgrade path to Blackwell performance, enabling customers to run complex NVIDIA Omniverse simulations and multimodal AI agents with greater efficiency. Why Choose Azure NCv6? While traditional GPU instances often force a choice between "compute" (AI) and "graphics" (visualization) optimizations, the NCv6 breaks this silo. Built on the NVIDIA Blackwell architecture, it provides a "right-sized" acceleration platform for workloads that demand both ray-traced fidelity and Tensor Core performance. As outlined in our product documentation, these VMs are ideal for converged AI and visual computing workloads, including: Real-time digital twin and NVIDIA Omniverse simulation. LLM Inference and RAG (Retrieval-Augmented Generation) on small to medium AI models. High-fidelity 3D rendering, product design, and video streaming. Agentic AI application development and deployment. Scientific visualization and High-Performance Computing (HPC). Key Features of the NCv6 Platform The Power of NVIDIA Blackwell At the heart of the NCv6 is the NVIDIA RTX PRO 6000 Blackwell Server Edition GPU. 
This GPU features 96 GB of GDDR7 memory. This large frame buffer can hold complex multimodal AI models and high-resolution textures that did not fit on previous generations. Host Performance: Intel Granite Rapids To ensure your workloads aren't bottlenecked by the CPU, the VM host is equipped with Intel Xeon Granite Rapids processors. These provide an all-core turbo frequency of up to 4.2 GHz, so that demanding pre- and post-processing steps, common in rendering and physics simulations, are handled efficiently. Optimized Sizing for Every Workflow We understand that one size does not fit all. The NCv6 series introduces three distinct sizing categories to match your specific unit economics: General Purpose: Balanced CPU-to-GPU ratios (up to 320 vCPUs) for diverse workloads. Compute Optimized: Higher vCPU density for heavy simulation and physics tasks. Memory Optimized: Massive memory footprints (up to 1,280 GB RAM) for data-intensive applications. Crucially, for smaller inference jobs or VDI, we will also offer fractional GPU options, allowing you to right-size your infrastructure and optimize costs. NCv6 Technical Specifications GPU: NVIDIA RTX PRO 6000 Blackwell Server Edition (96 GB GDDR7). Processor: Intel Xeon Granite Rapids (up to 4.2 GHz Turbo). vCPUs: 16 – 320 (scalable across General Purpose, Compute Optimized, and Memory Optimized sizes). System Memory: 64 GB – 1,280 GB DDR5. Network: Up to 200,000 Mbps (200 Gbps) Azure Accelerated Networking. Storage: Up to 2 TB local temp storage; support for Premium SSD v2 and Ultra Disk. Real-World Applications The NCv6 is built for versatility, powering everything from pixel-perfect rendering to high-throughput language reasoning: Production Generative AI & Inference: Deploy self-hosted LLMs and RAG pipelines with optimized unit economics. 
The NCv6 is ideal for serving ranking models, recommendation engines, and content generation agents where low latency and cost-efficiency are paramount. Automotive & Manufacturing: Validate autonomous driving sensors (LiDAR/Radar) and train physical AI models in high-fidelity simulation environments before they ever touch the real world. Next-Gen VDI & Azure Virtual Desktop: Modernize remote workstations with NVIDIA RTX Virtual Workstation capabilities. By leveraging fractional GPU options, organizations can deliver high-fidelity, accelerated desktop experiences to distributed teams—offering a superior, high-density alternative to legacy NVv5 deployments. Media & Entertainment: Accelerate render farms for VFX studios requiring burst capacity, while simultaneously running generative AI tools for texture creation and scene optimization. Conclusion: The Engine for the Era of Converged AI The Azure NCv6 series redefines the boundaries of cloud infrastructure. By combining the raw power of NVIDIA’s Blackwell architecture with the high-frequency performance of Intel Granite Rapids, we are moving beyond just "visual computing." Innovators can now leverage a unified platform to build the industrial metaverse, deploy intelligent agents, and scale production AI—all with the enterprise-grade security and hybrid reach of Azure. Ready to experience the next generation? Sign up for the NCv6 Public Preview here.
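To make the "small to medium AI models" sizing guidance concrete, here is a rough back-of-the-envelope sketch of whether a model's weights fit in the 96 GB frame buffer. The bytes-per-parameter values are standard for each precision, but the 20% headroom reserved for KV cache and activations is purely an illustrative assumption, and the function names are hypothetical rather than part of any Azure or NVIDIA tooling.

```python
GPU_MEMORY_GB = 96.0  # frame buffer size stated in the post

# Storage cost per parameter for common inference precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits_on_one_gpu(params_billions: float, precision: str,
                    headroom_fraction: float = 0.2) -> bool:
    """Check weights against the frame buffer minus a reserve.

    headroom_fraction is an assumed reserve for KV cache and activations,
    not a measured or documented value.
    """
    budget_gb = GPU_MEMORY_GB * (1.0 - headroom_fraction)
    return weight_footprint_gb(params_billions, precision) <= budget_gb


if __name__ == "__main__":
    for size, precision in [(8, "fp16"), (70, "fp16"), (70, "fp8")]:
        verdict = "fits" if fits_on_one_gpu(size, precision) else "does not fit"
        print(f"{size}B @ {precision}: {weight_footprint_gb(size, precision):.0f} GB, {verdict}")
```

Real deployments should measure actual memory use, since context length and batch size can change the KV cache footprint substantially.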
rishabv90
Nov 25, 2025 · Azure High Performance Computing (HPC) Blog
Azure CycleCloud 8.8 and CCWS 1.2 at SC25 and Ignite
Azure CycleCloud 8.8: Advancing HPC & AI Workloads with Smarter Health Checks Azure CycleCloud continues to evolve as the backbone for orchestrating high-performance computing (HPC) and AI workloads in the cloud. With the release of CycleCloud 8.8, users gain access to a suite of new features designed to streamline cluster management, enhance health monitoring, and future-proof their HPC environments. Key Features in CycleCloud 8.8 1. ARM64 HPC Support The platform expands its hardware compatibility with ARM64 HPC support, opening new possibilities for energy-efficient and cost-effective compute clusters. This includes access to the newer generation of GB200 VMs as well as general ARM64 support, enabling new AI workloads at a scale not previously possible. 2. Slurm Topology-Aware Scheduling The integration of topology-aware scheduling for Slurm clusters allows CycleCloud users to optimize job placement based on network and hardware topology. This leads to improved performance for tightly coupled HPC workloads and better utilization of available resources. 3. NVIDIA MNNVL and IMEX Support With expanded support for NVIDIA MNNVL and IMEX, CycleCloud 8.8 ensures compatibility with the latest GPU technologies. This enables users to leverage cutting-edge hardware for AI training, inference, and scientific simulations. 4. HealthAgent: Event-Driven Health Monitoring and Alerting A standout feature in this release is the enhanced HealthAgent, which delivers event-driven health monitoring and alerting. CycleCloud now proactively detects issues across clusters, nodes, and interconnects, providing real-time notifications and actionable insights. This improvement helps maintain uptime and reliability in large-scale HPC deployments. HealthAgent supports both impactful health checks, which can only run while nodes are idle, and non-impactful health checks, which can run throughout the lifecycle of a job. 
This allows CycleCloud to alert not only on issues that occur while nodes are starting, but also on failures that develop on long-running nodes. Later releases of CycleCloud will also include automatic remediation for common failures, so stay tuned! 5. Enterprise Linux 9 and Ubuntu 24 support One common request has been wider support for the various Enterprise Linux (EL) 9 variants, including RHEL 9, AlmaLinux 9, and Rocky Linux 9. CycleCloud 8.8 introduces support for those distributions as well as the latest Ubuntu HPC release. Why These Features Matter The CycleCloud 8.8 release marks a significant step forward for organizations running HPC and AI workloads in Azure. The improved health check support, anchored by HealthAgent and with automatic remediation planned for later releases, means less downtime, faster troubleshooting, and greater confidence in cloud-based research and innovation. Whether you’re managing scientific simulations, AI model training, or enterprise analytics, CycleCloud’s latest features help you build resilient, scalable, and future-ready HPC environments. Key Features in CycleCloud Workspace for Slurm 1.2 Along with the release of CycleCloud 8.8 comes a new CycleCloud Workspace for Slurm (CCWS) release. This release includes the General Availability of features that were previously in preview, such as Open OnDemand, Cendio ThinLinc, and managed Grafana monitoring capabilities. In addition to previously announced features, CCWS 1.2 also includes support for a new Hub and Spoke deployment model. This allows customers to retain a central hub of shared resources that can be re-used between cluster deployments, with "disposable" spoke clusters that branch from the hub. Hub and Spoke deployments help customers who need to re-deploy clusters in order to upgrade their operating system, deploy new versions of software, or even reconfigure the overall architecture of their Slurm clusters. 
Come visit us at SC25 and MS Ignite To learn more about these features, come visit us at the Microsoft booth at #SC25 in St. Louis, MO and #Microsoft #Ignite in San Francisco this week!
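The distinction between impactful and non-impactful health checks can be sketched as a simple eligibility rule. This is a minimal illustration of the idea only: the class names, node states, and example check names are hypothetical and do not reflect the actual HealthAgent API or configuration format. Treating a starting node the same as an idle one is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    STARTING = "starting"
    IDLE = "idle"
    RUNNING_JOB = "running_job"


@dataclass(frozen=True)
class HealthCheck:
    name: str
    impactful: bool  # True: would disturb a running job, so needs a job-free node


def runnable_checks(checks: list[HealthCheck], state: NodeState) -> list[HealthCheck]:
    """Return the checks that may run on a node in the given state.

    Impactful checks are held back while a job is running; non-impactful
    checks are always eligible.
    """
    if state is NodeState.RUNNING_JOB:
        return [c for c in checks if not c.impactful]
    return list(checks)


if __name__ == "__main__":
    # Hypothetical check names, chosen only for illustration.
    checks = [
        HealthCheck("gpu_bandwidth_stress", impactful=True),
        HealthCheck("ecc_error_counters", impactful=False),
    ]
    for state in NodeState:
        names = [c.name for c in runnable_checks(checks, state)]
        print(f"{state.value}: {names}")
```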
anhoward
Nov 19, 2025 · Azure High Performance Computing (HPC) Blog
Performance and Scalability of Azure HBv5-series Virtual Machines
Azure HBv5-series virtual machines (VMs) for CPU-based high performance computing (HPC) are now Generally Available. This blog provides in-depth information about the technical underpinnings, performance, cost, and management implications of these HPC-optimized VMs. Azure HBv5 VMs bring leadership levels of performance, cost optimization, and server (VM) consolidation for a variety of workloads driven by memory performance, such as computational fluid dynamics, weather simulation, geoscience simulations, and finite element analysis. For these applications, and compared to HBv4 VMs (previously the highest-performance offering for these workloads), HBv5 provides up to: 5x higher performance for CFD workloads with 43% lower costs; 3.2x higher performance for weather simulation with 16% lower costs; and 2.8x higher performance for geoscience workloads at the same cost. HBv5-series Technical Overview & VM Sizes Each HBv5 VM features several new technologies for HPC customers, including: up to 6.6 TB/s of memory bandwidth (STREAM TRIAD) and 432 GB memory capacity; up to 368 physical cores per VM (user configurable) with custom AMD EPYC CPUs, Zen4 microarchitecture (SMT disabled); a base clock of 3.5 GHz (~1 GHz higher than other 96-core EPYC CPUs) and a boost clock of 4 GHz across all cores; 800 Gb/s NVIDIA Quantum-2 InfiniBand (4 x 200 Gb/s CX-7) (~2x higher than HBv4 VMs); 180 Gb/s Azure Accelerated Networking (~2.2x higher than HBv4 VMs); and 15 TB local NVMe SSD with up to 50 GB/s (read) and 30 GB/s (write) of bandwidth (~4x higher than HBv4 VMs). The highlight feature of HBv5 VMs is their use of high-bandwidth memory (HBM). HBv5 VMs utilize a custom AMD CPU that increases memory bandwidth by ~9x vs. dual-socket 4th Gen EPYC (Zen4, “Genoa”) server platforms, and ~7x vs. dual-socket EPYC (Zen5, “Turin”) server platforms. HBv5 delivers similar levels of memory bandwidth improvement compared to the highest-end alternatives from the Intel Xeon and ARM CPU ecosystems. 
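The headline figures above translate into per-core numbers with simple division. The sketch below uses only the stated 6.6 TB/s, 432 GB, and 368 cores; the 48-core entry is an illustrative constrained-core count rather than a confirmed VM size, and it assumes the full 432 GB stays available to the VM when fewer cores are exposed.

```python
PEAK_BANDWIDTH_GB_PER_S = 6600.0  # 6.6 TB/s STREAM TRIAD, as stated
MEMORY_CAPACITY_GB = 432.0        # as stated
MAX_CORES = 368                   # as stated


def per_core(total: float, cores: int) -> float:
    """Divide a VM-level resource evenly across the exposed cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total / cores


if __name__ == "__main__":
    # 48 is an assumed, illustrative constrained-core count.
    for cores in (MAX_CORES, 48):
        print(
            f"{cores:>3} cores: "
            f"{per_core(MEMORY_CAPACITY_GB, cores):.1f} GB RAM per core, "
            f"{per_core(PEAK_BANDWIDTH_GB_PER_S, cores):.1f} GB/s peak bandwidth per core"
        )
```

Whether a constrained-core size actually reaches the full VM-level bandwidth depends on how the exposed cores map onto the memory subsystem, so the per-core bandwidth for the smaller count is an upper bound at best.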
HBv5-series VMs are available in the following sizes with specifications as shown below. Just like existing H-series VMs, HBv5-series includes constrained-core VM sizes, enabling customers to optimize their VM dimensions for a variety of scenarios: ISV licensing that constrains a job to a targeted number of cores; maximum performance per VM or maximum performance per core; and anything from minimum RAM per core (1.2 GB, suitable for strong scaling workloads) to maximum memory per core (9 GB, suitable for large datasets and weak scaling workloads). Table 1: Technical specifications of HBv5-series VMs Note: Maximum clock frequencies (FMAX) are based on product specifications of the AMD EPYC 9V64H processor. Clock frequencies experienced by a customer are a function of a variety of factors, including but not limited to the arithmetic intensity (SIMD) and parallelism of an application. For more information, see the official documentation for HBv5-series VMs. Microbenchmark Performance This section focuses on microbenchmarks that characterize performance of the memory subsystem, compute capabilities, and InfiniBand network of HBv5 VMs. Memory & Compute Performance To capture synthetic performance, we ran the following industry-standard benchmarks: STREAM (memory bandwidth); High Performance Conjugate Gradient (HPCG) (sparse linear algebra); and High Performance Linpack (HPL) (dense linear algebra). Absolute results and comparisons to HBv4 VMs are shown in Table 2, below: Table 2: Results of HBv5 running the STREAM, HPCG, and HPL benchmarks. Note: STREAM was run with the following CLI parameters: OMP_NUM_THREADS=368 OMP_PROC_BIND=true OMP_PLACES=cores ./amd_zen_stream. STREAM data size: 2621440000 bytes. InfiniBand Networking Performance Each HBv5-series VM is equipped with four NVIDIA Quantum-2 network interface cards (NICs), each operating at 200 Gb/s for an aggregate bandwidth of 800 Gb/s per VM (node). 
We ran the industry-standard IB perftest benchmarks (based on the OSU benchmarks) across two (2) HBv5-series VMs, with results shown in Figures 1-3, below. Note: all results below are for a single 200 Gb/s (uni-directional) link only. At a VM level, all bandwidth results below are 4x higher as there are four (4) InfiniBand links per HBv5 server. Unidirectional bandwidth: numactl -c 0 ib_send_bw -aF -q 2 Figure 1: results showing 99% achieved uni-directional bandwidth vs. theoretical peak. Bi-directional bandwidth: numactl -c 0 ib_send_bw -aF -q 2 -b Figure 2: results showing 99% achieved bi-directional bandwidth vs. theoretical peak. Latency: Figure 3: results showing latencies as low as 1.25 microseconds among HBv5 VMs. Latencies experienced by users will depend on the message sizes employed by applications. Application Performance, Cost/Performance, and Server (VM) Consolidation This section focuses on characterizing HBv5-series VMs when running common, real-world HPC applications, with an emphasis on those known to be meaningfully bound by memory performance, as that is the focus of the HB-series family. 
We characterize HBv5 below in three (3) ways of high relevance to customer interests: Performance (“how much faster can it do the work”); Cost/Performance (“how much can it reduce the costs to complete the work”); and Fleet consolidation (“how much can a customer simplify the size and scale of compute fleet management while still being able to do the work”). Where possible, we have included comparisons to other Azure HPC VMs, including: Azure HBv4/HX series with 176 physical cores of 4th Gen AMD EPYC CPUs with 3D V-Cache (“Genoa-X”) (HBv4 specifications, HX specifications); Azure HBv3 with 120 physical cores of 3rd Gen AMD EPYC CPUs with 3D V-Cache (“Milan-X”) (HBv3 specifications); and Azure HBv2 with 120 physical cores of 2nd Gen AMD EPYC CPUs (“Rome”) (full specifications). Unless otherwise noted, all tests shown below were performed with: AlmaLinux 8.10 (image URN: almalinux:almalinux-hpc:8_10-hpc-gen2:latest); for scaling tests, image URN: almalinux:almalinux-hpc:8_6-hpc-gen2:latest; and NVIDIA HPC-X MPI. Further, all Cost/Performance comparisons use list-price, Pay-As-You-Go (PAYG) rates found on Azure Linux Virtual Machines Pricing. Absolute costs will be a function of a customer’s workload, model, and consumption approach (PAYG vs. Reserved Instance, etc.). That said, the relative cost/performance comparisons illustrated below should hold for the workload and model combinations shown, regardless of the consumption approach. Computational Fluid Dynamics (CFD) OpenFOAM – version 2306 with 100M Cell Motorbike case Figure 4: HBv5 vs. HBv4 on OpenFOAM with the Motorbike 100M cell case. HBv5 VMs provide a 4.8x performance increase over HBv4 VMs. Figure 5: The cost to complete the OpenFOAM Motorbike 100M case is just 57% of what it costs to complete the same case on HBv4. 
Above, we can see that for customers running OpenFOAM cases similar to the size and complexity of the 100M cell Motorbike problem, organizations can consolidate their server (VM) deployments by approximately a factor of five (5). Palabos – version 1.01 with 3D Cavity, 1001 x 1001 x 1001 cells case Figure 6: On Palabos, a Lattice Boltzmann solver using a streaming memory access pattern, HBv5 VMs provide a 4.4x performance increase over HBv4 VMs. Figure 7: The cost to complete the Palabos 3D Cavity case is just 62% of what it costs to complete the same case on HBv4. Above, we can see that for customers running Palabos with cases similar to the size and complexity of the 1001 x 1001 x 1001 cell 3D Cavity problem, organizations can consolidate their server (VM) deployments by a factor of ~4.5. Ansys Fluent – version 2025 R2 with F1 Racecar 140M case Figure 8: On Ansys Fluent, HBv5 VMs provide a 3.4x performance increase over HBv4 VMs. Figure 9: The cost to complete the Ansys Fluent F1 Racecar 140M case is just 81% of what it costs to complete the same case on HBv4. Above, we can see that for customers running Ansys Fluent with cases similar to the size and complexity of the 140M cell F1 Racecar problem, organizations can consolidate their server (VM) deployments by a factor of ~3.5. Siemens Star-CCM+ – version 17.04.005 with AeroSUV Steady Coupled 106M case Figure 10: On Star-CCM+, HBv5 VMs provide a 3.4x performance increase over HBv4 VMs. Figure 11: The cost to complete the Siemens Star-CCM+ AeroSUV Steady Coupled 106M case is just 81% of what it costs to complete the same case on HBv4. Above, we can see that for customers running Star-CCM+ with cases similar to the size and complexity of the 106M cell AeroSUV Steady Coupled problem, organizations can consolidate their server (VM) deployments by a factor of ~3.5. 
Weather Modeling WRF – version 4.2.2 with CONUS 2.5KM case Figure 12: On WRF, HBv5 VMs provide a 3.27x performance increase over HBv4 VMs. Figure 13: The cost to complete the WRF CONUS 2.5KM case is just 84% of what it costs to complete the same case on HBv4. Above, we can see that for customers running WRF with cases similar to the size and complexity of the 2.5 km CONUS case, organizations can consolidate their server (VM) deployments by a factor of ~3. Energy Research Devito – version 4.8.7 with Acoustic Forward case Figure 14: On Devito, HBv5 VMs provide a 3.27x performance increase over HBv4 VMs. Figure 15: The cost to complete the Devito Acoustic Forward OP case is equivalent to what it costs to complete the same case on HBv4. Above, we can see that for customers running Devito with cases similar to the size and complexity of the Acoustic Forward OP case, organizations can consolidate their server (VM) deployments by a factor of ~3. Molecular Dynamics NAMD – version 2.15a2 with STMV 20M case Figure 16: On NAMD, HBv5 VMs provide a 2.18x performance increase over HBv4 VMs. Figure 17: The cost to complete the NAMD STMV 20M case is 26% higher on HBv5 than what it costs to complete the same case on HBv4. Above, we can see that for customers running NAMD with cases similar to the size and complexity of the STMV 20M case, organizations can consolidate their server (VM) deployments by a factor of ~2. Notably, NAMD is a compute-bound case rather than a memory-performance-bound one. We include it here to illustrate that not all workloads are a good fit for HBv5. This latest Azure HPC VM is the fastest at this workload on the Microsoft Cloud, but the workload does not benefit substantially from HBv5’s premium levels of memory bandwidth. NAMD would instead run more cost-efficiently on a CPU that supports AVX512 instructions natively or, much better still, a modern GPU. 
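The relationship between speedup and cost in the figures above follows from one line of arithmetic: the relative cost to complete a job equals the hourly price ratio divided by the speedup. The sketch below works backwards from the reported speedups and relative costs to the price ratio they imply. That ratio is inferred here for illustration and is not a published figure; only the cases with an explicit cost percentage are included.

```python
# (speedup over HBv4, cost on HBv5 relative to HBv4) as reported in the figures above
REPORTED = {
    "OpenFOAM Motorbike 100M": (4.8, 0.57),
    "Palabos 3D Cavity": (4.4, 0.62),
    "Ansys Fluent F1 Racecar 140M": (3.4, 0.81),
    "WRF CONUS 2.5KM": (3.27, 0.84),
    "NAMD STMV 20M": (2.18, 1.26),
}


def implied_price_ratio(speedup: float, relative_cost: float) -> float:
    """relative_cost = price_ratio / speedup, so price_ratio = relative_cost * speedup."""
    return relative_cost * speedup


def relative_cost(speedup: float, price_ratio: float) -> float:
    """Cost to finish the same job on the faster VM, relative to the baseline VM."""
    return price_ratio / speedup


if __name__ == "__main__":
    for case, (speedup, cost) in REPORTED.items():
        print(f"{case}: implied price ratio {implied_price_ratio(speedup, cost):.2f}")
```

Under this reading, the implied ratio is also the break-even speedup: a workload whose speedup falls below it, as NAMD's does, costs more per job on HBv5 even though it finishes sooner.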
Scalability of HBv5-series VMs Weak Scaling Weak scaling measures how well a parallel application or system performs when both the number of processing elements and the problem size increase proportionally, so that the workload per processor remains constant. Weak scaling cases are often employed when time-to-solution is fixed (e.g. it is acceptable to solve a problem within a specified period) but a user desires a simulation to be of a higher fidelity or resolution. A common example is operational weather forecasting. To illustrate weak scaling on HBv5 VMs, we ran Palabos with the same 3D cavity problem as shown earlier: Figure 18: On Palabos with the 3D Cavity model, HBv5 scales linearly as the 3D cavity size is proportionately increased. Strong Scaling Strong scaling is characterized by the efficiency with which execution time is reduced as the number of processor elements (CPUs, GPUs, etc.) is increased, while the problem size is kept constant. Strong scaling cases are often employed when the fidelity or resolution of the simulation is acceptable, but a user requires faster time to completion. A common example is product engineering validation, when an organization wants to bring a product to market faster but must complete a broad range of validation and verification scenarios before doing so. To illustrate strong scaling on HBv5 VMs, we ran NAMD with two different problems, each intended to illustrate how expectations for strong scaling efficiency change depending on problem size and the ordering of computation vs. communication in distributed memory workloads. First, let us examine NAMD with the 20M STMV benchmark. Figure 19: Strong scaling on HBv5 with the NAMD STMV 20M cell case. As illustrated above, for strong scaling cases in which the compute time is continuously reduced (by leveraging more and more processor elements) but communication time remains constant, scaling efficiency will only stay high for so long. 
That principle is well-represented by the STMV 20M case, for which parallel efficiency remains linear (i.e. cost/job remains flat) at two (2) nodes but degrades after that. This is because while compute is being sped up, the MPI time remains relatively flat. As such, the relatively static MPI time comes to dominate end-to-end wall clock time as VM scaling increases. Said another way, HBv5 features so much compute performance that even for a moderate-sized problem like STMV 20M, scaling the infrastructure can only take performance so far, and cost/job will begin to increase. If we examine HBv5 against the 210M cell case, however, with 10.5x as many elements to compute as its 20M case sibling, the scaling efficiency story changes significantly. Figure 20: On NAMD with the STMV 210M cell case, HBv5 scales linearly out to 32 VMs (or more than 11,000 CPU cores). As illustrated above, larger cases with significant compute requirements will continue to scale efficiently with larger amounts of HBv5 infrastructure. While MPI time remains relatively flat for this case (as it does for the smaller STMV 20M case), the compute demands remain the dominant fraction of end-to-end wall clock time. As such, HBv5 scales these problems with very high levels of efficiency, and in doing so job costs to the user remain flat despite up to 8x as many VMs being leveraged compared to the four (4) VM baseline. The key takeaways for strong scaling scenarios are two-fold. First, users should run scaling tests with their applications and models to find a sweet spot of faster performance with constant job costs. This will depend heavily on model size. Second, as new and very high-end compute platforms like HBv5 emerge that accelerate compute time, application developers will need to find ways to reduce the wall clock time lost to communication (MPI). 
Recommended approaches include using fewer MPI processes and, ideally, restructuring applications to overlap communication with compute phases.
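The strong scaling behavior described above can be captured by a deliberately simple model in which compute time divides across VMs while MPI time stays flat. The timings below are hypothetical and chosen only to show the shape of the effect; they are not measurements from the NAMD runs, and real MPI time is not perfectly constant.

```python
def wall_time(vms: int, compute_seconds_one_vm: float, mpi_seconds: float) -> float:
    """Wall clock time when compute divides across VMs but MPI time stays flat."""
    return compute_seconds_one_vm / vms + mpi_seconds


def relative_cost_per_job(vms: int, baseline_vms: int,
                          compute_seconds_one_vm: float, mpi_seconds: float) -> float:
    """VM-seconds consumed at `vms`, relative to the baseline VM count.

    A value of 1.0 means the job costs the same as at baseline, i.e. scaling
    is perfectly efficient; values above 1.0 mean efficiency has dropped.
    """
    baseline = baseline_vms * wall_time(baseline_vms, compute_seconds_one_vm, mpi_seconds)
    scaled = vms * wall_time(vms, compute_seconds_one_vm, mpi_seconds)
    return scaled / baseline


if __name__ == "__main__":
    MPI_SECONDS = 5.0             # hypothetical flat communication time
    SMALL_CASE = 1_000.0          # hypothetical single-VM compute seconds
    LARGE_CASE = 10.5 * SMALL_CASE  # 10.5x the work, mirroring 210M vs. 20M
    for vms in (1, 2, 8, 32):
        small = relative_cost_per_job(vms, 1, SMALL_CASE, MPI_SECONDS)
        large = relative_cost_per_job(vms, 1, LARGE_CASE, MPI_SECONDS)
        print(f"{vms:>2} VMs: small case cost x{small:.3f}, large case cost x{large:.3f}")
```

In this model the VM-seconds consumed grow by the flat MPI time for every VM added, so the larger the compute term, the longer the cost per job stays close to flat.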
jvenkatesh
Nov 17, 2025 · Azure High Performance Computing (HPC) Blog
Join Microsoft @ SC25: Experience HPC and AI Innovation
Supercomputing 2025 is coming to St. Louis, MO, November 16–21! Visit Microsoft Booth #1627 to explore cutting-edge HPC and AI solutions, connect with experts, and experience interactive demos that showcase the future of compute. Whether you’re attending technical sessions, stopping by for a coffee and chat with our team, or joining our partner events, we’ve got something for everyone. Booth Highlights Alpine Formula 1 Showcar: Snap a photo with a real Alpine F1 car and learn how high-performance computing drives innovation in motorsports. Silicon Wall: Discover silicon diversity, featuring chips from our partners AMD and NVIDIA alongside Microsoft’s own first-party silicon: Maia, Cobalt, and Majorana. NVIDIA Weather Modeling Demo: See how AI and HPC predict extreme weather events with Tomorrow.io and NVIDIA technology. Coffee Bar with Barista: Enjoy a handcrafted coffee while you connect with our experts. Immersive Screens: Watch live demos and visual stories about HPC breakthroughs and AI innovation. Hardware Bar: Explore AMD EPYC™ and NVIDIA GB200 systems powering next-generation workloads. Conference Details Conference week: Sun, Nov 16 – Fri, Nov 21. Expo hours (CST): Mon, Nov 17: 7:00–9:00 PM (Opening Night); Tue, Nov 18: 10:00 AM–6:00 PM; Wed, Nov 19: 10:00 AM–6:00 PM; Thu, Nov 20: 10:00 AM–3:00 PM. Customer meeting rooms: Four Seasons Hotel. Quick links RSVP, Microsoft + AMD Networking Reception (Tue, Nov 18): https://aka.ms/MicrosoftAMD-Mixer RSVP, Microsoft + NVIDIA Panel Luncheon (Wed, Nov 19): Luncheon is now closed as the event is fully booked. 
Earned Sessions (Technical Program) Sunday, Nov 16 Session Type Time (CST) Title Microsoft Contributor(s) Location Tutorial 8:30 AM–5:00 PM Delivering HPC: Procurement, Cost Models, Metrics, Value, and More Andrew Jones Room 132 Tutorial 8:30 AM–5:00 PM Modern High Performance I/O: Leveraging Object Stores Glenn Lockwood Room 120 Workshop 2:00–5:30 PM 14th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2025) Torsten Hoefler Room 265 Monday, Nov 17 Session Type Time (CST) Title Microsoft Contributor(s) Location Early Career Program 3:30–4:45 PM Voices from the Field: Navigating Careers in Academia, Government, and Industry Joe Greenseid Room 262 Workshop 3:50–4:20 PM Towards Enabling Hostile Multi-tenancy in Kubernetes Ali Kanso; Elzeiny Mostafa; Gurpreet Virdi; Slava Oks Room 275 Workshop 5:00–5:30 PM On the Performance and Scalability of Cloud Supercomputers: Insights from Eagle and Reindeer Amirreza Rastegari; Prabhat Ram; Michael F. Ringenburg Room 267 Tuesday, Nov 18 Session Type Time (CST) Title Microsoft Contributor(s) Location BOF 12:15–1:15 PM High Performance Software Foundation BoF Joe Greenseid Room 230 Poster 5:30–7:00 PM Compute System Simulator: Modeling the Impact of Allocation Policy and Hardware Reliability on HPC Cloud Resource Utilization Jarrod Leddy; Huseyin Yildiz Second Floor Atrium Wednesday, Nov 19 Session Type Time (CST) Title Microsoft Contributor(s) Location BOF 12:15–1:15 PM The Future of Python on HPC Systems Michael Droettboom Room 125 BOF 12:15–1:15 PM Autonomous Science Network: Interconnected Autonomous Science Labs Empowered by HPC and Intelligent Agents Joe Tostenrude Room 131 Paper 1:30–1:52 PM Uno: A One‑Stop Solution for Inter‑ and Intra‑Data Center Congestion Control and Reliable Connectivity Abdul Kabbani; Ahmad Ghalayini; Nadeen Gebara; Terry Lam Rooms 260–267 Paper 2:14–2:36 PM SDR‑RDMA: Software‑Defined Reliability Architecture for Planetary‑Scale RDMA Communication Abdul Kabbani; Jie 
Zhang; Jithin Jose; Konstantin Taranov; Mahmoud Elhaddad; Scott Moe; Sreevatsa Anantharamu; Zhuolong Yu Rooms 260–267 Panel 3:30–5:00 PM CPUs Have a Memory Problem — Designing CPU‑Based HPC Systems with Very High Memory Bandwidth Joe Greenseid Rooms 231–232 Paper 4:36–4:58 PM SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations Kun Li; Liang Yuan; Ting Cao; Mao Yang Rooms 260–267 Thursday, Nov 20 Session Type Time (CST) Title Microsoft Contributor(s) Location BOF 12:15–1:15 PM Super(computing)heroes Laura Parry Rooms 261–266 Paper 3:30–3:52 PM Workload Intelligence: Workload‑Aware IaaS Abstraction for Cloud Efficiency Anjaly Parayil; Chetan Bansal; Eli Cortez; Íñigo Goiri; Jim Kleewein; Jue Zhang; Pantea Zardoshti; Pulkit Misra; Raphael Ghelman; Ricardo Bianchini; Rodrigo Fonseca; Saravan Rajmohan; Xiaoting Qin Room 275 Paper 4:14–4:36 PM From Deep Learning to Deep Science: AI Accelerators Scaling Quantum Chemistry Beyond Limits Fusong Ju; Kun Li; Mao Yang Rooms 260–267 Friday, Nov 21 Session Type Time (CST) Title Microsoft Contributor(s) Location Workshop 9:00 AM–12:30 PM Eleventh International Workshop on Heterogeneous High‑performance Reconfigurable Computing (H2RC 2025) Torsten Hoefler Room 263 Booth Theater Sessions Monday, Nov 17 — 7:00 PM–9:00 PM Time (CST) Session Title Presenter(s) 8:00–8:20 PM Inside the World’s Most Powerful AI Data Center Chris Jones 8:30–8:50 PM Transforming Science and Engineering — Driven by Agentic AI, Powered by HPC Joe Tostenrude Tuesday, Nov 18 — 10:00 AM–6:00 PM Time (CST) Session Title Presenter(s) 11:00–11:50 AM Ignite Keynotes 12:00–12:20 PM Accelerating AI workloads with Azure Storage Sachin Sheth; Wolfgang De Salvador 12:30–12:50 PM Accelerate Memory Bandwidth‑Bound Workloads with Azure HBv5, now GA Jyothi Venkatesh 1:00–1:20 PM Radiation & Health Companion: AI‑Driven Flight‑Dose Awareness Olesya Sarajlic 1:30–1:50 PM Ascend HPC Lab: Your On‑Ramp to GPU‑Powered Innovation Daniel Cooke (Oakwood) 
2:00–2:20 PM Azure AMD HBv5: Redefining CFD Performance and Value in the Cloud Rick Knoechel (AMD) 2:30–2:50 PM Empowering High Performance Life Sciences Workloads on Azure Qumulo 3:00–3:20 PM Transforming Science and Engineering — Driven by Agentic AI, Powered by HPC Joe Tostenrude 4:00–4:20 PM Unleashing AMD EPYC on Azure: Scalable HPC for Energy and Manufacturing Varun Selvaraj (AMD) 4:30–4:50 PM Automating HPC Workflows with Copilot Agents Xavier Pillons 5:00–5:20 PM Scaling the Future: NVIDIA’s GB300 NVL72 Rack for Next‑Generation AI Inference Kirthi Devleker (NVIDIA) 5:30–5:50 PM Enabling AI and HPC Workloads in the Cloud with Azure NetApp Files Andy Chan Wednesday, Nov 19 — 10:00 AM–6:00 PM Time (CST) Session Title Presenter(s) 10:30–10:50 AM AI‑Powered Digital Twins for Industrial Engineering John Linford (NVIDIA) 11:00–11:20 AM Advancing 5 Generations of HPC Innovation with AMD on Azure Allen Leibovitch (AMD) 11:30–11:50 AM Intro to LoRA Fine‑Tuning on Azure Christin Pohl 12:00–12:20 PM VAST + Microsoft: Building the Foundation for Agentic AI Lior Genzel (VAST Data) 12:30–12:50 PM Inside the World’s Most Powerful AI Data Center Chris Jones 1:00–1:20 PM Supervised GenAI Simulation – Stroke Prognosis (NVads V710 v5) Kurt Niebuhr 1:30–1:50 PM What You Don’t See: How Azure Defines VM Families Anshul Jain 2:00–2:20 PM Hammerspace Tier 0: Unleashing GPU Storage Performance on Azure Raj Sharma (Hammerspace) 2:30–2:50 PM GM Motorsports: Accelerating Race Performance with AI Physics on Rescale Bernardo Mendez (Rescale) 3:00–3:20 PM Hurricane Analysis and Forecasting on the Azure Cloud Salar Adili (Microsoft); Unni Kirandumkara (GDIT); Stefan Gary (Parallel Works) 3:30–3:50 PM Performance at Scale: Accelerating HPC & AI Workloads with WEKA on Azure Desiree Campbell; Wolfgang De Salvador 4:00–4:20 PM Pushing the Limits of Performance: Supercomputing on Azure AI Infrastructure Biju Thankachen; Ojasvi Bhalerao 4:30–4:50 PM Accelerating Momentum: Powering AI & HPC with 
AMD Instinct™ GPUs Jay Cayton (AMD) Thursday, Nov 20 — 10:00 AM–3:00 PM Time (CST) Session Title Presenter(s) 11:30–11:50 AM Intro to LoRA Fine‑Tuning on Azure Christin Pohl 12:00–12:20 PM Accelerating HPC Workflows with Ansys Access on Microsoft Azure Dr. John Baker (Ansys) 12:30–12:50 PM Accelerate Memory Bandwidth‑Bound Workloads with Azure HBv5, now GA Jyothi Venkatesh 1:00–1:20 PM Pushing the Limits: Supercomputing on Azure AI Infrastructure Biju Thankachen; Ojasvi Bhalerao 1:30–1:50 PM The High Performance Software Foundation Todd Gamblin (HPSF) 2:00–2:20 PM Heidi AI — Deploying Azure Cloud Environments for Higher‑Ed Students & Researchers James Verona (Adaptive Computing); Dr. Sameer Shende (UO/ParaTools) Partner Session Schedule Tuesday, Nov 18 Date Time (CST) Title Microsoft Contributor(s) Location Nov 18 11:00 AM–11:50 AM Cloud Computing for Engineering Simulation Joe Greenseid Ansys Booth Nov 18 1:00 PM–1:30 PM Revolutionizing Simulation with Artificial Intelligence Joe Tostenrude Ansys Booth Nov 18 4:30 PM–5:00 PM [HBv5] Jyothi Venkatesh AMD Booth Wednesday, Nov 19 Date Time (CST) Title Microsoft Contributor(s) Location Nov 19 11:30 AM–1:30 PM Accelerating Discovery: How HPC and AI Are Shaping the Future of Science (Lunch Panel) Andrew Jones (Moderator); Joe Greenseid (Panelist) Ruth's Chris Steak House Nov 19 1:00 PM–1:30 PM VAST and Microsoft Kanchan Mehrotra VAST Booth Demo Pods at Microsoft Booth Azure HPC & AI Infrastructure Explore how Azure delivers high-performance computing and AI workloads at scale. Learn about VM families, networking, and storage optimized for HPC. Agentic AI for Science See how autonomous agents accelerate scientific workflows, from simulation to analysis, using Azure AI and HPC resources. Hybrid HPC with Azure Arc Discover how Azure Arc enables hybrid HPC environments, integrating on-prem clusters with cloud resources for flexibility and scale. 
Ancillary Events (RSVP Required) Microsoft + AMD Networking Reception (Tuesday Night) When: Tue, Nov 18, 6:30–10:00 PM (CST) Where: UMB Champions Club, Busch Stadium RSVP: https://aka.ms/MicrosoftAMD-Mixer Microsoft + NVIDIA Panel Luncheon (Wednesday) When: Wed, Nov 19, 11:30 AM–1:30 PM (CST) Where: Ruth’s Chris Steak House Topic: Accelerating Discovery: How AI and HPC Are Shaping the Future of Science Panelists: Dan Ernst (NVIDIA); Rollin Thomas (NERSC); Joe Greenseid (Microsoft); Antonia Maar (Intersect360 Research); Fernanda Foertter (University of Alabama) RSVP: Luncheon is now closed as the event is fully booked. Conclusion We’re excited to connect with you at SC25! Whether you’re exploring our booth demos, attending technical sessions, or joining one of our partner events, this is your opportunity to experience how Microsoft is driving innovation in HPC and AI. Stop by Booth #1627 to see the Alpine F1 showcar, explore the Silicon Wall featuring AMD, NVIDIA, and Microsoft’s own chips, and enjoy a coffee from our barista while networking with experts. Don’t forget to RSVP for our Microsoft + AMD Networking Reception (the Microsoft + NVIDIA Panel Luncheon is now fully booked). See you in St. Louis!
Fernando_Aznar
Nov 14, 2025 · Azure High Performance Computing (HPC) Blog
Breaking the Million-Token Barrier: The Technical Achievement of Azure ND GB300 v6
Azure ND GB300 v6 Virtual Machines with NVIDIA GB300 NVL72 rack-scale systems achieve unprecedented performance of 1,100,000 tokens/s on Llama2 70B Inference, beating the previous Azure ND GB200 v6 record of 865,000 tokens/s by 27%.
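The 27% figure can be checked directly from the two throughput numbers. The per-GPU estimate below additionally assumes the result was measured on a single NVL72 rack with 72 GPUs; the summary above does not state the GPU count, so treat that part as an illustrative assumption.

```python
GB300_TOKENS_PER_S = 1_100_000  # ND GB300 v6 result, as stated
GB200_TOKENS_PER_S = 865_000    # previous ND GB200 v6 record, as stated
ASSUMED_GPUS = 72               # assumption: one NVL72 rack, 72 GPUs


def percent_uplift(new: float, old: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    uplift = percent_uplift(GB300_TOKENS_PER_S, GB200_TOKENS_PER_S)
    per_gpu = GB300_TOKENS_PER_S / ASSUMED_GPUS
    print(f"Uplift over GB200 v6: {uplift:.1f}%")
    print(f"Tokens/s per GPU (assuming {ASSUMED_GPUS} GPUs): {per_gpu:,.0f}")
```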
HugoAffaticati
Nov 04, 2025 · Azure High Performance Computing (HPC) Blog