ramp up with me
11 Topics

Azure’s ND GB200 v6 Delivers Record Performance for Inference Workloads
Achieving peak AI performance requires both cutting-edge hardware and a finely optimized infrastructure. Azure’s ND GB200 v6 Virtual Machines, accelerated by NVIDIA GB200 Blackwell GPUs, have already demonstrated world-record inference performance of 865,000 tokens/s on the industry-standard Llama 2 70B benchmark.

Creating a Slurm Job Submission App in Open OnDemand with Copilot Agent

High Performance Computing (HPC) environments are essential for research, engineering, and data-intensive workloads. To manage compute resources and job submissions efficiently, organizations rely on robust scheduling and orchestration tools. In this blog post, we'll explore how to use Copilot Agent in Visual Studio Code (VSCode) to build an Open OnDemand application that submits Slurm jobs in a CycleCloud Workspace for Slurm (CCWS) environment. We'll start with a brief overview of CCWS and Open OnDemand, then dive into the integration workflow.
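
To make the "submits Slurm jobs" part concrete, here is a minimal, hedged sketch of a submission helper that such an application could call on the Open OnDemand host. This is not the code built in the post; it assumes sbatch is available on the node running the app, and the partition, walltime, and script contents are placeholder values.

```python
"""Hedged sketch (not the post's actual app): how a web form handler might
hand a job to Slurm from the Open OnDemand host. Assumes sbatch is on PATH
and the OnDemand server is a Slurm submit host; partition, walltime, and the
command are illustrative placeholders."""
import subprocess
import tempfile

def submit_slurm_job(command: str, partition: str = "hpc", nodes: int = 1) -> str:
    """Write a batch script to a temp file, submit it with sbatch, return the job ID."""
    script = "\n".join([
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --time=01:00:00",
        command,
    ])
    with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
        f.write(script)
        path = f.name
    # --parsable makes sbatch print just the job ID (e.g. "12345")
    result = subprocess.run(
        ["sbatch", "--parsable", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Submitted job", submit_slurm_job("srun hostname"))
```

In a real Open OnDemand app the command, partition, and resource values would come from the web form the post builds with Copilot Agent.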

Distributed Finetuning and Inference with NeMo-Run on Azure CycleCloud Workspace for Slurm

Introduction
The NVIDIA NeMo Framework is a scalable, cloud-native generative AI framework designed to support the development of Large Language Models (LLMs) and Multimodal Models (MMs). The NeMo Framework provides comprehensive tools for efficient training, including Supervised Fine-Tuning (SFT) and Parameter-Efficient Fine-Tuning (PEFT). One notable tool within the framework is NeMo-Run, which offers an interface to streamline the configuration, execution, and management of experiments across various computing environments. This includes the ability to launch jobs locally on workstations or on large clusters managed by Slurm or Kubernetes in a cloud environment. In this blog post, we will demonstrate how you can use CycleCloud Workspace for Slurm (CCWS) for distributed finetuning and inference with NeMo-Run.

Why Azure CycleCloud Workspace for Slurm?
CCWS clusters come with enroot and pyxis pre-configured, facilitating the execution of workloads in containerized environments. CCWS offers integration with Open OnDemand and includes an application for running VSCode against the HPC cluster in web mode. This application can be used to run Jupyter Notebooks. Using Jupyter Notebooks with NeMo-Run allows for real-time experimentation, visual insight into results, and structured logging, enhancing collaboration and ensuring reproducible workflows.

Get Started
The full example and setup instructions are available in the recently announced AI Infrastructure on Azure repository. The NeMo-Run example includes:
- A script to create a Python virtual environment with the required packages to use NeMo-Run.
- Detailed instructions and guidance for deploying CCWS with Open OnDemand integration.
- A custom module for a NeMo-Run Slurm Executor.
- A Jupyter Notebook with simple examples that highlight how NeMo-Run recipes can be used to run, track, evaluate, and reproduce experiments.
This example, highlighted in the attached video, demonstrates the use of NeMo-Run recipes for fine-tuning and inference within a Slurm cluster environment. Since NeMo-Run also provides recipes for pre-training, this guide can similarly serve as a reference for those tasks. Note that while an inference example is included, it is intended solely for quick model evaluation. For production-scale or batch inference, it is recommended to adapt the model into a dedicated inference pipeline or service.

Conclusion
By combining NVIDIA’s NeMo-Run framework with Azure CycleCloud Workspace for Slurm, you gain a powerful, flexible, and repeatable setup for training and fine-tuning large language models at scale. This example offers a practical and extensible foundation that can be used in combination with the recommendations from the AI Infrastructure on Azure repository to build large-scale and reproducible AI workflows today.
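
To give a flavor of what the notebook drives, here is a minimal, hedged sketch of launching a NeMo-Run fine-tuning recipe against a Slurm cluster. It follows the public NeMo 2.0 recipe API rather than the repository's custom Slurm Executor module, and the account, partition, paths, and container tag are placeholders to adjust for your CCWS deployment.

```python
"""Hedged sketch of launching a NeMo 2.0 fine-tuning recipe with NeMo-Run on a
Slurm cluster. This uses the public recipe API; the repository ships its own
custom Slurm executor module, which may differ. Account, partition, paths, and
the container tag are placeholders."""
import nemo_run as run
from nemo.collections import llm

# Pre-built recipe: LoRA fine-tuning of Llama 3 8B on one node with 8 GPUs.
recipe = llm.llama3_8b.finetune_recipe(
    name="llama3_8b_lora",
    dir="/shared/checkpoints",      # shared filesystem visible to compute nodes
    num_nodes=1,
    num_gpus_per_node=8,
    peft_scheme="lora",
)

# Slurm executor: NeMo-Run generates and submits the batch job, running it
# inside a NeMo container via pyxis/enroot (pre-configured on CCWS clusters).
executor = run.SlurmExecutor(
    account="myaccount",
    partition="gpu",
    nodes=1,
    ntasks_per_node=8,
    time="04:00:00",
    container_image="nvcr.io/nvidia/nemo:24.07",
    tunnel=run.LocalTunnel(job_dir="/shared/nemo-run"),  # submitting from a login node
)

# Launch and track the experiment; logs and metadata land under job_dir.
run.run(recipe, executor=executor, name="llama3_8b_lora")
```

Run from the Open OnDemand VSCode/Jupyter session described above, this is the pattern the example notebook builds on for running, tracking, and reproducing experiments.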

Open OnDemand with Azure CycleCloud Workspace for Slurm

Open OnDemand, developed by the Ohio Supercomputer Center (OSC), is an open-source, web-based portal designed to offer seamless access to high-performance computing (HPC) resources. The integration with Azure CycleCloud Workspace for Slurm facilitates the deployment and configuration of an Open OnDemand virtual machine using a dedicated CycleCloud cluster template and project scripts. Once setup is complete, a local daemon queries CycleCloud to automatically register Slurm clusters. User authentication is managed through Microsoft Entra ID, which requires registering and configuring an Entra application to support OpenID Connect. CycleCloud manages local user administration for clusters. Additionally, a Visual Studio Code application is pre-configured on Open OnDemand, enabling users to run VSCode in web mode on login nodes or dynamically provisioned compute nodes.

New Linux VDI Solution: Deploy Cendio Thinlinc as a Standalone Image with Direct Connectivity

Prerequisites
- Microsoft Azure Subscription: Ensure you have an active Microsoft Azure subscription.
- Virtual Network: Ensure you have a VNet with connectivity to your corporate network (for example, over VPN).
- Resource Group: Choose an existing resource group or create a new one.
- SSH Key (Optional): Generate an SSH key pair if you don't already have one.

Step 1: Deploy the Standalone Thinlinc VM Image
1. Navigate to the Microsoft Azure Marketplace
   - Go to the ThinLinc page on the Microsoft Azure Marketplace: Thinlinc by Cendio.
   - Click “Get It Now” to start the deployment process.
   - Choose between Alma Linux 9 and Ubuntu 22.04.
   - Click “Create” in the new Microsoft Azure Portal tab.
2. Configure Basic Settings
   - Subscription: Select your Microsoft Azure subscription.
   - Resource Group: Choose an existing resource group or create a new one.
   - VM Name: Provide a name for the virtual machine (VM).
   - Region: Select the Azure region where the VM will be deployed.
   - Availability Zone: Select an Availability Zone as you prefer.
   - Image: Ensure the proper Thinlinc image is selected.
3. Select VM Size
   - Click "Change size" and choose a VM with sufficient resources for your workload. We highly recommend a GPU-enabled VM series (e.g. NV-series), although it is possible to do this without a GPU machine.
4. Configure Administrator Account
   - Authentication Type: We recommend using an SSH public key, although you can use a password as well.
   - If using an SSH public key, upload your public key or generate a new key pair.
   - Set a username for the admin account.
5. Inbound Port Rules
   - Permit only essential ports (for example, 22 for SSH if required for admin maintenance).
6. Networking Tab
   - Virtual Network: Select your existing VNet with direct connectivity.
   - Subnet: Choose a subnet that can route traffic to your corporate network.
   - Public IP: Disable (since direct connectivity is used).
7. Review + Create
   - Review all your information, then click “Create” to deploy the VM.

Step 2: Configure the Thinlinc Server
After deploying the VM, configure the Thinlinc server to ensure it is accessible and running properly. This step verifies the installation and retrieves the private IP for the connection.
1. Navigate to your deployment in the Microsoft Azure Portal.
2. Select your deployed VM, go to the Networking tab, and note its private IP address.
3. Connect to the VM from a machine within your private network using SSH: ssh <username>@<PRIVATE_IP>

Step 3: Connect Using the Thinlinc Client
1. Download the Thinlinc client from Cendio's website.
2. Open the Thinlinc client and enter the VM's private address in the “Server” field.
3. Enter your username and SSH key (or password).
4. Click Connect to launch your Thinlinc session.

Conclusion
You have successfully deployed a standalone Cendio Thinlinc VM on Microsoft Azure with direct connectivity! This setup allows seamless remote Linux desktop access while leveraging existing VPNs and private networking. Stay tuned for more updates on our Thinlinc integration with CycleCloud Workspaces for Slurm, which will support advanced AI and HPC workloads.
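
For teams that prefer to automate Step 1 rather than click through the portal, the sketch below shows roughly what the same deployment could look like with the Azure SDK for Python. It is only a hedged outline: the resource names, the ThinLinc marketplace publisher/offer/SKU/plan values, and the pre-existing network interface are placeholders you would need to replace, and the marketplace terms for the image must already be accepted on the subscription.

```python
"""Hedged sketch: create the ThinLinc VM from Step 1 with the Azure SDK for
Python instead of the portal. Resource names, the marketplace image/plan
values, and the existing NIC (no public IP, on the corporate-connected subnet)
are illustrative placeholders, not verified values."""
from pathlib import Path

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "thinlinc-rg"
VM_NAME = "thinlinc-vm"
LOCATION = "westeurope"
# NIC created beforehand on the corporate-connected subnet, with no public IP.
NIC_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.Network/networkInterfaces/thinlinc-nic"
)
SSH_PUBLIC_KEY = (Path.home() / ".ssh" / "id_rsa.pub").read_text().strip()

# Placeholder marketplace identifiers: look these up on the ThinLinc listing,
# they are NOT verified here.
IMAGE = {"publisher": "cendio", "offer": "thinlinc", "sku": "ubuntu-22-04", "version": "latest"}
PLAN = {"publisher": IMAGE["publisher"], "product": IMAGE["offer"], "name": IMAGE["sku"]}

compute = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

poller = compute.virtual_machines.begin_create_or_update(
    RESOURCE_GROUP,
    VM_NAME,
    {
        "location": LOCATION,
        "plan": PLAN,  # marketplace images deploy with their purchase plan
        "hardware_profile": {"vm_size": "Standard_NV12ads_A10_v5"},  # GPU-enabled, per the post's recommendation
        "storage_profile": {"image_reference": IMAGE},
        "os_profile": {
            "computer_name": VM_NAME,
            "admin_username": "azureuser",
            "linux_configuration": {
                "disable_password_authentication": True,
                "ssh": {
                    "public_keys": [
                        {
                            "path": "/home/azureuser/.ssh/authorized_keys",
                            "key_data": SSH_PUBLIC_KEY,
                        }
                    ]
                },
            },
        },
        "network_profile": {"network_interfaces": [{"id": NIC_ID}]},
    },
)
print("Provisioning state:", poller.result().provisioning_state)
```

After provisioning, Steps 2 and 3 are unchanged: note the VM's private IP and connect with SSH or the Thinlinc client over your private network.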

Preview Announcement: Open OnDemand Integration with Azure CycleCloud Workspace for Slurm

This exciting integration with CycleCloud enables end users to dynamically create and configure an Open OnDemand server within minutes. The system automatically deploys Open OnDemand and registers any Slurm cluster deployed by CycleCloud, streamlining the setup process.

Ramp up with me... on HPC: Understanding Virtual Machines, CPUs, and GPUs

There are a lot of different products you need to successfully complete a high-performance computing (HPC) workload. You’ll hear several terms regularly, like virtual machines, CPUs, GPUs, compute power, and compute constrained. While these are really important to talk about in high-performance computing, they were difficult concepts for me to grasp. Personally, I am a visual learner and I struggle with theoretical concepts that I can’t just physically see. So, I’ll do my best to both explain the concepts and show you the hardware so you can visualize what they are.

Ramp up with me...on HPC: What is high-performance computing (HPC)?

Over the next several months, let’s take a journey together and learn about the different HPC use cases. Join me as I dive into each use case; for some of them, I’ll even try my hand at the workload for the first time. We’ll talk about what went well and any issues I ran into. And maybe, you’ll get to hear a little about our customers and partners along the way.