Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High-Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar HPC schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale.
Slurm is a widely used open-source HPC scheduler that can manage workloads across clusters of compute nodes. Slurm can also be configured to interact with cloud resources, such as Azure CycleCloud, to dynamically add or remove nodes based on the demand of the jobs. This allows users to optimize their resource utilization and cost efficiency, as well as to access the scalability and flexibility of the cloud.
In this blog post, we discuss how to integrate an external Slurm scheduler with CycleCloud so that it can send jobs to the cloud, either for cloud bursting (sending on-premises workloads to the cloud for processing) or for hybrid HPC scenarios. For demonstration purposes, we create a Slurm scheduler node in Azure that acts as the external scheduler in one VNET, while the execute nodes run in CycleCloud in a separate VNET. We do not cover the networking complexities involved in hybrid scenarios.
Before we start, we need to have the following items ready:
Steps
After we have the prerequisites ready, we can follow these steps to integrate the external Slurm Scheduler node with the CycleCloud cluster:
1. On CycleCloud VM:
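Make sure the cyclecloud CLI is set up and working on the CycleCloud VM.
Clone the repository, which provides the cluster template (slurm-headless.txt).
Import a cluster named hpc1 using the slurm-headless.txt template.
git clone https://github.com/vinil-v/slurm-cloud-bursting-using-cyclecloud.git
cyclecloud import_cluster hpc1 -c Slurm-HL -f slurm-cloud-bursting-using-cyclecloud/templates/slurm-headless.txt
If the template sits in a different directory in your checkout (the captured output below uses cyclecloud-template/), adjust the -f path accordingly.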
Output:
[vinil@cc86 ~]$ cyclecloud import_cluster hpc1 -c Slurm-HL -f slurm-cloud-bursting-using-cyclecloud/cyclecloud-template/slurm-headless.txt
Importing cluster Slurm-HL and creating cluster hpc1....
----------
hpc1 : off
----------
Resource group:
Cluster nodes:
Total nodes: 0
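The import command depends on the exact location of slurm-headless.txt inside the checkout. One way to avoid hard-coding the directory name is to look the template up first. The sketch below is only an illustration: find_template is a hypothetical helper, not part of the repository or of the cyclecloud CLI, and it assumes the checkout contains a single file with that name.

```shell
# Hypothetical helper: print the path of the first slurm-headless.txt
# found under a repository checkout. Returns non-zero if none is found.
find_template() {
    repo_dir=$1
    template=$(find "$repo_dir" -type f -name 'slurm-headless.txt' | head -n 1)
    [ -n "$template" ] || return 1
    printf '%s\n' "$template"
}

# Usage on the CycleCloud VM (not executed here):
#   template=$(find_template slurm-cloud-bursting-using-cyclecloud) &&
#       cyclecloud import_cluster hpc1 -c Slurm-HL -f "$template"
```

If the helper returns a non-zero status, the import is skipped, which makes a missing or renamed template visible before CycleCloud is involved.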
2. Preparing Scheduler VM:
On the scheduler VM, clone the repository, run the Slurm scheduler build script (slurm-scheduler-builder.sh), and provide the cluster name (hpc1) when prompted.
git clone https://github.com/vinil-v/slurm-cloud-bursting-using-cyclecloud.git
cd slurm-cloud-bursting-using-cyclecloud/scripts
sh slurm-scheduler-builder.sh
Output:
------------------------------------------------------------------------------------------------------------------------------
Building Slurm scheduler for cloud bursting with Azure CycleCloud
------------------------------------------------------------------------------------------------------------------------------
Enter Cluster Name: hpc1
------------------------------------------------------------------------------------------------------------------------------
Summary of entered details:
Cluster Name: hpc1
Scheduler Hostname: masternode2
NFSServer IP Address: 10.222.1.26
3. CycleCloud UI:
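Open the CycleCloud UI, edit the hpc1 cluster settings, and configure VM SKUs and networking settings.
Enter the NFS server IP address for the /sched and /shared mounts in the Network Attached Storage section.
Save the settings and start the hpc1 cluster.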
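Since the execute nodes mount /sched and /shared from the NFS server entered above, it can help to confirm that both directories are actually exported before starting the cluster. The following is a minimal sketch under several assumptions: check_exports is a hypothetical helper, the showmount utility (from the NFS client tools) is installed, the server answers showmount queries, and the exports are named exactly /sched and /shared.

```shell
# Hypothetical helper: verify that an NFS server exports /sched and /shared.
# Assumes showmount is installed and the server responds to it.
check_exports() {
    nfs_server=$1
    export_list=$(showmount -e "$nfs_server") || return 1
    for dir in /sched /shared; do
        # Match the export path in the first column of the showmount output.
        if ! printf '%s\n' "$export_list" |
            awk -v d="$dir" '$1 == d { found = 1 } END { exit !found }'; then
            echo "missing export: $dir" >&2
            return 1
        fi
    done
    echo "both exports present on $nfs_server"
}

# Usage with the NFS server address from the example summary (not executed here):
#   check_exports 10.222.1.26
```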
4. On Slurm Scheduler Node:
Integrate Slurm with CycleCloud using the cyclecloud-integrator.sh script.
cd slurm-cloud-bursting-using-cyclecloud/scripts
sh cyclecloud-integrator.sh
Output:
[root@masternode2 scripts]# sh cyclecloud-integrator.sh
Please enter the CycleCloud details to integrate with the Slurm scheduler
Enter Cluster Name: hpc1
Enter CycleCloud Username: vinil
Enter CycleCloud Password:
Enter CycleCloud URL (e.g., https://10.222.1.19): https://10.222.1.19
------------------------------------------------------------------------------------------------------------------------------
Summary of entered details:
Cluster Name: hpc1
CycleCloud Username: vinil
CycleCloud URL: https://10.222.1.19
------------------------------------------------------------------------------------------------------------------------------
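Because the scheduler and CycleCloud sit in different VNETs in this setup, the integration can only work if the scheduler node reaches the CycleCloud URL over HTTPS. A quick reachability check before running the integrator might look like the sketch below. cyclecloud_reachable is a hypothetical helper, and the --insecure flag is an assumption that the CycleCloud server uses a self-signed certificate; drop it if your server has a trusted one.

```shell
# Hypothetical helper: succeed if an HTTP response of any kind comes back
# from the given URL within 10 seconds.
cyclecloud_reachable() {
    url=$1
    # --insecure skips certificate validation (assumes a self-signed certificate).
    http_code=$(curl --insecure --silent --output /dev/null \
        --max-time 10 --write-out '%{http_code}' "$url") || return 1
    # curl reports 000 when no HTTP response was received.
    [ "$http_code" != "000" ]
}

# Usage on the scheduler node with the example URL (not executed here):
#   cyclecloud_reachable https://10.222.1.19 || echo "cannot reach CycleCloud" >&2
```

The check only confirms network reachability. It does not validate the username or password that the integrator asks for.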
5. User and Group Setup:
Run the users.sh script to create a test user (vinil) and group for job submission. The user vinil already exists in CycleCloud.
cd slurm-cloud-bursting-using-cyclecloud/scripts
sh users.sh
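Files on NFS shares such as /shared are owned by numeric user and group IDs, so it is worth confirming which IDs the test user received on the scheduler and comparing them with the same user on an execute node. The sketch below assumes that matching IDs on both sides is the goal; user_ids is a hypothetical helper built only on the standard id command.

```shell
# Hypothetical helper: print "uid:gid" for a local user,
# or return non-zero if the user does not exist.
user_ids() {
    user_name=$1
    uid=$(id -u "$user_name" 2>/dev/null) || return 1
    gid=$(id -g "$user_name" 2>/dev/null) || return 1
    printf '%s:%s\n' "$uid" "$gid"
}

# Usage on the scheduler node, and again on an execute node (not executed here):
#   user_ids vinil
```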
6. Testing & Job Submission:
Log in as the test user (vinil in this example) on the Scheduler node and submit a test job.
su - vinil
srun hostname &
Output:
[root@masternode2 scripts]# su - vinil
Last login: Tue May 14 04:54:51 UTC 2024 on pts/0
[vinil@masternode2 ~]$ srun hostname &
[1] 43448
[vinil@masternode2 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
1 hpc hostname vinil CF 0:04 1 hpc1-hpc-1
[vinil@masternode2 ~]$ hpc1-hpc-1
You will see a new node being created in the hpc1 cluster.
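In the output above the job sits in the CF (configuring) state while the cloud node is provisioned. Rather than rerunning squeue by hand, the wait can be scripted. The sketch below is one possible approach, not part of the repository: wait_for_job_start is a hypothetical helper, and the default of 60 attempts at 10-second intervals is an arbitrary assumption. It relies on the standard squeue options -h (no header), -j (job ID), and -o %t (compact state code).

```shell
# Hypothetical helper: poll squeue until a job leaves the pending (PD) or
# configuring (CF) state. Prints the new state, or "finished" if the job
# is no longer listed. Returns non-zero if the attempts run out.
wait_for_job_start() {
    job_id=$1
    attempts=${2:-60}   # assumed default: 60 polls
    interval=${3:-10}   # assumed default: 10 seconds between polls
    while [ "$attempts" -gt 0 ]; do
        state=$(squeue -h -j "$job_id" -o %t 2>/dev/null)
        case $state in
            CF|PD) sleep "$interval" ;;
            *) printf '%s\n' "${state:-finished}"; return 0 ;;
        esac
        attempts=$((attempts - 1))
    done
    return 1
}

# Usage for job 1 from the example (not executed here):
#   wait_for_job_start 1
```

Provisioning a new VM usually takes several minutes, so a generous number of attempts is a safer choice than a short timeout.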
Congratulations! You have successfully set up Slurm bursting with CycleCloud on Azure.
In this blog post, we have shown how to integrate an external Slurm Scheduler node with Azure CycleCloud for cloud bursting or hybrid HPC scenarios. This enables users to leverage the power and flexibility of the cloud for their HPC workloads, while maintaining their existing Slurm workflows and tools. We hope this guide helps you to get started with your HPC journey on Azure.
Reference:
GitHub repo - slurm-cloud-bursting-using-cyclecloud
Azure CycleCloud Documentation