Azure CycleCloud is an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar HPC schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale. Through CycleCloud, users can create different types of file systems and mount them to the compute cluster nodes to support HPC workloads.
Azure CycleCloud is targeted at HPC administrators and users who want to deploy an HPC environment with a specific scheduler in mind -- commonly used schedulers such as Slurm, PBSPro, LSF, Grid Engine, and HT-Condor are supported out of the box. CycleCloud is the sister product to Azure Batch, which provides a Scheduler as a Service on Azure.
CycleCloud provides methods to customize clusters via custom OS images, or bootstrap options like Cloud-init and Cluster-init. Using a custom OS image can help reduce node boot time compared to using Cloud-init or Cluster-init, but maintaining the custom OS can be a big commitment. Using Cloud-Init or Cluster-Init can be a faster implementation to install packages, configure mounts, or edit config files. The decision to use Cloud-init vs Cluster-init generally depends on what you need to do, whether versioning is needed, etc. This blog will focus on using Cloud-init and extending it with the Linux "at" utility.
"Cloud-init is the industry standard multi-distribution method for cross-platform cloud instance initialization. It is supported across all major public cloud providers, provisioning systems for private cloud infrastructure, and bare-metal installations."
(REF: http://loudinit.readthedocs.io/en/latest/)
Azure CycleCloud provides the ability to configure Cloud-init
in the GUI...
...or via the cluster template file:
Cloud-init
is (in my opinion) "easier" than a cluster-init project because it can be configured directly in the GUI. (NOTE: the default GUI Cloud-init
is not an exported parameter when using the "cyclecloud export_parameters" CLI command. The cluster template can be modified to create a new "parameterized" cloud-init that will be exported as a parameter if needed). The native Cloud-init
syntax, as shown in the GUI capture above, is directly supported by CycleCloud, which is very useful when using multiple Linux distributions. A shell script can also be configured directly in the GUI for Cloud-init
.
Here are some example use cases for Cloud-init
:
Cloud-init
)
Here are examples of use cases not well suited for Cloud-init
:
slurm.conf
)/sched
or/shared
autoscale.json
file
These examples not well suited for Cloud-init
are better suited to Cluster-init as it runs after Jetpack has configured the CycleCloud and cluster components (ie. OpenPBS). However, combined with Linux "at" utility, it becomes possible to use Cloud-init
to support these use cases.
A Redhat.com article describes the "at" utility as follows:
Theat
andbatch
(at -b
) commands read from standard input or a specified file. Theat
tool allows you to specify that a command will run at a particular time. Thebatch
command will execute commands when the system load levels drop to a specific point. Both commands use the user's shell.
Combining "at" with Cloud-init
enables the ease of configuring the script in the CycleCloud GUI with the ability to modify actions performed by Jetpack. The high level structure of a Cloud-init
script using "at" looks like this:
Cloud-init
scriptCloud-init
can finish and move to Jetpack phase)
A simple example of this method is to change the scheduling behavior of the CycleCloud provided OpenPBS project. By default, the CycleCloud autoscale.json file is configured to use physical CPUs for VM resources in the OpenPBS cluster. Most Azure VMs use virtual CPUs, for example, the F72s_v2 VM has 72 virtual CPUs and 36 physical CPUs. To schedule against the 72 virtual CPUs requires modification to the autoscale.json file to use virtual CPUs. Since this file is created during the CycleCloud Jetpack phase it is not available when Cloud-init
runs. The Cloud-init
script below creates a new script file that checks for the existence of the autoscale.json file and if not found will sleep for 30s and try again. Once the file exists the while loop is broken, and the script modifies the autoscale.json file to use virtual CPUs:
#!/bin/bash
set -x
# Redirect all output to a log file
exec 1>/var/log/cyclecloud_cloud_init.log 2>&1
# Check if CentOS and install EPEL repo and at
if grep -iq centos /etc/os-release; then
yum install -y epel-release
yum install -y at
fi
systemctl enable --now atd.service
# Create a new local script file
cat << EOF>>/tmp/autoscale-vcpu.sh
#!/bin/bash
set -x
exec 1>/tmp/autoscale-vcpu.log 2>&1
while [ ! -f /opt/cycle/pbspro/autoscale.json ]; do
sleep 30
done
sed -i 's/pcpu_count/vcpu_count/g' /opt/cycle/pbspro/autoscale.json
EOF
# Schedule the script file to run in 1 min
at now + 1 minute -f /tmp/autoscale-vcpu.sh
CycleCloud Cloud-init
can be extended with the Linux "at" utility to accomplish tasks not possible with Cloud-init
alone. It does not fully replace the CycleCloud specific Cluster-init projects, but does provide a quick and easy interface for small changes. And remember, the more changes you make using either Cloud-init
or Cluster-init, the longer it will take to boot a node.
Interested in learning more about high performance computing?
References:
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.