
Azure High Performance Computing (HPC) Blog

Slurm custom image for a locked-down environment and faster start-up time in Azure CycleCloud

seif765
May 22, 2025

Many CycleCloud users and admins need to run their HPC cluster in a secure environment without internet access. Others may want to pre-install the Slurm packages on the OS image for cluster nodes to save time during scaling operations. This is a detailed, step-by-step guide to achieving both.

Environment:

CycleCloud: 8.7.1

Slurm project: 3.0.11

Slurm version: 23.11.10-2

OS of compute and execute: marketplace AlmaLinux HPC image gen 2 8.10

Prerequisites:

- A working CycleCloud installation (mine is currently 8.7.1)

- Azure CLI installed  

- CycleCloud CLI installed 

- Azure subscription with permissions to create VMs.

Create a VM with all packages:

Install all the Slurm packages required for compute and execute in an image.

 

1- Create a standalone Azure VM with the AlmaLinux HPC image gen 2 8.10.

2- Log in to that VM and run the following:

wget https://github.com/Azure/cyclecloud-slurm/releases/download/3.0.11/azure-slurm-install-pkg-3.0.11.tar.gz
tar -xvf azure-slurm-install-pkg-3.0.11.tar.gz
# cd into the extracted directory.
# Copy the modified script attached to this article.
# Run it with the version of Slurm available in the version of the project downloaded.
./rhel.mod.sh 23.11.10-2 false

The rhel.mod.sh script can be downloaded from Release 0.1 · amsafo/cyclecloud-blog.

The Slurm version must already be supported by the Slurm project version being used.

 

3- On the same VM, add the slurm and munge users:

useradd -u 11100 --no-create-home slurm
useradd -u 11101 --no-create-home munge

These are the default user IDs for slurm and munge. Change them here if you customise either user's ID in your CycleCloud template or settings.
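Because the IDs baked into the image have to agree with what the cluster template expects, a quick check before capturing the image can help. The following is a minimal sketch, assuming the default IDs quoted above; the helper name check_uid is hypothetical and not part of CycleCloud or the Slurm project.

```shell
# check_uid USER EXPECTED_UID
# Returns 0 when USER exists and has EXPECTED_UID, non-zero otherwise.
check_uid() {
    actual=$(id -u "$1" 2>/dev/null) || {
        echo "user $1 does not exist" >&2
        return 1
    }
    if [ "$actual" != "$2" ]; then
        echo "user $1 has UID $actual, expected $2" >&2
        return 1
    fi
    echo "user $1 has UID $2"
}

# On the image-build VM (defaults from this article; adjust if customised):
#   check_uid slurm 11100 && check_uid munge 11101
```

If either check fails, fix the user on the VM before moving on, since a mismatch would only surface later when nodes start.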

 

Add other packages needed by CycleCloud (Chef):

On the same VM, install the following packages. CycleCloud installs Chef, which depends on them.

yum install -y xfsdump xfsprogs
yum install -y cryptsetup lvm2

Adjust the OS repositories:

Enable only the repositories that will be reachable in the locked-down environment. You must have at least one active repo.

In my case I wanted to enable the packages-microsoft and AMLFS repos. You can check the available repos using:

yum repolist

To disable all the repos:  

yum-config-manager --disable '*'

 

Then enable the repos you want (packages-microsoft and amlfs):

yum-config-manager --enable packages-microsoft-com-prod
yum-config-manager --enable amlfs

 

Note: you must keep at least one repo enabled. Nothing will actually be installed from it, but when Chef tries to install packages that are already present (such as xfsdump), it still needs to check at least one accessible repo. This works even if the packages do not exist in that repo, but with no accessible repo at all Chef can raise an error.
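The disable-then-enable sequence can be wrapped so that it never leaves the VM with zero repos. This is only a sketch: the function name repo_lockdown_cmds and the KEEP_REPOS variable are hypothetical, and the repo IDs are the two used in this article, so substitute the ones shown by yum repolist on your VM.

```shell
# Repos that stay reachable in the locked-down network (the two used in this article).
KEEP_REPOS="packages-microsoft-com-prod amlfs"

# Print the yum-config-manager commands for an allow-list of repo IDs.
# Refuses to emit anything when the list is empty, because Chef needs
# at least one accessible repo.
repo_lockdown_cmds() {
    if [ -z "$1" ]; then
        echo "refusing to disable every repo: keep at least one enabled" >&2
        return 1
    fi
    echo "yum-config-manager --disable '*'"
    for repo in $1; do
        echo "yum-config-manager --enable $repo"
    done
}

# Review the output, then pipe it to a shell on the image-build VM:
#   repo_lockdown_cmds "$KEEP_REPOS" | sudo sh
#   yum repolist enabled
```

Printing the commands first, rather than running them directly, makes it easy to review the list before the repo configuration is changed.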

Generalise the VM:

The next step is to delete all the user specific data in preparation to capture the image. 

Run this inside the VM:

waagent -deprovision+user -force

Now log out from the VM and then deallocate it.  

Then in Azure CLI use the below command to generalise it:  

az vm generalize --resource-group myResourceGroup --name myVM

To learn more about that command, see Deprovision or generalize a VM before creating an image - Azure Virtual Machines | Microsoft Learn.
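The deallocation can also be done from the Azure CLI instead of the portal. The sketch below only prints the two calls in the order this article runs them; the function name generalize_cmds is hypothetical, and myResourceGroup and myVM are the same placeholders used in the command above.

```shell
# Placeholder names, matching the example command above.
RESOURCE_GROUP="myResourceGroup"
VM_NAME="myVM"

# Print the two Azure CLI calls in the order used in this article:
# deallocate the VM first, then mark it as generalized.
generalize_cmds() {
    echo "az vm deallocate --resource-group $1 --name $2"
    echo "az vm generalize --resource-group $1 --name $2"
}

# Review, then execute from a machine where `az login` has been done:
#   generalize_cmds "$RESOURCE_GROUP" "$VM_NAME" | sh
```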

Create the image:

Now we need to capture the image from that VM. In the Azure portal, go to the VM > Overview > Capture > Image.

 

Select a gallery or create a new one, and don’t select “No, capture only a managed image”, as CycleCloud will not be able to use that.

You can tick the option to automatically delete the VM after capturing the image, as the VM will not be usable anyway.

Fill in the rest of the information, including versions, end of life, etc., then click Review + create.

This can take some time. For me, the deployment took around 15 to 20 minutes before the image was captured successfully.

Once the image is created, go to its properties and take note of the resource ID, as we will need it when we create a cluster from the image.
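The resource ID of a gallery image version follows a fixed pattern, so it can be composed by hand or read back with the Azure CLI. A small sketch follows; the helper name gallery_image_id and every value in the usage comment (myGallery, myImageDefinition, 1.0.0) are placeholders, not names from this walkthrough.

```shell
# Compose the resource ID of a compute gallery image version.
# Arguments: subscription ID, resource group, gallery, image definition, version.
gallery_image_id() {
    printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/galleries/%s/images/%s/versions/%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Or ask Azure directly (needs `az login`):
#   az sig image-version show \
#       --resource-group myResourceGroup \
#       --gallery-name myGallery \
#       --gallery-image-definition myImageDefinition \
#       --gallery-image-version 1.0.0 \
#       --query id --output tsv
```

Comparing the composed ID with the one shown in the portal is a cheap way to catch a typo before pasting it into CycleCloud.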

 

Modify the CycleCloud template to not install Slurm:

The next step is to go back to CycleCloud, modify the template so that it does not install Slurm, and create a cluster from it.

In the “configuration” section under “node defaults” (the first one in the default template), add the line “slurm.do_install = false”:

 

[[[configuration]]]

        slurm.version = $configuration_slurm_version
        slurm.do_install = false
        slurm.user.uid = 11100
        slurm.user.gid = 11100
        munge.user.uid = 11101
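Before importing, it can be worth confirming that the flag really made it into the template file. A minimal sketch, assuming the flag is written in lowercase exactly as shown above; the helper name template_skips_install is hypothetical.

```shell
# template_skips_install FILE
# Succeeds when FILE contains a line that sets slurm.do_install to false.
template_skips_install() {
    grep -Eq '^[[:space:]]*slurm\.do_install[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$1"
}

# Before importing (slurm.txt is the template file name used in this article):
#   template_skips_install slurm.txt || echo "slurm.do_install = false is missing" >&2
```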

  • Once you have updated the template, recreate the cluster from it:

cyclecloud import_cluster NEW_CLUSTER_NAME -c Slurm -f slurm.txt

To learn more about that command, see Create a Cluster - Azure CycleCloud | Microsoft Learn.

  • Next, change the image in CycleCloud to a custom image and provide the resource ID of the image you created.

 

  • If you can give the CycleServer internet access just once, while starting the cluster for the first time, that will let it cache the project. If that is not possible, follow the steps in Running in Locked Down Networks - Azure CycleCloud | Microsoft Learn. You will also need to review the external access to other Azure services described in the same article.

If you are using CycleCloud version 8.7.1 or earlier:

We found a few issues that made the “slurm.do_install = false” installation challenging. A patch from our product team resolves them and will be included in future releases: Fix do-install flag. · Azure/cyclecloud-slurm@1bbcf04 · GitHub

Please note that you don't need to do this if you are using a newer version.

To apply the patch, clone the repo on the machine where the CycleCloud CLI is installed:

git clone https://github.com/Azure/cyclecloud-slurm.git
cd cyclecloud-slurm/
git checkout patch_customer

Inside that directory, create another directory named “blobs”.

Place in it the two tar.gz files named "azure-slurm-install-pkg-3.0.11.tar.gz" and "azure-slurm-pkg-3.0.11.tar.gz", which you can find in Release 0.1 · amsafo/cyclecloud-blog · GitHub.
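A quick check that both tarballs are in place can avoid uploading an incomplete project. This is a sketch only; the helper name blobs_ready is hypothetical, and the file names are the two listed above.

```shell
# blobs_ready DIR
# Succeeds when both tarballs named in this article are present in DIR/blobs.
blobs_ready() {
    missing=0
    for f in azure-slurm-install-pkg-3.0.11.tar.gz azure-slurm-pkg-3.0.11.tar.gz; do
        if [ ! -f "$1/blobs/$f" ]; then
            echo "missing: $1/blobs/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# From inside the cloned cyclecloud-slurm directory:
#   mkdir -p blobs
#   # ...copy the two downloaded tarballs into blobs/ ...
#   blobs_ready . && echo "ready to upload"
```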

Then upload the project with the patch:

cyclecloud project upload "STORAGE_Locker_NAME"
######### If you are not sure about the locker name, you can list the available ones by running:
cyclecloud locker list

######### Run this command against your cluster template:
sed -i 's/cyclecloud\/slurm/slurm/g' your_template.txt

######### reimport the template
cyclecloud import_cluster NEW_CLUSTER_NAME -c Slurm -f your_template.txt
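To see what the sed step changes before running it in place, the same substitution can be tried on a single line. The sample line below is an assumption about how a project reference looks in a stock Slurm template; your template may differ. The rewrite presumably makes the cluster reference the project uploaded to your locker rather than the built-in one.

```shell
# A sample cluster-init reference in the style of the stock Slurm template (assumed).
sample='[[[cluster-init cyclecloud/slurm:default:3.0.11]]]'

# The same substitution as above, applied to the sample line only.
rewritten=$(printf '%s\n' "$sample" | sed 's/cyclecloud\/slurm/slurm/g')
printf '%s\n' "$rewritten"
# prints: [[[cluster-init slurm:default:3.0.11]]]
```

Running grep -n 'cyclecloud/slurm' your_template.txt first lists every line the in-place edit would touch.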

 

 

Updated May 23, 2025
Version 2.0