Azure's main HPC orchestration solutions, Azure Batch and Azure CycleCloud, can be easily adapted to the needs of a specific workload scenario.
One of the features enabling this flexibility is the possibility for end users to create custom OS images for their clusters. These images can be configured to contain the specific libraries, applications, drivers, or other dependencies required by the workload.
However, managing these customizations can quickly become time-consuming and error-prone without appropriate automation strategies.
This article defines an automated methodology to create custom HPC images through Azure Image Builder and Azure Compute Gallery. It also presents a recipe to deploy the required Azure resources through an Infrastructure as Code (IaC) approach using Bicep templates.
The procedure described in this article leverages the official scripts for Azure HPC image preparation contained in the Azure/azhpc-images repository.
Azure HPC images repository
The Azure/azhpc-images repository contains the recipes for the preparation of CentOS-HPC, Ubuntu-HPC, and Alma-HPC images for H-series and N-series machines, as described in the Azure HPC images documentation.
The scripts in this repository install the relevant tools, libraries, and drivers for the HPC world. For example, all the layers added in the process for a CentOS 7.9 image are reported in a readme file of the official repository, where the list contains:
- NVIDIA Mellanox OFED
- Different flavors of MPI and communication runtimes
- NVIDIA GPU Drivers and NVIDIA NCCL
- Docker and NVIDIA-Docker
- AMD and Intel libraries
- GCC compiler
- ...and several others
This repository is a valuable guide for customizing an HPC image starting from a standard Azure Marketplace image, since it installs the standard packages for an HPC scenario on a specific OS.
In this article, it will be leveraged for the customization of an Ubuntu 20.04 LTS Azure Marketplace image.
Target scenario
The methodology presented in this article is based on an Azure Image Builder instance deploying an image to an Azure Compute Gallery. The reference architecture is aligned with what is described in the Azure documentation for Azure Image Builder using a virtual network.
The Azure Image Builder instance has a User Assigned Identity with two custom roles assigned: one for reading/joining virtual networks and one for contributing to Managed Images or Compute Gallery images (both role assignments are scoped to the specific Resource Group).
The architecture uses Azure Image Builder without public IPs. This makes the procedure applicable even for organizations whose security policies do not allow public IP deployments.
When Azure Image Builder is deployed inside a Virtual Network, it leverages Private Link Service. Private Link Service communicates with a Proxy VM through an Azure Load Balancer, and the Azure Image Builder service interacts with the Build VM through the Proxy VM.
The images are organized in Image Definitions inside an Azure Compute Gallery. An Image Definition is a logical grouping of multiple versions of a specific image, meaning each Image Definition can contain multiple Image Versions.
Azure Image Builder will distribute the custom OS image to a specific Image Definition inside an Azure Compute Gallery, creating a new incremental Image Version.
Bicep automation
The architecture can be deployed using an Infrastructure as Code (IaC) approach.
A Bicep template has been created for this purpose, starting from the ARM templates present in the Azure Image Builder documentation.
The Bicep template contains deployment instructions for all the resources described in the previous paragraph.
Image Builder definition
The Azure Image Builder Bicep template is the core element of the deployment. The template in the repository is focused on the creation of an Ubuntu-HPC image starting from an Azure Marketplace image. The Bicep reference for Azure Image Builder describes all the possible configuration options for the Image Builder.
The source image for the build process is defined with a source object in the properties of the Azure Image Builder Bicep template:
source: {
  type: 'PlatformImage'
  publisher: 'Canonical'
  offer: '0001-com-ubuntu-server-focal'
  sku: '20_04-lts-gen2'
  version: 'latest'
}
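If a different Marketplace source is desired, the available publisher/offer/SKU combinations can be explored from the Azure CLI, for example (a quick sketch; --all also returns versions other than latest):

# List the available Ubuntu 20.04 LTS Gen2 Marketplace images
az vm image list \
  --publisher Canonical \
  --offer 0001-com-ubuntu-server-focal \
  --sku 20_04-lts-gen2 \
  --all \
  --output table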
Starting from this image, the Image Builder will apply customizations on the Build VM after the image is loaded. The customize property allows executing different operations in the image creation process:
customize: [
  {
    type: 'Shell'
    name: 'InstallUpgrades'
    inline: [
      'wget https://codeload.github.com/Azure/azhpc-images/zip/refs/heads/master -O azhpc-images-master.zip'
      'sudo apt-get install unzip'
      'unzip azhpc-images-master.zip'
      'sed -i "s%./install_nvidiagpudriver.sh%#./install_nvidiagpudriver.sh%g" azhpc-images-master/ubuntu/ubuntu-20.x/ubuntu-20.04-hpc/install.sh'
      'sed -i \'s%$UBUNTU_COMMON_DIR/install_nccl.sh%#$UBUNTU_COMMON_DIR/install_nccl.sh%g\' azhpc-images-master/ubuntu/ubuntu-20.x/ubuntu-20.04-hpc/install.sh'
      'sed -i \'s%rm /etc/%rm -f /etc/%g\' azhpc-images-master/ubuntu/common/install_monitoring_tools.sh'
      'cd azhpc-images-master/ubuntu/ubuntu-20.x/ubuntu-20.04-hpc/'
      'sudo ./install.sh'
      'cd -'
      'sudo rm -rf azhpc-images-master'
    ]
  }
]
NVIDIA drivers and NVIDIA NCCL are skipped in this example since the image is assumed to be used only for CPU compute nodes.
If NVIDIA drivers are needed and the Build VM is a SKU without an NVIDIA card, the kernel module load will fail at the end of the NVIDIA driver installation.
This can be overcome in two ways:
- Using a VM SKU with an NVIDIA GPU (defining it in the vmSize property of the Image Builder, but paying attention to the related cost per build)
- Forcing the scripts to ignore the error in the customization process and verifying the proper functioning of the driver later (see the sketch below)
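A hedged sketch of the second option: instead of commenting the driver installation out, an inline sed command could make it non-blocking (the exact line to patch depends on the azhpc-images version in use):

# Hypothetical patch: let install.sh continue even if the NVIDIA driver
# installation fails to load the kernel module on a GPU-less Build VM
sed -i 's%^./install_nvidiagpudriver.sh%./install_nvidiagpudriver.sh || true%g' \
  azhpc-images-master/ubuntu/ubuntu-20.x/ubuntu-20.04-hpc/install.sh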
The target of the image deployment is then defined through the distribute property, which allows selecting the target among the three available scenarios: a Managed Image, a Compute Gallery, or a VHD in a storage account.
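A minimal sketch of a Compute Gallery target, assuming the gallery and Image Definition names come from the template parameters (runOutputName is an illustrative value):

distribute: [
  {
    // 'SharedImage' is the distribute type for an Azure Compute Gallery
    type: 'SharedImage'
    galleryImageId: resourceId('Microsoft.Compute/galleries/images', destinationGalleryName, destinationImageName)
    runOutputName: 'hpc-image-output'
    replicationRegions: [
      resourceGroup().location
    ]
  }
]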
The image build process involves a real Azure VM. The vmSize property allows specifying the VM SKU to be used, and this has a direct impact on the price per build of the image. Other properties like the OS disk size and the subnet ID can be defined through the surrounding vmProfile object.
The Bicep template in the repository uses a Standard_D8ds_v5, while the default (for Gen2 images) would be Standard_D4ds_v4.
Please consider that this machine selection will determine the biggest part of the cost of each Image Builder execution.
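A hedged sketch of the vmProfile block, assuming the virtual network parameters described later in this article (the osDiskSizeGB value is illustrative):

vmProfile: {
  // VM SKU used for the Build VM; drives most of the per-build cost
  vmSize: 'Standard_D8ds_v5'
  // OS disk size in GB for the Build VM
  osDiskSizeGB: 64
  // Attach the Build VM (through the proxy) to an existing subnet
  vnetConfig: {
    subnetId: resourceId('Microsoft.Network/virtualNetworks/subnets', virtualNetworkName, subnetName)
  }
}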
Deployment and first execution
The deployment of Bicep template has the following prerequisites:
- Having a target Resource Group (<RESOURCE_GROUP_NAME>)
- Having the Owner role assigned at the Resource Group scope, in order to deploy the resources and manage permissions
- A working Azure CLI installation, with login performed for the user in the target tenant (az login) and with the correct subscription set as active (az account set). The fastest way is to leverage Azure Cloud Shell, available directly from the Azure Portal. The pricing of Azure Cloud Shell is mainly related to the data storage in an Azure File Share and the outbound data transfer, both extremely low in this example.
The following commands in the Azure CLI download the repository and deploy the resources:
git clone https://github.com/wolfgang-desalvador/az-hpc-image-builder.git
cd az-hpc-image-builder
az deployment group create --resource-group <RESOURCE_GROUP_NAME> --template-file main.bicep
The following mandatory parameters need to be specified interactively at the beginning of the deployment (or can be provided through a parameters file, as sketched later in this section):
- imageBuilderName -> the name of the image builder. This will be the resource name for the image builder and will also act as the prefix for the Managed Identity, the VNET, and the NSG. However, there is the option to avoid deploying a VNET and NSG, described below
- destinationGalleryName -> name of the Compute Gallery
- destinationGalleryDescription -> a description for the Compute Gallery
- destinationImageName -> the name for the new Image Definition inside the Compute Gallery
- destinationImageDescription -> the description of the Image Definition
The following optional parameters can be used to avoid deploying the VNET and NSG:
- deployVirtualNetwork -> this can be set to false to avoid deploying a new VNET
- virtualNetworkName -> the name of the existing target VNET to be used by the builder
- subnetName -> This will be the target subnet name inside the VNET
// Virtual Network parameters
@description('Boolean to specify if the virtual network and the network security group need to be deployed.')
param deployVirtualNetwork bool = true
@description('Image Builder Virtual Network name')
param virtualNetworkName string = '${imageBuilderName}-vnet'
@description('Image Builder subnet Network Security Group name')
param nsgName string = '${imageBuilderName}-nsg'
In main.bicep there are parameters set with a default value, and they can be customized for specific needs.
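As a hedged example, the mandatory parameters could be collected in an ARM parameters file (file name and values are placeholders) and passed to the deployment:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "imageBuilderName": { "value": "imageBuilder" },
    "destinationGalleryName": { "value": "hpcgallery" },
    "destinationGalleryDescription": { "value": "Azure Compute Gallery for HPC images" },
    "destinationImageName": { "value": "ubuntu-hpc" },
    "destinationImageDescription": { "value": "Ubuntu 20.04 LTS HPC image" }
  }
}

The deployment command then becomes:

az deployment group create --resource-group <RESOURCE_GROUP_NAME> --template-file main.bicep --parameters @parameters.json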
Please pay attention to the fact that the deployment of this Bicep file will update any resource already present in the Resource Group with the same name. For example, if a Compute Gallery with the same name is already present in the subscription, its description will be updated with the new input parameters.
After the completion of the deployment, the target Resource Group will contain the Image Builder template, its Managed Identity, the Compute Gallery with the new Image Definition and, if requested, the VNET with its NSG.
The image build process can be triggered from Azure CLI or from Azure Portal:
- From the CLI:
az resource invoke-action \
  --resource-group <RESOURCE_GROUP_NAME> \
  --resource-type Microsoft.VirtualMachineImages/imageTemplates \
  -n <IMAGE_BUILDER_NAME> \
  --action Run
- From the Portal, open the Image Builder and click "Start"
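Once started, the build progress can also be followed from the CLI (a sketch assuming the az image builder command group is available in the installed CLI version):

# Show the status of the last build run for the image template
az image builder show \
  --resource-group <RESOURCE_GROUP_NAME> \
  --name <IMAGE_BUILDER_NAME> \
  --query lastRunStatus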
Another Resource Group (with a name starting with "IT_") will be created in the subscription during the image build process.
This Resource Group contains the resources required by Azure Image Builder, and they will be attached to the subnet defined for Azure Image Builder.
The Resource Group will contain a Load Balancer, a Private Link Service, a Proxy VM and a Build VM. No Public IP will be created.
At the end of each image build process, a new Image Version of the Image Definition will be added inside the Compute Gallery.
Using the custom image in an Azure CycleCloud cluster
The image ID is required to use the newly created image inside an Azure CycleCloud cluster. The ID can be obtained through the Azure Portal by looking inside the Properties of the specific Image Version.
Alternatively, it can be retrieved from Azure CLI printing the JSON definition of the Image Version:
az sig image-version list --resource-group <RESOURCE_GROUP_NAME> --gallery-name <GALLERY_NAME> --gallery-image-definition <GALLERY_IMAGE_DEFINITION>
In the case of the current deployment, it will become:
az sig image-version list --resource-group <RESOURCE_GROUP_NAME> --gallery-name hpcgallery --gallery-image-definition ubuntu-hpc
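If only the resource IDs are needed, the CLI's global --query option can filter the JSON output (a quick sketch):

# Print only the full resource ID of each Image Version
az sig image-version list \
  --resource-group <RESOURCE_GROUP_NAME> \
  --gallery-name hpcgallery \
  --gallery-image-definition ubuntu-hpc \
  --query "[].id" --output tsv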
The form of the ImageID is the following:
/subscriptions/<subscription_id>/resourceGroups/<resource_group_id>/providers/Microsoft.Compute/galleries/<gallery_name>/images/<image_definition_name>/versions/<version_number>
In Azure CycleCloud, images are specified using the ImageID mentioned above.
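As a hedged sketch, in a CycleCloud cluster template the custom image can be referenced through the ImageName attribute of a node or nodearray (node name and version number are illustrative):

[[node execute]]
    # Full Image Version resource ID from the Compute Gallery
    ImageName = /subscriptions/<subscription_id>/resourceGroups/<resource_group_id>/providers/Microsoft.Compute/galleries/hpcgallery/images/ubuntu-hpc/versions/1.0.0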
The same operation can be performed in the UI from the "Advanced Settings" of the cluster.
For example, for a Slurm cluster the ImageID can be specified checking the "Custom Image" box for the node OS specification.
Particular attention should be paid to the impact of an Image ID change in an Azure CycleCloud cluster. Changing the Image ID to a new version from the UI in an active cluster may lead to a heterogeneous OS across the execution node array in case of autoscaling.
If it is critical for a workload to run on a homogeneous OS, the Image Version should be changed only by adopting proper strategies to minimize the impact.
Several options can be leveraged to manage an image version change in this case:
- Declaring a maintenance window in the cluster where no jobs can run on the nodes. This can be achieved in different ways depending on the scheduler. Once no jobs are running, all the execution nodes can be forcibly terminated to allow autoscaling to regenerate them
- Creating a new Azure CycleCloud cluster with the updated image versions and progressively routing jobs to the new one. Once the old cluster is drained, it can be turned off and eliminated
- Using scheduler features to mark all the execution nodes as offline, progressively powering them off when the jobs running on them are finished. Autoscaling will progressively provision new nodes with the updated image versions (see the Slurm sketch below)
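A hedged sketch of the third option on a Slurm cluster (<NODE_RANGE> is a placeholder for the execution node list):

# Stop scheduling new jobs on the current nodes; running jobs complete normally
sudo scontrol update NodeName=<NODE_RANGE> State=DRAIN Reason="image version update"
# Check which nodes are fully drained and ready to be powered off
sinfo --states=drained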
Automation with GitHub Actions
The Bicep template can be used to create a Continuous Delivery pipeline in GitHub using GitHub workflows and GitHub Actions.
In this way, every change to the image definition pushed to the main branch in the GitHub version control system will trigger the deployment of the updated image template file inside a target Resource Group and the execution of the image build process.
Application registration and permissions setup
The configuration of a Bicep deployment through GitHub Actions involves, as a first step, the creation of an Application registration in Azure AD.
The following command should be executed in an Azure CLI, with login performed for the user in the target tenant (az login) and with the correct subscription set active (az account set). An easy solution is again to use Azure Cloud Shell.
az ad sp create-for-rbac --name <APPLICATION_NAME_OF_CHOICE> --sdk-auth
This command will output a JSON credential object that should be copied and used in the subsequent steps.
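The output has a shape similar to the following (values redacted; --sdk-auth also emits a set of Azure endpoint URLs, omitted here):

{
  "clientId": "<GUID>",
  "clientSecret": "<CLIENT_SECRET>",
  "subscriptionId": "<GUID>",
  "tenantId": "<GUID>"
}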
The deployment agent will need to authenticate through the Service Principal created above to Azure.
GitHub Encrypted Secrets need to be defined in GitHub to allow the deployment agent to authenticate against Azure. Following the guide for encrypted secrets creation in a repository, three secrets should be defined:
- AZURE_CREDENTIALS -> This secret contains the JSON above
- AZURE_RG -> This secret contains the target resource group name for deployment
- AZURE_SUBSCRIPTION -> This secret contains the target subscription containing the resource group for deployment
Once created, the three secrets will be listed among the repository's Actions secrets.
The Application registration in Azure AD should then be granted the permissions to perform the required deployment operations.
For the purpose of Image Builder deployment, the following Custom Role can be created and assigned to the Application Service Principal scoped to the target Resource Group for the deployment:
{
    "properties": {
        "roleName": "Azure Image Template Contributor",
        "description": "Allows to contribute to Azure Image Builder resources",
        "assignableScopes": [
            "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP_NAME>"
        ],
        "permissions": [
            {
                "actions": [
                    "Microsoft.VirtualMachineImages/imageTemplates/*",
                    "Microsoft.Resources/deployments/*",
                    "Microsoft.Compute/galleries/read",
                    "Microsoft.Compute/galleries/images/read",
                    "Microsoft.Compute/galleries/images/versions/read",
                    "Microsoft.Network/virtualNetworks/read",
                    "Microsoft.Compute/images/read",
                    "Microsoft.ManagedIdentity/userAssignedIdentities/assign/action",
                    "Microsoft.ManagedIdentity/userAssignedIdentities/read",
                    "Microsoft.Resources/subscriptions/resourceGroups/read"
                ],
                "notActions": [],
                "dataActions": [],
                "notDataActions": []
            }
        ]
    }
}
This role grants the GitHub Application Service Principal exclusively the minimal permissions (least privilege) required for Image Template deployment through Bicep.
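A hedged sketch of how the role could be created and assigned from the Azure CLI, assuming the definition above is saved as image-template-contributor.json (file name and placeholders are illustrative):

# Create the custom role from the JSON definition
az role definition create --role-definition @image-template-contributor.json

# Assign it to the Application Service Principal, scoped to the target Resource Group
az role assignment create \
  --assignee <APPLICATION_CLIENT_ID> \
  --role "Azure Image Template Contributor" \
  --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP_NAME>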
GitHub workflow for deployment
Inside the repository there is the definition of a GitHub workflow which performs the deployment of image-builder.bicep at every push on the main branch. In GitHub, workflows are defined in the form of YML files. Every time the version control system gets a new commit on the configured branch, the workflow is automatically executed.
This workflow is triggered at every change on the main branch:
on:
  push:
    branches:
      - main
And it performs three jobs in sequence:
- Deletes the existing image builder in the Resource Group
delete-image-builder:
  uses: ./.github/workflows/delete-image-builder.yml
  with:
    image-builder-name: 'imageBuilder'
  secrets: inherit
- Deploys the new image builder in the Resource Group
deploy-bicep:
  needs: delete-image-builder
  runs-on: ubuntu-latest
  steps:
    # Checkout code
    - uses: actions/checkout@main
    # Log into Azure
    - uses: azure/login@v1
      with:
        creds: ${{ secrets.AZURE_CREDENTIALS }}
    # Deploy Bicep file
    - name: Deploy Bicep resources
      uses: azure/arm-deploy@v1
      with:
        subscriptionId: ${{ secrets.AZURE_SUBSCRIPTION }}
        resourceGroupName: ${{ secrets.AZURE_RG }}
        template: ./image-builder.bicep
        parameters: 'imageBuilderName=imageBuilder destinationGalleryName=hpcgallery destinationImageName=ubuntuhpc'
        failOnStdErr: false
- Triggers the image build process
run-builder:
  needs: [deploy-bicep, delete-image-builder]
  uses: ./.github/workflows/run-image-builder.yml
  with:
    image-builder-name: 'imageBuilder'
  secrets: inherit
The delete-image-builder and run-builder jobs are defined by calling two reusable workflows contained in the repository.
Thanks to this framework, every time a change is performed on the code base, the image template builder is recreated, and a new version of the image is built and saved in the Azure Compute Gallery.
Technically, a user can perform changes and updates to their HPC images directly from an IDE by pushing to the GitHub repository, while GitHub Actions guarantees delivery of the images to the Azure Compute Gallery.
This approach can be further extended to support multiple image templates/environments:
- Defining separate templates for the image builder in Bicep and creating multiple GitHub workflows for their Continuous Delivery
- Defining multiple image environments, creating a production, a development and a QA branch in GitHub with related workflows
That’s all folks
If you've made it this far, congratulations: you already know the basics of automating your Azure HPC image customization using Azure Image Builder and Bicep!
#AzureHPCAI