SLURM, the Simple Linux Utility for Resource Management, is an open-source cluster resource manager and job scheduler. The package also contains SlurmDBD (Slurm Database Daemon), which can be used to securely manage the accounting data for several Slurm clusters in a central location. Slurm can be configured to collect accounting information for every job and user. Accounting records can be written to a simple text file or a database. Information is available about both currently executing jobs and jobs that have already terminated. The sacct command can report resource usage for running or terminated jobs, including individual tasks, which is useful for detecting load imbalance between tasks.
From Azure CycleCloud 8.1.0 onwards, the Slurm template supports enabling SlurmDBD on Slurm 20.11+. This blog post explains how to enable Slurm job accounting with Azure CycleCloud and an Azure MariaDB instance.
This blog post assumes that you have access to Azure CycleCloud 8.2 and an Azure managed MariaDB instance for setting up the Slurm cluster and the SlurmDBD configuration.
If you don't, please refer to the Azure CycleCloud documentation and the Azure Database for MariaDB documentation.
Here are the steps to integrate Slurm job accounting in Azure CycleCloud.
- First, you need a managed MariaDB database instance for job accounting, because CycleCloud expects a DB URL for job accounting. Slurm uses MariaDB to store the job accounting information.
Please note that SlurmDBD cannot use a password containing the "#" character to access MariaDB, so make sure the MariaDB admin password does not contain "#".
- Once the MariaDB instance is up, you have all the information required to enable the job accounting feature in the CycleCloud portal.
- Update the VNET rules in the MariaDB connection security settings to include the scheduler's virtual network, so that the scheduler node can reach MariaDB (the slurmdbd daemon runs on the scheduler node and uses the managed Azure MariaDB instance as its database).
- After setting up MariaDB, add the DB information in the Advanced Settings section of the CycleCloud Slurm cluster. Select “Job Accounting”, enter the DB information, then save and start the cluster.
- Once the cluster is up, run a sample job and check sacct to verify that job accounting works.
You can pass many parameters to sacct to get the required accounting information from SlurmDBD.
Example: finding the start and end time of jobs in a given time period.
You can also retrieve job statistics for a specific user or a specific cluster. See the sacct documentation for more examples.
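As a rough illustration of such a query, the sketch below builds a sacct command line for a time window and parses its output from Python. The helper names (`build_sacct_command`, `run_sacct`, `parse_sacct_output`) and the selected fields are assumptions made for this example and are not part of the CycleCloud template; only standard sacct options are used.

```python
import subprocess

# Fields requested from sacct; this selection is an assumption for the example.
FIELDS = ["JobID", "JobName", "User", "Start", "End", "Elapsed", "NNodes", "State"]


def build_sacct_command(start, end, user=None, cluster=None):
    """Build a sacct argument list for jobs in the given time window."""
    cmd = [
        "sacct",
        "--allocations",   # one line per job, without individual job steps
        "--parsable2",     # '|'-delimited output, no trailing delimiter
        "--noheader",
        f"--starttime={start}",
        f"--endtime={end}",
        "--format=" + ",".join(FIELDS),
    ]
    # Restrict to one user if given, otherwise report jobs of all users.
    cmd.append(f"--user={user}" if user else "--allusers")
    if cluster:
        cmd.append(f"--clusters={cluster}")
    return cmd


def parse_sacct_output(text):
    """Turn '|'-delimited sacct output into a list of dicts keyed by field name."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            # Note: a job name containing '|' would break this simple split.
            rows.append(dict(zip(FIELDS, line.split("|"))))
    return rows


def run_sacct(start, end, user=None, cluster=None):
    """Run sacct on the scheduler node and return the parsed job records."""
    cmd = build_sacct_command(start, end, user, cluster)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sacct_output(result.stdout)
```

On the scheduler node, `run_sacct("2021-10-01", "2021-10-31", user="someuser")` would return one dictionary per job, with the `Start`, `End` and `Elapsed` values as reported by sacct (the dates and user name here are placeholders).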
Granular Cost Control
An important requirement in every organization is the ability to calculate consumption cost at a granular level, e.g. per job or per user, because the infrastructure is usually shared between different users. This makes it possible to estimate the internal spend of different teams and departments and to forecast expenses.
To provide that functionality, you can combine the accounting information described above, which gives the job duration and SKU type for a specific user, with the Azure Pricing API, which gives the cost of a specific SKU in the region where the cluster is located. With both, you can build a custom parser that calculates the cost of cluster usage in a more granular way, as outlined below.
An example query to get the hourly price of an HBv2 Spot VM in West Europe:
https://prices.azure.com/api/retail/prices?$filter=serviceName eq 'Virtual Machines' and meterName eq 'HB120-64rs v2 Low Priority' and armRegionName eq 'westeurope' and productName eq 'Virtual Machines HBSv2 Series'
The query returns the following response (excerpt):
"location": "EU West",
"meterName": "HB120-64rs v2 Low Priority",
"productName": "Virtual Machines HBSv2 Series",
"skuName": "Standard_HB120-64rs_v2 Low Priority",
"serviceName": "Virtual Machines",
"unitOfMeasure": "1 Hour",
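The same query can be issued from a script. The following is a minimal sketch, assuming the response wraps its records in an `Items` list and that each record carries `retailPrice` and `currencyCode` fields, as the Azure Retail Prices API documents; the function names are hypothetical and chosen for this example only.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://prices.azure.com/api/retail/prices"


def build_price_url(meter_name, region, product_name):
    """Build the Retail Prices API URL with the same filter as the query above."""
    odata_filter = (
        "serviceName eq 'Virtual Machines'"
        f" and meterName eq '{meter_name}'"
        f" and armRegionName eq '{region}'"
        f" and productName eq '{product_name}'"
    )
    # Percent-encode spaces and quotes; keep the literal "$filter" key.
    return API_URL + "?$filter=" + urllib.parse.quote(odata_filter)


def fetch_items(url):
    """Download one page of results and return its list of price records."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response).get("Items", [])


def hourly_price(items):
    """Return (price, currency) of the first record billed per hour."""
    for item in items:
        if item.get("unitOfMeasure") == "1 Hour":
            return item["retailPrice"], item["currencyCode"]
    raise LookupError("no hourly price found in the response")
```

For the example above, `hourly_price(fetch_items(build_price_url("HB120-64rs v2 Low Priority", "westeurope", "Virtual Machines HBSv2 Series")))` would return the current hourly price together with its currency. The sketch reads only the first page of results, which is assumed to be enough for a filter this specific.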
Ideas for a parser to calculate the per-job cost:
The job-related information comes from sacct and the price information from the Azure Pricing API.
Slurm accounting - job elapsed time (hours) and number of nodes
Azure Pricing API - price of each instance per hour
Per-job cost = job elapsed time (hours) x number of nodes x price of each instance per hour (normalize to minutes if needed)
You can build your own parser from these ideas and the information collected from Slurm job accounting and the Azure Pricing API.
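The per-job formula above can be sketched in a few lines. This example assumes the elapsed time arrives in the format sacct prints for its `Elapsed` field (`[DD-]HH:MM:SS`, or `MM:SS` for short jobs) and that all nodes of a job use the same SKU; the function names are hypothetical.

```python
def elapsed_to_hours(elapsed):
    """Convert a sacct Elapsed string such as '1-02:30:00' or '02:30:00' to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in elapsed.split(":")]
    # Pad on the left so that 'MM:SS' is treated as '00:MM:SS'.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return days * 24 + hours + minutes / 60 + seconds / 3600


def job_cost(elapsed, nnodes, price_per_hour):
    """Per-job cost = elapsed time (hours) x number of nodes x hourly price."""
    return elapsed_to_hours(elapsed) * int(nnodes) * price_per_hour
```

Given a job record from sacct and an hourly price from the Pricing API, `job_cost(record["Elapsed"], record["NNodes"], price)` gives the cost of that job in the currency of the price; summing these values per user or per cluster yields the coarser breakdowns.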
You have successfully enabled job accounting for Slurm with Azure CycleCloud 8.2 and Azure MariaDB, and learned a couple of sacct commands for reviewing usage patterns and cluster utilization. In combination with the Azure Pricing API, you can build a customized parser to calculate the cost per job, per user, or per cluster and use it for more granular cost control within your organisation.
Technical contribution: Vinil Vadakkepurakkal, Łukasz Mirosław (Microsoft)