First published on MSDN on Sep 12, 2017
Authored by Rangarajan Srirangam (AzureCAT) and Rakesh Patil (AzureCAT). Reviewed by Jamie Frey (University of Wisconsin), Brij Jashal (Tata Institute of Fundamental Sciences), Tony Wu (AzureCAT), Xavier Pillons (AzureCAT) and Suren Machiraju (AzureCAT). Edited by Anna Forse and RoAnn Corbisier.
Overview
The University of Wisconsin and the Microsoft Azure CAT team have collaborated to develop an Azure GAHP (Grid ASCII Helper Protocol) specification and an Azure GAHP server. Together they enable the HTCondor framework to run high-throughput computing (HTC) workloads on Azure Virtual Machines and virtual machine scale sets. The Azure GAHP specification describes simple commands for creating compute resources on Azure using features of Azure infrastructure-as-a-service (IaaS) such as virtual machines, virtual networks, and disks. There are also command options for invoking Azure platform-as-a-service (PaaS) services such as Azure Automation and Azure Key Vault for specific purposes. The Azure GAHP server is written in Python, is open source, and is available for download from the University of Wisconsin HTCondor page.
HTCondor
HTCondor is an open-source HTC (high-throughput computing) software framework. It is developed by the Center for High Throughput Computing at the University of Wisconsin-Madison. The HTCondor software provides a job queuing mechanism, scheduling mechanism, policy-based control, priority scheme, resource monitoring, and resource management. Jobs submitted to HTCondor are placed into a queue, scheduled for execution at a time and location controlled by a policy, monitored for progress, and their completion reported to the user.
GAHP
The compute resources used by HTCondor in executing jobs could be on-premises, part of an off-premises compute grid, or on a public cloud. When integrating with compute grids or clouds, HTCondor uses a GAHP server process to simplify the use of the relevant client library or SDK. The GAHP server, which runs as a separate process, is implemented as a multi-threaded server, which facilitates nonblocking calls. HTCondor issues commands to the GAHP server process via the standard input (STDIN) of the process and accepts responses via the standard output (STDOUT) of the process. The GAHP commands use a simple ASCII-based command syntax and conform to a GAHP protocol that is designed to communicate with a particular job submission/execution system. There are therefore several GAHP protocols, one for each grid or cloud system that HTCondor interfaces with.
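The request/response pattern described above can be illustrated with a toy helper. The sketch below is not the Azure GAHP server and does not follow any real GAHP specification: the `ECHO` command, the `S`/`E` reply codes, and the layout of result lines are assumptions made purely for illustration. It shows only the idea of acknowledging a command at once, doing the work on a separate thread, and handing back finished results when asked.

```python
import io
import queue
import threading


class ToyHelper:
    """Toy line-oriented helper (hypothetical; not a real GAHP server)."""

    def __init__(self, out):
        self.out = out                  # stream that plays the role of STDOUT
        self.results = queue.Queue()    # finished work waiting to be collected
        self.workers = []

    def _reply(self, text):
        self.out.write(text + "\n")
        self.out.flush()

    def _work(self, request_id, args):
        # Stand-in for a slow cloud call; the result is queued, not printed.
        self.results.put((request_id + " " + " ".join(args)).rstrip())

    def handle_line(self, line):
        """Process one command line; return False when the helper should stop."""
        parts = line.split()
        command = parts[0].upper() if parts else ""
        if command == "QUIT":
            self._reply("S")
            return False
        if command == "RESULTS":
            finished = []
            while True:
                try:
                    finished.append(self.results.get_nowait())
                except queue.Empty:
                    break
            self._reply("S " + str(len(finished)))
            for item in finished:
                self._reply(item)
            return True
        if command == "ECHO" and len(parts) >= 2:
            # Acknowledge immediately; the work completes in the background.
            worker = threading.Thread(target=self._work, args=(parts[1], parts[2:]))
            self.workers.append(worker)
            worker.start()
            self._reply("S")
            return True
        self._reply("E")
        return True

    def wait(self):
        """Block until all background work is done (used to keep the demo deterministic)."""
        for worker in self.workers:
            worker.join()
        self.workers.clear()

    def serve(self, lines):
        for line in lines:
            if not self.handle_line(line):
                break


# Small demonstration using an in-memory stream instead of real STDOUT.
demo_out = io.StringIO()
demo = ToyHelper(demo_out)
demo.handle_line("ECHO 1 hello")
demo.wait()
demo.serve(["RESULTS", "QUIT"])
print(demo_out.getvalue(), end="")
```

Passing `sys.stdin` as `lines` and `sys.stdout` as `out` would turn the same class into an interactive process. Queuing results rather than printing them from worker threads keeps all output on one thread, so replies never interleave.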
Azure GAHP protocol
One of the GAHP protocols is the Azure GAHP protocol, which specifies the commands that an Azure GAHP server must implement. First, it allows individual virtual machines to be created, deleted, and listed on Azure via the AZURE_VM_* commands. To allow for fast creation of hundreds of virtual machines should the need arise, the Azure GAHP protocol provides AZURE_VMSS_* commands that can use Azure virtual machine scale sets to create, delete, scale up, or scale down virtual machines in large numbers. The AZURE_VM_* and AZURE_VMSS_* commands accept options for Azure-specific settings such as virtual networks, data disks, and IP addresses. The Azure GAHP protocol also allows the user to request an automatic cleanup action that uses Azure Automation to delete running virtual machines or scale sets at a user-specified future time. Finally, the Azure GAHP protocol provides a way for user secrets to be securely installed into scale sets during creation so that such secrets can be used to securely access external resources from within the virtual machines. This capability uses the Managed Service Identity (MSI) feature with scale sets and the Key Vault service.
Azure GAHP server
An Azure GAHP server is an implementation of the Azure GAHP protocol. Since GAHP does not mandate a particular programming language, platform, or technology, there can be multiple Azure GAHP implementations that conform to the Azure GAHP protocol but differ in implementation choices. Currently, there is a freely downloadable implementation of an Azure GAHP server, tested by the University of Wisconsin-Madison to work with HTCondor, that can be downloaded from here. It is written in Python and runs as a separate process with a command line. It needs to be configured with a valid Azure subscription and credentials that are provided in a text file. It works as follows:
- The HTCondor system sends GAHP commands such as “AZURE_VM_CREATE” to the STDIN of the Azure GAHP server with the requisite parameters.
- The Azure GAHP server uses a configuration file to read the Azure subscription and credential information.
- The Azure GAHP server then translates the GAHP commands into Azure API calls and command parameters to Azure API arguments using the Azure Python SDK libraries.
- These Azure API calls connect over the Internet to Azure (in a chosen Azure region) and virtual machines or scale sets are created, deleted, and so on depending on the command executed.
- While creating virtual machines or scale sets, the dependent Azure resources needed for running them such as virtual networks, subnets, NICs, IPs, and so on are also created based on parameters passed.
- The Azure GAHP server reports the results of each command execution on the STDOUT in response to the GAHP “RESULTS” command.
- The HTCondor system receives the results from the STDOUT of the Azure GAHP server.
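The translation step in the list above, from a text command to an API call, can be sketched with a dispatch table. Everything here is an assumption for illustration: `FakeCloud` is an in-memory stand-in rather than the Azure Python SDK, the `key=value` parameter syntax is invented, and only `AZURE_VM_CREATE` is a command name taken from the text (`AZURE_VM_DELETE` and `AZURE_VM_LIST` are guesses at the AZURE_VM_* family). The actual command syntax is defined in the Azure GAHP specification.

```python
class FakeCloud:
    """In-memory stand-in for a cloud SDK client (hypothetical, not the Azure SDK)."""

    def __init__(self):
        self.vms = {}

    def create_vm(self, name, location, size):
        self.vms[name] = {"location": location, "size": size}
        return name

    def delete_vm(self, name):
        return self.vms.pop(name, None) is not None

    def list_vms(self):
        return sorted(self.vms)


def parse_params(tokens):
    """Turn ['name=vm1', 'size=small'] into {'name': 'vm1', 'size': 'small'}."""
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError("malformed parameter: " + repr(token))
        params[key] = value
    return params


def vm_create(cloud, params):
    # Command parameters become API arguments; 'size' has an illustrative default.
    return cloud.create_vm(params["name"], params["location"], params.get("size", "small"))


def vm_delete(cloud, params):
    if not cloud.delete_vm(params["name"]):
        raise KeyError("no such VM: " + params["name"])
    return params["name"]


def vm_list(cloud, params):
    return ",".join(cloud.list_vms())


# Command name -> handler. Only AZURE_VM_CREATE appears in the article text.
HANDLERS = {
    "AZURE_VM_CREATE": vm_create,
    "AZURE_VM_DELETE": vm_delete,
    "AZURE_VM_LIST": vm_list,
}


def execute(cloud, line):
    """Run one command line and return (ok, message)."""
    parts = line.split()
    if not parts or parts[0] not in HANDLERS:
        return False, "unknown command"
    try:
        return True, HANDLERS[parts[0]](cloud, parse_params(parts[1:]))
    except (KeyError, ValueError) as exc:
        return False, str(exc)


cloud = FakeCloud()
print(execute(cloud, "AZURE_VM_CREATE name=vm1 location=westus"))
print(execute(cloud, "AZURE_VM_LIST"))
```

Keeping the parsing, the handler table, and the cloud client separate means a real client could be swapped in for `FakeCloud` without touching the command handling.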
This workflow is illustrated in the figure below.
The Azure GAHP server can optionally use Key Vault to store secrets that virtual machines need in order to access resources while running. The virtual machines authenticate to Key Vault securely by using a combination of the specified command parameters and the managed service identity feature supported by scale sets. If an automatic resource deletion job is specified, then the Azure GAHP server creates a runbook in Azure Automation that is invoked on a user-specified schedule to delete virtual machines or scale sets created by the user.
While the HTCondor system interacts via STDIN/STDOUT in an automated fashion, the GAHP server can also be manually tested via the command line. In the screenshot below, the Azure GAHP server command line is shown with the result of executing the Azure GAHP “Version” command.
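The same kind of exchange can be scripted by attaching to a helper's STDIN and STDOUT through pipes. The sketch below assumes nothing about the real server: it launches a tiny stand-in child process (the `TOY_HELPER` script, whose `VERSION` and `QUIT` replies are invented for this example) so that it runs on its own. To try it against an actual helper, the command list passed to `subprocess.Popen` would need to be replaced with that helper's launch command, and the replies would follow its specification instead.

```python
import subprocess
import sys

# Stand-in child process (hypothetical): one reply line per command line.
TOY_HELPER = r'''
import sys
for line in sys.stdin:
    command = line.strip().upper()
    if command == "VERSION":
        print("S toy-helper 0.0", flush=True)
    elif command == "QUIT":
        print("S", flush=True)
        break
    else:
        print("E", flush=True)
'''


def ask(proc, command):
    """Write one command to the child's STDIN and read one reply line from its STDOUT."""
    proc.stdin.write(command + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def run_session(commands):
    """Start the stand-in helper, send each command in turn, and collect the replies."""
    proc = subprocess.Popen(
        [sys.executable, "-c", TOY_HELPER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text-mode pipes
    )
    try:
        return [ask(proc, command) for command in commands]
    finally:
        proc.stdin.close()
        proc.wait(timeout=10)


print(run_session(["VERSION", "QUIT"]))
```

Flushing after every write matters on both sides of the pipe; without it, a command or reply can sit in a buffer and the two processes end up waiting on each other.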
Installing and using the Azure GAHP server
To install and use the Azure GAHP server, you need to:
- Install the prerequisites including Python 3.5, PIP, and Azure SDK for Python.
- Create a Service Principal for the Azure GAHP server.
- Optionally complete the prerequisites for accessing Key Vault.
- Optionally complete the prerequisites for creating the automatic deletion job.
- Enter Azure credentials in the Azure GAHP server configuration file.
- Download the Azure GAHP server package.
- Run the GAHP server.
- Execute Azure GAHP commands.
Detailed instructions for completing the above steps are provided in the “Azure GAHP Server Prerequisites and Installation” and “Azure GAHP Interface Commands” whitepapers. You can also download the Azure GAHP server package from here.
AzureCAT Guidance
"Hands-on solutions, with our heads in the Cloud!"