As more customers look at running their HPC workloads in the cloud, onboarding effort and cost are key considerations. As an HPC administrator, you try to provide a unified user experience with minimal disruption during this process, so that end users and cluster administrators keep most of what they know from their on-premises environment while leveraging the power of running in the cloud.
The Specialized Workloads for Industry and Mission team, which works on some of the most complex HPC customer and partner scenarios, has built a solution accelerator, the Azure HPC OnDemand Platform (az-hop), available in the Azure/az-hop public GitHub repository, to help our HPC customers onboard faster. az-hop delivers a complete HPC cluster solution that is ready for users to run applications and easy for HPC administrators to deploy and manage. It builds on the various Azure building blocks and can be used as-is, or customized and extended to meet requirements it does not cover out of the box.
Based on years of customer engagements, we have identified some common principles that are important to our customers, and we designed az-hop with these in mind:
The HPC end-user workflow typically comprises three steps:
| Step | Details | Key features needed |
| --- | --- | --- |
| Prepare Model | The user gets the data to be used by the application. | Fast data transfer and a home directory where they can upload their data, scripts, etc. |
| Run Job | From a shell session or the UI, the user submits the job, specifying the slot type and the number of nodes needed to run it. | Auto-scaling compute, scheduler, scratch storage. |
| Analyze Results | Once the job is finished, the user can visualize the results. | Interactive desktop. |
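To make the Run Job step concrete, here is a minimal Python sketch of the two details a user provides at submission time: a slot type and a number of nodes. The `JobRequest` class, the `describe` helper, and the `"hb120"` label are hypothetical names chosen for illustration. They are not part of az-hop or of any scheduler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical model of what a user specifies at the Run Job step."""

    slot_type: str  # label for the kind of compute node (illustrative only)
    nodes: int      # number of nodes requested for the job

    def __post_init__(self) -> None:
        # Reject requests that could never be scheduled.
        if not self.slot_type:
            raise ValueError("slot_type must be a non-empty string")
        if self.nodes < 1:
            raise ValueError("nodes must be at least 1")


def describe(request: JobRequest) -> str:
    """Return a human-readable summary of the request."""
    return f"{request.nodes} node(s) of slot type '{request.slot_type}'"


if __name__ == "__main__":
    print(describe(JobRequest(slot_type="hb120", nodes=4)))
```

In a real environment the scheduler and the auto-scaling layer would consume this information; the sketch only shows the shape of the request.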
The diagram below depicts the components needed in a typical on-premises environment to support this workflow.
The default az-hop environment supports this workflow with the following architecture. Everything is accessed through the OnDemand portal for a unified experience, using only HTTPS for end users and SSH/HTTPS for administrators.
The unified experience is provided by the Open OnDemand web portal from the Ohio Supercomputer Center. Listed below are some of the features that the current az-hop environment supports; check the releases for new features as we add them:
The whole solution is defined in a single configuration file and deployed with Terraform. Ansible playbooks apply the configuration settings and install the application packages. Packer builds the two main custom images, one for compute nodes and one for remote visualization, which are published into an Azure Shared Image Gallery.
The instructions to deploy your az-hop environment are available from this page. The az-hop GitHub repository comes with example tutorials that demonstrate how to integrate and run your applications in the az-hop environment; you can follow them here to give it a test drive, or simply run your own.
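The idea of one configuration file feeding several tools can be sketched in a few lines of Python. This is an illustration under stated assumptions, not az-hop's implementation: the `CONFIG` keys, their values, and the `tool_inputs` function are hypothetical and do not reflect the real az-hop configuration schema or how the project invokes Terraform, Ansible, or Packer.

```python
# Hypothetical single configuration. These keys are illustrative only and
# are NOT az-hop's real configuration schema.
CONFIG = {
    "location": "westeurope",
    "images": ["compute", "remote-viz"],
    "packages": ["example-app"],
}


def tool_inputs(config: dict) -> dict:
    """Split one configuration into the part each tool would consume.

    The split mirrors the roles described in the text: Terraform deploys,
    Packer builds images, Ansible applies settings and installs packages.
    """
    return {
        "terraform": {"location": config["location"]},
        "packer": {"images": list(config["images"])},
        "ansible": {"packages": list(config["packages"])},
    }


if __name__ == "__main__":
    for tool, inputs in tool_inputs(CONFIG).items():
        print(tool, inputs)
```

Keeping a single source of truth means a change such as adding an image is made once and picked up by whichever tool needs it.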