Have you ever wondered what it takes to have an enterprise-level HPC environment in the Cloud? What components should be in place and what steps should be taken to move from an on-premises environment to a Cloud environment? And what are the best practices in this process? Everything starts with a Proof-of-Concept (PoC), in which an organization assesses how its key applications will perform in the Cloud, considering not only performance but also the costs involved. Once a decision is made, it is important to understand what it takes to have an enterprise-level HPC Cloud environment.
Based on our experience with various clients, partners, and product groups, we have put together comprehensive documentation on HPC lift and shift Cloud migration, and this blog post gives an overview of what we cover in it. Feedback is always welcome, as we will keep improving the documentation over time.
TL;DR
- We have just published detailed documentation on HPC lift and shift cloud migration, covering components, steps, examples, and best practices. It also provides references to products, code repositories, and blog posts.
- Documentation can be accessed here: https://learn.microsoft.com/en-us/azure/high-performance-computing/lift-and-shift-overview
DOCUMENTATION OVERVIEW
Here we provide an overview of the documentation: LINK
On-premises. We start the document by describing what a typical on-premises HPC environment looks like, which includes compute nodes, job schedulers like SLURM, PBS, or LSF, identity management, storage options, and monitoring tools, all hosted within a private network.
Personas. After discussing the on-premises environment, we talk about the personas. In our experience, there is a lot of discussion about what changes and what stays the same for everyone involved when moving from on-premises to the Cloud. We discuss their responsibilities and new tasks in an HPC Cloud setup, considering four personas:
- End-user (engineer / scientist / researcher)
- HPC administrator
- Cloud administrator
- Business manager / owner
HPC Cloud target architecture. The next discussion is an overview of the target HPC Cloud architecture, which highlights that there is not much change compared to an on-premises environment in terms of the conceptual components involved. One of the key differentiators is that resources are allocated on demand, allowing users to access more resources as needed.
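To make the on-demand point concrete, here is a minimal Python sketch of the kind of decision an autoscaling layer might make: given the cores requested by queued jobs, how many extra nodes should be provisioned? This is only an illustration under our own assumptions. The function name, parameters, and numbers are hypothetical, and it is not how any particular scheduler or Azure service implements scaling.

```python
import math


def nodes_to_add(pending_cores: int, cores_per_node: int,
                 idle_nodes: int, current_nodes: int, max_nodes: int) -> int:
    """Return how many extra nodes to provision for the queued work.

    All parameters are hypothetical inputs for illustration:
    - pending_cores: total cores requested by jobs waiting in the queue
    - cores_per_node: cores offered by one compute node
    - idle_nodes: already-provisioned nodes with no job running
    - current_nodes: all provisioned nodes (busy + idle)
    - max_nodes: upper limit set by quota or budget
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")

    # Nodes required to run everything that is waiting.
    needed = math.ceil(pending_cores / cores_per_node)
    # Idle nodes can absorb part of the demand before we grow.
    shortfall = max(0, needed - idle_nodes)
    # Never grow beyond the configured limit.
    headroom = max(0, max_nodes - current_nodes)
    return min(shortfall, headroom)


if __name__ == "__main__":
    # 480 cores queued, 120 cores per node, 1 of 2 nodes idle, limit of 10.
    print(nodes_to_add(480, 120, idle_nodes=1, current_nodes=2, max_nodes=10))
```

On-premises, the equivalent of `max_nodes` is fixed by the hardware that was purchased; in the Cloud it becomes a quota or budget setting that can be revisited as demand changes.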
Migration guide. After a brief discussion on exploring the Cloud environment through a Proof-of-Concept (PoC), we dive deep into the migration guide itself. We have broken the guide into five steps.
WHAT IS NEXT?
We will continue to improve and expand the documentation on this topic as new services, products, and learnings become available. The documentation is not intended to cover every possible deployment in the Cloud, but to provide guidance based on patterns we observe in how customers use the Cloud to run their HPC workloads. If you would like more detail on any subject, please send us a note!
LINK TO FULL DOCUMENTATION
https://learn.microsoft.com/en-us/azure/high-performance-computing/lift-and-shift-overview
#AzureHPCAI