Scientific computing has long relied on high-performance computing (HPC) systems to accelerate scientific discovery. What constitutes an HPC system has continued to evolve. Access to computing keeps getting democratized and HPC is no longer limited to multi-billion dollar government laboratories and industries who can afford the infrastructure. Anyone with access to the Internet can now easily leverage the ubiquitous cloud for their computing task du jour! Azure natively supports HPC by providing hardware suitable for high performance computing needs together with software infrastructure to make it easy to harness these resources. In this post, we focus on one such Azure infrastructure component, Azure Batch, and see how we can be used to support a common use-case: data browser with interactive 3D visualization support.
Use-Case: the problem statement
Recently, a customer came to us with an interesting use-case. They wanted to provide their users with an interactive data browser. The datasets are HPC simulation and analysis results which can easily be several gigabytes in size. They wanted to present their users with a web app where users can browse the datasets and then select any of the datasets to interactively visualize it with some canned visualizations.
Variations of this use-case are a very common request in the scientific computing world so let's generalize (and perhaps simplify) the problem. We want to develop the following web application:
A few things to qualify the problem and help guide our design choices.
We want to a scalable solution. Of course, we can set all of this up on a workstation and expose that to the world wide web, however not only is that scary (for security reasons) but also not scalable. We want this to scale no matter how many users are accessing the portal at the same time.
The datasets are large and require processing before they can be visualized. Hence, we want a remote rendering capable system where the rendering can happen on remote computing resources, rather than the browser itself.
These requirements help us make the following design choices:
Azure Batch provides us with the ability to allocate (and free up) compute resources as and when needed. We can setup the web app to submit jobs on Azure Batch for visualizing datasets and then Batch can allocate those jobs to nodes in a node pool that can be setup to auto-scale using fancy rules, as needed. This frees us from having to do any management of the nodes in the pool such as setting them up, ensuring they have access to appropriate storage to read the datasets, etc. Batch takes care of that in addition to providing us with tools for monitoring, debugging and diagnosing issues.
For visualization and data processing, we use ParaView. Together with trame, ParaView makes it easy for us to develop a remote-rendering capable custom web applications that offer all the sophistication and flexibility available in the desktop app. Thus we can easily develop complex data analysis pipelines to satisfy the specific user requirements. trame enables us to access the visualization viewport through a web browser using web sockets.
Deploying the resources
One of the first steps when dealing with cloud computing is deploying the resources necessary on the Cloud. Infrastructure as Code (IaC) refers to the ability of deploying the resources needed and configuring them programmatically. As we go about building our HPC environment in the Azure Cloud, there are many ways to do it. We can use the Azure Portal to setup the system interactively. We can use Azure CLI to script the setup. We can also use domain-specific languages like Terraform or Bicep to define and deploy the infrastructure. For this post, we use Bicep which is a language for declaratively defining the Azure resources. For deploying the Bicep specifications and for other operations like populating datasets, we use Azure CLI.
All the resources needed for this demo can be deployed using the bicep code available in this Github repository. The readme goes over the prerequisites and the detailed steps to deploy all necessary resources. The project includes several different applications. The demo we cover in this post is named trame. Ensure that you pass enableTrame=true to the `az deployment sub create ....` command to deploy the web application.
Demo in action
Once the deployment is successful, follow the steps described here to upload datasets to the storage account deployed. Finally, you should be able to browse to the URL specific to your deployed web app and start visualizing your datasets! Here's a short video of the demo in action:
Demo: Cloud Dataset VIewer in action
Let's dive into the details on how this is put together. Of course, there's no one way to do this. Discussing the details of the resources and their configuration should help anyone trying to adapt a similar solution for their specific requirements.
Here's a schematic of the main Azure resources deployed in this demo.
Container Registry: Azure Container Registry is used to store container images. In this demo, we containerize both the main web app and the visualization application, vizer. It's not necessary to use containers, of course. Both App Service and Batch can work without containers, if needed. Containers just make it easier to setup the runtime environments for our demo.
Key Vault: Key Vault is generally used to store secrets and other private information. In this demo, we need the Key Vault for the Batch resource. Batch uses the Key Vault to store certificates etc. that is needs for setting up the compute nodes in the pools.
As we can see, it's fairly straight forward to get an interactive visualization portal setup using Azure and ParaView. For this demo, we tried to keep things simple and yet follow best practices when it comes to public access to resources in the cloud. Of course, for a production deployment one would want to add authentication to the web app, along with autoscaling for batch pool, add smarts for resource cleanup and fault tolerance to the web application, etc. One thing we have not covered in this post is how to use Azure's HPC SKUs and ParaView's distributed rendering capabilities and GPUs for processing massive datasets. We will explore that and more in subsequent posts.