Azure recently announced support for NVIDIA’s T4 Tensor Core Graphics Processing Units (GPUs), which are well suited to running machine learning inference and analytical workloads cost-effectively. With Apache Spark™ deployments tuned for NVIDIA GPUs, plus pre-installed libraries, Azure Synapse Analytics offers a simple way to use GPUs for a variety of data processing and machine learning tasks. With built-in support for NVIDIA’s RAPIDS acceleration, GPU-accelerated Spark in Azure Synapse delivers up to 2x speedups on standard analytical benchmarks compared with running on CPUs, all without any code changes. Additionally, for machine learning workloads, Azure Synapse includes Microsoft's Hummingbird library out of the box, which can use these GPUs to significantly accelerate traditional ML models.
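Hummingbird's speedup comes from compiling traditional models, such as decision trees, into tensor operations that GPUs execute efficiently. The NumPy sketch below illustrates the GEMM-style tree compilation strategy described by the Hummingbird authors, applied to a hand-built, hypothetical two-node tree. It is not Hummingbird's own code, and the matrix names `A` to `E` and the leaf values are our own illustrative choices.

```python
import numpy as np

# Hypothetical toy tree over two features (x0, x1):
#   node n0: x0 < 0.5 ? go to n1 : leaf L2
#   node n1: x1 < 0.3 ? leaf L0  : leaf L1
# A[f, i] = 1 if internal node i tests feature f.
A = np.array([[1, 0],
              [0, 1]], dtype=np.float64)
# B[i] = threshold of internal node i.
B = np.array([0.5, 0.3])
# C[i, l] = +1 if leaf l is in the left subtree of node i,
#           -1 if it is in the right subtree, 0 otherwise.
C = np.array([[1,  1, -1],
              [1, -1,  0]], dtype=np.int64)
# D[l] = number of left turns on the root-to-leaf path of leaf l.
D = np.array([2, 1, 0], dtype=np.int64)
# E[l] = output value stored at leaf l (hypothetical regression values).
E = np.array([10.0, 20.0, 30.0])


def predict_gemm(X: np.ndarray) -> np.ndarray:
    """Evaluate the tree for a whole batch using only dense tensor operations."""
    T = X @ A                          # feature value seen by each internal node
    T = (T < B).astype(np.int64)       # 1 where a node's test sends the row left
    T = T @ C                          # score every leaf against the decisions taken
    T = (T == D).astype(np.float64)    # exactly one leaf reaches its full score
    return T @ E                       # select that leaf's output value


def predict_loop(x: np.ndarray) -> float:
    """Reference row-by-row traversal of the same hypothetical tree."""
    if x[0] < 0.5:
        return 10.0 if x[1] < 0.3 else 20.0
    return 30.0


X = np.array([[0.2, 0.1], [0.2, 0.9], [0.8, 0.1]])
print(predict_gemm(X))  # → [10. 20. 30.]
```

A leaf's score in `T @ C` equals its entry in `D` only when every ancestor that should send the row left does so and every ancestor that should send it right does not, so each row lights up exactly one leaf. Every row goes through the same matrix multiplications, which suits GPU hardware far better than branchy per-row traversal; the trade-off is redundant work, since every node's test is evaluated for every row. With the actual library, you would typically convert a fitted scikit-learn model with `hummingbird.ml.convert(model, "pytorch")` and move it to the GPU with `.to("cuda")`.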
Beginning today, this GPU acceleration feature in Azure Synapse is available for private preview by request.
GPUs offer high compute performance at a low price per unit of performance by spreading work across many cores in parallel. While a CPU has a few cores optimized for sequential processing, a GPU has a massively parallel architecture of thousands of smaller, more efficient cores designed to handle many tasks simultaneously. Considering that data scientists spend up to 80% of their time on data pre-processing, GPUs can be a valuable addition to data processing pipelines that would otherwise rely on CPUs alone.
The benefits of GPU acceleration in Apache Spark™ include:
NVIDIA and Azure Synapse have teamed up to bring GPU acceleration to data scientists and data engineers. The collaboration focuses primarily on integrating the RAPIDS Accelerator for Apache Spark™ into Azure Synapse, which lets customers run Apache Spark™ applications on NVIDIA GPUs with no code changes and with an experience identical to a CPU cluster. The collaboration will also continue to add support for the latest NVIDIA GPUs and networking products and to provide continuous enhancements for big data customers who want to improve productivity and save costs with a single pipeline for data engineering, data preparation, and machine learning.
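The "no code changes" experience is possible because the RAPIDS Accelerator is enabled through Spark configuration rather than through application code. The sketch below builds a minimal set of such settings as a plain dictionary. The keys are standard RAPIDS Accelerator and Spark 3 resource-scheduling settings, but the helper names `rapids_conf` and `to_submit_args` are hypothetical and the values are illustrative assumptions: Azure Synapse GPU pools ship with their own tuned configuration, so you would not normally set these yourself.

```python
# Illustrative Spark settings that enable the RAPIDS Accelerator plugin.
# Helper names and values are assumptions for this sketch; Azure Synapse
# GPU pools come preconfigured, so these are normally not set by hand.


def rapids_conf(executor_cores: int) -> dict:
    """Build a Spark conf dict that routes SQL/DataFrame work to one GPU per executor."""
    if executor_cores < 1:
        raise ValueError("executor_cores must be >= 1")
    return {
        # Load the RAPIDS Accelerator as a Spark plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        # Turn on GPU execution of supported SQL operators.
        "spark.rapids.sql.enabled": "true",
        # Request one GPU per executor (Spark 3 resource scheduling).
        "spark.executor.resource.gpu.amount": "1",
        # Give each task a fraction of the GPU so all executor cores can share it.
        "spark.task.resource.gpu.amount": str(1 / executor_cores),
    }


def to_submit_args(conf: dict) -> list:
    """Render the settings as spark-submit --conf arguments, in key order."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


print(to_submit_args(rapids_conf(executor_cores=4)))
```

Because acceleration is switched on by configuration alone, existing DataFrame or SQL code such as `df.groupBy("key").count()` runs unchanged, and operators the plugin cannot accelerate fall back to the CPU.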
When asked about the collaboration and the importance of having GPUs in Azure Synapse, Scott McClellan, Senior Director, Data Science at NVIDIA said, “The synergy between Azure Synapse and NVIDIA is critical to democratize AI for citizen data scientists on Azure as businesses look to gain competitive advantage with advanced analytics, artificial intelligence (AI), and machine learning (ML). Azure Synapse is transforming siloed enterprise analytics into an integrated platform to accelerate time to insights across data warehouses and big data systems. The on-going collaboration will seamlessly integrate RAPIDS Accelerator for Apache Spark, accelerate the Azure Synapse platform, and fast track new feature development for Accelerated Data Engineering and Data Science applications.”
To learn more about this collaboration, check out our presentation at NVIDIA’s GTC 2021 Conference.
While Apache Spark™ supports GPUs out of the box, configuring the required hardware and installing all the low-level libraries can take significant effort. When you use GPU-enabled Apache Spark™ pools in Azure Synapse, you will immediately notice a surprisingly simple user experience:
Behind-the-scenes heavy lifting: Running GPU libraries requires low-level software such as NVIDIA CUDA to communicate with the graphics card on the host machine. Downloading and installing these libraries takes both time and effort. Through its integration with Azure, Azure Synapse pre-installs these libraries and sets up all the complex networking between compute nodes, giving you GPU-enabled Apache Spark™ pools within just a few minutes so you can stop worrying about setup and focus instead on solving your business problems.
Optimized Spark configuration: In collaboration with NVIDIA, we have developed optimal configurations for your GPU-enabled Apache Spark™ pools so your workloads run efficiently, saving you both time and operational costs.
Packed with Data Prep and ML Libraries: The GPU-enabled Apache Spark™ pools in Azure Synapse come with two popular libraries built in, with support for more on the way:
In our runs of NVIDIA Decision Support (NDS) test queries, derived from industry-standard benchmarks, over 1 TB of Parquet data, early results indicate that GPUs can deliver up to 2x acceleration in overall query performance without any code changes.
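The post does not specify how "overall query performance" is computed. One common convention, assumed here, is the ratio of total CPU runtime to total GPU runtime across the whole query set, which can differ noticeably from the speedups of individual queries. The small sketch below contrasts the two; the function names are hypothetical, and the timings are whatever you measure yourself, so no benchmark data is implied.

```python
def overall_speedup(cpu_seconds: list, gpu_seconds: list) -> float:
    """Total CPU runtime divided by total GPU runtime over the same queries."""
    if len(cpu_seconds) != len(gpu_seconds) or not cpu_seconds:
        raise ValueError("need matching, non-empty timing lists")
    return sum(cpu_seconds) / sum(gpu_seconds)


def per_query_speedups(cpu_seconds: list, gpu_seconds: list) -> list:
    """Speedup of each individual query (CPU time / GPU time)."""
    if len(cpu_seconds) != len(gpu_seconds):
        raise ValueError("timing lists must have the same length")
    return [c / g for c, g in zip(cpu_seconds, gpu_seconds)]
```

Because the total-time ratio is dominated by the longest-running queries, an overall figure can sit well below the best individual speedups, or above the worst ones.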