Provision on-demand Spark clusters on Docker using Azure Batch's infrastructure

Community Manager

Since its release three years ago, Apache Spark has soared in popularity among Big Data users, and it is increasingly common in the HPC space as well. However, spinning up a Spark cluster on demand is often complicated and slow. As a result, many Spark developers share pre-existing clusters managed by their company's IT team. In these scenarios, they run their Spark applications on static clusters that swing constantly between under-utilization and insufficient capacity: either you're out of capacity, or you're burning dollars on idle nodes.


I'm excited to announce the beta release of the Azure Distributed Data Engineering Toolkit, an open-source Python CLI tool that lets you provision on-demand Spark clusters and submit Spark jobs directly from the command line.

Read more about it on the Azure blog.

