Provision on-demand Spark clusters on Docker using Azure Batch's infrastructure

Since its release three years ago, Apache Spark has soared in popularity among Big Data users, and it is increasingly common in the HPC space as well. However, spinning up a Spark cluster on demand is often complicated and slow. As a result, Spark developers often share pre-existing clusters managed by their company's IT team. In these scenarios, developers run their Spark applications on static clusters that swing constantly between under-utilization and insufficient capacity: either you're out of capacity, or you're burning dollars on idle nodes.

I'm excited to announce the beta release of the Azure Distributed Data Engineering Toolkit, an open source Python CLI tool that lets you provision on-demand Spark clusters and submit Spark jobs directly from your command line.

Read more about it on the Azure blog.
