Oct 04 2017 02:49 PM - last edited on Jul 31 2018 12:05 PM by TechCommunityAP
Since its release three years ago, Apache Spark has soared in popularity among Big Data users and is increasingly common in the HPC space as well. However, spinning up a Spark cluster on demand is often complicated and slow. Instead, Spark developers typically share pre-existing clusters managed by their company's IT team. In these scenarios, they run their Spark applications on static clusters that swing constantly between under-utilization and insufficient capacity: you're either out of capacity, or you're burning dollars on idle nodes.
I'm excited to announce the beta release of the Azure Distributed Data Engineering Toolkit, an open-source Python CLI tool that lets you provision on-demand Spark clusters and submit Spark jobs directly from your command line.
Read more about it on the Azure blog.