Running Nextflow Data Pipelines on Azure Batch

Published 02-23-2021 06:00 AM

We are delighted to announce today that the popular Nextflow workflow manager now supports running data pipelines on Microsoft Azure via the Azure Batch service. This is a significant step forward for our customers, particularly those in Life Sciences, who are migrating genomics, machine learning, and other pipeline- or workflow-oriented applications onto Azure.


Workflow managers such as Nextflow provide engineers and scientists an easy way of constructing multi-stage workloads with complex task relationships and data dependencies. Importantly, they also help abstract away the underlying task scheduler and execution platform, making users’ workloads easily portable.


The Azure-Nextflow integration was jointly developed by Microsoft and Seqera Labs and is released today in beta. Seqera Labs are the creators of the Nextflow open-source project and the Nextflow Tower application stack for pipeline management. The Azure Batch integration is available in both the Nextflow open-source project and the Nextflow Tower product for seamless cloud execution and hybrid cloud bursting.


With this integration, Nextflow users can now select Azure Batch as an executor and utilize Azure Blob Storage for storing data for their pipelines. Batch in turn autoscales the pools of compute nodes and schedules tasks to run on the nodes. Users only need to specify a base configuration such as the number of CPUs, the region, and the Storage Account that they wish to use in order to seamlessly execute their containerized pipelines on Azure.
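As a rough illustration of such a base configuration, a `nextflow.config` along the following lines could select the Azure Batch executor, point at a Storage Account, and let Batch autoscale a pool. This is a minimal sketch, not an official sample: every account name, key, container name, region, VM type, and pool size is a placeholder assumption, and the available options may differ between Nextflow versions, so check the Nextflow documentation for the release you use.

```groovy
// Hypothetical nextflow.config sketch. All values below are placeholders
// chosen for illustration; none of them come from this announcement.

process {
    executor  = 'azurebatch'        // run tasks through Azure Batch
    container = 'ubuntu:20.04'      // placeholder image; tasks run containerized
    cpus      = 2                   // CPUs requested per task
}

// Work directory in Azure Blob Storage (placeholder container name)
workDir = 'az://my-container/work'

azure {
    storage {
        accountName = '<STORAGE_ACCOUNT_NAME>'
        accountKey  = '<STORAGE_ACCOUNT_KEY>'
    }
    batch {
        location     = 'westeurope'             // region of the Batch account
        accountName  = '<BATCH_ACCOUNT_NAME>'
        accountKey   = '<BATCH_ACCOUNT_KEY>'
        autoPoolMode = true                     // let Nextflow create the pool
        pools {
            auto {
                autoScale  = true               // Batch grows and shrinks the pool
                vmType     = 'Standard_D2_v2'   // placeholder VM size
                vmCount    = 1                  // initial node count
                maxVmCount = 10                 // upper bound when autoscaling
            }
        }
    }
}
```

With a file like this in the launch directory, a pipeline would be started with the usual `nextflow run` command. Keeping the keys out of version control, for example by supplying them through environment variables, is advisable.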


"We are incredibly excited by this OSS contribution made by Microsoft to implement support for Azure Batch in Nextflow. This represents a major milestone for the project and provides the entire Nextflow community with a powerful and established cloud platform to deploy their pipelines." Paolo Di Tommaso - CTO and co-founder, Seqera Labs


Find out more about running Nextflow pipelines on Azure on the Nextflow blog. For more information or to apply to participate in the beta program, reach out to Seqera Labs at



About Azure Batch

Azure Batch provides cloud-scale job scheduling and compute management and is used to run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There's no cluster or job scheduler software to install, manage, or scale. Instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.


About Seqera Labs

Seqera Labs develops solutions to simplify complex data analysis pipelines. Our software enables developers and data scientists to create and securely deploy data applications in the cloud or on traditional on-premises infrastructure. The core open-source technology, Nextflow, transforms the building of massively scalable and distributed computing solutions. Seqera is now delivering results for customers across pharma, genomics, and biotech, breaking with the status quo of closed platforms and custom scripts and enabling them to embrace the future of distributed data analysis in the cloud.


@jermth great work pushing this through. It's great to see Azure listed in the documentation for Nextflow.

A couple of tips for anyone trying to run this:
1.  Your Azure Batch account should be set up with the pool allocation mode set to Batch service, which carries a Batch quota separate from your subscription.
2.  I could not download the correct edge version as shown in the Nextflow docs. I was able to install it using this variable instead: `export NXF_VER=21.02.0-edge`
3.  A working `nextflow.config` sample file is here that includes an autoscaled Batch autopool


Thanks, Jerrance! We'll add these to the Nextflow docs.


Last update: Feb 22 2021 12:57 PM