Co-authored by Matt McLoughlin, Roberto Lleras, Venkat Malladi and Vinu Gunasekaran from Microsoft Biomedical Platforms and Genomics team.
Cromwell is a workflow management system for scientific workflows. It orchestrates computing tasks needed for genomics analysis. Developed by the Broad Institute of MIT and Harvard, Cromwell is also used to run genome analysis pipelines such as the GATK Best Practices. Cromwell supports running scripts at various scales - local machines, local computing clusters and on the cloud.
Cromwell on Azure (CoA) configures all Azure resources needed to run workflows through Cromwell on the Azure cloud. It uses the GA4GH Task Execution Schema (TES) backend for submitting and managing tasks in Azure Batch.
Cost, performance and security are the top concerns for Cromwell users. TES v1.0 limited the task execution and backend parameters passed into the execution engine (Azure Batch). As a part of the GA4GH working group, Microsoft partnered with the Broad Institute to update the TES backend compute parameters to solve this limitation. The new TES v1.1 lights up several cost and performance optimization features.
Cromwell on Azure (CoA) 3.0 with TES v1.1 support enables customers to configure VM families and control runtime costs of tasks in a workflow. This provides the ability to configure virtual machines provisioned for the tasks in Azure Batch to optimize the workflow.
Users working on highly sensitive data sometimes prefer running workflows in a fully privatized environment. To address their security concerns, we added support for a fully private deployment (no public IP addresses for any Cromwell provisioned VM).
What’s next?
Specialized genomics tools like Illumina DRAGEN require specialized hardware such as FPGAs and GPUs. In the next release, we will add support for granular hardware controls and support for Illumina DRAGEN.
Meanwhile, try out CoA 3.0! Your feedback and suggestions will play a big role in helping us figure out what to build next.