Since its release in July last year, the AlphaFold2 protein folding algorithm has been adopted by a growing number of researchers and companies to drive innovation in molecular analysis, drug discovery, and related fields. Building an AlphaFold2 computing cluster quickly in the cloud is a necessary step toward using the agility of cloud computing without upfront CAPEX.
The Azure HPC stack offers a complete portfolio suitable for running AlphaFold2 at large scale, including GPU, storage, and orchestrator services. This blog walks through the detailed steps of building an AlphaFold2 HPC cluster on Azure to speed up your process.
Architecture
Build Steps
sudo yum install epel-release python3 -y
sudo yum install aria2 -y
sudo yum remove moby-cli.x86_64 moby-containerd.x86_64 moby-engine.x86_64 moby-runc.x86_64 -y
sudo yum-config-manager --add-repo=https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y https://download.docker.com/linux/centos/7/x86_64/stable/Packages/containerd.io-1.4.13-3.1.el7.x86_64.rpm
sudo yum install docker-ce -y
sudo systemctl --now enable docker
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo yum clean expire-cache
sudo yum install -y nvidia-docker2
sudo systemctl restart docker
sudo usermod -aG docker $USER
newgrp docker
sudo su
cd /opt
git clone https://github.com/deepmind/alphafold.git
cd alphafold/
sed -i '/SHELL ["/bin/bash", "-c"]/a\RUN gpg --keyserver keyserver.ubuntu.com --recv A4B469963BF863CC && gpg --export --armor A4B469963BF863CC | apt-key add -' docker/Dockerfile
docker build -f docker/Dockerfile -t alphafold .
pip3 install -r docker/requirements.txt
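Once the build has finished, the presence of the image can be checked from a script as well as by eye. The following is a minimal sketch, assuming the tag alphafold:latest produced by the build command above; image_ready is a hypothetical helper name, not part of Docker or AlphaFold.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the given repository:tag
# appears in the local Docker image list.
image_ready() {
  local wanted=$1
  # --format prints one "repository:tag" per line; grep -x matches whole lines,
  # -F treats the argument as a fixed string rather than a regex.
  docker images --format '{{.Repository}}:{{.Tag}}' | grep -qxF "$wanted"
}

# Usage (assumes the tag used by the build step):
#   image_ready alphafold:latest || echo "alphafold:latest not found" >&2
```

Running the check before the VM is deprovisioned avoids capturing an image that lacks the container.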
Run "docker images" to confirm that "alphafold:latest" is in the list.
sudo waagent -deprovision+user
Go back to Cloud Shell and execute these commands to produce the custom image.
export myVM=vmImgAlpha
export myImage=imgAlphaFold2
export myResourceGroup=rgAlphaFold
az vm deallocate --resource-group $myResourceGroup --name $myVM
az vm generalize --resource-group $myResourceGroup --name $myVM
az image create --resource-group $myResourceGroup --name $myImage --source $myVM --hyper-v-generation V2
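The resource ID of the new image, which later steps need, follows a fixed pattern, so it can be assembled or read back from the command line. The sketch below is illustrative only: image_resource_id is a hypothetical helper, and the subscription ID passed to it is a placeholder you would supply yourself.

```shell
#!/bin/bash
# Hypothetical helper: assemble a managed image resource ID from its parts.
image_resource_id() {
  local subscription=$1 resource_group=$2 image=$3
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Compute/images/%s\n' \
    "$subscription" "$resource_group" "$image"
}

# With the Azure CLI logged in, the ID can instead be queried directly:
#   az image show --resource-group "$myResourceGroup" --name "$myImage" \
#     --query id --output tsv
```

Querying with az image show is the safer of the two, since it returns the ID exactly as Azure stores it.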
After the command completes, find the image's "Resource ID" on the console's "Home->Images->Properties" page and note it for later use. It has the form "/subscriptions/xxxx-xxxx-x…/resourceGroups/…/providers/Microsoft.Compute/images/imgAlphaFold2".
mkdir /volprotein/AlphaFold2
mkdir /volprotein/AlphaFold2/input
mkdir /volprotein/AlphaFold2/result
sudo chmod +w /volprotein/AlphaFold2
/opt/alphafold/scripts/download_all_data.sh /volprotein/AlphaFold2/
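The download takes a long time, so it is worth checking the result before submitting jobs. The job script in this post passes --db_preset=reduced_dbs, which expects a small_bfd directory. In the AlphaFold repository the download script accepts an optional second argument (reduced_dbs) that selects that set, which is worth confirming against your checkout. The sketch below lists expected sub-directories that are missing. The directory names are assumptions based on the AlphaFold repository layout, and missing_databases is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print each expected sub-directory of DATA_DIR that is missing.
# The names below are assumptions for the reduced_dbs preset; adjust to your checkout.
missing_databases() {
  local data_dir=$1 name
  for name in params mgnify pdb70 pdb_mmcif uniref90 small_bfd; do
    [ -d "$data_dir/$name" ] || echo "$name"
  done
}

# Usage:
#   missing_databases /volprotein/AlphaFold2
```

An empty result means every directory in the list exists; it does not verify that the files inside are complete.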
#!/bin/bash
#SBATCH -o job%j.out
#SBATCH --job-name=AlphaFold
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --gres=gpu:1
INPUT_FILE=$1
WORKDIR=/opt/alphafold
INPUTDIR=/volprotein/AlphaFold2/input
OUTPUTDIR=/volprotein/AlphaFold2/result
DATABASEDIR=/volprotein/AlphaFold2/
sudo python3 $WORKDIR/docker/run_docker.py --fasta_paths=$INPUTDIR/$INPUT_FILE --output_dir=$OUTPUTDIR --max_template_date=2020-05-14 --data_dir=$DATABASEDIR --db_preset=reduced_dbs
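The script above assumes that the file named in its first argument exists in the input directory and is in FASTA format. A guard along the following lines could be added near the top of the script so that a bad input fails fast instead of occupying a GPU node. It is only a sketch: is_fasta is a hypothetical helper that checks the first line and nothing more.

```shell
#!/bin/bash
# Hypothetical helper: return 0 when FILE is non-empty and its first line
# starts with ">", the FASTA header marker.
is_fasta() {
  local file=$1 first_line
  [ -s "$file" ] || return 1            # missing or empty file
  IFS= read -r first_line < "$file"     # read only the first line
  case $first_line in
    '>'*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Usage inside the job script:
#   is_fasta "$INPUTDIR/$INPUT_FILE" || { echo "bad input: $INPUT_FILE" >&2; exit 1; }
```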
sbatch run.sh input.fa
sbatch run.sh P05067.fasta
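When many sequences are waiting, the sbatch calls above can be wrapped in a loop that submits one job per file in the input directory. This is a minimal sketch, assuming the run.sh script and directory layout shown earlier. submit_all and the SBATCH_CMD override are hypothetical names introduced here so the loop can be dry-run.

```shell
#!/bin/bash
# Hypothetical helper: submit one job per FASTA file found in INPUT_DIR.
# SBATCH_CMD is an assumed override; set it to "echo sbatch" for a dry run.
submit_all() {
  local input_dir=$1 sbatch_cmd=${SBATCH_CMD:-sbatch} fasta
  for fasta in "$input_dir"/*.fasta "$input_dir"/*.fa; do
    [ -e "$fasta" ] || continue          # skip glob patterns that matched nothing
    # run.sh prepends its own input directory, so pass only the file name.
    $sbatch_cmd run.sh "$(basename "$fasta")"
  done
}

# Dry run:
#   SBATCH_CMD="echo sbatch" submit_all /volprotein/AlphaFold2/input
```

Each file becomes its own single-GPU job, so the scheduler is left to decide how many run at once.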
Reference links
deepmind/alphafold: Open source code for AlphaFold. (github.com)
Azure CycleCloud Documentation - Azure CycleCloud | Microsoft Docs
Azure NetApp Files documentation | Microsoft Docs