Sometimes users need to install software or packages in the environment before a task runs. The start task feature of Azure Batch makes this easy.
But when there are many dependencies to install, for example 20 packages on Linux, some of which take a long time to install (such as TensorFlow), the start task causes additional problems: it runs for a long time every time Azure Batch starts a node, and it may even time out or fail.
To avoid this issue, users have two options: a custom image or a container. Azure Batch supports both. How to use a custom image is already explained here. This blog mainly explains how to use the container feature.
Containers have one additional advantage. In scenarios where the Batch task (application) depends strongly on a specific operating system (OS) version, such as Ubuntu 18.04, which will reach EOL on Apr 30th 2023, users who choose the container feature only need to consider the compatibility between the container OS and the host OS (selected when the Batch pool is created). It is also easier to recreate or modify the environment: users only need to edit the Dockerfile, instead of creating a new virtual machine to capture a new custom image.
In this blog, the container image is based on Ubuntu 20.04. Assume that the Batch application depends on a specific Python package, numpy. This blog installs the package from a requirement.txt file, builds it into a container image, creates a Batch pool based on that image, and verifies the environment by running a Batch task.
Note: This is only the simplest example. In a real situation, users will need to modify the Dockerfile and include many more of their own files when creating the container image.
Note: By default, docker build looks for a file named exactly Dockerfile, without any file extension. A file named Dockerfile.txt will not work with the build command used in this blog.
Docker folder files
Dockerfile:
# Use an official Ubuntu 20.04 runtime as a base image
FROM ubuntu:20.04
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
ADD . /app
# Install pip3
RUN apt-get update && apt-get -y install python3-pip
# Install numpy package
RUN pip3 install -r requirement.txt
# Make port 80 available to the world outside this container
# EXPOSE 80
requirement.txt:
requirement.txt context
Note: In rare situations, a Batch task may need to wait for requests sent from other clients to the Batch node. In this scenario, add an EXPOSE {port} line to the Dockerfile to expose the container port. For more details about how to write a Dockerfile, please check this official document from Docker.
Docker Desktop UI
docker build -t {container_name} .
The expected result looks like the following. (This step might take a long time depending on network speed, because Docker downloads the base image.)
Build result of docker image
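Before pushing the image, it can be worth checking locally that numpy is importable inside it. The following is a minimal Python sketch and not part of the original walkthrough: the function names are hypothetical, and it assumes Docker is available on the local machine and that the image was tagged with the {container_name} used in the build command above.

```python
import subprocess
from typing import List


def smoke_test_command(container_name: str, module: str = "numpy") -> List[str]:
    """Build a `docker run` command that imports `module` inside the image."""
    # --rm removes the temporary container once the command exits.
    return ["docker", "run", "--rm", container_name,
            "python3", "-c", f"import {module}"]


def image_has_module(container_name: str, module: str = "numpy") -> bool:
    """Return True if the import succeeds inside the container (exit code 0)."""
    result = subprocess.run(smoke_test_command(container_name, module))
    return result.returncode == 0
```

Calling image_has_module("{container_name}") with the real image name returns False if the pip3 install step in the Dockerfile did not install the package.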
Container Registry Access Key page
docker login {registry_name}.azurecr.io -u {username} -p {password}
Docker login command result
docker tag {container_name} {registry_name}.azurecr.io/{repository}
docker push {registry_name}.azurecr.io/{repository}
container image upload result
At this point, the container image has been created and uploaded to Azure Container Registry.
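The login, tag and push commands above can also be scripted. The sketch below is one possible way to do that in Python, under stated assumptions: the function names are hypothetical, Docker is assumed to be installed locally, and the password is passed through the --password-stdin option of docker login instead of -p so that it does not appear in the shell history or process list.

```python
import subprocess
from typing import List


def image_reference(registry_name: str, repository: str) -> str:
    """Return the full image name used by `docker tag` and `docker push`."""
    return f"{registry_name}.azurecr.io/{repository}"


def login_command(registry_name: str, username: str) -> List[str]:
    """Build a `docker login` command that reads the password from stdin."""
    return ["docker", "login", f"{registry_name}.azurecr.io",
            "-u", username, "--password-stdin"]


def tag_and_push_commands(container_name: str, registry_name: str,
                          repository: str) -> List[List[str]]:
    """Build the `docker tag` and `docker push` commands for the image."""
    target = image_reference(registry_name, repository)
    return [["docker", "tag", container_name, target],
            ["docker", "push", target]]


def publish(container_name: str, registry_name: str, repository: str,
            username: str, password: str) -> None:
    """Log in, tag the local image and push it; raises if any step fails."""
    subprocess.run(login_command(registry_name, username),
                   input=password, text=True, check=True)
    for command in tag_and_push_commands(container_name, registry_name, repository):
        subprocess.run(command, check=True)
```

The values passed to publish are the same placeholders as in the commands above: {container_name}, {registry_name}, {repository}, {username} and {password}.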
container registry page when creating Batch pool
Container registry list page
Batch pool configuration page
Batch task creation page
The expected output should be just “1”.
stdout result
This proves that the container image was created successfully and that the Python package was installed correctly. Otherwise, the task would report a module-not-found error like the following.
error message if package is not installed
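For environments with more than one dependency, a similar check can list every missing package at once instead of failing on the first import. This is a minimal sketch, not the command used in the task above; the function names are hypothetical, and it assumes top-level module names such as numpy.

```python
import importlib.util
from typing import Iterable, List


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(required: Iterable[str]) -> List[str]:
    """Return the names from `required` that are not installed."""
    return [name for name in required if not is_installed(name)]
```

Run inside the container, missing_modules(["numpy"]) returns an empty list when the image was built correctly.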
Real user scenarios will be more complicated, but the basic approach to creating a customized container image is the same. The example Dockerfile in this blog shows how to copy files into the container image and how to install packages. These two steps are also the main ones that users need to customize.