Azure Batch
14 Topics

Install Python on a Windows node using a start task with Azure Batch
Customers often ask the Azure Batch team for instructions on installing Python with the start task feature. This post lists the steps to perform this task, in case you need to handle a similar case.
13K Views, 0 likes, 5 Comments

Configure remote access to compute nodes in an Azure Batch pool using Azure Portal
If configured, a node user with network connectivity can connect externally to a compute node in a Batch pool. For example, a user can connect by Remote Desktop (RDP) on port 3389 to a compute node in a Windows pool. Similarly, by default, a user can connect by Secure Shell (SSH) on port 22 to a compute node in a Linux pool.

As of API version 2024-07-01 (and for all pools created after 30 November 2025, regardless of API version), Batch no longer automatically maps the common remote access ports for SSH and RDP. To allow remote access to compute nodes in pools created with API version 2024-07-01 or later (and after 30 November 2025), you must configure the pool endpoint configuration manually. Depending on your environment, you might need to enable, restrict, or disable external access on these or any other ports of the Batch pool. You can modify these settings through the Batch APIs by setting the PoolEndpointConfiguration property.

When you create the pool in the Azure Portal, you configure the pool endpoint by creating network address translation (NAT) pools and network security group (NSG) rules. Under the virtual network section, click Inbound NAT pool. A window opens for creating the NAT pool and NSG rule. Either click +Add or use the default option offered by the template to add a NAT pool for RDP/SSH. A new window opens for creating the inbound NAT pool. Complete the required fields. For the backend port, enter 22 for SSH (Linux pool) or 3389 for RDP (Windows pool). Next, click Network Security Group Rules. A window opens for creating NSG rules. Under the Access field, select Allow and assign a priority.
In the Source Address Prefix field, specify the IP address or IP range that should be allowed remote access. To allow access from all addresses, enter *. Then click Select, which returns you to the previous page for creating the NAT pool. Verify all the details, click OK, and then click Select. This adds the NAT pool and NSG rules needed to enable RDP access and configures the pool endpoint.

Once the pool is created, navigate to the node and click Connect. The IP address of the node is displayed, and you can use it to establish a remote desktop connection.
377 Views, 1 like, 1 Comment

How to create multiple tasks under a job in Job Scheduler
This article describes in detail how to create jobs with tasks in a job scheduler. To learn how to schedule jobs in a job scheduler, see the doc "Schedule Batch jobs for efficiency - Azure Batch | Microsoft Learn".

While creating a job scheduler, you will see the Job Specification section. In this section, update the Job configuration task to add jobs; further settings are then shown. To add tasks to the job in the job scheduler, add a Job manager task. A job manager task contains the information that is necessary to create the required tasks for a job. A job manager task is required for jobs that are created by a job schedule, because it is the only way to define the tasks before the job is instantiated. You can change other settings if required.

Once the job scheduler is created, a job named job-1 is created under it. Inside job-1 you can see one task, which is the job manager task. With this setup, a single job scheduler creates only one job with only one task (the job manager task).

In some scenarios, you may need to add more tasks to a job in a job scheduler. To create more tasks under the job that the job scheduler generates, the job manager task needs a command line that calls the Task-Add API, as shown below.
    /bin/sh -c 'az batch task create --job-id $(printenv AZ_BATCH_JOB_ID) --account-endpoint "https://batchname.xxx.batch.azure.com" --account-key "ACCOUNT_KEY" --account-name "BATCH_ACCOUNT_NAME" --task-id "TASK_ID" --command-line "TASK_COMMAND_LINE"'

Example:

    /bin/sh -c 'az batch task create --job-id $(printenv AZ_BATCH_JOB_ID) --account-endpoint "https://batchtest.westeurope.batch.azure.com" --account-key "YdFUnnV65NlcfzXf+T48YCeGo/Z2ZMqIyiQxRrgMxxxxxxxxxxx" --account-name "batchtest" --task-id "task2" --command-line "/bin/sh -c \"echo Hello\""'

Also, while creating the job scheduler, make sure to define the following properties.

1. Kill Job on completion: This property is true by default. If true, the Batch service marks the job as completed when the job manager task completes. If false, the completion of the job manager task does not affect the job status, and the job has to be terminated explicitly. Set this property to false for this scenario, where the job manager creates a set of tasks but then takes no further role in their execution.
2. When all tasks complete: Set this to TerminateJob so that the job is terminated once all of its tasks have completed.

With the above, you get two tasks under a job in the job scheduler. One is the job manager task (task1) and the other is the task (task2) created from the command line of the job manager task.

Note: A job scheduler can have at most one active job under it at any given time. So, if it is time to create a new job under a job schedule but the previous job is still running, the Batch service will not create the new job until the previous job finishes. If the previous job does not finish within the startWindow period of the new recurrence interval, no new job is scheduled for that interval.
326 Views, 1 like, 0 Comments

Bulk delete all the old jobs from the batch account
Deleting a Job also deletes all Tasks that are part of that Job, and all Job statistics. This also overrides the retention period for Task data; that is, if the Job contains Tasks which are still retained on Compute Nodes, the Batch service deletes those Tasks' working directories and all their contents. When a Delete Job request is received, the Batch service sets the Job to the deleting state. All update operations on a Job that is in the deleting state fail with status code 409 (Conflict), with additional information indicating that the Job is being deleted.

You can use the PowerShell command below to delete a single job by its job ID:

    Remove-AzBatchJob -Id "Job-000001" -BatchContext $Context

If you have a large number of jobs and want to delete them in one go, you can use the PowerShell script below:

    # Replace with your Azure Batch account name and resource group
    $accountName = "yourBatchAccountName"
    $resourceGroupName = "yourResourceGroupName"

    # Authenticate to Azure
    Connect-AzAccount

    # Get the Batch account context
    $batchContext = Get-AzBatchAccount -Name $accountName -ResourceGroupName $resourceGroupName

    # Get all Batch jobs with a creation time before May 2024
    # Replace the creation time date accordingly
    $jobsForDelete = Get-AzBatchJob -BatchContext $batchContext | Where-Object { $_.CreationTime -lt [datetime]"2024-05-01" }

    # List each job, then delete it
    Write-Host "Jobs to be deleted:"
    foreach ($job in $jobsForDelete) {
        Write-Host $job.Id
        # Write-Host "Deleting jobs..."
        Remove-AzBatchJob -Id $job.Id -BatchContext $batchContext -Force
    }

The above script deletes all jobs created before the specified date. You can modify the parameters as per your requirement.
1.5K Views, 0 likes, 0 Comments

SSL/TLS connection issue troubleshooting guide
You may experience exceptions or errors when establishing TLS connections with Azure services. The exceptions vary dramatically depending on the client and server types. Typical ones include "Could not create SSL/TLS secure channel." and "SSL Handshake Failed". In this article we discuss common causes of TLS-related issues and the troubleshooting steps.
38K Views, 9 likes, 1 Comment

How to build customized container image for Azure Batch
Sometimes, users need to install software or packages in the environment before a task is executed. This can easily be done with the start task feature of Azure Batch. But when there are many dependencies to install, for example 20 packages on Linux, some of which take a long time to install (such as Tensorflow), this causes additional problems: a long start task running time every time Azure Batch starts a Batch node, or even a timeout or a start task failure.

To avoid this issue, the user has two options: a custom image or a container. Both features are supported in Azure Batch. Using a custom image is already explained elsewhere, so this blog mainly explains how to use the container feature.

The container feature has one additional advantage. In some scenarios the Batch task (application) has a strong dependency on a specific operating system (OS) version, such as Linux Ubuntu 18.04, which reaches EOL on April 30th 2023. With the container feature, the user only needs to consider the compatibility between the container OS and the host OS (selected when the Batch pool is created). It is also easier to recreate or modify the environment itself: the user only needs to modify the Dockerfile, instead of recreating a virtual machine to capture a new custom image.

In this blog, the container image is based on Linux Ubuntu 20.04. Assume that the Batch application needs a specific Python package called numpy as a dependency. This blog installs this package based on a requirement.txt file, captures it into a container image, creates a Batch pool based on this container image and verifies that the environment is good by running a Batch task.

Note: This is just the simplest example. In a real situation, the user will need to modify the Dockerfile and include many more of their own files when creating the container image.
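For comparison, here is a minimal sketch of what the start task approach described above would have to run on every node before tasks can start. It reuses the install commands from this blog's example. The REQ_FILE and DRY_RUN variables and the run and install_deps helpers are hypothetical names introduced for illustration, and the sketch assumes that the start task runs with elevated rights and that requirement.txt is already present in the working directory.

```shell
#!/bin/sh
# Hypothetical start-task script: on a real pool, every command below would
# run again each time a node starts.
set -eu

# Assumption: requirement.txt sits in the task working directory.
REQ_FILE="${REQ_FILE:-requirement.txt}"

# DRY_RUN defaults to 1 so the sketch only prints the commands.
# Set DRY_RUN=0 on a real node to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same install steps as the example container image: pip3 first,
# then the Python packages listed in the requirement file.
install_deps() {
    run apt-get update
    run apt-get -y install python3-pip
    run pip3 install -r "$REQ_FILE"
}

install_deps
```

With 20 packages instead of one, this installation time is paid each time a node starts, which is the cost that a container image moves to a one-time image build.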
Pre-requisites:
An Azure Batch account
An Azure Container Registry
The commands and/or the packages for installing the dependencies
Docker Desktop on Windows (The example in this blog builds the image on a Windows computer. For a Linux system, please change the configuration accordingly.)

Step 1: Create your own container image

To create a new container image, the user needs to prepare the following files:
The Dockerfile
Other dependency components which will be needed (for example, the requirement.txt in this scenario)

Note: The Dockerfile must not have any file extension. A file named Dockerfile.txt will not work.

Dockerfile:

    # Use the official Ubuntu 20.04 image as the base image
    FROM ubuntu:20.04
    # Set the working directory to /app
    WORKDIR /app
    # Copy the current directory contents into the container at /app
    COPY . /app
    # Install pip3
    RUN apt-get update && apt-get -y install python3-pip
    # Install the packages listed in requirement.txt (numpy)
    RUN pip3 install -r requirement.txt
    # Make port 80 available to the world outside this container
    # EXPOSE 80

requirement.txt: lists the numpy package.

Note: In rare situations, a Batch task may need to wait until a request is sent from another client to the Batch node. In this scenario, add the EXPOSE {port} line to expose the container port to the host machine. For more details about how to write a Dockerfile, please check the official Docker documentation.

Once the files are prepared, the user can start creating the container image.

Open Docker Desktop and make sure you are logged in.
Right-click the tray icon and select Switch to Linux containers.
Open PowerShell in the folder containing the Dockerfile and run the following command:

    docker build -t {container_name} .

This step might take a long time to finish depending on the network speed, because Docker downloads the base container image.
Find the username and password in Azure Container Registry and use the following command to log in:

    docker login {registry_name}.azurecr.io -u {username} -p {password}

Use the following commands to upload the container image from local Docker to Azure Container Registry:

    docker tag {container_name} {registry_name}.azurecr.io/{repository}
    docker push {registry_name}.azurecr.io/{repository}

At this point, the container image has been created and uploaded to Azure Container Registry.

Step 2: Use the container image to run a Batch task

The user can then create a new pool and configure the container-related settings for this pool. The other configuration can be set as usual. Once the node is idle, add a new job with a new task and link it to this new Batch pool. While creating the new task, remember to select the correct image.

The example task uses a command line that makes the container's Python environment import the numpy package, set a variable a to the value 1, and print the value of a. The expected output is just "1". This proves that the container was created successfully and that the Python package was installed correctly; otherwise there would be an error message reporting that the module was not found.

Conclusion

A real user scenario will be more complicated, but the principle of creating the customized container stays the same. The example Dockerfile in this blog shows how to:
Run commands to install dependencies
Copy the necessary dependency files into the Docker image
These two steps are also the main steps that the user needs to customize.
7.3K Views, 1 like, 4 Comments

Create Azure Batch Pool with PowerShell
The aim of this blog is to show you how to create a Batch pool using PowerShell commands, with several features enabled:
Scenario 1: Add the Batch pool to an Azure Virtual Network, with or without a public IP address
Scenario 2: Add an application package to the Batch pool
Scenario 3: Add a certificate to the Batch pool
Scenario 4: Add a user account to the Batch pool
Scenario 5: Add a start task to the Batch pool
Scenario 6: Mount a file system to the Batch pool
Scenario 7: Specify the image reference (built-in image or custom image) for the Batch pool
6.7K Views, 4 likes, 1 Comment

How to detect and delete unusable nodes in Batch Service automatically by PowerShell Script
In some circumstances, Azure Batch might set the state of a node to unusable, for a variety of reasons. When a node is unusable, tasks can't be scheduled to it, but it still incurs charges. If you don't need to find the root cause of the nodes becoming unusable, you can locate these unusable Batch nodes and delete them promptly to avoid unexpected cost. This article provides a PowerShell script that detects all unusable nodes in a Batch account/pool and deletes them. Please download the script sample from this link.

Some Tips:
1. If the auto-scale feature is enabled in a Batch pool, you need to disable auto-scale before removing the nodes.
2. Make sure the Batch pool is in a steady state before enabling auto-scale. Otherwise, there could be issues. For example, auto-scale can be enabled without any errors or warnings but won't work as expected, because the actual samples of metric data can't be obtained.
3. Once an unusable node is deleted, it is very hard to troubleshoot the root cause of the node becoming unusable. If you need the root cause, disable auto-scale so that the problematic node is kept.
4. This PowerShell script can be configured in an Azure Function App with Timer triggers to detect and delete the unusable nodes periodically.

Reference links:
1. Details of unusable Batch nodes: Node in unusable state.
2. Install the latest Batch PowerShell module: Batch PowerShell module.
4.5K Views, 1 like, 0 Comments

Azure Batch Pool resizing failed - Allocation failed due to marketplace purchase eligibility
For Azure Batch users: if you tried to resize your Batch pool and it failed with the error message "Allocation failed due to marketplace purchase eligibility", you can follow the steps below to check whether you are in the same scenario as this blog.
4.2K Views, 1 like, 0 Comments