Azure Architecture Blog

High-performance storage for AI Model Training tasks using Azure ML studio with Azure NetApp Files

GeertVanTeylingen
Aug 26, 2022

Table of Contents

 

Abstract

Introduction

Prepare the environment

Pre-requisites

Note on Connectivity

Provision the working environment

Access Azure Cloud Shell and install extensions

Setting variables

Service provisioning

Provision Azure NetApp Files persistent storage with Azure Machine Learning studio for AI model training

Prepare the repository

Create an Azure Machine Learning studio resource

Start a training job

Training job results

Provision Azure NetApp Files persistent storage with Azure Machine Learning Studio for Studio Notebooks

Preparation and requirements

Provision a compute instance

Configure the compute instance and download a dataset

Accessing Azure NetApp Files from a notebook

Snapshots

Creating a snapshot using the Azure Cloud Shell

Restoring a snapshot using the Azure Cloud Shell

Summary

Additional Information

 

Abstract

 

Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) workloads share a similar profile: high performance requirements, large capacity, and often high file counts that result in heavy metadata load. When datasets are as large and cumbersome as AI/ML datasets, scaling to meet their needs is as difficult as migrating these workloads to where you need them. Because of this need for scale and data real estate, many workload owners look to cloud providers like Microsoft Azure, with Azure Machine Learning, to meet their needs without breaking the bank.

 

Azure NetApp Files is one way to address the unique requirements of an AI/ML/DL workload. It delivers a blend of high performance, massive scale, and simple migration tools that enhance the AI/ML/DL use case by reducing job completion times. It also provides ways to save money: you can change performance levels on the fly and automatically tier inactive datasets to lower-cost Azure storage.

 

In addition, Azure NetApp Files provides industry-leading snapshot technology for near-instantaneous backup and restore of critical datasets, as well as a fast, efficient method to create exact replicas of datasets across cloud regions in Azure for better data locality.

 

Azure NetApp Files also provides a way to quickly clone volumes to new volumes that can be reformatted, normalized and manipulated while preserving the original “gold-source” without having to physically migrate data.

 

Two possible use cases are discussed: integration of Azure NetApp Files into training jobs, and integration of Azure NetApp Files into Azure Machine Learning studio notebooks. In this article you learn:

 

  1. How to create an Azure NetApp Files account, capacity pool, and delegated subnet using the CLI
  2. How to provision an Azure NetApp Files NFS volume using the CLI
  3. How to create an Azure Machine Learning studio workspace using the CLI
  4. How to leverage an NFS volume for high-performance AI model training
  5. How to connect an NFS volume to an Azure Machine Learning notebook instance

Co-authors: Max Amende,  Prabu Arjunan (NetApp), Diane Patton (Azure NetApp Files)

 

Introduction

 

Data scientists face several challenges today. They need access to high-performance persistent data volumes to train machine learning models, while also needing to protect these critical datasets. They work with large amounts of data and must be able to instantly and consistently create data volumes that are exact replicas, or previous versions, of existing volumes. Azure NetApp Files, coupled with Azure Machine Learning studio, inherently provides the functionality required by today’s data scientists.

 

The Azure Machine Learning studio is the web portal for data scientists in Azure Machine Learning. The studio combines no-code and code-first experiences for an inclusive data science platform. Azure NetApp Files is an enterprise-class, high-performance, metered file storage service. You can select service and performance levels, create capacity pools and volumes, and manage data protection. Azure NetApp Files supports many workload types and is highly available by design.

 

The integration of Azure Machine Learning studio with Azure NetApp Files is possible in several ways. This article covers the step-by-step configuration of two scenarios: leveraging high-performance storage for AI model training tasks, and provisioning Azure NetApp Files volumes with Azure Machine Learning notebooks for data persistency and protection.

 

Prepare the environment

 

This section describes how to prepare the Azure environment for both use cases covered in this article. It provides steps to set up your resource group, networking, Azure NetApp Files, and Azure Machine Learning studio.

 

Pre-requisites

 

You must have:

  • Microsoft Azure credentials that provide the necessary permissions to create resources. For example, a user account with the Contributor role would suffice.
  • Access to an Azure Region where Azure NetApp Files is available
  • Ability to provision an Azure NetApp Files capacity pool
  • Network connectivity between Azure Machine Learning studio and Azure NetApp Files

 

Note on Connectivity

 

You must ensure network connectivity between Azure Machine Learning studio and Azure NetApp Files. For this guide we deploy the Azure Machine Learning studio compute instances and the Azure NetApp Files volumes into the same Azure Virtual Network (VNet), separated into two subnets. Azure Machine Learning studio is deployed on one subnet. The second subnet is delegated to Azure NetApp Files.

 

Provision the working environment

 

This section describes how to deploy, configure, and connect Azure Machine Learning studio with Azure NetApp Files using the Azure Cloud Shell and CLI commands hosted at azureml-with-azure-netapp-files.


Although they are not covered in this article, you could also execute these steps using the Azure portal.

 

Access Azure Cloud Shell and install extensions

 

Log in to the Microsoft Azure web interface and open the Cloud Shell as shown below.

 

 

Log in by entering:

 

user [~]$ az login
Cloud Shell is automatically authenticated under the initial account signed-in with. Run 'az login' only if you need to use a different account
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code “Secure Code” to authenticate.


Install the following CLI extension from the Cloud Shell to allow provisioning Azure Machine Learning from the Cloud Shell, and register the Microsoft.NetApp resource provider for Azure NetApp Files:

 

user [ ~ ]$ az extension add --name ml
user [ ~ ]$ az provider register --namespace Microsoft.NetApp --wait

 

Setting variables

 

Create and set variables. Modify the values, location, and names to what is applicable for your environment. Ensure the selected location supports Azure NetApp Files, and ensure there is network connectivity between the Azure Machine Learning studio subnet and the Azure NetApp Files delegated subnet. We use the variables below as an example throughout this article.

 

These variables are not persistent; if you log out of Cloud Shell, you must re-initialize them.

 

# Resource group name
rg='aml-anf-test'
location='westeurope'

# VNET details
vnet_name='vnet'
vnet_address_range='10.0.0.0/16'
vnet_aml_subnet='10.0.1.0/24'
vnet_anf_subnet='10.0.2.0/24'

# AML details
workspace_name='aml-anf'

# ANF details
anf_name=anf
pool_name=pool1
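
Because the variables above are lost when the Cloud Shell session ends, one optional convenience (not part of the original walkthrough) is to keep them in a small script in your Cloud Shell home directory and source it after each login, for example:

# Optional: persist the example variables in the Cloud Shell home directory
cat > ~/aml-anf-vars.sh <<'EOF'
rg='aml-anf-test'
location='westeurope'
vnet_name='vnet'
vnet_address_range='10.0.0.0/16'
vnet_aml_subnet='10.0.1.0/24'
vnet_anf_subnet='10.0.2.0/24'
workspace_name='aml-anf'
anf_name=anf
pool_name=pool1
EOF

# Re-load the variables after a new Cloud Shell login
source ~/aml-anf-vars.sh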

 

Service provisioning

 

After the variables are initialized, provision the required services and network.

 

As shown below, first create the resource group and define the defaults for the working environment, using the variables identified in the prior step:

 

user [ ~ ]:~$ az group create -n $rg -l $location
{
  "id": "/subscriptions/number/resourceGroups/aml-anf-test",
  "location": "westeurope",
  "managedBy": null,
  "name": "aml-anf-test",
  "properties": {
    "provisioningState": "Succeeded"
  },
  "tags": null,
  "type": "Microsoft.Resources/resourceGroups"
}
user [ ~ ]:~$ az configure --defaults group=$rg workspace=$workspace_name location=$location

 

Add the VNet and two subnets. One subnet will be used for compute, and the other will be delegated to Azure NetApp Files:

 

user [ ~ ]$ az network vnet create -n $vnet_name --address-prefix $vnet_address_range
{
  "newVNet": {
    "addressSpace": {
      "addressPrefixes": [
        "10.0.0.0/16"
      ]
    },
    ...
  }
}
user [ ~ ]$ az network vnet subnet create --vnet-name $vnet_name -n aml --address-prefixes $vnet_aml_subnet
{
  "addressPrefix": "10.0.1.0/24",
  "delegations": [],
  ...
}
user [ ~ ]$ az network vnet subnet create --vnet-name $vnet_name -n anf --address-prefixes $vnet_anf_subnet --delegations "Microsoft.NetApp/volumes"
{
  "addressPrefix": "10.0.2.0/24",
  "delegations": [
    {
      "actions": [
        "Microsoft.Network/networkinterfaces/*",
        "Microsoft.Network/virtualNetworks/subnets/join/action"
      ],
      "etag": "W/\"<number>\"",
      "id": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.Network/virtualNetworks/vnet/subnets/anf/delegations/0",
      "name": "0",
      "provisioningState": "Succeeded",
      "resourceGroup": "aml-anf-test",
      "serviceName": "Microsoft.NetApp/volumes",
      "type": "Microsoft.Network/virtualNetworks/subnets/delegations"
    }
  ],
  ...
}

We reduced the output of certain fields indicated by “…” for readability.

 

(!) Note

 

The Azure NetApp Files subnet must be separate from the Azure Machine Learning studio subnet and must be explicitly delegated to Azure NetApp Files.

 

Next, provision the Azure Machine Learning workspace as shown below:

 

user[ ~ ]$ az ml workspace create --name $workspace_name
The deployment request aml-anf-47652685 was accepted. ARM deployment URI for reference:
https://portal.azure.com//#blade/HubsExtension/DeploymentDetailsBlade/overview/id/%2Fsubscriptions%number%2FresourceGroups%2Faml-anf-test%2Fproviders%2FMicrosoft.Resources%2Fdeployments%2Faml-anf-476562285
Creating AppInsights: (amlanfinsights80221d528b9e  )  Done (7s)
Creating KeyVault: (amlanfkeyvault01027221d033  ) ..  Done (23s)
Creating Storage Account: (amlanfstoragecad13288ba5c  )   Done (27s)
Creating workspace: (aml-anf  ) ..  Done (16s)
Total time : 45s
{
  "application_insights": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.insights/components/amlanfinsights82021d58b9e",
  "description": "aml-anf",
  "discovery_url": "https://westeurope.api.azureml.ms/discovery",
  "display_name": "aml-anf",
  "hbi_workspace": false,
  "id": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf",
  "key_vault": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.Keyvault/vaults/amlanfkeyvault0102712d033",
  "location": "westeurope",
  "mlflow_tracking_uri": "azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf",
  "name": "aml-anf",
  "public_network_access": "Enabled",
  "resourceGroup": "aml-anf-test",
  "resource_group": "aml-anf-test",
  "storage_account": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.Storage/storageAccounts/amlanfstoragecad132828ba5c",
  "tags": {
    "createdByToolkit": "cli-v2-2.6.1"
  }
}

Configure Azure NetApp Files. Create a NetApp account, a capacity pool, and an NFSv3 volume:

 

user[ ~ ]$ az netappfiles account create --name $anf_name
{
  "activeDirectories": null,
  "disableShowmount": null,
  "encryption": {
    "identity": null,
    "keySource": "Microsoft.NetApp",
    "keyVaultProperties": null
  },
  "etag": "W/\"datetime'2024-01-08T15%3A56%3A57.3097309Z'\"",
….
}
user [ ~ ]$ az netappfiles pool create --account-name $anf_name --name $pool_name --size 4 --service-level premium
{
  "coolAccess": false,
  "encryptionType": "Single",
  "etag": " "W/\"datetime'2024-01-08T15%3A58%3A40.0232694Z'\"",,
  "id": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.NetApp/netAppAccounts/anf/capacityPools/pool1",
  "location": "westeurope",
  "name": "anf/pool1",
  "poolId": "<number>",
  "provisioningState": "Succeeded",
  "qosType": "Auto",
  "resourceGroup": "aml-anf-test",
  "serviceLevel": "Premium",
  "size": 4398046511104,
  …
 
}
user [ ~ ]$ az netappfiles volume create --account-name $anf_name --pool-name $pool_name --name vol1 --service-level premium --usage-threshold 4096 --file-path "vol1" --vnet $vnet_name --subnet anf --protocol-types NFSv3 --allowed-clients $vnet_aml_subnet --rule-index 1 --unix-read-write true
{
  "avsDataStore": "Disabled",
  "backupId": null,
  "baremetalTenantId": "baremetalTenant_svm_47702141844c3d28e8de7c602a040a58_e7f3a9a9",
  "capacityPoolResourceId": null,
  "cloneProgress": null,
  "coolAccess": false,
  "coolnessPeriod": null,
  "creationToken": "vol1",
...
}

We reduced the output of certain fields indicated by “…” for readability.

 

(!) Note

When creating the capacity pool, the allocation size used here is 4 TiB (defined by "--size 4"). Although that might be more than some customers need, the extra space can provide advantages. Azure NetApp Files allows you to provision the throughput of a volume independently of its size, so you can assign the extra capacity's throughput to a smaller volume and therefore possibly select a lower service level. Furthermore, the capacity pool can be shared with additional volumes for other workloads as required.
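
To illustrate provisioning throughput independently of size, the sketch below (not part of the original walkthrough) creates a second capacity pool with manual QoS and assigns throughput to a volume explicitly. The parameter names follow the az netappfiles CLI reference and reuse the variables from this guide; treat it as an illustrative sketch rather than a required step:

# Sketch: capacity pool with manual QoS, where volume throughput is set independently of volume size
az netappfiles pool create --account-name $anf_name --name pool2 --size 4 --service-level premium --qos-type Manual

# Sketch: 1 TiB volume in that pool with 128 MiB/s of throughput assigned explicitly
az netappfiles volume create --account-name $anf_name --pool-name pool2 --name vol2 --usage-threshold 1024 --throughput-mibps 128 --file-path "vol2" --vnet $vnet_name --subnet anf --protocol-types NFSv3 --allowed-clients $vnet_aml_subnet --rule-index 1 --unix-read-write true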

 

Provision Azure NetApp Files persistent storage with Azure Machine Learning studio for AI model training

 

After the working environment has been prepared, this section describes the steps to deploy an Azure Machine Learning environment that uses Azure NetApp Files for AI model training. It shows how to use Azure NetApp Files for persistent storage, deploy a compute cluster, and run a training job. It also shows how to test and view performance results. It requires a script and YAML files downloaded from GitHub.

 

Prepare the repository

 

We have created a repository on GitHub with YAML and Python files to use for this setup and demonstration. To access the prepared YAML files, download the repository and unzip it as shown below. After it is unzipped, change directory to the new anf-with-azureml-main directory:

 

user [~ ]$ wget https://github.com/prabuarjunan/anf-with-azureml/archive/refs/heads/main.zip
--2024-01-08 16:19:16--  https://github.com/prabuarjunan/anf-with-azureml/archive/refs/heads/main.zip
Resolving github.com... 140.82.112.3
Connecting to github.com|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/prabuarjunan/anf-with-azureml/zip/refs/heads/main [following]
--2024-01-08 16:19:16--  https://codeload.github.com/prabuarjunan/anf-with-azureml/zip/refs/heads/main

Resolving codeload.github.com... 140.82.113.10
Connecting to codeload.github.com|140.82.113.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘main.zip’

main.zip                                    [ <=>                                                                          ]   6.47K  --.-KB/s    in 0.003s 

2024-01-08 16:19:16 (1.82 MB/s) - ‘main.zip’ saved [6630]

user [ ~ ]$ unzip main.zip && rm main.zip
Archive:  main.zip
   creating: anf-with-azureml-main/
  inflating: anf-with-azureml-main/.gitignore 
  inflating: anf-with-azureml-main/README.md 
   creating: anf-with-azureml-main/code/
  inflating: anf-with-azureml-main/code/train.py 
  inflating: anf-with-azureml-main/environment.yml 
   creating: anf-with-azureml-main/environment/
  inflating: anf-with-azureml-main/environment/Dockerfile 
  inflating: anf-with-azureml-main/environment/requirements.txt 
  inflating: anf-with-azureml-main/train.yml 
user@Azure:~$ cd anf-with-azureml-main

 

Create an Azure Machine Learning studio resource

 

From that directory, create an Azure Machine Learning environment from the environment YAML. The environment.yml file uses a Dockerfile to build the environment. The Dockerfile uses an Azure Machine Learning Ubuntu image as its base and adds the NFS driver, the Python libraries for our script, and fio for testing.
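
As a point of reference, the Dockerfile below is a minimal sketch of what environment/Dockerfile might contain; the file in the downloaded repository may differ, and the base image tag is an assumption:

# Sketch only; the repository's environment/Dockerfile may differ
FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest

# NFS client for mounting the Azure NetApp Files volume, and fio for the storage benchmark
RUN apt-get update && \
    apt-get install -y --no-install-recommends nfs-common fio && \
    rm -rf /var/lib/apt/lists/*

# Python libraries used by code/train.py
COPY requirements.txt .
RUN pip install -r requirements.txt

Now create the environment: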

 

user [ ~/anf-with-azureml-main ]$ az ml environment create --file environment.yml
Uploading environment (0.0 MBs): 100%|| 638/638 [00:00<00:00, 1260.22it/s]
{
  "build": {
    "dockerfile_path": "Dockerfile",
    "path": "https://amlanfstoraged50e45d9af7.blob.core.windows.net/azureml-blobstore-xxx/LocalUpload/<number>/environment/"
  },
  "creation_context": {
    "created_at": "2024-01-08T16:21:24.448106+00:00",
    "created_by": "user",
    "created_by_type": "User",
    "last_modified_at": "2024-01-08T16:21:24.448106+00:00",
    "last_modified_by": "user",
    "last_modified_by_type": "User"
  },
  "description": "Environment with NFS drivers and a few Python libraries.",
  "id": "azureml:/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf/environments/python-base-nfs/versions/1",
  "name": "python-base-nfs",
  "os_type": "linux",
  "resourceGroup": "aml-anf-test",
  "tags": {},
  "version": "1"
}

 

Spin up an Azure Machine Learning compute cluster:

 

user [ ~/anf-with-azureml-main ]$  az ml compute create -n cpu-cluster --type amlcompute --min-instances 0 --max-instances 1 --size Standard_F16s_v2 --vnet-name $vnet_name --subnet aml --idle-time-before-scale-down 1800
{
  "id": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf/computes/cpu-cluster",
  "idle_time_before_scale_down": 1800,
  "location": "westeurope",
  "max_instances": 1,
  "min_instances": 0,
  "name": "cpu-cluster",
  "network_settings": {
    "subnet": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.Network/virtualNetworks/vnet/subnets/aml"
  },
  "provisioning_state": "Succeeded",
  "resourceGroup": "aml-anf-test",
  "size": "STANDARD_F16S_V2",
  "ssh_public_access_enabled": false,
  "tier": "dedicated",
  "type": "amlcompute"
}

 

Start a training job

 

As soon as the cluster is ready, schedule a job that accesses the Azure NetApp Files volume. train.yml mounts the Azure NetApp Files volume onto the compute node; you may need to edit train.yml so it uses the correct mount path for your environment.
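
If you need to look up your volume's mount IP address and export path for train.yml, a query along these lines should return them (a sketch, assuming the resource names used in this guide):

# Show the mount IP address and export path of the volume created earlier
az netappfiles volume show --account-name anf --resource-group aml-anf-test --pool-name pool1 --name vol1 --query "{ip:mountTargets[0].ipAddress, path:creationToken}" -o table

Before executing the command below, be sure you are logged in to the correct Azure account: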

 

user [ ~/anf-with-azureml-main ]$ az ml job create -f train.yml --web
Uploading code (0.0 MBs): 100%|| 134/134 [00:00<00:00, 19167.75it/s]

{
  "code": "azureml:/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf/codes/number/versions/1",
  "command": "mkdir /data\nmount -t nfs -o rw,hard,rsize=65536,wsize=65536,vers=3,tcp 10.0.2.4:/vol1 /data\ndf -h\npython train.py\n# Run fio on NFS share\ncd /data\nfio --name=4krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --iodepth=128 --size=1G --runtime=60 --group_reporting\n# Run fio on local disk\nmkdir /test\ncd /test\nfio --name=4krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --iodepth=128 --size=1G --runtime=60 --group_reporting\n",
  "compute": "azureml:cpu-cluster",
  "creation_context": {
     "created_at":"2024-01-08T16:30:08.808674+00:00", ,
    "created_by": "User",
    "created_by_type": "User"
  },
  "display_name": "purple_potato_5wn2v9vx77",
  "environment": "azureml:python-base-nfs:1",
  "environment_variables": {},
  "experiment_name": "anf-with-azureml-main",
  "id": "azureml:/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf/jobs/purple_potato_5wn2v9vx77",
  "inputs": {},
  "name": " purple_potato_5wn2v9vx77",
  "outputs": {
    "default": {
      "mode": "rw_mount",
      "path": "azureml://datastores/workspaceartifactstore/ExperimentRun/dcid.purple_potato_5wn2v9vx77",
      "type": "uri_folder"
    }
  },
  "parameters": {},
  "properties": {
    "ContentSnapshotId": " a0c92ba6-11e2-4ba5-858d-dddf2969a444",
    "_azureml.ComputeTargetType": "amlctrain"
  },
  "resourceGroup": "aml-anf-test",
  "resources": {
    "instance_count": 1,
    "properties": {},
    "shm_size": "2g"
  },
  "services": {
    "Studio": {
      "endpoint": "https://ml.azure.com/runs/ purple_potato_5wn2v9vx77?wsid=/subscriptions/<number>/resourcegroups/aml-anf-test/workspaces/aml-anf&tid=4b09112a0-929b-4715-944b-c037425165b3a",
      "job_service_type": "Studio"
    },
    "Tracking": {
      "endpoint": "azureml://westeurope.api.azureml.ms/mlflow/v1.0/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.MachineLearningServices/workspaces/aml-anf?",
      "job_service_type": "Tracking"
    }
  },
  "status": "Starting",
  "tags": {},
  "type": "command"
}

 

(!) Note 

 

If you receive the following error:

Failed to connect to MSI. Please make sure MSI is configured correctly.
Get Token request returned: <Response [400]>

It may be due to inadequate authentication. Try:

az login

As seen below, train.yml mounts the Azure NetApp Files volume to /data. It then runs a small Python script to emulate a real training program, and runs a storage benchmark that compares the speed of the compute cluster's integrated storage (on the directory /test) to the speed of the attached Azure NetApp Files volume mounted at /data.

 

$ cat train.yml
schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
compute: azureml:cpu-cluster
environment: azureml:python-base-nfs:1
code:
  code/
command: |
  mkdir /data
  mount -t nfs -o rw,hard,rsize=65536,wsize=65536,vers=3,tcp 10.0.2.4:/vol1 /data
  df -h
  python train.py
  # Run fio on NFS share
  cd /data
  fio --name=4krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --iodepth=128 --size=1G --runtime=60 --group_reporting
  # Run fio on local disk
  mkdir /test
  cd /test
  fio --name=4krandomreads --rw=randread --direct=1 --ioengine=libaio --bs=4k --numjobs=4 --iodepth=128 --size=1G --runtime=60 --group_reporting

 

The small Python script is executed to emulate a training program. For example, the script could contain the code to train a Natural Language Processing (NLP) or object detection model.
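
For reference, a hypothetical sketch of such a script is shown below. It only emulates a training step on synthetic data and assumes numpy and scikit-learn are among the Python libraries installed by the environment; the actual code/train.py in the repository may differ:

# Hypothetical sketch of a small script that emulates a training step;
# the real code/train.py in the repository may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for a real dataset read from the mounted volume (e.g. /data)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression(max_iter=200).fit(X, y)
print(f"Training accuracy: {model.score(X, y):.3f}")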

 

Training job results

 

To show the performance difference between using a local disk and an Azure NetApp Files NFS share, return to the Azure portal and navigate to the Azure Machine Learning view. There is a workspace with the name you defined. If you used the same resource names used in this guide, the name will be “aml-anf”.

 

Next, select this workspace and you’ll see a screen similar to the screen below.

 

 

To see the results, click the “Studio web URL” on the right. After opening Azure Machine Learning studio, click “Jobs” on the left. All the jobs that have run are displayed in Azure Machine Learning studio.

 

 

Next, change the view from “All experiments” to “All jobs”. If you followed this document, there should be one job listed. Wait until the status of the job changes to “Completed”.

 

 

Then click on the job to get more information and options.

 

 

Next, select “Outputs + logs”, click “user_logs”, and then select “std_log.txt” to see the results from the benchmark.

 

 

In the log you’ll see that Azure NetApp Files delivers almost twice the performance of the compute cluster’s integrated storage.

 

You can now delete the job and the compute cluster.

 

Provision Azure NetApp Files persistent storage with Azure Machine Learning Studio for Studio Notebooks

 

After the working environment has been prepared, this section describes the steps to deploy an Azure Machine Learning environment using Azure NetApp Files for persistent storage with studio notebooks.

 

Preparation and requirements

 

Please follow the steps in Prepare the environment if you have not already done so.

 

Provision a compute instance

 

Open Azure Machine Learning studio. If you have never accessed Azure Machine Learning studio before, follow the steps from Training job results up to the point where we access the results of the training job.

 

Click on “Compute”:

 

 

 

(!) Note

 

If you followed all the previous steps in the preceding section and did not delete the compute cluster, you will see your previously provisioned compute cluster under “Compute clusters”.

 

Then select “+ New” from the “Compute instances” tab.

 

 

 

Give the compute instance a name of your choice. We call it “ANFTestCompute” in this example. Select a virtual machine size. For this demo, the least expensive instance is sufficient. Then select “Next”.

 

(i) Important

 

Do not click “Create” yet; select “Next” instead.

 

Select your scheduling preference and select “Next” to move on to the Security page.

 

 

In the “Security” tab, activate “Enable virtual network”, and select the VNet which we previously created. If you followed the guide, the VNet should be called “vnet (aml-anf-test)”.

 

Then select the subnet which has not been delegated to Azure NetApp Files. In our case this is called “aml”.

 

Now we can scroll through the Applications and Tags pages, review the settings, and click “Create”.

 

Provisioning the compute instance will take a couple of minutes. Wait until the “State” of the instance becomes “Running” as shown below.

 

 

 

Configure the compute instance and download a dataset

 

Under Notebooks, click “Terminal” to connect to the compute instance as shown below.

 

 

Install the nfs-common package to allow NFS mounting on the compute instance, as shown below.

 

azureuser@anftestcompute:~/cloudfiles/code/Users/user$ sudo apt install nfs-common -y
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following packages were automatically installed and are no longer required:
  ca-certificates-java cmake-data cuda-command-line-tools-11-1
  cuda-command-line-tools-11-3 cuda-compiler-11-1 cuda-compiler-11-3
  cuda-cudart-11-1 cuda-cudart-11-3 cuda-cudart-dev-11-1 cuda-cudart-dev-11-3
  cuda-cuobjdump-11-1 cuda-cuobjdump-11-3 cuda-cupti-11-1 cuda-cupti-11-3
  cuda-cupti-dev-11-1 cuda-cupti-dev-11-3 cuda-cuxxfilt-11-3
  cuda-documentation-11-1 cuda-documentation-11-3 cuda-driver-dev-11-1
  …

We reduced the output of certain fields indicated by “…” for readability.

 

Next, create a new folder and mount the Azure NetApp Files volume onto it. Replace the mount path 10.0.2.4:/vol1 with the mount path of your Azure NetApp Files volume if necessary.

azureuser@anftestcompute:~/cloudfiles/code/Users/user$ mkdir data
azureuser@anftestcompute:~/cloudfiles/code/Users/user$ sudo mount -t nfs -o rw,hard,rsize=65536,wsize=65536,vers=3,tcp 10.0.2.4:/vol1 data

 

Your screen should now look like:

 

 

(!) Note

 

If the “data” folder is not shown on the left side, click on the “Refresh” button.

 

The Titanic dataset has established itself as the standard first data science project for emerging data scientists. Use this dataset as an example, or a dataset of your choosing. Download the dataset to the Azure NetApp Files volume:

 

azureuser@anftestcompute:~$ wget https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv -P ./data
--2024-01-08 17:55:56--  https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv
Resolving web.stanford.edu (web.stanford.edu)... 171.67.215.200, 2607:f6d0:0:925a::ab43:d7c8
Connecting to web.stanford.edu (web.stanford.edu)|171.67.215.200|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44225 (43K) [text/csv]
Saving to: ‘./data/titanic.csv’

titanic.csv                                               100%[=============================>]  43.19K   133KB/s    in 0.3s   

2024-01-08 17:56:00 (133 KB/s) - ‘./data/titanic.csv’ saved [44225/44225]

 

Accessing Azure NetApp Files from a notebook

 

Click on the “Plus Button” on the left to create a new Jupyter/Azure Machine Learning Studio notebook.

 

 

Select “Create new file”.

 

For this example we call the Notebook “ANFNotebook”:

 

 

Click on “Create”.

 

From the notebook we can now access the files on the Azure NetApp Files volume and build our models on them. As an example, we ingest the Titanic CSV file into a pandas DataFrame to show access to data on Azure NetApp Files.
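
A minimal sketch of such notebook code, assuming the volume is mounted at the data folder created earlier, could look like this:

# Read the Titanic dataset from the Azure NetApp Files volume into a pandas DataFrame
import pandas as pd

df = pd.read_csv("data/titanic.csv")
print(df.shape)
df.head()

Running the cell should print the dataset's shape and display the first rows, confirming that the notebook reads data directly from the Azure NetApp Files volume.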

 

 

Snapshots

 

Snapshots are a valuable tool for data science tasks to protect data. They can also be used as an effective and space-efficient way to version a dataset. You can create a snapshot of an Azure NetApp Files volume in several ways, for example from the Azure portal or using the Azure CLI.

 

As an example, this article uses the Azure Cloud Shell to create and restore a snapshot.

 

Creating a snapshot using the Azure Cloud Shell

 

Open the Azure Cloud Shell and enter the following command. Replace the account name, resource group, pool name and volume name with the names of your resources. As an example, we are naming the snapshot “timeInPointCopy”:

 

user [ ~ ]$ az netappfiles snapshot create --account-name anf --resource-group aml-anf-test --pool-name pool1 --volume-name vol1  --name timeInPointCopy
{
  "created": "2024-01-08T19:09:51.611000+00:00",
  "id": "/subscriptions/<number>/resourceGroups/aml-anf-test/providers/Microsoft.NetApp/netAppAccounts/anf/capacityPools/pool1/volumes/vol1/snapshots/timeInPointCopy",
  "location": "westeurope",
  "name": "anf/pool1/vol1/timeInPointCopy",
  "provisioningState": "Succeeded",
  "resourceGroup": "aml-anf-test",
  "snapshotId": "7b56521c-4957-df26-0c0a-c19c9fad0474",
  "systemData": null,
  "type": "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/snapshots"

 

(!) Note

 

For this section we used the variable names from Setting variables.

 

Restoring a snapshot using the Azure Cloud Shell

 

Three options exist to restore a snapshot: restoring it to a new volume, reverting the existing volume in place, or restoring individual files from the snapshot.

 

 

For this example we will restore a snapshot to a new volume. This means a new volume with a new name will be created based on the selected snapshot, and the volume data will automatically be restored to this volume. Be sure there is adequate space left in the capacity pool or expand the capacity pool for this operation. You could also restore to a new capacity pool. In this case, we expanded our current capacity pool using the dropdown menu for the capacity pool pool1.
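
If you prefer the CLI over the portal's dropdown menu, the capacity pool can also be resized with a single command. The sketch below assumes the resource names from this guide and grows the pool from 4 TiB to 8 TiB:

# Grow the capacity pool so the restored volume fits alongside the original volume
az netappfiles pool update --account-name anf --resource-group aml-anf-test --name pool1 --size 8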

 

The volume creation in the example below is based on the snapshot created earlier and uses the names from Setting variables and Service provisioning. Additionally, we need to specify the new volume's name, the mount point (file path), and the snapshot ID. Replace these values with the ones from your environment.

 

The snapshot ID can be found using the command az netappfiles snapshot list.
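
Alternatively, you can capture the snapshot ID directly into a shell variable; the sketch below assumes the resource and snapshot names used earlier in this guide, and the resulting $snapshot_id can be passed to the --snapshot-id parameter of the command below:

# Look up the ID of the snapshot created earlier, for use with --snapshot-id below
snapshot_id=$(az netappfiles snapshot show --account-name anf --resource-group aml-anf-test --pool-name pool1 --volume-name vol1 --name timeInPointCopy --query id -o tsv)
echo $snapshot_id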

 

user [ ~ ]$ az netappfiles volume create --vnet vnet --subnet anf --account-name anf --usage-threshold 40000 --pool-name pool1 --resource-group aml-anf-test --snapshot-id <id> --name vol1copy --file-path vol1copy --service-level premium --protocol-types NFSv3 --allowed-clients '10.0.1.0/24' --rule-index 1 --unix-read-write true 
{
  "actualThroughputMibps": 6.25,
  "avsDataStore": "Disabled",
  "backupId": null,
  "baremetalTenantId": "baremetalTenant_svm_d284dc8e25fd11ec8df54654d00c1f9e_d6143cc3",
  "capacityPoolResourceId": null,
  "cloneProgress": 0,
  "coolAccess": false,

}

We reduced the output of certain fields indicated by “…” for readability.

 

More information on the az netappfiles volume create command can be found in the documentation.

 

(!) Note

 

Entering the command above can be error-prone due to its length and the number of required details. This operation might be easier to perform using the Azure portal.

 

Summary 

 

In this article we described how Azure Machine Learning training jobs and Azure Machine Learning studio notebooks can use enterprise-grade, high-performance persistent storage backed by Azure NetApp Files volumes, both for training machine learning models and as persistent storage for studio notebooks. We also showed how easy it is to create and restore snapshots with Azure NetApp Files. Get started with Azure NetApp Files today.

 

Additional Information

 

  1. https://learn.microsoft.com/en-us/azure/azure-netapp-files/azure-netapp-files-solution-architectures#azure-kubernetes-services-and-kubernetes 
  2. https://azure.microsoft.com/services/machine-learning/
Updated Jan 29, 2024
Version 3.0