FastTrack for Azure
Secure Azure Machine Learning Service (AzureML) Environment

SamPanda
Microsoft
Feb 14, 2022

1         Objective

 

This document discusses one approach to securing an Azure Machine Learning (AzureML) environment. All the steps reference docs.microsoft.com. Other approaches are possible, depending on your organization's requirements. The key objective of this implementation is to control the incoming and outgoing network communication.

Details on enterprise security and governance for Azure Machine Learning can be found in the Azure Machine Learning documentation on docs.microsoft.com.

 

2         Architecture

 

3         Key Components

  1. Azure Machine Learning Service
    1. Azure Machine Learning workspace
    2. Storage account
    3. Key Vault
    4. Azure Container Registry
  2. Network Components
    1. VNET and subnets
    2. Private endpoints
    3. Azure Firewall
    4. Azure Firewall policy
    5. Route table
    6. VPN gateway
    7. VPN
    8. Bastion
  3. Compute Resources
    1. Virtual machine
    2. Azure Machine Learning compute instance
  4. Development Tools
    1. VS Code

4         Requirements

  1. Azure Subscription.
  2. A resource group with the owner permission.

5         Implementation Steps

We use a combination of Azure PowerShell and the Azure portal to provision and configure the required resources. The portal steps can be automated; however, our focus here is to set up the secure environment and understand the configuration steps.

5.1      Declare the required PowerShell variables

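# Fill in the IDs of your own tenant and subscription
$tenantID=""
$subscriptionID=""
# Resource group and region used throughout this walkthrough
$resourceGroupName="SecuringAMLSDemo"
$location="westus"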

5.2      Create the Virtual Network and Subnets

# Connect the Azure Subscription
Connect-AzAccount -Tenant $tenantID -Subscription $subscriptionID
# Create the Resource Group
New-AzResourceGroup -Name $resourceGroupName -Location $location
# Create the virtual Network and Subnet
$vnet = @{
    Name = 'myHub'
    ResourceGroupName = $resourceGroupName
    Location = $location
    AddressPrefix = '10.222.0.0/16'   
}
$virtualNetwork = New-AzVirtualNetwork @vnet
# Training subnet
$trainingsubnet = @{
    Name = 'training'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.0.0/24'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @trainingsubnet

# Scoring subnet
$scoringsubnet = @{
    Name = 'scoring'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.1.0/24'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @scoringsubnet
# AzureBastionSubnet
$AzureBastionSubnet = @{
    Name = 'AzureBastionSubnet'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.254.0/26'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @AzureBastionSubnet
# GatewaySubnet
$GatewaySubnet = @{
    Name = 'GatewaySubnet'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.250.0/24'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @GatewaySubnet
# AzureFirewallSubnet
$AzureFirewallSubnet = @{
    Name = 'AzureFirewallSubnet'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.252.0/26'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @AzureFirewallSubnet
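# PrivateEndpointSubnet
# The Microsoft.Storage service endpoint is added because the ARM template
# deployment in section 5.3 can otherwise fail with
# SubnetsHaveNoServiceEndpointsConfigured (see the reader comment at the end).
$PrivateEndpointSubnet = @{
    Name = 'PrivateEndpointSubnet'
    VirtualNetwork = $virtualNetwork
    AddressPrefix = '10.222.2.0/24'
    ServiceEndpoint = 'Microsoft.Storage'
}
$subnetConfig = Add-AzVirtualNetworkSubnetConfig @PrivateEndpointSubnet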
$virtualNetwork | Set-AzVirtualNetwork
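
As a quick sanity check on the address plan, the subnet prefixes used in the script above can be validated offline before anything is deployed. The following is a minimal Python sketch and not part of the original walkthrough: it uses only the standard ipaddress module, and check_address_plan is a hypothetical helper name introduced for illustration. It verifies that every subnet sits inside 10.222.0.0/16 and that no two subnets overlap.

```python
import ipaddress

# Address plan copied from the PowerShell script above.
VNET_PREFIX = "10.222.0.0/16"
SUBNETS = {
    "training": "10.222.0.0/24",
    "scoring": "10.222.1.0/24",
    "PrivateEndpointSubnet": "10.222.2.0/24",
    "GatewaySubnet": "10.222.250.0/24",
    "AzureFirewallSubnet": "10.222.252.0/26",
    "AzureBastionSubnet": "10.222.254.0/26",
}


def check_address_plan(vnet_prefix, subnets):
    """Return a list of problems found in the plan (empty when it is consistent)."""
    vnet = ipaddress.ip_network(vnet_prefix)
    parsed = {name: ipaddress.ip_network(prefix) for name, prefix in subnets.items()}
    problems = []

    # Every subnet must be contained in the virtual network's address space.
    for name, net in parsed.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside {vnet}")

    # No two subnets may share addresses.
    names = list(parsed)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if parsed[first].overlaps(parsed[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


issues = check_address_plan(VNET_PREFIX, SUBNETS)
print("Address plan looks consistent" if not issues else "\n".join(issues))
```

If you change any prefix in the PowerShell script, updating SUBNETS to match and rerunning the check is a cheap way to catch an overlap before Set-AzVirtualNetwork rejects it.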


5.3      Provision of the Azure Machine Learning Service

The New-AzResourceGroupDeployment command below provisions the following resources as part of the AzureML deployment:

  1. AzureML workspace
  2. Blob storage
  3. Azure Container Registry
  4. Application Insights
  5. Key vault
  6. Private link connection of the AzureML workspace

New-AzResourceGroupDeployment `
  -Name "exampledeployment" `
  -ResourceGroupName $resourceGroupName `
  -TemplateUri "https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/quickstarts/microsoft.machinelearningservices/machine-learning-workspace-vnet/azuredeploy.json" `
  -workspaceName "secureamlsdemo" `
  -location $location `
  -containerRegistryOption "new" `
  -containerRegistrySku "Premium" `
  -vnetOption "existing" `
  -vnetName "myhub" `
  -addressPrefixes "10.222.0.0/16" `
  -subnetOption "existing" `
  -subnetName "PrivateEndpointSubnet" `
  -subnetPrefix "10.222.2.0/24" `
  -privateEndpointType "AutoApproval"

5.4      Create the private link endpoint for the other services

 

We use private endpoints to bring the PaaS (platform as a service) resources inside the private VNET that we created in the earlier step.

5.4.1     PE (private endpoints) for storage account (blob subresource)

  • Select the storage account, then select Networking => Private endpoint connections.
  • Create a private endpoint connection.
  • Select the correct subscription and resource group.
  • Name the private endpoint storage-blob-private-endpoint.
  • Set the location to “westus”, matching the variable set in the earlier step.
  • The resource should be the storage account.
  • The target sub-resource should be “blob”.
  • In the networking pane, select “PrivateEndpointSubnet” in the subnet section.
  • Select “Dynamically allocated IP address”.
  • Select Yes for the option “Integrate with private DNS zone”.
  • Select the correct resource group.
  • Review and create.

 

5.4.2     PE for storage account (file sub-resource)

  • Select the storage account, then select Networking => Private endpoint connections.
  • Create a private endpoint connection.
  • Select the correct subscription and resource group.
  • Name the private endpoint storage-file-private-endpoint.
  • Set the location to “westus”, matching the variable set in the earlier step.
  • The resource should be the storage account.
  • The target sub-resource should be “file”.
  • In the networking pane, select “PrivateEndpointSubnet” in the subnet section.
  • Select “Dynamically allocated IP address”.
  • Select Yes for the option “Integrate with private DNS zone”.
  • Select the correct resource group.
  • Review and create.

5.4.3     PE for the azure key vault

  • Select the key vault and go to Networking => Private endpoint connections.
  • Create a private endpoint connection.
  • Select the correct subscription and resource group.
  • Name the private endpoint akv-private-endpoint.
  • Set the location to “westus”, matching the variable set in the earlier step.
  • The resource type should be Microsoft.KeyVault/vaults.
  • The resource should be <key vault name>.
  • The target sub-resource should be “vault”.
  • In the networking pane, select “PrivateEndpointSubnet” in the subnet section.
  • Select “Dynamically allocated IP address”.
  • Select Yes for the option “Integrate with private DNS zone”.
  • Select the correct resource group.
  • Review and create.

 

 

5.4.4     PE for Azure container registry

 

  • Select the container registry and go to Networking => Private access.
  • Create a private endpoint connection.
  • Select the correct subscription and resource group.
  • Name the private endpoint acr-private-endpoint.
  • Set the location to “westus”, matching the variable set in the earlier step.
  • The target sub-resource should be “registry”.
  • In the networking pane, select “PrivateEndpointSubnet” in the subnet section.
  • Select “Dynamically allocated IP address”.
  • Select Yes for the option “Integrate with private DNS zone”.
  • Select the correct resource group.
  • Review and create.
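
The four private endpoints above follow the same pattern, so it can help to keep them in one place. The sketch below is an illustration rather than part of the original steps: expected_zones is a hypothetical helper name, and the private DNS zone names are the ones the portal normally creates in the Azure public cloud when “integrate with private DNS zone” is set to Yes, so treat them as an assumption to verify against your own resource group.

```python
# Private endpoints created in section 5.4 (names and sub-resources from the steps above).
PRIVATE_ENDPOINTS = [
    {"name": "storage-blob-private-endpoint", "sub_resource": "blob"},
    {"name": "storage-file-private-endpoint", "sub_resource": "file"},
    {"name": "akv-private-endpoint", "sub_resource": "vault"},
    {"name": "acr-private-endpoint", "sub_resource": "registry"},
]

# Assumption: default private DNS zone names in the Azure public cloud.
PRIVATE_DNS_ZONES = {
    "blob": "privatelink.blob.core.windows.net",
    "file": "privatelink.file.core.windows.net",
    "vault": "privatelink.vaultcore.azure.net",
    "registry": "privatelink.azurecr.io",
}


def expected_zones(endpoints, zones=PRIVATE_DNS_ZONES):
    """Map each private endpoint name to the DNS zone expected to hold its record."""
    return {ep["name"]: zones[ep["sub_resource"]] for ep in endpoints}


for endpoint_name, zone in expected_zones(PRIVATE_ENDPOINTS).items():
    print(f"{endpoint_name}: {zone}")
```

Comparing this list with the private DNS zones that actually appear in the resource group is a simple way to spot an endpoint that was created without DNS integration.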

IMPORTANT

  When the ACR is behind a VNET, AzureML cannot use the registry to build Docker images directly and uses a compute cluster instead. We therefore need to create a compute cluster, in the same way as described for the compute instance in section 5.6 later in this document. Please create the cluster in the training subnet with NO public IP. More details are available in the AzureML documentation.

 

Use one of the snippets below (Python SDK or Azure CLI) to set the image_build_compute parameter.

 

python

from azureml.core import Workspace
# Load workspace from an existing config file
ws = Workspace.from_config()
# Update the workspace to use an existing compute cluster
ws.update(image_build_compute = 'aml-cluster')
# To switch back to using ACR to build (if ACR is not in the VNet):
# ws.update(image_build_compute = '')

 

cli

 

az ml workspace update --name secureamlsdemo  --resource-group SecuringAMLSDemo --image-build-compute aml-cluster

5.5      Accessing the resources from the client machine.

 

Since all the resources we created are inside the VNET, we can’t access them from our local machine. There are two ways to reach them. Azure Bastion is the quicker way to connect; however, it is less cost-efficient because we need to rely on a virtual machine inside the VNET for our development work. A point-to-site VPN is the other approach, which lets us use our own computer as the development environment.

If we try to access the AzureML workspace from our local machine now, we get the following error message. This is because inbound traffic from the public internet to the AzureML resources is denied.

 

 

 

5.5.1     Connecting to the secure AzureML environment using azure bastion.

In this step, we create a virtual machine in the training subnet of our VNET and connect to it using Azure Bastion. Azure Bastion exposes a public IP that acts as the intermediate interface for connecting to the virtual machine.

5.5.1.1       Create the bastion resource

 

  • Select the Bastion resource from the marketplace.
  • Give the following details:
    • Name = mybastion
    • Region = West US
    • Tier = Standard
    • Instance count = 2
    • Virtual network = myhub
    • Subnet = AzureBastionSubnet (no other name is allowed here)
    • Public IP address = Create new
    • Public IP address name = myhub-ip
    • Public IP address SKU = Standard

 

 

 

  • Review and create

5.5.1.2       Create a virtual machine

 

This virtual machine serves as a development machine for connecting to the secure AzureML environment.

Here is the configuration for the virtual machine. Note the highlighted section in the screenshot below, where we disallow inbound traffic from the public internet. We spin up the VM in the training subnet with no public IP.

 

 

 

 

 

 

5.5.1.3       Testing Connectivity using the Azure Bastion

 

From the virtual machine resource, select the Bastion option from the blade and provide the username and password. Once connected to the virtual machine, running ipconfig shows a private IP from the training subnet.

 

 

We can access the workspace now from the virtual machine.
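
To confirm from the virtual machine that a service name resolves to a private endpoint rather than to a public address, a small check like the following can be used. This is a hedged sketch, not something from the original post: resolves_into_subnet is a hypothetical helper, the subnet is the PrivateEndpointSubnet prefix from section 5.2, and the resolver argument exists only so the logic can be exercised without real DNS.

```python
import ipaddress
import socket

# PrivateEndpointSubnet prefix from the PowerShell script in section 5.2.
PRIVATE_ENDPOINT_SUBNET = ipaddress.ip_network("10.222.2.0/24")


def resolves_into_subnet(hostname, subnet=PRIVATE_ENDPOINT_SUBNET,
                         resolver=socket.gethostbyname):
    """Resolve hostname and report whether the address lies inside subnet."""
    address = ipaddress.ip_address(resolver(hostname))
    return address in subnet


# Offline illustration with stub resolvers standing in for real DNS answers.
print(resolves_into_subnet("workspace.example", resolver=lambda _: "10.222.2.4"))
print(resolves_into_subnet("workspace.example", resolver=lambda _: "52.1.2.3"))
```

On the VM, calling resolves_into_subnet with your own workspace, storage, key vault, or registry host name (and the default resolver) should report True once the private DNS zones are linked to the VNET.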

 

 

5.5.2     Connecting to the secure AzureML environment by configuring the point-to-site VPN.

 

The documentation describes the detailed steps for setting up the environment:

https://docs.microsoft.com/en-us/azure/vpn-gateway/vpn-gateway-howto-point-to-site-resource-manager-portal

Here are the major steps that need to be followed.

  1. Generate certificates. Reference.
    1. Generate root certificates.
    2. Generate client certificates.
  2. Create a VPN gateway
  3. Upload the root certificate public key information to Azure.
  4. Install the client certificate.
  5. Configure the point to Site configuration page.

 

 

  6. Install the VPN client.
  7. Update the DNS information in your system hosts file (C:\Windows\System32\drivers\etc\hosts)

 

10.222.2.4 d6e2c17a-4d2d-42ac-b449-3920810b2775.workspace.westus.api.azureml.ms
10.222.2.4 d6e2c17a-4d2d-42ac-b449-3920810b2775.workspace.westus.cert.api.azureml.ms
10.222.2.5 ml-secureamlsdemo-westus-d6e2c17a-4d2d-42ac-b449-3920810b2775.westus.privatelink.notebooks.azure.net
10.222.2.6 sa5qtd45ryus6lq.blob.core.windows.net
10.222.2.7 sa5qtd45ryus6lq.file.core.windows.net
10.222.2.10 kv5qtd45ryus6lq.vault.azure.net
10.222.2.9 cr5qtd45ryus6lq.azurecr.io

** Please update the private IPs and resource names to match your environment.
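
Hand-editing the hosts file is error-prone, so the entries can also be generated from a small mapping. The following Python sketch is only an illustration: render_hosts_lines is a hypothetical helper, the sample records are copied from the entries above, and the check against 10.222.2.0/24 assumes all private endpoints live in the PrivateEndpointSubnet created in section 5.2.

```python
import ipaddress

# PrivateEndpointSubnet prefix from section 5.2.
PRIVATE_ENDPOINT_SUBNET = ipaddress.ip_network("10.222.2.0/24")

# Sample records taken from the entries above; replace them with your own values.
RECORDS = {
    "sa5qtd45ryus6lq.blob.core.windows.net": "10.222.2.6",
    "sa5qtd45ryus6lq.file.core.windows.net": "10.222.2.7",
    "kv5qtd45ryus6lq.vault.azure.net": "10.222.2.10",
    "cr5qtd45ryus6lq.azurecr.io": "10.222.2.9",
}


def render_hosts_lines(records, subnet=PRIVATE_ENDPOINT_SUBNET):
    """Return hosts-file lines, rejecting any IP outside the private endpoint subnet."""
    lines = []
    for fqdn, ip in records.items():
        if ipaddress.ip_address(ip) not in subnet:
            raise ValueError(f"{ip} for {fqdn} is outside {subnet}")
        lines.append(f"{ip} {fqdn}")
    return lines


print("\n".join(render_hosts_lines(RECORDS)))
```

The printed lines can be pasted into the hosts file; a mistyped IP that falls outside the subnet raises an error instead of silently producing a bad entry.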

 

5.6      Setting up the AzureML compute Instance with NO public IP

 

Before we run an AzureML experiment, we need to create the AzureML compute instance. From the AzureML workspace, select the Compute option. Attach the compute instance to the “training” subnet and check the “No public IP” option. This option is currently in preview. If it is not available in your region, or if you don’t want to use a preview feature, you can skip this step.

In that case, see section 5.8 later in this document, which covers setting up the AzureML compute instance with a public IP.

 

 

 

If you receive the following error while creating the resource, disable the two private link network policies on the subnet. Details can be found in the document linked from the error message.

The specified subnet /subscriptions/2e/resourceGroups/SecuringAMLSDemo/providers/Microsoft.Network/virtualNetworks/myHub/subnets/training has PrivateLinkServiceNetworkPolicies or PrivateEndpointNetworkPolicies enabled. Please disable them to provision cluster/instance with no public IP. Please read this document for more details: https://aka.ms/AMLPLNetPolicies

# Disable the private link network policies on the training subnet
$virtualSubnetName = "training"
$virtualNetwork = Get-AzVirtualNetwork -Name "myhub" -ResourceGroupName "SecuringAMLSDemo"
$subnet = $virtualNetwork.Subnets | Where-Object { $_.Name -eq $virtualSubnetName }
$subnet.PrivateLinkServiceNetworkPolicies = "Disabled"
$subnet.PrivateEndpointNetworkPolicies = "Disabled"

# Push the change to Azure, then display the subnet to confirm the settings
$virtualNetwork | Set-AzVirtualNetwork
$virtualNetwork.Subnets | Where-Object { $_.Name -eq $virtualSubnetName }

5.7      Setting up the Firewall and blocking Internet access.

 

At this point, the compute resources still have outbound connectivity to the public internet. We would like to restrict the inbound and outbound traffic of our virtual network.

5.7.1     Create Azure Firewall Resource

We will create a firewall, a firewall policy, and a public IP resource to set up the firewall.

  • From the marketplace, select Firewall.
  • Select the subscription and resource group.
  • Name = firewallsecureamls
  • Region = westus
  • Firewall Tier = Standard
  • Firewall management = Use a Firewall Policy to manage this firewall
  • Firewall policy = (add new) firewallpolicysecureamls
  • Choose virtual network = use existing
  • Virtual Network = myhub
  • Public IP address = pip-firewallsecureamls

 

 

  • Once the resources are created, please note the private IP of the firewall.

 

 

5.7.2     Create a Route table and configure the traffic rules.

 

We will create a route table and set a route, so that all the outbound traffic from the virtual network goes via the firewall.

  • Select the route table from the marketplace
  • Provide the correct resourcegroup and subscription details.
  • Region = West US
  • Name= routetablesecureamls
  • Propagate gateway routes=yes.

Next, we set the route. All the traffic should go via the virtual appliance (i.e., Azure Firewall), with the firewall's private IP noted earlier as the next hop address.

 

 

We now associate the route table with the training and scoring subnets so that the route applies only to the resources in those subnets.

 

 

Once the route is in effect, we can no longer access any site from the virtual machine or from the AzureML compute instance.

5.7.3     Configure inbound and outbound network traffic in firewall policies.

AzureML compute needs specific application and network rules to work. Following the documentation, we create the application rules and network rules in the firewall policy.

Here are the outbound network rules created for the training and scoring subnets as per the documentation

 

Here are the application rules that are created for the training and scoring subnet as per the documentation.

 

 

 

Destination:

files.pythonhosted.org,mcr.microsoft.com,*.mcr.microsoft.com,graph.windows.net,anaconda.com,*.anaconda.com,*.anaconda.org,pypi.org,cloud.r-project.org,*pytorch.org,*.tensorflow.org,update.code.visualstudio.com,dc.applicationinsights.azure.com,dc.applicationinsights.microsoft.com,dc.services.visualstudio.com
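
Before tightening the firewall, it can be useful to check which host names the destination list above would cover. The sketch below is an approximation written for this walkthrough, not Azure Firewall's actual matching engine: it treats each entry as a glob pattern via Python's fnmatch module, and is_allowed is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Destination FQDNs copied from the application rule above.
ALLOWED_FQDNS = [
    "files.pythonhosted.org", "mcr.microsoft.com", "*.mcr.microsoft.com",
    "graph.windows.net", "anaconda.com", "*.anaconda.com", "*.anaconda.org",
    "pypi.org", "cloud.r-project.org", "*pytorch.org", "*.tensorflow.org",
    "update.code.visualstudio.com", "dc.applicationinsights.azure.com",
    "dc.applicationinsights.microsoft.com", "dc.services.visualstudio.com",
]


def is_allowed(hostname, patterns=ALLOWED_FQDNS):
    """Glob-style approximation of whether hostname matches any allowed FQDN."""
    host = hostname.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in patterns)


for candidate in ["pypi.org", "repo.anaconda.com", "download.pytorch.org", "example.com"]:
    print(candidate, is_allowed(candidate))
```

Under this approximation, pypi.org, repo.anaconda.com, and download.pytorch.org are covered while example.com is not. Note that a pattern without the dot, such as *pytorch.org, also matches unrelated names that merely end in pytorch.org, which is worth keeping in mind when reviewing the rule.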

 

5.8      Setting up the AzureML compute Instance with public IP.

Using the No public IP option is the best way to secure the AzureML environment; however, the feature is currently in preview. If the feature is not available in your region, please go ahead with a public IP. We might get the error below while creating the compute resources. To mitigate this, we need to add two routes for the inbound service tags to the user-defined route table. This is explained in the inbound configuration documentation linked in the warning below.

 

Error Message:

The specified Azure ML Compute Instance compute-instance-no-pip encountered an unusable node. Please try to restart the compute instance to recover. If it failed at creation time, please delete and try to recreate the compute instance. If the problem persists, please follow up with Azure Support.

Warning: The following IP ranges or service tags are routed to a NetworkVirtualAppliance or a VirtualNetworkGateway. If the NetworkVirtualAppliance or the VirtualNetworkGateway do not re-route these IP ranges to Internet, that might cause a failure.  IP ranges: BatchNodeManagement=[13.86.218.192/27,13.91.55.167/32,13.91.88.93/32,13.91.107.154/32,13.93.206.144/32,40.82.255.64/27,40.112.254.235/32,40.118.208.127/32,104.40.69.159/32,168.62.4.114/32,191.239.18.3/32,191.239.21.73/32,191.239.40.217/32];AzureMachineLearning=[13.86.195.35/32,13.87.160.129/32,40.82.248.80/28,40.112.242.176/28,20.42.0.240/28,40.71.11.64/28,40.78.227.32/28,40.79.154.64/28,52.255.214.109/32,52.255.217.127/32].  For more information about inbound configuration, please refer to https://docs.microsoft.com/azure/machine-learning/how-to-access-azureml-behind-firewall?tabs=ipaddress#inbound-configuration
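
The warning lists each affected service tag together with its IP ranges, which tells us which routes are needed. As a small illustration (an assumption-laden sketch, not an official parser), the service tags can be pulled out of the message with a regular expression; parse_routed_service_tags is a hypothetical helper and the sample string is a shortened excerpt of the warning above.

```python
import re


def parse_routed_service_tags(message):
    """Map each service tag in the warning to its list of CIDR ranges."""
    return {
        tag: ranges.split(",")
        for tag, ranges in re.findall(r"(\w+)=\[([^\]]*)\]", message)
    }


# Shortened excerpt of the warning text shown above.
SAMPLE = (
    "IP ranges: BatchNodeManagement=[13.86.218.192/27,13.91.55.167/32];"
    "AzureMachineLearning=[13.86.195.35/32,13.87.160.129/32]."
)

for tag, cidrs in parse_routed_service_tags(SAMPLE).items():
    print(tag, len(cidrs), "ranges")
```

Each key in the result corresponds to one service tag that needs a route with next hop type Internet.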

 

Add the routes for the service tags to the user-defined route table:

az network route-table route create -g securingamlsdemo --route-table-name routetablesecureamls -n AzureMLRoute --address-prefix AzureMachineLearning --next-hop-type Internet

az network route-table route create -g securingamlsdemo --route-table-name routetablesecureamls -n BatchRoute --address-prefix BatchNodeManagement.westus --next-hop-type Internet
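
Azure picks the route whose address prefix is the longest match for the destination, which is why these more specific routes take precedence over the catch-all route through the firewall. The sketch below illustrates that selection rule in plain Python. It rests on assumptions: the catch-all route is taken to be 0.0.0.0/0 (the text only says that all traffic goes via the virtual appliance), two prefixes from the warning stand in for the service tags, and next_hop is a hypothetical helper.

```python
import ipaddress

# Hypothetical route table for illustration only.
ROUTES = [
    ("0.0.0.0/0", "VirtualAppliance"),   # assumed catch-all route via the firewall
    ("13.86.195.35/32", "Internet"),     # from the AzureMachineLearning list
    ("13.86.218.192/27", "Internet"),    # from the BatchNodeManagement list
]


def next_hop(destination, routes=ROUTES):
    """Return the next hop type of the longest-prefix route matching destination."""
    address = ipaddress.ip_address(destination)
    matching = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in routes
        if address in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) prefix wins.
    return max(matching, key=lambda item: item[0].prefixlen)[1]


for ip in ["13.86.195.35", "13.86.218.200", "8.8.8.8"]:
    print(ip, "->", next_hop(ip))
```

Here the two management addresses go straight to the Internet, while any other destination such as 8.8.8.8 still flows through the firewall.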


6         Quick Test

To check whether the AzureML environment is working, let’s run an AzureML AutoML job.

We can work with the diabetes dataset. Store the data in a container in the storage account that was created earlier while setting up the AzureML environment.

Create a dataset in AzureML studio from the diabetes data, then create an AutoML classification experiment. Set the target column to “Diabetic”.

 

 

 

 

Hope this helps!

 

Updated Mar 29, 2022
Version 5.0
  • KMS2222

    Great article Sam, this is very detailed and helps to set up the demo.

    Just one quick thing for folks facing an issue with the ARM template (JSON): they may end up with the following error:

    Validation of network acls failure: SubnetsHaveNoServiceEndpointsConfigured:Subnets privateendpointsubnet of virtual network /subscriptions/XXXXXXX/resourceGroups/xxxx-aml-rg/providers/Microsoft.Network/virtualNetworks/myHub do not have ServiceEndpoints for Microsoft.Storage resources configured. Add Microsoft.Storage to subnet's ServiceEndpoints collection before trying to ACL Microsoft.Storage resources to these subnets.. Click here for details

    Basically, the privateendpointsubnet subnet needs to have the Microsoft.Storage service endpoint enabled for the whole deployment to work smoothly. If you don't want to modify the PowerShell script itself, after running the PowerShell for the VNet/subnet creation you can manually go to the VNet/subnet and add the Microsoft.Storage service endpoint to the privateendpointsubnet subnet. After that, the deployment will succeed.

    This can also be done in the PS1 script, but if you don't want to change the script, do it manually.

    Thank you Sam once again for putting together this detailed article.