Enterprise Security and Compliance for Azure Machine Learning
Published Jun 13 2022 12:10 PM
Microsoft

In the past 12 months, we in the Azure Machine Learning team have added many enterprise features - Private Link, Managed Identity, and a built-in Data Scientist RBAC role - that help IT admins configure secure ML environments for the data science teams they manage. To make managing workspaces and their associated resources easier, we have also added support for Terraform-based deployments and for moving workspaces across subscriptions.

 

Today we will review these features - and others - that were added based on the feedback we have received from several enterprise customers we’ve been working with closely. The features range from network security improvements to best practices that enterprise customers can use where multiple teams need to collaborate in a secure manner.

Multiple Private Endpoints

An Azure Machine Learning workspace with Private Link used to accept only one Private Endpoint. We've now added support for multiple Private Endpoints. With this feature, you can support more complex network scenarios, such as 1) access from different virtual networks, 2) AKS clusters in different virtual networks, and 3) Private Endpoint access from other Azure services, such as Synapse or Data Factory, to a Private Link-enabled workspace. Find details here.
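As an illustration, a second Private Endpoint from another virtual network can be attached to an existing workspace with the Azure CLI along these lines (the resource names and subscription ID below are placeholders, not values from this article):

```shell
# Sketch: attach an additional Private Endpoint to a Private Link workspace
# from a second virtual network. All names and IDs are illustrative.
az network private-endpoint create \
  --name ple-aml-vnet2 \
  --resource-group rg-ml \
  --vnet-name vnet-2 \
  --subnet snet-endpoints \
  --private-connection-resource-id "/subscriptions/<sub-id>/resourceGroups/rg-ml/providers/Microsoft.MachineLearningServices/workspaces/ws-ml" \
  --group-id amlworkspace \
  --connection-name ple-aml-vnet2-conn
```

Each additional virtual network that needs workspace access repeats this pattern with its own subnet.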

Enable Storage IP Firewall to Access Data behind VNet

When you use Azure Machine Learning Studio features such as AutoML or the Designer with storage behind a virtual network, a Private Link workspace used to be a requirement. With the new Storage IP Firewall feature, you can use Studio features with storage behind a virtual network even from a public workspace, as long as you allow-list your client IP address in the storage IP firewall. Find details here.
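For example, allow-listing a client IP on the storage account firewall is a single Azure CLI call (the resource group, account name, and IP address below are placeholders):

```shell
# Sketch: allow a data scientist's client IP through the storage firewall
az storage account network-rule add \
  --resource-group rg-ml \
  --account-name stmldata \
  --ip-address 203.0.113.50
```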

 

Network Security for Inferencing


Network security is important not just for training, but for inferencing as well. When using Managed Online Endpoints, you can improve the security of the online endpoint with a Private Endpoint. This feature helps secure inbound scoring requests as well as outbound communications from the deployment to other Azure resources. To learn more about this feature, see Use network isolation with managed online endpoints (preview).
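With the CLI v2 YAML schema, the preview exposes flags roughly along the lines below; this is a sketch (the endpoint name is illustrative, and the flag names should be checked against the linked preview docs):

```yaml
# endpoint.yml -- inbound traffic arrives only through the workspace Private Endpoint
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: scoring-private
auth_mode: key
public_network_access: disabled

# In the corresponding deployment YAML, outbound calls can be kept private too:
# egress_public_network_access: disabled
```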

 

Azure Machine Learning has also added support for an internal load balancer for Azure Kubernetes Service. This feature allows deployment of models on an AKS cluster with a private IP within a virtual network. To learn more, see Create or attach an AKS cluster to use an internal load balancer with a private IP.

Enterprise Best Practices

When planning an Azure Machine Learning deployment for an enterprise environment, there are many architectural decisions to make that affect how you set up Azure Machine Learning. Important aspects are security, integration with the existing IT landscape, budget and cost management, unlocking data access, auditing and monitoring, as well as enabling application teams with self-service capabilities to foster their productivity.

 


To flatten the learning curve of getting started with Azure Machine Learning, we have collated best practices that reflect the experience and lessons learned from running machine learning teams internally at Microsoft and from partnering with our customers. We learned that, depending on project maturity and data confidentiality levels, organizations need to configure Azure Machine Learning with different security configurations. To make getting started easier, we published deployment templates for different maturity levels that get you going in a few clicks with a secure development environment.

 

Treating data as a strategic asset is a common goal for our customers. At the center of this is a sound analytics and AI strategy; data analytics and AI are critical for business growth and have, in fact, been called the #1 investment by many business leaders. Enterprise-scale data management and analytics landing zones provide a secure and scalable analytics framework designed to enable enterprises to build their data and analytics platform. It includes a prescriptive architecture and a documented end-to-end technical solution that incorporates Azure Machine Learning. This reference implementation can help you get started with an environment that is secure, scalable, and compliant.

Identity and Access Controls

When deciding how to authenticate from an Azure Machine Learning workspace against data stores or other Azure resources, managed identities provide a simple and flexible solution that reduces the need for explicit credential management. The benefits include added security and the ability to compartmentalize access to training data sources. Managed identity support is now generally available; for more details, see Use Managed Identities with Azure Machine Learning and Connect to storage by using identity-based data access.

When training on Compute Clusters, the identity of the user submitting the job can be used to authenticate against data storage. With this feature, it is possible to set up per-user, per-team, or per-project permissions on training data. For instance, Azure Data Lake Storage Gen2 with hierarchical namespace enabled allows setting up compartmentalized folders for training data that different users can access. For more details, see Access data for training jobs on compute clusters (preview).
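As a sketch of how this might look with the CLI v2 job schema (the datastore, compute, and environment names are placeholders; the `user_identity` setting is the relevant part, and should be checked against the linked preview docs):

```yaml
# job.yml -- storage access runs as the submitting user, not the workspace identity
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --data ${{inputs.training_data}}
inputs:
  training_data:
    path: azureml://datastores/adls_train/paths/team-a/
compute: azureml:cpu-cluster
environment: azureml:my-training-env@latest
identity:
  type: user_identity
```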

Role-based access controls allow administrators to configure different permissions for different roles, such as workspace manager, data scientist, or MLOps engineer. While custom RBAC roles are a flexible mechanism for handling different use cases, Azure Machine Learning also provides a built-in Data Scientist role that grants permissions for common data science and machine learning tasks. See Manage roles in your workspace for more information on creating and using roles.
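For instance, assigning the built-in role at workspace scope takes one CLI call (the user and resource IDs below are placeholders):

```shell
# Sketch: grant the built-in Data Scientist role on a single workspace
az role assignment create \
  --assignee "dana@contoso.com" \
  --role "AzureML Data Scientist" \
  --scope "/subscriptions/<sub-id>/resourceGroups/rg-ml/providers/Microsoft.MachineLearningServices/workspaces/ws-ml"
```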

Compute Enterprise Features

No Public IP Compute Clusters and Instances

You can deploy a compute cluster or compute instance in your virtual network and configure NSGs or use user-defined routing. It is also now possible to deploy a no-public-IP compute cluster or compute instance in your virtual network. No-public-IP compute does not rely on a public IP for communication with dependent resources. Instead, it communicates within the virtual network using the Azure Private Link ecosystem as well as Service or Private Endpoints, removing the need for a public IP. This takes compute nodes off the public internet, eliminating a significant threat vector, and no-public-IP compute has no inbound communication requirements from the public internet, unlike public-IP compute.
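In an ARM template, the cluster opts out of node public IPs with a single property; the fragment below is a sketch (the workspace/cluster names and subnet parameter are placeholders, and the property names should be verified against the API version you deploy with):

```json
{
  "type": "Microsoft.MachineLearningServices/workspaces/computes",
  "apiVersion": "2021-07-01",
  "name": "[concat(parameters('workspaceName'), '/cpu-cluster')]",
  "location": "[parameters('location')]",
  "properties": {
    "computeType": "AmlCompute",
    "properties": {
      "vmSize": "STANDARD_DS3_V2",
      "enableNodePublicIp": false,
      "subnet": { "id": "[parameters('subnetId')]" },
      "scaleSettings": { "minNodeCount": 0, "maxNodeCount": 4 }
    }
  }
}
```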

 


Compute Instance Auto-Start and Shutdown

This feature enables users to add scheduled auto-shutdown and auto-start to a compute instance. Users now have a streamlined, automated way to control and optimize operating costs, enhancing overall governance of the Azure ML workspace. Users can define multiple schedules for auto-shutdown and auto-start. For instance, a user can create a schedule to start at 9 AM and stop at 6 PM Monday through Thursday, and to start at 9 AM and stop at 4 PM on Friday. Schedules can be created in the user's local time zone, and users get a notification in AML Studio before their compute instance is shut down. Schedules can be created through AML Studio or through ARM templates; with ARM templates, cron and Logic Apps expressions are supported for defining schedules. Users can create a schedule at compute instance creation time or later, and can modify existing schedules. For more details, please refer to this documentation.
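In an ARM template, a schedule is attached to the compute instance resource. The sketch below uses cron expressions to start an instance at 9 AM and stop it at 6 PM Monday through Thursday; the property names are drawn from the workspace computes schema and the names, time zone, and API version are illustrative, so verify them against the linked documentation:

```json
{
  "type": "Microsoft.MachineLearningServices/workspaces/computes",
  "apiVersion": "2021-07-01",
  "name": "[concat(parameters('workspaceName'), '/ci-analyst')]",
  "location": "[parameters('location')]",
  "properties": {
    "computeType": "ComputeInstance",
    "properties": {
      "vmSize": "STANDARD_DS3_V2",
      "schedules": {
        "computeStartStop": [
          {
            "action": "Start",
            "triggerType": "Cron",
            "cron": { "timeZone": "W. Europe Standard Time", "expression": "0 9 * * 1-4" }
          },
          {
            "action": "Stop",
            "triggerType": "Cron",
            "cron": { "timeZone": "W. Europe Standard Time", "expression": "0 18 * * 1-4" }
          }
        ]
      }
    }
  }
}
```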

Compute Instance Setup Script

Azure ML compute instance setup scripts provide an automated way for IT admins to customize and configure compute instances at creation time. Admins can write a setup script that is used across all compute instances in the workspace to provision them according to enterprise policies. This feature can automate a variety of scenarios: creating custom JupyterLab kernels and conda environments; installing pip, conda, or R packages; installing other software and tools; mounting data; git setup (cloning repos, setting git config, connecting to private repos); setting network proxies; installing Jupyter/JupyterLab extensions; setting and exporting environment variables; and configuring multiple SSH keys, among others. It also enables automated deployment of a customized compute instance through an ARM template.

A setup script can be provided in two ways:

  1. Shell file: the setup script and any dependencies can be uploaded to the workspace file share, then referenced from an ARM template or from the UX while creating the compute instance.
  2. Inline: the shell command can be provided inline, directly in the ARM template.

Azure Machine Learning also supports scripts with dependent files and script arguments, and provides environment variables that can be used in scripts. To learn more about this capability, please refer to the documentation.
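The shell-file variant might look like the sketch below. The package list, proxy host, and profile path are placeholders (a real script would target the instance user's profile rather than /tmp), and the script takes an optional team-name argument to illustrate argument passing:

```shell
#!/bin/bash
# Illustrative compute instance setup script; all names and paths are placeholders.
set -euo pipefail

# 1) Packages every instance should have (left commented so this sketch stays
#    self-contained; a real script would run these installs):
# pip install --quiet pandas scikit-learn

# 2) Persist proxy settings for future shell sessions:
PROFILE=/tmp/ci_profile.sh   # placeholder for the user's profile file
cat >> "$PROFILE" <<'EOF'
export HTTP_PROXY=http://proxy.contoso.com:8080
export HTTPS_PROXY=http://proxy.contoso.com:8080
EOF

# 3) Script arguments can come from the ARM template or the Studio UX:
TEAM="${1:-data-science}"
echo "export TEAM_NAME=${TEAM}" >> "$PROFILE"
```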

Azure Policies

It is possible to use Azure policies to enforce a default virtual network/subnet, a setup script, and a default start/shutdown schedule, or to disable SSH connections from the public internet. Virtual network and other security policy descriptions can be found in the Azure Machine Learning policy reference.
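As a sketch, a custom policy definition that audits workspaces left open to the public internet could look roughly like this (the `publicNetworkAccess` field alias is illustrative; check the policy reference above for the exact aliases):

```json
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        { "field": "type", "equals": "Microsoft.MachineLearningServices/workspaces" },
        { "field": "Microsoft.MachineLearningServices/workspaces/publicNetworkAccess", "notEquals": "Disabled" }
      ]
    },
    "then": { "effect": "audit" }
  }
}
```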

Compute Cluster Quota

We have simplified the process for increasing Compute Cluster quota beyond 200. You can now request additional quota by creating an Azure support request.

Audit Logs

Azure Machine Learning gives users the capability to log interactions with the data plane when data is read, edited, or deleted using Azure ML services. The audit data is made available via APIs and the UX in a convenient and transparent manner so transactions can be monitored for audit purposes. Audit data can also be queried to draw insights into resource usage via familiar tools: a Log Analytics workspace and the Kusto Query Language (KQL).

 


The self-serve onboarding via Azure Monitor gives users the same uniform data plane audit logging experience they get across Azure. The capability integrates with centralized Azure Policy, allowing users to enable and control audit coverage for data access across different Azure resources. Refer to the following link for more information on the schemas of this feature's tables: Monitor Azure Machine Learning data reference - Azure Machine Learning | Microsoft Docs
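Once diagnostic logs flow into a Log Analytics workspace, a query along these lines surfaces data-plane activity. This sketch uses the generic AzureDiagnostics table shape; consult the data reference linked above for the exact Azure ML tables and columns:

```kusto
// Sketch: count data-plane operations by category over the last day
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.MACHINELEARNINGSERVICES"
| where TimeGenerated > ago(1d)
| summarize operations = count() by Category, OperationName
| order by operations desc
```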

Policies

Azure Policies can help IT admins enforce that Azure Machine Learning is deployed and used in a secure and compliant manner across the organization. The policies can also be used to audit existing deployments. For example, policies let Azure subscription owners require that Azure Machine Learning workspaces use Private Link or private DNS zones, or that compute resources be accessed using AAD authentication. Furthermore, custom policy definitions can be used to enforce different aspects of workspace deployment in a flexible manner. For details, see Audit and manage Azure Machine Learning.

Terraform Deployment

Terraform enables template-based configuration of an Azure Machine Learning workspace and its associated resources in a repeatable and predictable manner. Terraform supports multiple cloud providers, including Azure, which lowers complexity when managing infrastructure across clouds. In comparison with Azure Resource Manager templates, Terraform tracks state and is able to clean up and destroy resources. To get started, follow the Azure ML docs tutorial `Use a template to create a secure workspace` and the samples in the Terraform QuickStart templates repo.
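A minimal workspace definition in Terraform looks roughly like the following sketch; the resource names are illustrative, and the referenced Application Insights, Key Vault, and storage account resources are assumed to be defined elsewhere in the configuration:

```hcl
resource "azurerm_machine_learning_workspace" "ws" {
  name                    = "mlw-enterprise"
  location                = azurerm_resource_group.ml.location
  resource_group_name     = azurerm_resource_group.ml.name
  application_insights_id = azurerm_application_insights.ml.id
  key_vault_id            = azurerm_key_vault.ml.id
  storage_account_id      = azurerm_storage_account.ml.id

  identity {
    type = "SystemAssigned"
  }
}
```

Because Terraform records this resource in its state file, `terraform destroy` can later remove the workspace and its dependencies cleanly.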

Workspace Move

Workspace Move gives you flexibility to update and change the architecture of your Azure resources for machine learning. The feature supports moving the Azure Machine Learning workspace resource and its data content between subscriptions and resource groups within the same AAD tenant and Azure region. For example, you can move an existing workspace from a dev/test Azure subscription to a production subscription, or move the workspace to a subscription with more compute quota. To learn more about this feature, visit Move workspace between subscriptions (preview).
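Because the workspace is moved with the generic resource-move operation, the call is roughly as follows (all subscription and resource IDs are placeholders):

```shell
# Sketch: move a workspace from a dev subscription to production
az resource move \
  --destination-group rg-prod \
  --destination-subscription-id <prod-sub-id> \
  --ids "/subscriptions/<dev-sub-id>/resourceGroups/rg-dev/providers/Microsoft.MachineLearningServices/workspaces/ws-ml"
```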

Last update: Jun 16 2022 07:55 AM