Announcing registries in Azure Machine Learning to operationalize models and pipelines at scale

Microsoft

Oct 12, 2022

We are excited to announce the public preview of registries in Azure Machine Learning. Registries in Azure Machine Learning are organization-wide repositories of machine learning assets such as models, environments, and components. Registries provide a central platform for cataloging and operationalizing machine learning models across the personas, teams, and environments involved in the machine learning lifecycle. They also foster better collaboration among data science teams by giving them one place to share and discover machine learning models and pipelines.

As machine learning becomes more pervasive in modern apps, scaling model development to integrate with application development tools and processes becomes important. Modern model development flows involve a data science team developing models, machine learning engineers operationalizing model training and inference workflows, and DevOps and IT teams operationalizing the models across test and production environments. Data science teams use Azure Machine Learning workspaces to iteratively develop and test models. However, these models and pipelines often need to be deployed outside the workspace in which they were developed because of security and compliance requirements, multi-region rollout, and budget management policies that require different subscriptions for development and production environments. The challenge with Azure Machine Learning workspaces today is that models or pipelines developed in one workspace cannot be used in a different workspace without manually copying them over. This not only makes multi-environment MLOps hard and fragile, but also creates barriers for teams that want to share the models they develop with others in the organization. Moreover, since each team works in its own workspace, many potentially reusable machine learning models and artifacts go undiscovered. With the emergence of large models that require many days and expensive GPU clusters to train from scratch, it’s more prudent than ever to have a central catalog of pre-trained models across projects and teams in an organization.

Concept: Azure Machine Learning registries

Conceptually, registries are like shared workspaces. You can register machine learning assets such as models, components or environments in a registry the same way you’d do with a workspace. Creating them in a registry makes them available for use in any workspace within your organization (AAD tenant). You can also promote models that are currently registered in a workspace to a registry to share them with other workspaces. Registries can replicate assets in different regions so that workspaces spread across different regions have low latency access to the assets. Registries are Azure resources, so you can create them with ARM templates and use Azure role-based access control for managing access. Registries are the ideal solution for the following scenarios:

Model & Experimentation Hub: Host a central catalog of models and associated training components that can be used to retrain or fine-tune the models with different datasets.
Multi-region model deployments: Models are trained in one region but need to be deployed to inference endpoints in different geographical regions to meet low-latency requirements.
Model operationalization across environments: The data science team develops models in one workspace, but either the models need to be deployed to a different Azure subscription or the whole training process needs to be repeated with production data in a different Azure subscription.
Repeatable and Reproducible model training: Create training environments and components once and use them to run reproducible training jobs in any workspace. Azure Machine Learning runs training jobs in Docker-based environments. Each time you recreate a training pipeline with its components and environments in a different workspace, there can be subtle differences in the Python libraries that are picked up based on your conda dependencies. This can lead to unpredictable bugs or unexplained differences in results, which can be avoided by hosting training environments and components in a single location.
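
To make the "create once, use from any workspace" idea in these scenarios concrete, here is a minimal Python sketch of how a registry asset might be addressed independently of any workspace. It is not the Azure Machine Learning SDK: the RegistryAsset type and the parse_registry_uri helper are hypothetical, the registry and model names are made up, and the azureml://registries/... URI layout is an assumption that should be checked against the current Azure Machine Learning documentation.

```python
from dataclasses import dataclass

# Asset kinds named in this article: models, components, environments.
ASSET_KINDS = ("models", "components", "environments")

# Assumed URI prefix for registry assets; verify against the current docs.
PREFIX = "azureml://registries/"


@dataclass(frozen=True)
class RegistryAsset:
    """A versioned asset in a registry (illustrative type, not an SDK class)."""

    registry: str
    kind: str
    name: str
    version: str

    def __post_init__(self) -> None:
        # Reject asset kinds the article does not mention.
        if self.kind not in ASSET_KINDS:
            raise ValueError(f"unsupported asset kind: {self.kind!r}")

    def uri(self) -> str:
        # The URI names the registry, not a workspace, so the same string
        # can be used from any workspace that has access to the registry.
        return f"{PREFIX}{self.registry}/{self.kind}/{self.name}/versions/{self.version}"


def parse_registry_uri(uri: str) -> RegistryAsset:
    """Split an assumed-format registry URI back into its parts."""
    if not uri.startswith(PREFIX):
        raise ValueError(f"not a registry URI: {uri!r}")
    parts = uri[len(PREFIX):].split("/")
    if len(parts) != 5 or parts[3] != "versions":
        raise ValueError(f"unexpected registry URI layout: {uri!r}")
    registry, kind, name, _, version = parts
    return RegistryAsset(registry, kind, name, version)


if __name__ == "__main__":
    # Hypothetical registry and model names, for illustration only.
    asset = RegistryAsset("contoso-registry", "models", "churn-model", "3")
    print(asset.uri())
```

Because the identifier carries the registry name and an explicit version, a development workspace and a production workspace can both point at exactly the same model, component, or environment, which is what keeps training and deployment reproducible across environments.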

Unlike workspaces, which are specific to a team or project, registries are meant to serve multiple projects and teams. It is important to understand your requirements and plan appropriately before creating registries:

Which regions does the registry need to support, given the workspaces you have now or may have in the future? You can add regions later as well, but more regions mean more resources required for replication.
Which users will have access to create assets, and which will only discover and use assets? Do you want to use existing identity groups encompassing all data science and engineering teams for read access to promote discovery and reuse of assets?
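
One way to work through these planning questions is to write the plan down as data before creating anything. The sketch below is only a planning aid under stated assumptions: RegistryPlan and its fields are hypothetical names, not part of any Azure API, and the region and group names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RegistryPlan:
    """Hypothetical planning record for a registry (not an Azure resource schema)."""

    name: str
    primary_region: str
    # Extra regions to replicate assets to; each one adds replication resources.
    replication_regions: list[str] = field(default_factory=list)
    # Identity groups that may create assets in the registry.
    contributor_groups: list[str] = field(default_factory=list)
    # Identity groups that may only discover and use assets.
    reader_groups: list[str] = field(default_factory=list)

    def all_regions(self) -> list[str]:
        # Primary region first, then replication regions, without duplicates.
        regions = [self.primary_region]
        for region in self.replication_regions:
            if region not in regions:
                regions.append(region)
        return regions

    def missing_regions(self, workspace_regions: list[str]) -> list[str]:
        # Workspace regions that the registry would not serve locally.
        covered = set(self.all_regions())
        return sorted(set(workspace_regions) - covered)


if __name__ == "__main__":
    plan = RegistryPlan(
        name="contoso-ml",
        primary_region="eastus",
        replication_regions=["westeurope"],
        contributor_groups=["ml-platform-team"],
        reader_groups=["all-data-science"],
    )
    # Regions of current and expected workspaces (illustrative values).
    print(plan.missing_regions(["eastus", "westeurope", "southeastasia"]))
```

Listing the uncovered workspace regions up front makes the trade-off in the first question explicit: each region you add improves access latency for workspaces there, at the cost of more replication resources.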

Review the documentation article for creating and managing registries to learn how to plan and create registries. The experience of creating and using assets from a registry is similar to the existing experience of creating and using assets from a workspace. Review the article on creating and using assets from registries to try an end-to-end tutorial that builds a training pipeline that can run in different workspaces, registers the trained model in the registry, and deploys the model to different workspaces. This article can help you accomplish model training and operationalization across environments as shown in the diagram below.

Operationalize model training and inference across dev-test-prod environments

This release of registries enables Azure Machine Learning users to share machine learning models, components, and environments within their organization. As the first step toward enabling sharing of ML assets across organizations, we are launching a “system” registry called “azureml” that will be available to all Azure Machine Learning users. This registry will initially host components from ResponsibleAI, but we plan to release more models and components in the future.

Get started today