Updated CLI and ARM REST APIs for Azure Machine Learning
Published Dec 08 2021 02:03 PM
Microsoft

Overview

The Azure Machine Learning team is excited to announce the public preview refresh of the Azure Machine Learning (AML) CLI v2. This refresh builds on the CLI public preview announced at Build and adds many new capabilities to CLI v2.

Azure Machine Learning currently exposes most of its functionality through the Python SDK. The previous versions of the AML Command Line Interface (CLI) and REST APIs were limited in functionality. The machine learning lifecycle involves handoffs between data scientists and ML engineers (such as deployment specialists and data engineers). Data scientists build models and are usually experts in Python. ML engineers do not create models; they provide data, deploy workloads to production, and so on, and they are not necessarily Python experts. Because Azure ML was so Python-centric and lacked a capable CLI and REST API, adoption was harder both for the data engineers involved in the ML lifecycle and for data scientists who prefer languages other than Python.

To address this, the revised CLI v2 and REST API describe all assets and resources in YAML. Actions, including management of these assets and resources, can be performed with simple command lines (CLI v2) or through the REST API. With either one, users can:

  • Manage AML resources: workspaces, compute, and datastores
  • Manage AML assets: datasets, environments, and models
  • Run standalone jobs locally to develop and test them, then move them to the cloud
  • Run a series of jobs in a pipeline (New)
  • Run inference on trained models with managed online inferencing or batch inferencing
  • Create and use reusable components in pipelines (New)

To improve usability even further, the Azure ML extension for VS Code has expanded support for CLI v2. The consistent YAML representation of all assets and resources enables GitOps as well as sharing scenarios. The REST APIs can also be used through ARM templates. Together, these features simplify the experience for every team member: an easy transition from local to cloud work, one-click deployment of samples, tooling support, and more. All of this works without depending on a specific programming language such as Python.

How it works

Some definitions

To start with, let us define a few terms:

  • Resources are platform-level capabilities needed to run machine learning on Azure ML. These include the AML workspace, which holds everything else; the compute on which tasks run; and datastores, which are pointers to where data is stored.
  • Assets are artifacts consumed or produced by jobs. These include datasets, environments, and models.
  • A job is a task that runs on a chosen compute target. Its definition specifies what to run, how to run it, and where to run it, as well as which inputs it consumes and which outputs it produces.
  • A pipeline is a collection of jobs that run in a particular order based on the dependencies (connections) between them. Each job in a pipeline can run on a different compute target.

We will look at a scenario where we start with a job, run it on the local machine, move it to the cloud, and then stitch together a series of jobs into a pipeline.

Run a job locally

To start with, let us run a job on the local machine, since we want to develop and do some basic testing before using up cloud resources. A job is defined in a YAML file. Let us examine the YAML (local-job.yml):

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}
environment:
  image: pytorch/pytorch
compute: local
code: ./src
inputs:
  the_data:
    type: uri_folder
    path: ./data
outputs:
  the_model:
    mode: upload

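The YAML specifies that the command python train.py runs on the local machine (compute: local) using the pytorch/pytorch container image, with the training code taken from the ./src folder. The job's input, the_data, is the local folder ./data. At run time, the ${{inputs.the_data}} and ${{outputs.the_model}} placeholders in the command are replaced with the actual locations of the input and output. This job can be run with the following CLI command: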

az ml job create -f local-job.yml

With a few lines in a YAML and one command line, we were able to run a job locally.
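
The ${{...}} expressions in the command are bindings that Azure ML fills in with concrete locations when the job runs. As a rough illustration of that idea (not Azure ML's actual implementation), here is a minimal Python sketch that expands such a command template. The function expand_command and the /mnt/... paths are hypothetical.

```python
import re
from typing import Mapping

# Matches ${{inputs.NAME}} or ${{outputs.NAME}} inside a command string.
_PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def expand_command(template: str,
                   inputs: Mapping[str, str],
                   outputs: Mapping[str, str]) -> str:
    """Replace ${{inputs.x}} / ${{outputs.y}} with concrete paths (hypothetical helper)."""
    bindings = {"inputs": inputs, "outputs": outputs}

    def substitute(match: re.Match) -> str:
        kind, name = match.group(1), match.group(2)
        if name not in bindings[kind]:
            # Every placeholder must refer to a declared input or output.
            raise KeyError(f"command references undeclared {kind}.{name}")
        return bindings[kind][name]

    return _PLACEHOLDER.sub(substitute, template)


command = ("python train.py --data ${{inputs.the_data}} "
           "--save_model ${{outputs.the_model}}")

# Hypothetical mount points, chosen only for illustration.
print(expand_command(command,
                     inputs={"the_data": "/mnt/inputs/the_data"},
                     outputs={"the_model": "/mnt/outputs/the_model"}))
```

Raising an error on an unknown name reflects the intent that every placeholder in a command refers to something the job declares.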

Run a job on the cloud

Now that our job runs as expected on the local machine, let us run it in the cloud. Here is the revised YAML:

$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}
environment: 
  image: pytorch/pytorch
compute: azureml:cpu-cluster
code: ./src
inputs:
  the_data:
    type: uri_folder
    path: ./data
outputs:
  the_model: 
    mode: upload

The only change in the YAML is that compute now points to a compute resource named cpu-cluster in the AML workspace (azureml:cpu-cluster). The job still uses data from the same local folder, which is uploaded to the cloud for execution. This job can be run with the following CLI command:

az ml job create -f cloud-job.yml
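
To confirm that moving to the cloud really is a one-line change, the following Python sketch diffs the two job specifications. It assumes the YAML has been transcribed by hand into plain Python dicts (no YAML parser is used), and the diff_specs helper is hypothetical.

```python
from typing import Any, Dict, List, Tuple

# The local job spec above, transcribed by hand into a Python dict.
local_job: Dict[str, Any] = {
    "command": "python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}",
    "environment": {"image": "pytorch/pytorch"},
    "compute": "local",
    "code": "./src",
    "inputs": {"the_data": {"type": "uri_folder", "path": "./data"}},
    "outputs": {"the_model": {"mode": "upload"}},
}
# The cloud job is the same spec with a different compute target.
cloud_job: Dict[str, Any] = {**local_job, "compute": "azureml:cpu-cluster"}


def diff_specs(a: Dict[str, Any], b: Dict[str, Any],
               prefix: str = "") -> List[Tuple[str, Any, Any]]:
    """Return (key_path, old, new) for every leaf value that differs between two nested dicts."""
    changes: List[Tuple[str, Any, Any]] = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        old, new = a.get(key), b.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.extend(diff_specs(old, new, path + "."))  # recurse into nested mappings
        elif old != new:
            changes.append((path, old, new))
    return changes


print(diff_specs(local_job, cloud_job))
# → [('compute', 'local', 'azureml:cpu-cluster')]
```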

Using data from the cloud

In the example above, the data is uploaded from the local machine. In real-world scenarios, however, uploading local data does not scale. To use data that already lives in the cloud, the input in the YAML can point to cloud storage instead. Given below are two alternative definitions of the same input:

# Option 1: a folder in a workspace datastore (the workspace blob store in this example)
the_data:
  type: uri_folder
  path: azureml://datastores/workspaceblobstore/paths/my-data/
# Option 2: a folder at an HTTPS location
the_data:
  type: uri_folder
  path: https://mainstorage.blob.core.windows.net/example-data/
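
The two path forms above can be told apart mechanically. The sketch below classifies a path value as a datastore URI, an HTTPS URL, or a local path. It only knows the forms shown in this post: the azureml://datastores/<datastore>/paths/<path> pattern is inferred from the example, any other URL scheme is lumped together as "other", and the classify_path helper is hypothetical.

```python
import re
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# azureml://datastores/<datastore_name>/paths/<relative_path>, as in the example above.
_DATASTORE_URI = re.compile(r"^azureml://datastores/(?P<datastore>[^/]+)/paths/(?P<path>.*)$")


class DataLocation(NamedTuple):
    kind: str                 # "datastore", "https", "local" or "other"
    datastore: Optional[str]  # datastore name, only set when kind == "datastore"
    path: str                 # path inside the datastore, or the original value


def classify_path(path: str) -> DataLocation:
    """Roughly classify an input 'path' value (hypothetical helper, not an Azure ML API)."""
    match = _DATASTORE_URI.match(path)
    if match:
        return DataLocation("datastore", match.group("datastore"), match.group("path"))
    scheme = urlparse(path).scheme
    if scheme == "https":
        return DataLocation("https", None, path)
    if scheme == "":
        return DataLocation("local", None, path)  # no scheme: treat as a local path
    return DataLocation("other", None, path)


for p in ("./data",
          "azureml://datastores/workspaceblobstore/paths/my-data/",
          "https://mainstorage.blob.core.windows.net/example-data/"):
    print(classify_path(p))
```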

Use a curated environment from the cloud

The environment can also be picked from the curated environments available in the AML workspace. For example:

environment: azureml:AzureML-Minimal@latest
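
References such as azureml:AzureML-Minimal@latest and azureml:cpu-cluster name an object in the workspace, optionally followed by a label. The sketch below splits such a reference into its parts. It assumes three forms: azureml:<name>, azureml:<name>@<label> as shown in this post, and azureml:<name>:<version> for a pinned version. The parse_asset_ref helper is hypothetical and not part of the CLI.

```python
from typing import NamedTuple, Optional


class AssetRef(NamedTuple):
    name: str
    version: Optional[str] = None
    label: Optional[str] = None


def parse_asset_ref(ref: str) -> AssetRef:
    """Split an 'azureml:' reference into name, version and label (illustrative only)."""
    prefix = "azureml:"
    if not ref.startswith(prefix):
        raise ValueError(f"not an azureml reference: {ref!r}")
    body = ref[len(prefix):]
    if "@" in body:                      # azureml:<name>@<label>
        name, label = body.split("@", 1)
        return AssetRef(name, label=label)
    if ":" in body:                      # azureml:<name>:<version> (assumed form)
        name, version = body.split(":", 1)
        return AssetRef(name, version=version)
    return AssetRef(body)                # azureml:<name>, e.g. a compute target


print(parse_asset_ref("azureml:AzureML-Minimal@latest"))
# → AssetRef(name='AzureML-Minimal', version=None, label='latest')
print(parse_asset_ref("azureml:cpu-cluster"))
# → AssetRef(name='cpu-cluster', version=None, label=None)
```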

With a few changes in a YAML file, we have been able to move a job from the local machine to the cloud.

Run a series of jobs in a pipeline

Now that we have a job running in the cloud, let us examine how to stitch jobs together into a pipeline. Here is the YAML that combines two jobs, prepare and train, into a pipeline. The prepare job reads raw data from the workspaceblobstore datastore and writes its result to the output processed_data. The train job then uses this output as its input and produces a model.

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

jobs:
  prepare:
    command: python prep.py --raw_data ${{inputs.raw_data}} --prep_data ${{outputs.processed_data}}
    code: ./src
    environment: azureml:AzureML-Minimal@latest
    compute: azureml:cpu-cluster
    inputs:
      raw_data:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/data/my_raw_data/
    outputs:
      processed_data:
        mode: upload

  train:
    command: python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}
    code: ./src
    environment: 
      image: docker.io/pytorch/pytorch
    compute: azureml:cpu-cluster
    inputs: 
      the_data: ${{jobs.prepare.outputs.processed_data}}
    outputs:
      the_model: 
        mode: upload

Since a pipeline is also a job, it can be run the same way using the CLI:

az ml job create -f 2step-pipeline.yml

The YAML of each job inside the pipeline is very similar to the YAML of a standalone job. The only change is that the output of one job is used as the input to another, through the binding ${{jobs.prepare.outputs.processed_data}}. With very few changes, multiple standalone jobs can be stitched into a pipeline. This reduces the number of steps needed for training by ~70%. Note also how little code is needed to create and run a pipeline with YAML: the YAML format reduces lines of code for pipelines by ~90% compared to Python scripts, making developers more productive.
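
The order in which a pipeline's jobs run follows from these bindings: train consumes ${{jobs.prepare.outputs.processed_data}}, so it has to wait for prepare. The Python sketch below recovers that order with a topological sort (graphlib, in the standard library since Python 3.9). It works on input bindings transcribed by hand into dicts and only illustrates the dependency logic; it is not Azure ML's scheduler, and execution_order is a hypothetical name.

```python
import re
from graphlib import TopologicalSorter  # standard library since Python 3.9
from typing import Dict, List, Set

# An input bound to another job's output: ${{jobs.<job>.outputs.<output>}}
_JOB_OUTPUT = re.compile(r"\$\{\{\s*jobs\.([\w-]+)\.outputs\.\w+\s*\}\}")

# Input bindings of the two-step pipeline above, transcribed by hand.
pipeline_inputs: Dict[str, Dict[str, object]] = {
    "prepare": {
        "raw_data": {
            "type": "uri_folder",
            "path": "azureml://datastores/workspaceblobstore/paths/data/my_raw_data/",
        },
    },
    "train": {"the_data": "${{jobs.prepare.outputs.processed_data}}"},
}


def execution_order(jobs: Dict[str, Dict[str, object]]) -> List[str]:
    """Return job names so that every job comes after the jobs whose outputs it consumes."""
    graph: Dict[str, Set[str]] = {}
    for job, inputs in jobs.items():
        upstream: Set[str] = set()
        for value in inputs.values():
            if isinstance(value, str):  # literal inputs (dicts) add no dependency
                upstream.update(_JOB_OUTPUT.findall(value))
        unknown = upstream - set(jobs)
        if unknown:
            raise KeyError(f"{job} consumes outputs of unknown job(s): {sorted(unknown)}")
        graph[job] = upstream
    # static_order() raises graphlib.CycleError (a ValueError subclass) on circular bindings.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline_inputs))  # → ['prepare', 'train']
```

A topological sort fits here because pipeline dependencies form a directed graph that must be acyclic. A single static order does not show which jobs could run in parallel; TopologicalSorter's get_ready()/done() interface can express that if needed.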

Run a pipeline with components

Next, let us create components that can be used in a pipeline. Consider the training job in the example above, and suppose it is to be reused many times by many people across your team. Instead of sharing the job's YAML definition for team members to copy, you can create a reusable component, register it in a workspace, and reference it easily.

A reusable component is similar to a Python function: it declares what it takes in and what it gives out, and the consumer does not need to know the logic inside. Let us look at the YAML definition of a component.

$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: my-training-component
command: python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}
code: ./train_src
environment: 
  image: docker.io/pytorch/pytorch
inputs:
  the_data:
    type: uri_folder
outputs:
  the_model:
    type: uri_folder

The YAML is similar to the training job but differs in two ways. Instead of binding its inputs and outputs to concrete data, it declares only their names and types. And it does not specify a compute target; the compute is chosen by the pipeline job that uses the component.
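
Because consumers interact with a component only through its declared interface, each ${{inputs.*}} and ${{outputs.*}} placeholder in its command should match a declared input or output. The sketch below is a hypothetical pre-flight check for that, run on the component transcribed by hand into a Python dict. check_interface is our own name, not a CLI feature.

```python
import re
from typing import Any, Dict, List

_PORT_REF = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")

# The component above, transcribed by hand.
training_component: Dict[str, Any] = {
    "name": "my-training-component",
    "command": "python train.py --data ${{inputs.the_data}} --save_model ${{outputs.the_model}}",
    "inputs": {"the_data": {"type": "uri_folder"}},
    "outputs": {"the_model": {"type": "uri_folder"}},
}


def check_interface(component: Dict[str, Any]) -> List[str]:
    """List placeholders with no declared port, and declared ports the command never uses."""
    findings: List[str] = []
    used = set(_PORT_REF.findall(component["command"]))  # {(kind, name), ...}
    for kind in ("inputs", "outputs"):
        declared = set(component.get(kind, {}))
        referenced = {name for k, name in used if k == kind}
        for name in sorted(referenced - declared):
            findings.append(f"command uses undeclared {kind}.{name}")
        for name in sorted(declared - referenced):
            findings.append(f"{kind}.{name} is declared but never used")
    return findings


print(check_interface(training_component))  # → []
```

For example, declaring the input as training_data while the command still uses ${{inputs.the_data}} would yield two findings: the placeholder is undeclared, and training_data is never used.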

This component can be registered in the AML workspace with the CLI command below, which registers it under the name given in its name field, my-training-component.

az ml component create -f mytraincomponent.yml

Registered components can be referenced in a pipeline as azureml:<component-name>. Alternatively, a pipeline can point directly to a component's YAML definition with file:<path>. Now let us change our pipeline to use components instead of inline jobs:

$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline

jobs:
  prepare:
    type: command
    component: file:./prep.yml # use a YAML definition for the component
    compute: azureml:cpu-cluster
    inputs:
      raw_data:
        type: uri_folder
        path: azureml://datastores/workspaceblobstore/paths/data/my_raw_data/
    outputs:
      processed_data:
        mode: upload

  train:
    type: command
    component: azureml:my-training-component # use the registered component
    compute: azureml:cpu-cluster
    inputs: 
      the_data: ${{jobs.prepare.outputs.processed_data}}
    outputs:
      the_model: 
        mode: upload

This pipeline now uses two components: one defined in a local YAML file and one registered in the workspace. This enables reuse and keeps the individual steps modular and separate from the orchestration.
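
The two kinds of component reference can be distinguished by their prefix: file: points to a YAML definition on disk, while azureml: names a component registered in the workspace. The sketch below classifies each job's component reference. It assumes file: paths are relative to the folder containing the pipeline YAML, and component_source is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Dict, Tuple

# The jobs of the component-based pipeline above, transcribed by hand (inputs/outputs omitted).
pipeline_jobs: Dict[str, Dict[str, str]] = {
    "prepare": {"type": "command", "component": "file:./prep.yml",
                "compute": "azureml:cpu-cluster"},
    "train": {"type": "command", "component": "azureml:my-training-component",
              "compute": "azureml:cpu-cluster"},
}


def component_source(ref: str, base_dir: str = ".") -> Tuple[str, str]:
    """Classify a component reference as a local YAML file or a registered component."""
    if ref.startswith("file:"):
        # Assumption for this sketch: file: paths are relative to the pipeline YAML's folder.
        return ("local-yaml", str(PurePosixPath(base_dir) / ref[len("file:"):]))
    if ref.startswith("azureml:"):
        return ("registered", ref[len("azureml:"):])
    raise ValueError(f"unrecognized component reference: {ref!r}")


for job, spec in pipeline_jobs.items():
    print(job, component_source(spec["component"]))
```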

How to get started?

To get started, first install the CLI v2, and follow along with our docs and samples:

Coming soon

Some features of the new CLI v2 and REST API are still in progress. The following are planned for a follow-up release in the coming months:

  • Support for automated ML (AutoML)
  • Ability to run a sweep job within a pipeline
  • Support for Parallel Jobs (aka Parallel Run Step)
  • Support for Tabular Datasets
  • Ability to schedule pipelines

Why should I use the new CLI?

The new CLI v2 and REST API provide some key benefits:

Easily move from local to cloud

Users can easily move from local workloads to cloud (remote) workloads. They start with their workload in a container for local execution. Data can either be uploaded or connected to the job from the cloud. Execution can then be moved to cloud compute (initially AML Compute and Compute Instance, later also AMLK8S). The type of workload does not matter: the user controls what goes into the container image, whether by using curated environments, by building their own container with AzureML Environments (including a Docker context), or by other means.

Directly move from training to orchestration

Once the workload has been containerized and moved to the cloud, it can be orchestrated in a pipeline without adaptation. To provide scalability (for larger workflows) and reusability (when sharing parts of a pipeline), users can define shareable components that capture one or more steps of a pipeline. Components can be shared as source code (for example, via a Git repository) or as Python packages.

Directly move from training to deployment

Once a model has been trained (locally or in the cloud) and saved in MLflow format, the user can take it to deployment without writing additional code. Thanks to managed inferencing, the user does not have to manage a cluster and can deploy the model directly through AzureML control-plane operations. The same model, without modification, can also be scored in batch.

In addition, users can bring their own container for deployment, giving them full control over the serving technology and making it much simpler to integrate other languages (such as R) and other serving stacks (such as Triton).

Standards-based data plane operations

Metrics, parameters, artifacts, and models are logged in a standards-based way through MLflow. This means users can run their workloads locally, log metrics and other information, and visualize them in the local MLflow UI. When the same code runs in the cloud, the logged metrics and artifacts land in AzureML Run History. For the model format, AML supports MLflow, which lets users save a model together with its scoring code and environment so that they can test it locally before deploying it.

Integrate with AzureML

Users have a much larger set of options for integrating with AzureML. The simplest option will often be the CLI (for instance, when starting a job or pipeline from a GitHub workflow or an ADF pipeline), since the CLI now supports all operations offered by the platform and CLI authentication is well supported across the board. For deeper integrations, users can choose ARM templates or REST. Another improvement is that all resources and assets in AzureML now have a well-defined serialization to JSON/YAML. This allows new sharing and migration scenarios, as well as GitOps-style operations, where the state of the system is managed in git and deployed to AzureML at certain sync points.

Last update: Apr 04 2022 03:38 PM