Teaching Machine Learning Systems Efficiently with Active Learning
Published Mar 30 2022 09:16 AM
Microsoft

Summary

Supervised machine learning models commonly require a large amount of training data to produce good results. However, the process of labeling enough data to create this large dataset can be both time-consuming and expensive. Often, this becomes a barrier to machine learning development and adoption.

 

Active Learning is a methodology to prioritize the data needing to be labeled to have the highest impact on model performance, as well as a mechanism to run a continuous training to deployment process. When utilizing this method, ML development starts with a smaller labeled dataset, resulting in lower performance, but can return a higher performing model with less time and labeling effort when compared with making a hefty investment upfront in labeling a larger dataset for initial training.  

 

We created a reference implementation for Active Learning in Azure to demonstrate how this works in practice. The reference implementation comes with templates and libraries to facilitate executing a new Active Learning project in Azure.

 

To show the benefits of Active Learning, a test was conducted using the MIT Indoor Scenes Dataset, which contains 15,620 images across 67 indoor categories. By simulating the Active Learning process, the results demonstrated that Active Learning achieved a 7-14% accuracy increase over a few training rounds.

 

Functional Design

 

[Figure: Active Learning functional design]

 

The Active Learning flow has two main blocks: Model Training and Model in Production. Similar to traditional MLOps practices, these two components are connected, and changes are automatically deployed from Model Training to Model in Production. There are several important features that make the Active Learning flow different from traditional MLOps:

  • Scoring results from the deployed model are collected and used as input for Model Training.
  • At the beginning of Model Training, a prioritization strategy is applied to select a small subset of the scoring results that the model is least certain about for human labeling. The strategy can be Least Confidence, Smallest Margin Uncertainty, or Entropy Sampling.
  • The selected data is fed into a labeling process for human labeling.
  • Labeled data is exported and fed into a training pipeline.
  • The training pipeline performs incremental training with the additional labeled examples, which directly address the most important mistakes made by the previous model during scoring.
  • The deployment pipeline deploys the improved model to production.
  • The loop continues.
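The three uncertainty strategies above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation's code; the function names are ours. Each strategy maps a batch of predicted class probabilities to an uncertainty score, and the most uncertain examples are selected for labeling.

```python
import numpy as np

def least_confidence(probs):
    # 1 minus the top predicted probability; higher = more uncertain
    return 1.0 - probs.max(axis=1)

def smallest_margin(probs):
    # gap between the top-2 probabilities; a small gap means the model
    # can barely separate two classes, so negate it for "higher = more uncertain"
    part = np.sort(probs, axis=1)
    return -(part[:, -1] - part[:, -2])

def entropy_sampling(probs):
    # Shannon entropy of the predictive distribution
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_for_labeling(probs, k, strategy=least_confidence):
    # return indices of the k examples the model is least certain about
    scores = strategy(np.asarray(probs, dtype=float))
    return np.argsort(scores)[::-1][:k]
```

For example, given rows `[0.9, 0.05, 0.05]` and `[0.4, 0.35, 0.25]`, all three strategies rank the second row as more uncertain and would send it for labeling first.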

Technical design

[Figure: Active Learning technical design on Azure]

 

 

The technical design implements the functional flow with Azure components.

  • The scoring service is implemented with a Managed Online Endpoint or Batch Endpoint.
  • We created a Model and Data Monitoring library to collect data from the scoring service and store it as Azure Data Explorer tables, leveraging ADX’s strength in scalable data analysis for data selection and analytic queries.
  • Model monitoring collects prediction data, including probabilities, predicted labels, and raw image data, and stores it as ADX tables and in Azure Storage.
  • An Azure ML Labeling project is set up with its input data folder, an Azure ML storage location, configured to refresh automatically for human labeling.
  • A data selection job runs periodically on the collected prediction data, applying the prioritization strategy to select a subset of the data. Selected examples are sent to the input location of the Azure ML Labeling project.
  • AML Labeling creates tasks for human labelers to work on the new data.
  • Human labelers release the labeled data as an AML dataset.
  • An Azure Function with a blob queue trigger calls a GitHub workflow to start the training and deployment pipeline.
  • The training step within the pipeline loads the current model checkpoint and performs incremental training with the newly labeled data, producing a new version of the model.
  • The deployment step deploys the new model version to production.
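The hand-off from the Azure Function to GitHub can be sketched with GitHub's `workflow_dispatch` REST endpoint. This is an assumption about how the call is wired, not the repository's exact code; the workflow file name and branch below are illustrative.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"

def build_dispatch_request(owner, repo, workflow_file, ref, token, inputs=None):
    # POST /repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches
    url = f"{GITHUB_API}/repos/{owner}/{repo}/actions/workflows/{workflow_file}/dispatches"
    body = json.dumps({"ref": ref, "inputs": inputs or {}}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/vnd.github+json")
    return req

def trigger_training(token):
    # called from the Azure Function's trigger entry point once a labeled
    # dataset export lands in storage (workflow file name is hypothetical)
    req = build_dispatch_request(
        "microsoft", "MLOpsTemplate",
        "active_learning_cv_training_deployment.yml", "main", token)
    urllib.request.urlopen(req)  # raises on a non-2xx response
```

The token would typically come from an Azure Key Vault reference in the Function's app settings rather than being hard-coded.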

Code modules

  • The core module
    • azure_function: This is the Azure Function that responds to the event that a labeled dataset is exported from Azure ML Labeling project. The Azure Function triggers a new training workflow which is implemented as a GitHub workflow (active_learning_cv_training_deployment).
    • data_engineering: This contains functions and classes that implement the four data prioritization strategies and prepare the dataset in a format that can be used by the training module.
    • pipelines: This contains the definitions of pipelines that stitch together components of a functional flow such as model training.
    • scoring: This contains the implementation of real time scoring and batch scoring using Azure ML’s Managed Online Endpoint and Batch Endpoint.
    • training: This contains definitions of model training procedures. An implementation of AutoML computer vision is provided, but this can be replaced with a custom training module.
    • monitoring: This contains a utility class to collect data in streaming and batch modes and provides a query service over the data tables in ADX.

Automation with GitHub Actions and Workflows

To automate a series of operations when changes occur or new input becomes available, the following GitHub workflows are implemented:

  • active_learning_cv_data_selection: This workflow runs on a schedule (e.g., daily). It evaluates the data collected by the Monitoring module from the scoring service, prioritizes data selection according to the configured strategy, and stages the data for labeling.
  • active_learning_cv_training_deployment: This workflow runs the model training job and deploys the result to the scoring service. It is triggered by the event that a dataset is exported from the Azure ML Labeling service.
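The scheduled workflow can be sketched as a short GitHub Actions definition. This is an illustrative fragment only; the step names and the module invoked in the final step are assumptions, not the repository's exact contents.

```yaml
# Sketch of active_learning_cv_data_selection (illustrative, not the repo's file)
name: active_learning_cv_data_selection
on:
  schedule:
    - cron: "0 2 * * *"   # run daily
  workflow_dispatch: {}    # allow manual runs
jobs:
  select-data:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run data selection against collected predictions
        run: python -m data_engineering.select_data --strategy least_confidence
```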

Simulation Module and Results

To support the evaluation of Active Learning, a simulation flow was run that replaces the human labeling step with an automated labeling module. Here, the labeled dataset of 15,620 images with 67 classes is used. The flow is as follows:

  • Initial training and deployment of a model is performed using a small dataset sampled from the full dataset.
  • A scoring simulation module samples data from the full dataset, excluding data that was used to train the deployed model.
  • Scored data is collected and prioritized using the standard modules. Instead of being sent to Azure ML Labeling, the output is joined directly with the fully labeled dataset to obtain the correct labels.
  • A new training dataset is created and sent for model training and deployment.

In simulation mode, the effectiveness of Active Learning with different settings can be evaluated; this, however, requires a large, labeled dataset. The diagrams and table below compare model performance over time in the Active Learning simulation through 30 rounds of incremental training, each adding 50 labeled images sampled from the 500 images used for scoring.
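One simulation round can be sketched as follows. This is a simplified illustration under our own naming, assuming a `score_fn` that returns class probabilities for a batch of example IDs; the real flow runs these steps as separate Azure jobs.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_round(pool_ids, labeled_ids, score_fn, n_score=500, n_label=50):
    """One round: score a fresh sample, pick the most uncertain, 'label' them."""
    # sample scoring data from the pool, excluding already-labeled examples
    candidates = np.setdiff1d(pool_ids, labeled_ids)
    scored = rng.choice(candidates, size=n_score, replace=False)
    # least-confidence prioritization over the model's class probabilities
    probs = score_fn(scored)
    uncertainty = 1.0 - probs.max(axis=1)
    chosen = scored[np.argsort(uncertainty)[::-1][:n_label]]
    # in simulation, labels come from a join with the full dataset, not humans
    return np.concatenate([labeled_ids, chosen])
```

Running this for 30 rounds with `n_score=500` and `n_label=50` mirrors the experiment described above, growing the training set by 50 prioritized examples per round.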

Results

[Figures: model performance over 30 rounds of incremental training, compared across selection strategies]

Observation

There is no obvious winner among the Least Confidence (LC), Smallest Margin Uncertainty (SMU), and Entropy Sampling (ES) strategies when comparing accuracy. Compared to Random Sampling, however, Least Confidence, for example, achieves a 5-10% accuracy increase in each incremental training round. Additionally, the more selective the process is, the more effective a deliberate strategy becomes relative to Random Sampling.

 

Check out our GitHub repository at MLOpsTemplate/src/active_learning_cv at james-simdev · microsoft/MLOpsTemplate (github.com)

Last update: Apr 04 2022 11:00 AM