Machine Learning DevOps (MLOps) with Azure ML

Lee Stott · ‎Jul 08 2019

The Azure CAT ML team has built the following GitHub repo, which contains the code and pipeline definitions for a machine learning project that demonstrates how to automate an end-to-end ML/AI workflow.

https://github.com/microsoft/MLOpsPython

The build pipelines include DevOps tasks for data sanity tests, unit tests, model training on different compute targets, model version management, model evaluation/model selection, model deployment as a real-time web service, staged deployment to QA/prod, and integration testing.

MLOps Reference Architecture

This reference architecture shows how to implement continuous integration (CI), continuous delivery (CD), and a retraining pipeline for an AI application using Azure DevOps and Azure Machine Learning.

The solution example is built on the scikit-learn diabetes dataset but can be easily adapted for any AI scenario and other popular build systems such as Jenkins and Travis.

Architecture Flow

Train Model

A data scientist writes or updates the code and pushes it to the git repo. This triggers the Azure DevOps build pipeline (continuous integration).
Once the Azure DevOps build pipeline is triggered, it runs the following types of tasks:
- Run for new code: Every time new code is committed to the repo, the build pipeline performs data sanity tests and unit tests on the new code.
- One-time run: These tasks run only the first time the build pipeline runs. They programmatically create an Azure ML Service Workspace, provision Azure ML Compute (used as the model training compute), and publish an Azure ML Pipeline. This published Azure ML pipeline is the model training/retraining pipeline.
Note: The Publish Azure ML pipeline task currently runs for every code change.
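As a rough illustration of the kind of data sanity test the build pipeline could run on each commit, here is a minimal sketch in plain Python. It is not taken from the repository: the function name check_rows, the row layout (a list of features plus a target), and the sample values are assumptions made for the example. The only fact it relies on is that the scikit-learn diabetes dataset has 10 feature columns.

```python
import math

# The scikit-learn diabetes dataset has 10 feature columns.
EXPECTED_FEATURES = 10


def check_rows(rows, expected_features=EXPECTED_FEATURES):
    """Return a list of problems found in rows of (features, target).

    Hypothetical helper for illustration; an empty list means the data
    passed every check.
    """
    problems = []
    if not rows:
        problems.append("dataset is empty")
    for i, (features, target) in enumerate(rows):
        if len(features) != expected_features:
            problems.append(
                f"row {i}: expected {expected_features} features, got {len(features)}"
            )
        values = list(features) + [target]
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            problems.append(f"row {i}: missing value")
    return problems


def test_data_is_sane():
    # Made-up sample rows; a real test would load the training data instead.
    sample = [([0.1] * 10, 151.0), ([0.2] * 10, 75.0)]
    assert check_rows(sample) == []
```

A test runner such as pytest would pick up test_data_is_sane automatically, so a failing check would fail the build before any training compute is used.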
The Azure ML retraining pipeline is triggered once the Azure DevOps build pipeline completes. All the tasks in this pipeline run on the Azure ML Compute created earlier. The tasks in this pipeline are:
- Train Model task executes the model training script on Azure ML Compute. It outputs a model file, which is stored in the run history.
- Evaluate Model task compares the performance of the newly trained model with that of the model in production. If the new model performs better than the production model, the following steps are executed. If not, they are skipped.
- Register Model task takes the improved model and registers it with the Azure ML Model registry. This allows us to version control it.
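The train, evaluate, and register flow above can be sketched as a small gate in plain Python. This is an illustrative sketch, not the repository's implementation: the function names are hypothetical, and it assumes the comparison metric is mean squared error (lower is better) and that the first model is always registered when no production model exists yet.

```python
def should_register(new_mse, production_mse=None):
    """Decide whether the newly trained model should be registered.

    Assumption: lower mean squared error is better. When there is no
    production model yet (production_mse is None), the new model wins.
    """
    if production_mse is None:
        return True
    return new_mse < production_mse


def run_pipeline(train, evaluate, register, production_mse=None):
    """Run train -> evaluate -> (conditionally) register.

    train, evaluate, and register are caller-supplied callables standing
    in for the three pipeline tasks.
    """
    model = train()
    new_mse = evaluate(model)
    if should_register(new_mse, production_mse):
        return register(model, new_mse)
    return None  # register step skipped: the production model is at least as good
```

Keeping the decision in one small function makes the promotion rule easy to unit test separately from the training code.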

Deploying the MLOps Model

Once you have registered your ML model, you can use Azure ML + Azure DevOps to deploy it.

The Package Model task packages the new model, along with the scoring file and its Python dependencies, into a Docker image and pushes it to Azure Container Registry. This image is used to deploy the model as a web service.

The Deploy Model task handles deploying your Azure ML model to the cloud (ACI or AKS). This pipeline deploys the model scoring image into Staging/QA and PROD environments.

In the Staging/QA environment, one task creates an Azure Container Instance and deploys the scoring image as a web service on it.

The second task invokes the web service by calling its REST endpoint with dummy data.
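A call like the one the second task makes might look roughly like the following sketch, which uses only the Python standard library. The JSON body shape ({"data": rows}), the optional Bearer key header, and the function names are assumptions for illustration; the actual input format depends on the scoring file deployed with the model, and the scoring URI comes from the deployed service.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows, api_key=None):
    """Build a POST request carrying dummy rows as JSON; it is not sent here.

    Assumption: the scoring script expects a body of the form {"data": rows}.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the service uses key-based auth with a Bearer header.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def smoke_test(scoring_uri, rows, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Dummy input: two rows of ten features, matching the diabetes dataset's shape.
dummy_rows = [[float(i) for i in range(1, 11)], [float(i) for i in range(10, 0, -1)]]
```

Calling smoke_test with the service's scoring URI and dummy_rows would then return whatever the scoring script produces. A non-2xx response raises urllib.error.HTTPError, which is enough to fail the pipeline task.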

The deployment to production is a gated release: once the model web service is deployed successfully in the Staging/QA environment, a notification is sent to approvers, who manually review and approve the release. Once the release is approved, the model scoring web service is deployed to Azure Kubernetes Service (AKS) and the deployment is tested.

Repo Details

You can find the details of the code and scripts in the repository: https://github.com/microsoft/MLOpsPython

Nick Barker · ‎Jul 08 2019

Does anyone know if there is a resource out there that talks about or demonstrates how to use Azure ML to analyze consumer drone video? I am specifically interested in setting something up so that I can use machine learning to do image analysis of microwave communication tower structure integrity. We currently have a large microwave communication network and once a year have each tower manually inspected (a lot of these towers are over 300 ft. tall). I am interested in utilizing an analytics pipeline of some sort to facilitate more frequent inspections of towers and equipment. Any advice, feedback, or interest in the project is welcomed and appreciated.

Lee Stott · ‎Jul 10 2019

Nick

Have a look at https://azure.microsoft.com/en-us/services/media-services/media-analytics/. This service offers a lot of features out of the box.

For a real-world example of a project, see the Microsoft Machine Learning Channel, which has the following example: Video event detection with Deep Learning and a Python pipeline processing (https://www.youtube.com/watch?v=z7NN0JeFNyY). The code is at https://github.com/vJenny/race-events-recognition

rdurham · ‎Sep 29 2019

I am getting an error when I try step 6 of the workshop (6. Set up Build Pipeline).

aml_service/util/workspace.py:29: SystemExit

----------------------------- Captured stdout call -----------------------------

Error while retrieving Workspace...

Get Token request returned http error: 400 and server response: {"error":"invalid_request","error_description":"AADSTS90002: Tenant 'xxxxxxxxxxxxxxxxx' not found. This may happen if there are no active subscriptions for the tenant.

I have triple checked and this tenant has my subscription in it.

rdurham · ‎Sep 29 2019

Also I am trying to use the video and I am receiving the following error:

null
If the owner of this video has granted you access, please sign in.

Lee Stott · ‎Oct 18 2019

Hi RDurham, sorry, the video is no longer available. Please see https://github.com/microsoft/MLOpsPython

For details on Azure DevOps Pipelines, I would suggest you look at the content on Microsoft Learn: https://docs.microsoft.com/en-us/learn/browse/?term=DevOps
