Guest post by Lucas Liu, a Master’s student in Electrical & Computer Engineering at Duke University specializing in Machine Learning & Federated Learning.
Throughout my time at university, I have built any number of scikit-learn, TensorFlow, and PyTorch machine learning models. I have developed and trained deep neural networks for applications ranging from cheetah footprint image classification to content similarity matching for bug tracking.
Many of these research projects have ended up as Python scripts that exist only locally on my PC. If a colleague or client wanted to use the product, they would face a complicated series of file downloads, installation steps, dependency management, and more: a painful process even for researchers who are intimately familiar with the technologies involved.
How can we eliminate this barrier between our research and our users? Microsoft Azure can help us transform machine learning research into a refined & easy-to-use product.
Containerize
First, we containerize our research with Docker. We take our ML model and package it into a simple Flask app that serves predictions for POST requests with JSON payloads sent to `/predict`.
Here is an example Flask setup which predicts the probability of stroke with a scikit-learn ML model:
from flask import Flask, request
from flask.logging import create_logger
import logging

import pandas as pd
import joblib

app = Flask(__name__)
LOG = create_logger(app)
LOG.setLevel(logging.INFO)


@app.route("/")
def home():
    html = "<h3>Stroke Prediction Home</h3>"
    return html


@app.route("/predict", methods=['POST'])
def predict():
    """Performs an sklearn prediction for stroke likelihood"""
    json_payload = request.json
    LOG.info(f"JSON payload: {json_payload}")
    inference_payload = pd.DataFrame(json_payload)
    LOG.info(f"inference payload DataFrame: {inference_payload}")
    # predict_proba returns class probabilities per sample; [0][0] selects the
    # first class's probability for the first sample (assumed to be the stroke class here)
    prediction = clf.predict_proba(inference_payload)[0][0]
    statement = f'Probability of patient stroke is {prediction: .4f}'
    return statement


if __name__ == "__main__":
    # Load the trained scikit-learn model once at startup
    clf = joblib.load("stroke_prediction.joblib")
    app.run(host='0.0.0.0', port=8080, debug=True)
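With the app running locally (python app.py), we can sanity-check the /predict route. Here is a minimal sketch using the requests library; the feature names and values below are hypothetical placeholders and must match whatever columns your scikit-learn model was trained on:

import requests

# Hypothetical feature columns; replace with the features your model expects
payload = {
    "age": [67],
    "hypertension": [1],
    "heart_disease": [0],
    "avg_glucose_level": [228.69],
    "bmi": [36.6],
}

response = requests.post("http://localhost:8080/predict", json=payload)
print(response.text)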
We then specify a Dockerfile configuration, which will handle requirements installation and run our Flask app. We will expose port 8080 (the port our Flask app listens on) and use the Python slim base image, which is more lightweight.
A simple Docker configuration might look like this:
FROM python:3.8-slim

# Working Directory
WORKDIR /app

# Copy source code to working directory
COPY . /app/

# Install packages from requirements.txt
RUN pip install --no-cache-dir --upgrade pip &&\
    pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt

# Expose port 8080, the port our Flask app listens on
EXPOSE 8080

# Run app.py at container launch
CMD ["python", "app.py"]
Next, we will upload our container image to the Azure Container Registry (ACR) using the Azure CLI. Let’s call our project ‘mlproject’.
- Create a resource group named ‘mlproject’ in the Azure Portal.
- Create an ACR repository with the command:
az acr create --resource-group mlproject --name mlproject --sku Basic --admin-enabled true
- Build the container image in ACR (let’s call our container image ‘stroke-predict’):
az acr build --registry mlproject --image stroke-predict .
Now, users can access our ML model much more easily by simply pulling and running our image, without having to worry about dependencies or model-running details. This helps us avoid the age-old “But it runs on my machine!” problem.
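For example, a colleague with access to the registry can pull and run the image directly (the latest tag is assumed here, which az acr build applies by default when no tag is specified):

az acr login --name mlproject
docker pull mlproject.azurecr.io/stroke-predict:latest
docker run -p 8080:8080 mlproject.azurecr.io/stroke-predict:latest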
Deploy & Operationalize
What if, instead of building a container image, users could simply hit a URL and perform inferences (no image setup required)? Let’s use Azure Kubernetes Service to serve our container at a ready-to-go endpoint:
- First, we can use an Azure Pipeline Template to help us define a k8s deployment and load balancer YAML.
Example deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stroke-predict
spec:
  selector:
    matchLabels:
      app: stroke-predict
  replicas: 3
  template:
    metadata:
      labels:
        app: stroke-predict
    spec:
      containers:
      - name: stroke-predict
        image: mlproject.azurecr.io/stroke-predict
        imagePullPolicy: Always
        readinessProbe:
          httpGet:
            port: 8080
            path: /
        livenessProbe:
          httpGet:
            port: 8080
            path: /
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
Example loadbalancer YAML:
apiVersion: v1
kind: Service
metadata:
  name: stroke-predict-loadbalancer
spec:
  type: LoadBalancer
  selector:
    app: stroke-predict
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
- Now, we create an AKS cluster. This example cluster uses a standard load balancer and can autoscale between 1 and 5 nodes.
az aks create --resource-group mlproject --name mlproject \
--generate-ssh-keys \
--node-count 3 \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
- Next, merge the AKS cluster credentials into your local kubectl configuration
az aks get-credentials --resource-group mlproject --name mlproject
- Attach our container registry to the cluster
az aks update --resource-group mlproject --name mlproject \
    --attach-acr mlproject
- Deploy Application on Cluster
kubectl apply -f k8s/deployment.yaml
- Apply Load Balancer
kubectl apply -f k8s/loadbalancer.yaml
- Find IP for Endpoint
kubectl get services
Now our users can simply query our endpoint and receive predictions!
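For example, once kubectl get services reports an EXTERNAL-IP for the load balancer, a prediction is one POST request away (reusing the hypothetical feature names from earlier; substitute your own):

curl -X POST http://<EXTERNAL-IP>/predict \
    -H "Content-Type: application/json" \
    -d '{"age": [67], "hypertension": [1], "heart_disease": [0], "avg_glucose_level": [228.69], "bmi": [36.6]}'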
The power of Azure goes beyond just initial deployment. We can adopt Continuous Deployment practices with GitHub Actions for Azure to automatically trigger a new build each time a new version of the model is released, ensuring that the service endpoint always serves the most up-to-date model.
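A minimal sketch of such a workflow follows. It assumes a service principal stored in a repository secret named AZURE_CREDENTIALS; the workflow name, branch, and trigger are illustrative rather than prescriptive:

name: Build and push to ACR

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Authenticate to Azure using the service principal stored as a repo secret
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Rebuild the container image in ACR on every push to main
      - name: Build image in ACR
        run: az acr build --registry mlproject --image stroke-predict .

A follow-up step (not shown) could restart the Kubernetes deployment so the pods pull the freshly built image.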
Additionally, Azure’s autoscale features allow our service to scale up or down automatically to meet real usage needs, activating additional resources during times of heavy usage. We can even expand our AKS cluster to reach users around the world. The Azure portal also lets us monitor our AKS cluster metrics and gain insights into the health and performance of the service.
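The cluster autoscaler configured earlier adds and removes nodes; to also scale the number of pods with load, a Horizontal Pod Autoscaler can be attached to the deployment (a minimal sketch, assuming CPU-based scaling):

kubectl autoscale deployment stroke-predict --cpu-percent=70 --min=3 --max=10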
Next Steps: MLOps
In this blog post, we discussed how to transform your existing ML research models into a much more refined product with Azure Container Registry and Azure Kubernetes Service, making it easy for users to access the fruits of your research.
However, if we start building with Azure from the very beginning, Azure’s MLOps offering provides an end-to-end solution for the ML life cycle, from initially training and building the model to continually retraining and redeploying an up-to-date service. Azure MLOps can even help us compare model performance and automatically detect data drift. This is just a small portion of what Azure MLOps can do.
Learn more about Azure MLOps here.