Guest post by Lucas Liu, a Master’s student in Electrical & Computer Engineering at Duke University specializing in Machine Learning & Federated Learning.
Throughout my time at university, I have built any number of scikit-learn, TensorFlow, and PyTorch machine learning models. I have developed and trained deep neural networks for applications ranging from cheetah footprint image classification to content similarity matching for bug tracking.
Many of these research projects have ended up as Python scripts that exist only locally on my PC. If a colleague or client wanted to use the product, they would face a complicated series of file downloads, installation steps, dependency management, and more: a painful process even for researchers who are intimately familiar with the technologies involved.
How can we eliminate this barrier between our research and our users? Microsoft Azure can help us transform machine learning research into a refined & easy-to-use product.
Containerize
First, we containerize our research with Docker. We take our ML model and package it into a simple Flask app that serves predictions for POST requests with JSON payloads sent to `/predict`.
Here is an example Flask setup which predicts the probability of stroke with a scikit-learn ML model:
from flask import Flask, request
from flask.logging import create_logger
import logging

import pandas as pd
import joblib

app = Flask(__name__)
LOG = create_logger(app)
LOG.setLevel(logging.INFO)


@app.route("/")
def home():
    html = "<h3>Stroke Prediction Home</h3>"
    return html


@app.route("/predict", methods=['POST'])
def predict():
    """Performs an sklearn prediction for stroke likelihood"""
    json_payload = request.json
    LOG.info(f"JSON payload: {json_payload}")
    inference_payload = pd.DataFrame(json_payload)
    LOG.info(f"inference payload DataFrame: {inference_payload}")
    # predict_proba returns class probabilities per sample; [0][0] selects the
    # first class's probability for the first sample (assumed to be the stroke class here)
    prediction = clf.predict_proba(inference_payload)[0][0]
    statement = f'Probability of patient stroke is {prediction: .4f}'
    return statement


if __name__ == "__main__":
    # Load the trained scikit-learn model once at startup
    clf = joblib.load("stroke_prediction.joblib")
    app.run(host='0.0.0.0', port=8080, debug=True)
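With the app running locally (python app.py), we can sanity-check the /predict route. Here is a minimal sketch using the requests library; the feature names and values below are hypothetical placeholders and must match whatever columns your scikit-learn model was trained on:

import requests

# Hypothetical feature columns; replace with the features your model expects
payload = {
    "age": [67],
    "hypertension": [1],
    "heart_disease": [0],
    "avg_glucose_level": [228.69],
    "bmi": [36.6],
}

response = requests.post("http://localhost:8080/predict", json=payload)
print(response.text)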
We then specify a Dockerfile configuration, which will handle requirements installation and run our Flask app. We will expose port 8080 (the port our Flask app listens on) and use the Python slim base image, which is more lightweight.
A simple Docker configuration might look like this:
FROM python:3.8-slim

# Working Directory
WORKDIR /app

# Copy source code to working directory
COPY . /app/

# Install packages from requirements.txt
RUN pip install --no-cache-dir --upgrade pip &&\
    pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt

# Expose port 8080, the port our Flask app listens on
EXPOSE 8080

# Run app.py at container launch
CMD ["python", "app.py"]
Next, we will upload our container image to the Azure Container Registry (ACR) using the Azure CLI. Let’s call our project ‘mlproject’.
- Create a resource group named ‘mlproject’ in the Azure Portal.
- Create an ACR repository with the command:
az acr create --resource-group mlproject --name mlproject --sku Basic --admin-enabled true
- Build the container image in ACR (let’s call our container image ‘stroke-predict’):
az acr build --registry mlproject --image stroke-predict .
Now, users can access our ML model much more easily by simply pulling and running our image, without having to worry about dependencies or model-running details. This helps us avoid the age-old “But it runs on my machine!” problem.
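For example, a colleague with access to the registry can pull and run the image directly (the latest tag is assumed here, which az acr build applies by default when no tag is specified):

az acr login --name mlproject
docker pull mlproject.azurecr.io/stroke-predict:latest
docker run -p 8080:8080 mlproject.azurecr.io/stroke-predict:latest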
Deploy & Operationalize
What if, instead of building a container image, users could simply hit a URL and perform inferences (no image setup required)? Let’s use Azure Kubernetes Service to serve our container at a ready-to-go endpoint:
- First, we can use an Azure Pipeline Template to help us define a k8s deployment and load balancer YAML.
Example deployment YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stroke-predict
spec:
  selector:
    matchLabels:
      app: stroke-predict
  replicas: 3
  template:
    metadata:
      labels:
        app: stroke-predict
    spec:
      containers:
      - name: stroke-predict
        image: mlproject.azurecr.io/stroke-predict
        imagePullPolicy: Always
        readinessProbe:
          httpGet:
            port: 8080
            path: /
        livenessProbe:
          httpGet:
            port: 8080
            path: /
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
Example loadbalancer YAML:
apiVersion: v1
kind: Service
metadata:
  name: stroke-predict-loadbalancer
spec:
  type: LoadBalancer
  selector:
    app: stroke-predict
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
- Now, we create an AKS cluster. This example cluster uses a standard load balancer and can autoscale between 1 and 5 nodes.
az aks create --resource-group mlproject --name mlproject \
--generate-ssh-keys \
--node-count 3 \
--vm-set-type VirtualMachineScaleSets \
--load-balancer-sku standard \
--enable-cluster-autoscaler \
--min-count 1 \
--max-count 5
- Next, merge the AKS cluster credentials into your local kubectl configuration
az aks get-credentials --resource-group mlproject --name mlproject
- Attach our container registry to the cluster
az aks update --resource-group mlproject --name mlproject \
    --attach-acr mlproject
- Deploy Application on Cluster
kubectl apply -f k8s/deployment.yaml
- Apply Load Balancer
kubectl apply -f k8s/loadbalancer.yaml
- Find IP for Endpoint
kubectl get services
Now our users can simply query our endpoint and receive predictions!
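For example, once kubectl get services reports an EXTERNAL-IP for the load balancer, a prediction is one POST request away (reusing the hypothetical feature names from earlier; substitute your own):

curl -X POST http://<EXTERNAL-IP>/predict \
    -H "Content-Type: application/json" \
    -d '{"age": [67], "hypertension": [1], "heart_disease": [0], "avg_glucose_level": [228.69], "bmi": [36.6]}'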
The power of Azure goes beyond just initial deployment. We can adopt Continuous Deployment practices with GitHub Actions for Azure to automatically trigger a new build each time a new version of the model is released, ensuring that the service endpoint always serves the most up-to-date model.
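A minimal sketch of such a workflow follows. It assumes a service principal stored in a repository secret named AZURE_CREDENTIALS; the workflow name, branch, and trigger are illustrative rather than prescriptive:

name: Build and push to ACR

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Authenticate to Azure using the service principal stored as a repo secret
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Rebuild the container image in ACR on every push to main
      - name: Build image in ACR
        run: az acr build --registry mlproject --image stroke-predict .

A follow-up step (not shown) could restart the Kubernetes deployment so the pods pull the freshly built image.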
Additionally, Azure’s autoscale features allow our service to scale up or down automatically to meet real usage needs, activating additional resources during times of heavy usage. We can even expand our AKS cluster to reach users around the world. The Azure portal also lets us monitor our AKS cluster metrics and gain insights into the health and performance of the service.
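The cluster autoscaler configured earlier adds and removes nodes; to also scale the number of pods with load, a Horizontal Pod Autoscaler can be attached to the deployment (a minimal sketch, assuming CPU-based scaling):

kubectl autoscale deployment stroke-predict --cpu-percent=70 --min=3 --max=10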
Next Steps: MLOps
In this blog post, we discussed how to transform your existing ML research models into a much more refined product with Azure Container Registry and Azure Kubernetes Service, making it easy for users to access the fruits of your research.
However, if we start building with Azure from the very beginning, Azure’s MLOps offering provides an end-to-end solution for the ML life cycle, from initially training and building the model to continually retraining and redeploying an up-to-date service. Azure MLOps can even help us compare model performance and automatically detect data drift. This is just a small portion of what Azure MLOps can do.
Learn more about Azure MLOps here.