Ensuring Edge ML Models Provide Value by Observing Data Drift
Published Feb 12 2024

In this blog post, guest blogger Martin Bald, Senior Manager of Developer Community at one of our startup partners, Wallaroo.AI, walks through model observability by checking for data drift in our in-store models. The conditions that existed when a model was created, trained, and tested can change over time due to various factors. Retail consumer behavior can shift because of supply chain shortages or other price-affecting events, changing spending patterns. A new highway, or a highway closure, can affect traffic patterns. Societal adjustments to the COVID-19 pandemic changed a lot of things.

Introduction

In the previous blog posts in this series we deployed our model to production on an edge device at our retail store location for product monitoring, then ran validation checks against challenger models to discover which performed best. We are not quite ready to sit back and put our feet up yet: to keep our models operational, we must continue to monitor the model's behavior and performance to ensure it keeps providing value to the business.

What is Data Drift?

In machine learning, you use data and known answers to train a model to make predictions for new, previously unseen data. You do this with the assumption that the future unseen data will be similar to the data used during training: the future will look somewhat like the past.

This isn't completely true, of course, and a good model should be robust to some amount of change in the environment; however, if the environment changes too much, your models may no longer be making the correct decisions. This situation is known as concept drift; too much drift can render your models obsolete, requiring periodic retraining.

Concept drift generally manifests as data drift: the distribution of values that you observe in your input data (and in the model predictions) will change over time from the distributions of those values at the time you trained the model. Too much data drift, if it persists, can be a sign that it's time to retrain your model.
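Wallaroo assays, shown later in this post, handle this comparison for us; but to make the idea concrete, here is a minimal, framework-agnostic sketch that scores drift between a baseline sample and current observations using a simple population stability index. The distributions below are made up purely for illustration.

import numpy as np

def population_stability_index(baseline, current, bins=10):
    # Bin the baseline, then measure how far the current sample's
    # bin proportions have shifted from the baseline's proportions.
    edges = np.histogram_bin_edges(baseline, bins=bins)
    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(current, bins=edges)[0] / len(current)
    expected = np.clip(expected, 1e-6, None)
    actual = np.clip(actual, 1e-6, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))

# Illustrative only: confidence was high at training time but has degraded
# in the field. A larger score indicates more drift.
rng = np.random.default_rng(0)
baseline_confidence = rng.beta(8, 2, size=1000)
current_confidence = rng.beta(4, 4, size=1000)
print(population_stability_index(baseline_confidence, current_confidence))

In practice this comparison needs to run automatically and on a schedule, which is exactly what the assays configured later in this post do.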

The challenges that come with model monitoring and observability for edge deployments can manifest in a number of ways: remote devices with limited processing resources, low or intermittent connectivity, latency constraints, and limited or intermittent power, to name a few. A comprehensive edge AI solution needs to run inference and process output at the edge while giving data scientists the ability to centrally monitor models in production and take prompt, timely action, without incurring the high cost and latency of sending large amounts of data to and from edge locations.

Establishing good model monitoring and observability habits in your MLOps lifecycle is crucial as data drift monitoring ensures model robustness, business continuity, and alignment with changing data dynamics.

Monitoring for Data Drift

For the rest of this blog post we will see what model observability for edge devices looks like in action.

The first steps are to load the necessary Python libraries and connect to our production instance.

import wallaroo
from wallaroo.object import EntityNotFoundError
from wallaroo.framework import Framework
from IPython.display import display
from IPython.display import Image
import pandas as pd
import json
import datetime
import time
import cv2
import matplotlib.pyplot as plt
import string
import random
import pyarrow as pa
import sys
import asyncio
import utils
pd.set_option('display.max_colwidth', None)

Next we will authenticate and connect to our Wallaroo cluster and workspace.

wl = wallaroo.Client()
workspace = utils.getWorkspace(wl, "cv-retail-edge")
_ = wl.set_current_workspace(workspace)

For this CV retail example, we want to track the average confidence of object predictions and be alerted if we see a drop in confidence. To accomplish this, we add to our pipeline a simple Python step that computes the average confidence. First we define the input and output schemas for the packaged model:


input_schema = pa.schema([
    pa.field('tensor', pa.list_(
        pa.list_(
            pa.list_(
                pa.float32(), # images are normalized
                list_size=640
            ),
            list_size=480
        ),
        list_size=3
    )),
])

output_schema = pa.schema([
    pa.field('boxes', pa.list_(pa.list_(pa.float32(), list_size=4))),
    pa.field('classes', pa.list_(pa.int64())),
    pa.field('confidences', pa.list_(pa.float32())),
    pa.field('avg_px_intensity', pa.list_(pa.float32())),
    pa.field('avg_confidence', pa.list_(pa.float32())),
])
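The packaged model we upload below, model-with-pixel-intensity.zip, bundles this post-processing with the detector, so we only need to declare its schemas here. As a rough, hypothetical sketch of what that step might look like (the function name and structure are ours, not the actual contents of the package), it computes the two monitoring fields declared in output_schema:

import numpy as np

# Hypothetical sketch only: append the two monitoring fields declared above.
def add_monitoring_fields(image_tensor, detections: dict) -> dict:
    confidences = np.asarray(detections["confidences"], dtype=np.float32)

    # Average confidence across detected objects (0.0 when nothing is detected)
    avg_confidence = float(confidences.mean()) if confidences.size else 0.0

    # Average pixel value of the normalized input image
    avg_px_intensity = float(np.asarray(image_tensor, dtype=np.float32).mean())

    detections["avg_confidence"] = [avg_confidence]
    detections["avg_px_intensity"] = [avg_px_intensity]
    return detections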

Next we upload the packaged model to Wallaroo as a custom-framework model:

model_name = "resnet-with-intensity"
model = wl.upload_model(model_name, "models/model-with-pixel-intensity.zip", framework=Framework.CUSTOM, \
                        input_schema=input_schema, output_schema=output_schema)

This will take a few minutes and give us the following output.

Waiting for model loading - this will take up to 10 mins.
Model is pending loading to a container runtime..
Model is attempting loading to a container runtime..............successful

Ready

Next we will set up the hardware environment, as we have seen in the previous blog posts.

# One replica with 1 CPU and 2Gi of memory for the inference engine, plus
# 1 CPU and 6Gi for the model's container (the sidekick).
deployment_config = wallaroo.DeploymentConfigBuilder() \
    .replica_count(1) \
    .cpus(1) \
    .memory("2Gi") \
    .sidekick_cpus(model, 1) \
    .sidekick_memory(model, '6Gi') \
    .build()

Then we will deploy the model.

pipeline_name = 'retail-inv-tracker-edge-obs'
pipeline = wl.build_pipeline(pipeline_name) \
            .add_model_step(model) \
            .deploy(deployment_config = deployment_config)

This will take a few seconds and give us the following output telling us the deployment was successful.

Waiting for deployment - this will take up to 45s ..................... ok
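The assay we configure below can only analyze windows that contain inference results, so inferences need to be flowing through the deployed pipeline. As a purely illustrative example (the file name is a hypothetical stand-in for a preprocessed frame from the store cameras), a single inference could be sent like this:

# Hypothetical example: run one preprocessed frame through the pipeline so the
# assay windows below have inference results to analyze.
results = pipeline.infer_from_file("data/frame_0001.df.json")
display(results)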

We assume we have data that describes the expected behavior of the model, captured in a baseline.csv file. We will upload this data to create a baseline distribution, then compare the model's behavior in production against that baseline.
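We don't show baseline.csv here. Under the assumption that it is simply a single column of avg_confidence values recorded while the model was known to be healthy (the exact format your instance expects may differ), it could be assembled like this:

import numpy as np
import pandas as pd

# Hypothetical sketch: build baseline.csv from historical, known-good results.
# The values below are stand-ins for real observed average confidences.
rng = np.random.default_rng(1)
baseline_df = pd.DataFrame({"avg_confidence": rng.beta(8, 2, size=500)})
baseline_df.to_csv("baseline.csv", index=False)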

# Build an assay on the model step's avg_confidence output, using baseline.csv as the baseline.
pipeline = wl.list_pipelines()[0]
assay_name = "average confidence drift detection v2"
step_name = "resnet-with-intensity"
assay_builder = wl.build_assay(assay_name, pipeline, step_name, iopath="output avg_confidence 0", baseline_data="baseline.csv")

We will click on the output URL and select Yes to authenticate to the instance.

Please log into the following URL in a web browser:
	https://keycloak.demo3.pov.wallaroo.io/auth/realms/master/device?user_code=UDZF-HTWP

Login successful!

Monitor for Drift and See Alerts

One good example of drift detection at the edge is to monitor cameras for focus. Here we show how to set up an assay that will try to do exactly that.

We can assess our model for two variables that we think will help us detect loss of camera focus.

  • Pixel Intensity - the average pixel value of the input image. Drift here can be caused by a change in subject matter as well as by physical damage to the sensor or other input distortion (see the small sketch after this list).
  • Average Confidence - the average confidence of the model's predictions. We will be alerted if there is a change in the model's confidence, which could be due to subject matter drift or other causes.
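To build some intuition for the pixel intensity signal, here is an illustrative, self-contained sketch (synthetic data only, not part of the assay) that simulates a damaged sensor and shows how the average pixel intensity shifts:

import numpy as np

# Illustrative only: simulate sensor damage by zeroing out a block of pixels
# in a synthetic frame and compare the average pixel intensity.
rng = np.random.default_rng(2)
frame = rng.uniform(0.3, 0.9, size=(480, 640)).astype(np.float32)
damaged = frame.copy()
damaged[:120, :160] = 0.0  # dead region in the top-left corner

print("healthy avg_px_intensity:", frame.mean())
print("damaged avg_px_intensity:", damaged.mean())

A persistent shift like this in the avg_px_intensity distribution is exactly the kind of change an assay can alert on.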

We set up the assay to analyze the inference results in 3-minute windows and chart the drift scores:
# End of the analysis period; here we simply use the current time.
assay_end = datetime.datetime.now()
assay_builder = assay_builder.add_run_until(assay_end)

# Analyze 3-minute windows at 3-minute intervals
assay_builder.window_builder().add_width(minutes=3).add_interval(minutes=3)

assay_config = assay_builder.build()
assay_results = assay_config.interactive_run()

print(f"Generated {len(assay_results)} analyses")
assay_results.chart_scores()

This will produce the output graph showing abrupt drift.

Fig 1. Assay analysis scores over time, showing abrupt drift.

We can also represent this in table format.

display(assay_results.to_dataframe().loc[:, ["score", "start", "alert_threshold", "status"]])

Fig 2. Assay results as a table of scores, window start times, alert thresholds, and statuses.

We can see the drift in the last two entries, consistent with the graph visualization.
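If we want to act on these results programmatically rather than by inspection, one simple approach (a sketch that uses only the columns shown above) is to filter for windows whose score exceeds the alert threshold:

# Flag the analysis windows whose drift score exceeds the alert threshold.
results_df = assay_results.to_dataframe()
alerting = results_df[results_df["score"] > results_df["alert_threshold"]]

for _, row in alerting.iterrows():
    print(f"Drift alert for window starting {row['start']}: score {row['score']:.3f}")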

Once we are satisfied with the analysis, we upload the assay to the Wallaroo instance:
assay_builder.upload()

Output:

1

Conclusion

We have seen in the above example that, in addition to setting up assays on a model's predictions, you can also set up assays on key inputs to the model to more directly measure concept drift. In either case, data scientists have an easy way to monitor the behavior of their models in production, so that they can intervene and adjust to a changing environment, helping ensure model robustness, business continuity, and alignment with changing data dynamics.

The next blog post in this series will address a very common set of challenges that AI teams face with production ML workloads, and how to solve them through Model Workload Orchestration: easily defining, automating, and scaling recurring production ML workloads that ingest data from predefined data sources, run inference, and deposit the results to a predefined location.

If you want to try the steps in this blog post series, you can access the tutorials at this link and use the free inference servers available on the Azure Marketplace, or you can download the free Wallaroo.AI Community Edition and use it with GitHub Codespaces.

Wallaroo.AI is a unified production ML platform built for Data Scientists and ML Engineers to easily deploy, observe, and optimize machine learning in production at scale – in any cloud, on-prem, or at the edge.
