We are thrilled to announce the public preview of Azure Machine Learning model monitoring, allowing you to effortlessly monitor the overall health of your deployed models. Model monitoring is an esse...
In the underlying code you are using "spark.read.mltable" (https://github.com/Azure/azureml-examples/blob/9c762dd6bb704579e34b13e322c9e1e99c51b93e/cli/monitoring/components/custom_preprocessing/src/run.py#L22). I've been troubleshooting this: when I attempt to run it in an Azure ML Notebook, on either Azure ML serverless Spark compute or an attached Synapse Spark pool, both attempts fail with the error: 'AttributeError: 'DataFrameReader' object has no attribute 'mltable''.
Running 'pip install mltable~=1.3.0' does not help. I'd really appreciate any insights on 'spark.read.mltable'. It runs smoothly as part of an AML job, but when attempted in Notebooks on the same Synapse cluster with the same spark.synapse.library.python.env, it hits a snag. Is there any additional custom installation required to make the validation framework executable on Spark compute? And when you're crafting your validator, what's your go-to debugging strategy?
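For what it's worth, the workaround I've been experimenting with is to bypass the 'spark.read.mltable' reader entirely in notebooks, on the assumption that this reader is only registered inside AML Spark jobs. This is a sketch, not a confirmed fix; the local folder path is a placeholder for wherever your MLTable definition lives:

```python
# Sketch: load the MLTable with the plain `mltable` SDK instead of the
# `spark.read.mltable` reader (which appears to be registered only in
# AML Spark jobs), then hand the result to Spark.
import mltable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "./my_mltable_folder" is a placeholder path containing an MLTable file
tbl = mltable.load("./my_mltable_folder")

# Materialize via the mltable runtime, then convert to a Spark DataFrame.
# Note: this pulls the data through the driver, so it only works for
# datasets small enough to fit in driver memory.
pdf = tbl.to_pandas_dataframe()
df = spark.createDataFrame(pdf)
df.show()
```

Obviously this loses Spark's distributed read, so it's only a stopgap for debugging a validator interactively, not a replacement for the job-time reader.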
A bit of feedback, so the energy goes both ways ;):
Also - I managed to "set up model monitoring by bringing our own production data to Azure Machine Learning" (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-monitor-model-performance?view=azureml-api-2&tabs=azure-cli#set-up-model-monitoring-by-bringing-your-own-production-data-to-azure-machine-learning) with MLTable data stored in an ADLS Gen2 datastore. It took 2-3 man-days, which is very acceptable but a bit longer than I expected.
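For anyone else going down this route, the shape of the MLTable definition I ended up with looked roughly like the following. This is a hedged sketch: the glob pattern and the JSON Lines read step are assumptions about how your production inference logs are laid out, not a prescription:

```yaml
# Sketch of an MLTable file for production inference logs in an
# ADLS Gen2 datastore folder (pattern and format are assumptions).
type: mltable
paths:
  - pattern: ./*.jsonl
transformations:
  - read_json_lines:
      encoding: utf8
      include_path_column: false
```

The MLTable file sits next to the data in the datastore folder, and the monitoring setup then points at that folder as the data asset.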
In this scenario, the 'pre_processing_component' is not strictly needed and could be made optional, which would simplify the CI/CD process.
"Model Monitoring" essentially brings the capabilities of a "Data Validation" framework to AML, ensuring data quality before it reaches the training job. It's a fantastic addition for AML systems, and the user interface is incredibly user-friendly.
The existing "advanced_data_quality" comes with validations that significantly enhance the maintainability of ML systems.
The ability to bring in your own custom validators is a standout feature.
As for areas to improve: documentation, tutorials, processing speed, and scalability are a must. With some attention to cost-efficiency, it could easily challenge other data validators.