MLOps is a set of practices and tools that help organizations manage and deploy machine learning models in a scalable and reliable way. These practices include cross-functional collaboration, version control and testing, and ensuring that the deployment environment is secure and compliant with relevant regulations. By adopting MLOps practices, organizations can improve collaboration between teams, strengthen governance and regulatory compliance, and deploy models safely and securely.
In this blog, let’s explore how Azure Machine Learning can help you adopt MLOps practices, with a special focus on model deployment and safe rollout. At Microsoft Build 2023, we also announced the General Availability of Mirrored traffic, and we will see how it helps complete the story for safe rollout of machine learning models.
Azure Machine Learning (AzureML) is a cloud-based platform that provides a comprehensive set of tools and services for building, training, and deploying machine learning models at scale. With Azure Machine Learning, data scientists can work in a collaborative and flexible environment that supports a wide range of open-source frameworks and languages.
To achieve successful and reliable MLOps practices, it's important to understand key concepts around model deployment and management. We also show how you can implement these concepts using the Azure Machine Learning platform and features such as Managed online endpoint.
As machine learning models evolve over time, it's important to keep track of different versions of the model and to manage those versions in a reliable and efficient way. Model versioning and management can help ensure that the correct version of the model is deployed and can be used for auditing and compliance purposes.
Azure Machine Learning workspaces support model registration, which enables you to store and version your machine learning models in Azure. The model registry makes it easy to organize and keep track of trained models. Registered models are identified by name and version, allowing you to track the changes made to a model over time. You can also provide additional metadata tags during registration, which is helpful when searching for a specific model. Along with the models, you can manage environment-related metadata (such as pip and conda dependencies) in the workspace and associate it with both training and deployment of the models. See Work with models in Azure Machine Learning for more.
Azure Machine Learning Registries take it one step further. They make model artifacts and dependencies available to all workspaces in an organization, and they enable versioning, artifact management, and deployment management for machine learning models. One supported scenario is to have separate workspaces for development and production. You can iteratively develop a model in a development workspace. Once a good candidate model has been identified, it can be published to a registry. From the registry, the model can be deployed to endpoints in different production workspaces. See Create a model in registry and Announcing the general availability of Azure Machine Learning registries for more.
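To make the name-plus-version idea concrete, here is a minimal sketch of a registry in plain Python. It is a toy stand-in, not the Azure Machine Learning API: the `InMemoryRegistry` class, the `fraud-detector` model name, the paths, and the `stage` tag are all hypothetical and only illustrate how versions and tags let you find the right model later.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    name: str
    version: int
    path: str
    tags: dict = field(default_factory=dict)


class InMemoryRegistry:
    """Toy stand-in for a model registry; not the Azure ML API."""

    def __init__(self):
        self._models = {}  # (name, version) -> ModelRecord

    def register(self, name, path, tags=None):
        # Registering an existing name creates the next version instead of overwriting.
        latest = max((v for (n, v) in self._models if n == name), default=0)
        record = ModelRecord(name, latest + 1, path, dict(tags or {}))
        self._models[(name, record.version)] = record
        return record

    def get(self, name, version):
        return self._models[(name, version)]

    def find_by_tag(self, key, value):
        return [m for m in self._models.values() if m.tags.get(key) == value]


registry = InMemoryRegistry()
first = registry.register("fraud-detector", "models/fraud/a", {"stage": "candidate"})
second = registry.register("fraud-detector", "models/fraud/b", {"stage": "approved"})
approved = registry.find_by_tag("stage", "approved")
```

Because every registration produces a new immutable name and version pair, a deployment can pin an exact version, which is what makes auditing and rollback practical.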
After you train a machine learning model or a machine learning pipeline, you need to deploy it so others can consume its predictions. Such an execution mode is called inference. In general, techniques such as A/B testing, canary releases, and feature flags can help you manage model deployment in a reliable and controlled manner. You can implement this practice using the Azure Machine Learning Managed online endpoint feature. Let’s quickly touch on the concepts of endpoint and deployment for machine learning inference first.
Azure Machine Learning defines the concept of an "endpoint", which is the "interface" for inference. For instance, you can make an HTTP request to a URL with some form of credentials, provide a picture of a car, and get the type and color of the car back as string values. This is what an "endpoint" does as an interface. A "deployment", in contrast, is a concrete implementation behind that interface. For example, Alice, a data scientist, can develop a model using the ResNet architecture with the TensorFlow framework and decide to run it on a CPU machine. This is a "deployment", and you can assign it to the endpoint mentioned earlier. A scoring request then goes through the endpoint to this deployment, which provides the prediction.
Bob, another data scientist, may decide to use the Torch framework with some data augmentation techniques and run on a GPU machine. This can be a new "deployment", and you can assign it to the same endpoint mentioned earlier, so both deployments share the same interface.
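The separation between interface and implementation can be sketched in a few lines of Python. This is only an illustration of the concept, not how Azure Machine Learning is implemented: the `Endpoint` class, the deployment names `blue` and `green`, and the stubbed predictions are all hypothetical.

```python
from typing import Callable, Dict

# A deployment is modeled as any callable that turns image bytes into a prediction.
Deployment = Callable[[bytes], Dict[str, str]]


class Endpoint:
    """Toy model of an endpoint: one stable interface, many deployments."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.deployments: Dict[str, Deployment] = {}

    def add_deployment(self, name: str, deployment: Deployment) -> None:
        self.deployments[name] = deployment

    def score(self, deployment_name: str, image: bytes) -> Dict[str, str]:
        # Callers always get the same response shape, whichever deployment answers.
        return self.deployments[deployment_name](image)


def alice_resnet_cpu(image: bytes) -> Dict[str, str]:
    # Placeholder for a TensorFlow ResNet model running on a CPU machine.
    return {"type": "sedan", "color": "red"}


def bob_torch_gpu(image: bytes) -> Dict[str, str]:
    # Placeholder for a Torch model running on a GPU machine.
    return {"type": "sedan", "color": "red"}


endpoint = Endpoint("car-classifier")
endpoint.add_deployment("blue", alice_resnet_cpu)
endpoint.add_deployment("green", bob_torch_gpu)
print(endpoint.score("green", b"raw image bytes"))
```

Client applications depend only on the endpoint's contract, so framework, hardware, and model architecture can change behind it without any client changes.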
Using the monitoring and logging features described later in this post, you can compare deployments and see whether the new model deployment performs better than the old one.
Azure Machine Learning provides a mechanism to control how scoring requests are routed to each of the deployments behind an endpoint. In a blue-green deployment scenario, a basic traffic split can be configured at the endpoint level so that, for example, 90% of the traffic goes to the blue deployment (old model) while 10% goes to the green deployment (new model).
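To build intuition for what a 90/10 split means, here is a minimal simulation of percentage-based routing. It is a sketch under simple assumptions (independent weighted random choice per request), not the routing logic the service actually uses; `route_request` is a hypothetical helper.

```python
import random


def route_request(traffic, rng):
    """Pick a deployment name according to percentage weights."""
    if sum(traffic.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so the simulation is repeatable
traffic = {"blue": 90, "green": 10}
counts = {"blue": 0, "green": 0}
for _ in range(10_000):
    counts[route_request(traffic, rng)] += 1
print(counts)
```

With enough requests the observed counts settle close to the configured percentages, which is why a small green share is enough to collect meaningful metrics while limiting exposure.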
We’re excited to share that Mirrored traffic is now Generally Available! Mirrored traffic lets you copy a portion of the live traffic to a new model. In this case, 100% of the traffic still goes to the blue deployment (old model), so all predictions returned to clients come from a previously approved model, while a copy of 10% of the production traffic is also sent to the green deployment (new model) so that you can monitor its performance against production data.
That way, you can use all the monitoring and logging features to measure how the new model performs in a real environment, and control how it begins to handle production traffic in the most critical machine learning applications.
In general, testing the new deployment with traffic mirroring/shadowing is also known as shadow testing. The deployment receiving the mirrored traffic can also be called the shadow deployment.
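The difference between mirroring and splitting can be sketched as follows. This is a simplified, synchronous illustration of shadow testing in general, not the service's implementation: `handle_request`, the two threshold "models", and the transaction amounts are hypothetical.

```python
import random


def handle_request(payload, live_model, shadow_model, mirror_percent, rng, shadow_log):
    """Serve from the live model; copy a share of requests to the shadow model."""
    response = live_model(payload)
    if rng.random() * 100 < mirror_percent:
        try:
            shadow_result = shadow_model(payload)
        except Exception as exc:  # a failing shadow model must not affect the caller
            shadow_result = exc
        # The shadow result is only recorded for comparison, never returned.
        shadow_log.append((payload, response, shadow_result))
    return response


def blue(amount):
    # Hypothetical approved model: flags transactions above 100.
    return amount > 100


def green(amount):
    # Hypothetical candidate model: uses a lower threshold of 80.
    return amount > 80


rng = random.Random(42)
shadow_log = []
amounts = [rng.uniform(0, 200) for _ in range(1000)]
responses = [handle_request(a, blue, green, 10, rng, shadow_log) for a in amounts]
disagreements = [entry for entry in shadow_log if entry[1] != entry[2]]
print(len(shadow_log), "mirrored requests,", len(disagreements), "disagreements")
```

Every response comes from the blue model, while the log of mirrored requests shows exactly where the green model would have answered differently on real inputs.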
In short, the concepts of endpoint and deployment, traffic control mechanisms including mirrored traffic, and the monitoring features simplify the safe rollout of new models and improve reliability in production scenarios.
When machine learning models are deployed to production, it's important to monitor them closely to detect any potential issues. Model monitoring and logging can help identify anomalous behavior or unexpected results, which can be a sign of degraded performance or other issues that need to be addressed.
Azure Machine Learning provides several ways to track and monitor metrics and logs for Azure Machine Learning online endpoints. Through the integration with Azure Monitor, you can view metrics in charts, compare endpoints and deployments, pin charts to Azure portal dashboards, configure alerts, query log tables, and push logs to supported targets. You can also use Application Insights to analyze events from user containers.
Endpoint-level metrics such as request latency, requests per minute, new connections per second, and network bytes can be drilled down to the deployment or status level. Deployment-level metrics such as CPU/GPU utilization and memory or disk utilization can be drilled down to the instance level. Azure Monitor lets you track these metrics in charts and set up dashboards and alerts for further analysis.
You can send metrics to a Log Analytics workspace, where you can query the logs using the rich Kusto query syntax. You can also send metrics to a Storage Account and/or Event Hubs for further processing. In addition, you can use dedicated log tables for online endpoint events, traffic, and console (container) logs. Kusto queries allow complex analysis that joins multiple tables.
Curated environments include the integration with Application Insights, and you can simply enable or disable it when you create an online deployment. Built-in metrics and logs are sent to Application Insights, and you can use its built-in features such as Live metrics, Transaction search, Failures, and Performance for further analysis.
In addition, you can perform an actual cost breakdown analysis across endpoints and deployments. For example, after you deploy a new model to an endpoint, you can compare the costs associated with the old model and the new model and confirm the cost implications of the change.
Network isolation can be crucial for ensuring the privacy, security, and compliance of your machine learning models. Private endpoints, which provide a secure way to access resources within a virtual network, can be used to protect your data and services from unauthorized access.
This involves both inbound and outbound security threats. The inbound threat is about unauthorized access to the endpoints for your machine learning models. You have authorization and authentication mechanisms, but you may want to secure network access to your endpoints as well. The outbound threat is about data exfiltration from your own model deployment. You may want to block outbound access so that model deployments are only allowed to access resources secured within the virtual networks without external access.
Both inbound and outbound network access controls are easily configurable with Azure Machine Learning Managed online endpoint. When you deploy your model, you can simply indicate that you want to secure ingress for the model. In the backend, all the complex configuration is automatically set up so that the model serving endpoint (scoring URI) is only accessible from a private IP from your virtual network using workspace Private Endpoint (PE). Similarly, when you deploy your model, you can simply indicate that you want to secure egress for the model only to workspace resources. In this case, all the complex configuration is automatically set up so that the egress from the scoring model container will be restricted only to specific resources via secure connectivity through PEs, and the internet access is disabled.
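The two switches described above can be pictured as one setting on the endpoint (ingress) and one on the deployment (egress). The sketch below uses plain Python dictionaries rather than any SDK object. The field names `public_network_access` and `egress_public_network_access` are assumptions based on our understanding of the managed online endpoint and deployment settings and should be verified against the current documentation; the names `fraud-endpoint` and `green` and the helper `is_network_isolated` are hypothetical.

```python
# Assumed field names; verify them against the current endpoint and deployment schemas.
endpoint_settings = {
    "name": "fraud-endpoint",             # hypothetical endpoint name
    "auth_mode": "key",
    "public_network_access": "disabled",  # block inbound traffic from the internet
}

deployment_settings = {
    "name": "green",                             # hypothetical deployment name
    "endpoint_name": endpoint_settings["name"],
    "egress_public_network_access": "disabled",  # restrict outbound to workspace resources
}


def is_network_isolated(endpoint: dict, deployment: dict) -> bool:
    """Return True only when both ingress and egress are locked down."""
    return (
        endpoint.get("public_network_access") == "disabled"
        and deployment.get("egress_public_network_access") == "disabled"
    )


print(is_network_isolated(endpoint_settings, deployment_settings))
```

A check like this could run in a release pipeline to stop a deployment that would accidentally expose a production endpoint or allow unrestricted egress.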
Here we illustrate the actual steps that you can follow using Azure Machine Learning managed online endpoint. Let’s say you are responsible for deploying recently developed credit card fraud detection models to production. You would have trained the models in a development Azure Machine Learning workspace. After a new model is validated, it can be promoted and registered to Azure Machine Learning Registries. Your task is to deploy the new model to a production Azure Machine Learning workspace using a safe rollout strategy.
The production Azure Machine Learning workspace would have been configured with Private Link and be ready to serve models in a virtual network. You can go to the Azure Machine Learning Registries you have access to, find the new model, click deploy – real-time endpoint, and choose the production workspace as the target workspace.
In general, you could use the quick deployment wizard, which allows you to deploy the model with just one click. But in this scenario, you want to use advanced options such as mirrored traffic, so you click "more options".
One of the configurations you can set for your endpoint is Public network access. This is a feature for network isolation. If you disable Public network access, inbound traffic from the internet is blocked. In this scenario, you would choose an existing endpoint that is already running the old model and blocking inbound internet traffic.
When you continue with the wizard, you will arrive at the Deployment step. Here you can set these options:
Now the fun part! You can enable mirrored traffic for your new model, by enabling the feature and assigning 20% of the traffic. What this means is that out of 100% live traffic that the old model is taking, 20% of traffic is copied, or mirrored, and sent to the new model. All predictions that the client application receives are from the old model, but you can use built-in monitoring and logging features to debug the new model with the real-world data, without risking customer impact while testing the new model.
Now that both the old and new models are running (although production applications are using only the predictions from the old model), you can look at different metrics from both deployments. For example, you can look at latency, throughput, or CPU/GPU utilization metrics and verify whether the new model performs as expected. You can also use dashboards or Application Insights to drill deeper. Another way to consume the telemetry is to run Kusto queries on the log tables to analyze the data in more detail.
These metrics are useful for evaluating the operational performance of endpoints and deployments. If you are interested in monitoring the model quality performance, such as data drift, prediction drift, and data quality, you can read more here: Continuously Monitor the Performance of your AzureML Models in Production.
If the performance of your new model is within your target thresholds (for example, it does not use too much CPU or memory, and it shows acceptable latency and throughput), you can go ahead and analyze the cost of serving the new model. Again, you can check the cost distributed per service and break it down to the deployment level. That way, you can ensure your new model is operating within budget.
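A threshold check like the one described above can be automated once you have exported metric samples. The following is a minimal sketch: the sample values and the thresholds (`max_p95_ms`, `max_cpu_percent`) are made-up illustrations, and `within_targets` is a hypothetical helper rather than a platform feature.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile using the standard library's inclusive method."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


def within_targets(latencies_ms, cpu_percent, max_p95_ms, max_cpu_percent):
    """Return True when tail latency and peak CPU both stay under their limits."""
    return p95(latencies_ms) <= max_p95_ms and max(cpu_percent) <= max_cpu_percent


# Hypothetical samples collected from the shadow deployment.
latencies_ms = [20, 22, 19, 25, 21, 23, 24, 20, 22, 21]
cpu_percent = [35, 40, 38, 42]

ok = within_targets(latencies_ms, cpu_percent, max_p95_ms=50, max_cpu_percent=70)
print("new deployment within targets:", ok)
```

Using a tail percentile rather than the mean keeps a handful of slow requests from hiding behind a good average.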
If the new deployment looks good in every aspect, you can now remove the mirrored traffic and start sending live traffic gradually to the new deployment. For example, you can split the traffic and send only 10% of it to the new model, while 90% is handled by the old model. With a cool-down period and an approval policy in place, you can integrate this safe rollout with your release pipeline and gradually increase the traffic ratio for the new model. Once the new model is taking 100% of the live traffic, you can decide when to remove the old deployment. This way, you can safely roll out your new models, ensuring they meet both business and technical needs.
We have explored how you can approach the safe rollout problem in a production setup with Azure Machine Learning. Deploying your machine learning models is becoming easier with Azure Machine Learning Managed online endpoint. Network isolation helps secure access to models and prevent data exfiltration from them. Mirrored traffic adds a preventive layer that reduces risk while you test new models with real-world data.
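The gradual shift can be expressed as a simple loop over traffic steps with a health gate between them. This is a sketch of the general pattern only: the step sizes, the `is_healthy` callback (standing in for your cool-down and approval policy), and the rollback behavior are assumptions, not a description of a built-in feature.

```python
def rollout_plan(steps=(10, 25, 50, 100)):
    """Yield traffic splits that shift live traffic from blue to green."""
    for green_share in steps:
        yield {"blue": 100 - green_share, "green": green_share}


def run_rollout(is_healthy, steps=(10, 25, 50, 100)):
    """Apply each split in turn; send all traffic back to blue if a check fails."""
    applied = []
    for split in rollout_plan(steps):
        applied.append(split)
        if not is_healthy(split):
            applied.append({"blue": 100, "green": 0})  # roll back
            break
    return applied


# Hypothetical health check that starts failing once green takes half the traffic.
history = run_rollout(lambda split: split["green"] < 50)
for split in history:
    print(split)
```

In a release pipeline, each applied split would be followed by a waiting period and a metrics check before the next step, and the old deployment would be removed only after the final step has been stable.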
Get started today with Azure Machine Learning Managed online endpoint!
To learn more about Azure Machine Learning Managed online endpoint, watch these Microsoft Build 2023 breakout sessions: