Like GitOps, Machine Learning Operations (MLOps) can significantly accelerate how quickly data scientists deliver on organizational needs. A well-implemented MLOps process not only shortens the time from code to production, but also provides the ownership, lineage, and historical information needed to understand the performance of any machine learning model. Central to this process is a CI/CD system that understands the elements of ML natively and stays in sync with any code or data changes, no matter what platform organizations need to run on.
Unfortunately, many data scientists are still forced to implement MLOps manually. CI/CD platforms are often powerful but quite generic, so “ML-aware” behavior has to be implemented through custom code. Worse, these platforms often require separating the actions from the code, which makes debugging difficult and leads to hard-to-reproduce caching issues. Our goal is to give these data scientists tools that are easy both to implement and to use.
Today, we’re proud to announce a series of GitHub Actions that let you implement MLOps with just a few configuration settings, yet are flexible enough to support even complicated workflows. Now, just by checking in your code or opening a pull request, you can kick off an entire ML pipeline that records all information about the process and updates the model from the actions.
The first five functions we have published are:
NOTE: Though these are all Azure Machine Learning functions, GitHub Actions for MLOps support any cloud.
These actions are based on DevOps principles and practices that increase the efficiency of workflows, such as continuous integration, delivery, and deployment. We have applied these principles to the machine learning process with the goal of:
Beyond that, because the entire MLOps service is hosted and run on behalf of users, ML engineers have more time to focus on business-critical issues. Additionally, workflows can be updated and added on the back end without users even noticing, making maintenance of these pipelines even easier.
To show you what this would look like end to end, we have a short video:
Let’s walk you through how you would implement something like this.
Using GitHub Actions and Azure Machine Learning
First, you’ll need some initial setup variables. These include:
If you don’t have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning today.
Second, create your own repository from the template. You can do this with the "Use this template" button in the repo: https://aka.ms/ml-template
Third, you’ll need a service principal with contributor rights to a resource group (either new or existing). To create a new one on Azure, use the Azure CLI on your computer and execute the following command to generate the required credentials:
# Replace {service-principal-name} with any name for your service principal, and
# {subscription-id} and {resource-group} with your Azure subscription ID and resource group name
az ad sp create-for-rbac --name {service-principal-name} \
--role contributor \
--scopes /subscriptions/{subscription-id}/resourceGroups/{resource-group} \
--sdk-auth
This will generate the following JSON output:
{
"clientId": "<GUID>",
"clientSecret": "<GUID>",
"subscriptionId": "<GUID>",
"tenantId": "<GUID>",
(...)
}
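Before storing this output as a secret, it can help to confirm that the four fields shown above are all present. The following is a minimal sketch rather than part of the official setup: the function name check_credentials and the file name credentials.json are hypothetical, and it assumes you redirected the command's output to that file.

```shell
# Hypothetical helper: verify that a saved copy of the service principal
# output contains the four fields shown above. It only checks that each
# key name appears in the file; it does not validate the values.
check_credentials() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "file not found: $file" >&2
    return 1
  fi
  for key in clientId clientSecret subscriptionId tenantId; do
    if ! grep -q "\"$key\"" "$file"; then
      echo "missing field: $key" >&2
      return 1
    fi
  done
  echo "ok"
}

# Example (assumes the output was saved as credentials.json):
#   check_credentials credentials.json
```

Because the file contains a client secret, delete it once the secret has been stored and keep it out of version control.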
Add this JSON output as a secret with the name AZURE_CREDENTIALS in your GitHub repository:
Please follow this link for more details.
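If you prefer the command line to the repository settings page, the GitHub CLI can store the secret as well. This is a sketch under stated assumptions: it presumes the gh CLI is installed and authenticated and that the JSON output was saved to a local file. The function name and the OWNER/REPO argument are placeholders introduced here for illustration.

```shell
# Hypothetical helper: store a saved credentials file as the
# AZURE_CREDENTIALS secret using the GitHub CLI (gh).
# Arguments: $1 = path to the JSON file, $2 = repository as OWNER/REPO.
set_azure_credentials_secret() {
  file="$1"
  repo="$2"
  if [ ! -s "$file" ]; then
    echo "credentials file missing or empty: $file" >&2
    return 1
  fi
  if [ -z "$repo" ]; then
    echo "usage: set_azure_credentials_secret FILE OWNER/REPO" >&2
    return 2
  fi
  # gh reads the secret value from standard input.
  gh secret set AZURE_CREDENTIALS --repo "$repo" < "$file"
}

# Example (placeholder values):
#   set_azure_credentials_secret credentials.json my-org/my-ml-repo
```

The two checks run before gh is invoked, so a missing file or repository name fails early instead of creating an empty secret.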
Next, modify the parameters in the /.cloud/.azure/workspace.json file in your repository, so that the GitHub Actions create or connect to the desired Azure Machine Learning workspace.
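A malformed parameters file is an easy mistake to make when editing by hand, so a quick syntax check before committing can save a failed run. The sketch below is one possible approach and not part of the template: the function name is hypothetical, and it assumes python3 is available on your PATH, since it relies on Python's standard json.tool module.

```shell
# Hypothetical pre-commit check: confirm the edited parameters file is
# still well-formed JSON. Uses Python's standard json.tool module, so it
# assumes python3 is on the PATH. It checks syntax only, not whether the
# keys and values are ones the actions accept.
validate_json_file() {
  file="$1"
  if python3 -m json.tool "$file" > /dev/null 2>&1; then
    echo "valid JSON: $file"
  else
    echo "invalid or unreadable JSON: $file" >&2
    return 1
  fi
}

# Example, run from the repository root:
#   validate_json_file .cloud/.azure/workspace.json
```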
Once you save your changes to the file, the predefined GitHub workflow that trains and deploys a model on Azure Machine Learning is triggered. Check the Actions tab to see whether your actions ran successfully.
Now that you have a running pipeline, you can start modifying the code in the code folder so that the pipeline uses your custom code.
With just a few configuration settings, you can move from zero to an entire workflow driven by code and GitHub Actions. In addition to the above actions, we are also publishing two templates that include code and workflow definitions for an end-to-end ML/AI lifecycle.
Please dive into either repo and let us know if there’s anything we can do to help you achieve your goals with MLOps.
Further, though we’ve implemented the first version using Azure Machine Learning, the platform is flexible enough to support most deployment platforms, both on-prem and on any cloud. Just clone our template repo and customize it as needed. And make sure to publish your actions to the GitHub Marketplace so that others can use them!
Finally, we very much want to build a community around these actions - please join us at https://aka.ms/ml-template (for the standard template) or https://aka.ms/ml-template-advanced (for the advanced template) to file issues, open pull requests, and comment on what we can do better. Thank you so much!