I know many data scientists, including myself, who do most of their work on a GPU-enabled machine, either locally or in the cloud, through Jupyter Notebooks or some Python IDE. During my two years as an AI/ML software engineer that is exactly what I was doing: preparing data on one machine without a GPU, and then using a GPU VM in the cloud to do the training.
On the other hand, you have probably heard of Azure Machine Learning, a special platform service for doing ML. However, if you start looking at some getting-started tutorials, you will get the impression that using Azure ML creates a lot of unnecessary overhead, and that the process is not ideal. For example, the training script in the example above is created as a text file in one Jupyter cell, without code completion or any convenient way of executing it locally or debugging it. This extra overhead was the reason we did not use it much in our projects.
However, I recently found out that there is a Visual Studio Code Extension for Azure ML. With this extension, you can develop your training code right in VS Code, run it locally, and then submit the same code to be trained on a cluster with just a few clicks. By doing so, you achieve several important benefits:
I hope you are convinced to try Azure ML yourself! Here is the best way to start:
Everything in Azure ML is organized around a Workspace. It is the central point where you submit your experiments and store your data and resulting models. There is also a special Azure ML Portal that provides a web interface for your workspace, and from there you can perform a lot of operations, monitor your experiments and metrics, and so on.
    az extension add -n azure-cli-ml
    az group create -n myazml -l northeurope
    az ml workspace create -w myworkspace -g myazml
A workspace contains some compute resources. Once you have a training script, you can submit an experiment to the workspace and specify a compute target. Azure ML will make sure the experiment runs there, and will store all the results of the experiment in the workspace for future reference.
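The same submit-to-a-compute-target flow can also be driven from Python rather than from a UI. The sketch below assumes the Azure ML Python SDK v1 (the `azureml-core` package) is installed and that a `config.json` describing your workspace is available locally. The function name and its three parameters are placeholders of mine, not something taken from the sample repository.

```python
def submit_training(script, compute_name, experiment_name):
    """Submit a training script to an Azure ML workspace and wait for it.

    All three arguments are placeholders you choose yourself.
    """
    # Imported lazily so this file can be loaded without the SDK installed.
    from azureml.core import Workspace, Experiment, ScriptRunConfig

    # Reads workspace details from a local config.json (assumed to exist).
    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name=experiment_name)

    # Package the current directory and run the script on the named compute target.
    config = ScriptRunConfig(source_directory=".",
                             script=script,
                             compute_target=compute_name)
    run = experiment.submit(config)
    run.wait_for_completion(show_output=True)

    # Metrics logged by the script are stored with the run in the workspace.
    return run.get_metrics()
```

Nothing is contacted until the function is called, so it is safe to keep next to your training code and call only when you actually want a remote run.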
In our example, we will show how to solve the very traditional problem of handwritten digit recognition using the MNIST dataset. In the same manner, you will be able to run any other training script yourself.
Our sample repository contains a simple MNIST training script, train_local.py. This script downloads the MNIST dataset from OpenML, and then uses SKLearn LogisticRegression to train the model and print the resulting accuracy:
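    import numpy as np
    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # as_frame=False returns NumPy arrays, so the index-based shuffle below works
    mnist = fetch_openml('mnist_784', as_frame=False)
    # OpenML stores the labels as strings; convert them to integers
    mnist['target'] = np.array([int(x) for x in mnist['target']])

    shuffle_index = np.random.permutation(len(mnist['data']))
    X, y = mnist['data'][shuffle_index], mnist['target'][shuffle_index]

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)

    lr = LogisticRegression()
    lr.fit(X_train, y_train)
    y_hat = lr.predict(X_test)
    # Accuracy = fraction of test samples predicted correctly
    acc = np.average(np.int32(y_hat == y_test))

    print('Overall accuracy:', acc)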
Of course, we are using Logistic Regression just as an illustration, not implying that it is a good way to solve the problem...
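The accuracy line in the script is simply the fraction of test predictions that match the true labels. Here is a tiny illustration with made-up labels (these are not MNIST results), assuming only that NumPy is installed:

```python
import numpy as np

# Made-up labels and predictions, for illustration only.
y_test = np.array([3, 1, 4, 1, 5, 9, 2, 6])
y_hat = np.array([3, 1, 4, 0, 5, 9, 2, 7])

# The element-wise comparison gives booleans; casting to int32 turns them
# into 0/1, so the average is the share of correct predictions.
acc = np.average(np.int32(y_hat == y_test))
print('Overall accuracy:', acc)
```

Six of the eight predictions match, so the printed value is 0.75.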
You can just run this script locally and see the result. If we choose to use Azure ML, however, it will give us two major benefits:
    from azureml.core.run import Run
    ...
    try:
        run = Run.get_submitted_run()
        run.log('accuracy', acc)
    except Exception:
        # Not running as an Azure ML experiment (e.g. a plain local run): skip logging
        pass
The modified version of our script is called train_universal.py (it is just a bit more complicated than the code presented above), and it can be run both locally (without Azure ML) and on a remote compute resource.
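To make the "runs both locally and remotely" idea concrete, here is a minimal sketch of a logging helper that also works when the Azure ML SDK is not installed at all. It is not the code from train_universal.py. The helper name `log_metric` is mine, and it uses `Run.get_context(allow_offline=False)`, which as far as I know is the SDK v1 call that raises an exception when the script is not running as a submitted experiment.

```python
def log_metric(name, value):
    """Log a metric to Azure ML when possible, otherwise just print it.

    Returns "azureml" or "local" so the caller can tell where the value went.
    """
    try:
        # Fails with ImportError if the SDK is not installed.
        from azureml.core.run import Run
        # allow_offline=False raises when not inside a submitted run.
        run = Run.get_context(allow_offline=False)
        run.log(name, value)
        return "azureml"
    except Exception:
        # Plain local run: fall back to printing.
        print(f"{name}: {value}")
        return "local"


destination = log_metric("accuracy", 0.9)
```

Keeping the import inside the `try` block is what lets the same file run on a machine that has never seen Azure ML.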
To run it on Azure ML from VS Code, follow these steps:
Make sure your Azure Extension is connected to your cloud account. Select the Azure icon in the left menu. If you are not connected, you will see a notification in the bottom right corner offering to connect. Click on it, and sign in through the browser. You can also press Ctrl-Shift-P to bring up the command palette, and type in Azure Sign In.
After that, you should be able to see your workspace in the MACHINE LEARNING section of the Azure bar:
Here you should see different objects inside your workspace: compute resources, experiments, etc.
Go back to the list of files, right-click on train_universal.py, and select Azure ML: Run as experiment in Azure.
Confirm your Azure subscription and your workspace, and then select Create new experiment:
Create a new compute and compute configuration:
Now you know that submitting runs to Azure ML is not complicated, and you get some goodies (like storing all statistics from your runs, models, etc.) for free.
You may have noticed that in our case the script takes longer to run on the cluster than locally; it may even take several minutes. Of course, there is some overhead in packaging the script and its whole environment into a container and sending it to the cloud. If the cluster is set to scale down automatically to 0 nodes, there may be additional overhead due to VM startup. All of that is noticeable when you have a small sample script that otherwise takes a few seconds to execute. However, in real-life scenarios, when training takes tens of minutes and sometimes much more, this overhead becomes negligible, especially given the speed improvements you can expect to get from the cluster.
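A quick back-of-the-envelope calculation shows why the fixed overhead stops mattering as training gets longer. The 5-minute overhead and the training times below are numbers I picked for illustration, not measurements.

```python
def overhead_share(overhead_min, training_min):
    """Fraction of the total wall-clock time spent on fixed overhead."""
    return overhead_min / (overhead_min + training_min)


# Assumed fixed overhead: 5 minutes for packaging, upload and VM startup.
ASSUMED_OVERHEAD_MIN = 5

# Assumed training times: a few seconds, half an hour, four hours.
for training_min in (0.1, 30, 240):
    share = overhead_share(ASSUMED_OVERHEAD_MIN, training_min)
    print(f"training {training_min:>5} min -> overhead is {share:.0%} of total")
```

With these assumptions the overhead dominates the toy script almost entirely, but shrinks to a small fraction of the total once training runs for hours.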
Now that you know how to submit any script for execution on a remote cluster, you can start taking advantage of Azure ML in your daily work. It will allow you to develop scripts on a normal PC, and then schedule them for execution on a GPU VM or cluster automatically, keeping all the results in one place.
However, there are more advantages to using Azure ML than just those two. Azure ML can also be used for data storage and dataset handling, making it very easy for different training scripts to access the same data. You can also submit experiments automatically through the API while varying the parameters, and thus perform some hyperparameter optimization. Moreover, there is a specific technology built into Azure ML called Hyperdrive, which does more clever hyperparameter search. I will talk more about those features and technologies in my next posts.
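The "submit experiments while varying the parameters" idea boils down to enumerating parameter combinations and turning each one into script arguments. The sketch below does only that part, using the standard library. It assumes a training script that accepts `--C` and `--max_iter` on the command line, which the sample script does not necessarily do, and both helper names are mine. Each resulting argument list could then be passed as the `arguments` of a submitted run.

```python
from itertools import product


def parameter_grid(**options):
    """Yield one dict per combination of the given parameter values."""
    names = list(options)
    for values in product(*(options[n] for n in names)):
        yield dict(zip(names, values))


def to_script_arguments(params):
    """Turn {'C': 0.1} into ['--C', '0.1'] for passing to a training script."""
    args = []
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    return args


# Hypothetical search space for LogisticRegression.
grid = list(parameter_grid(C=[0.1, 1.0, 10.0], max_iter=[100, 300]))
for params in grid:
    print(to_script_arguments(params))
```

This is a plain exhaustive grid; it is not how Hyperdrive works, which uses smarter sampling and early termination.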
You may find the following courses from Microsoft Learn useful, in case you want to know more: