Hey AzML community! The VS Code team is excited to announce version 0.6.15 of the AzML extension, with a brand new way for you to validate your scripts, environments, and datasets before submitting to a remote cluster.
If you'd like to follow along with the blog post and try out the new features, you can install the extension here!
Gaining confidence in your experiment runs
Feeling a little anxious when submitting a remote experiment is common and expected. It's hard to predict how the training script you've worked so hard on will behave once it runs on your remote target. Many of you have told us how painful it is not:
Knowing whether the environment you want to use will work correctly with your training script.
Knowing whether your datasets are materialized and being referenced correctly.
Having the confidence to submit your remote experiment and context-switch to another project you're working on.
The VS Code AzML extension team has been working hard over the past few weeks to bring a new capability to alleviate your pains: running a local experiment with an interactive debugging session.
Interactive Debugging with the AML Extension
You might be asking yourself: how is this different from just running my training script in VS Code? Here are some key differences:
The AzML service always uses an environment when submitting a remote run, and these environments are materialized as Docker containers. When running a local experiment, the AzML extension builds the same Docker image and runs the same Docker container that's used when running remotely.
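As a sketch of what such an environment is built from, here's a minimal conda dependencies file of the kind that gets materialized into that Docker image (the file name, packages, and versions below are illustrative assumptions, not output from the extension):

```yaml
# Illustrative conda dependencies file for an AzML environment.
# Package names and versions are assumptions for this sketch; the extension
# builds whatever environment you pick into the same image used remotely.
name: train-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - pip
  - pip:
      - azureml-defaults   # run infrastructure inside the container
      - scikit-learn       # example training dependency
      - debugpy            # only needed if you opt into an interactive debug session
```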
Running a Python script normally assumes you've taken care of data materialization and access yourself. For remote runs, we recommend using AzML Datasets, which give you helper functions and configuration options for working with your data. The extension lets you configure a local run to work with Datasets the same way you would remotely, helping you validate that your dataset is referenced and used correctly.
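For example, a training script can take the dataset's mount point as a script argument, so the same code works unchanged for local and remote runs. This is a minimal sketch: the `--data-path` argument name is an assumption, and the temporary directory below only stands in for the mount point the run configuration would supply.

```python
# Sketch of a training script that consumes a dataset through a mount path.
# The --data-path argument is an illustrative assumption; in a real run the
# run configuration supplies the dataset's actual mount point.
import argparse
import os
import tempfile

def load_training_files(data_path):
    """Return the sorted list of files under the dataset mount point."""
    return sorted(
        os.path.join(data_path, name)
        for name in os.listdir(data_path)
    )

# Stand-in for the dataset mount point, so this sketch runs anywhere.
data_dir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    open(os.path.join(data_dir, name), "w").close()

parser = argparse.ArgumentParser()
parser.add_argument("--data-path", help="mount point of the input dataset")
args = parser.parse_args(["--data-path", data_dir])

files = load_training_files(args.data_path)
print(f"found {len(files)} training files")
```

Because the script only sees a path, validating it locally against a small sample exercises the same code path your remote run will take.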
The extension streamlines setting up an optional debug session when running your experiment. This allows you to set breakpoints and step through your code with ease.
The extension tightly couples components of the debugging experience, such as the debug console, with your experiment. Expressions you evaluate or print to the console are written to your 70_driver_log.txt.
Running a local experiment is straightforward and closely resembles the extension's current functionality for submitting a remote run. Here's a summary of the steps for submitting a run.
Right-click on an experiment node in the tree view and choose the Run Experiment option.
Pick the local run option and choose whether you want to debug.
Create a new run configuration or pick a previously created one. The rest of the steps assume the former.
Pick an environment and dataset for your training.
(Only when debugging) Add the debugpy package to your environment; debugpy is required for an interactive debug session.
Validate the final configuration options and submit your run.
(Optional) If you've chosen to debug, start the debugger via the prompt or from your run node.
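Behind the steps above, the extension writes a run configuration for you; you don't author it by hand. As a rough sketch only, with field names approximated and every path illustrative, a local run configuration looks something like this:

```yaml
# Hedged sketch of a local run configuration; the exact schema is managed by
# the extension and may differ. Paths and values here are assumptions.
script: train.py        # your training script
target: local           # run on your machine instead of a remote cluster
framework: Python
environment:
  docker:
    enabled: true       # materialize the environment as a Docker container
  python:
    condaDependenciesFile: .azureml/train-env.yml  # illustrative path
```

The only substantive difference from a remote run is the target, which is why a passing local run is a strong signal your remote submission will behave the same way.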
Local Experiment Submission with AML Extension
Congratulations! If you've followed the above steps you've successfully submitted a local experiment and can now confidently proceed to submit a remote run.
For more detailed step-by-step instructions you can follow our docs here.
We're working hard to further improve your run experience from within VS Code, with a focus on the following scenarios:
Debugging a single-node remote run on AmlCompute targets.
Streamlining submitting a remote run after succeeding locally.
Streamlining running a local debug experiment from a failed remote run.
If there's anything that you would like us to prioritize, please feel free to let us know on GitHub.
If you would like to provide feedback on the overall extension, please feel free to do so via our survey.