GitHub Codespaces is a cloud-based development environment that runs your code in a container. It essentially gives you the ability to move your development workflow to the cloud, and to configure your remote environment such that it looks and feels just like your local development environment.
The advantages of Codespaces are many — as you’ll see later in this post, anyone with access to your repo can start working in a fully configured environment, without redoing any of your setup!
In this post, I’ll start by guiding you through the steps needed to set up your Azure ML projects on Codespaces. Then I’ll discuss the different ways to configure your VS Code setup using settings, including tips on how to best set up your machine learning projects. And finally, I’ll discuss the issues I ran into while enabling Codespaces for my blog, and the solutions to those issues.
In order to get the most out of this post, it’s best that you already have some familiarity with Git, GitHub, VS Code, Python, Conda, and Azure ML.
The steps outlined in this post will help you set up Codespaces for one of your repos. Or you can follow along by looking at my Codespaces Demo repo, which is already configured for Codespaces.
At the time of writing, you have access to Codespaces if your repository belongs to a GitHub organization that has been set up using a “GitHub Team” or “GitHub Enterprise Cloud” plan, and the owner of that organization has enabled Codespaces. You can easily create your own organization to try out this feature — according to the pricing, a monthly “Team” plan is the cheapest way to try it out. If you’re the owner of a GitHub organization, you can enable Codespaces by going to “Settings” in your GitHub organization’s page, “Codespaces,” “General,” and then choosing a “User permissions” setting other than “Disabled.” Codespaces usage pricing is explained in the documentation. Alternatively, if you want to use Codespaces in your individual account, you can join the waitlist for Codespaces beta — this is free for now! The rest of the post assumes that you have access to Codespaces, either through an organization or your individual account.
To start Codespaces for a GitHub repository, go to the main page of that repository, click on the green “Code” button, choose the “Codespaces” tab, and click on “New codespace”.
Depending on how Codespaces is set up, you may be given a choice of machine type. For example, if the owner of your organization decided that everyone within that organization gets a certain machine type (which can be done by going to “Settings,” “Codespaces,” “Policy”), then there is no choice to be made. If, on the other hand, you were given permission to use more than one machine type, a dialog similar to the one below is displayed:
After you make your choice, click on the “Create codespace” button. You should see the following while your codespace is being created:
After a little while, your codespace finishes setting up, and VS Code opens in the browser, together with the code in your repository. This is pretty exciting! You’re now running a fully-fledged version of VS Code, just like the one you’re used to running locally, but in a Docker container, in the cloud!
If you were developing a simple application with common requirements (without the need for Azure ML or any special packages), you’d be done. You could run, debug, and re-run your code as usual. That’s because the base Linux image used in the default Docker container includes many of the most popular languages, including Python, Node.js, and JavaScript. You can take a look at the Dockerfile for this base image to see everything that is included. (You may have noticed that a directory named venv is created at this point — you can safely add it to your .gitignore file. If you’re not using the Oryx build environment, you can also delete the oryx-build-commands.txt file that was generated.)
Our goal is to run an Azure ML application in a codespace, so we need a bit more setup. Thankfully, Microsoft provides us with predefined containers with common configurations, and an intuitive user interface to install additional features. We can easily replace the default container with a more fully-featured container by going to the VS Code Command Palette (Ctrl + Shift + P), and choosing “Codespaces: Add Development Container Configuration Files…”
You will first be asked to select a container configuration definition. In this simple scenario, I’ll choose the “Miniconda” base image, since that’s what I use for all my Python package management:
Alternatively, you could choose the “Azure Machine Learning” base, by clicking on “Show All Definitions” followed by “Azure Machine Learning.” I’ll explain the pros and cons of these two base images later in this post.
You’ll then be asked about the Node.js version for your project. Our scenario doesn’t use Node, so let’s select “none.”
And last, you’ll be asked to select additional features to install. The Azure CLI needs to be installed before we can install its Azure ML extension, so let’s check that box:
If you look at the files in your project, you’ll notice that a new .devcontainer folder was created, containing several Codespaces configuration files. Let’s start by inspecting the Dockerfile. I like to keep just the parts of the Dockerfile I need and remove everything else, but you can keep everything that was generated and add to it as needed. Here are the parts I consider essential:
# See here for image contents: https://github.com/microsoft/vscode-dev-containers/tree/v0.222.0/containers/python-3-miniconda/.devcontainer/base.Dockerfile
FROM mcr.microsoft.com/vscode/devcontainers/miniconda:0-3
# Copy environment.yml (if found) to a temp location so we update the environment. Also
# copy "noop.txt" so the COPY instruction does not fail if no environment.yml exists.
COPY environment.yml* .devcontainer/noop.txt /tmp/conda-tmp/
RUN if [ -f "/tmp/conda-tmp/environment.yml" ]; then umask 0002 && /opt/conda/bin/conda env update -n base -f /tmp/conda-tmp/environment.yml; fi \
&& rm -rf /tmp/conda-tmp
The “Miniconda” choice you made earlier in the setup is reflected in the choice of base image used here. Notice the helpful comment at the very top of the file, pointing to the repo containing the Miniconda base image we’re using.
In addition, the Dockerfile copies the environment.yml file in the root of the project to the container, and updates the base environment according to the contents of that file. This is my favorite feature of this image — when you open your project in a codespace, all the packages needed to run the project will be installed automatically! Super handy!
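For reference, here’s the kind of environment.yml file I’m talking about — the environment name and packages below are just an illustration; list whatever your own project needs:

```yaml
name: codespaces_demo
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pandas
```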
Now let’s look at the essential parts of the devcontainer.json file:
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.222.0/containers/python-3-miniconda
{
  "name": "Miniconda (Python 3)",
  "build": {
    "context": "..",
    "dockerfile": "Dockerfile"
  },
  // Set *default* container specific settings.json values on container create.
  "settings": {
    "python.defaultInterpreterPath": "/opt/conda/bin/python"
  },
  // Add the IDs of extensions you want installed when the container is created.
  "extensions": [
    "ms-python.python",
    "ms-python.vscode-pylance",
    ...
  ],
  // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
  "remoteUser": "vscode",
  "features": {
    "azure-cli": "latest"
  },
  ...
}
We’ll add a few more settings to this file as we go, but for now you can see that the location of the Dockerfile is specified first, followed by the location of the Python interpreter, the name of the remote user, and additional features. In particular, notice the “azure-cli” feature. This was added when you selected the “Azure CLI” as an additional feature during the setup. The generated devcontainer.json also contains several settings related to linting and formatting the code, not shown above. Later in this post, I’ll explain what these are for, and why I remove them from this file and set them somewhere else.
Just by choosing a more appropriate base image and additional features, we’ve specified that we want Miniconda and the Azure CLI installed in our container. We’re most of the way there, but we still need to install the Azure ML extension to the CLI. It turns out that the devcontainer gives us a hook to execute a command after the container starts, through the onCreateCommand property. This is exactly what we need — we can simply set the value of this property to the installation command for the Azure ML CLI extension, as shown below:
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.222.0/containers/python-3-miniconda
{
...
"onCreateCommand": "az extension add -n ml -y"
}
For a full list of all the properties you can add to the devcontainer file, check out the documentation.
Our last step in configuring this codespace is to add VS Code extensions to the devcontainer file. For machine learning projects on Azure, I highly recommend installing the Azure Machine Learning extension, which will give you the ability to manage your Azure ML resources directly from within VS Code, and will enable IntelliSense for your Azure ML YAML configuration files. You can find this extension by clicking on the “Extensions” icon on the left side of VS Code, searching for its name, and selecting it.
You can add it to the devcontainer file by clicking on the small gear button, and selecting “Add to devcontainer.json.”
This adds a line to the “extensions” section of the devcontainer.json file. Here’s the complete file:
// For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
// https://github.com/microsoft/vscode-dev-containers/tree/v0.222.0/containers/python-3-miniconda
{
  "name": "Miniconda (Python 3)",
  "build": {
    "context": "..",
    "dockerfile": "Dockerfile"
  },
  // Set *default* container specific settings.json values on container create.
  "settings": {
    "python.defaultInterpreterPath": "/opt/conda/bin/python"
  },
  // Add the IDs of extensions you want installed when the container is created.
  "extensions": [
    "ms-python.python",
    "ms-python.vscode-pylance",
    "ms-toolsai.vscode-ai"
  ],
  // Comment out to connect as root instead. More info: https://aka.ms/vscode-remote/containers/non-root.
  "remoteUser": "vscode",
  "features": {
    "azure-cli": "latest"
  },
  "onCreateCommand": "az extension add -n ml -y"
}
You’ll also see an add-notice.sh file, which contains a legal notice — you can keep it around or delete it. The noop.txt file prevents your container creation from failing in case you forget to provide an environment.yml file. Since it’s being referenced by the Dockerfile, we’ll also keep it around for this project. In my day-to-day development environment, since I know I won’t forget the environment file, I simplify my Dockerfile and remove the noop.txt file. You can see an example of that in my Fashion MNIST repo.
This is all you need to start using Azure ML in a codespace! You’re now ready to rebuild the codespace with these settings, which you can do by going to the Command Palette (Ctrl + Shift + P) and typing “Codespaces: Rebuild Container.”
Once the codespace finishes rebuilding, VS Code reappears in your browser, and your terminal shows a notice saying that you’re using a custom image defined in your devcontainer.json file:
Nice! You’re now running your code in a Docker container in the cloud, with the Azure CLI and its Azure ML extension installed, your conda environment created and activated, and the Azure ML VS Code extension added. You’re ready to test your codespace.
The accompanying repo contains a simple YAML file that creates a model in Azure ML when executed. You can use that file to test your Azure ML configuration.
In the rebuilt container, log in using the Azure Account extension. You can do this by clicking the Azure “A” icon in VS Code’s left side bar, and selecting “Sign in to Azure…” You’ll then be presented with the list of Azure accounts available to you, and selecting an account will show a list of available workspaces. You can mark a workspace as default by clicking on the pin to its right.
You’ll also need to log in using the terminal, which you can do by executing the following command:
az login --use-device-code
We realize that this double login isn’t ideal, and we’ve been looking into ways to consolidate the two login experiences. Also, notice that when using Codespaces in a browser, we really do need to use az login --use-device-code to log into Azure, instead of the newer and more convenient az login command.
Once you’re logged in, find the Azure ML model.yml file:
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: model-test
version: 1
local_path: "../model/weights.pth"
Then right-click on the opened file, and select “Azure ML: Execute YAML.” In just a few seconds, a new Azure ML model is created in the cloud. You can verify this by going to your Azure ML portal, clicking on “Models,” and checking that a model named “model-test” is listed there. And that’s it! You just ran a basic Azure ML command on Codespaces!
Alternatively, instead of right-clicking on the file, you could have created the model by executing a command in the terminal. If you plan to execute several commands using the CLI, you might want to set default values for your account, resource group and workspace:
az account set -s "<YOUR_ACCOUNT>"
az configure --defaults group="<YOUR_RESOURCE_GROUP>" workspace="<YOUR_ML_WORKSPACE>"
Here’s the CLI command that creates the model in the cloud:
az ml model create -f cloud/model.yml
You now have an Azure ML project configured to run on Codespaces. Let’s look at what happens when other people visit your repo on GitHub.
With the work you’ve done, other people who have access to Codespaces can now run your project in a container in the cloud, without doing any of the Azure ML setup! This is what they’ll see when they go to your repo on GitHub:
Followed by a choice of machine type:
And then your code opens in VS Code in the browser, in a pre-configured container, ready to be executed! It’s that simple!
The Dockerfile and devcontainer.json configurations I show here contain all the steps you need to run my simple Azure ML demo project. This project relies on the Azure ML CLI v2, but not on the Python SDK, and it uses .py files instead of notebooks. If you’ve been following my Azure ML posts, you know that these are my personal preferences. If you need support for the Azure ML Python SDK or for Jupyter, I recommend that you take a look at the setup provided by the Azure ML team, which you can find on the team’s GitHub.
Alternatively, instead of choosing the “Miniconda” base earlier in these instructions, you could have selected “Show All Definitions,” followed by “Azure Machine Learning.” This gives you a container with full Anaconda (instead of Miniconda) and the Azure ML Python SDK, as you can see in their Dockerfile. Keep in mind that this base image doesn’t install the Azure CLI and Azure ML CLI extension, so you still need to follow all the steps in this post. Also, this Dockerfile doesn’t update the base conda environment with your conda file, so you’ll want to add that to your Dockerfile.
At the moment, there are pros and cons associated with each of these solutions. We at Microsoft are looking into providing a single Azure ML base image with a complete set of capabilities.
VS Code is highly customizable through the use of a wide range of settings. There are a few different locations where you can add these settings, though, and choosing the right place can make a big difference in how efficient you are at working across projects and machines.
Any time I want to add a new setting to VS Code, I choose one of the following three locations for the setting:

- .devcontainer/devcontainer.json — I add to this file settings that are specific to running the project in a codespace, as we saw in the previous section. This is a good place to set the default Python interpreter path because that path is specific to Codespaces.
- .vscode/settings.json — I add to this file settings that are specific to the project, regardless of whether I run the project on my local machine or in a codespace. This is where I add my linter and formatter choices, as we’ll see in the next section.
- VS Code user settings — I add here settings that I want applied to every project. As I’ll discuss shortly, I use Settings Sync to keep these consistent across machines.

To change your VS Code settings, click on the button with a gear-shaped icon at the bottom-left of the VS Code window, and choose “Settings”:
Once you’ve opened the settings editor, you can browse and search for all VS Code settings. By default, any setting that you change in this editor will apply to VS Code on the machine you are currently using. For a consistent development experience across your local machine, Codespaces, and any other machine where you use VS Code, I recommend turning on “Settings Sync.” This enables all your settings to be associated with your GitHub (or Microsoft) account, and it causes them to sync every time you open VS Code as that user. You can turn on “Settings Sync” by clicking again on the VS Code gear button, and then “Turn on Settings Sync…“:
You’ll then be taken to the following dialog, where you can configure which settings you want to sync. I like to keep all the checkboxes checked:
Next, click on the “Sign in & Turn on” button, select your account (GitHub or Microsoft), and your settings will be synced. If you have conflicts in settings from different machines, you’ll be given a chance to select which settings you want to prevail. In addition to seeing your preferences in the “Settings” editor, you can also see them in JSON format, by going to the Command Palette and selecting “Preferences: Open Settings (JSON).” I like to use the “Settings” tab to browse all the settings that are available to me, and the JSON file to quickly glance at all the settings I have customized.
My VS Code user settings are not specific to machine learning projects — they apply to every project! For example, this is where I set the color theme for the VS Code user interface (“workbench.colorTheme”: “Default Dark+”), and where I instruct VS Code to show me differences in whitespace when diffing two files (“diffEditor.ignoreTrimWhitespace”: false).
I’ve been a fan of Settings Sync for a while, because it enables me to reinstall Visual Studio Code and immediately start working in a familiar environment. But with Codespaces, it’s more important than ever. It plays a big role in ensuring that your cloud environment feels as comfortable as your local one.
In this section, I explained how you can configure your VS Code settings, depending on where you want them to apply. In the next section, I will talk about a few of the VS Code settings I use for my machine learning projects, with a special focus on linting and formatting.
A good choice of linter and formatter makes a world of difference when writing Python code on VS Code!
A typical Python linter analyzes your source code, verifies that it follows the PEP8 official style guide for Python, and warns you of any instances where it doesn’t. The PEP8 style guide provides guidance on matters such as indentation, maximum line length, and variable naming conventions. I like to use Pylint as my linter because in addition to PEP8 style checks, it also does error checking — detecting when I’ve used a module without importing it, for example. Pylint is the most popular linter for Python at the time I’m writing this.
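To see what that error checking buys you, consider this made-up snippet (it’s not from the post’s repo): delete the import below and Pylint immediately reports an “undefined-variable” error on the os usage — a real bug it catches without running the code.

```python
import os  # Delete this line and Pylint flags `os` below with E0602
           # (undefined-variable) — an actual error, not just a style nit.


def data_path(filename: str) -> str:
    """Return the path of a file inside the project's data directory."""
    return os.path.join("data", filename)


print(data_path("train.csv"))
```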
A typical Python formatter auto-formats your Python code according to the PEP8 standard. For example, imagine that you have a line of code that’s longer than the maximum line length recommended by PEP8. Running the linter will give you a warning, but it won’t fix the issue for you. That’s where the formatter comes in: when you run it, it breaks up the code onto multiple lines automatically. I like to use YAPF from Google, because in addition to making sure your code conforms to PEP8, it also makes it look good. In the example I mentioned, YAPF won’t just break up the line so that it doesn’t violate PEP8’s max line length, it also breaks it up so that it’s as easy as possible to read.
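To make that concrete, here’s a sketch of the kind of rewrite YAPF performs — the function and its arguments are hypothetical, but the wrapped layout matches what you get when formatting an over-long call:

```python
# Hypothetical example: before formatting, the call below sat on one line,
# well past PEP8's 79-character limit. Saving the file with YAPF enabled
# wraps the arguments so each continuation line aligns with the first
# argument — compliant with PEP8 and easy to scan.
def train_model(learning_rate: float, batch_size: int, num_epochs: int,
                data_dir: str, output_dir: str) -> dict:
    """Stand-in training function, used only to illustrate formatting."""
    return {"lr": learning_rate, "batch_size": batch_size, "epochs": num_epochs}


config = train_model(learning_rate=0.001,
                     batch_size=64,
                     num_epochs=20,
                     data_dir="data",
                     output_dir="outputs")
```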
I set my linter and formatter settings in the .vscode/settings.json file within each project, because I may want to customize them per project. Applying them to every project is not the best choice for me because I have some projects that rely on TypeScript and Node.js (such as this blog), and some C# .NET projects, too (like my old blog). But if you write Python exclusively, adding these settings to your user-level VS Code settings might be your best choice. I don’t recommend including them in your devcontainer.json file because you typically want your development environment to be the same locally and in Codespaces.
Here are the contents of my .vscode/settings.json file for the Fashion-MNIST project:
{
  "python.linting.pylintEnabled": true,
  "python.formatting.provider": "yapf",
  "editor.rulers": [
    80
  ],
  "editor.formatOnSave": true
}
The first two lines specify my choices of linter (Pylint) and formatter (YAPF). The third line instructs VS Code to display a thin vertical line at character 80, since the max line length recommended by PEP8 is 79 characters. This just helps me to visualize where my code should wrap. The fourth line tells VS Code to run YAPF every time I save my code. This is super handy! I can write my code without worrying about making it pretty, and a simple “Ctrl + S” formats it exactly the way I want it!
Now that we’ve enabled Pylint and YAPF, we need to configure their settings, which we typically do by adding a .pylintrc file and a .style.yapf file to the root of the project. The .style.yapf file that I add to all my projects has the following contents:
[style]
based_on_style = google
The Formatting style section of YAPF’s documentation lists the four base styles supported by YAPF: “pep8”, “google”, “yapf”, and “facebook”. I chose Google’s style because before using YAPF I was already following their very comprehensive Google Python style guide. The YAPF docs contain a lot more information to further customize how you want YAPF to work.
The .pylintrc file that I use also follows the Google Python style guide, and can be found in this location. Occasionally, I don’t want a particular rule to be enforced, and so I disable it in one of two ways:

If I want a rule disabled everywhere in the project, I add it to the disable list in the .pylintrc file. For example:

disable=abstract-method,
    ...
    zip-builtin-not-iterating,

If I want a rule disabled for a particular line of code, I add a comment right above that line. For example:

# pylint: disable=unused-import
from pysindy.differentiation import FiniteDifference, SINDyDerivative
We discussed here two of the files I add to the root of every project I create. Before we move on, let’s take a brief look at the overall structure I use for all my machine learning projects. They all contain a .devcontainer folder with a Dockerfile and devcontainer.json, as we discussed earlier in this post. They also contain a .vscode folder containing a launch.json with the launch configuration(s) I want (which determines what happens when I press F5), and a settings.json containing any VS Code settings specific to the project. In addition, they contain a folder with the name of the repo, where you can find all my code. And finally, they contain the following files:

- A .gitignore file specifying all files and directories that I want git to ignore.
- A .pylintrc file and a .style.yapf file, which configure the linter and formatter I use for my projects, as I explained earlier in this section.
- An environment.yml file listing all the packages I need to be installed to run the code. As we saw earlier in this post, we configured the Dockerfile to install these automatically when the container is built.
- A LICENSE file. I use an MIT License for all my code because I want to allow everyone to use it for all purposes, including commercial projects.
- A README.md file, containing instructions to run the code and links to blog posts that explain the code in detail.

This is by no means the only way to structure machine learning projects, but it has worked well for me over the years. If you have suggestions on how to improve it, please do reach out.
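For reference, the resulting repo layout looks something like this (the names are illustrative — substitute your own repo name):

```
my-ml-project/
├── .devcontainer/
│   ├── Dockerfile
│   └── devcontainer.json
├── .vscode/
│   ├── launch.json
│   └── settings.json
├── my_ml_project/        <- all the project's code lives here
├── .gitignore
├── .pylintrc
├── .style.yapf
├── environment.yml
├── LICENSE
└── README.md
```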
In this section, I will cover the issues I encountered while adding devcontainers to the projects in my blog. I will keep adding to this list as I find and solve new issues.
One issue I ran into was the following exception on an import cv2 line:
Exception has occurred: ImportError
libGL.so.1: cannot open shared object file: No such file or directory
The import cv2 line imports the OpenCV computer vision library, and OpenCV requires the libGL library. It turns out that this library doesn’t come pre-installed in my container, although it came pre-installed in my local environment (I use WSL 2 locally). Installing the library was easily accomplished by adding the following lines to the Dockerfile:
# Install dependencies of OpenCV that don't come with it. Running "update"
# and "install" in one RUN instruction avoids installing from a stale
# package cache when the Docker layer cache is reused.
RUN apt-get update && apt-get install -y libgl1
Another issue I encountered was an out-of-memory exception when running one of my projects on an 8 GB machine. The solution was to select a more powerful machine while creating the codespace (16 GB did the trick for me). However, I don’t want my readers to run into the same exception, so I added the following to the devcontainer.json:
"hostRequirements": {
"memory": "16gb",
},
This ensures that anyone who creates a codespace for this repo will only be presented with machine choices with 16 GB of memory or more.
And finally, I have several files that display matplotlib graphs in popup windows when run locally. This doesn’t work in Codespaces, because a VS Code instance running in the browser can’t open a Windows-style popup window. One workaround is to display the graphs right within VS Code using a Python Interactive Window. You can run a snippet of code in an Interactive Window by placing a # %% line right before the code and then clicking “Run Cell.” In my scenario, I wanted to run an entire Python file, so I navigated to the Command Palette and selected “Jupyter: Run Current File in Interactive Window.”
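As a quick sketch of the cell syntax (the data and plot here are made up):

```python
# %% Compute some summary statistics — everything between this marker and
# the next one runs as a single Interactive Window cell.
values = [1.0, 2.5, 3.5, 5.0]
mean = sum(values) / len(values)
print(f"mean = {mean}")

# %% A second cell. In a real project this is where you'd plot, e.g.:
#   import matplotlib.pyplot as plt
#   plt.plot(values)
#   plt.show()
# and the graph would render inline in the Interactive Window.
print(f"max = {max(values)}")
```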
I encountered one issue while running my code in an Interactive Window though. The following code threw an exception, because the IPython kernel passes several unexpected command-line arguments to my main function:
def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", dest="data_dir", default=DATA_DIR)
    parser.add_argument("--output_dir", dest="output_dir", default=OUTPUT_DIR)
    args = parser.parse_args()
    ...
Since there’s no way to prevent VS Code from passing IPython arguments when launching an Interactive Window, I wrote a bit of code that detects this situation and sidesteps the problem. One way to see if we’re running in an Interactive Window is to check whether the IPython class name is “ZMQInteractiveShell”, as you can see in the code below:

import argparse
import sys

from IPython import get_ipython


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_dir", dest="data_dir", default=DATA_DIR)
    parser.add_argument("--output_dir", dest="output_dir", default=OUTPUT_DIR)
    # When running in an Interactive Window, ignore the extra command-line
    # arguments that the IPython kernel passes to the process.
    shell = get_ipython().__class__.__name__
    argv = [] if (shell == "ZMQInteractiveShell") else sys.argv[1:]
    args = parser.parse_args(argv)
    ...
And the problem is fixed! Now if I’m running locally, I can press F5 and the graphs show up in popup windows. And if I’m on Codespaces, I can run in an Interactive Window and the graphs are displayed there. Either way, my matplotlib graphs are displayed as expected.
Interactive Windows are exciting, and I expect that they’ll get a lot more use as GitHub Codespaces grows in popularity!
In this post, you learned how to configure your Azure ML projects to run on GitHub Codespaces, where to configure your VS Code settings depending on where you want them to apply, and how to use VS Code settings to set up a linter and formatter for your Python projects. You also saw some of the issues I encountered while configuring my machine learning projects to run on Codespaces, and the solutions I found for them. I hope that you learned something useful, and that you’ll give GitHub Codespaces a try!
I want to thank Banibrata De and Daniel Schneider from the Azure ML team, Rong Lu, Sid Unnithan and Rich Chiodo from the Visual Studio team, and Tanmayee Prakash Kamath from the GitHub team, for helpful discussions about many of the topics in this post.
Bea Stollnitz is a principal developer advocate at Microsoft, focusing on Azure ML. See her blog for more in-depth articles about Azure ML and other machine learning topics.