Using Microsoft Azure Data Science Virtual Machines in your teaching and Research Labs
Published Mar 21 2019 10:01 AM
Microsoft
First published on MSDN on Jan 11, 2018

Running Data Science VM on Azure for your data science and deep learning lab resources

Create the lab infrastructure as follows:

  • Use a shared pool of Ubuntu Data Science VMs (with GPUs if doing deep learning), with each student getting a separate account. We recommend allocating about 5 students per VM for deep learning labs.

  • Automate the bulk creation of Data Science VMs and local access credentials for students. This happens with a post-install script run on each VM after it is created.
  • Lab assistants can distribute logins to students by printing the server address and credentials on paper slips, or the machines can be connected to an internal AAD or LDAP directory for single sign-on authentication.
  • Script the process to bring in the exercises and datasets needed for the lab or researchers.
  • Share large datasets in a common directory so you avoid large simultaneous downloads.
  • Make all examples runnable in Jupyter notebooks or in a terminal, and use http://notebooks.azure.com to share your tutorials, worksheets, and academic resources globally.
  • The DSVM comes with Jupyter with Python, R, Spark, and Julia kernels. Students access Jupyter notebooks by logging in to JupyterHub with their VM username and password.
  • Leverage the terminal functionality in Jupyter to open a bash shell right in a browser window.
  • Students use only a browser to access the VM with their unique login account.
  • No other software is required on the student’s laptop.
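
The account-creation step in the list above can be sketched in a few lines. The sample post-install script referenced in this post is written in bash; the Python below is only an illustrative stand-in, and the function names, the `student` username prefix, the 12-character password length, and the example server address are all assumptions. It generates and formats credentials but does not create any accounts. On a VM, a script would typically create each user with `useradd` and pass the `user:password` lines to `chpasswd`.

```python
import secrets
import string

# Characters used for generated passwords (letters and digits only, so that
# printed slips are easy to type). This choice is an assumption.
ALPHABET = string.ascii_letters + string.digits


def make_credentials(num_students, prefix="student", password_length=12):
    """Return a list of (username, password) pairs with random passwords."""
    creds = []
    for i in range(1, num_students + 1):
        username = f"{prefix}{i:02d}"
        # secrets.choice draws from a cryptographically secure source.
        password = "".join(secrets.choice(ALPHABET) for _ in range(password_length))
        creds.append((username, password))
    return creds


def chpasswd_input(creds):
    """Format credentials as 'user:password' lines, the format chpasswd reads."""
    return "\n".join(f"{user}:{pw}" for user, pw in creds)


def paper_slip(server_address, username, password):
    """Text for one printed slip handed to a student."""
    return (
        f"Server:   {server_address}\n"
        f"Username: {username}\n"
        f"Password: {password}\n"
    )


if __name__ == "__main__":
    # Hypothetical server address, used only for illustration.
    for user, pw in make_credentials(5):
        print(paper_slip("dsvm-lab-01.example.com", user, pw))
```

Keeping credential generation separate from account creation makes it easy to print the slips on a lab assistant's machine while the VMs themselves only ever see the `user:password` lines.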

Detailed instructions and recommendations for setting up lab resources, including step-by-step instructions and common pitfalls to avoid, are available in the Data Science VM GitHub repository.

The important scripts to note are:

1. Modify the following ARM template to create multiple VMs and execute a post-install script. Simply change the VM size to the type required for your activity.

"vmSize": {
  "type": "string",
  "defaultValue": "Standard_DS3_v2",
  "allowedValues": [
    "Basic_A3",
    "Standard_A2_v2",
    "Standard_A4_v2",
    "Standard_A8_v2",
    "Standard_DS2_v2",
    "Standard_DS3_v2",
    "Standard_DS4_v2",
    "Standard_DS12_v2",
    "Standard_DS13_v2",
    "Standard_DS14_v2",
    "Standard_NC6",
    "Standard_NC12",
    "Standard_NC24"
  ]
}

2. Sample post-install bash script that creates multiple user accounts on each VM, with a random password for each student.

3. Use the Azure CLI from your command prompt or bash shell to invoke the ARM template with the post-install script above.
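
As a rough illustration of step 3, the sketch below assembles an Azure CLI deployment command and checks the chosen VM size against the sizes listed in the template fragment above before anything is submitted. It only builds and prints the command; it does not run it. The resource group name, the template file name, and the `build_deploy_command` helper are hypothetical. The command shown is `az deployment group create`; older CLI releases used `az group deployment create` for the same purpose. Any other parameters your copy of the template requires would need to be added.

```python
import shlex

# VM sizes copied from the "allowedValues" list in the template fragment above.
ALLOWED_VM_SIZES = [
    "Basic_A3",
    "Standard_A2_v2", "Standard_A4_v2", "Standard_A8_v2",
    "Standard_DS2_v2", "Standard_DS3_v2", "Standard_DS4_v2",
    "Standard_DS12_v2", "Standard_DS13_v2", "Standard_DS14_v2",
    "Standard_NC6", "Standard_NC12", "Standard_NC24",
]


def build_deploy_command(resource_group, template_file, vm_size="Standard_DS3_v2"):
    """Return the az CLI argument list for deploying the template.

    Raises ValueError if vm_size is not one of the sizes the template allows,
    so a typo is caught locally instead of failing during deployment.
    """
    if vm_size not in ALLOWED_VM_SIZES:
        raise ValueError(f"{vm_size!r} is not in the template's allowedValues")
    return [
        "az", "deployment", "group", "create",
        "--resource-group", resource_group,
        "--template-file", template_file,
        "--parameters", f"vmSize={vm_size}",
    ]


def as_shell_string(command):
    """Quote each argument so the command can be pasted into a bash shell."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    # "dsvm-lab-rg" and "azuredeploy.json" are placeholder names.
    cmd = build_deploy_command("dsvm-lab-rg", "azuredeploy.json", "Standard_NC6")
    print(as_shell_string(cmd))
```

For a deep learning lab you would pick one of the GPU sizes (the `Standard_NC` entries); for a general data science class the default `Standard_DS3_v2` may be enough.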

How long does this take to deploy?

It typically takes about 30 minutes to create about 30 GPU-based Data Science VMs, enough to support a class of over 200 students, and to spot-check a few of the VMs.
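
For planning a pool of your own, the sizing arithmetic is simple: divide the class size by the number of students per VM and round up. About 30 VMs for just over 200 students works out to roughly seven students per VM, while the five-per-VM recommendation for deep learning labs would need more machines. The helper names below are hypothetical, and the sketch assumes students are assigned to VMs in consecutive groups.

```python
import math


def vms_needed(num_students, students_per_vm):
    """Number of VMs required so that no VM exceeds students_per_vm."""
    if students_per_vm < 1:
        raise ValueError("students_per_vm must be at least 1")
    return math.ceil(num_students / students_per_vm)


def assign_students(students, students_per_vm):
    """Split the student list into consecutive groups, one group per VM."""
    return [
        students[i:i + students_per_vm]
        for i in range(0, len(students), students_per_vm)
    ]


if __name__ == "__main__":
    class_size = 200
    for per_vm in (5, 7):
        print(f"{per_vm} students per VM -> {vms_needed(class_size, per_vm)} VMs")
```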

Benefits

The students were able to train deep neural network models on shared GPUs. With the shared infrastructure, you also save the cost of creating a separate VM for every student. The Data Science VM is already a popular and robust environment among data scientists and AI developers for development and experimentation in the cloud. Because training and education use the same standardized and familiar environment, the learning curve is greatly reduced, and students can go on to develop their AI apps on the Azure Data Science VM.
