First published on MSDN on Jan 11, 2018
Running Data Science VM on Azure for your data science and deep learning lab resources
Create the lab infrastructure as follows
Use a shared pool of Ubuntu Data Science VMs (with GPUs if doing deep learning), with each student getting a separate account. The recommendation is to allocate about 5 students per VM for our deep learning labs.
Automate the bulk creation of Data Science VMs and local access credentials for students. This happens with a post-install script run on each VM after it is created.
Lab assistants can distribute logins to students by printing the server address and credentials on paper slips, or the machines can be connected to an internal AAD or LDAP directory for single sign-on authentication.
Script the process to bring in the exercises and datasets needed for the lab or researchers.
Share large datasets in a common directory so you avoid large simultaneous downloads.
All examples are runnable in Jupyter notebooks or on terminal and use
to share your tutorials, worksheets, and academic resources on a global basis.
The DSVM comes with Jupyter with Python, R, Spark, and Julia kernels. Students access Jupyter notebooks by logging in to JupyterHub with their VM username and password.
Leverage the terminal functionality in Jupyter to open a bash shell right in a browser window.
Students use only a browser to access the VM with their unique login account.
No other software is required on the student’s laptop.
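The allocation and paper-slip steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the script from the Data Science VM repository: the host names, the `studentNN` account naming, and the JupyterHub port 8000 are all assumptions made for the example, and passwords are left out because they would come from whatever the post-install script generated on each VM.

```python
from dataclasses import dataclass


@dataclass
class Slip:
    student: str   # name from the class roster
    server: str    # address the student opens in a browser
    username: str  # local account on that VM


def assign_students(students, hosts, per_vm=5):
    """Fill each VM in order with up to per_vm students."""
    if len(students) > len(hosts) * per_vm:
        raise ValueError("not enough VMs for this class size")
    slips = []
    for index, student in enumerate(students):
        host = hosts[index // per_vm]   # VM that receives this student
        seat = index % per_vm + 1       # account number on that VM
        # Port 8000 is an assumption for where JupyterHub listens.
        slips.append(Slip(student, f"https://{host}:8000", f"student{seat:02d}"))
    return slips


def format_slip(slip):
    """One printable line per student."""
    return f"{slip.student}: open {slip.server} and sign in as {slip.username}"


if __name__ == "__main__":
    # Hypothetical roster and host names, for illustration only.
    roster = [f"Student {n}" for n in range(1, 8)]
    hosts = ["dsvm-lab-0.example.com", "dsvm-lab-1.example.com"]
    for slip in assign_students(roster, hosts):
        print(format_slip(slip))
```

Filling VMs in order keeps each slip predictable from the roster position, which makes it easy for a lab assistant to re-issue a lost slip.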
Further detailed instructions and recommendations for setting up lab resources, including step-by-step instructions and common pitfalls to avoid, are available in the Data Science VM GitHub repository.
The important scripts to note are:
1. Modify the ARM template to create multiple VMs and execute a post-install script. Simply change the machine size to the type required for your activity.
2. Use the post-install bash script that creates multiple user accounts on each VM with random passwords for each student.
3. Run the deployment from your command prompt or bash shell to invoke the ARM template and use the post-install script above.
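To make the account-creation step concrete, here is a rough Python sketch of what generating per-student accounts with random passwords could look like. It is not the post-install script from the repository (that one is a bash script); the `student` prefix, the password length, and the helper names are hypothetical. The sketch only builds the text you would hand to `useradd` and `chpasswd` and does not modify the system.

```python
import secrets
import string

# Letters and digits only, so passwords are easy to read off a paper slip.
ALPHABET = string.ascii_letters + string.digits


def random_password(length=12):
    """Return a random password drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def make_accounts(count, prefix="student"):
    """Map hypothetical usernames (student01, student02, ...) to new passwords."""
    return {f"{prefix}{n:02d}": random_password() for n in range(1, count + 1)}


def account_commands(accounts):
    """Build useradd command lines and the user:password lines chpasswd reads."""
    useradd = [f"useradd -m -s /bin/bash {user}" for user in accounts]
    chpasswd_input = [f"{user}:{password}" for user, password in accounts.items()]
    return useradd, chpasswd_input


if __name__ == "__main__":
    accounts = make_accounts(5)  # about 5 students per VM, as recommended above
    useradd, chpasswd_input = account_commands(accounts)
    print("\n".join(useradd))
    print("\n".join(chpasswd_input))  # would be piped to chpasswd as root
```

Keeping generation separate from execution means the same username and password list can be saved for the lab assistants before anything is applied to the VM.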
How long does this take to deploy?
It typically takes 30 minutes to create about 30 GPU-based Data Science VMs, enough to support a class of over 200 students, and to spot check a few of the VMs.
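The sizing arithmetic behind these numbers can be checked with a short sketch. The figure of 200 students is used here as an illustrative lower bound for "over 200", and the function names are hypothetical.

```python
import math


def vms_needed(students, per_vm):
    """Smallest number of VMs that keeps every VM at or below per_vm students."""
    return math.ceil(students / per_vm)


def load_per_vm(students, vms):
    """Largest number of students any VM hosts when the class is spread evenly."""
    return math.ceil(students / vms)


if __name__ == "__main__":
    print(vms_needed(200, 5))    # 40 VMs at a strict 5 students per VM
    print(load_per_vm(200, 30))  # 7 students on the busiest of 30 VMs
```

In other words, a pool of about 30 VMs for 200 students means roughly 7 students per VM, slightly above the 5 recommended earlier, so the ratio is best treated as a guideline to balance against cost.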
The students were able to train deep neural network models on shared GPUs. With the shared infrastructure you also avoid the cost of creating a separate VM for every student. The Data Science VM is already a popular and robust environment among data scientists and AI developers for development and experimentation in the cloud. Because training and education use the same standardized and familiar environment, the learning curve is greatly reduced, and students can go on to develop their AI apps using the Azure Data Science VM.