The Microsoft Data Science Virtual Machine (DSVM) and Deep Learning Virtual Machine are customized VM images on Microsoft’s Azure cloud built specifically for doing data science. They have many popular data science and other tools pre-installed and pre-configured to jump-start building intelligent applications for advanced analytics. They are available on Windows Server and Linux.
We offer Windows editions of the DSVM on Windows Server 2016 and Windows Server 2012.
We offer Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4.
In most scenarios, institutions deploy a specific number of virtual machines and give students SSH logins to that infrastructure to undertake data science activities. Typically I advise 20 students per NC6 VM.
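The sizing rule above is simple arithmetic, sketched here in Python. The function name and the sample class sizes are illustrative only. The ratio of 20 students per VM is the rule of thumb quoted above, not a hard limit, so adjust it for heavier workloads.

```python
import math

# Rule of thumb from the text: 20 students share one NC6 VM.
STUDENTS_PER_VM = 20


def vms_needed(num_students: int, students_per_vm: int = STUDENTS_PER_VM) -> int:
    """Return how many VMs a class needs, rounding up so no student is left out."""
    if num_students < 0 or students_per_vm <= 0:
        raise ValueError("num_students must be >= 0 and students_per_vm must be > 0")
    return math.ceil(num_students / students_per_vm)


# Hypothetical class sizes, for illustration only.
for class_size in (20, 45, 120):
    print(class_size, "students ->", vms_needed(class_size), "VM(s)")
```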
On an Azure virtual machine (VM), including the Data Science Virtual Machine (DSVM), you create local user accounts while provisioning the VM. Users then authenticate to the VM with these credentials. If you have multiple VMs to access, managing credentials this way quickly becomes cumbersome. Common user accounts, managed through a standards-based identity provider, let you use a single set of credentials to access multiple resources on Azure, including multiple DSVMs.
Most universities now have some form of Active Directory support, for example with o365, so you can use either Azure Active Directory (Azure AD) or on-premises Active Directory to authenticate users on a standalone DSVM or on a cluster of DSVMs in an Azure virtual machine scale set. You do this by joining the DSVM instances to an Active Directory domain.
Azure Resource Manager (ARM) templates let you define extensions for resources such as virtual machines. VM extensions are scripts that run during the deployment of a VM to install additional software, reconfigure the VM to your needs, or comply with specific IT policies your company may have.
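As a rough illustration of a VM extension inside an ARM template, the sketch below builds a Custom Script extension resource for a Linux VM as a Python dictionary and prints it as JSON. The VM name, script URL, and command are hypothetical placeholders. The `apiVersion` and `typeHandlerVersion` values are assumptions that you should check against the current Azure documentation before deploying.

```python
import json


def custom_script_extension(vm_name: str, script_url: str, command: str) -> dict:
    """Build an ARM resource that runs a script on a Linux VM at deployment time."""
    return {
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "apiVersion": "2019-03-01",  # assumption: verify against current docs
        "name": f"{vm_name}/setup-script",
        "location": "[resourceGroup().location]",
        # Wait until the VM itself exists before running the extension.
        "dependsOn": [
            f"[resourceId('Microsoft.Compute/virtualMachines', '{vm_name}')]"
        ],
        "properties": {
            "publisher": "Microsoft.Azure.Extensions",
            "type": "CustomScript",
            "typeHandlerVersion": "2.0",  # assumption: verify against current docs
            "autoUpgradeMinorVersion": True,
            "settings": {
                "fileUris": [script_url],
                "commandToExecute": command,
            },
        },
    }


# Hypothetical values, for illustration only.
extension = custom_script_extension(
    vm_name="student-dsvm",
    script_url="https://example.com/setup.sh",
    command="bash setup.sh",
)
print(json.dumps(extension, indent=2))
```

The printed object would go in the `resources` array of the template, next to the VM it configures.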
Ubuntu Linux DSVM (the VM image comes with two disks):
A main OS disk with about 20 GB free.
A second 100 GB disk (mounted on /data) with about 64 GB still free.
Windows 2016 DSVM:
One OS disk only, which is 127 GB.
There should be about 40 GB free.
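The free-space figures above are approximate and vary with the image version, so it is worth checking on the VM itself. A minimal sketch using Python's standard library follows. The `/data` mount point comes from the Ubuntu description above, and the helper name is illustrative.

```python
import shutil


def free_space_gb(path: str) -> float:
    """Return the free space, in GB, on the disk that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 1024**3


# "/" is the OS disk; "/data" is the second disk on the Ubuntu DSVM.
for mount in ("/", "/data"):
    try:
        print(f"{mount}: {free_space_gb(mount):.1f} GB free")
    except OSError:
        print(f"{mount}: not available on this machine")
```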
For storage, there are a few options on the Linux DSVM itself and via mounted storage. Most VM types also have a temporary local SSD that does not persist across reboots. This is a good place for a scratch disk and is often much faster than the OS disk, which is typically backed by an external blob (in the same data center).
Lastly, this local non-persistent SSD is very large. You can copy large datasets to it and work from there. The only thing to note is that its contents may not survive reboots, so you have to download the files to the local SSD again after a reboot, or keep a backup on a blob or elsewhere. It is the fastest disk you have on the VM, since it is both local and an SSD.
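Because the local SSD can be wiped on reboot, one workable pattern is to keep the dataset on persistent storage and copy it to the scratch disk only when the copy is missing. The sketch below shows that pattern. The function name is illustrative, and the paths in the usage note are assumptions about where the disks are mounted, so confirm them on your own VM.

```python
import shutil
from pathlib import Path


def stage_on_scratch(source: Path, scratch_dir: Path) -> Path:
    """Copy `source` to the scratch disk unless an up-to-date copy is already there."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    target = scratch_dir / source.name
    # Re-copy when the file is missing (e.g. after a reboot) or looks incomplete.
    if not target.exists() or target.stat().st_size != source.stat().st_size:
        shutil.copy2(source, target)
    return target
```

For example, `stage_on_scratch(Path("/data/train.csv"), Path("/mnt/scratch"))` would copy a hypothetical `train.csv` from the persistent data disk to the temporary disk, which Ubuntu images on Azure commonly mount at `/mnt`. Calling it at the start of each notebook or job makes the copy reappear after a reboot without manual steps.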
Giving Access to DSVM via a Jupyter Notebook
You can also simply allow your students to use Azure Notebooks to build their Jupyter Notebook experiments and then run the experiments on the Data Science VMs that you have built. If you have an o365 account provided by your school or university, Azure Notebooks will discover the DSVMs that you have access to and list them in a menu under the Run button.
Choose the DSVM that you would like to connect to and enter the user name and password of the Linux account on the DSVM.