Getting started with R on Data Science Virtual Machines
Published May 20 2022 10:05 AM
Microsoft

The Data Science Virtual Machine (DSVM) is a powerful data science development environment where you can perform data exploration and modeling tasks. The environment comes pre-built and bundled with several popular data analytics and data science tools, so you can get started without spending 20 minutes to an hour deploying a suitable infrastructure.

 

A new learning experience for R developers

We are also delighted to announce some exciting new updates that make R a first-class developer experience for learners on DSVMs and in Learn Sandboxes on MS Learn. Starting from April 2022, the DSVM offering has been enriched by DSVM for Windows 2019 v. 22.04.21 and DSVM for Ubuntu 20.04 and Ubuntu 18.04 v. 22.04.27, which provide an updated R environment including the following R libraries: Cluster, Devtools, Factoextra, Glue, Here, Ottr, Paletteer, Patchwork, Plotly, Rmd2jupyter, Scales, Statip, Summarytools, Tidyverse, Tidymodels and Testthat.
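
To see which of these libraries are actually available in a given R session, a check along the following lines may help. This is a minimal sketch in base R, not an official DSVM script: it assumes the lowercase spellings of the package names, and it assumes that glue and here are two separate packages. The variable names `expected`, `status` and `not_found` are made up for this illustration.

```r
# Package names from the announcement, written in lowercase.
# Assumption: these spellings match the installed package names.
expected <- c(
  "cluster", "devtools", "factoextra", "glue", "here", "ottr",
  "paletteer", "patchwork", "plotly", "rmd2jupyter", "scales",
  "statip", "summarytools", "tidyverse", "tidymodels", "testthat"
)

# installed.packages() returns a matrix whose row names are package names.
available <- expected %in% rownames(installed.packages())

# Collect the result in a small table and list anything that is absent.
status <- data.frame(package = expected, installed = available)
print(status)

not_found <- expected[!available]
if (length(not_found) > 0) {
  message("Not installed: ", paste(not_found, collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

Running the same cell on a local machine and on the DSVM is a quick way to compare the two environments before moving course material over.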

 

Getting started with R on DSVMs: a guided tutorial

But how can you start using R on DSVMs in your course or lab to perform data science tasks? Let's go through all the steps you'll need to create a DSVM on Azure and run an R Jupyter notebook.

 

1. First of all, you'll need an Azure subscription. If you don't have one yet, have a look at how to sign up for a free trial or at the offers dedicated to students.

2. Sign in to the Azure Portal and search for "data science virtual machine". Choose one of the resulting offerings by clicking on it. For this tutorial we will use Data Science Virtual Machine - Ubuntu 20.04.


3. Choose a resource group and the name of the VM you want to create, as well as the Azure subscription to which the machine will be billed. Select the datacenter region closest to your physical location and, for quicker setup, select "Password" as the authentication type. Then specify the username and password you'll use to log in to your virtual machine.


 

Click on Review + create and wait until the deployment has completed successfully.

4. There are different ways to access your DSVM. One of these is JupyterHub, a multi-user Jupyter server. To connect, open a web browser on your local machine and navigate to https://your-vm-ip:8000, replacing "your-vm-ip" with the IP address shown in the Overview section of your resource.


 

5. At this point, you can sign in using the credentials you specified when creating the resource.


6. You're now ready to start coding in R. You may browse the many sample notebooks that are available or you can create a new notebook by clicking on the R kernel button.


 

If you want more R code examples on data analysis and machine learning, have a look at the exercise units of this MS Learn path: Create machine learning models with R and tidymodels.

7. Remember to shut down your machine when you are not using it.
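
Once the notebook from step 6 is open, a first cell might look like the following. This is only an illustrative sketch, not taken from the DSVM sample notebooks: it uses base R and the built-in `mtcars` dataset, so it should not depend on any of the bundled packages, and the choice of model and variables is arbitrary.

```r
# Load a dataset that ships with R, so no download is needed.
data(mtcars)

# Inspect the structure and a few summary statistics.
str(mtcars)
print(summary(mtcars[, c("mpg", "wt", "hp")]))

# Fit a simple linear regression: fuel efficiency explained by
# weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit))

# Compare observed values with the model's fitted values.
plot(
  fitted(fit), mtcars$mpg,
  xlab = "Fitted mpg", ylab = "Observed mpg",
  main = "Linear model on mtcars"
)
abline(0, 1, lty = 2)  # dashed reference line where fitted equals observed
```

If the summary, the coefficients and the plot all appear below the cell, the R kernel is working and the notebook is ready for real coursework.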

 

Note that if you are an educator and you want to use DSVMs for your R course, you can choose between providing all your students with a single shared DSVM, by sharing the credentials within the class, and providing every student with their own DSVM. This lets you find the right trade-off between cost and flexibility.

 

Keep on learning

The example above covers only one of the possible functionalities enabled by DSVMs. You can also open a session in an interactive R console or code within RStudio, which is pre-installed on the VM. In addition, you can use other Azure services for data storage and modeling, and you can share code with your team by using GitHub and the pre-installed Git clients: Git Bash and Git GUI. Find more guidance in the DSVM documentation.

 

 

Version history
Last update:
‎May 20 2022 06:37 AM