One of the most distinctive features of Data Science, as opposed to working with databases, Business Intelligence or other data professions, is its heavy use of statistical methods. From the earliest days of computer science, programs and algorithms were created to handle the large number of calculations that statistics requires.
One of those implementations was the “S” programming language, invented in the mid-1970s. Based on those concepts, the “R” environment was created by Ross Ihaka and Robert Gentleman in New Zealand and released under the GNU General Public License. Interestingly, it’s written in C, Fortran, and R itself. It’s one of the premier languages and environments you can use in Data Science. It has amazing language breadth, and it can be extended through the use of “packages”. There are so many packages available that your first task in using R, it seems, is learning what is already written so you can simply leverage it.
In future entries we’ll explore working with R, but for now, we need to install it. That isn’t difficult, but it does bring up something we need to deal with first. While the R environment is truly amazing, it has some limitations. Its most glaring issue is that the data you want to work with is loaded into memory as a data frame, which of course limits the amount of data you can process for a given task. It’s also not terribly well-suited for parallelism; many things are handled as in-line tasks. And if you use a package in your script, you have to ensure that others load that package, and at the right version.
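Once R is installed, the in-memory and package-version points can be checked from the console. The following is a minimal sketch rather than anything specific to Microsoft R: the data frame and its column names (id, value) are made up for illustration, and it uses only functions that ship with base R.

```r
# Hypothetical example data: 100,000 rows built entirely in memory.
set.seed(42)
df <- data.frame(
  id    = seq_len(100000),
  value = rnorm(100000)
)

# object.size() estimates how much memory the data frame occupies.
size_bytes <- as.numeric(object.size(df))
print(format(object.size(df), units = "Mb"))

# Report the R version and the version of a package the script relies on,
# so that others can confirm they are running the same ones.
# "stats" is used here only because it ships with every R installation.
stats_version <- as.character(packageVersion("stats"))
print(R.version.string)
print(stats_version)
```

Because the whole data frame lives in RAM, the reported size grows with the row count, which is the practical limit described above. Printing the versions at the top of a shared script is one simple way to make a mismatch visible early.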
When Microsoft acquired Revolution R, we updated and renamed it to Microsoft R, so that’s what we’ll install. Microsoft R is based on, and is 100% compatible with, open-source R, which is one of the most widely used statistical software packages in the world. It's fully compatible with all packages, scripts and applications that work with R. It also includes additional capabilities for improved performance and reproducibility.
Note that the server versions of this product have also been rewritten and rebranded as Microsoft Machine Learning Services, which can be installed stand-alone or alongside SQL Server. This server also includes a Python distribution, along with several library and other enhancements. Read more about that here.
Installing Microsoft R Open
There are two versions of Microsoft R: one is "MRE" (Microsoft R for Enterprise) and the other is "MRO" (Microsoft R Open). MRO is the free installation we'll deal with in this article. We’ll install on Windows, but Ubuntu, Red Hat, and SUSE Linux are also supported.
For this installation, we can take all the defaults. I did add the icon to the quick launch area, since I plan to be in R quite a bit. After we install the main MRO package, we’ll also want the enhanced math libraries. You can find them on the download site, just to the right of the installer for the Windows package we just launched. Click that “MKL” link, and once again, take all the defaults.
Exploring the Tools
Once installed, you'll find a new MRO folder, with an icon for the Microsoft RGUI. It's a rather primitive graphical tool for working with R, but it's fine to work with at the start.
Opening that tool brings us to the R Console, with MRO loaded up and ready. We can now run some simple commands, like these: