First published on MSDN on Apr 19, 2017
MicrosoftML is a new package for Microsoft R Server that extends it with state-of-the-art algorithms and data transforms. The MicrosoftML package was already available in Microsoft R Server for Windows and in SQL Server vNext. Now we are bringing the power of these algorithms to Spark and Hadoop.
On a Hadoop/Spark cluster, training runs in parallel on the worker nodes using ensembling. Ensembling can also be used to combine multiple models; that will be the subject of a future blog post.
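The ensembling idea can be sketched in a few lines. This is a minimal illustration written in Python, not MicrosoftML's implementation, and it rests on several assumptions: the rows are dealt into one disjoint partition per worker, threads stand in for worker nodes, a one-feature least-squares line replaces the boosted trees that rxFastTrees() actually builds, and members are combined by averaging their predictions (an assumed combining rule). The helper names `partition`, `fit_line`, `train_ensemble`, and `predict` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fit_line(rows):
    """Stand-in base learner: ordinary least squares fit of y = a*x + b on one partition."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def partition(rows, k):
    """Deal rows round-robin into k disjoint partitions, like data spread over k workers."""
    return [rows[i::k] for i in range(k)]


def train_ensemble(rows, num_models):
    """Fit one base model per partition in parallel; threads stand in for worker nodes."""
    parts = partition(rows, num_models)
    with ThreadPoolExecutor(max_workers=num_models) as pool:
        return list(pool.map(fit_line, parts))


def predict(models, x):
    """Combine the members by averaging their predictions (assumed combining rule)."""
    return mean(a * x + b for a, b in models)


if __name__ == "__main__":
    data = [(x, 2 * x + 1) for x in range(100)]
    models = train_ensemble(data, num_models=4)
    print(predict(models, 10))  # 21.0
```

Because each member sees only its own partition, the members can be trained independently of one another, which is what allows the work to be spread across a cluster.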
Below is an example of regression with rxFastTrees() in Spark.
In this example, we will:
- Create a connection to Spark using rxSparkConnect().
- Create a text data source from AirlineDemoSmall.csv, which you should already have copied to the HDFS directory /user/RevoScaleR/<your username>.
- Build an ensemble model with rxFastTrees(). To train on Spark, you must pass an ensemble object to rxFastTrees(); the Spark cluster then builds the ensemble's member models in parallel.
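What an ensemble object might control can be illustrated with another hedged Python sketch. Assuming the object specifies how many member models to train and how each member's training rows are chosen, the hypothetical `EnsembleSettings` class and `assign_rows` function below show two common strategies: a disjoint split, where every row goes to exactly one member, and bootstrap-style sampling with replacement. The names and fields are illustrative only and are not the MicrosoftML signature; check the package reference for the real ensemble arguments and their defaults.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EnsembleSettings:
    """Hypothetical ensemble settings; field names are illustrative, not MicrosoftML's."""
    num_models: int
    sample_with_replacement: bool = False
    sample_rate: float = 1.0
    seed: Optional[int] = None


def assign_rows(n_rows: int, settings: EnsembleSettings) -> List[List[int]]:
    """Return, for each ensemble member, the row indices it would train on."""
    k = settings.num_models
    if not settings.sample_with_replacement:
        # Disjoint split: rows are dealt round-robin, so each row goes to exactly one member.
        return [list(range(i, n_rows, k)) for i in range(k)]
    # Bootstrap-style: each member draws its own sample of row indices with replacement.
    rng = random.Random(settings.seed)
    size = max(1, int(n_rows * settings.sample_rate))
    return [[rng.randrange(n_rows) for _ in range(size)] for _ in range(k)]


if __name__ == "__main__":
    print(assign_rows(10, EnsembleSettings(num_models=3)))  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

A disjoint split keeps each worker's share of the data small, while sampling with replacement gives members overlapping, randomized views of the data; either way the members remain independent, so they can be trained in parallel.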
For a comprehensive overview of the capabilities in Microsoft R Server 9.1, see this blog post.
Authors: Regi John and Premal Shah