First published on MSDN on Aug 07, 2017
R Server on HDInsight cluster allows R scripts to use Spark and MapReduce to run distributed computations. You can develop a model and operationalize it to make predictions by configuring the edge node as One-Box. We provide One-Click Deploy ARM Templates that create an R Server on HDInsight cluster with the edge node configured as One-Box. If you already have an R Server on HDInsight cluster, you can configure operationalization using the Admin Utility.
Once you have configured Operationalization, you can use the mrsdeploy package from Microsoft R Client on your local machine to connect to Operationalization on the edge node and start using features such as remote execution and web services.
This article focuses on how to connect to the Operationalization feature, depending on whether or not your cluster is set up on a virtual network.
R Server cluster on a virtual network
- Create a virtual network (VNET) and subnet before creating the HDI cluster. The VNET and the HDI cluster must be created in the same resource group.
- When defining the HDI cluster in the Azure Portal, go to “Advanced Settings” and specify the VNET and subnet that you created.
- Once the cluster is created, select the resource group that was used, and then select the virtual network that was created.
- A list of the attached devices is shown. Select the edge node from the list, and then select IP configurations.
- Select the default IP configuration assigned to the edge node, and change “Public IP Address” from disabled to enabled.
- Create a new public IP address and select it. Click the “Save” button to save the changes.
- Go back to the edge node and select “network security group”. It should show “none”. Select the network security group, and add an “inbound security rule” that opens port 12800.
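The inbound rule from the last step can also be added from the command line instead of the portal. The following is a minimal sketch using the Azure CLI, not the method this article was written around. It assumes that `az` is installed and logged in, and that a network security group already exists and is attached to the edge node's network interface. The function name, rule name, and priority are hypothetical and chosen only for illustration.

```shell
# Sketch only: add an inbound rule for port 12800 to an existing NSG.
# Arguments: $1 = resource group name, $2 = NSG name (your own values).
# The rule name and priority below are hypothetical; adjust as needed.
open_port_12800() {
  az network nsg rule create \
    --resource-group "$1" \
    --nsg-name "$2" \
    --name AllowOperationalization12800 \
    --priority 1000 \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --destination-port-ranges 12800
}

# Usage (placeholder names):
#   open_port_12800 my-resource-group my-edge-node-nsg
```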
Now the edge node has a public IP address and port 12800 is open. You can connect to it using remoteLogin() from the mrsdeploy package like this:
library(mrsdeploy)
remoteLogin(
deployr_endpoint = "http://<EDGE NODE PUBLIC IP>:12800",
username = "admin",
password = "xxxxxxx"
)
R Server cluster not set up on a virtual network
If your cluster is not set up on a VNET, you can use SSH port forward tunneling to connect to port 12800 of the edge node.
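As a rough illustration of SSH port forward tunneling, the sketch below builds an `ssh` command that forwards local port 12800 to port 12800 on the edge node. The helper name `tunnel_cmd`, the SSH user, and the host name are placeholders rather than values from this article; use the SSH user and the edge node SSH host name of your own cluster. While the tunnel is open, the deployr_endpoint for remoteLogin() would be http://localhost:12800.

```shell
# Hypothetical helper: print the ssh command that forwards local port 12800
# to port 12800 on the edge node.
#   $1 = SSH user name, $2 = SSH host name of the edge node (placeholders)
# -N: do not run a remote command; -L: local port forwarding.
tunnel_cmd() {
  printf 'ssh -N -L 12800:localhost:12800 %s@%s\n' "$1" "$2"
}

# Usage (placeholders; replace with your own values):
#   tunnel_cmd sshuser edge-node.example.com
#   # copy and run the printed command, and keep it running while you work
```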
The SSH port forward tunneling method might be unworkable if you are invoking the Operationalization feature from Azure Data Factory, Azure Functions, Azure API Management, etc. In this scenario, you can use ngrok to expose port 12800 on the edge node to the internet.
SSH into the edge node and run the following commands to install ngrok and open port 12800:
wget https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
unzip ngrok-stable-linux-amd64.zip
- Explore the ngrok commands and check the version (2.2.8):
./ngrok help
./ngrok version
- Expose port 12800 over HTTP. The following command starts a background process that exposes port 12800 to the internet through a public URL. Logs, including all incoming HTTP requests, are written to the ngrok.log file.
./ngrok http 12800 --log=stdout > ngrok.log &
- You can find this public URL using the following command:
curl http://localhost:4040/api/tunnels | json_pp
- The above command prints JSON. Find the value of the “public_url” property, which looks like this:
"public_url" : "https://fea660ec.ngrok.io"
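If you want to pick out the URL in a script rather than by eye, one possible approach is sketched below. It assumes only standard `grep`, `head`, and `sed`. The function name `extract_public_url` is hypothetical. Because the API may list more than one tunnel, the sketch takes the first https entry it finds.

```shell
# Hypothetical helper: read the JSON from ngrok's /api/tunnels on stdin
# and print the first https public_url it contains.
extract_public_url() {
  grep -o '"public_url" *: *"https://[^"]*"' \
    | head -n 1 \
    | sed 's|.*"\(https://[^"]*\)"$|\1|'
}

# Usage on the edge node while ngrok is running:
#   curl -s http://localhost:4040/api/tunnels | extract_public_url
```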
Now, from your application code, you can connect to the Operationalization feature using this public_url. Sample code to connect using the remoteLogin() function in the mrsdeploy package:
library(mrsdeploy)
remoteLogin(
deployr_endpoint = "https://fea660ec.ngrok.io",
username = "admin",
password = "xxxxxxx"
)