First published on MSDN on Feb 12, 2017
If you are running Microsoft R Server or Microsoft R Client from a Windows computer equipped with PuTTY, you can create a compute context that will run RevoScaleR functions from your local client in a distributed fashion on your Hadoop cluster. You use RxSpark to create the compute context, with additional arguments that specify your user name, the file-sharing directory where you have read and write access, the public-facing host name or IP address of your Hadoop cluster's name node or of an edge node that will run the master processes, and any additional switches to pass to the ssh command (such as the -i flag if you are using a ppk file for authentication).
RxSpark assumes that the directory containing the plink.exe command (PuTTY) is in your path. If it is not, you can specify the location of plink.exe using the sshClientDir argument.
In some cases, you may find that environment variables needed by Hadoop are not set in the remote sessions run on the sshHostname computer. This can happen when a different profile or startup script is read on ssh login. You can point to the appropriate profile script by setting the sshProfileScript argument of RxSpark (this should be an absolute path).
Here is an example of using a remote Spark compute context:
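A minimal sketch of such a compute context follows. The host name, the share directories under /var/RevoShare and /user/RevoShare, the /etc/profile script, and the PuTTY install directory are all assumptions for illustration, so replace them with the values for your own cluster and client machine.

```r
# Connection details. Every value below is a placeholder for illustration.
mySshUsername <- "sshuser"
mySshHostname <- "mycluster-ed-ssh.example.net"  # hypothetical edge-node host name
myKeyFile     <- "C:/Users/azureuser/Documents/privateKey.ppk"
mySshSwitches <- paste("-i", myKeyFile)          # extra switches passed to plink/ssh

# Directories where the user has read and write access (assumed layout).
myShareDir     <- paste("/var/RevoShare", mySshUsername, sep = "/")
myHdfsShareDir <- paste("/user/RevoShare", mySshUsername, sep = "/")

# RevoScaleR ships with Microsoft R Server and R Client; the guard lets the
# script load cleanly on machines where it is not installed.
if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  library(RevoScaleR)
  mySparkCluster <- RxSpark(
    hdfsShareDir     = myHdfsShareDir,
    shareDir         = myShareDir,
    sshUsername      = mySshUsername,
    sshHostname      = mySshHostname,
    sshSwitches      = mySshSwitches,
    sshProfileScript = "/etc/profile",           # assumed absolute path
    sshClientDir     = "C:/Program Files/PuTTY"  # assumed location of plink.exe
  )
  # Make the remote Spark cluster the active compute context.
  rxSetComputeContext(mySparkCluster)
}
```

The sshClientDir and sshProfileScript arguments can be omitted when plink.exe is already in your path and the default login profile sets the Hadoop environment variables.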
Using RClient 3.3.2/MRS 9.0.1, you might get the following error messages while running the above program:
Error in rxFindWindowsSsh("plink.exe", "ssh.exe", sshClientDir = "") : Could not find neither plink.exe nor ssh.exe.
plink: unknown option \"-p\"
We have released a hotfix to address the above error messages. The fix and its instructions are available for download.
A few other possible error messages and troubleshooting steps:
Error in checkSshReturnMessage(printOut) : Access denied. Please validate the host connection credential file.
Error in checkSshReturnMessage(printOut) : Host key verification failed. Please validate host connection using the specified ssh tool directly and try again.
Error: Fail to fetch remote MRS version; text output is: Unable to open connection:, Host does not exist
If you get the above error messages, ensure the following:
The public key is set properly in the /home/sshuser/.ssh/authorized_keys file. Check for extra spaces, newlines, or other stray characters and remove them.
The Key comment property of the public key does not have a space, since OpenSSH's SSH-2 key format contains no space for a comment.
sshuser has read access to /home/sshuser/.ssh/authorized_keys.
azureuser has read access to C:\Users\azureuser\Documents\privateKey.ppk.
Connection to the edge node is successful using PuTTY.
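Two of the checks above can be scripted from the client side. This is a rough sketch rather than part of the product: plinkTestArgs is a hypothetical helper, and the host name is a placeholder. It tests whether the current Windows user can read the private key, then runs plink directly with the same key.

```r
# Hypothetical helper: build the argument vector for a one-off plink test.
# Paths containing spaces would need shQuote() before being passed on.
plinkTestArgs <- function(user, host, keyFile) {
  c("-ssh", "-batch", "-i", keyFile, paste0(user, "@", host), "echo connected")
}

keyFile <- "C:/Users/azureuser/Documents/privateKey.ppk"
args <- plinkTestArgs("sshuser", "mycluster-ed-ssh.example.net", keyFile)

# file.access() returns 0 when the current user can read the file (mode = 4).
keyReadable <- file.access(keyFile, mode = 4) == 0

# Only attempt the connection when plink.exe is in the path and the key is readable.
plinkPath <- Sys.which("plink.exe")
if (nzchar(plinkPath) && keyReadable) {
  status <- system2(plinkPath, args)
  message("plink exit status: ", status)
} else {
  message("Skipping connection test: plink.exe or a readable key file was not found.")
}
```

The -batch switch stops plink from prompting, so a host key that has not yet been accepted shows up as a failure instead of a hung session. In that case, connect once interactively with PuTTY to accept the host key and run the test again.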