Using Multiple R servers for distributed processing

The purpose of this post is to demonstrate how to use R servers in a distributed manner, i.e. how to access multiple R servers simultaneously for distributed processing and better performance.

We have used Java and R integration for this approach.

To call R on multiple servers from our Java code, we used the following –
– The Rserve package for R.
– The REngine and RserveEngine JARs, added to the Java project (you can download these JARs from http://www.rforge.net/Rserve/files/).
– Eclipse IDE for Windows.
– Amazon EC2 machines (running Linux) with R installed on them. (For setting up the EC2 machines you can refer to http://www.louisaslett.com/RStudio_AMI/)

 

Brief description of the Rserve package

Rserve is an R package that allows other applications to talk to R over TCP/IP. It creates a socket server to which other
applications can connect. Rserve provides client implementations for common languages such as C/C++, Java and PHP.
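
To make the "socket server" idea concrete, here is a minimal sketch in plain Java (no Rserve JARs needed) that opens a TCP connection and reads the first 32 bytes the server sends. As far as I know from the Rserve protocol documentation, a running Rserve answers every new connection with a 32-byte ID string that normally begins with "Rsrv". The class name RserveBannerProbe and its two methods are made up for this illustration and are not part of any Rserve client.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the Rserve client JARs.
class RserveBannerProbe {

    // Connects to host:port and returns the first 32 bytes sent by the server as ASCII text.
    static String readBanner(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // do not block forever if the server stays silent
            byte[] id = new byte[32];
            new DataInputStream(socket.getInputStream()).readFully(id);
            return new String(id, StandardCharsets.US_ASCII);
        }
    }

    // Assumption: an Rserve ID string starts with the four characters "Rsrv".
    static boolean looksLikeRserve(String banner) {
        return banner != null && banner.startsWith("Rsrv");
    }
}
```

Calling RserveBannerProbe.readBanner("xxx.xxx.xxx.xxx", 6311, 3000) against one of your machines is a quick way to see whether something that looks like Rserve is answering, before involving the real client.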

 

Features of Rserve are –

- The default port for Rserve is 6311.
- R objects are transferred between the server and the client as binary objects. Files can also be transferred between the client and the server.
- Each connection has its own working directory and namespace.
- The client converts R objects to Java objects. The Java client has classes such as RBool, RList etc.
- Multiple connections can access Rserve in a thread-safe manner. However, multiple eval calls on the same connection are not thread-safe.
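
The last point matters as soon as threads are involved. The simplest approach is to give every thread its own connection. If one connection has to be shared, the eval calls on it should be serialized. Below is a minimal sketch of that second option. RCodeRunner is a hypothetical interface standing in for the one call we need from a connection, and SerializedRunner is a made-up wrapper, so neither name comes from the Rserve client.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for "evaluate this R code on a connection".
interface RCodeRunner {
    String eval(String code) throws Exception;
}

// Wraps one shared runner so that only one eval is in progress on it at any time.
class SerializedRunner implements RCodeRunner {
    private final RCodeRunner delegate;
    private final ReentrantLock lock = new ReentrantLock();

    SerializedRunner(RCodeRunner delegate) {
        this.delegate = delegate;
    }

    @Override
    public String eval(String code) throws Exception {
        lock.lock(); // other threads wait here until the current eval finishes
        try {
            return delegate.eval(code);
        } finally {
            lock.unlock(); // always release, even if the eval failed
        }
    }
}
```

With the real client, the delegate could be a lambda such as code -> connection.eval(code).asString(), assuming connection is an open RConnection. The price of sharing is that the calls run one after another, which is why one connection per thread is usually the better fit for parallel work.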

 

STEPS –

On every machine where R is installed and which you want to access remotely (for distributed access) from your Java application, you have to set up R in the following manner:

A. Setting up R

1. First you need to install the Rserve package from the R console. For this, connect to the EC2 machine with the PuTTY client.
2. After logging in to the EC2 machine, type R in the PuTTY terminal and the R console will open in it.
3. Now install the Rserve package. The command for it is: install.packages('Rserve')
4. After the package has installed successfully, load it in R with the command: library('Rserve')
5. Before we start Rserve on our machine, we have to allow remote access to it. For that, create a file named Rserv.conf in the /etc folder of your Linux EC2 machine and write this one line in it: remote enable
6. Now start Rserve by typing the command Rserve() in the R console. Rserve will start running as a daemon.

B. Creating a java project to access R Servers

We used Windows and the Eclipse IDE to create the Java application that accesses R. The steps are:
1. First create a new Java project in a workspace.
2. Right-click the project in the Package Explorer, go to "Build Path -> Add External Archives" and select both JARs, REngine and RserveEngine.
Both of these will be added to the build path of your Java project.
3. Now you can create a sample program like the one below.

I have created two classes and used multithreading to access two machines running R servers simultaneously:

// MultiRCaller.java
public class MultiRCaller {

    public static void main(String[] args) {
        // Use the public IP of your first EC2 machine as the argument.
        NodeServer n1 = new NodeServer("xxx.xxx.xxx.xxx");
        n1.start(); // starts the worker thread for this machine
        // Use the public IP of your second EC2 machine as the argument.
        NodeServer n2 = new NodeServer("xxx.xxx.xxx.xxx");
        n2.start();
    }
}


// NodeServer.java
import org.rosuda.REngine.REXP;
import org.rosuda.REngine.Rserve.RConnection;

public class NodeServer implements Runnable {
    private final String ipAddress;
    private Thread t;

    NodeServer(String ip) {
        ipAddress = ip;
    }

    @Override
    public void run() {
        RConnection c = null;
        try {
            // Connect to Rserve on the given machine (default port).
            c = new RConnection(ipAddress);
            System.out.println("Connection established successfully with :" + ipAddress);
            REXP x = c.eval("getwd()"); // get the working directory path
            System.out.println("Working Directory: " + x.asString());

            // You can do your tasks here.
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println(e.getMessage());
        } finally {
            if (c != null) {
                c.close(); // release the connection to Rserve
            }
        }

        System.out.println("Operation with " + ipAddress + " completed so, exiting.");
    }

    // Starts the work for this machine on its own thread (only once).
    public void start() {
        if (t == null) {
            t = new Thread(this, ipAddress);
            t.start();
        }
    }
}
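
The NodeServer threads only print what they get back. If the calling code needs the results, one alternative is to submit the per-machine work to a thread pool and collect the return values. The following is a minimal sketch of that idea, not the approach used in this post. HostTask and ParallelHosts are hypothetical names introduced here, and the Rserve part is shown only in a comment so that the sketch compiles without the JARs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical: the work to do against one machine, returning a text result.
interface HostTask {
    String runOn(String host) throws Exception;
}

class ParallelHosts {

    // Runs the task against every host in parallel and returns host -> result,
    // in the same order as the input list.
    static Map<String, String> runOnAll(List<String> hosts, HostTask task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, hosts.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String host : hosts) {
                Callable<String> job = () -> task.runOn(host);
                futures.add(pool.submit(job));
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < hosts.size(); i++) {
                // get() waits for that host's job and rethrows its failure, if any
                results.put(hosts.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}

// With the Rserve JARs on the build path, the task could look like this
// (an assumption, mirroring what NodeServer.run() does):
//   HostTask task = host -> {
//       RConnection c = new RConnection(host);
//       try { return c.eval("getwd()").asString(); } finally { c.close(); }
//   };
```

Each job opens its own connection, which keeps this in line with the thread-safety note in the feature list: connections are never shared between threads.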

On running this project as a Java application, you will get output similar to the following in the console:

OUTPUT:
Connection established successfully with :"public IP 1"
Connection established successfully with :"public IP 2"
Working Directory: /tmp/Rserv/conn1292
Operation with "public IP 1" completed so, exiting.
Working Directory: /tmp/Rserv/conn1395
Operation with "public IP 2" completed so, exiting.

 

All of these steps were carried out on the Amazon EC2 cloud. So that the Java client can reach Rserve, add a rule to the security group of each EC2 machine. The suggested settings are –
1. Go to your EC2 Management Console on AWS and select your EC2 instance.
2. Go to the "Security Groups" entry in the description below.
3. Click on the security group and then edit it to add rules for inbound and outbound traffic.
4. Add a new rule with these specs:
Type: All traffic
Protocol: All
Port Range: 0-65535
Source: Anywhere
5. Then click Save.
Since Rserve listens on TCP port 6311 by default, a narrower inbound rule for just that port should also be enough.
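
To check that the security group rule actually lets traffic through, you can try a plain TCP connection from the client machine before running the full project. Here is a minimal sketch using only the Java standard library. The class name RservePortCheck is made up for this example, and 6311 is the default Rserve port mentioned earlier.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration only.
class RservePortCheck {
    static final int DEFAULT_RSERVE_PORT = 6311;

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }
}
```

For example, RservePortCheck.isReachable("xxx.xxx.xxx.xxx", RservePortCheck.DEFAULT_RSERVE_PORT, 3000) returning false would suggest that either Rserve is not running with remote access enabled or the security group is still blocking the port.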
