RStudio - How to use a local IDE for R to crunch data in R on Gearshift
RStudio is a free, open-source Integrated Development Environment (IDE) for R. It lets you work in a Graphical User Interface (GUI) in addition to a terminal, which makes interactive work with data easier, especially with graphs and plots, because you can view them directly. If you work only on the command line on the cluster, you have to create the plot there, copy the image file to your local desktop or laptop, and then open it on your local computer.
Note that there is also a paid commercial version of RStudio with extra bells and whistles, but the free version should suffice to work with R on Gearshift.
There are several flavors of RStudio and ways to work with them, but we can only support one of them.
How to NOT use RStudio
Do not run the normal a.k.a. non-server version of RStudio on Gearshift while sending the graphical (X11) output to your local client computer. It may "work", but the lag in the GUI will result in a very frustrating experience.
Do not run RStudio server on Gearshift, which is accessed using your local web browser. This is a security risk due to the weak security implementation in RStudio server. Therefore we will not install the server edition.
How to use RStudio
You can use RStudio by:
- Running the normal a.k.a. desktop a.k.a. non-server version of RStudio on your local computer and
- using the remoteR R-package to create an R-session on Gearshift and
- offloading the data processing from your local computer to Gearshift via the remote R-session created with remoteR.
This setup will ensure:
- The original data sets can stay on Gearshift
- Communication between your local client computer and Gearshift is encrypted via the SSH protocol.
- Code to crunch data can be written on your local client and will be sent to Gearshift for analysis of the data sets.
- Result files also stay on Gearshift.
Data access policies may differ, but in most cases it is allowed to download aggregate results to your local client. E.g. if you create a bar plot of the median of variable X for different cohorts, this will not contain any data for an individual sample. Using RStudio to look at such a plot would be compatible with such a data access policy.
Note that if you include measurements for individual samples in the plot, e.g. you include outliers and label them with sample IDs, then you may not be allowed to download that plot to your local computer.
1. Install software on your own computer
- Install a set of compilers and standard libraries, so you can compile R packages from source code
- For Linux: consult the documentation of your distro.
- For Windows: install Rtools
- For macOS: install Xcode either from the App Store or using the xcode-select --install command in the Terminal.
2. Login on Gearshift
3. Create a screen or tmux session on Gearshift
Optionally start a screen or tmux session.
A detailed explanation of screen or tmux is beyond the scope of this documentation, but using one of them is highly recommended.
It allows you to disconnect from Gearshift while leaving your session running,
so you can re-login and re-connect later.
Without screen or tmux a dropped network connection will result in losing the session and losing any unsaved work in it.
There is a good article How to Use Linux’s screen Command on the How-To Geek website.
Use the -S argument to give your screen session a name. E.g. to create one named rstudio:
screen -S rstudio
4. Create an interactive Slurm session on Gearshift
See Crunch - How to manage jobs on Gearshift for details.
Simple example requesting a single core and 1 GB of RAM for at most one hour:
srun --cpus-per-task=1 --mem=1gb --nodes=1 --qos=interactive --time=01:00:00 --pty bash -i
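If you regularly need different resources, the command above can be wrapped in a small shell function. This is only a minimal sketch: the function name interactive_cmd and its three arguments are hypothetical, it assumes the same interactive QoS as the example, and it only prints the command so you can inspect it before running it. Whether a larger request is accepted depends on the limits of the QoS.

```shell
# Hypothetical helper: print the srun command for an interactive session.
# Arguments (all optional): cores, memory (e.g. 1gb), time limit (HH:MM:SS).
# Defaults match the simple example from this documentation.
interactive_cmd() {
    printf 'srun --cpus-per-task=%s --mem=%s --nodes=1 --qos=interactive --time=%s --pty bash -i\n' \
        "${1:-1}" "${2:-1gb}" "${3:-01:00:00}"
}

# Print the command for 2 cores and 4 GB of RAM with the default time limit.
interactive_cmd 2 4gb
```

Copy the printed command into your terminal on Gearshift to start the session.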
5. Load and start R in your interactive Slurm session on Gearshift
module load RPlus
R
If you loaded RPlus in the previous step, this contains a bare R and a large list of R packages including remoteR and its dependencies.
If you prefer to load a bare R installation and manage your own list of R packages, you will need to install remoteR and its dependencies in your R session yourself using:
install.packages('sodium')
# Latest release no longer available from CRAN.
#install.packages('remoter')
# Workaround: install latest release from GitHub
install.packages('remotes')
library('remotes')
remotes::install_github('wrathematics/argon2', ref = github_release())
remotes::install_github('RBigData/remoter', ref = github_release())
When you are asked whether you would like to install in your own library folder, answer yes.
6. Start a remoteR server session in your R session on Gearshift
Generate a random password to secure your remoteR session and start remoteR in server mode.
library('sodium')
library('remoter')
remoter_session_password <- rawToChar(as.raw(sample(c(65:90, 97:122), 10, replace = TRUE)))
message(sprintf('Your password for remoteR is: %s', remoter_session_password))
remoter::server(verbose = TRUE, password = remoter_session_password, secure = TRUE)
The remoteR package will now create a session and report on which machine it is running and which port it selected.
Each session needs its own unique port and remoteR will automatically select one that is free.
You will need this port number and the machine name later on when creating the SSH tunnel.
[2023-05-10 12:05:47]: *** Launching secure server ***
                       Hostname: gs-vcompute07
                       Port: 55555
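If you copy the start-up message to your own computer, the hostname and port can be pulled out of it with standard tools instead of retyping them. The sketch below assumes the message looks like the sample above; the variable names log, host and port are made up for illustration. It prints the tunnel command in the general syntax used in step 7 for Linux and macOS clients.

```shell
# Sample start-up message as reported by the remoteR server (assumed layout).
log='[2023-05-10 12:05:47]: *** Launching secure server ***
                       Hostname: gs-vcompute07
                       Port: 55555'

# Extract the word after "Hostname:" and the digits after "Port:".
host=$(printf '%s\n' "$log" | sed -n 's/.*Hostname: *\([^ ]*\).*/\1/p')
port=$(printf '%s\n' "$log" | sed -n 's/.*Port: *\([0-9][0-9]*\).*/\1/p')

# Print the matching SSH tunnel command, using the same port on both sides.
echo "ssh -N -L localhost:${port}:localhost:${port} airlock+${host}"
```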
Optionally, in case you are running this inside screen: press CTRL+a followed by CTRL+d to detach from the screen session. The R session inside the interactive Slurm job inside the screen session will continue to run in the background.
You can reconnect to the existing screen session using the -r argument and the name of the session:
screen -r rstudio
7. Create an SSH tunnel on your own computer
Now we need to create an SSH tunnel from your local client computer to the server and connect to the remote R session.
For Windows clients
The instructions below assume you use MobaXterm to connect to Gearshift as described in SSH config and login to UI via Jumphost for users on Windows.
- 1: Select MobaSSHTunnel from the Tools menu.
- 2: Click the New SSH tunnel button.
- In the popup window fill in / select:
- 3: Local port forwarding
- 4: Remote server field: The hostname of the compute node where remoteR is running.
- 5: Remote port field: The port number chosen by remoteR on the server side.
- 6: SSH server field: Use jumphost address airlock.hpc.rug.nl
- 7: SSH login field: Use your account name as you received it by email from the helpdesk.
- 8: SSH port field: 22
- 9: Forwarded port field: The port number you chose on the client side.
For the Forwarded port you have to choose a free port yourself, that is, a port that is not yet used by another process. We suggest you simply try the same number as the one that remoteR selected on the server side. If that one does not work because it is already taken, increment it by one and retry until you find one that is free.
- 10: Click the Save button.
- 11: Give the tunnel config a name.
- 12: Click the key icon to select your private key file.
- 13: Click the play button to start the SSH tunnel.
For Linux & macOS clients
Use the ssh command in a terminal to create an SSH tunnel from your local machine via jumphost airlock to the Gearshift compute node on which remoteR is running in your interactive Slurm session.
The general syntax is
ssh -N -L localhost:<port_number_on_client>:localhost:<port_number_on_server> airlock+<hostname_of_remoter_server>
The example below uses the same port number on the client side as the one chosen by remoteR on the server side.
ssh -N -L localhost:55555:localhost:55555 airlock+gs-vcompute07
If you get a message like this:
bind [127.0.0.1]:55555: Address already in use
channel_setup_fwd_listener_tcpip: cannot listen to port: 55555
Could not request local forwarding.
then the selected port on the client side is already in use by another process. Try the next port number until you find one that is free; once the tunnel is established you will not get any message.
ssh -N -L localhost:55556:localhost:55555 airlock+gs-vcompute07
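The trial-and-error search for a free client port can also be scripted. The sketch below is one possible approach, not part of the official instructions: it assumes bash (it relies on bash's /dev/tcp pseudo-device), the helper names port_in_use and first_free_port are hypothetical, and the host and port are the sample values from this page. It only prints the ssh command; another process could still grab the port before you run it.

```shell
#!/usr/bin/env bash
# Succeed if something is already listening on the given port on localhost.
# Requires bash: /dev/tcp is a bash feature, not a real device.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port at or above the given one that nothing listens on.
first_free_port() {
    port="$1"
    while port_in_use "$port"; do
        port=$((port + 1))
    done
    echo "$port"
}

server_host=gs-vcompute07   # compute node reported by remoteR (sample value)
server_port=55555           # port reported by remoteR (sample value)
client_port=$(first_free_port "$server_port")

# Print the tunnel command with the free client port filled in.
echo "ssh -N -L localhost:${client_port}:localhost:${server_port} airlock+${server_host}"
```

Remember to use the printed client port, not the server port, when you connect from RStudio.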
8. Install remoteR in RStudio on your own computer
install.packages('sodium')
# Latest release remoteR no longer available from CRAN.
#install.packages('remoter')
# Workaround: install latest release from GitHub
install.packages('remotes')
library('remotes')
remotes::install_github('wrathematics/argon2', ref = github_release())
remotes::install_github('RBigData/remoter', ref = github_release())
9. Using the remote R session in RStudio on your own computer
In RStudio on your own computer you can now connect to the remote R session using:
library('sodium')
library('remoter')
remoter::client('localhost', port = port_number_on_client, password = '******')
Replace:
- port_number_on_client with the port number you chose for the SSH tunnel on the client side
- ****** with the password generated on the server side when starting the remoteR server
To make plots you have to call dev.off() after each plot you make.
This is the signal for the remoteR package to send your plot to your local computer.
The default resolution of the plots is quite low. You can improve this by increasing the width and height before each plot, for example:
rpng(width = 1000, height = 1000)
An example plot to test the plotting functionality can be created with
rpng(width = 1000, height = 1000)
dat <- data.frame(t = seq(0, 2*pi, by = 0.1))
xhrt <- function(t) 16*sin(t)^3
yhrt <- function(t) 13*cos(t) - 5*cos(2*t) - 2*cos(3*t) - cos(4*t)
dat$y <- yhrt(dat$t)
dat$x <- xhrt(dat$t)
with(dat, plot(x, y, type = "l"))
with(dat, polygon(x, y, col = "hotpink"))
points(c(10, -10, -15, 15), c(-10, -10, 10, 10), pch = 169, font = 5)
dev.off()
10. Cleaning up
Don’t keep R running forever on Gearshift! Make sure to really exit your session on Gearshift when you are done to prevent wasting resources, which is not nice for others waiting in the queue.
Use q() to quit the R session and CTRL+x to exit your interactive Slurm job.
If you used screen: first re-attach to your screen session if you were detached. Next, use q() to quit the R session and CTRL+x to exit your interactive Slurm job.
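To double-check that nothing is left running, you can list your Slurm jobs and screen sessions on Gearshift. This is a small sketch: the function name check_leftovers is hypothetical, and it only uses squeue -u and screen -ls, skipping a tool when it is not installed on the machine where you run it.

```shell
# Hypothetical helper: list anything of yours that may still be running.
check_leftovers() {
    if command -v squeue >/dev/null 2>&1; then
        # Jobs that are still queued or running under your account.
        squeue -u "$(id -un)"
    else
        echo "squeue not found: run this on Gearshift"
    fi
    if command -v screen >/dev/null 2>&1; then
        # screen -ls may exit non-zero (e.g. when there are no sessions),
        # so ignore its exit status.
        screen -ls || true
    else
        echo "screen not found"
    fi
}

check_leftovers
```

If an interactive job or a screen session named rstudio still shows up, re-attach and exit it as described above.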