Data transfers - How to move data to / from Gearshift
First, and independent of the technical options: make sure you are familiar with the code of conduct, terms and conditions, license or whatever it is called, and that you are allowed to upload/download a data set! When in doubt, contact your supervisor / principal investigator and the group/institute that created the data set.
Your options to move data to/from the Gearshift HPC cluster depend on the protocol you want to use for the upload/download:
- Push data from an external machine to the cluster UI via the jumphost, or
  pull data on an external machine from the cluster UI via the jumphost.
  Supported protocol:
  - SSH
- Push data from the cluster UI to an external server, or
  pull data on the cluster UI from an external server.
  Supported protocols:
  - SSH
  - HTTP(S)
1. Push to or pull from cluster UI via jumphost
- via GUI on Windows
- via GUI on macOS
- via the command line: see below for rsync over SSH
Using rsync over SSH
- You can transfer data with rsync over SSH, for example to copy files to your home dir on the cluster, with something like the command below.
  $your_client> rsync -av some_directory airlock+gearshift:
  Note the colon at the end of the rsync command:
  - Without the colon you would copy to a local file named airlock+gearshift instead.
  - If you do not specify a path after the colon you'll transfer data to the default location, which is your home dir.
- If you want the data to go elsewhere you'll have to specify where. E.g.:
  $your_client> rsync -av some_directory airlock+gearshift:/path/to/somewhere/else/
- Swap source and destination to pull data from the cluster as opposed to pushing data to the cluster.
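Because a forgotten colon silently turns the remote destination into a local file name, a small guard can help. The sketch below is only an illustration: is_remote_spec and push_to_cluster are hypothetical helper names, not part of rsync or of the cluster setup, and the RSYNC variable is an assumption added so that the rsync binary can be swapped out. The check mirrors the rule that rsync treats a destination as remote when it contains a colon before the first slash.

```shell
# Hypothetical helper (not part of rsync): succeed if the argument has a colon
# before any slash, which is when rsync treats it as a remote location.
is_remote_spec() {
    # Strip everything from the first slash onwards, then look for a colon.
    case "${1%%/*}" in
        *:*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Hypothetical wrapper: refuse to push when the destination lacks the colon.
# RSYNC may be set to override the rsync binary (an assumption of this sketch).
push_to_cluster() {
    if ! is_remote_spec "$2"; then
        echo "Refusing: '$2' has no colon, so rsync would write to a local path." >&2
        return 1
    fi
    "${RSYNC:-rsync}" -av "$1" "$2"
}
```

With these functions loaded, push_to_cluster some_directory airlock+gearshift: would start the transfer, while the same call without the trailing colon stops with an error instead of creating a local copy named airlock+gearshift.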
2. Push to or pull from another (SSH) server
Using rsync over SSH
When you log in from your local computer (via a jumphost) to a server of the Gearshift HPC cluster and then need to transfer data from Gearshift to another SSH server or vice versa, you will need:
- A private key on Gearshift and
- A corresponding public key on the other server.
To get a private key on Gearshift you can
- either create a new key pair on Gearshift
- or temporarily forward your private key with SSH agent forwarding to Gearshift
Configure SSH agent forwarding
First, configure SSH agent forwarding using one of:
Next, log in to Gearshift and verify that agent forwarding worked by executing the following command to list the identities (private keys) available to your SSH agent:
$gearshift> ssh-add -l
- You should get a response with at least one key fingerprint, which means you can now transfer data with rsync to/from the other server, assuming you have an enabled account with a public key on the other server and that no firewalls are blocking the connection.
- If instead you get "The agent has no identities" or "Could not open a connection to your authentication agent", then the key forwarding failed. This may happen when you were already logged in to the same server without agent forwarding in another active SSH session; make sure you log out from all Gearshift servers in all terminals and try to log in with agent forwarding again.
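The two outcomes above can also be told apart by the exit status of ssh-add, which may be handy in scripts. The following is a minimal sketch, assuming the exit statuses described in the OpenSSH ssh-add manual page (0 on success, 1 when the command fails, which for -l includes an agent without identities, and 2 when the agent cannot be contacted). The function names explain_agent_status and check_agent_forwarding are hypothetical.

```shell
# Hypothetical helper: translate the exit status of `ssh-add -l` into a hint.
explain_agent_status() {
    case "$1" in
        0) echo "ok: the agent has at least one identity" ;;
        1) echo "failed: the agent has no identities" ;;
        2) echo "failed: could not contact the authentication agent" ;;
        *) echo "unknown: unexpected exit status $1" ;;
    esac
}

# Hypothetical wrapper: run the check and print the matching hint.
check_agent_forwarding() {
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    explain_agent_status "$status"
}
```

Running check_agent_forwarding on Gearshift right after logging in prints one line; anything other than the "ok" line suggests logging out of all sessions and logging in again with agent forwarding.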
Transfer data with rsync
Once you have a private key on Gearshift and can log in to the other server using SSH, you can use rsync (over SSH) to pull data from the other server like this:
$gearshift> rsync -av your-account@other-server.some.domain:/path/to/source_folder /path/to/destination_folder/
Swap source and destination to push data to the other server as opposed to pulling data from the other server.
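To make the direction explicit and to allow a trial run before moving large data sets, the swap of source and destination can be wrapped in a small function. This is only a sketch: transfer is a hypothetical name, and the DRY_RUN and RSYNC variables are assumptions of this example. With DRY_RUN=1 it adds rsync's -n (--dry-run) option, which lists what would be copied without copying anything.

```shell
# Hypothetical helper: run rsync in the requested direction.
#   transfer pull REMOTE LOCAL   copies REMOTE to LOCAL
#   transfer push REMOTE LOCAL   copies LOCAL to REMOTE
# DRY_RUN=1 adds -n (--dry-run); RSYNC overrides the rsync binary.
# Both variables are assumptions of this sketch.
transfer() {
    local opts="-av"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        opts="-avn"
    fi
    case "$1" in
        pull) "${RSYNC:-rsync}" "$opts" "$2" "$3" ;;
        push) "${RSYNC:-rsync}" "$opts" "$3" "$2" ;;
        *)    echo "usage: transfer pull|push REMOTE LOCAL" >&2; return 2 ;;
    esac
}
```

For example, DRY_RUN=1 transfer pull your-account@other-server.some.domain:/path/to/source_folder /path/to/destination_folder/ would preview the pull shown above; dropping DRY_RUN=1 performs it.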
Using http(s)
For downloads on / uploads from Gearshift over http(s) you can use the command line tools curl or wget.
In case you want to pull from / push to a git repository you can use https URLs with the git command.
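As an illustration of a download with either tool, the sketch below uses curl when it is available and falls back to wget. The function name download is hypothetical, and the URL and file name in the usage note are placeholders, not real resources. Only common options are used: curl's --fail, --location and --output, and wget's --output-document.

```shell
# Hypothetical helper: download URL ($1) to FILE ($2),
# using curl if present and wget otherwise.
download() {
    if command -v curl >/dev/null 2>&1; then
        # --fail: return an error on HTTP errors; --location: follow redirects.
        curl --fail --location --output "$2" "$1"
    elif command -v wget >/dev/null 2>&1; then
        wget --output-document="$2" "$1"
    else
        echo "Neither curl nor wget was found in PATH." >&2
        return 127
    fi
}
```

A call could look like download https://example.org/some/data.tar.gz data.tar.gz, where both arguments are placeholders.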