Data transfers - How to move data to / from Gearshift

First, and regardless of the technical options: make sure you are familiar with the code of conduct, terms and conditions, or license (whatever it is called for the data set in question) and that you are allowed to upload/download the data set! When in doubt, contact your supervisor / principal investigator and the group/institute that created the data set.

Your options to move data to/from the Gearshift HPC cluster depend on the protocol you want to use for the upload/download:

  1. Push data from an external machine to the cluster UI via the jumphost or
    Pull data on an external machine from the cluster UI via the jumphost.
    Supported protocol:
    • SSH
  2. Push data from the cluster UI to an external server or
    Pull data on the cluster UI from an external server.
    Supported protocols:
    • SSH
    • HTTP(S)

1. Push to or pull from cluster UI via jumphost

Using rsync over SSH

  • You can use rsync over SSH to copy files to, for example, your home dir on the cluster with a command like the one below.

    $your_client> rsync -av some_directory airlock+gearshift:
    

    Note the colon at the end of the rsync command:

    1. Without the colon rsync treats airlock+gearshift as a local path and copies the data to a local directory with that name instead.
    2. If you do not specify a path after the colon you'll transfer data to the default location, which is your home dir.
  • If you want the data to go elsewhere you'll have to specify where. E.g.:

    $your_client> rsync -av some_directory airlock+gearshift:/path/to/somewhere/else/
    
  • Swap source and destination to pull data from the cluster as opposed to pushing data to the cluster.
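The push and pull variants above differ only in the order of source and destination. A minimal sketch of both as shell helper functions, assuming the airlock+gearshift host alias from the examples above is already configured in your SSH client. The function names and example paths are hypothetical and only serve as illustration:

```shell
# Hypothetical helpers; run these on your own machine, not on the cluster.

# Push a local file or directory to a path on the cluster.
# $1: local source, $2: destination path on the cluster (optional).
# Without a second argument the data ends up in your home dir.
push_to_gearshift() {
    rsync -av "$1" "airlock+gearshift:${2:-}"
}

# Pull a file or directory from the cluster to a local destination.
# $1: source path on the cluster, $2: local destination.
pull_from_gearshift() {
    rsync -av "airlock+gearshift:$1" "$2"
}

# Example usage (all paths are placeholders):
#   push_to_gearshift some_directory /path/to/somewhere/else/
#   pull_from_gearshift /path/to/somewhere/else/some_directory ./local_copy/
```

In both helpers the colon after the host alias is kept, so rsync always treats that argument as a remote location.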

2. Push to or pull from another (SSH) server

Using rsync over SSH

When you log in from your local computer (via a jumphost) to a server of the Gearshift HPC cluster and then need to transfer data from Gearshift to another SSH server or vice versa, you will need:

  1. A private key on Gearshift and
  2. A corresponding public key on the other server.

To get a private key on Gearshift you can

  • either create a new key pair on Gearshift
  • or temporarily make your private key available on Gearshift with SSH agent forwarding (the key itself stays on your local computer)

Configure SSH agent forwarding

First, configure SSH agent forwarding for the SSH client on your local computer.
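With an OpenSSH client, agent forwarding can be requested for a single session with the -A option; the persistent equivalent is the ForwardAgent yes option for the relevant host in ~/.ssh/config. A minimal sketch, assuming an OpenSSH client, a running SSH agent on your local computer and the airlock+gearshift host alias used elsewhere on this page. The function name is hypothetical, and other SSH clients configure agent forwarding differently:

```shell
# Hypothetical helper; run on your local computer.
# $1: host to log in to (optional, defaults to the airlock+gearshift alias).
gearshift_login_with_agent() {
    # Stop early if the local agent holds no keys:
    # in that case there is nothing to forward.
    if ! ssh-add -l >/dev/null 2>&1; then
        echo "No identities in the local SSH agent; load a key with ssh-add first." >&2
        return 1
    fi
    # -A enables agent forwarding for this session only.
    ssh -A "${1:-airlock+gearshift}"
}

# Example usage:
#   gearshift_login_with_agent
```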

Next, log in to Gearshift and verify that agent forwarding worked by executing the following command to list the identities (private keys) available to your SSH agent:

$gearshift> ssh-add -l

  • You should get a response with at least one key fingerprint, which means you can now transfer data with rsync to/from the other server, assuming you have an enabled account with your public key on the other server and that no firewalls are blocking the connection.
  • If instead you get "The agent has no identities" or "Could not open a connection to your authentication agent", then the key forwarding failed. This may happen when you were already logged in to the same server without agent forwarding in another active SSH session; make sure you log out from all Gearshift servers in all terminals and try logging in with agent forwarding again.
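The two outcomes above can also be told apart by the exit status of ssh-add. A minimal sketch, assuming the exit statuses documented for OpenSSH's ssh-add (0 on success, 1 if the command fails, which includes an agent without identities, and 2 if the agent cannot be contacted). The function name and messages are hypothetical:

```shell
# Hypothetical helper; run on Gearshift after logging in.
check_agent_forwarding() {
    ssh-add -l >/dev/null 2>&1
    rc=$?
    case "$rc" in
        0) echo "OK: the agent offers at least one identity" ;;
        1) echo "FAILED: the agent is reachable but has no identities" ;;
        2) echo "FAILED: could not contact the authentication agent" ;;
        *) echo "FAILED: unexpected ssh-add exit status $rc" ;;
    esac
    # Pass the ssh-add exit status on to the caller.
    return "$rc"
}

# Example usage:
#   check_agent_forwarding && echo "ready to transfer data"
```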
Transfer data with rsync

Once you have a private key on Gearshift and can log in to the other server using ssh, you can use rsync (over ssh) to pull data from the other server like this:

$gearshift> rsync -av your-account@other-server.some.domain:/path/to/source_folder   /path/to/destination_folder/

Swap source and destination to push data to the other server as opposed to pulling data from the other server.
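For the push direction the local path comes first and the remote location second. A minimal sketch, assuming the same placeholder account and server as in the pull example above; the function names are hypothetical. The preview variant uses rsync's --dry-run option, which only lists what would be transferred without copying anything:

```shell
# Hypothetical helpers; run on Gearshift.
# $1: local source path on Gearshift
# $2: remote destination in the form account@server:/path/

# List what would be transferred, without copying anything.
preview_push_to_other_server() {
    rsync -av --dry-run "$1" "$2"
}

# Perform the actual transfer.
push_to_other_server() {
    rsync -av "$1" "$2"
}

# Example usage (account, server and paths are placeholders):
#   preview_push_to_other_server /path/to/source_folder your-account@other-server.some.domain:/path/to/destination_folder/
#   push_to_other_server /path/to/source_folder your-account@other-server.some.domain:/path/to/destination_folder/
```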

Using http(s)

For downloads to / uploads from Gearshift over http(s) you can use the command-line tools curl or wget. If you want to pull from / push to a git repository, you can use https URLs with the git command.
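A minimal sketch of what such transfers could look like, assuming curl, wget and git are available on the cluster UI. The function names are hypothetical, any URL you pass in is your own, and whether an upload with an HTTP PUT request is accepted depends entirely on the receiving server:

```shell
# Hypothetical helpers; run on Gearshift.

# Download a file into the current directory, keeping the remote file name.
# --fail makes curl return an error on HTTP error responses,
# --location follows redirects.
download_with_curl() {
    curl --fail --location --remote-name "$1"
}

# The same download with wget.
download_with_wget() {
    wget "$1"
}

# Upload a local file ($1) to a URL ($2) with an HTTP PUT request.
upload_with_curl() {
    curl --fail --upload-file "$1" "$2"
}

# Clone a git repository using an https URL.
clone_over_https() {
    git clone "$1"
}

# Example usage (URLs are placeholders):
#   download_with_curl https://example.org/data/archive.tar.gz
#   clone_over_https https://example.org/some-group/some-repo.git
```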