How to find and work with Solve-RD data on Hyperchicken

First: getting an account and starting a session on a User Interface (UI) server

To work with Slurm and manage jobs on the cluster, you need a personal account, and you need to start a session on a User Interface (UI) server. If you are completely new here, please:

Using the EGA Python API with pyEGA3

To be able to use pyEGA3 you need a username and password provided by the EGA. These EGA credentials are not the same as your account for the Hyperchicken cluster and must be requested separately.

Add your credentials to a credentials.json file, which pyEGA3 then uses during data transfer. An example of the syntax for this credentials.json file is available in the pyEGA3 repository on GitHub. For more on pyEGA3, see the example code snippets below and the README.md in the root of that repository.
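Because credentials.json holds your EGA password, it helps to create it with owner-only permissions. The following is a minimal sketch, assuming the file only needs a JSON object with "username" and "password" keys; check the example file in the pyEGA3 repository for the exact keys your pyEGA3 version expects. The functions write_ega_credentials and json_escape are hypothetical helpers, not part of pyEGA3.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: escape backslashes and double quotes so the value
# becomes a valid JSON string. Control characters (e.g. newlines) are not handled.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: write an EGA credentials file for pyEGA3.
# ASSUMPTION: the format is a JSON object with "username" and "password" keys.
write_ega_credentials() {
    local username="$1" password="$2" file="${3:-credentials.json}"
    # Create/truncate the file and restrict it to the owner
    # before the password is written into it.
    : > "${file}"
    chmod 600 "${file}"
    printf '{\n    "username": "%s",\n    "password": "%s"\n}\n' \
        "$(json_escape "${username}")" "$(json_escape "${password}")" > "${file}"
}
```

For example, prompt for the password without echoing it and then write the file: read -r -s -p 'EGA password: ' ega_password; echo; write_ega_credentials 'your.name@example.org' "${ega_password}". Prompting avoids leaving the password in your shell history.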

Example: download all files for a specific sample from an EGA data set, given the data set's accession number and the sample ID.

echo 'Loading module pyEGA3 ...'
module load pyEGA3

# Adjust these to your own output directory, data set and sample.
output_dir='/groups/solve-rd/tmp10/username/yourdir'
ega_data_set_accession='EGAD00001005352'
sample_id='E577011'

mkdir -p -v "${output_dir}"

echo "Listing all files in data set ${ega_data_set_accession} and storing the list in ${ega_data_set_accession}.tmp ..."
pyega3 -cf credentials.json files "${ega_data_set_accession}" > "${ega_data_set_accession}.tmp"

echo "Looking up all EGA file accession numbers corresponding to sample ID ${sample_id} ..."
# Overwrite (>) instead of appending (>>), so a re-run does not list files twice.
grep "${sample_id}" "${ega_data_set_accession}.tmp" | awk '{print $1}' > "${sample_id}.tmp"

echo "Fetching all files for sample ${sample_id} to ${output_dir} ..."
# Read one file accession per line; -c 10 downloads each file over 10 connections.
while read -r ega_file_accession
do
    pyega3 -cf credentials.json -c 10 fetch "${ega_file_accession}" --saveto "${output_dir}"
done < "${sample_id}.tmp"
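The lookup step matches the sample ID as a plain substring, so E577011 would also match a hypothetical longer ID such as E5770110. A slightly stricter sketch follows, assuming the output of pyega3 files has the EGA file accession (EGAF...) in the first column and the sample ID somewhere on the same line, typically in the file name. The function name ega_file_ids_for_sample is hypothetical.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: print the unique EGA file accessions for one sample
# from a saved `pyega3 files` listing.
# ASSUMPTION: the accession is in column 1 and the sample ID is on the same line.
ega_file_ids_for_sample() {
    local listing="$1" sample_id="$2"
    # -w: match the sample ID only as a whole word (E577011 does not match E5770110).
    # -F: treat the sample ID as a fixed string, not a regular expression.
    # '|| true': grep exits with 1 when nothing matches; an empty result is not an error.
    { grep -wF -- "${sample_id}" "${listing}" || true; } \
        | awk '$1 ~ /^EGAF[0-9]+$/ { print $1 }' \
        | sort -u
}
```

In the script above it would take the place of the grep/awk line: ega_file_ids_for_sample "${ega_data_set_accession}.tmp" "${sample_id}" > "${sample_id}.tmp". The awk filter also drops header or log lines that happen to contain the sample ID, and sort -u removes duplicate accessions.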

For more extensive examples, see /groups/solve-rd/prm*/example_scripts