How to find and work with Solve-RD data on Hyperchicken
First: getting an account and starting a session on a User Interface (UI) server
To work with Slurm and manage jobs on the cluster, you need a personal account and must start a session on a User Interface (UI) server. If you are completely new here, please:
- follow these instructions to request an account.
- follow these instructions to login using your account.
Using the EGA Python API with pyEGA3
To use pyEGA3, you need a username and password provided by the EGA. These EGA credentials are not the same as your account for the Hyperchicken cluster and must be requested separately.
Add your EGA credentials to a credentials.json file, which pyEGA3 uses during data transfer. An example of the syntax for this credentials.json file is located in the pyEGA3 repository on GitHub.
For more on pyEGA3, see the example code snippet below and the README.md in the root of the repository.
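Because credentials.json holds your EGA password in plain text on a shared cluster, it is worth making sure only you can read it. A minimal sketch, assuming the file is a JSON object with `username` and `password` keys (compare with the example in the pyEGA3 repository for the exact format your version expects); the values below are placeholders, not real credentials:

```shell
#!/bin/bash
# Sketch: create a credentials.json for pyEGA3 that only you can read.
# Assumption: the file is a JSON object with "username" and "password" keys;
# check the example in the pyEGA3 repository before use.
# The values below are placeholders, not real credentials.
credentials_file='credentials.json'

if [ -e "${credentials_file}" ]; then
    # Never overwrite existing (possibly real) credentials.
    echo "${credentials_file} already exists; not overwriting it." >&2
else
    # Subshell: the restrictive umask does not leak into your session,
    # and the file is created with mode 600 (owner read/write only) from the start.
    (
        umask 077
        cat > "${credentials_file}" <<'EOF'
{
    "username": "your.ega.username@example.org",
    "password": "your-ega-password"
}
EOF
    )
fi

# Also tighten permissions on a credentials file that already existed.
chmod 600 "${credentials_file}"
```

Afterwards, replace the placeholders with your own EGA username and password in a text editor.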
Example: downloading all files corresponding to a specific sample, given an EGA data set accession number.
```shell
echo 'Loading module pyEGA3 ...'
module load pyEGA3
output_dir='/groups/solve-rd/tmp10/username/yourdir'
ega_data_set_accession='EGAD00001005352'
sample_id='E577011'
mkdir -p -v "${output_dir}"
# Store the list of all files in the data set; the first column holds the EGA file accession number.
echo "Looking up all files for data set ${ega_data_set_accession} and storing the list in ${ega_data_set_accession}.tmp ..."
pyega3 -cf credentials.json files "${ega_data_set_accession}" > "${ega_data_set_accession}.tmp"
# Keep the file accession numbers of all lines mentioning the sample ID.
# Use > (not >>) so accessions left over from a previous run are not fetched again.
echo "Looking up all EGA file accession numbers corresponding to sample ID ${sample_id} ..."
grep "${sample_id}" "${ega_data_set_accession}.tmp" | awk '{print $1}' > "${sample_id}.tmp"
# Fetch each file using 10 connections and save it in the output directory.
echo "Staging all files for sample ${sample_id} to ${output_dir} ..."
for ega_file_accession in $(cat "${sample_id}.tmp")
do
    pyega3 -cf credentials.json -c 10 fetch "${ega_file_accession}" --saveto "${output_dir}"
done
```
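The `grep "${sample_id}"` lookup in the script matches the sample ID anywhere in a line, so it could also pick up a longer ID that merely starts with it (for example a hypothetical E5770110). A stricter sketch of the same lookup step is below. `select_sample_files` is a hypothetical helper, and it assumes, as the original `awk '{print $1}'` already does, that the file accession is the first whitespace-separated column of the `pyega3 files` listing:

```shell
#!/bin/bash
# Hypothetical helper: print the de-duplicated EGA file accessions for one sample.
# Usage: select_sample_files <listing_file> <sample_id>
# Assumes the file accession is the first whitespace-separated column of each line.
select_sample_files() {
    local listing_file="$1"
    local sample_id="$2"
    # -w: match the sample ID only as a whole word (E577011 does not match E5770110).
    # awk: keep only first fields that look like EGA file accessions (EGAF + digits).
    # sort -u: drop duplicate accessions.
    grep -w -- "${sample_id}" "${listing_file}" \
        | awk '$1 ~ /^EGAF[0-9]+$/ {print $1}' \
        | sort -u
}

# Demo on a small, made-up listing (real 'pyega3 files' output has more columns).
demo_listing="$(mktemp)"
trap 'rm -f "${demo_listing}"' EXIT
printf '%s\n' \
    'EGAF00000000001 E577011.cram' \
    'EGAF00000000002 E577011.cram.crai' \
    'EGAF00000000003 E5770110.cram' > "${demo_listing}"
select_sample_files "${demo_listing}" 'E577011'
```

With the variables from the script above, the lookup step would become `select_sample_files "${ega_data_set_accession}.tmp" "${sample_id}" > "${sample_id}.tmp"`. Whether whole-word matching suits your data depends on how sample IDs appear in the real file names, so check the listing first.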
For more extensive examples, see: /groups/solve-rd/prm*/example_scripts