An easy way to run parallel jobs on Stampede2

Our lab has been awarded a new allocation on the TACC Stampede2 supercomputer, and I put it to use right away. My understanding is that Stampede2 is best suited for highly parallel jobs, especially because service units are charged for the whole node (16 cores). Even if a job uses only 1 core for 1 hour (1 core hour), it is charged as if all 16 cores on the node ran for 1 hour (16 core hours). So to be cost-effective, we need a way to run our jobs in parallel.
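The accounting can be checked with a little shell arithmetic. This is only a sketch: it assumes the 16 cores per node and the whole-node charging described above, and the variable names are made up for illustration.

```shell
# Assumption from the text above: a node has 16 cores and is charged as a whole.
cores_per_node=16
hours=1

# One serial job still occupies (and is charged for) every core on the node.
charged_core_hours=$(( cores_per_node * hours ))

# Charged core hours per job when 1 job vs. 16 jobs share the node for that hour.
cost_per_job_serial=$(( charged_core_hours / 1 ))
cost_per_job_packed=$(( charged_core_hours / cores_per_node ))

echo "charged per node: ${charged_core_hours} core hours"
echo "per job, 1 job on the node: ${cost_per_job_serial} core hours"
echo "per job, 16 jobs on the node: ${cost_per_job_packed} core hour"
```

Under these assumptions, filling the node makes each job 16 times cheaper than running it alone.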

The solution is the GNU parallel tool. It can easily be installed in your home directory on Stampede2 with the command below:

(wget -O - || curl || fetch -o -) | bash

To put parallel to use, an example would be extracting the paired-end files from multiple SRA files downloaded from the NCBI Sequence Read Archive. The command is

parallel "fastq-dump --split-files {}; mv {} SRA_File" ::: *.sra

This invokes the fastq-dump command with the option --split-files. The {} is a placeholder for the SRA file name, and the file names are supplied by *.sra. The ::: is just the symbol that passes the file names to parallel. Parallel can incorporate multiple commands by separating them with ; inside the quotes. What happens here is that each SRA file is first split into its paired-end files and then moved to a directory named SRA_File (which must already exist). More information about GNU parallel can be found here.
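To see which command lines come out of the {} placeholder, the substitution can be mimicked in plain shell. This is a rough sketch, not how GNU parallel is implemented: show_commands is a hypothetical helper, and the two SRR file names are invented for the example.

```shell
# Hypothetical helper (not part of GNU parallel): print the command line that
# would be built for each input by replacing every {} in the template.
# Assumes the inputs contain no "|" or "&", which are special to sed here.
show_commands() {
  template=$1
  shift
  for input in "$@"; do
    printf '%s\n' "$template" | sed "s|{}|$input|g"
  done
}

# Prints one "fastq-dump ...; mv ..." line per input file name.
show_commands 'fastq-dump --split-files {}; mv {} SRA_File' SRR001.sra SRR002.sra
```

GNU parallel has its own --dry-run option, which prints the commands it would run without running them, so the same check can be done on the real file list before submitting a job.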

On Stampede2, to run this kind of parallel job, we need to submit a job script to the system. The job script is below with some annotation. The parameters that I think may need to be set for each job are, for example, how many tasks need to run at the same time (-n) and how many nodes (-N) are needed.

#!/bin/bash
#SBATCH -J Dmagna_splitSRA # job name
#SBATCH -o Dmagna_splitSRA.o%j # output and error file name (%j expands to jobID)
#SBATCH -N 1 # total number of nodes requested (16 cores/node)
#SBATCH -n 16 # total number of mpi tasks requested
#SBATCH -p normal # queue (partition): normal, development, etc.
#SBATCH -t 12:00:00 # run time (hh:mm:ss) - 12 hours
#SBATCH --mail-type=begin # email me when the job starts
#SBATCH --mail-type=end # email me when the job finishes

# work in the directory that holds the downloaded .sra files
cd $SCRATCH/Dmagna_RNA

# split each .sra file into paired-end files, then move it to SRA_File
parallel "fastq-dump --split-files {}; mv {} SRA_File" ::: *.sra
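Since only a few settings change from job to job, the script can also be generated instead of edited by hand. The sketch below assumes the same partition, directory and command as the script above; write_jobscript is a hypothetical helper, and the mail options are left out to keep it short.

```shell
# Hypothetical helper: write a job script like the one above, with the
# per-job settings passed as arguments instead of edited by hand.
# Arguments: job name, nodes (-N), tasks (-n), run time (hh:mm:ss).
write_jobscript() {
  cat <<EOF
#!/bin/bash
#SBATCH -J $1
#SBATCH -o $1.o%j
#SBATCH -N $2
#SBATCH -n $3
#SBATCH -p normal
#SBATCH -t $4

cd \$SCRATCH/Dmagna_RNA

parallel "fastq-dump --split-files {}; mv {} SRA_File" ::: *.sra
EOF
}

# Recreate the settings used above and save them to a file named jobscript.
write_jobscript Dmagna_splitSRA 1 16 12:00:00 > jobscript
```

The backslash in \$SCRATCH keeps the variable unexpanded in the generated file, so it is resolved when the job runs rather than when the script is written.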

To submit a job, the command is sbatch jobscript

To check job status, the command is showq -u yourusername

To cancel a job, the command is scancel yourJobID