A SLURM job array provides a way to submit many similar but independent computational jobs, for example over a large number of input files, in a concurrent manner. The array tasks and their resources are defined in a single master batch script, and SLURM controls the workflow by automatically launching the array tasks based on the one submitted master job.
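For instance, a minimal array script could look like the sketch below (the job name and array range here are arbitrary, chosen only to illustrate the mechanism). Each array task receives its own value of the SLURM_ARRAY_TASK_ID environment variable, which is typically used to pick a different input for each task.

#!/bin/bash
#SBATCH --job-name minimal_array   # Name for your job
#SBATCH --time 5                   # Runtime in minutes
#SBATCH --array=1-10               # Run tasks with indices 1..10

# SLURM sets SLURM_ARRAY_TASK_ID differently for every task in the array
echo "This is array task number ${SLURM_ARRAY_TASK_ID}"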
Below is an example of FastQC quality analysis for multiple FASTQ nucleotide files. The script scans the "data" directory for FASTQ files and runs fastqc on each of them, storing the results in a common directory. SLURM creates an array of at most 100 subjobs, distributes them to the computing nodes and executes 3 subjobs concurrently.
Each subjob has a runtime limit of 5 minutes and 500 MB of RAM reserved.
#!/bin/bash
#SBATCH --job-name fastqc            # Name for your job
#SBATCH --ntasks 1                   # Number of tasks
#SBATCH --time 5                     # Runtime in minutes
#SBATCH --mem=500                    # Reserve 500 MB RAM for the job
#SBATCH --partition small            # Partition to submit to
#SBATCH --output fastqc_%A_%a.out    # Standard output goes here
#SBATCH --error fastqc_%A_%a.err     # Standard error goes here
#SBATCH --mail-user username@uef.fi  # This is the email you wish to be notified at
#SBATCH --mail-type ALL              # ALL will alert you of job beginning, completion, failure etc.
#SBATCH --array=1-100%3              # Array range and number of simultaneous jobs

# Make sure that the results directory exists
mkdir -p ./results/

# Load required modules
module load fastqc/0.12.1

# Pick the fastq file from the "data" directory that corresponds to this array task
file=$(ls ./data/*.fastq | sed -n ${SLURM_ARRAY_TASK_ID}p)

# Run quality analysis on the selected fastq file
fastqc -o ./results/ "$file"
Submit the job to the computing queue with the sbatch command.
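Assuming the batch script above has been saved as, for example, fastqc_array.sh (the file name is arbitrary), the whole array is submitted with a single command:

sbatch fastqc_array.sh

The state of the individual array tasks can then be followed with squeue -u $USER, where each task appears with an identifier of the form jobid_taskindex.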