
SLURM Job Array

A SLURM job array provides a way to submit multiple similar but independent computational jobs over a large number of datasets concurrently. The array tasks and their required resources are defined in a single master batch script, and SLURM controls the workflow by automatically launching the array tasks based on the one submitted master job.
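As a minimal sketch of the mechanism (the job name and the echo payload below are illustrative, not part of the FastQC example that follows), every array task runs the same script but receives its own index in the SLURM_ARRAY_TASK_ID environment variable:

#!/bin/bash
#SBATCH --job-name hello-array                  # Name for your job
#SBATCH --array=1-5                             # Run five array tasks, indices 1..5

# Every task executes this same script; the index tells the tasks apart
echo "This is array task ${SLURM_ARRAY_TASK_ID} of job ${SLURM_ARRAY_JOB_ID}"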

Below is an example of FastQC quality analysis for multiple FASTQ nucleotide files. It scans the “data” directory for any fastq files and executes a fastqc run on each file. The results are stored in a common directory. SLURM creates an array of at most 100 subjobs, distributes the subjobs to the computing nodes and executes 3 computing jobs concurrently.

#!/bin/bash
#SBATCH --job-name fastqc                       # Name for your job
#SBATCH --ntasks 1                              # Number of tasks
#SBATCH --time 5                                # Runtime in minutes.
#SBATCH --mem=500                               # Reserve 500 MB RAM for the job
#SBATCH --partition serial                      # Partition to submit to
#SBATCH --output fastqc_%A_%a.out               # Standard output goes here
#SBATCH --error fastqc_%A_%a.err                # Standard error goes to here
##SBATCH --mail-user username@uef.fi            # this is the email you wish to be notified at
##SBATCH --mail-type ALL                        # ALL will alert you of job beginning, completion, failure etc
#SBATCH --array=1-100%3                         # Array range and number of simultaneous jobs

# Make sure that the results directory exists
mkdir -p ./results/

# Load required modules
module load fastqc/0.11.7

# Pick the fastq file from the “data” directory that corresponds to this array task (sed -n Np prints only line N)
file=$(ls ./data/*.fastq | sed -n ${SLURM_ARRAY_TASK_ID}p)

# Run quality analysis
fastqc -o ./results/ "$file"
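
Submit the script with sbatch as usual and SLURM schedules the array tasks on its own. If you want the array range to match the actual number of input files, you can pass the range on the command line, which overrides the value set in the script. This is a sketch, and the script name fastqc_array.sh is an assumed example name:

# Count the input files and submit an array of exactly that size,
# still limiting execution to 3 concurrent tasks
N=$(ls ./data/*.fastq | wc -l)
sbatch --array=1-${N}%3 fastqc_array.sh

# Monitor your array tasks
squeue -u $USER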