SLURM Workload Manager

SLURM Workload Manager is an open-source job scheduler that controls the execution of programs in the background. These background-executed programs are called jobs. The user defines a job with various parameters, including the run time, the number of tasks (CPU cores), the amount of required memory (RAM), and the program(s) to execute. Such jobs are called batch jobs. Batch jobs are submitted to a common job queue (partition) shared with other users, and SLURM executes the submitted jobs automatically in turn. After a job completes (or an error occurs), SLURM can optionally notify the user by email. In addition to batch jobs, the user can reserve a compute node for an interactive job: you wait for your turn in the queue, and when your turn comes you are placed on the reserved node where you can execute commands. When the reserved time is over, the session is terminated.

SLURM Partitions on sampo.uef.fi:

  • serial: 4 out of 4 nodes. Maximum run time 3 days.
  • longrun: 2 out of 4 nodes. Maximum run time 14 days.
  • parallel: 2 out of 4 nodes. Maximum run time 3 days.

Explanation of the partitions:

Compute nodes are grouped into multiple partitions, and each partition can be considered a job queue. Partitions can have multiple constraints and restrictions. For example, access to certain partitions can be limited by user/group, or the maximum run time can be restricted.

Serial is the default partition for all jobs that the user submits. The user can reserve a maximum of 1 node for a job. The default run time is 5 minutes and the maximum is 3 days.

Longrun is the partition for long-running jobs, and only one node is available for this use. The default run time is 5 minutes and the maximum is 14 days.

Parallel is the partition for parallel jobs that can span multiple nodes (MPI jobs, for example). The user must reserve 2 nodes (both the minimum and the maximum). The default run time is 5 minutes and the maximum is 2 days.
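
The partitions and their limits can also be checked directly on the cluster with the sinfo command. The following is a minimal sketch; the exact output columns depend on the local SLURM configuration:

# list all partitions, their time limits and node states
sinfo
# show only one partition, e.g. longrun
sinfo -p longrun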

Using R with SLURM

Example script (hello.R):

sayHello <- function(){
  print("hello")
}
sayHello()

The user can execute R scripts from the command line with either of the following commands:

  1. R CMD BATCH script.R
  2. Rscript script.R

Note: with the R CMD BATCH command the output of the R script is redirected to a file instead of the screen.
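
For example, running hello.R both ways looks like this (a sketch; by default R CMD BATCH writes the output to hello.Rout in the working directory):

# output is written to hello.Rout
R CMD BATCH hello.R
cat hello.Rout
# output is printed to the screen
Rscript hello.R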

Next, the user must embed the script in a SLURM batch job file / control file (submit.sbatch):


#!/bin/bash
#SBATCH --job-name helloworld # Name for your job
#SBATCH --ntasks 1 # Number of tasks
#SBATCH --time 5 # Run time in minutes
#SBATCH --mem=2000 # Reserve 2000 MB (about 2 GB) of RAM for the job
#SBATCH --partition serial # Partition to submit to
#SBATCH --output hello.out # Standard output goes to this file
#SBATCH --error hello.err # Standard error goes to this file
#SBATCH --mail-user username@uef.fi # Email address to be notified at
#SBATCH --mail-type ALL # ALL alerts you of job beginning, completion, failure, etc.

module load r # load modules

Rscript hello.R # Execute the script

The user can submit the job to the compute queue with the sbatch command. Note that the batch file (as well as the R script and data) must be located on the /home/ disk.

sbatch submit.sbatch
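
sbatch prints the ID of the submitted job; a sketch of the command and its typical output (the job ID 123456 is just a placeholder):

sbatch submit.sbatch
Submitted batch job 123456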

The user can monitor the progress of the job with the squeue command. The JOBID is provided by the sbatch command when the job is submitted.

squeue -j JOBID
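
A few other useful squeue invocations (a sketch; -u and -l are standard squeue options):

# show all of your own jobs
squeue -u $USER
# show a specific job in long format
squeue -l -j JOBID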

While the job is running, the user can also log in to the executing compute node with the ssh command. When the job is over, the ssh session is terminated.

ssh sampo1
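
Once logged in (sampo1 is the node name from the example above), standard Linux tools can be used to check that the job is running, for example (a sketch):

ssh sampo1
# show your own processes and their CPU/memory usage on the node
top -u $USER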

Interactive session

The user can get an interactive session for whatever purpose. For this to be effective, a free node is more or less required. The following command will open a bash session on any free node in the serial partition for the next 5 minutes.

srun -p serial --pty -t 0-00:05 /bin/bash
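
Resource requests can be added to the same command. The following sketch asks for 2 CPU cores and about 4 GB of RAM for one hour (-c and --mem are standard srun options; adjust the values to your needs):

srun -p serial -c 2 --mem=4000 --pty -t 0-01:00 /bin/bash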

Slurm job efficiency report (seff) and Accounting

SLURM can provide the user with various job statistics, such as memory usage and CPU time. For example, with seff (Slurm job efficiency report) it is possible to check how efficient the job was.

seff JOBID

It is particularly useful to add the following line to the end of the sbatch script:

seff $SLURM_JOBID
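
With this addition, the end of the earlier submit.sbatch example would look like the sketch below; the efficiency report is then written to the job's output file (hello.out):

module load r # load modules

Rscript hello.R # Execute the script
seff $SLURM_JOBID # Print the efficiency report for this job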

Or, if you wish to have more detailed information, use the sacct command:

# show all own jobs contained in the accounting database
sacct
# show specific job
sacct -j JOBID
# specify fields
sacct -j JOBID -o JobName,MaxRSS,MaxVMSize,CPUTime,ConsumedEnergy
# show all fields
sacct -j JOBID -o ALL
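
sacct can also list jobs from a given time window (a sketch; -S and -E are standard sacct options for the start and end of the reporting period, and the dates below are only examples):

# show all own jobs started between the given dates
sacct -S 2019-10-01 -E 2019-10-31
# the same, with selected fields
sacct -S 2019-10-01 -E 2019-10-31 -o JobID,JobName,State,Elapsed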