guides:slurm:basics — last modified 24.09.2019 21:18 by Juha Kekäläinen
# Slurm basics

The Slurm Workload Manager is an open-source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. As a cluster workload manager, Slurm has three key functions:

  * It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work.
  * It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes.
  * It arbitrates contention for resources by managing a queue of pending work.

In addition, optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology-optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.

The Bioinformatics Center uses an unmodified version of Slurm on the sampo.uef.fi computing cluster. This guarantees that most of the tutorials and guides found on the Internet are applicable as-is. The most obvious place to start searching for usage information is the documentation section of the [Slurm Workload Manager](https://slurm.schedmd.com) website.
### Example script

```
#!/bin/bash
#SBATCH --job-name helloworld         # Name for your job
#SBATCH --ntasks 1                    # Number of tasks
#SBATCH --time 5                      # Runtime in minutes
#SBATCH --mem=2000                    # Reserve 2000 MB (2 GB) of RAM for the job
#SBATCH --partition serial            # Partition to submit to
#SBATCH --output hello.out            # Standard output goes to this file
#SBATCH --error hello.err             # Standard error goes to this file
#SBATCH --mail-user username@uef.fi   # Email address for notifications
#SBATCH --mail-type ALL               # ALL alerts you of job beginning, completion, failure, etc.

module load r      # Load required modules

Rscript hello.R    # Execute the script
```
- 
The user can submit the job to the compute queue with the **[sbatch](https://slurm.schedmd.com/sbatch.html)** command. Note that the batch file (as well as the R script and data) must be located on the /home/ disk.
- 
```
sbatch submit.sbatch
```
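By default, sbatch prints a line such as `Submitted batch job <id>`; its `--parsable` option makes it print only the job ID, which is convenient when scripting submissions. A minimal sketch (the helper name `submit_job` is purely illustrative):

```shell
# Hypothetical helper: submit a batch script and print only the
# numeric job ID. sbatch's --parsable option suppresses the usual
# "Submitted batch job <id>" wording.
submit_job() {
    sbatch --parsable "$1"
}

# Usage (on the cluster):
#   JOBID=$(submit_job submit.sbatch)
#   echo "Submitted job $JOBID"
```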
- 
The user can monitor the progress of the job with the **[squeue](https://slurm.schedmd.com/squeue.html)** command. The JOBID is printed by the sbatch command when the job is submitted.
- 
```
squeue -j JOBID
```
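Because squeue only lists jobs that are still pending or running, it can also be used to wait for a job to finish from a wrapper script. A small sketch (the function name `wait_for_job` and the 10-second polling interval are arbitrary choices):

```shell
# Hypothetical helper: block until the given job has left the queue.
# squeue -h suppresses the header line, so an empty result means the
# job is no longer pending or running.
wait_for_job() {
    while [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]; do
        sleep 10
    done
}

# Usage (on the cluster):
#   wait_for_job JOBID
#   echo "job finished"
```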
- 
While the job is running, the user can also log in to the executing compute node with the ssh command. When the job ends, the ssh session is terminated.
- 
```
ssh sampo1
```
- 
### Interactive session

The user can request an interactive session for any purpose. For this to be effective, a free node is more or less required. The following command opens a bash session on any free node in the serial partition for the next 5 minutes.
- 
```
srun -p serial --pty -t 0-00:05 /bin/bash
```
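The time limit, memory, and CPU count of an interactive session can be adjusted with the usual srun options. For example (the partition name serial comes from this cluster; the resource values below are placeholders, not recommendations):

```shell
# Request a one-hour interactive session with 4 GB of memory and
# two CPU cores (adjust the values to your needs):
#
#   srun -p serial --pty -t 0-01:00 --mem=4000 -c 2 /bin/bash
#
#   -t 0-01:00   wall-time limit of one hour (days-hours:minutes)
#   --mem=4000   4000 MB of memory
#   -c 2         two CPU cores per task
```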
- 
### Slurm job efficiency report (seff) and accounting

Slurm can provide the user with various job statistics, such as memory usage and CPU time. For example, with seff (Slurm job efficiency report) it is possible to check how efficient a job was.
- 
```
seff JOBID
```
- 
It is particularly useful to add the following line to the end of the sbatch script:
- 
```
seff $SLURM_JOBID
```
- 
or, if you wish to have more detailed information, use the **[sacct](https://slurm.schedmd.com/sacct.html)** command:
- 
```
# show all of your own jobs contained in the accounting database
sacct
# show a specific job
sacct -j JOBID
# specify fields
sacct -j JOBID -o JobName,MaxRSS,MaxVMSize,CPUTime,ConsumedEnergy
# show all fields
sacct -j JOBID -o ALL
```
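sacct output can also be made machine-readable, which helps when post-processing many jobs. A sketch (the helper name `job_state` is hypothetical; `-n` drops the header, `-P` produces parseable output, and `-X` restricts the output to the job allocation itself rather than individual job steps):

```shell
# Hypothetical helper: print the final state of a job (COMPLETED,
# FAILED, TIMEOUT, ...) as one bare word, with no header or job steps.
job_state() {
    sacct -j "$1" -o State -n -X -P
}

# Usage (on the cluster):
#   job_state JOBID
```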