guides:slurm:tips

Differences

This shows you the differences between two versions of the page.

guides:slurm:tips [30.04.2020 10:32]
Juha Kekäläinen created
guides:slurm:tips [15.11.2021 16:19] (current)
Administrator
Line 3:
 ## Do not overallocate resources
  
-With SLURM for your own sake it is important not to overallocate the resources while pending time slot for the computing job. For example if you request 50 GB of RAM for you computing job while it actually uses 5 GB. Then the job will wait for computing slot for 50 GB of job and this could mean that you will wait computing time for multiple days.
+With SLURM, it is in your own interest not to overallocate resources when you request a time slot for a computing job. If you overallocate, your job will stay pending for a long time while it waits for those resources.
  
-Or if you running parallel jobs then your jobs will run one by one istead of running all the jobs at the same time.
+For example, if you request 50 GB of RAM for your computing job while it actually uses 5 GB, the job has to wait for a computing slot with 50 GB of free memory, which could mean waiting for multiple days. Likewise, if you are running multiple jobs in parallel, they will run one by one instead of all at the same time.
 + 
 +Other users also suffer, because they have to wait for computing resources that are reserved for no reason.
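
As an illustration of the memory example above, a batch script for a job that really needs about 5 GB could request slightly more than that instead of 50 GB. This is only a minimal sketch: the job name, the 6 GB figure, the core count and the time limit are assumptions for illustration, not values taken from this page. `--mem`, `--cpus-per-task` and `--time` are standard `sbatch` options.

```shell
#!/bin/bash
# Hypothetical job script: every value below is an illustrative assumption.
#SBATCH --job-name=rightsized-job
#SBATCH --cpus-per-task=1
#SBATCH --mem=6G
#SBATCH --time=01:00:00

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given;
# fall back to 1 when the script is run outside of Slurm.
cpus="${SLURM_CPUS_PER_TASK:-1}"
echo "Starting job with ${cpus} CPU core(s)"
```

Requesting only what the job needs lets the scheduler place it in a smaller free slot, so it tends to start sooner.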
 + 
 +## Use the local SSD of the computing node
 + 
 +Each computing node has 300 GB of local storage, mounted at /tmp. Computing jobs that write temporary results can get much better performance by using this local storage instead of the network drive.
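
One possible way to use the local storage is sketched below: the job writes its intermediate files to a private directory under /tmp and copies only the final result back. The directory naming, the placeholder workload and the file names are assumptions made for this example. `SLURM_JOB_ID` and `SLURM_SUBMIT_DIR` are environment variables that Slurm sets inside a job, and the script falls back to other values when run outside of Slurm.

```shell
#!/bin/bash
# Sketch: keep temporary files on node-local /tmp and copy back only the result.
set -e

# Private scratch directory on the local disk of the computing node.
scratch="$(mktemp -d "/tmp/job-${SLURM_JOB_ID:-manual}-XXXXXX")"
# Remove the scratch directory when the script exits, even on failure.
trap 'rm -rf "$scratch"' EXIT

# Destination for the final result; defaults to the current directory
# when the script is not run under Slurm.
outdir="${SLURM_SUBMIT_DIR:-$PWD}"

# Placeholder workload: write an intermediate file, then derive a result from it.
seq 1 1000 > "$scratch/intermediate.txt"
wc -l < "$scratch/intermediate.txt" | tr -d ' ' > "$scratch/result.txt"

# Copy only the final result to the output directory on the network drive.
cp "$scratch/result.txt" "$outdir/result.txt"
```

Cleaning up in the `trap` matters because /tmp is shared by all jobs on the node and has limited space.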
 + 
 +## Monitor your jobs 
 + 
 +You can monitor resource usage with the [Grafana dashboard](https://sampo.uef.fi).
 + 
 +## Slurm job efficiency report 
 + 
 +You should also check the Slurm job efficiency report for completed jobs. It shows how much CPU, memory and wall time the job actually used. With this information you can fine-tune your jobs and avoid overallocating the computing cluster.
 + 
 +``` 
 + 
 +seff JOBID 
 + 
 +Job ID: JOBID 
 +Array Job ID: JOBID_408 
 +Cluster: bic 
 +User/Group: user/domain users 
 +State: COMPLETED (exit code 0) 
 +Cores: 1 
 +CPU Utilized: 01:31:27 
 +CPU Efficiency: 166.17% of 00:55:02 core-walltime 
 +Job Wall-clock time: 00:55:02 
 +Memory Utilized: 10.26 GB 
 +Memory Efficiency: 16.42% of 62.50 GB 
 + 
 +```
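
The report above shows 10.26 GB used out of 62.50 GB requested, so the next run could ask for far less memory. The helper below is a hypothetical sketch, not part of Slurm or `seff`: it takes the "Memory Utilized" figure, adds a safety margin and rounds up to a whole GB. The function name and the 20% margin are assumptions, so pick a margin that suits how much your job's memory use varies.

```shell
#!/bin/bash
# suggest_mem_gb USED_GB HEADROOM_PERCENT
# Prints USED_GB plus the given headroom, rounded up to a whole GB.
# Hypothetical helper for illustration only.
suggest_mem_gb() {
    awk -v used="$1" -v headroom="$2" 'BEGIN {
        want = used * (1 + headroom / 100)
        whole = int(want)
        if (whole < want) whole++   # round up to the next whole GB
        print whole
    }'
}

# "Memory Utilized: 10.26 GB" from the report above, with 20% headroom.
mem_gb="$(suggest_mem_gb 10.26 20)"
echo "#SBATCH --mem=${mem_gb}G"
```

With the figures from the report this prints `#SBATCH --mem=13G`, roughly a fifth of the original 62.50 GB request.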
  
-Also other users will suffer while they are waiting for computing resources that are reserver for no reason. 
guides/slurm/tips.1588231949.txt.gz · Last modified: 30.04.2020 10:32 by Juha Kekäläinen