This shows you the differences between two versions of the page.
guides:slurm:tips [30.04.2020 10:32] Juha Kekäläinen created |
guides:slurm:tips [15.11.2021 16:19] (current) Administrator |
||
---|---|---|---|
Line 3: | Line 3: | ||
## Do not overallocate resources | ## Do not overallocate resources | ||
- | With SLURM for your own sake it is important not to overallocate the resources while pending time slot for the computing job. For example if you request 50 GB of RAM for you computing job while it actually uses 5 GB. Then the job will wait for computing slot for 50 GB of job and this could mean that you will wait computing | + | With SLURM, for your own sake, it is important not to overallocate resources while your computing job is waiting for a time slot. If you overallocate then your job will be pending |
- | Or if you running | + | For example, suppose you request 50 GB of RAM for your computing job while it actually uses 5 GB. The job will then wait for a computing slot with 50 GB of free memory, which could mean waiting multiple days for computing time. Or if you are running |
+ | |||
+ | Other users also suffer, because they end up waiting for computing resources that are reserved for no reason. | ||
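As a rough sketch of right-sizing a memory request, the helper below takes the peak memory a previous run actually used and adds some headroom. The function name `suggest_mem_gb` and the 20 % margin are assumptions made for illustration, not a rule of this cluster; `--mem` is the standard sbatch option for the memory request.

```shell
#!/bin/bash
# Hypothetical helper: suggest a --mem value in whole GB from observed peak usage.
# The 20 % headroom is an assumed margin, not a cluster policy.
suggest_mem_gb() {
    awk -v used="$1" 'BEGIN {
        wanted = used * 120 / 100          # observed usage plus 20 % headroom
        rounded = int(wanted)
        if (rounded < wanted) rounded++    # round up to the next whole GB
        print rounded
    }'
}

# A job that peaked at 5 GB would request 6 GB instead of 50 GB.
echo "#SBATCH --mem=$(suggest_mem_gb 5)G"
```

The printed line can be pasted into the job script in place of the oversized request.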
+ | |||
+ | ## Use local SSD of the computing node | ||
+ | |||
+ | Each computing node has 300 GB of local storage mounted at the /tmp path. Computing jobs that write temporary results can get much better performance by using this local storage instead of the network drive. | ||
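One possible way to use the local storage from a job script is sketched below: create a private scratch directory under /tmp, write temporary files there, and copy only the final result back. The function name `run_with_local_scratch`, the file names and the `echo` standing in for the real computation are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: do the work in node-local scratch, keep only the result.
# $1 is the directory (e.g. on the network drive) that should receive the result.
run_with_local_scratch() {
    result_dir="$1"

    # Private scratch directory on the node-local disk mounted at /tmp.
    scratch=$(mktemp -d /tmp/job_scratch.XXXXXX) || return 1

    # Placeholder for the real computation: temporary results stay local.
    echo "intermediate data" > "$scratch/partial.txt"

    # Copy what you want to keep, then free the local disk for other jobs.
    cp "$scratch/partial.txt" "$result_dir/result.txt"
    rm -rf "$scratch"
}

# Example call inside a job script (the destination path is an assumption):
# run_with_local_scratch "$HOME/results"
```

Removing the scratch directory at the end matters because the 300 GB of local storage is shared by every job running on the node.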
+ | |||
+ | ## Monitor your jobs | ||
+ | |||
+ | You can monitor resource usage with the [Grafana dashboard](https:// | ||
+ | |||
+ | ## Slurm job efficiency report | ||
+ | |||
+ | You should also check the Slurm job efficiency report for completed jobs. It shows how much CPU, memory and wall time the job actually used. With this information you can fine-tune your jobs and avoid overallocating the computing cluster. | ||
+ | |||
+ | ``` | ||
+ | |||
+ | seff JOBID | ||
+ | |||
+ | Job ID: JOBID | ||
+ | Array Job ID: JOBID_408 | ||
+ | Cluster: bic | ||
+ | User/Group: user/domain users | ||
+ | State: COMPLETED (exit code 0) | ||
+ | Cores: 1 | ||
+ | CPU Utilized: 01:31:27 | ||
+ | CPU Efficiency: 166.17% of 00:55:02 core-walltime | ||
+ | Job Wall-clock time: 00:55:02 | ||
+ | Memory Utilized: 10.26 GB | ||
+ | Memory Efficiency: 16.42% of 62.50 GB | ||
+ | |||
+ | ``` | ||
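If you want to check many jobs, the memory efficiency figure can be pulled out of the report with a small filter. The sketch below assumes the report has the same layout as the sample above, in particular a line starting with `Memory Efficiency:`; the function name `mem_efficiency` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a seff report on stdin and print the memory
# efficiency as a bare number (percent), assuming the layout shown above.
mem_efficiency() {
    awk -F': ' '/^Memory Efficiency:/ {
        sub(/%.*/, "", $2)   # keep only the number before the percent sign
        print $2
    }'
}

# Demonstration on two lines copied from the sample report.
printf 'Memory Utilized: 10.26 GB\nMemory Efficiency: 16.42%% of 62.50 GB\n' | mem_efficiency
```

On a real job the input would come from `seff JOBID | mem_efficiency`. A low value, like the one in the sample, suggests the memory request can be reduced for the next run.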
- | Also other users will suffer while they are waiting for computing resources that are reserver for no reason. |