# sampo.uef.fi
  
Sampo.uef.fi is a High Performance Computing (HPC) environment running the [Slurm](https://slurm.schedmd.com/overview.html) workload manager. It was launched in autumn 2019 and is targeted at a wide range of workloads.

**Note**: The login node (sampo.uef.fi) can be used for light pre- and postprocessing, compiling applications and moving data. All other tasks are to be done using the batch job system.
  
{{:guides:slurm:slurm.png}}
  
## Specs

In addition to the login node (sampo.uef.fi), the cluster has a total of 4 computing nodes. Each node is equipped with two Intel Xeon Gold processors (code name Skylake) with 40 cores each, running at 2.4 GHz (max turbo frequency 3.7 GHz). Additionally, there are two GPU computing nodes equipped with 4x NVIDIA A100 (40 GB) adapters. The nodes are connected with a 100 Gbps Omni-Path network. The login node also acts as a file server for the computing nodes.
  
**Login node**

**Compute nodes**
  
4x Dell C6420 (sampo[1-4])
* CPU: 2x Intel Xeon Gold 6148 (40 cores / 80 threads)
* Memory:
  * 3 nodes: 376 GB
  * 1 node: 768 GB
* Local disk (/scratch): 300 GB SSD
  
  
2x Lenovo SR670 V2 (sampo[5-6])
* GPU: 4x NVIDIA A100 (40 GB)
* CPU: Intel Xeon Gold 6326 (32 cores / 64 threads)
* RAM: 512 GB
* Local disk (/scratch): 1.6 TB NVMe
  
## Paths
  
In addition to the [[guides:storage|UEF IT Services Research Storage]], the cluster has its own local storage.
There are __**no backups**__ of the local storage, so keep your important data on the UEF IT Services Research Storage. In the future, all files older than 2 months will be automatically removed from group folders.
  
You can access the sampo.uef.fi storage via the SMB protocol.
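For example, from a Linux or macOS machine you could list the available shares with `smbclient`; the exact share layout is not documented on this page, and the domain/username format below is an assumption to adapt to your own account:

```bash
# List the SMB shares exported by sampo.uef.fi (prompts for your UEF password).
# 'UEFAD\username' is an assumed login format; replace it with your own credentials.
smbclient -L //sampo.uef.fi -U 'UEFAD\username'
```

On Windows, the same storage can typically be reached by entering a UNC path of the form `\\sampo.uef.fi\<share>` in File Explorer (the share name is a placeholder here).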
  
Research storage provided by UEF IT Services is also connected to the login and computing nodes.
  
**Cluster storage**
  
- /home/users/username - 250 GB user home directory ($HOME)
- /home/groups/groupname - Minimum 5 TB research group folder

**Computing node local storage**

Each computing node also has fast local disk space (300 GB SSD on the CPU nodes, 1.6 TB NVMe on the GPU nodes, see the specs above). You can access the local disk at the /tmp path.
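A common pattern is to stage data on the local disk for the duration of a job and copy the results back afterwards. A minimal sketch, assuming a group folder as documented above; the input and output file names are purely illustrative:

```bash
# Inside a batch job: work on the fast node-local disk, then copy results back.
# $SLURM_JOB_ID is set by Slurm; input.data and results.out are hypothetical names.
WORKDIR=/tmp/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp /home/groups/groupname/input.data "$WORKDIR"/
cd "$WORKDIR"
# ... run your analysis here ...
cp results.out /home/groups/groupname/
rm -rf "$WORKDIR"   # clean up the local disk when done
```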
  
**UEF IT Research Storage**

- /research/groups/groupname - User research group directory at \\research.uefad.uef.fi
  
## Applications
  
To see the list of terminal applications, visit the [available applications](https://bioinformatics.uef.fi/guides/available-applications) web page.
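If the cluster provides its software through an environment modules system (an assumption here, not confirmed by this page), applications are typically taken into use along the following lines; the module name is only an example:

```bash
module avail            # list the software available on the cluster
module load samtools    # take an application into use (name is an example)
module list             # show the currently loaded modules
```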
  
## Slurm Workload Manager
  
The [Slurm Workload Manager](https://slurm.schedmd.com/overview.html) is an open source [job scheduler](https://en.wikipedia.org/wiki/Job_scheduler) that controls programs executed in the background. These background programs are called **jobs**. The user defines a job with various parameters, including the run time, the number of tasks (CPU cores), the amount of required memory (RAM), and the program(s) to execute. These are called batch jobs. Batch jobs are submitted to a common job queue (partition) shared with other users, and Slurm executes the submitted jobs automatically in turn. After a job completes (or an error occurs), Slurm can optionally notify the user by email. In addition to batch jobs, the user can reserve a compute node for an interactive job: you wait for your turn in the queue, and when your turn comes you are placed on the reserved node, where you can execute commands. After the reserved time is over, your session is terminated.
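For illustration, a minimal batch script could look like the sketch below; the partition name, resource values and email address are assumptions to adapt to your own job (see the Slurm guide linked under Getting started for the authoritative instructions):

```bash
#!/bin/bash
#SBATCH --job-name=myjob           # name shown in the queue
#SBATCH --partition=serial         # partition to submit to (see Slurm Partitions below)
#SBATCH --ntasks=1                 # number of tasks (CPU cores)
#SBATCH --mem=4G                   # required memory
#SBATCH --time=02:00:00            # run time limit (hh:mm:ss)
#SBATCH --mail-type=END,FAIL       # optional email notification
#SBATCH --mail-user=user@uef.fi    # replace with your own address

# Commands executed on the compute node
hostname
```

Submit the script with `sbatch myjob.sh`, follow its state with `squeue -u $USER`, and cancel it with `scancel <jobid>`. An interactive session can be requested with, for example, `srun --partition=serial --ntasks=1 --time=01:00:00 --pty bash` (again, the options are illustrative).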
  
## Slurm Partitions
  
- **serial**. 4 out of 4 nodes. Maximum run time 3 days

**Parallel** partition is for parallel jobs that can span multiple nodes (MPI jobs, for example). The user can reserve 2 nodes (both the minimum and the maximum). The default run time is 5 minutes and the maximum is 3 days.
  
**GPU** partition is for GPU jobs. The user can reserve 1 node; the default run time is 5 minutes and the maximum is 3 days.
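You can check the current partitions and their limits with `sinfo`. Requesting a GPU in a batch job is usually done with a generic resource request, but the exact partition and GRES names depend on the local Slurm configuration, so the directives below are assumptions:

```bash
# Show the partitions, their node counts and time limits
sinfo

# Example directives for a GPU batch script (partition and gres names are assumptions):
#   #SBATCH --partition=gpu
#   #SBATCH --gres=gpu:1
```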
  
## Getting started
  
See the Slurm usage instructions in [[guides:slurm|Slurm Workload Manager]].

## System monitoring

You can also monitor the status of the sampo computing cluster at [https://sampo.uef.fi](https://sampo.uef.fi). There you can find various graphs showing CPU utilization, memory, network and disk usage.

{{ :infrastructure:grafana.png?direct&400 |}}