Recommended software
Terminal
Terminal software and terminal sessions are a fundamental part of bioinformatics and data analysis. Most bioinformatics tools are command-line tools, so being comfortable with the command line (at least to some extent) is essential. Terminals can be used locally or through a Secure Shell (SSH) connection to a server. SSH is a cryptographic network protocol for operating network services securely over an unsecured network.
- Linux tutorial: https://ryanstutorials.net/linuxtutorial
- Linux Fundamentals (book): http://linux-training.be/files/books/linuxfun.pdf
- Getting started in linux-Bioinformatics: https://omicstutorials.com/getting-started-in-linux-bioinformatics
- Putty SSH client: https://www.putty.org
- Windows Terminal: https://docs.microsoft.com/en-us/windows/terminal
Command examples:
Create a directory:
mkdir NewDir
Rename the directory:
mv NewDir AnotherDir
Change the directory:
cd AnotherDir
Download file from Internet:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Connect to SSH gateway server:
ssh sshgw.uef.fi
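The file commands above can be tried safely in a throwaway directory. The following is a minimal sketch, assuming a Linux shell where mktemp is available; NewDir and AnotherDir are the names used in the examples above, and the temporary directory is only there for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Work in a temporary directory so nothing in your home directory is touched.
workdir="$(mktemp -d)"
cd "$workdir"

mkdir NewDir              # create a directory
mv NewDir AnotherDir      # rename it
cd AnotherDir             # move into it
pwd                       # print the current location
```

The temporary directory can be deleted afterwards with rm -r once you no longer need it.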
Terminal multiplexer
screen is a terminal multiplexer. It lets you manage multiple terminal sessions within the same console. In a way, it does the same thing as GUI terminal emulators with their built-in tab system and layout management. tmux is another terminal multiplexer and can be used in a similar fashion to screen.
Terminal multiplexers also provide persistent terminal sessions. This means that you can keep your terminal session running on the server even if you disconnect from the server and shut down your computer. Once you reconnect to the server, you can attach (connect) to the previously started screen session. This is not only the recommended way of working from the command line but sometimes the only way to run long-running analysis jobs.
- screen tutorial: https://linuxhint.com/linux-screen-command-tutorial
- tmux introduction: https://linuxhint.com/introduction_tmux
- screen & tmux comparison: https://linuxhint.com/tmux_vs_screen
Command examples:
Start new screen session:
screen
Attach (connect to) the existing screen session (if only one session exists):
screen -r
List existing (running) sessions:
screen -ls
Attach (connect to) a specific screen session (if more than one session exists):
screen -x 12345
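The persistent-session workflow described above can be sketched as follows. This is a hedged example, not a prescribed setup: the session name long_job and the sleep placeholder command are assumptions made for illustration, and the script only reports a message if screen is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: "long_job" is a hypothetical session name.
session="long_job"

if command -v screen >/dev/null 2>&1; then
    # Start a detached session (-d -m) with a name (-S) running a placeholder command.
    screen -dmS "$session" sleep 60 || echo "could not start a screen session here"
    # List sessions; the new one should appear as <pid>.long_job.
    screen -ls || true
    # Interactive steps, run by hand:
    #   screen -r long_job    reattach to the session by name
    #   Ctrl-a d              detach again, leaving the job running
    # Clean up the demo session.
    screen -S "$session" -X quit || true
else
    echo "screen is not installed on this machine"
fi
```

Naming a session makes it easier to find than a numeric ID. With tmux, the corresponding commands are tmux new -s long_job, tmux ls and tmux attach -t long_job.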
Remote desktop connection
If you want to use a full-featured desktop environment instead of a simple terminal session, you can make a Remote Desktop Connection to the server. However, at the moment tuma.uef.fi is the only server provided by Bioinformatics Center that provides a graphical desktop environment. To make a remote desktop connection to the tuma.uef.fi server you have to be in the UEF intranet, so remote desktop connections have to be made from within a WVD instance.
Please note that Remote Desktop Connection and Remote Desktop are two completely different pieces of software! Remote Desktop Connection is bundled with every Windows 10 installation, whereas Remote Desktop appears on your computer only after you install WVD.
[Figure: Remote Desktop Connection to tuma.uef.fi]
[Figure: Remote Desktop to WVD service]
Conda and Mamba
Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. To start working with conda environments, you first have to install conda yourself, because it is not pre-installed on any of the servers provided by Bioinformatics Center. Miniconda is a good option to start with. It provides a minimal set of conda tools upon which you can start building your analysis environments.
Conda is not very fast at resolving package dependencies, so installing new software into the current environment might take a while. This problem is solved by installing Mamba, a reimplementation of the Conda package manager. Mamba works the same way as Conda but does its job much faster!
- Conda: https://docs.conda.io/en/latest
- Miniconda: https://docs.conda.io/en/latest/miniconda.html
- Mamba: https://github.com/mamba-org/mamba
- Package search: https://anaconda.org
Command examples:
Install Miniconda on Linux (64-bit):
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
Install Miniconda on Windows (64-bit) by downloading the installer and running it from the Windows command line:
curl.exe -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe
.\Miniconda3-latest-Windows-x86_64.exe
Set up installation channels:
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
Create a new conda environment with name "MyEnv":
conda create -n MyEnv
Activate the environment:
conda activate MyEnv
Install bwa into the current environment:
conda install bwa
Install bwa into the current environment from a specific channel (bwa is distributed through the bioconda channel):
conda install -c bioconda bwa
Deactivate the environment:
conda deactivate
Install mamba into the current environment:
conda install -c conda-forge mamba
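Because Mamba accepts the same commands as Conda, the environment setup above can be sketched with mamba in place of conda. This is a minimal sketch under assumptions: the environment name MyEnv is taken from the examples above, samtools is added only as an illustrative second package, and the script prints a hint instead of failing when mamba is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: "MyEnv" and the package list are illustrative choices.
env_name="MyEnv"

if command -v mamba >/dev/null 2>&1; then
    # Create the environment and install packages in one step (-y skips the prompt).
    mamba create -y -n "$env_name" -c conda-forge -c bioconda bwa samtools
    # Show all environments; the new one should be listed.
    mamba env list
else
    echo "mamba not found; install it first with: conda install -c conda-forge mamba"
fi
```

Afterwards the environment is activated with conda activate MyEnv, as in the examples above.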
Containers
Containers are a solution to the problem of getting software to run reliably and reproducibly when it is moved from one computing environment to another, e.g. from your personal laptop to a server. Problems arise when the software environments are not identical (e.g. Windows vs Linux, or Debian vs CentOS) and the same software versions are not always available (e.g. Python 2.7 vs Python 3). Installing numerous software packages into a new environment can also be challenging. In addition, network topologies, security policies and storage might differ, but the software still has to run.
There are two major container technologies available: Docker and Singularity. Running Docker containers requires root privileges, but running Singularity containers does not, so you can run Singularity containers with your own user account and privileges. However, developing and building Singularity containers still requires root privileges. Singularity can run Docker containers, and it is the way to go if you want to use (pre-built) containers in your data analysis tasks.
- Docker: https://www.docker.com
- Docker Hub: https://hub.docker.com
- Singularity: https://sylabs.io
Command examples:
Download pre-built images from Docker Hub:
singularity pull --name [SINGULAR_IMAGE_NAME] docker://[REPOSITORY]/[IMAGE]
Run singularity container:
singularity run [SINGULAR_IMAGE_NAME]
Execute a custom command (ls /) within a container:
singularity exec [SINGULAR_IMAGE_NAME] ls /
Spawn a new shell within a container and interact with the environment:
singularity shell [SINGULAR_IMAGE_NAME]
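The placeholders above can be filled in with a concrete image. The following is a minimal sketch, assuming the public ubuntu:22.04 image on Docker Hub and a hypothetical local file name ubuntu.sif; neither comes from the original examples, and the script only prints a message if singularity is not installed.

```shell
#!/usr/bin/env bash
# Sketch only: the image (ubuntu:22.04) and file name (ubuntu.sif) are illustrative.
image_file="ubuntu.sif"
image_uri="docker://ubuntu:22.04"

if command -v singularity >/dev/null 2>&1; then
    # Download the Docker image and convert it to a Singularity image file.
    singularity pull --name "$image_file" "$image_uri"
    # Run a single command inside the container.
    singularity exec "$image_file" cat /etc/os-release
else
    echo "singularity is not installed on this machine"
fi
```

For analysis work, replace the image with one that contains the tool you need.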
GitHub and Git
GitHub, Inc. is a provider of Internet hosting for software development and version control using Git. It offers the distributed version control and source code management functionality of Git, plus its own features. Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
If you want to version control your code and documentation, GitHub is a great option. To start working with GitHub, it is advisable to first create a repository at GitHub and then clone it into your working environment. Once you have cloned the remote repository, you can work with the local repository: add files, commit changes and finally push the changes to GitHub.
- GitHub: https://github.com
- Getting started with GitHub: https://docs.github.com/en/github/getting-started-with-github
- Git: https://git-scm.com
- Git tutorial: https://git-scm.com/docs/gittutorial
Command examples:
Clone a repository from GitHub:
git clone https://github.com/[USERNAME]/[REPOSITORY_NAME].git
Add files into the local repository:
git add FILENAME
Add all the files in the current directory into the local repository:
git add *
Commit all the changes into the local repository (-a = include all modified tracked files, -m = commit message):
git commit -a -m "The latest changes"
Check status of the local repository:
git status
Push local changes to the remote repository:
git push
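The clone, add, commit and push cycle can be rehearsed without a GitHub account. This is a minimal sketch in which a local bare repository stands in for the GitHub remote; the directory names, the file README.md and the "Demo User" identity are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Sketch only: a local bare repository stands in for a GitHub remote.
demo="$(mktemp -d)"
git init --quiet --bare "$demo/remote.git"

# Clone the (still empty) remote and work in the local copy.
git clone --quiet "$demo/remote.git" "$demo/work"
cd "$demo/work"

echo "# Demo project" > README.md
git add README.md
# Identity is passed inline so the sketch does not depend on your git config.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Add README"

git status --short        # prints nothing when everything is committed
git push --quiet origin HEAD
```

With a real GitHub repository, only the clone URL changes; the remaining steps are the same as in the command examples above.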
Visual Studio Code
Visual Studio Code is a lightweight but powerful source code editor which runs on your desktop and is available for Windows, macOS and Linux. It comes with built-in support for JavaScript, TypeScript, Node.js, Git and Markdown and has a rich ecosystem of extensions for other languages (such as C++, C#, Java, Python, PHP, Go) and runtimes (such as .NET and Unity). Visual Studio Code is installed on tuma.uef.fi server and you can install it on your own computer too.
- Visual Studio Code: https://code.visualstudio.com
- Getting started with Visual Studio Code: https://code.visualstudio.com/docs/introvideos/basics
[Figure: Visual Studio Code with Markdown preview]
Jupyter
Project Jupyter is a non-profit, open-source project. Jupyter Notebooks provide easy-to-use, browser-based analysis environments that support interactive data science and scientific computing across all programming languages. Jupyter Notebooks are not pre-installed on any of the servers provided by Bioinformatics Center. The easiest way to start working with Jupyter Notebooks is to install them by using conda.
- Jupyter: https://jupyter.org
- Jupyter Notebook installation (classic and lab): https://jupyter.org/install
- Jupyter documentation: https://jupyter.org/documentation
[Figure: Jupyter Notebook]
[Figure: JupyterLab]
Command examples:
Install classic Jupyter Notebook into the current conda environment:
conda install -c conda-forge notebook
Run Jupyter Notebook:
jupyter notebook
Install JupyterLab into the current conda environment:
conda install -c conda-forge jupyterlab
Run JupyterLab:
jupyter-lab
Run JupyterLab by using custom port 8890:
jupyter-lab --port 8890
Windows Subsystem for Linux
Nowadays you do not always need access to a Linux server to run bioinformatics analyses from the command line; you can also run them on your personal Windows 10 computer. A good alternative is Windows Subsystem for Linux (WSL). WSL lets you run a full-featured Linux environment, including most command-line tools, utilities, and applications, directly on Windows, unmodified, without the overhead of a traditional virtual machine or dual-boot setup.
- Windows Subsystem for Linux Documentation: https://docs.microsoft.com/en-us/windows/wsl
[Figure: Linux terminals on Windows]