
Varanto

Input variations:


Input can be variant ids used in Ensembl, e.g. rs1801133, or genomic locations in the form <chromosome 1-22,X,Y,MT>:<start position>. These can be mixed and matched. Please format the input file with one variant per line.
Warning: Choosing annotations with a large number of terms will increase execution time significantly.

User Guides

The Varanto tool can be used to annotate and analyze human genetic variations.

Input-tab

Input variations

Varanto understands variation identifiers supported by the Ensembl database, as well as genomic locations. The primary identifier type is the dbSNP reference SNP identifier (i.e. rs-number). Genomic locations are given in the form <chromosome>:<start position>. Identifiers and locations can be mixed and matched, separated by whitespace, e.g. 'rs1801133 22:19963748'. There are two ways to input them: by pasting them into the text box (separated by whitespace) or by uploading a file with a single variation identifier on each row.
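The input rules above can be sketched as a small pre-check to run on a list before pasting or uploading it. This is only an illustration, not Varanto's own parser: the function name and the two patterns are assumptions based on the formats described here (rs-numbers, and locations on chromosomes 1-22, X, Y or MT). Varanto may accept other Ensembl identifiers that this sketch reports as unrecognized.

```python
import re

# Assumed patterns: rs-numbers such as rs1801133, and locations such as 22:19963748.
RS_ID = re.compile(r"^rs\d+$")
LOCATION = re.compile(r"^(?:[1-9]|1\d|2[0-2]|X|Y|MT):\d+$")


def classify_tokens(text):
    """Split input on whitespace and sort tokens into rs ids, locations and the rest."""
    result = {"rs_id": [], "location": [], "unrecognized": []}
    for token in text.split():
        if RS_ID.match(token):
            result["rs_id"].append(token)
        elif LOCATION.match(token):
            result["location"].append(token)
        else:
            result["unrecognized"].append(token)
    return result


# Mixed input separated by a space and a newline; the last token is deliberately malformed.
parsed = classify_tokens("rs1801133 22:19963748\nchr23:5")
print(parsed)
```

Tokens that land in the unrecognized group are worth checking by hand before submitting.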

Hint: The “Example variations” button allows you to quickly test Varanto with a small set of example variations.

Background Set

The enrichment analysis looks for over-representation or under-representation of variation-linked annotations in your input set. The annotations linked to the input set are compared to a background set, which by default consists of all the variations in the database. If your input set is derived from experiments where your technological choices limit which variations can be detected (for example, when using a SNP microarray with predetermined variations), you should select a suitable background variation set. The most common SNP microarray types are supported (information retrieved from the UCSC Genome Browser).

Hint: If not using an input set derived from a SNP microarray experiment, use “All variations” as background set.

Filter variations by distance

If your input set contains several closely located variations (e.g. within a single gene), the annotations linked to that locus become over-represented. To avoid this, you can filter the input variation set by genomic distance. Varanto will choose a single variation within the distance limit and filter out the other variations.
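The idea of distance filtering can be sketched as follows. The guide does not say which variation Varanto keeps within the distance limit, so this sketch assumes the simplest rule: walk each chromosome in position order and keep a variation only if it lies at least the chosen distance from the last one kept. The function name, the tuple layout and the example identifiers are hypothetical.

```python
def filter_by_distance(variations, min_distance):
    """Keep one variation per neighbourhood.

    variations: list of (identifier, chromosome, position) tuples.
    min_distance: minimum gap in base pairs between kept variations on a chromosome.
    """
    kept = []
    last_kept_position = {}  # chromosome -> position of the most recently kept variation
    for identifier, chromosome, position in sorted(variations, key=lambda v: (v[1], v[2])):
        previous = last_kept_position.get(chromosome)
        if previous is None or position - previous >= min_distance:
            kept.append(identifier)
            last_kept_position[chromosome] = position
    return kept


# Hypothetical variations: two of them sit 500 bp apart on chromosome 1.
example = [
    ("var_a", "1", 1000),
    ("var_b", "1", 1500),
    ("var_c", "1", 12000),
    ("var_d", "2", 1200),
]
filtered = filter_by_distance(example, min_distance=10000)
print(filtered)
```

A different selection rule (for example, keeping the variation with the most annotations) would change which identifier survives, but not the number of neighbourhoods.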

Variation annotations

Variation annotations are annotations that are linked directly to an individual variation (e.g. alleles for the variation in question, effect on transcripts, changes in disease risk). These annotations give specific information on the variations themselves. Annotations are listed as selectable rows, where each row includes the data resource abbreviation inside brackets (Ens - Ensembl, GET-E - Genotype + Environment = Trait Evidence (GET-Evidence)), the name of the annotation, and the count of annotation terms inside brackets. Data resource information for the annotations can be found in the About tab.

Warning: Choosing annotations with a large number of terms will increase execution time significantly.

Gene annotations

Gene annotations are annotations that are linked to a gene and, through the gene, to all variations within that gene. These annotations are therefore more general, as all the variations within the gene have the same gene annotations. Annotations are listed as selectable rows, where each row includes the data resource abbreviation inside brackets (Ens - Ensembl, GET-E - GET-Evidence, MSigDB - Molecular Signatures Database), the name of the annotation, and the count of annotation terms inside brackets. Data resource information for the annotations can be found in the About tab.

Hint: Gene annotations are more general and may yield results where the variations in the input list do not really have an effect on the gene or the related phenomena. On the other hand, gene annotations are useful for hypothesis generation and overall inspection of variation sets as much more information is known about genes than variations. When using gene annotations, it is often advisable to use the “Filter variations by distance” option to filter out variations within the same gene.

Warning: Choosing annotations with a large number of terms will increase execution time significantly.

Submit your query

After inputting the variation set and choosing suitable options, press “Submit” to analyze your input set.

Brief descriptive information about your input will be shown below the Submit button (e.g. how many variations from your input set were found in the background set, how many were left after filtering, how many unique annotations were found for these variations, and how many associations there were between the input variations and the annotations).

Below the descriptive information, a table with the query results will be shown. The table consists of the variations in your input set, general information about the variations (allele, chromosomal location), and the association of the variations to annotations as a binary table (0 means no association, 1 means an association). The data can be downloaded through the “Download data” button. The downloaded data can be further analyzed with external tools.

Annotation Results

When the input variations have been added correctly, the annotation results table will appear. It consists of the variations in your input set with general information about them (variation id, strand, position, allele, chromosome).

After you select variation and/or gene annotations, the “Annotation Results” table will update to include those annotations as a binary matrix (0 means no association, 1 means an association). The data can be downloaded through the “Download data” button. The downloaded data can be further analyzed with external tools.
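The binary matrix described above can be illustrated with a short sketch. The function name, the input layout (a mapping from variation id to its set of annotation terms) and the example labels are all hypothetical; the layout of the file produced by “Download data” is not specified in this guide and may differ.

```python
def build_binary_matrix(associations):
    """Turn {variation id: set of annotation terms} into a 0/1 table.

    Returns the sorted list of terms (the columns) and a dict of rows.
    """
    terms = sorted(set().union(*associations.values()))
    rows = {
        variation: [1 if term in linked else 0 for term in terms]
        for variation, linked in associations.items()
    }
    return terms, rows


# Hypothetical associations; var_c has no linked annotation at all.
example_associations = {
    "var_a": {"term_x", "term_y"},
    "var_b": {"term_y"},
    "var_c": set(),
}
terms, rows = build_binary_matrix(example_associations)
for variation, values in rows.items():
    print(variation, values)
```

Each row is one variation and each column one annotation term, which is the shape that external tools such as clustering or plotting libraries typically expect.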

Enrichment Analysis-tab

The Enrichment Analysis-tab includes results from the enrichment analysis performed on your input variation set. The result table includes information about over- and under-representation of associated annotations within your input variation set, when compared to the selected background variation set.

The columns in the enrichment analysis table are:

  • Label: Label of the annotation
  • Description: Description of the annotation (if available)
  • Observed: Observed number of annotations associated to variations in the input set
  • Expected: Expected number of associated annotations, based on the background set
  • Odds ratio: Odds ratio (OR) for the ratio of the observed and expected associations
  • Under P: Statistical significance (p-value) for under-representation of the association
  • Under P-FDR: Under P value adjusted for multiple testing using Benjamini-Hochberg false discovery rate correction
  • Over P: Statistical significance (p-value) for over-representation of the association
  • Over P-FDR: Over P value adjusted for multiple testing using Benjamini-Hochberg false discovery rate correction

Hint: The most common way to interpret the results is to focus on the Over P-FDR column and look for statistically significant over-representation (e.g. P-FDR < 0.05). This identifies the annotation terms that are significantly over-represented among the variations in your input set.
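To make the columns above more concrete, here is a minimal sketch of how the Expected, Under P, Over P and P-FDR values could be computed for one annotation term. The guide does not name the statistical test Varanto uses, so the hypergeometric test here is an assumption, as are all function and parameter names. The Odds ratio column is left out because its exact definition is not given in this guide. The Benjamini-Hochberg adjustment follows the standard step-up procedure.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """Probability of exactly k annotated variations when n are drawn from N, K of them annotated."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment(k, N, K, n):
    """Return (expected count, under-representation p-value, over-representation p-value).

    k: annotated variations observed in the input set
    N: size of the background set, K: annotated variations in the background
    n: size of the input set
    """
    lowest, highest = max(0, n - (N - K)), min(n, K)
    expected = n * K / N
    under_p = sum(hypergeom_pmf(i, N, K, n) for i in range(lowest, k + 1))
    over_p = sum(hypergeom_pmf(i, N, K, n) for i in range(k, highest + 1))
    return expected, min(under_p, 1.0), min(over_p, 1.0)


def benjamini_hochberg(p_values):
    """Adjust p-values with the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical numbers: 4 of 10 input variations carry a term that 10 of 100 background variations carry.
expected, under_p, over_p = enrichment(k=4, N=100, K=10, n=10)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(expected, under_p, over_p, adjusted)
```

In a full analysis the adjustment would be applied across the Over P values of all tested annotation terms, and separately across the Under P values.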

Heatmap-tab

The Heatmap-tab includes a visualization of the variations and their associated annotations. The variations and annotations are ordered using hierarchical clustering, enabling identification of clusters and annotations that behave similarly.
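As a rough illustration of how hierarchical clustering orders the rows of a binary annotation table, the sketch below repeatedly merges the two closest groups and reads the final order from the merged list. The distance measure (number of differing positions) and the linkage (average) are assumptions chosen for simplicity; the guide does not say which ones Varanto uses. All names and example rows are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))


def cluster_order(rows):
    """Order row labels by average-linkage agglomerative clustering.

    rows: dict mapping label -> list of 0/1 values, all of the same length.
    """
    clusters = [[label] for label in rows]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                distance = sum(hamming(rows[a], rows[b]) for a, b in pairs) / len(pairs)
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for index, c in enumerate(clusters) if index not in (i, j)] + [merged]
    return clusters[0] if clusters else []


# Hypothetical rows: var_a resembles var_c, and var_b resembles var_d.
example_rows = {
    "var_a": [1, 1, 0, 0],
    "var_b": [0, 0, 1, 1],
    "var_c": [1, 1, 0, 1],
    "var_d": [0, 0, 1, 0],
}
order = cluster_order(example_rows)
print(order)
```

Applying the same function to the transposed table would order the annotation columns, so that similar rows and similar columns end up next to each other in the heatmap.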

Karyogram-tab

The karyogram tab allows you to visualize the genomic locations of your input variations in the context of the human genome. This enables visual inspection of the genomic loci, including detection of clustering of variations in certain parts of the genome.
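The kind of clustering a karyogram reveals visually can also be approximated numerically by counting variations per fixed-size genomic window. This is only a sketch for working with downloaded data outside Varanto: the function name, the 1 Mb default window and the example positions are assumptions, not part of the tool.

```python
from collections import Counter


def count_per_bin(locations, bin_size=1_000_000):
    """Count variations per (chromosome, bin index).

    locations: iterable of (chromosome, position) pairs.
    """
    return Counter((chromosome, position // bin_size) for chromosome, position in locations)


# Hypothetical positions: two variations fall into the same 1 Mb window on chromosome 1.
example_locations = [("1", 1_200_000), ("1", 1_900_000), ("1", 55_000_000), ("X", 300_000)]
bins = count_per_bin(example_locations)
densest_bin, densest_count = bins.most_common(1)[0]
print(densest_bin, densest_count)
```

Windows with unusually high counts point to the same loci that would stand out as dense bands in the karyogram.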

About Varanto

Varanto is an online database and tool for annotating human genetic variations using various annotation data sources. Varanto can be used to query a set of input variations, retrieve associated annotations and to visualize and analyze the results.

Varanto has been developed by the Paananen research group at the University of Eastern Finland, Kuopio, Finland.

To contact the Paananen research group, please send an e-mail to: jussi.paananen (at) uef.fi

Varanto is developed using R and the Shiny web framework.

Source code and local deployment instructions are available at https://github.com/oqe/varanto/

Varanto by UEF Bioinformatics Center is licensed under a Creative Commons Attribution 4.0 International License.

Data sources

R Session information: