1. GAIL Overview

What Is GAIL?

GAIL (Gene-Gene Association Inference based on biomedical Literature) is a web interface and database that allows the investigation of human gene-gene association networks based on shared gene-GO co-occurrences in PubMed abstracts. To see how the output of GAIL can be analyzed using relevant software (BayesGO), please visit the software page.

GAIL uses a Neo4j graph database to store its data in a graph data structure. Genes and GO terms are stored as graph nodes. Gene-GO associations and gene-gene associations are stored as graph edges. GAIL currently supports searching by HUGO Gene Nomenclature Committee (HGNC) IDs and GO IDs. If you have gene identifiers other than HGNC IDs, GAIL provides an ID mapper function to translate supported identifiers into HGNC IDs.
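The storage model described above can be pictured as a small property graph. The sketch below is an in-memory Python stand-in, not GAIL's actual Neo4j schema: the node labels, edge types, property names (`p_value`, `cosine`) and the placeholder identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny in-memory property graph: labelled nodes plus typed, undirected edges."""
    nodes: dict = field(default_factory=dict)  # node id -> label ("Gene" or "GO")
    edges: list = field(default_factory=list)  # (source, target, edge type, properties)

    def add_node(self, node_id, label):
        self.nodes[node_id] = label

    def add_edge(self, source, target, edge_type, **props):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append((source, target, edge_type, props))

    def neighbors(self, node_id, edge_type):
        """Return the ids connected to node_id by an edge of the given type."""
        found = []
        for source, target, kind, _ in self.edges:
            if kind != edge_type:
                continue
            if source == node_id:
                found.append(target)
            elif target == node_id:
                found.append(source)
        return found


# Hypothetical placeholder identifiers and values, for illustration only.
g = Graph()
g.add_node("GENE_A", "Gene")
g.add_node("GENE_B", "Gene")
g.add_node("GO_TERM_X", "GO")
g.add_edge("GENE_A", "GO_TERM_X", "GENE_GO", p_value=0.001)
g.add_edge("GENE_B", "GO_TERM_X", "GENE_GO", p_value=0.004)
g.add_edge("GENE_A", "GENE_B", "GENE_GENE", cosine=0.8)
```

Keeping both kinds of association as edges means a gene's GO terms and its neighboring genes can be found with the same kind of traversal, differing only in the edge type.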

Utilities

Function Description
Network Query

The network query lets users retrieve the gene-gene association network for a given list of genes, identified by HUGO Gene Nomenclature Committee (HGNC) official IDs.

Gene Query

The gene query lets users search a list of genes of interest by HGNC official ID and retrieve all associated GO terms from our biomedical literature database. Hypergeometric test p-values, with and without Bonferroni correction, are also reported. External links to GeneCard on the page give more detailed information about each gene.

GO Query

Similarly, the GO query lets users search a list of GO terms of interest by GO ID and retrieve all associated genes from our database. Hypergeometric test p-values, with and without Bonferroni correction, are also reported. External links to AmiGO on the page give more detailed information about each GO term.

Gene-GO Query

The gene-GO query lets users retrieve a gene-GO association p-value matrix. Hypergeometric test p-values, with and without Bonferroni correction, are also reported.

ID Mapper

The gene query and network query require official HGNC IDs. We provide a simple ID mapper function that maps a list of gene names/synonyms to candidate official HGNC IDs and returns the matched list.

2. Homepage

3. Network Query

Network Query Search

Network Query Results

The results page contains the following components:

Advanced Functions: Perform advanced functions (download, community detection, global GO analysis) on the gene-gene association network.
Tab Menu: Check results in different formats (network visualization, data table, data matrix, gene communities).
Results Display Box: Display the results chosen from the tab menu. The network visualization is shown by default.

The default network visualization has three panels. The left panel lets you customize the network layout, the central panel visualizes the network, and the right panel shows edge/node information when you click an edge or node.

Advanced Functions

Unmapped Genes

If any queried gene IDs could not be matched, you can view them by clicking the 'Unmapped IDs' button.

Download

You can download the gene-gene association cosine similarity score matrix in .csv format or the network graph in .png format.

Community Detection

You can run community detection on the genes in the network to find gene clusters.

Community GO Analysis

After running community detection, you can perform a GO analysis on each gene community to find the top GO terms associated with that community.

Change Community Color

After running community detection, each gene cluster is highlighted with a distinct color. You can change a cluster's color using the color picker on its cluster page.

Global GO Analysis

You can run a global GO analysis to find the top GO terms associated with the queried genes. The ranking is determined by averaging the gene-GO association p-values across all queried genes.
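The averaging step can be sketched as follows. This is a minimal illustration rather than GAIL's implementation: the function name, the `top_n` parameter, the choice to rank only GO terms that have a p-value for every queried gene, and the placeholder identifiers and values are all assumptions.

```python
from statistics import mean


def global_go_ranking(p_values, top_n=10):
    """Rank GO terms by their mean p-value across all queried genes.

    p_values maps gene -> {GO term -> p-value}. Only GO terms that have a
    p-value for every queried gene are ranked (an assumption of this sketch).
    """
    if not p_values:
        return []
    per_gene = list(p_values.values())
    shared_terms = set(per_gene[0])
    for row in per_gene[1:]:
        shared_terms &= set(row)
    averaged = [(term, mean(row[term] for row in per_gene)) for term in shared_terms]
    # A smaller average p-value means a stronger overall association;
    # ties are broken by term id so the output order is stable.
    averaged.sort(key=lambda item: (item[1], item[0]))
    return averaged[:top_n]


# Hypothetical p-values for two placeholder genes and three placeholder GO terms.
example = {
    "GENE_A": {"GO_X": 0.25, "GO_Y": 0.5, "GO_Z": 0.125},
    "GENE_B": {"GO_X": 0.75, "GO_Y": 0.5, "GO_Z": 0.125},
}
ranking = global_go_ranking(example)
```

Here `GO_Z` ranks first because its average p-value is the smallest, while `GO_X` and `GO_Y` tie on their averages.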

Tab Menu

The 'Community Detection' and 'Global GO Analysis' tabs appear once the corresponding advanced functions have been run.

Network

You can view the network visualization in the network tab.

Network Data View

You can view the gene-gene associations and their cosine similarity scores in a long table format or a matrix format.
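The two views carry the same information in different shapes. The sketch below converts long-table rows into a symmetric matrix, assuming each row is a `(gene_a, gene_b, score)` tuple; the function name, the use of `None` for pairs without a score, and the 1.0 diagonal are illustrative choices, not a description of the files GAIL produces.

```python
def long_to_matrix(rows):
    """Convert (gene_a, gene_b, score) rows into a symmetric score matrix.

    Returns (genes, matrix), where matrix[i][j] is the score between
    genes[i] and genes[j]. The diagonal is set to 1.0 and pairs that do
    not appear in the long table are left as None.
    """
    genes = sorted({gene for a, b, _ in rows for gene in (a, b)})
    index = {gene: i for i, gene in enumerate(genes)}
    size = len(genes)
    matrix = [[1.0 if i == j else None for j in range(size)] for i in range(size)]
    for a, b, score in rows:
        matrix[index[a]][index[b]] = score
        matrix[index[b]][index[a]] = score  # associations are undirected
    return genes, matrix


# Hypothetical scores for three placeholder genes.
genes, matrix = long_to_matrix([("GENE_A", "GENE_B", 0.9), ("GENE_B", "GENE_C", 0.6)])
```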

Network Customization

Cosine Similarity

The gene-gene association network is constructed from cosine similarity scores. Two genes are linked by an edge if their cosine similarity score is larger than a threshold (default: 0.5). You can change the cosine similarity threshold to modify the network, or set the threshold by percentile.
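The thresholding rule can be sketched as follows. The per-gene numeric profiles, the function names and the nearest-rank percentile rule are assumptions for illustration; only the cosine similarity score, the strict "larger than" comparison and the 0.5 default come from the description above.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for an all-zero profile
    return dot / (norm_u * norm_v)


def build_edges(profiles, threshold=0.5):
    """Link two genes when their cosine similarity is strictly above the threshold."""
    edges = {}
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        score = cosine_similarity(profiles[gene_a], profiles[gene_b])
        if score > threshold:
            edges[(gene_a, gene_b)] = score
    return edges


def percentile_threshold(scores, percentile):
    """Nearest-rank percentile (0 < percentile <= 100) of the scores."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical profiles: one number per GO term for each placeholder gene.
profiles = {
    "GENE_A": [3, 0, 1],
    "GENE_B": [2, 0, 1],
    "GENE_C": [0, 4, 1],
    "GENE_D": [0, 1, 0],
}
edges = build_edges(profiles)
```

With these profiles only the pairs (GENE_A, GENE_B) and (GENE_C, GENE_D) score above 0.5, so the network has two edges. Raising the threshold removes edges and lowering it adds them, which is the effect of the threshold control.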

Find A Gene

You can zoom the network to a specific gene by using the 'Find A Gene' function. The found gene node is highlighted with an orange border.

Network Layout

You have the following options to change the network layout:

Node Distance: Adjust the visual distance between two gene nodes. After setting the node distance, click the 'Update' button to update the visualization.
Zoom Back: Reset the network to its original position and size.
Show Labels: Show or hide labels for genes (gene symbol) and associations (cosine similarity score). By default, both gene and association labels are displayed.

Edge/Node Information

You can check edge/node information by clicking an edge or node. The following information is provided:

Node (Gene): Gene symbol, gene name, link to GeneCard, and the top 25 GO terms associated with that gene.
Edge (Gene-Gene Association): The two gene symbols and their cosine similarity score.

Shared GO Terms

You can check the GO terms shared between two genes by clicking the edge that connects them and then clicking the 'Checked Shared GOs' button. This displays the top GO terms shared by the two genes.
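One way to picture a shared-GO check is as an intersection of the two genes' GO term lists. The sketch below is an assumption-laden illustration: the documentation does not say how GAIL ranks shared terms, so ordering by the weaker of the two p-values, the function name and the `top_n` default are all hypothetical.

```python
def shared_go_terms(terms_a, terms_b, top_n=25):
    """Return GO terms associated with both genes, strongest shared terms first.

    terms_a and terms_b map GO term -> p-value for each gene. A shared term is
    ranked by the larger (weaker) of its two p-values, so a term ranks highly
    only if it is strongly associated with both genes.
    """
    shared = set(terms_a) & set(terms_b)
    ranked = sorted(shared, key=lambda term: (max(terms_a[term], terms_b[term]), term))
    return ranked[:top_n]


# Hypothetical p-values for two placeholder genes.
gene_a_terms = {"GO_X": 0.01, "GO_Y": 0.2, "GO_Z": 0.001}
gene_b_terms = {"GO_X": 0.02, "GO_Z": 0.5, "GO_W": 0.001}
shared = shared_go_terms(gene_a_terms, gene_b_terms)
```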

4. Gene Query

Gene Query Search

The gene query function allows you to query a list of genes and get the associated GO terms for each gene. Currently, only HGNC IDs are supported for searching. We provide an ID mapper function to map other gene names/synonyms to HGNC IDs.

Gene Query Results

The results page contains the following components:

Switch Genes: You can check detailed information for each gene by switching gene tabs.
Download Functions: You can download the gene-GO association p-values for one specific gene as a CSV file by clicking the 'Download Data' button, or download the association p-values for all queried genes as a compressed file by clicking the 'Download All' button.
Basic Information: Displays information about a specific gene (symbol, HGNC ID, link to GeneCard, total number of occurrences in PubMed abstracts).
Results Table: Provides the raw p-value and the Bonferroni-corrected p-value for the association between the gene and each GO term.
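The two p-values in the results table can be illustrated with a small calculation. The sketch below assumes the hypergeometric test is one-sided on abstract counts: out of `population` abstracts, `successes` mention the gene, `draws` mention the GO term, and `k` mention both. That parameterization, the function names and the example counts are assumptions, since the exact setup GAIL uses is not given here.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / total


def bonferroni(p_value, n_tests):
    """Bonferroni correction: multiply by the number of tests and cap at 1."""
    return min(1.0, p_value * n_tests)


# Made-up counts: 10 abstracts in total, 3 mention the gene, 4 mention the
# GO term, and 3 mention both.
raw_p = hypergeom_upper_tail(3, population=10, successes=3, draws=4)
corrected_p = bonferroni(raw_p, n_tests=5)
```

With these counts the raw p-value is 7/210, about 0.033, and correcting for five tests gives about 0.167. The exact binomial-coefficient sum is fine for small illustrations; at the scale of a literature database a statistics library would be the practical choice.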

Unmapped Genes

Any gene IDs that could not be matched during the search are displayed in a red box.

5. GO Query

GO Query Search

The GO query function allows you to query a list of GO terms and get the associated genes for each GO term. Currently, only GO IDs are supported for searching.

GO Query Results

The results page contains the following components:

Switch GO Terms: You can check detailed information for each GO term by switching GO tabs.
Download Functions: You can download the GO-gene association p-values for one specific GO term as a CSV file by clicking the 'Download Data' button, or download the association p-values for all queried GO terms as a compressed file by clicking the 'Download All' button.
Basic Information: Displays information about a specific GO term (name, GO ID, link to AmiGO, total number of occurrences in PubMed abstracts).
Results Table: Provides the raw p-value and the Bonferroni-corrected p-value for the association between the GO term and each gene.

Unmatched GOs

As with the gene query, any GO IDs that could not be matched during the search are displayed in a red box.

6. Gene-GO Query

Gene-GO Query Search

The gene-GO query function allows you to retrieve a p-value matrix for a list of genes and a list of GO terms. Currently, genes can only be searched by HGNC ID and GO terms by GO ID. We provide an ID mapper function to map other gene names/synonyms to HGNC IDs.

Alternatively, you can choose a specific number of GO terms to include in the gene-GO association matrix, or use all GO terms to construct the matrix.

Gene-GO Query Results

You can view gene-GO associations in a long table format or matrix format. You can download the p-value matrix in a csv file.
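A downloaded matrix can be reshaped into the long-table form with a few lines of code. The sketch below assumes a layout with a header row of GO IDs and one row per gene whose first cell is the gene ID; the actual layout of the downloaded file should be checked before relying on it. The function names and the sample content are hypothetical.

```python
import csv
import io


def read_p_value_matrix(handle):
    """Parse a CSV p-value matrix into {row id: {column id: p-value}}."""
    reader = csv.reader(handle)
    header = next(reader)
    column_ids = header[1:]  # the first header cell labels the row-id column
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        row_id, values = row[0], row[1:]
        matrix[row_id] = {col: float(val) for col, val in zip(column_ids, values)}
    return matrix


def to_long_table(matrix):
    """Flatten the nested mapping into (row id, column id, p-value) tuples."""
    return [(row_id, col, p) for row_id, cols in matrix.items() for col, p in cols.items()]


# Hypothetical file content with placeholder identifiers.
sample = io.StringIO("gene,GO_X,GO_Y\nGENE_A,0.01,0.5\nGENE_B,0.2,0.03\n")
matrix = read_p_value_matrix(sample)
long_table = to_long_table(matrix)
```

For a real download, pass an open file instead of the `io.StringIO` sample, for example `open(path, newline="")`.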

ID Mapper Query

ID Mapper Search

Supported Identifiers

Currently, our query functions only support HGNC IDs. The ID Mapper takes a list of gene names/synonyms and returns the mapped HGNC IDs. You can click 'Clear' to clear the box. The box on the right side contains similar documentation for the ID Mapper.

You can enter a list of gene names/synonyms into the box on the left and click 'Search' to search for mappings. Several types of input are supported, including gene names/synonyms, Ensembl IDs and NCBI accession numbers.

Supported Input            Example of Input    Example of Output
Gene Names                 breast cancer 1     BRCA1
Gene Synonyms              BRCC1               BRCA1
Ensembl ID                 ENSG00000012048     BRCA1
NCBI Accession             NM_007294           BRCA1
Previous Names/Synonyms    PNCA4               BRCA1

ID Mapper Results

The results table displays all mapped IDs, along with your queried term and the gene symbol and name of each mapped term. Duplicated mappings are highlighted in yellow, and you can check the box next to the one you want to use. Afterwards, you can copy all selected terms to the clipboard by clicking the 'Copy Selected' button.
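The mapping behavior can be sketched as a lookup that separates unique, duplicated and unmatched queries. The BRCA1 entries below come from the supported-input examples above; the `alias_x` entry, its two candidate genes, the case-insensitive matching and the function name are hypothetical additions for illustration.

```python
# Alias -> candidate gene symbols. A real mapper would draw on the full
# identifier resources rather than this handful of entries.
ALIASES = {
    "breast cancer 1": ["BRCA1"],
    "brcc1": ["BRCA1"],
    "ensg00000012048": ["BRCA1"],
    "nm_007294": ["BRCA1"],
    "pnca4": ["BRCA1"],
    "alias_x": ["GENE_A", "GENE_B"],  # hypothetical alias with duplicated mappings
}


def map_ids(queries, aliases):
    """Split queries into unique mappings, duplicated mappings and unmapped terms."""
    mapped, ambiguous, unmapped = {}, {}, []
    for query in queries:
        # Matching is case-insensitive here, which is an assumption of this sketch.
        candidates = aliases.get(query.strip().lower(), [])
        if len(candidates) == 1:
            mapped[query] = candidates[0]
        elif candidates:
            ambiguous[query] = candidates  # the user would pick one of these
        else:
            unmapped.append(query)
    return mapped, ambiguous, unmapped


mapped, ambiguous, unmapped = map_ids(
    ["BRCC1", "NM_007294", "alias_x", "not_a_gene"], ALIASES
)
```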

Find Similar Genes

If no matched results are found for your queried term, you can use the 'Search for Similar Terms' function to identify similar terms.
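A similar-term search can be approximated with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library as a stand-in; the matching method GAIL actually uses is not described here, and the function name, the `limit` and `cutoff` defaults and the small vocabulary are assumptions.

```python
import difflib


def similar_terms(query, known_terms, limit=5, cutoff=0.6):
    """Return known terms whose spelling is close to the query, ignoring case."""
    lowered = {term.lower(): term for term in known_terms}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=limit, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# Small illustrative vocabulary of gene symbols.
suggestions = similar_terms("brca", ["BRCA1", "BRCA2", "TP53"])
```

A higher `cutoff` returns fewer, closer suggestions, and a lower one is more forgiving of typos.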

Software

BayesGO

bayesGO is a Bayesian hierarchical model that simultaneously identifies pathway-modulating genes based on the literature mining data and facilitates interpreting functions of these new genes using Gene Ontology terms. This approach allows rigorous inference of gene-gene relationships based on the literature mining data while the GAIL web interface allows users to implement dynamic and interactive exploratory analyses.

Download

The R package bayesGO implements this statistical model and provides a simple, user-friendly interface for statistical inference. The 'P-Value Matrix' output from the GAIL web interface can be used as input for this software. Note that bayesGO requires JAGS. You can download JAGS and bayesGO through the following links:

Usage

After you download the 'P-Value Matrix' output from the GAIL web interface, load the file into R with the function read.csv(). The R function bayesGO() then fits the bayesGO model using this data as input. Next, the R function predict() clusters the genes and GO terms and identifies associations between them. Finally, the R function plot() visualizes the analysis results as a heatmap, in which the column and row side bars show the gene and GO term cluster indices and red colors indicate stronger associations between genes and GO terms. Please see Yu et al. (2018) for a step-by-step analysis guideline and Chung et al. (2017) for more details about the statistical model.

Development

The data are integrated from HUGO, Ensembl, GenBank, UniProt and Gene Ontology (GO). All data are stored as graph structures in a Neo4j graph database. The web interface is developed using the Django framework.