Several layers of information have been integrated in AlzBase. Gene expression information covers Alzheimer’s disease (AD), non-dementia, related diseases and aging. Information on correlation with disease severity covers cortical atrophy, Braak stage, MMSE and NFT scores. Other annotation information comes from the Allen Brain Atlas, the GWAS catalogue, eQTL studies and the CTD drug database. In addition, gene-gene correlation can also be retrieved. A summary of AlzBase statistics is shown in Table 1.
As shown in Figure 1, three categories of information have been integrated into AlzBase: differential gene expression, other gene annotations and gene-gene correlation. A comprehensive summary of the top genes from AlzBase and AD genetics is also provided.
Figure 1. Data processing and data content of AlzBase.
Core datasets on Alzheimer's disease
The core datasets include transcriptome data from both brain and blood. The brain datasets cover multiple stages of disease development, including aging, non-dementia, early AD and late AD, and span several brain regions and sub-regions. The blood datasets also cover several stages, including aging, MCI, mild/moderate AD and severe AD. For details, please refer to Tables 2a & 2b. Most of the datasets were acquired from the Gene Expression Omnibus (GEO) at NCBI. The RankProd algorithm was used for differential gene expression analysis.
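To illustrate the idea behind rank-product analysis, here is a minimal sketch in Python. It is not the RankProd package and not the AlzBase pipeline: the function name `rank_product`, the input layout (one list of log fold changes per gene) and the toy values are assumptions made for illustration. The permutation-based significance estimate that a full rank-product analysis would include is omitted, and ties are not handled specially.

```python
import math

def rank_product(log_fc):
    """Simplified rank product for up-regulation.

    log_fc: dict mapping gene -> list of log fold changes, one value per
            comparison (hypothetical input layout; all lists same length).
    Returns a dict mapping gene -> geometric mean of its ranks, where
    rank 1 is the most up-regulated gene in a comparison.
    """
    genes = list(log_fc)
    k = len(log_fc[genes[0]])  # number of comparisons
    log_rank_sum = {g: 0.0 for g in genes}
    for j in range(k):
        # Rank genes within comparison j, largest fold change first.
        ordered = sorted(genes, key=lambda g: log_fc[g][j], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # Geometric mean of the ranks, computed in log space.
    return {g: math.exp(log_rank_sum[g] / k) for g in genes}

# Toy example with made-up values.
toy = {
    "GENE_A": [2.0, 1.5, 3.0],
    "GENE_B": [0.1, 0.2, -0.1],
    "GENE_C": [-1.0, -2.0, -1.5],
}
for gene, rp in sorted(rank_product(toy).items(), key=lambda kv: kv[1]):
    print(gene, round(rp, 3))
```

A small rank product means the gene is consistently near the top of the list across comparisons. Down-regulated genes can be scored the same way by ranking in the opposite direction.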
Datasets on related diseases and aging
Publicly available brain transcriptome datasets for other neurological disorders have been integrated into AlzBase. These include datasets for Parkinson’s disease, Huntington’s disease, schizophrenia, bipolar disorder, autism and some other diseases. The details are shown in Table 3. As with the core datasets, the RankProd algorithm was used for differential gene expression analysis.
For in-depth analysis of brain aging, we collected several brain transcriptome datasets covering a wide age range. The datasets are listed in Table 4. Because aging studies follow a time-series design rather than a case-control design, a linear correlation algorithm was used to select critical aging genes.
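The correlation-based selection of aging genes can be sketched as follows. This is a minimal illustration, assuming a Pearson correlation between each gene's expression and donor age. The source only says "linear correlation", so the choice of Pearson, the function names and the toy data are assumptions rather than the actual AlzBase procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant vector; this sketch
        # treats it as "no correlation".
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def rank_genes_by_age(expr, ages):
    """Sort genes by absolute correlation with age, strongest first.

    expr: dict mapping gene -> expression values, one per donor
          (hypothetical layout, same donor order as `ages`).
    """
    scores = {gene: pearson(values, ages) for gene, values in expr.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Toy example with made-up values.
ages = [20, 40, 60, 80]
expr = {
    "GENE_UP": [1, 2, 3, 4],
    "GENE_DOWN": [8, 6, 4, 2],
    "GENE_FLAT": [5, 5, 5, 5],
}
for gene, r in rank_genes_by_age(expr, ages):
    print(gene, round(r, 3))
```

Sorting by the absolute value keeps genes that decline with age alongside those that increase.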
Some previous studies have claimed that people with type 2 diabetes (T2D) have a higher risk of developing AD. Therefore, we collected several T2D transcriptome datasets covering blood, muscle, liver and islet. The datasets are listed in Table 5. For consistency, the RankProd algorithm was used for differential gene expression analysis.
Annotations of the genes in AlzBase
The annotations come from the following sources:
1) Correlation with AD severity, including Braak stage, cortical atrophy, MMSE score and NFT score. The information was retrieved from three published studies.
2) Annotations from the Allen Brain Atlas (http://human.brain-map.org/). These include brain expression, molecular function, pathways and biological processes.
3) Phenotypic information from GWAS catalogue curated from large GWAS studies (https://www.genome.gov/26525384#download).
4) Regulatory SNPs from brain eQTL studies.
5) Drug-gene interactions from the CTD drug database (http://ctdbase.org/).
Besides information on gene dysregulation and annotations, we also analyzed the relationships among the genes in AlzBase. Two measures are provided:
1) Brain gene co-expression, extracted from the brain co-expression network.
2) Composite correlation pattern, measured by normalized mutual information (NMI) using the data in Table 2.
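As a rough illustration of the NMI measure, the sketch below computes normalized mutual information between two discrete vectors. How AlzBase encodes each gene's pattern and which normalization it uses are not stated here. The sketch assumes each gene is summarized by a per-dataset call (+1 up, -1 down, 0 unchanged) and normalizes by the geometric mean of the two entropies; both choices, and all names and values, are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (natural log) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(x, y):
    """Normalized mutual information, I(X;Y) / sqrt(H(X) * H(Y))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Mutual information summed over the observed joint label pairs.
    mi = sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        # A constant vector carries no information; report 0 by convention.
        return 0.0
    return mi / math.sqrt(hx * hy)

# Toy dysregulation calls across six hypothetical datasets.
gene_a = [1, 1, 0, -1, -1, 0]
gene_b = [1, 1, 0, -1, -1, 0]   # same pattern as gene_a
gene_c = [1, 0, -1, 1, 0, -1]   # a different pattern
print(round(nmi(gene_a, gene_b), 3))
print(round(nmi(gene_a, gene_c), 3))
```

Unlike a linear correlation, NMI depends only on how consistently the two label patterns co-occur, so it is also high for genes that are dysregulated in opposite directions across the same datasets.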
Footnote: For references please refer to our AlzBase paper (in submission) and previous works (listed in "About us").