Dental Center of China-Japan Friendship Hospital, Beijing 100029, China.
Department of Computer Science and Engineering, University of North Texas, Denton, TX 76203, USA.
Abstract
Many human diseases involve multiple genes in complex interactions. Large Genome-Wide Association Studies (GWASs) have been considered to hold promise for unraveling such interactions. However, statistical tests for high-order epistatic interactions among Single Nucleotide Polymorphisms (SNPs) raise enormous computational and analytical challenges. It is well known that a block-wise structure exists in the human genome due to Linkage Disequilibrium (LD) between adjacent SNPs. In this paper, we propose a novel Bayesian method, named BAM, for simultaneously partitioning SNPs into LD-blocks and detecting genome-wide multi-locus epistatic interactions that are associated with multiple diseases. Experimental results on simulated datasets demonstrate that BAM is powerful and efficient. We also applied BAM to two GWAS datasets from the Wellcome Trust Case Control Consortium (WTCCC), i.e., Rheumatoid Arthritis and Type 1 Diabetes, and accurately recovered the LD-block structure. Therefore, we believe that BAM is suitable and efficient for the full-scale analysis of multi-disease-related interactions in GWASs.
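To make the notion of LD between adjacent SNPs concrete, the following is a minimal sketch, not the BAM algorithm. It assumes genotypes are coded as minor-allele counts (0, 1, 2), uses the squared Pearson correlation of genotypes as a simple stand-in for an LD measure, and cuts blocks greedily at a hypothetical threshold of 0.8. BAM infers the partition by Bayesian inference instead; the function names here are illustrative only.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 codes)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no correlation information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def adjacent_r2(snps):
    """r^2 for each pair of neighbouring SNPs; `snps` holds one list per SNP."""
    return [pearson_r2(snps[i], snps[i + 1]) for i in range(len(snps) - 1)]


def split_blocks(r2_values, threshold=0.8):
    """Greedy illustration: start a new block wherever adjacent r^2 < threshold.

    The threshold is an assumption made for this sketch, not a value from the paper.
    """
    blocks = [[0]]
    for i, r2 in enumerate(r2_values):
        if r2 >= threshold:
            blocks[-1].append(i + 1)   # SNP i+1 joins the current block
        else:
            blocks.append([i + 1])     # low LD: SNP i+1 opens a new block
    return blocks
```

For example, four SNPs typed in six individuals, where SNPs 0 and 1 are identical and SNPs 2 and 3 are identical but uncorrelated with the first pair, would be split into two blocks of two SNPs each.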
Y. Wang, Z. Cai, P. Stothard, S. Moore, R. Goebel, L. Wang, and G. Lin, Fast accurate missing SNP genotype local imputation, BMC Research Notes, vol. 5, no. 1, p. 404, 2012.
Y. He, Z. Zhang, X. Peng, F. Wu, and J. Wang, De novo assembly methods for next generation sequencing data, Tsinghua Science and Technology, vol. 18, no. 5, pp. 500-514, 2013.
M. Nikpay, A. Goel, H. H. Won, L. M. Hall, C. Willenborg, S. Kanoni, D. Saleheen, T. Kyriakou, C. P. Nelson, J. C. Hopewell, et al., A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease, Nature Genetics, vol. 47, no. 10, p. 1121, 2015.
H. Schunkert, I. R. König, S. Kathiresan, M. P. Reilly, T. L. Assimes, H. Holm, M. Preuss, A. F. Stewart, M. Barbalic, C. Gieger, et al., Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease, Nature Genetics, vol. 43, no. 4, p. 333, 2011.
J. C. Lambert, C. A. Ibrahim-Verbaas, D. Harold, A. C. Naj, R. Sims, C. Bellenguez, G. Jun, A. L. DeStefano, J. C. Bis, G. W. Beecham, et al., Meta-analysis of 74 046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease, Nature Genetics, vol. 45, no. 12, p. 1452, 2013.
C. Sun, Q. Li, L. Cui, H. Li, and Y. Shi, Heterogeneous network-based chronic disease progression mining, Big Data Mining and Analytics, vol. 2, no. 1, pp. 25-34, 2018.
W. Van Rheenen, A. Shatunov, A. M. Dekker, R. L. McLaughlin, F. P. Diekstra, S. L. Pulit, R. A. van der Spek, U. Võsa, S. de Jong, M. R. Robinson, et al., Genome-wide association analyses identify new risk variants and the genetic architecture of amyotrophic lateral sclerosis, Nature Genetics, vol. 48, no. 9, p. 1043, 2016.
S. Ripke, N. R. Wray, C. M. Lewis, S. P. Hamilton, M. M. Weissman, G. Breen, E. M. Byrne, D. H. Blackwood, D. I. Boomsma, S. Cichon, et al., A mega-analysis of genome-wide association studies for major depressive disorder, Molecular Psychiatry, vol. 18, no. 4, p. 497, 2013.
P. Sklar, S. Ripke, L. J. Scott, O. A. Andreassen, S. Cichon, N. Craddock, H. J. Edenberg, J. I. Nurnberger, M. Rietschel, D. Blackwood, et al., Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4, Nature Genetics, vol. 44, no. 9, p. 1072, 2012.
X. Wan, C. Yang, Q. Yang, H. Xue, N. L. Tang, and W. Yu, Detecting two-locus associations allowing for interactions in genome-wide association studies, Bioinformatics, vol. 26, no. 20, pp. 2517-2525, 2010.
L. S. Yung, C. Yang, X. Wan, and W. Yu, GBOOST: A GPU-based tool for detecting gene-gene interactions in genome-wide case control studies, Bioinformatics, vol. 27, no. 9, pp. 1309-1310, 2011.
Y. Liu, H. Xu, S. Chen, X. Chen, Z. Zhang, Z. Zhu, X. Qin, L. Hu, J. Zhu, G. P. Zhao, et al., Genome-wide interaction-based association analysis identified multiple new susceptibility loci for common diseases, PLoS Genetics, vol. 7, no. 3, p. e1001338, 2011.
J. Marchini, P. Donnelly, and L. R. Cardon, Genome-wide strategies for detecting multiple loci that influence complex diseases, Nature Genetics, vol. 37, no. 4, p. 413, 2005.
J. Li, A novel strategy for detecting multiple loci in genome-wide association studies of complex diseases, International Journal of Bioinformatics Research and Applications, vol. 4, no. 2, p. 150, 2008.
X. Wan, C. Yang, Q. Yang, H. Xue, N. L. Tang, and W. Yu, Predictive rule inference for epistatic interaction detection in genome-wide association studies, Bioinformatics, vol. 26, no. 1, pp. 30-37, 2009.
B. Liu, S. Feng, X. Guo, and J. Zhang, Bayesian analysis of complex mutations in HBV, HCV, and HIV studies, Big Data Mining and Analytics, vol. 2, no. 3, pp. 145-158, 2019.
X. Guo, N. Yu, F. Gu, X. Ding, J. Wang, and Y. Pan, Genome-wide interaction-based association of human diseases—a survey, Tsinghua Science and Technology, vol. 19, no. 6, pp. 596-616, 2014.
P. M. Visscher, N. R. Wray, Q. Zhang, P. Sklar, M. I. McCarthy, M. A. Brown, and J. Yang, 10 years of GWAS discovery: Biology, function, and translation, The American Journal of Human Genetics, vol. 101, no. 1, pp. 5-22, 2017.
Y. J. Wen, H. Zhang, Y. L. Ni, B. Huang, J. Zhang, J. Y. Feng, S. B. Wang, J. M. Dunwell, Y. M. Zhang, and R. Wu, Methodological implementation of mixed linear models in multi-locus genome-wide association studies, Briefings in Bioinformatics, vol. 19, no. 4, pp. 700-712, 2017.
X. Guo, Searching genome-wide disease association through SNP data, PhD dissertation, Georgia State University, Atlanta, GA, USA, 2015.
X. Guo, J. Zhang, Z. Cai, D. Z. Du, and Y. Pan, Searching genome-wide multi-locus associations for multiple diseases based on Bayesian inference, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 14, no. 3, pp. 600-610, 2017.
T. Berisa and J. K. Pickrell, Approximately independent linkage disequilibrium blocks in human populations, Bioinformatics, vol. 32, no. 2, p. 283, 2016.
S. Gazal, H. K. Finucane, N. A. Furlotte, P. R. Loh, P. F. Palamara, X. Liu, A. Schoech, B. Bulik-Sullivan, B. M. Neale, A. Gusev, et al., Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection, Nature Genetics, vol. 49, no. 10, p. 1421, 2017.
Y. Cheng, H. Sabaa, Z. Cai, R. Goebel, and G. Lin, Efficient haplotype inference algorithms in one whole genome scan for pedigree data with non-genotyped founders, Acta Mathematicae Applicatae Sinica, English Series, vol. 25, no. 3, pp. 477-488, 2009.
Z. Liu and S. Lin, Multilocus LD measure and tagging SNP selection with generalized mutual information, Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, vol. 29, no. 4, pp. 353-364, 2005.
Y. Zhang, J. Zhang, and J. S. Liu, Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data, The Annals of Applied Statistics, vol. 5, no. 3, p. 2052, 2011.
Wellcome Trust Case Control Consortium, Genome-wide association study of 14 000 cases of seven common diseases and 3000 shared controls, Nature, vol. 447, no. 7145, p. 661, 2007.
X. Guo, J. Zhang, Z. Cai, D. Z. Du, and Y. Pan, DAM: A Bayesian method for detecting genome-wide associations on multiple diseases, in Bioinformatics Research and Applications. New York, NY, USA: Springer, 2015, pp. 96-107.
S. B. Gabriel, S. F. Schaffner, H. Nguyen, J. M. Moore, J. Roy, B. Blumenstiel, J. Higgins, M. DeFelice, A. Lochner, M. Faggart, et al., The structure of haplotype blocks in the human genome, Science, vol. 296, no. 5576, pp. 2225-2229, 2002.
P. I. de Bakker, R. Yelensky, I. Pe’er, S. B. Gabriel, M. J. Daly, and D. Altshuler, Efficiency and power in genetic association studies, Nature Genetics, vol. 37, no. 11, pp. 1217-1223, 2005.
X. Wan, C. Yang, Q. Yang, H. Xue, X. Fan, N. L. Tang, and W. Yu, BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies, The American Journal of Human Genetics, vol. 87, no. 3, pp. 325-340, 2010.
J. Marchini, B. Howie, S. Myers, G. McVean, and P. Donnelly, A new multipoint method for genome-wide association studies by imputation of genotypes, Nature Genetics, vol. 39, no. 7, pp. 906-913, 2007.
J. K. Pritchard and N. A. Rosenberg, Use of unlinked genetic markers to detect population stratification in association studies, The American Journal of Human Genetics, vol. 65, no. 1, pp. 220-228, 1999.
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
DOI: 10.26599/TST.2019.9010064
Fig. 1 Illustration of ten association types in a dataset with three phenotypic traits.
Fig. 2 False-positive rates of BAM under null models. The plots in (a) and (b) show the false-positive rates of BAM for different d when the sample sizes and the numbers of SNPs vary. In (a), […] and […].
Fig. 3 Performance comparison among BAM, SAM, and DAM on the datasets with two-locus epistatic interactions. The x-axis shows the MAF value.
Fig. 4 Four block structures recovered by BAM in Chromosome 6. The top half of the figure is the Haploview plot. In the bottom half, the x-axis shows the physical locations of the SNPs, and the y-axis shows the posterior probability of the SNPs.
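Table 1 Odds tables of disease Models 1-4.
Model    Genotype  BB   Bb   bb
Model 1  AA        […]  […]  […]
         Aa        […]  […]  […]
         aa        […]  […]  […]
Model 2  AA        […]  […]  […]
         Aa        […]  […]  […]
         aa        […]  […]  […]
Model 3  AA        […]  […]  […]
         Aa        […]  […]  […]
         aa        […]  […]  […]
Model 4  AA        […]  […]  […]
         Aa        […]  […]  […]
         aa        […]  […]  […]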
Note: The disease prevalence, the genetic heritability, and the MAFs determine the two parameters of each odds table, as in Ref. [41]. In the simulation, we set […] for all models, […] for Model 1, […] for Models 2-4, and the MAFs of disease-associated SNPs to 0.1, 0.2, and 0.4.
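The link between an odds table, the MAFs, and the disease prevalence can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes Hardy-Weinberg equilibrium at each locus, independence between the two loci, that the lowercase alleles (a, b) are the minor alleles, and that each table cell holds the odds p/(1 - p) of disease for that genotype combination. The odds values passed in are hypothetical, since the entries of Table 1 are not reproduced here.

```python
def genotype_freqs(maf):
    """Hardy-Weinberg genotype frequencies for one biallelic SNP.

    Returned in the order (major homozygote, heterozygote, minor homozygote),
    i.e. (AA, Aa, aa) when `a` is the minor allele.
    """
    major = 1.0 - maf
    return [major * major, 2.0 * major * maf, maf * maf]


def prevalence(odds, maf_a, maf_b):
    """Population disease prevalence implied by a 3x3 odds table.

    odds[i][j] is the disease odds for row genotype i (AA, Aa, aa) and
    column genotype j (BB, Bb, bb). Each odds value is converted to a
    penetrance p = odds / (1 + odds) and weighted by the joint genotype
    frequency, assuming the two loci are independent.
    """
    freq_a = genotype_freqs(maf_a)
    freq_b = genotype_freqs(maf_b)
    total = 0.0
    for i in range(3):
        for j in range(3):
            penetrance = odds[i][j] / (1.0 + odds[i][j])
            total += penetrance * freq_a[i] * freq_b[j]
    return total
```

Fixing a target prevalence and heritability then amounts to solving for the table parameters that reproduce those values at the chosen MAFs; that solving step is not shown here.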