Open Access

Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest

Jiaojiao Tie, Xiujuan Lei, Yi Pan
School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
Department of Computer Science, Georgia State University, Atlanta, GA 30302-3994, USA
Abstract

Identifying associations between metabolites and diseases helps us understand disease pathogenesis, which is of great significance for diagnosis and treatment. However, traditional biometric methods are time-consuming and expensive. Accordingly, we propose a new metabolite-disease association prediction algorithm based on DeepWalk and random forest (DWRF), which consists of the following key steps. First, the semantic similarity and information entropy similarity of diseases are integrated as the final disease similarity. Similarly, the molecular fingerprint similarity and information entropy similarity of metabolites are integrated as the final metabolite similarity. Then, DeepWalk is used to extract metabolite features from the network of metabolite-gene associations. Finally, a random forest algorithm is employed to infer metabolite-disease associations. The experimental results show that DWRF performs well in terms of the area under the curve (AUC) under both leave-one-out cross-validation and five-fold cross-validation. Case studies also indicate that DWRF predicts metabolite-disease associations reliably.
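As a rough illustration of the DeepWalk step mentioned in the abstract, the following minimal sketch generates the truncated random walks that DeepWalk uses as its training corpus, here on a toy metabolite-gene association graph. It is not the authors' implementation. The node names, the edge list, and the parameters `walks_per_node` and `walk_length` are hypothetical, and the subsequent skip-gram embedding and random forest steps are omitted.

```python
import random

# Hypothetical toy metabolite-gene associations (illustrative only, not from the paper).
EDGES = [
    ("met_A", "gene_1"),
    ("met_A", "gene_2"),
    ("met_B", "gene_2"),
    ("met_B", "gene_3"),
    ("met_C", "gene_3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (metabolite, gene) pairs."""
    adjacency = {}
    for metabolite, gene in edges:
        adjacency.setdefault(metabolite, []).append(gene)
        adjacency.setdefault(gene, []).append(metabolite)
    return adjacency


def random_walk(adjacency, start, walk_length, rng):
    """Return one truncated uniform random walk that begins at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # isolated node: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def generate_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Generate the walk corpus: several passes, each over a shuffled node order."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus


adjacency = build_adjacency(EDGES)
walks = generate_walks(adjacency, walks_per_node=2, walk_length=5)
```

In a full pipeline, each walk would be treated as a sentence and passed to a skip-gram model to learn node embeddings. The metabolite embeddings, combined with the similarity features, could then serve as input to a random forest classifier.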

Tsinghua Science and Technology
Pages 58-67
Cite this article:
Tie J, Lei X, Pan Y. Metabolite-Disease Association Prediction Algorithm Combining DeepWalk and Random Forest. Tsinghua Science and Technology, 2022, 27(1): 58-67. https://doi.org/10.26599/TST.2021.9010003

Received: 21 December 2020
Accepted: 13 January 2021
Published: 17 August 2021
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
