
Exploratory and Interpretable Approach to Estimating Latent Health Risk Factors Without Using Domain Knowledge

Graduate School of Human Sciences, Waseda University, Tokorozawa 359-1192, Japan
Faculty of Human Sciences, Waseda University, Tokorozawa 359-1192, Japan

Abstract

Identifying latent risk factors that can induce health risks or an abnormal status is an important task in healthcare data analysis. In recent years, neural network models have been widely applied to health analyses. However, such analysis processes are black boxes, and their results lack explainability. Approaches that construct a domain model can address these issues, but they require domain knowledge from an expert. In this study, we propose an exploratory and interpretable approach to estimating latent health risk factors without relying on domain knowledge, in which feature selection and causal discovery are used to construct a domain model for uncovering complex relationships in health and medical data. In an evaluation experiment on two datasets, the proposed approach outperformed four baselines in terms of model fit. Furthermore, our method used fewer model parameters than the baselines, which reduced model complexity. Moreover, the analysis process of the proposed approach was visible and explainable, which improved its interpretability.

Big Data Mining and Analytics
Pages 447-457
Cite this article:
Cong R, Nishimura S, Ogihara A, et al. Exploratory and Interpretable Approach to Estimating Latent Health Risk Factors Without Using Domain Knowledge. Big Data Mining and Analytics, 2025, 8(2): 447-457. https://doi.org/10.26599/BDMA.2024.9020081