Open Access

G3DC: A Gene-Graph-Guided Selective Deep Clustering Method for Single Cell RNA-seq Data

Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China, and also with Shenzhen Research Institute of Big Data, Shenzhen 518172, China
School of Data Science, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen 518172, China, and also with Warshel Institute for Computational Biology, Shenzhen 518172, China

Abstract

Single-cell RNA sequencing (scRNA-seq) technology measures the expression of thousands of genes at the cellular level. Analyzing the single-cell transcriptome allows the identification of heterogeneous cell groups, cellular-level regulation, and the trajectory of cell development. An important step in the analysis of scRNA-seq data is the clustering of cells, which is hampered by issues such as high dimensionality, cell type imbalance, redundancy, and dropout. Given that cells of each type are functionally consistent, incorporating biological relations among genes may improve the clustering results. In light of this, we have developed a deep-embedded clustering method, G3DC. This method combines a graph regularization based on a pre-existing gene network and a feature selector based on ℓ2,1-norm regularization, along with a reconstruction loss, to generate a discriminative and informative embedding. Utilizing the gene interaction network improves the clustering performance and aids in selecting functionally coherent genes, which makes the clustering results easier to interpret biologically. Extensive experiments have shown that G3DC offers high clustering accuracy, measured by agreement with true cell types, and outperforms other leading single-cell clustering methods. In addition, G3DC selects biologically relevant genes that contribute to the clustering, providing insight into the biological functions that differentiate cell groups.
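
The three ingredients named in the abstract (a reconstruction loss, an ℓ2,1-norm feature selector, and a gene-graph regularizer) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact loss, so the mean-squared-error reconstruction term, the choice to penalize a gene-by-hidden-unit weight matrix `W`, the unnormalized graph Laplacian form, and the weights `lam_sel` and `lam_graph` are all assumptions made here for illustration.

```python
import math

def l21_norm(W):
    """l2,1 norm: sum of the Euclidean norms of the rows of W (one row per gene).

    Rows driven to zero correspond to genes that are effectively deselected.
    """
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def graph_penalty(W, edges):
    """Sum of squared distances between the weight rows of linked genes.

    Equals tr(W^T L W) for the unnormalized Laplacian L of an unweighted,
    undirected gene graph whose edges are each listed once.
    """
    return sum(
        sum((a - b) ** 2 for a, b in zip(W[i], W[j]))
        for i, j in edges
    )

def reconstruction_loss(X, X_hat):
    """Mean squared error between the input and its reconstruction (assumed form)."""
    n = sum(len(row) for row in X)
    return sum(
        (x - y) ** 2 for rx, ry in zip(X, X_hat) for x, y in zip(rx, ry)
    ) / n

def total_loss(X, X_hat, W, edges, lam_sel=0.1, lam_graph=0.1):
    """Weighted sum of the three terms; lam_sel and lam_graph are hypothetical weights."""
    return (
        reconstruction_loss(X, X_hat)
        + lam_sel * l21_norm(W)
        + lam_graph * graph_penalty(W, edges)
    )

# Toy data: 3 genes, 2 hidden units; genes 0 and 1 are linked in the gene network.
W = [[3.0, 4.0], [3.0, 4.0], [0.0, 0.0]]
edges = [(0, 1)]
X = [[1.0, 2.0, 0.0]]      # one cell, three genes
X_hat = [[1.0, 1.0, 0.0]]  # its reconstruction

print(l21_norm(W))             # row norms 5 + 5 + 0
print(graph_penalty(W, edges)) # linked genes share identical weights
print(total_loss(X, X_hat, W, edges))
```

In this toy setup the third gene has an all-zero weight row, so it contributes nothing to the ℓ2,1 term, and the two linked genes carry identical weights, so the graph term vanishes. That is the behavior the two regularizers are meant to encourage: sparse selection of genes, with linked genes treated alike.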

Big Data Mining and Analytics
Pages 809-827
Cite this article:
He S, Fan J, Yu T. G3DC: A Gene-Graph-Guided Selective Deep Clustering Method for Single Cell RNA-seq Data. Big Data Mining and Analytics, 2024, 7(3): 809-827. https://doi.org/10.26599/BDMA.2024.9020011

Received: 09 October 2023
Revised: 22 February 2024
Accepted: 23 February 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
