Open Access

A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data

Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada

† Yu Tian and Ruiqing Zheng contributed equally to this paper.

Abstract

The emergence of single-cell RNA-sequencing (scRNA-seq) technology has made it possible to address biological problems at single-cell resolution. A critical step in cellular heterogeneity analysis is cell type identification, and diverse scRNA-seq clustering methods have been proposed to partition cells into clusters. Among these methods, hierarchical clustering and spectral clustering are the most popular choices for downstream clustering analysis, and they are combined with different preprocessing strategies such as similarity learning, dropout imputation, and dimensionality reduction. In this study, we carry out a comprehensive analysis by combining different strategies with these two categories of clustering methods on scRNA-seq datasets under different biological conditions. The results show that methods with spectral clustering tend to perform better on datasets with continuous shapes in two dimensions, while methods with hierarchical clustering achieve better results on datasets with obvious boundaries between clusters in two dimensions. Motivated by this finding, we develop a new strategy, called QRS, to quantitatively evaluate the latent representative shape of a dataset and determine whether it has clear boundaries. Finally, we propose a data-driven clustering recommendation method, called DDCR, to recommend hierarchical clustering or spectral clustering for scRNA-seq data. We apply DDCR to two typical single-cell clustering methods, SC3 and RAFSIL, and the results show that DDCR recommends a more suitable downstream clustering method for different scRNA-seq datasets and obtains more robust and accurate results.
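
The recommendation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' QRS or DDCR implementation, whose details are not given on this page. It assumes the cells have already been embedded in two dimensions, and it swaps in a simple stand-in score: the ratio of the longest edge to the median edge of a Euclidean minimum spanning tree. The function names `mst_edge_lengths`, `boundary_score`, and `recommend_clustering`, as well as the threshold of 5.0, are hypothetical choices made only for illustration.

```python
import math
import statistics


def mst_edge_lengths(points):
    """Edge lengths of a Euclidean minimum spanning tree (Prim's algorithm, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each point
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the first point starts the tree and adds no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def boundary_score(points):
    """Stand-in shape score (an assumption, not QRS): longest MST edge / median MST edge.

    A large value suggests well-separated groups; a value near 1 suggests
    a continuous shape without clear gaps.
    """
    lengths = mst_edge_lengths(points)
    if not lengths:
        return 0.0
    med = statistics.median(lengths)
    return max(lengths) / med if med > 0 else math.inf


def recommend_clustering(points, threshold=5.0):
    """Recommend 'hierarchical' for clear boundaries, 'spectral' otherwise.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return "hierarchical" if boundary_score(points) > threshold else "spectral"


# Two compact, well-separated groups of points in a 2D embedding.
blobs = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
blobs += [(10 + 0.1 * x, 10 + 0.1 * y) for x in range(5) for y in range(5)]

# A continuous, trajectory-like shape with evenly spaced points.
trajectory = [(0.1 * i, 0.0) for i in range(50)]

print(recommend_clustering(blobs))
print(recommend_clustering(trajectory))
```

The design mirrors only the decision logic described in the abstract: score the two-dimensional shape first, then pick the downstream clustering family. Any real use would need the paper's actual QRS definition and a threshold validated on labeled datasets.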

Tsinghua Science and Technology
Pages 772-789
Cite this article:
Tian Y, Zheng R, Liang Z, et al. A Data-Driven Clustering Recommendation Method for Single-Cell RNA-Sequencing Data. Tsinghua Science and Technology, 2021, 26(5): 772-789. https://doi.org/10.26599/TST.2020.9010028

Received: 03 March 2021
Accepted: 23 March 2021
Published: 20 April 2021
© The author(s) 2021. The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
