The recent emergence of single-cell RNA-sequencing (scRNA-seq) technology has made it possible to address biological problems at single-cell resolution. One of the critical steps in cellular heterogeneity analysis is cell type identification. Diverse scRNA-seq clustering methods have been proposed to partition cells into clusters. Among these methods, hierarchical clustering and spectral clustering are the most popular approaches for downstream clustering analysis, combined with different preprocessing strategies such as similarity learning, dropout imputation, and dimensionality reduction. In this study, we carry out a comprehensive analysis by combining different strategies with these two categories of clustering methods on scRNA-seq datasets under different biological conditions. The results show that methods using spectral clustering tend to perform better on datasets with continuous shapes in two dimensions, while those using hierarchical clustering achieve better results on datasets with obvious boundaries between clusters in two dimensions. Motivated by this finding, we develop a new strategy, called QRS, to quantitatively evaluate the latent representative shape of a dataset and determine whether it has clear boundaries. Finally, we propose a data-driven clustering recommendation method, called DDCR, to recommend hierarchical clustering or spectral clustering for scRNA-seq data. We apply DDCR to two typical single-cell clustering methods, SC3 and RAFSIL, and the results show that DDCR recommends a more suitable downstream clustering method for different scRNA-seq datasets and obtains more robust and accurate results.
Proteins drive virtually all cellular-level processes. Proteins that are critical to cell proliferation and survival are defined as essential. These essential proteins are implicated in key metabolic and regulatory networks, and are important for rational drug design. The computational identification of essential proteins benefits from the proliferation of publicly available protein interaction datasets, and several algorithms have been developed that use these datasets to predict essential proteins. However, a comprehensive web platform that facilitates the analysis and prediction of essential proteins is missing. In this study, we design, implement, and release NetEPD: a network-based essential protein discovery platform. This resource integrates data on Protein-Protein Interaction (PPI) networks, gene expression, subcellular localization, and a native set of essential proteins. It also computes a variety of node centrality measures, evaluates the predictions of essential proteins, and visualizes PPI networks. The platform is organized around four activities: collection of datasets, computation of centrality measures, evaluation, and visualization. The results produced by NetEPD are visualized on its website, sent to a user-provided email address, and available to download in a parsable format. This platform is freely available at http://bioinformatics.csu.edu.cn/netepd.
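As a rough illustration of what a node centrality measure on a PPI network looks like, the sketch below computes normalized degree centrality and ranks proteins by it. It is not NetEPD's implementation: the abstract does not say which centrality measures the platform offers, so degree centrality is chosen here as an assumption, and the function names and the toy network are hypothetical.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return normalized degree centrality for each node of an undirected graph.

    edges: iterable of (protein_a, protein_b) pairs. Self-interactions and
    duplicate interactions are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # skip self-interactions
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Normalize by the maximum possible degree, n - 1.
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_candidates(edges, k):
    """Rank proteins by degree centrality, highest first, ties broken by name."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical toy network; the protein names are placeholders.
toy_ppi = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]
scores = degree_centrality(toy_ppi)
ranking = top_candidates(toy_ppi, 2)
```

In this toy network the hub P1 interacts with three of the four other proteins and therefore ranks first. Evaluating such a ranking would then mean comparing the top-ranked proteins against a known set of essential proteins.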
Biological elements usually exert their functions through interactions with others, forming various types of biological networks. The ability to control the dynamics of biological networks is of enormous benefit to the pharmaceutical and medical industries as well as to scientific research. Although there are many mathematical methods for steering dynamic systems towards desired states, these methods are usually not feasible for complex biological networks. The difficulties come from the lack of accurate models that capture the dynamics of interactions between biological elements, and from the fact that many mathematical methods are computationally intractable for large-scale networks. Recently, controllability, a concept from control theory, has been applied to investigate the dynamics of complex networks. In this article, recent advances on the controllability of complex networks and their applications to biological networks are reviewed. Developing dynamic models is the primary concern in analyzing the dynamics of biological networks. First, we introduce a widely used dynamic model for investigating the controllability of complex networks. Then, recent studies of theorems and algorithms for making complex biological networks controllable, in general or in specific application scenarios, are reviewed. Finally, applications to real biological networks show that investigating the controllability of biological networks can shed light on many critical physiological or medical problems, such as revealing biological mechanisms and identifying drug targets, from a systematic perspective.
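To make the notion of controllability concrete, the sketch below checks the classical Kalman rank condition on a small network. The abstract does not name the dynamic model it reviews, so this assumes a linear time-invariant model dx/dt = Ax + Bu, where A[i][j] is nonzero when node j influences node i and B selects the driven nodes. Under that assumption, the system is controllable exactly when the matrix [B, AB, ..., A^(n-1)B] has rank n. The function names and the three-node chain are hypothetical, and exact rank computation of this kind only suits small networks.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(a, b):
    """Kalman rank test: rank [B, AB, ..., A^(n-1)B] == n."""
    n = len(a)
    blocks = [b]
    for _ in range(n - 1):
        blocks.append(mat_mul(a, blocks[-1]))
    # Concatenate the blocks side by side into the controllability matrix.
    ctrb = [sum((blk[i] for blk in blocks), []) for i in range(n)]
    return rank(ctrb) == n


# Hypothetical directed chain 1 -> 2 -> 3.
chain = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
drive_head = [[1], [0], [0]]  # input applied to node 1
drive_tail = [[0], [0], [1]]  # input applied to node 3
```

Driving the head of the chain lets the input propagate to every node, so `is_controllable(chain, drive_head)` holds, whereas driving the tail reaches only node 3 and the test fails.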