Biological elements usually exert their functions through interactions with others to form various types of biological networks. The ability of controlling the dynamics of biological networks is of enormous benefits to pharmaceutical and medical industry as well as scientific research. Though there are many mathematical methods for steering dynamic systems towards desired states, the methods are usually not feasible for applying to complex biological networks. The difficulties come from the lack of accurate model that can capture the dynamics of interactions between biological elements and the fact that many mathematical methods are computationally intractable for large-scale networks. Recently, a concept in control theory — controllability, has been applied to investigate the dynamics of complex networks. In this article, recent advances on the controllability of complex networks and applications to biological networks are reviewed. Developing dynamic models is the prior concern for analyzing dynamics of biological networks. First, we introduce a widely used dynamic model for investigating controllability of complex networks. Then recent studies of theorems and algorithms for having complex biological networks controllable in general or specific application scenarios are reviewed. Finally, applications to real biological networks manifest that investigating the controllability of biological networks can shed lights on many critical physiological or medical problems, such as revealing biological mechanisms and identifying drug targets, from a systematic perspective.
- Article type
- Year
- Co-author
Although the protein sequence-structure gap continues to enlarge due to the development of high-throughput sequencing tools, the protein structure universe tends to be complete without proteins with novel structural folds deposited in the protein data bank (PDB) recently. In this work, we identify a protein structural dictionary (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural “keywords” that can effectively distinguish between major protein folds. We firstly apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large scale of high-quality, non-homologous protein structures available in PDB. We analyze the impacts of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures in major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with ~400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy.
In recent years, the recommendation systems have become increasingly popular and have been used in a broad variety of applications. Here, we investigate the matrix completion techniques for the recommendation systems that are based on collaborative filtering. The collaborative filtering problem can be viewed as predicting the favorability of a user with respect to new items of commodities. When a rating matrix is constructed with users as rows, items as columns, and entries as ratings, the collaborative filtering problem can then be modeled as a matrix completion problem by filling out the unknown elements in the rating matrix. This article presents a comprehensive survey of the matrix completion methods used in recommendation systems. We focus on the mathematical models for matrix completion and the corresponding computational algorithms as well as their characteristics and potential issues. Several applications other than the traditional user-item association prediction are also discussed.
Essential proteins are vital to the survival of a cell. There are various features related to the essentiality of proteins, such as biological and topological features. Many computational methods have been developed to identify essential proteins by using these features. However, it is still a big challenge to design an effective method that is able to select suitable features and integrate them to predict essential proteins. In this work, we first collect 26 features, and use SVM-RFE to select some of them to create a feature space for predicting essential proteins, and then remove the features that share the biological meaning with other features in the feature space according to their Pearson Correlation Coefficients (PCC). The experiments are carried out on S. cerevisiae data. Six features are determined as the best subset of features. To assess the prediction performance of our method, we further compare it with some machine learning methods, such as SVM, Naive Bayes, Bayes Network, and NBTree when inputting the different number of features. The results show that those methods using the 6 features outperform that using other features, which confirms the effectiveness of our feature selection method for essential protein prediction.
Parameterized computation is a new method dealing with NP-hard problems, which has attracted a lot of attentions in theoretical computer science. As a practical preprocessing method for NP-hard problems, kernelizaiton in parameterized computation has recently become an active research area. In this paper, we discuss several kernelizaiton techniques, such as crown decomposition, planar graph vertex partition, randomized methods, and kernel lower bounds, which have been used widely in the kernelization of many hard problems.
Phylogenetic trees have been widely used in the study of evolutionary biology for representing the tree-like evolution of a collection of species. However, different data sets and different methods often lead to the construction of different phylogenetic trees for the same set of species. Therefore, comparing these trees to determine similarities or, equivalently, dissimilarities, becomes the fundamental issue. Typically, Tree Bisection and Reconnection (TBR) and Subtree Prune and Regraft (SPR) distances have been proposed to facilitate the comparison between different phylogenetic trees. In this paper, we give a survey on the aspects of computational complexity, fixed-parameter algorithms, and approximation algorithms for computing the TBR and SPR distances of phylogenetic trees.