Abstract
Essential proteins play a vital role in biological processes, and combining gene expression profiles with protein-protein interaction (PPI) networks can improve their identification. However, gene expression data are prone to significant fluctuations caused by noise, which interferes with their use in topological networks. In this work, we discretized the gene expression data and used the discrete similarity of gene expression profiles to suppress noise fluctuations. We then proposed the Pearson Jaccard coefficient (PJC), which combines continuous and discrete similarities of the gene expression data. On the basis of graph theory, we fused the newly proposed similarity coefficient with existing network topology prediction algorithms at each protein node to identify essential proteins. This strategy exhibited a high recognition rate and good specificity. We validated PJC on the Krogan, Gavin, and DIP yeast PPI datasets and evaluated the results by receiver operating characteristic analysis, jackknife analysis, top analysis, and accuracy analysis. Relative to the node-based network topology centrality methods and the centrality methods that fuse biological information, incorporating PJC significantly improved the prediction of essential proteins for DC, IC, eigenvector centrality, subgraph centrality, betweenness centrality, closeness centrality, NC, PeC, and WDC. We also compared PJC with other methods using the NF-PIN algorithm, which predicts essential proteins by constructing active PPI networks from dynamic gene expression. The experimental results showed that PJC has clear advantages in predicting essential proteins.
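The idea of pairing a continuous and a discrete similarity can be illustrated with a minimal sketch. The abstract does not give the PJC formula, so everything specific here is an assumption made for illustration only: discretization by thresholding each profile at its own mean, combination of the two similarities by a simple product, and fusion with topology as a PJC-weighted degree. The function names (`active_points`, `pjc`, `weighted_degree`) are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / (sx * sy)

def active_points(x):
    """ASSUMPTION: a time point counts as 'active' if it exceeds the profile mean."""
    mean = sum(x) / len(x)
    return {i for i, v in enumerate(x) if v > mean}

def jaccard(a, b):
    """Jaccard index of two sets of active time points."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pjc(x, y):
    """ASSUMPTION: continuous and discrete similarities combined as a product."""
    return pearson(x, y) * jaccard(active_points(x), active_points(y))

def weighted_degree(edges, expr):
    """ASSUMPTION: score each protein by summing PJC over its PPI edges."""
    scores = {}
    for u, v in edges:
        w = pjc(expr[u], expr[v])
        scores[u] = scores.get(u, 0.0) + w
        scores[v] = scores.get(v, 0.0) + w
    return scores

# Toy data: three hypothetical proteins with four time points each.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
edges = [("A", "B"), ("A", "C")]
scores = weighted_degree(edges, expr)
```

In this toy network, A and B are co-expressed and share the same active time points, so their edge carries a high weight, whereas A and C share no active time points and the discrete term removes the contribution of that edge.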