Open Access

Continuous and Discrete Similarity Coefficient for Identifying Essential Proteins Using Gene Expression Data

College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China
Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Shenzhen, Guangzhou 518055, China

Abstract

Essential proteins play a vital role in biological processes, and combining gene expression profiles with Protein-Protein Interaction (PPI) networks can improve their identification. However, gene expression data are prone to significant fluctuations due to noise interference in topological networks. In this work, we discretized gene expression data and used the discrete similarities of the gene expression profiles to suppress noise fluctuations. We then proposed the Pearson Jaccard Coefficient (PJC), which combines continuous and discrete similarities in the gene expression data. Based on graph theory, we fused the proposed similarity coefficient with existing network topology prediction algorithms at each protein node to identify essential proteins. This strategy exhibited a high recognition rate and good specificity. We validated PJC on the Krogan, Gavin, and DIP PPI datasets of yeast and evaluated the results by receiver operating characteristic analysis, jackknife analysis, top analysis, and accuracy analysis. Compared with node-based network topology centrality methods and centrality methods that fuse biological information, PJC significantly improved the prediction of essential proteins when applied to DC, IC, eigenvector centrality, subgraph centrality, betweenness centrality, closeness centrality, NC, PeC, and WDC. We also compared PJC with other methods using the NF-PIN algorithm, which predicts essential proteins by constructing active PPI networks from dynamic gene expression. The experimental results showed that PJC has clear advantages in predicting essential proteins.
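
The idea of pairing a continuous similarity with a discrete one can be illustrated with a minimal sketch. The abstract does not state the exact PJC formula, so the details below are assumptions made only for illustration: each expression profile is binarized against its own mean, the Jaccard index is computed on the time points where the binarized profiles equal 1, and the two similarities are combined by multiplication. The function names (`pearson`, `discretize`, `jaccard`, `pjc_sketch`) are hypothetical and do not come from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / sqrt(vx * vy)


def discretize(x):
    """Assumed rule: 1 where expression exceeds the profile's own mean."""
    m = sum(x) / len(x)
    return [1 if v > m else 0 for v in x]


def jaccard(bx, by):
    """Jaccard index over time points where the binary profiles are 1."""
    both = sum(1 for a, b in zip(bx, by) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(bx, by) if a == 1 or b == 1)
    return both / either if either else 0.0


def pjc_sketch(x, y):
    """Assumed combination: product of continuous and discrete similarity."""
    return pearson(x, y) * jaccard(discretize(x), discretize(y))


# Two co-expressed profiles and one anti-correlated profile (toy values).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
gene_c = [4.0, 3.0, 2.0, 1.0]
score_ab = pjc_sketch(gene_a, gene_b)
score_ac = pjc_sketch(gene_a, gene_c)
```

In this toy case the co-expressed pair scores 1, while the anti-correlated pair scores 0 because the binarized profiles share no active time points. A score of this kind could then serve as an edge weight in a PPI network before a centrality measure is applied, although the fusion rule used in the paper may differ.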

Big Data Mining and Analytics
Pages 185-200
Cite this article:
Zhong J, Qu Z, Zhong Y, et al. Continuous and Discrete Similarity Coefficient for Identifying Essential Proteins Using Gene Expression Data. Big Data Mining and Analytics, 2023, 6(2): 185-200. https://doi.org/10.26599/BDMA.2022.9020019

Received: 27 June 2022
Accepted: 13 August 2022
Published: 26 January 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
