Continuous and Discrete Similarity Coefficient for Identifying Essential Proteins Using Gene Expression Data

Jiancheng Zhong; Zuohang Qu; Ying Zhong; Chao Tang; Yi Pan

doi:10.26599/BDMA.2022.9020019


Open Access


Jiancheng Zhong¹, Zuohang Qu¹, Ying Zhong¹, Chao Tang¹, Yi Pan²

¹ College of Information Science and Engineering, Hunan Normal University, Changsha 410081, China

² Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Shenzhen, Guangzhou 518055, China


Abstract

Essential proteins play a vital role in biological processes, and combining gene expression profiles with Protein-Protein Interaction (PPI) networks can improve their identification. However, gene expression data are prone to significant fluctuations caused by noise interference in topological networks. In this work, we discretized gene expression data and used the discrete similarities of the gene expression spectrum to eliminate noise fluctuation. We then proposed the Pearson Jaccard coefficient (PJC), which combines the continuous and discrete similarities in the gene expression data. Using graph theory as the basis, we fused the newly proposed similarity coefficient with existing network topology prediction algorithms at each protein node to recognize essential proteins. This strategy exhibited a high recognition rate and good specificity. We validated PJC on the Krogan, Gavin, and DIP PPI datasets of yeast and evaluated the results by receiver operating characteristic analysis, jackknife analysis, top analysis, and accuracy analysis. Compared with node-based network topology centrality methods and centrality methods that fuse biological information, PJC significantly improved the prediction performance for essential proteins in DC, IC, eigenvector centrality, subgraph centrality, betweenness centrality, closeness centrality, NC, PeC, and WDC. We also compared PJC with other methods using the NF-PIN algorithm, which predicts essential proteins by constructing active PPI networks from dynamic gene expression. The experimental results showed that the newly proposed similarity coefficient PJC has an advantage in predicting essential proteins.
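The idea of pairing a continuous similarity with a discrete one can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the discretization rule, the way the two similarities are combined, or the fusion with a centrality measure. The sketch assumes a profile value counts as "active" when it exceeds the profile mean, that the two similarities are multiplied, and that the fused score is a degree-style sum of edge weights. All function names (`pearson`, `discretize`, `jaccard`, `pjc`, `weighted_degree`) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Continuous similarity: Pearson correlation of two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation
    return cov / sqrt(vx * vy)


def discretize(x):
    """ASSUMED rule: mark a time point 1 when it is above the profile mean."""
    m = sum(x) / len(x)
    return [1 if v > m else 0 for v in x]


def jaccard(a, b):
    """Discrete similarity: Jaccard index of two binary activity vectors."""
    both = sum(1 for p, q in zip(a, b) if p and q)
    either = sum(1 for p, q in zip(a, b) if p or q)
    return both / either if either else 0.0


def pjc(x, y):
    """ASSUMED combination: product of the Pearson and Jaccard similarities."""
    return pearson(x, y) * jaccard(discretize(x), discretize(y))


def weighted_degree(edges, profiles):
    """ASSUMED fusion: score each protein by summing PJC over its PPI edges."""
    score = {p: 0.0 for p in profiles}
    for u, v in edges:
        w = pjc(profiles[u], profiles[v])
        score[u] += w
        score[v] += w
    return score


# Toy data: three hypothetical proteins and two interactions.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
edges = [("A", "B"), ("A", "C")]
scores = weighted_degree(edges, profiles)
```

In this toy network, A and B are co-expressed and active at the same time points, so their edge receives a high weight, while the A-C edge contributes nothing because the two discretized profiles share no active time point. Ranking proteins by `scores` then favors nodes whose neighbors are co-expressed, which is the general intuition behind weighting a topology-based centrality with expression similarity.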

Keywords

Protein-Protein Interaction (PPI) network; continuous and discrete similarity coefficient; essential proteins


Big Data Mining and Analytics

Volume 6, Issue 2, June 2023

Pages 185-200

DOI: 10.26599/BDMA.2022.9020019

Cite this article:

Zhong J, Qu Z, Zhong Y, et al. Continuous and Discrete Similarity Coefficient for Identifying Essential Proteins Using Gene Expression Data. Big Data Mining and Analytics, 2023, 6(2): 185-200. https://doi.org/10.26599/BDMA.2022.9020019


Received: 27 June 2022

Accepted: 13 August 2022

Published: 26 January 2023

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).