Open Access

DMSS: An Attention-Based Deep Learning Model for High-Quality Mass Spectrometry Prediction

Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and with University of Chinese Academy of Sciences, Beijing 100049, China
Syneron Technology, Guangzhou 510000, China
College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and with Western Institute of Computing Technology, Chongqing 400000, China


Abstract

Accurate prediction of peptide spectra is crucial for improving the efficiency and reliability of proteomic analysis, as well as for gaining insight into various biological processes. In this study, we introduce Deep MS Simulator (DMSS), a novel attention-based model tailored for predicting theoretical spectra in mass spectrometry. In a series of validation experiments, DMSS consistently outperforms current methods at this task. Its ability to distinguish extremely similar peptides suggests that incorporating the predicted intensity information into mass spectrometry search engines could enhance the accuracy of protein identification. These findings contribute to the advancement of proteomics analysis and highlight the potential of DMSS as a valuable tool in the field.

Big Data Mining and Analytics
Pages 577-589
Cite this article:
Ren Y, Wang Y, Han W, et al. DMSS: An Attention-Based Deep Learning Model for High-Quality Mass Spectrometry Prediction. Big Data Mining and Analytics, 2024, 7(3): 577-589. https://doi.org/10.26599/BDMA.2024.9020006

Received: 07 November 2023
Revised: 25 December 2023
Accepted: 29 January 2024
Published: 28 August 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
