
Open Access

DMSS: An Attention-Based Deep Learning Model for High-Quality Mass Spectrometry Prediction

Yihui Ren¹,†, Yu Wang²,†, Wenkai Han⁴, Yikang Huang³, Xiaoyang Hou¹, Chunming Zhang⁵, Dongbo Bu¹, Xin Gao⁴, Shiwei Sun¹

1 Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and with University of Chinese Academy of Sciences, Beijing 100049, China

2 Syneron Technology, Guangzhou 510000, China

3 College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China

4 Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia

5 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and with Western Institute of Computing Technology, Chongqing 400000, China


Abstract

Accurate prediction of peptide spectra is crucial for improving the efficiency and reliability of proteomic analysis and for gaining insight into various biological processes. In this study, we introduce Deep MS Simulator (DMSS), a novel attention-based model for predicting theoretical spectra in mass spectrometry. Across a series of validation experiments, DMSS consistently outperformed current methods in predicting theoretical spectra. Its ability to distinguish extremely similar peptides suggests that incorporating its predicted intensity information into mass spectrometry search engines could improve the accuracy of protein identification. These findings advance proteomics analysis and highlight the potential of DMSS as a valuable tool in the field.
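The abstract does not say how predicted intensities would be scored inside a search engine. As a minimal sketch, assuming both observed and predicted spectra are available as dictionaries keyed by annotated fragment (ion type, position, charge), the normalized spectral contrast angle, a common measure for comparing predicted and observed MS/MS intensity patterns, can re-rank candidate peptides that match the same fragment m/z values but differ in relative intensity. The function names, the fragment-key format, and the toy intensities below are hypothetical illustrations, not the DMSS scoring method.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical fragment key: (ion type, fragment position, charge), e.g. ("y", 3, 1).
Fragment = Tuple[str, int, int]
Spectrum = Dict[Fragment, float]


def spectral_angle(observed: Spectrum, predicted: Spectrum) -> float:
    """Normalized spectral contrast angle in [0, 1]; 1 means identical intensity patterns."""
    # Align both spectra on the union of annotated fragments; missing peaks count as 0.
    keys = sorted(set(observed) | set(predicted))
    obs = [observed.get(k, 0.0) for k in keys]
    pred = [predicted.get(k, 0.0) for k in keys]
    norm_obs = math.sqrt(sum(x * x for x in obs))
    norm_pred = math.sqrt(sum(x * x for x in pred))
    if norm_obs == 0.0 or norm_pred == 0.0:
        return 0.0
    cosine = sum(o * p for o, p in zip(obs, pred)) / (norm_obs * norm_pred)
    cosine = max(-1.0, min(1.0, cosine))  # guard against floating-point drift
    # For non-negative intensities the angle lies in [0, pi/2], so this maps to [0, 1].
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def rank_candidates(observed: Spectrum,
                    predictions: Dict[str, Spectrum]) -> List[Tuple[str, float]]:
    """Sort candidate peptides by agreement between predicted and observed intensities."""
    scores = {name: spectral_angle(observed, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: two hypothetical candidates share the same fragments but differ in intensity.
observed = {("y", 1, 1): 0.2, ("y", 2, 1): 1.0, ("y", 3, 1): 0.5}
predictions = {
    "candidate_A": {("y", 1, 1): 0.25, ("y", 2, 1): 0.9, ("y", 3, 1): 0.5},
    "candidate_B": {("y", 1, 1): 1.0, ("y", 2, 1): 0.2, ("y", 3, 1): 0.4},
}
ranking = rank_candidates(observed, predictions)  # candidate_A should rank first
```

Because both candidates explain the same peaks, a purely m/z-based match cannot separate them; only the intensity pattern does, which is the situation in which an accurate intensity predictor is expected to help.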

Keywords

mass spectrometry; proteomics; machine learning; deep learning


Big Data Mining and Analytics

Volume 7, Issue 3, September 2024

Pages 577–589

DOI: 10.26599/BDMA.2024.9020006

Cite this article:

Ren Y, Wang Y, Han W, et al. DMSS: An Attention-Based Deep Learning Model for High-Quality Mass Spectrometry Prediction. Big Data Mining and Analytics, 2024, 7(3): 577-589. https://doi.org/10.26599/BDMA.2024.9020006
