Open Access

DeepFilter: A Deep Learning Based Variant Filter for VarDict

School of Software, Shandong University, Jinan 250100, China
Shenzhen Research Institute of Shandong University, Shenzhen 518057, China
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
Institute for Computer Science, Johannes Gutenberg University, Mainz 55128, Germany

Abstract

With the development of sequencing technologies, somatic mutation analysis has become an important component of cancer research and treatment. VarDict is a commonly used somatic variant caller for this task. Although the heuristic-based VarDict algorithm exhibits high sensitivity and versatility, it may report more false positive variants than other callers, which limits its clinical practicality. To address this problem, we propose DeepFilter, a deep-learning-based filter for VarDict that effectively removes the false positive variants VarDict detects. Our approach trains two separate models, one for insertion-deletion mutations (InDels) and one for single nucleotide variants (SNVs). Experiments show that DeepFilter can filter out at least 98.5% of false positive variants and retain 93.5% of true positive variants for InDels and SNVs in the commonly used tumor-normal paired mode. Source code and pre-trained models are available at https://github.com/LeiHaoa/DeepFilter.

Tsinghua Science and Technology
Pages 665-672
Cite this article:
Zhang H, Yin Z, Wei Y, et al. DeepFilter: A Deep Learning Based Variant Filter for VarDict. Tsinghua Science and Technology, 2023, 28(4): 665-672. https://doi.org/10.26599/TST.2022.9010032

Received: 26 February 2022
Revised: 27 April 2022
Accepted: 22 August 2022
Published: 06 January 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
