Abstract
With the development of sequencing technologies, somatic mutation analysis has become an important component of cancer research and treatment. VarDict is a commonly used somatic variant caller for this task. Although the heuristic-based VarDict algorithm exhibits high sensitivity and versatility, it may report more false positive variants than other callers, which limits its clinical practicality. To address this problem, we propose DeepFilter, a deep-learning-based filter for VarDict that effectively removes the false positive variants VarDict detects. Our approach trains two separate models, one for insertion-deletion mutations (InDels) and one for single nucleotide variants (SNVs). Experiments show that DeepFilter filters out at least 98.5% of false positive variants while retaining 93.5% of true positive variants for InDels and SNVs in the commonly used tumor-normal paired mode. Source code and pre-trained models are available at https://github.com/LeiHaoa/DeepFilter.