Regular Paper

Effective Identification and Annotation of Fungal Genomes

College of Computer Science, Nankai University, Tianjin 300350, China
School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Abstract

Over the past few decades, the dangers of mycosis have caused widespread concern. As sequencing technology has developed, the effective analysis of fungal sequencing data has become a research hotspot. Although the volume of fungal sequencing data keeps growing, approaches for the identification and functional annotation of fungal chromosomal genomes remain insufficient. To address this challenge, this paper first examines approaches for identifying and annotating fungal genomes from short and long reads sequenced on multiple platforms such as Illumina and PacBio. It then develops an automated bioinformatics pipeline called PFGI for the identification and annotation task. An experimental evaluation on real-world data from the European Nucleotide Archive (ENA) shows that PFGI provides a user-friendly way to perform fungal identification and annotation from sequencing data, and that its results are accurate to the species level (97% sequence identity).
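To make the species-level criterion concrete, the following is a minimal sketch of how a 97% sequence identity threshold could be applied to a pair of already aligned sequences. It is an illustration only, not PFGI's implementation: the function names `percent_identity` and `same_species`, the gap handling, and the use of a simple column-wise comparison are all assumptions introduced here.

```python
# Hypothetical illustration of a 97% identity threshold; not taken from PFGI.

SPECIES_THRESHOLD = 97.0  # percent identity, as quoted in the abstract


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return percent identity between two aligned sequences.

    Both inputs must come from the same pairwise alignment, so they have
    equal length and use '-' for gaps. Columns where both sequences have a
    gap are skipped; a gap opposite a base counts as a mismatch
    (an assumption, other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column
        columns += 1
        if a == b:
            matches += 1

    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


def same_species(aligned_a: str, aligned_b: str,
                 threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the alignment meets the assumed species-level threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

In practice the percent identity would normally be read from the output of an alignment tool rather than recomputed by hand; the sketch only shows what the threshold means for a single query and reference pair.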

Electronic Supplementary Material

jcst-36-2-248-Highlights.pdf (416.3 KB)


Journal of Computer Science and Technology
Pages 248-260
Cite this article:
Liu J, Sun J-L, Liu Y-Z. Effective Identification and Annotation of Fungal Genomes. Journal of Computer Science and Technology, 2021, 36(2): 248-260. https://doi.org/10.1007/s11390-021-0856-4


Received: 01 August 2020
Accepted: 23 February 2021
Published: 05 March 2021
©Institute of Computing Technology, Chinese Academy of Sciences 2021