Pipeline for RNA sequencing data analysis by combination of Nextflow and R

Jia-Hua Qu 1,2,3
1 Department of Pathology, University of California San Francisco, San Francisco, CA 94143, USA
2 Department of Immunology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
3 Department of Host-Microbe Interactions, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA

Abstract

With the development of high-throughput technologies, RNA sequencing (RNA-seq) has become widely used in biological studies, and large amounts of RNA-seq data are being generated that remain to be analyzed. Although many analysis methods and tools are available, a laboratory still needs a unified pipeline for RNA-seq data analysis. As new methods and tools continue to appear, I consolidated my customized analysis code into an updated RNA-seq analysis pipeline. During aging, gene mutations accumulate and hormonal regulation is disrupted, which may exacerbate age-related diseases. Therefore, we generated a dataset from mice with or without a gene mutation and under different hormone treatments to study the effects of two factors, hormone and gene mutation, on the transcriptome. Building on the Nextflow nf-core/rnaseq pipeline, this project established a pipeline consisting of three stages: (1) upstream analysis, comprising quality control of FASTQ files before and after trimming, read trimming, alignment, and quantification; (2) midstream analysis, comprising count normalization, differential expression analysis, and visualization with boxplots, principal component analysis (PCA), t-SNE, sample distance heatmaps, MA plots, volcano plots, and gene expression heatmaps; and (3) downstream analysis, comprising functional enrichment of KEGG pathways and GO terms. The results showed distinct effects of each single factor as well as interaction effects between the two factors. The code is also provided so that readers can easily adapt this pipeline into their own customized analysis pipelines.
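The count-normalization step of the midstream stage is, in R-based pipelines like this one, commonly delegated to DESeq2 (which the article cites). Assuming that is the case here, the following Python sketch illustrates the median-of-ratios size-factor estimate that DESeq2 applies by default. It is an illustration only, not the author's R code; the function names are hypothetical, and the input is assumed to be a genes × samples matrix of raw read counts.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample (hypothetical helper name).

    counts: genes x samples array of raw, non-negative read counts.
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the log-ratio is undefined.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a non-zero count in every sample.
    expressed = np.all(counts > 0, axis=1)
    if not expressed.any():
        raise ValueError("no gene has a non-zero count in every sample")
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Per-sample median of log(count / pseudo-reference) over the kept genes.
    log_ratios = log_counts - log_geo_means[:, np.newaxis]
    return np.exp(np.median(log_ratios, axis=0))


def normalize_counts(counts):
    """Divide each sample's (column's) counts by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)
```

Because each sample is scaled by its median ratio to a pseudo-reference, a library sequenced twice as deeply receives a size factor twice as large, so its normalized counts line up with those of the other samples. Taking the median rather than the mean keeps a handful of strongly differentially expressed genes from distorting the factors.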

Keywords: pipeline, RNA sequencing, R, bioinformatics analysis, Nextflow



Publication history

Received: 11 October 2023
Revised: 17 October 2023
Accepted: 22 October 2023
Published: 22 December 2023

Copyright

© The Author(s) 2023. Aging Research published by Tsinghua University Press.

Rights and permissions

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
