Identifying cancer-related differentially expressed genes provides valuable information for diagnosing tumors, predicting prognosis, and guiding effective treatment. Recently, deep learning methods have been applied to differential gene expression analysis using microarray-based high-throughput gene profiling and have achieved good results. In this study, we propose a new robust multiple-datasets-based semi-supervised learning model, MSSL, for tumor type classification and candidate cancer-specific biomarker discovery across multiple tumor types and multiple datasets. MSSL addresses three long-standing obstacles: (1) a single existing dataset does not contain enough data to fully exploit the advantages of deep learning; (2) many datasets from different research institutions cannot be used effectively because of inconsistent internal variances and low quality; (3) deep learning methods have limited effectiveness on relatively uncommon cancers. We applied MSSL to pan-cancer normalized level 3 RNA-seq data from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) and achieved a final classification accuracy of 97.6%, a substantial improvement over previous approaches. Finally, we ranked genes by importance for each cancer type based on the classification results and validated that the top-ranked genes were biologically meaningful for the corresponding tumors. Some of these genes are already used as biomarkers, which demonstrates the efficacy of our method.
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.