Text Classification Techniques: A Holistic Review, Observational Analysis, and Experimental Investigation

Kamal Taha¹, Paul D. Yoo², Chan Yeun¹, Aya Taha³

¹ Department of Computer Science, Khalifa University, Abu Dhabi 127788, United Arab Emirates

² School of Computing and Mathematical Sciences, Birkbeck College, University of London, London, WC1E 7HU, UK

³ Department of Science, Brighton College, Dubai 122002, United Arab Emirates

Abstract

This review article provides a thorough assessment of modern and innovative algorithms for text classification through both observational and experimental evaluations. We propose a new classification system, grounded in methodology, to categorize text classification algorithms into an organized structure from general categories down to particular fine-grained techniques. Drawing on more than 100 academic papers from prominent publishers, our extensive review spans a wide range of algorithms, encompassing traditional, deep learning, and emerging approaches. Through observational studies and comparative experiments among various algorithms, techniques, and methodological categories, we offer detailed insights into the area of text classification. The goal of this survey is to assist scholars in choosing the right methods for specific projects while encouraging further advancements in this area. This detailed examination not only contributes to the scholarly conversation on text classification but also seeks to direct future progress by identifying promising avenues for innovation and enhancement. The primary contributions of this article include the sophisticated methodological classification, a thorough review and examination of state-of-the-art algorithms, along with observational and experimental assessments, and a visionary outlook on the future development of text classification methods.

Keywords

text classification; classical methods; deep learning methods; experimental evaluation

Big Data Mining and Analytics

Volume 8, Issue 3, May 2025

Pages 624-660

DOI: 10.26599/BDMA.2024.9020092

Cite this article:

Taha K, Yoo PD, Yeun C, et al. Text Classification Techniques: A Holistic Review, Observational Analysis, and Experimental Investigation. Big Data Mining and Analytics, 2025, 8(3): 624-660. https://doi.org/10.26599/BDMA.2024.9020092
