This review article assesses modern and emerging text classification algorithms through both observational and experimental evaluations. We propose a new methodology-based taxonomy that organizes text classification algorithms hierarchically, from general categories down to specific fine-grained techniques. Drawing on more than 100 academic papers from prominent publishers, the review covers traditional, deep learning, and emerging approaches. Observational studies and comparative experiments across algorithms, techniques, and methodological categories provide detailed insight into the field. The survey aims to help researchers choose suitable methods for specific projects and to encourage further advances, including by identifying promising directions for innovation and improvement. The primary contributions of this article are the methodology-based taxonomy, a thorough review of state-of-the-art algorithms, observational and experimental assessments, and an outlook on the future development of text classification methods.