Open Access

Text Classification Techniques: A Holistic Review, Observational Analysis, and Experimental Investigation

Department of Computer Science, Khalifa University, Abu Dhabi 127788, United Arab Emirates
School of Computing and Mathematical Sciences, Birkbeck College, University of London, London, WC1E 7HU, UK
Department of Science, Brighton College, Dubai 122002, United Arab Emirates

Abstract

This review article provides a thorough assessment of modern and innovative algorithms for text classification through both observational and experimental evaluations. We propose a new classification system, grounded in methodology, to categorize text classification algorithms into an organized structure from general categories down to particular fine-grained techniques. Drawing on more than 100 academic papers from prominent publishers, our extensive review spans a wide range of algorithms, encompassing traditional, deep learning, and emerging approaches. Through observational studies and comparative experiments among various algorithms, techniques, and methodological categories, we offer detailed insights into the area of text classification. The goal of this survey is to assist scholars in choosing the right methods for specific projects while encouraging further advancements in this area. This detailed examination not only contributes to the scholarly conversation on text classification but also seeks to direct future progress by identifying promising avenues for innovation and enhancement. The primary contributions of this article include the sophisticated methodological classification, a thorough review and examination of state-of-the-art algorithms, along with observational and experimental assessments, and a visionary outlook on the future development of text classification methods.

Big Data Mining and Analytics
Pages 624-660
Cite this article:
Taha K, Yoo P D, Yeun C, et al. Text Classification Techniques: A Holistic Review, Observational Analysis, and Experimental Investigation. Big Data Mining and Analytics, 2025, 8(3): 624-660. https://doi.org/10.26599/BDMA.2024.9020092