The increasing amount of user traffic on Internet discussion forums has produced a huge amount of unstructured natural language data in the form of user comments. Most modern recommendation systems rely on manual tagging, in which administrators label the features of the class, or story, that a user comment corresponds to. Another common approach uses pre-trained word embeddings to compare class descriptions for textual similarity, then applies a measure such as cosine similarity or Euclidean distance to find the top k neighbors. However, neither approach fully utilizes this user-generated unstructured natural language data, which reduces the scope of these recommendation systems. This paper studies the application of domain adaptation to a transformer on the set of user comments to be indexed, and the use of simple contrastive learning in the sentence transformer fine-tuning process to generate meaningful semantic embeddings for the various user comments that apply to each class. To match a query containing content from multiple user comments belonging to the same class, the construction of a subquery channel for computing class-level similarity is proposed. This channel segments the aggregate query into subqueries and performs a k-nearest neighbors (KNN) search on each individual subquery. RecBERT achieves state-of-the-art performance, outperforming other state-of-the-art models in accuracy, precision, recall, and F1 score when classifying comments among four classes and among eight classes. RecBERT outperforms the most precise state-of-the-art model (distilRoBERTa) in precision by 6.97% for matching comments among eight classes.
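The subquery idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the embeddings are hand-written toy vectors rather than sentence transformer outputs, the function names `cosine_top_k` and `class_scores` are hypothetical, and summing neighbor similarities per class is only one assumed way of turning per-subquery KNN results into a class-level score.

```python
import numpy as np
from collections import defaultdict


def cosine_top_k(query, index, k):
    """Return (row indices, similarities) of the k rows of `index`
    with the highest cosine similarity to `query`."""
    index_unit = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = index_unit @ query_unit
    top = np.argsort(-sims)[:k]  # sort descending by similarity
    return top, sims[top]


def class_scores(subqueries, index, labels, k=2):
    """Run a KNN search per subquery and accumulate neighbor similarities
    by class label (an assumed aggregation rule, not taken from the paper)."""
    scores = defaultdict(float)
    for sub in subqueries:
        top, sims = cosine_top_k(sub, index, k)
        for i, s in zip(top, sims):
            scores[labels[i]] += float(s)
    return dict(scores)


# Toy index: four "comment embeddings" belonging to two classes (stories).
index = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["story_a", "story_a", "story_b", "story_b"]

# An aggregate query already segmented into three subquery embeddings.
subqueries = np.array([[1.0, 0.1], [0.8, 0.2], [0.2, 0.9]])

scores = class_scores(subqueries, index, labels, k=2)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup two of the three subqueries lie closest to the comments of `story_a`, so that class accumulates the larger score even though one subquery points toward `story_b`. A real system would replace the toy vectors with embeddings from the fine-tuned model and would likely use an approximate nearest neighbor index instead of a full matrix product.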
This work is available under the CC BY-NC-ND 3.0 IGO license: https://creativecommons.org/licenses/by-nc-nd/3.0/igo/