TST: Threshold Based Similarity Transitivity Method in Collaborative Filtering with Cloud Computing

Feng Xie; Zhen Chen; Hongfeng Xu; Xiwei Feng; Qi Hou

doi:10.1109/TST.2013.6522590


Open Access


Department of Automation, Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Department of Computer Science and Technologies and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China

Department of Electronic Engineering and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University, Beijing 100084, China


Abstract

Collaborative filtering addresses the information overload problem by presenting personalized content to individual users based on their interests, and it has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering methods, similarity-based approaches make predictions by finding users with similar tastes or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach suffers from the data sparsity problem: inaccurate similarities derived from sparse user-item associations produce an inaccurate neighborhood for each user or item. The resulting poor recommendations motivate the Threshold based Similarity Transitivity (TST) method proposed in this paper. TST first filters out inaccurate similarities by setting an intersection threshold and then replaces them with the transitivity similarity. In addition, the TST method is designed to scale using the MapReduce framework on a cloud computing platform. We evaluate our algorithm on the public MovieLens data set and a real-world data set from AppChina (an Android application market) using several well-known metrics, including precision, recall, coverage, and popularity. The experimental results demonstrate that TST copes well with the tradeoff between the quality and quantity of similarities when an appropriate threshold is set. Moreover, the optimal threshold can be found experimentally, and it becomes smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even as the data becomes sparser.
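The two TST steps described in the abstract (threshold filtering, then transitive replacement) can be illustrated with a minimal single-machine sketch. The abstract does not give the exact formulas, so several choices here are assumptions made only for illustration: cosine similarity over implicit feedback sets, the "intersection" taken as the number of items two users share, and the transitive similarity taken as the best product of two trusted similarities through one intermediate user. The function names `overlap_similarities` and `tst_similarities` are hypothetical, and the MapReduce distribution mentioned in the abstract is not modeled.

```python
from itertools import combinations
from math import sqrt


def overlap_similarities(ratings):
    """Return {(u, v): (cosine similarity, number of shared items)} for u < v.

    `ratings` maps each user to the set of items that user chose.
    Cosine similarity on sets is an assumption, not the paper's stated measure.
    """
    sims = {}
    for u, v in combinations(sorted(ratings), 2):
        common = len(ratings[u] & ratings[v])
        denom = sqrt(len(ratings[u]) * len(ratings[v]))
        sims[(u, v)] = (common / denom if denom else 0.0, common)
    return sims


def tst_similarities(ratings, threshold):
    """Sketch of threshold filtering followed by transitive replacement."""
    direct = overlap_similarities(ratings)

    # Step 1: trust only similarities backed by at least `threshold` shared items.
    trusted = {pair: s for pair, (s, n) in direct.items() if n >= threshold}

    def lookup(u, v):
        # Pairs are stored with the smaller user id first.
        return trusted.get((u, v) if u < v else (v, u))

    result = dict(trusted)

    # Step 2: for each filtered pair, estimate a similarity through one
    # intermediate user w (assumed rule: best product of trusted similarities).
    for (u, v) in direct:
        if (u, v) in trusted:
            continue
        candidates = []
        for w in ratings:
            if w == u or w == v:
                continue
            s_uw, s_wv = lookup(u, w), lookup(w, v)
            if s_uw is not None and s_wv is not None:
                candidates.append(s_uw * s_wv)
        if candidates:
            result[(u, v)] = max(candidates)
        # With no trusted path, the pair is left out rather than guessed.
    return result


# Toy data: users "a" and "c" share only one item, below the threshold of 2.
toy_ratings = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5, 6},
    "c": {4, 5, 6, 7},
}
toy_sims = tst_similarities(toy_ratings, threshold=2)
```

In the toy data, the direct similarity between "a" and "c" rests on a single shared item, so it is discarded and re-estimated through "b", whose similarities to both users are each supported by at least two shared items.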

Keywords

cloud computing; recommender systems; big data; collaborative filtering; data mining; similarity transitivity; machine learning; MapReduce; Android applications

References

[1] P. Resnick and H. R. Varian, Recommender systems, Communications of the ACM, vol. 40, no. 3, pp. 56-58, 1997.