School of Computer Science and Technology, Anhui University, Hefei230601, China
School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei230601, China.
Show Author Information
Hide Author Information
Abstract
Collaborative Filtering (CF) is a commonly used technique in recommendation systems. It can promote items of interest to a target user from a large selection of available items. It is divided into two broad classes: memory-based algorithms and model-based algorithms. The latter requires some time to build a model but recommends online items quickly, while the former is time-consuming but does not require pre-building time. Considering the shortcomings of the two types of algorithms, we propose a novel Community-based User domain Collaborative Recommendation Algorithm (CUCRA). The idea comes from the fact that recommendations are usually made by users with similar preferences. The first step is to build a user-user social network based on users’ preference data. The second step is to find communities with similar user preferences using a community detective algorithm. Finally, items are recommended to users by applying collaborative filtering on communities. Because we recommend items to users in communities instead of to an entire social network, the method has perfect online performance. Applying this method to a collaborative tagging system, experimental results show that the recommendation accuracy of CUCRA is relatively good, and the online time-complexity reduces to .
J. S.Breese, D.Heckerman, and C.Kadie, Empirical analysis of predictive algorithms for collaborative filtering, inProceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc, 1998, pp. 43-52.
[4]
P.Resnick, N.Iacovou, M.Suchak, P.Bergstrom, and J.Riedl, GroupLens: An open architecture for collaborative filtering of net news, in CSCW 1994: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, Chapel Hill, North Carolina, United States: ACM Press, 1994, pp. 175-186.
L.Ungar and D.Foster, A formal statistical approach to collaborative filtering, inProceedings of the Conference of Automated Learning and Discovery, Pittsburg, PA, USA, 1998.
[6]
G.Takács, I.Pilászy, B.Németh, and D.Tikk, On the gravity recommendation system, inProceedings of KDD Cup Workshop at SIGKDD 2007,13th ACM International Conference on Knowledge Discovery and DataMining, SanJose, CA, USA, 2007, pp. 22-30.
[7]
T.Hofmann, Latent semantic models for collaborative filtering, ACM Transactions on Information Systems, vol. 22, no. 1, pp. 89-115, Jan.2004.
R.Salakhutdinov and A.Mnih, Bayesian probabilistic matrix factorization using Markov chain Monte Carlo, inProceedings of the 25th International Conference on Machine Learning, ACM, July2008, pp. 880-887.
D. M.Pennock, E.Horvitz, S.Lawrence, and C. L.Giles, Collaborative filtering by personality diagnosis: A hybrid memory-and model-based approach, inProceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., June2000, pp. 473-480.
[12]
S. K. L.Al Mamunur Rashid, G.Karypis, and J.Riedl, ClustKNN: A highly scalable hybrid model-& memory-based CF algorithm, inProceeding of WebKDD, Philadelphia, PA, USA, 2006.
[13]
G. R.Xue, C.Lin, Q.Yang, W.Xi, H. J.Zeng, Y.Yu, and Z.Chen, Scalable collaborative filtering using cluster-based smoothing, inProceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, Aug.2005, pp. 114-121.
E.Viennet, Collaborative filtering in social networks: A community-based approach, inComputing, Management and Telecommunications (ComManTel), 2013 International Conference on, IEEE, Jan.2013, pp. 128-133.
[15]
R.Pan, P.Dolog, and G.Xu, KNN-based clustering for improving social recommender systems, in Agents and Data Mining Interaction, Springer, 2013, pp. 115-125.
A.Shepitsen, J.Gemmell, B.Mobasher, and R.Burke, Personalized recommendation in social tagging systems using hierarchical clustering, inProceedings of the 2nd ACM Conference on Recommender Systems, New York, NY, USA, 2008, pp. 259-266.
Qian F, Zhang Y, Zhang Y, et al. Community-Based User Domain Model Collaborative Recommendation Algorithm. Tsinghua Science and Technology, 2013, 18(4): 353-359. https://doi.org/10.1109/TST.2013.6574673
10.1109/TST.2013.6574673.F001
(a) User-item relationship; (b) User-user social network. The black spots represent users and the white spots represent items. The figure shows how the user-item relationship is mapped to the user-user social network.
2.1 Community-based user domain model generation method
Real-world social networks often exhibit a strong community structure[
16
]. A community is usually considered as a group of users whose members interact with others more frequently than with those outside the group. Users in the same communities are frequently connected, and thus probably have similar tastes and interests. In a social network, these communities often yield better recommendation results.
Based on the above motivation, we propose the CUCRA algorithm implemented in two parts: the first part is building the offline user domain model; the second part is recommending items to target users in the model.
The more important part is the former, that is, the community-based user domain model generation method. The generation method follows three steps: (1) Calculate user similarities using a user-item preference dataset; (2) map the similarity relationships to user-user social network relationships; (3) apply an existing community detection method to generate communities as user domains.
In the second step, we define a -nearest neighbor graph as a user-user social network where is the set of users, and is the set of edges. According to the KNN strategy, for each user, compute the top- largest similarity users, and establish edges between each two users. In cases where there are many edges between two users, only a single edge is kept. We map the user-item data relationship to the user-user network relationship through the above method. The intrinsic mechanism is mining user social behavior through user-item rating behavior. This mining process is described in
Fig. 1
.
10.1109/TST.2013.6574673.F001
(a) User-item relationship; (b) User-user social network. The black spots represent users and the white spots represent items. The figure shows how the user-item relationship is mapped to the user-user social network.
Building the model consumes offline time; next, we analyze the time complexity of Algorithm 1. We calculate the similarity between any two users according to the rating of items in Phase 1, so the minimum time cost is . We then determine the top- largest similarity users of the target user in Phase 2. The essence of the algorithm is sorting, and the time-complexity of sort algorithms is . There are various community detection algorithms and the most well-known and mature algorithm is the Gauss-Newton (GN) algorithm, the detection algorithm used in this version of CUCRA. Other algorithms include the Newman fast algorithm, PLA, and modifications of these, algorithms. The maximum time complexity of community detection algorithms is , such as GN, while minimum time complexity is such as PLA. Because the whole time complexity of an algorithm is determined by the maximum value of every phase. Thus, time complexity in building a user domain model is
2.2 Produce recommendation
After building the user domain model, we recommend items of interest to a target user in a community. The formula to compute prediction is defined by Eq. (
1
).
denotes the set of users in the same community as user , represents the similarity between user and user , and is the rating of user for item .
In the online phase, which is described by Algorithm 2, because the similarity relation has already been calculated for the target users, we only need to find common rating items within the community of users. Online time complexity is , and is far less than , so online time complexity for a community-based algorithm is .
Traditional memory-based collaborative filtering algorithms directly calculate similarities between target users and all other users. The algorithms have no offline time and only have online time. In order to calculate the similarity between the target user and another user, they first need to find common rating items between users. Thus calculating similarities between the target user and all users requires community time complexity. Thus a community-based user domain algorithm has obvious advantages in online time complexity from the above analysis.
10.1109/TST.2013.6574673.F002
The value of the hybridization parameter . (a) The MAE with the different values of , and (b) the RMSE with the different values of . MAE and RMSE are minimized when .
10.1109/TST.2013.6574673.F003
The relationship between the KNN algorithm and CUCRA. KNN clustering is presented by the dashed lines, while GN clustering is presented by the solid lines. CUCRA produces twice the number of clusters, including both the KNN clusters and GN clusters.
10.1109/TST.2013.6574673.F004
The degree distribution of the user-user social network when K=13.