Open Access

Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering

Samin Poudel¹, Marwan Bikdash¹

¹Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC 27401, USA

Abstract

We postulate and analyze a nonlinear subsampling accuracy loss (SSAL) model based on the root mean square error (RMSE) and two SSAL models based on the mean square error (MSE), all suggested by extensive preliminary simulations. The SSAL models predict accuracy loss in terms of subsampling parameters such as the fraction of users dropped (FUD) and the fraction of items dropped (FID). We investigate whether the models depend on the characteristics of the dataset in a constant way across datasets when the singular value decomposition (SVD) collaborative filtering (CF) algorithm is used. The dataset characteristics considered include various densities of the rating matrix and the numbers of users and items. Extensive simulations and rigorous regression analysis led to empirical symmetrical SSAL models in terms of FID and FUD whose coefficients depend only on the data characteristics. The SSAL models turned out to be multi-linear in the odds of dropping a user (or an item) versus not dropping it. Moreover, one MSE deterioration model turned out to be linear in the FID and FUD odds, with a zero coefficient on their interaction term. Most importantly, the models are constant in the sense that they are written in closed form using the considered data characteristics (densities and numbers of users and items). The models are validated through extensive simulations based on 850 synthetically generated primary (pre-subsampling) matrices derived from the 25M MovieLens dataset. Nearly 460,000 subsampled rating matrices were then simulated and subjected to the SVD CF algorithm. Further validation was conducted using the 1M MovieLens and the Yahoo! Music Rating datasets. The models were constant and significant across all three datasets.
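To make the functional form described in the abstract concrete, the following minimal Python sketch computes the odds of dropping a user or an item and evaluates a multi-linear expression in those odds. It is an illustration only: the function names, the absence of an intercept, and the coefficients `a_u`, `a_i`, and `a_ui` are assumptions made here, not the fitted closed-form coefficients of the paper, which depend on the densities and the numbers of users and items.

```python
def odds(fraction: float) -> float:
    """Odds of dropping versus not dropping: p / (1 - p), for 0 <= p < 1."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return fraction / (1.0 - fraction)


def multilinear_loss(fud: float, fid: float,
                     a_u: float, a_i: float, a_ui: float = 0.0) -> float:
    """Multi-linear form in the FUD and FID odds.

    a_u, a_i, a_ui are hypothetical placeholder coefficients; in the paper
    they are functions of the dataset characteristics. Setting a_ui = 0
    gives a form that is linear in the two odds, with no interaction term.
    """
    o_u = odds(fud)  # odds of dropping a user
    o_i = odds(fid)  # odds of dropping an item
    return a_u * o_u + a_i * o_i + a_ui * o_u * o_i


if __name__ == "__main__":
    # Dropping half of the users and half of the items gives odds of 1 each.
    print(multilinear_loss(0.5, 0.5, a_u=1.0, a_i=2.0))
    print(multilinear_loss(0.5, 0.5, a_u=1.0, a_i=2.0, a_ui=0.5))
```

Because the odds grow without bound as the dropped fraction approaches 1, a form of this kind predicts only a small loss for light subsampling and a rapidly increasing loss for aggressive subsampling.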

Keywords

collaborative filtering; subsampling; accuracy loss models; performance loss; recommendation system; simulation; rating matrix; root mean square error

Big Data Mining and Analytics

Volume 6, Issue 1, March 2023

Pages 72-84

DOI: 10.26599/BDMA.2022.9020024

Cite this article:

Poudel S, Bikdash M. Closed-Form Models of Accuracy Loss due to Subsampling in SVD Collaborative Filtering. Big Data Mining and Analytics, 2023, 6(1): 72-84. https://doi.org/10.26599/BDMA.2022.9020024
