Large quantities of high-quality data are critical to the success of machine learning in diverse applications. Data silos make data difficult to circulate, and emerging data markets attempt to break this impasse by facilitating data exchange on the Internet. Crowdsourcing, in turn, is an important method for efficiently collecting large amounts of high-value data in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across crowdsourcing and data markets. We propose a new metric of data value: the reduction in uncertainty of a Bayesian machine learning model obtained by integrating the data into model training. Guided by this metric, we design a mechanism called Shapley Value Mechanism with Individual Rationality (SV-IR). It combines a greedy algorithm with a constant approximation ratio, which selects the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value, which respects the individual rationality constraints. We further propose a fair reward distribution method for the data holders with various effort levels who are managed by a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR approaches the optimal solution and outperforms state-of-the-art methods.
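To make the two ingredients named in the abstract concrete, the following is a minimal illustrative sketch, not the SV-IR mechanism itself. It assumes a hypothetical concave value function `value` as a stand-in for the Bayesian uncertainty-reduction metric, hypothetical per-broker `quality` and `cost` numbers, and a simple budget check. It does not implement the individual rationality constraint or the approximation guarantee described in the paper. It shows a greedy pick by marginal value per unit cost, followed by an exact Shapley value computation over the selected brokers.

```python
from itertools import permutations
from math import log


def value(coalition, quality):
    """Hypothetical stand-in for uncertainty reduction (an assumption).

    Concave in the total quality of the coalition, so each extra
    broker contributes a diminishing marginal value.
    """
    return log(1.0 + sum(quality[b] for b in coalition))


def greedy_select(quality, cost, budget):
    """Repeatedly add the affordable broker with the best marginal value per cost."""
    selected, spent = [], 0.0
    remaining = sorted(quality)  # sorted for deterministic tie-breaking
    while remaining:
        affordable = [b for b in remaining if spent + cost[b] <= budget]
        if not affordable:
            break
        base = value(selected, quality)
        best = max(
            affordable,
            key=lambda b: (value(selected + [b], quality) - base) / cost[b],
        )
        selected.append(best)
        spent += cost[best]
        remaining.remove(best)
    return selected


def shapley(players, quality):
    """Exact Shapley values: average marginal contribution over all orderings.

    Enumerates every permutation, so it is only practical for a handful of players.
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        prefix = []
        for p in order:
            before = value(prefix, quality)
            prefix.append(p)
            totals[p] += value(prefix, quality) - before
    return {p: t / len(orders) for p, t in totals.items()}


# Hypothetical brokers, qualities, and costs, chosen only for illustration.
quality = {"a": 4.0, "b": 1.0, "c": 1.0}
cost = {"a": 2.0, "b": 1.0, "c": 5.0}

chosen = greedy_select(quality, cost, budget=3.0)
shares = shapley(chosen, quality)
print(chosen, shares)
```

By construction, the Shapley shares sum to the value of the whole selected coalition, and two brokers with identical contributions receive identical shares. These are the efficiency and symmetry properties that make the Shapley value a natural basis for a fair compensation rule.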