Regular Paper

When Crowdsourcing Meets Data Markets: A Fair Data Value Metric for Data Trading

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Abstract

Large quantities of high-quality data are critical to the success of machine learning in diverse applications. Data silos make data difficult to circulate, and emerging data markets attempt to break this dilemma by facilitating data exchange on the Internet. Crowdsourcing, meanwhile, is an important method for efficiently collecting large amounts of high-value data in data markets. In this paper, we investigate the joint problem of efficient data acquisition and fair budget distribution across crowdsourcing and data markets. We propose a new data value metric: the reduction in the uncertainty of a Bayesian machine learning model obtained by integrating the data into model training. Guided by this metric, we design the Shapley Value Mechanism with Individual Rationality (SV-IR), which comprises a greedy algorithm with a constant approximation ratio that selects the most cost-efficient data brokers, and a fair compensation determination rule based on the Shapley value that respects the individual rationality constraints. We further propose a fair reward distribution method for the data holders with various effort levels who are managed by a data broker. We demonstrate the fairness of the compensation determination rule and the reward distribution rule by evaluating our mechanisms on two real-world datasets. The evaluation results also show that the selection algorithm in SV-IR approaches the optimal solution and outperforms state-of-the-art methods.
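
The three ingredients named in the abstract (an uncertainty-reduction value metric, cost-efficient greedy selection, and Shapley-value compensation) can be illustrated with a minimal sketch. It is not the authors' SV-IR mechanism. It assumes a Bayesian linear regression model with an isotropic Gaussian prior, measures data value as the entropy reduction of the weight posterior, uses a plain gain-per-cost greedy rule under a budget, and computes exact Shapley values by enumerating permutations. All function names, the parameters `alpha` and `beta`, and the toy data are hypothetical. The sketch omits the individual rationality constraints and makes no claim about the approximation ratio.

```python
import itertools
import math

import numpy as np


def data_value(X, alpha=1.0, beta=1.0):
    """Entropy reduction (in nats) of the weight posterior of a Bayesian
    linear model after observing the inputs X (assumed model: prior
    precision alpha * I, observation noise precision beta)."""
    d = X.shape[1]
    _, logdet = np.linalg.slogdet(np.eye(d) + (beta / alpha) * X.T @ X)
    return 0.5 * logdet


def coalition_value(brokers, members):
    """Value of pooling the data held by the brokers indexed in members."""
    rows = [brokers[i] for i in members]
    if not rows:
        return 0.0
    return data_value(np.vstack(rows))


def greedy_select(brokers, costs, budget):
    """Repeatedly add the affordable broker with the largest marginal
    value per unit cost (hypothetical rule, not the paper's algorithm)."""
    chosen, spent = [], 0.0
    remaining = set(range(len(brokers)))
    while remaining:
        base = coalition_value(brokers, chosen)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue
            gain = coalition_value(brokers, chosen + [i]) - base
            ratio = gain / costs[i]
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)
    return chosen


def shapley_values(brokers, members):
    """Exact Shapley value of each selected broker: the marginal
    contribution averaged over all join orders (feasible only for a
    handful of brokers)."""
    phi = {i: 0.0 for i in members}
    for perm in itertools.permutations(members):
        prefix, prev = [], 0.0
        for i in perm:
            prefix.append(i)
            cur = coalition_value(brokers, prefix)
            phi[i] += cur - prev
            prev = cur
    n_orders = math.factorial(len(members))
    return {i: v / n_orders for i, v in phi.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: four brokers, each holding five 3-dimensional samples.
    brokers = [rng.normal(size=(5, 3)) for _ in range(4)]
    costs = [1.0, 2.0, 1.5, 3.0]
    chosen = greedy_select(brokers, costs, budget=4.0)
    print("selected brokers:", chosen)
    print("Shapley shares:", shapley_values(brokers, chosen))
```

In this sketch the Shapley shares of the selected brokers sum to the value of their pooled data, which is the efficiency property that makes the Shapley value a natural basis for splitting a payment. Exact enumeration grows factorially with the number of brokers, so larger instances would need sampling or a structure-specific shortcut.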

Electronic Supplementary Material

JCST-2205-12519-Highlights.pdf (384.1 KB)

Journal of Computer Science and Technology
Pages 671-690
Cite this article:
Liu Y-S, Zheng Z-Z, Wu F, et al. When Crowdsourcing Meets Data Markets: A Fair Data Value Metric for Data Trading. Journal of Computer Science and Technology, 2024, 39(3): 671-690. https://doi.org/10.1007/s11390-023-2519-0

Received: 21 May 2022
Accepted: 15 March 2023
Published: 22 July 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024