Open Access

Quantifying Bytes: Understanding Practical Value of Data Assets in Federated Learning

Minghao Yao¹, Saiyu Qi¹, Zhen Tian¹, Qian Li², Yong Han¹, Haihong Li¹, Yong Qi¹

¹ School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China

² School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China


Abstract

Data assets are emerging as a crucial component of both industrial and commercial applications. Mining valuable knowledge from data benefits decision-making and business. However, using data assets creates tension between protecting sensitive information and estimating value. As an emerging machine learning paradigm, Federated Learning (FL) allows multiple clients to jointly train a global model on their data without revealing it. This approach harnesses the power of multiple data assets while preserving their privacy. Despite these benefits, FL relies on a central server to manage the training process and lacks a way to quantify the quality of data assets, which raises privacy and fairness concerns. In this work, we present a novel framework that combines Federated Learning and Blockchain by Shapley value (FLBS) to achieve a good trade-off between privacy and fairness. Specifically, we introduce blockchain in each training round to elect aggregation and evaluation nodes for training, enabling decentralization and contribution-aware incentive distribution; these nodes are functionally separated and able to supervise each other. The experimental results validate the effectiveness of FLBS in estimating contribution even in the presence of heterogeneity and noisy data.

Keywords

Federated Learning (FL); blockchain; fairness
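
The abstract names the Shapley value as the basis for contribution estimation. As a rough illustration only, the sketch below computes exact Shapley values for three clients from a utility table. It is not the FLBS procedure: the function `shapley_values`, the client names `A`, `B`, `C`, and the accuracy figures in `ACCURACY` are all hypothetical, chosen so that client `C` stands in for a participant with noisier data.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley value of each player under a coalition utility function.

    `utility` maps a frozenset of players to a number, for example the
    validation accuracy of a model trained on that coalition's data.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical accuracy reached by each coalition of clients (made-up numbers).
ACCURACY = {
    frozenset(): 0.0,
    frozenset("A"): 0.5,
    frozenset("B"): 0.5,
    frozenset("C"): 0.2,
    frozenset("AB"): 0.7,
    frozenset("AC"): 0.6,
    frozenset("BC"): 0.6,
    frozenset("ABC"): 0.8,
}

values = shapley_values(["A", "B", "C"], ACCURACY.__getitem__)
for client, value in values.items():
    print(f"client {client}: {value:.4f}")
```

The values sum to the utility of the full coalition, and clients with identical marginal contributions receive identical values. Exact enumeration visits every coalition, so its cost grows exponentially with the number of clients; practical systems typically rely on approximations.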

Tsinghua Science and Technology

Volume 30, Issue 1, February 2025

Pages 135-147

DOI: 10.26599/TST.2024.9010034

Cite this article:

Yao M, Qi S, Tian Z, et al. Quantifying Bytes: Understanding Practical Value of Data Assets in Federated Learning. Tsinghua Science and Technology, 2025, 30(1): 135-147. https://doi.org/10.26599/TST.2024.9010034
