
Quantifying Bytes: Understanding Practical Value of Data Assets in Federated Learning

School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049, China
School of Cyber Science and Engineering, Xi’an Jiaotong University, Xi’an 710049, China

Abstract

Data assets are emerging as crucial components of both industrial and commercial applications, and mining valuable knowledge from data benefits decision-making and business. However, using data assets creates a tension between protecting sensitive information and estimating its value. Federated Learning (FL), an emerging machine learning paradigm, allows multiple clients to jointly train a global model on their data without revealing it, harnessing the power of multiple data assets while preserving their privacy. Despite these benefits, FL relies on a central server to manage the training process and provides no way to quantify the quality of each data asset, which raises privacy and fairness concerns. In this work, we present a novel framework, Federated Learning and Blockchain by Shapley value (FLBS), which combines FL with blockchain and uses the Shapley value to quantify client contributions, to achieve a good trade-off between privacy and fairness. Specifically, in each training round we use the blockchain to elect aggregation and evaluation nodes; these nodes are functionally separated and supervise each other, which enables decentralized training and contribution-aware incentive distribution. Experimental results validate the effectiveness of FLBS in estimating contributions even in the presence of heterogeneous and noisy data.
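The abstract does not detail how FLBS computes contributions, so the following Python sketch is illustrative only. It computes exact Shapley values of clients under a coalition utility v(S). In an FL setting, v(S) might be the validation accuracy of a model aggregated from the updates of the clients in S, as scored by an evaluation node; that interpretation is an assumption. The names `shapley_values`, `toy_utility`, `QUALITY`, and `allocate_rewards` are hypothetical. The toy utility stands in for real model evaluation, and the proportional reward rule is one possible incentive scheme, not necessarily the one FLBS uses.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(clients: Sequence[str],
                   utility: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each client under a coalition utility v(S).

    Enumerates every coalition, so it needs O(2^n) utility calls per client;
    this is only practical for a small number of clients.
    """
    n = len(clients)
    values: Dict[str, float] = {}
    for i in clients:
        others = [c for c in clients if c != i]
        phi = 0.0
        for k in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of size k without i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of client i when joining coalition S.
                phi += weight * (utility(s | {i}) - utility(s))
        values[i] = phi
    return values


# Hypothetical per-client data quality; client "C" holds only noisy data.
QUALITY = {"A": 0.5, "B": 0.3, "C": 0.0}


def toy_utility(coalition: FrozenSet[str]) -> float:
    """Hypothetical stand-in for the accuracy of a model trained on coalition S.

    Gains from pooling data saturate at 0.7, so contributions are not additive.
    """
    return min(0.7, sum(QUALITY[c] for c in coalition))


def allocate_rewards(values: Dict[str, float], budget: float) -> Dict[str, float]:
    """Assumed incentive rule: split the budget in proportion to positive Shapley values."""
    positive = {c: max(v, 0.0) for c, v in values.items()}
    total = sum(positive.values())
    if total == 0.0:
        return {c: 0.0 for c in values}
    return {c: budget * v / total for c, v in positive.items()}


if __name__ == "__main__":
    vals = shapley_values(["A", "B", "C"], toy_utility)
    print(vals)
    print(allocate_rewards(vals, budget=100.0))
```

In this toy game, client A receives a Shapley value of about 0.45, B about 0.25, and the zero-quality client C receives 0, so C earns no reward. The values sum to v({A, B, C}) = 0.7, which is the efficiency property that makes the Shapley value attractive for splitting credit fairly. Exact enumeration is used here for clarity; larger federations typically need sampling-based approximations.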

Tsinghua Science and Technology
Pages 135-147
Cite this article:
Yao M, Qi S, Tian Z, et al. Quantifying Bytes: Understanding Practical Value of Data Assets in Federated Learning. Tsinghua Science and Technology, 2025, 30(1): 135-147. https://doi.org/10.26599/TST.2024.9010034


Received: 27 October 2023
Revised: 05 January 2024
Accepted: 03 February 2024
Published: 11 September 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
