Open Access

Optimizing Data Distributions Based on Jensen-Shannon Divergence for Federated Learning

Academy of Military Sciences, Beijing 100081, China
College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

In current federated learning frameworks, a central server randomly selects a small number of clients to train local models at the beginning of each global iteration. Because clients' local data are not independent and identically distributed (non-IID), some local models are inconsistent with the global model. Existing studies employ model cleaning methods to identify inconsistent local models by measuring the cosine similarity between each local model and the global model; inconsistent local models are cleaned out and excluded from aggregation into the next global model. However, model cleaning methods incur negative effects such as large computation overhead and limited updates. In this paper, we propose a data distribution optimization method, called federated distribution optimization (FedDO), to overcome the shortcomings of model cleaning methods. FedDO computes the gradient of the Jensen-Shannon divergence to decrease the discrepancy between the selected clients' data distribution and the overall data distribution. We evaluate our method with a multi-class regression model, a multi-layer perceptron, and a convolutional neural network on a handwritten digit image dataset. Compared with model cleaning methods, FedDO improves the training accuracy by 1.8%, 2.6%, and 5.6%, respectively.
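The following is a minimal sketch of the distribution-matching idea described in the abstract: selection weights over candidate clients are adjusted along a gradient of the Jensen-Shannon divergence between the selected clients' mixed label distribution and the overall label distribution. All names and settings here (js_divergence, optimize_selection, per-client label histograms, the finite-difference gradient, learning rate, and step count) are illustrative assumptions, not the authors' actual FedDO implementation.

```python
# Sketch of JS-divergence-based data distribution optimization (assumed setup).
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def optimize_selection(client_hists, global_hist, steps=200, lr=0.1):
    """Adjust per-client selection weights so that the weighted mixture of the
    selected clients' label histograms approaches the overall distribution,
    following a finite-difference gradient of the JS divergence."""
    n = len(client_hists)
    w = np.full(n, 1.0 / n)                      # start from uniform selection weights
    for _ in range(steps):
        base = js_divergence(w @ client_hists, global_hist)
        grad = np.zeros(n)
        for i in range(n):                       # numerical gradient w.r.t. each weight
            w_eps = w.copy()
            w_eps[i] += 1e-5
            grad[i] = (js_divergence(w_eps @ client_hists, global_hist) - base) / 1e-5
        w = np.clip(w - lr * grad, 0.0, None)    # gradient step, keep weights non-negative
        w /= w.sum()                             # renormalize to a probability vector
    return w

# Toy usage: 5 candidate clients, 10 classes (e.g., handwritten digits).
rng = np.random.default_rng(0)
client_hists = rng.dirichlet(np.ones(10) * 0.3, size=5)  # skewed (non-IID) label histograms
global_hist = client_hists.mean(axis=0)                   # overall data distribution
weights = optimize_selection(client_hists, global_hist)
print(weights, js_divergence(weights @ client_hists, global_hist))
```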

Tsinghua Science and Technology
Pages 670-681
Cite this article:
Hu Z, Li D, Yang K, et al. Optimizing Data Distributions Based on Jensen-Shannon Divergence for Federated Learning. Tsinghua Science and Technology, 2025, 30(2): 670-681. https://doi.org/10.26599/TST.2023.9010091

Received: 04 January 2023
Revised: 22 February 2023
Accepted: 27 August 2023
Published: 09 December 2024
© The Author(s) 2025.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
