
Topology design and graph embedding for decentralized federated learning

Y. Duan, X. Li, and J. Wu
Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA

Abstract

Federated learning has been widely employed in many applications to protect the data privacy of participating clients. Although the dataset is decentralized among training devices in federated learning, the model parameters are usually stored in a centralized manner. Centralized federated learning is easy to implement; however, a centralized scheme causes a communication bottleneck at the central server, which may significantly slow down the training process. To improve training efficiency, we investigate the decentralized federated learning scheme, which has become feasible with the rapid development of device-to-device communication techniques under 5G. Nevertheless, the convergence rate of learning models in the decentralized scheme depends on the network topology design. We propose optimizing the topology design to improve training efficiency for decentralized federated learning, which is a non-trivial problem, especially when data heterogeneity is considered. In this paper, we first demonstrate the advantage of the hypercube topology and present a hypercube graph construction method that reduces data heterogeneity by carefully selecting the neighbors of each training device, a process that resembles classic graph embedding. In addition, we propose a heuristic method for generating torus graphs. Moreover, we explore the communication patterns in the hypercube topology and propose a sequential synchronization scheme that reduces the communication cost during training, together with a batch synchronization scheme that fine-tunes the communication pattern for the hypercube topology. Experiments on real-world datasets show that our proposed graph construction methods accelerate the training process and that our sequential synchronization scheme significantly reduces the overall communication traffic during training.
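
To make the hypercube idea concrete, the following minimal Python sketch (an illustration, not the paper's implementation) builds a d-dimensional hypercube overlay in which each of the 2^d clients exchanges models only with the d neighbors whose binary IDs differ in exactly one bit, and then synchronizes one dimension at a time so that every client averages with a single neighbor per round; after d rounds all clients hold the global average. The client IDs, the models dictionary, and the pairwise-averaging rule in hypercube_neighbors and sequential_sync are illustrative assumptions rather than the authors' exact algorithm.

import numpy as np


def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a d-dimensional hypercube: IDs differing in one bit."""
    return [node ^ (1 << b) for b in range(dim)]


def sequential_sync(models: dict[int, np.ndarray], dim: int) -> None:
    """Average models dimension by dimension (in place).

    Each round uses only pairwise exchanges along a single hypercube
    dimension; after `dim` rounds every client holds the average of all
    2**dim local models.
    """
    for b in range(dim):                     # one dimension per round
        for node in range(2 ** dim):
            partner = node ^ (1 << b)        # neighbor along dimension b
            if node < partner:               # touch each pair once
                avg = (models[node] + models[partner]) / 2.0
                models[node], models[partner] = avg, avg.copy()


if __name__ == "__main__":
    d = 3                                    # 8 clients on a 3-cube
    rng = np.random.default_rng(0)
    models = {i: rng.normal(size=4) for i in range(2 ** d)}
    print("neighbors of client 5:", hypercube_neighbors(5, d))
    sequential_sync(models, d)
    print("all clients agree:", np.allclose(models[0], models[7]))

In this toy setting the per-round communication is one model exchange per client, which is the kind of reduction in synchronization traffic the abstract attributes to the sequential scheme.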

Cite this article:
Duan Y, Li X, Wu J. Topology design and graph embedding for decentralized federated learning. Intelligent and Converged Networks, 2024, 5(2): 100-115. https://doi.org/10.23919/ICN.2024.0008