
Topology design and graph embedding for decentralized federated learning

Y. Duan, X. Li, and J. Wu
Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA

Abstract

Federated learning has been widely employed in many applications to protect the data privacy of participating clients. Although the dataset is decentralized among training devices in federated learning, the model parameters are usually stored in a centralized manner. Centralized federated learning is easy to implement; however, a centralized scheme causes a communication bottleneck at the central server, which may significantly slow down the training process. To improve training efficiency, we investigate the decentralized federated learning scheme, which has become feasible with the rapid development of device-to-device communication techniques under 5G. Nevertheless, the convergence rate of learning models in the decentralized scheme depends on the network topology design. We propose optimizing the topology design to improve training efficiency for decentralized federated learning, which is a non-trivial problem, especially when data heterogeneity is considered. In this paper, we first demonstrate the advantage of the hypercube topology and present a hypercube graph construction method that reduces data heterogeneity by carefully selecting the neighbors of each training device, a process that resembles classic graph embedding. In addition, we propose a heuristic method for generating torus graphs. Moreover, we explore the communication patterns in the hypercube topology and propose a sequential synchronization scheme that reduces the communication cost during training, together with a batch synchronization scheme that fine-tunes the communication pattern for the hypercube topology. Experiments on real-world datasets show that our proposed graph construction methods accelerate the training process and that our sequential synchronization scheme significantly reduces the overall communication traffic during training.
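
To make the hypercube idea concrete, the following minimal Python sketch (an illustration, not the paper's implementation) builds a d-dimensional hypercube overlay in which each of the 2^d clients exchanges models only with the d neighbors whose binary IDs differ in exactly one bit, and then synchronizes one dimension at a time so that every client averages with a single neighbor per round; after d rounds all clients hold the global average. The client IDs, the models dictionary, and the pairwise-averaging rule in hypercube_neighbors and sequential_sync are illustrative assumptions rather than the authors' exact algorithm.

import numpy as np


def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Neighbors of `node` in a d-dimensional hypercube: IDs differing in one bit."""
    return [node ^ (1 << b) for b in range(dim)]


def sequential_sync(models: dict[int, np.ndarray], dim: int) -> None:
    """Average models dimension by dimension (in place).

    Each round uses only pairwise exchanges along a single hypercube
    dimension; after `dim` rounds every client holds the average of all
    2**dim local models.
    """
    for b in range(dim):                     # one dimension per round
        for node in range(2 ** dim):
            partner = node ^ (1 << b)        # neighbor along dimension b
            if node < partner:               # touch each pair once
                avg = (models[node] + models[partner]) / 2.0
                models[node], models[partner] = avg, avg.copy()


if __name__ == "__main__":
    d = 3                                    # 8 clients on a 3-cube
    rng = np.random.default_rng(0)
    models = {i: rng.normal(size=4) for i in range(2 ** d)}
    print("neighbors of client 5:", hypercube_neighbors(5, d))
    sequential_sync(models, d)
    print("all clients agree:", np.allclose(models[0], models[7]))

In this toy setting the per-round communication is one model exchange per client, which is the kind of reduction in synchronization traffic the abstract attributes to the sequential scheme.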

Cite this article:
Duan Y, Li X, Wu J. Topology design and graph embedding for decentralized federated learning. Intelligent and Converged Networks, 2024, 5(2): 100-115. https://doi.org/10.23919/ICN.2024.0008