Open Access

SIGNGD with Error Feedback Meets Lazily Aggregated Technique: Communication-Efficient Algorithms for Distributed Learning

National Laboratory for Parallel and Distributed Processing (PDL), College of Computer, National University of Defense Technology, Changsha 410073, China

Abstract

The proliferation of massive datasets has led to significant interest in distributed algorithms for solving large-scale machine learning problems. However, communication overhead is a major bottleneck that hampers the scalability of distributed machine learning systems. In this paper, we design two communication-efficient algorithms for distributed learning tasks. The first, named EF-SIGNGD, uses 1-bit (sign-based) gradient quantization to reduce the number of transmitted bits; the error feedback technique, which incorporates the error made by the compression operator into the next step, is employed to guarantee convergence. The second, called LE-SIGNGD, adds a well-designed lazy gradient aggregation rule to EF-SIGNGD that detects gradients with small changes and reuses the outdated information, so that LE-SIGNGD saves communication costs in both transmitted bits and communication rounds. Furthermore, we show that LE-SIGNGD is convergent under mild assumptions. The effectiveness of the two proposed algorithms is demonstrated through experiments on both real and synthetic data.
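
The two mechanisms described above can be illustrated with a short, single-machine Python sketch: each worker applies 1-bit sign compression with an error-feedback residual, and a lazy rule skips an upload when the local gradient has changed little since the last transmission. The function names (ef_sign_compress, lazy_skip), the threshold tau, and the mean-absolute-value scaling are illustrative assumptions for a toy least-squares problem; they are not the paper's exact update rules, thresholds, or convergence conditions.

# Toy simulation of sign compression with error feedback (EF-SIGNGD-style)
# plus a lazy aggregation rule (LE-SIGNGD-style). Names and constants are illustrative.
import numpy as np

def ef_sign_compress(grad, residual):
    # Fold the previous compression error into the current gradient, transmit only
    # the signs plus one scalar scale, and keep the new error for the next round.
    p = grad + residual
    scale = np.abs(p).mean()          # single float sent along with the sign bits
    message = scale * np.sign(p)      # what the server actually receives
    return message, p - message       # (compressed message, updated residual)

def lazy_skip(grad, last_uploaded_grad, tau):
    # Lazy aggregation rule (sketch): skip the upload when the local gradient
    # has barely changed since the last transmitted one.
    return np.linalg.norm(grad - last_uploaded_grad) ** 2 <= tau

# Toy least-squares problem 0.5 * ||A_m x - b_m||^2 split across workers.
rng = np.random.default_rng(0)
d, n_workers, lr, tau = 20, 4, 0.05, 1e-3
A = [rng.normal(size=(30, d)) for _ in range(n_workers)]
b = [rng.normal(size=30) for _ in range(n_workers)]
x = np.zeros(d)
residual = [np.zeros(d) for _ in range(n_workers)]
last_grad = [np.full(d, np.inf) for _ in range(n_workers)]   # force an upload at step 0
last_msg = [np.zeros(d) for _ in range(n_workers)]

for step in range(200):
    msgs = []
    for m in range(n_workers):
        g = A[m].T @ (A[m] @ x - b[m]) / len(b[m])           # local gradient
        if lazy_skip(g, last_grad[m], tau):
            msgs.append(last_msg[m])                         # reuse outdated information
            continue
        msg, residual[m] = ef_sign_compress(g, residual[m])
        last_grad[m], last_msg[m] = g, msg
        msgs.append(msg)
    x -= lr * np.mean(msgs, axis=0)                          # server update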

Tsinghua Science and Technology
Pages 174-185
Cite this article:
Deng X, Sun T, Liu F, et al. SIGNGD with Error Feedback Meets Lazily Aggregated Technique: Communication-Efficient Algorithms for Distributed Learning. Tsinghua Science and Technology, 2022, 27(1): 174-185. https://doi.org/10.26599/TST.2021.9010045


Received: 23 August 2020
Accepted: 24 September 2020
Published: 17 August 2021
© The author(s) 2022

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
