Research Article | Open Access

Weight asynchronous update: Improving the diversity of filters in a deep convolutional network

School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
College of Information and Engineering, Sichuan Agricultural University, Yaan 625014, China
School of Computer, Wuhan University, Wuhan 430072, China

Abstract

Deep convolutional networks have achieved remarkable results on various visual tasks owing to their strong ability to learn a variety of features. A well-trained deep convolutional network can be compressed to 20%-40% of its original size by removing filters that make little contribution, because redundant filters generate many overlapping features. Model compression can reduce the number of unnecessary filters, but it does not take advantage of the redundant filters themselves, since the training phase is left unchanged. Modern architectures with residual connections, dense connections, and inception blocks are thought to mitigate the overlap among convolutional filters, but they do not fully overcome the issue. To address it, we propose a new training strategy, weight asynchronous update, which significantly increases the diversity of filters and enhances the representational ability of the network. The proposed method can be applied to a wide range of convolutional networks without changing the network topology. Our experiments show that updating a stochastic subset of filters in each iteration significantly reduces filter overlap in convolutional networks, and extensive experiments show that our method yields noteworthy improvements in neural network performance.
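The abstract's core idea is that only a stochastic subset of filters receives a weight update in each iteration, so filters evolve asynchronously and diverge from one another. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the gradient-masking approach, the helper name mask_filter_gradients, and the update ratio p are our own illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of weight asynchronous update as described in the
# abstract: in each iteration only a random subset of convolutional
# filters is updated, so filters evolve asynchronously.
# The masking strategy and ratio `p` are illustrative assumptions.
import torch
import torch.nn as nn

def mask_filter_gradients(model: nn.Module, p: float = 0.5) -> None:
    """After loss.backward(), zero the gradients of a random (1 - p)
    fraction of filters in every Conv2d layer, so only the remaining
    filters are updated in this iteration."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d) and module.weight.grad is not None:
            out_channels = module.weight.shape[0]
            # Boolean mask over filters: True = update this filter now.
            keep = torch.rand(out_channels, device=module.weight.device) < p
            module.weight.grad[~keep] = 0.0
            if module.bias is not None and module.bias.grad is not None:
                module.bias.grad[~keep] = 0.0

# Usage inside a standard training loop:
#   loss.backward()
#   mask_filter_gradients(model, p=0.5)  # hypothetical update ratio
#   optimizer.step()
```

Because the random mask is redrawn every iteration, each filter follows its own update schedule over the course of training, which is one plausible way to realize the asynchrony the paper describes without changing the network topology.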

Computational Visual Media, Vol. 6, No. 4, Pages 455-466
Cite this article:
Zhang D, He L, Luo M, et al. Weight asynchronous update: Improving the diversity of filters in a deep convolutional network. Computational Visual Media, 2020, 6(4): 455-466. https://doi.org/10.1007/s41095-020-0185-5

Metrics: 652 Views · 27 Downloads · Crossref 4 · Web of Science N/A · Scopus 4 · CSCD 0

Received: 06 April 2020
Accepted: 16 June 2020
Published: 17 October 2020
© The Author(s) 2020

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Other papers from this open access journal are available free of charge from http://www.springer.com/journal/41095. To submit a manuscript, please go to https://www.editorialmanager.com/cvmj.
