Open Access

GAAF: Searching Activation Functions for Binary Neural Networks Through Genetic Algorithm

Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou 310027, China
Pacific Northwest National Laboratory, Richland, WA 99354, USA

Abstract

Binary neural networks (BNNs) are promising for cost- and power-constrained domains such as edge devices and mobile systems, owing to their significantly lower computation and storage demands, albeit at the cost of degraded accuracy. To close this accuracy gap, in this paper we propose adding a complementary activation function (AF) ahead of the sign-based binarization, and we rely on a genetic algorithm (GA) to automatically search for ideal AFs. These AFs help extract extra information from the input data in the forward pass, while allowing improved gradient approximation in the backward pass. Our GA-based search identifies fifteen novel AFs, most of which improve performance (by up to 2.54% on ImageNet) when tested on different datasets and network models. Interestingly, periodic functions, which rarely appear in human-designed AFs, emerge as a key component of most of the discovered AFs. Our method offers a novel approach for designing both general and application-specific BNN architectures. GAAF will be released on GitHub.
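To make the core idea concrete, below is a minimal PyTorch sketch of what the abstract describes: a complementary AF applied ahead of the sign-based binarization, trained with a standard clipped straight-through estimator (STE) in the backward pass. Since the paper reports periodic components in most discovered AFs, the example uses a periodic form. The specific function x + α·sin(x), the learnable scale α, and all class names are illustrative assumptions, not the AFs actually found by GAAF.

```python
# Sketch only: a periodic complementary AF before sign binarization,
# with a clipped straight-through estimator (STE) for gradients.
# The AF form and parameterization are assumptions for illustration.
import torch


class SignSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through gradient."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients through only where |x| <= 1 (standard STE clipping).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)


class ComplementaryAFBinarize(torch.nn.Module):
    """Applies a complementary AF, then binarizes activations to {-1, +1}."""

    def __init__(self, alpha: float = 1.0):
        super().__init__()
        # Hypothetical learnable scale for the periodic component.
        self.alpha = torch.nn.Parameter(torch.tensor(alpha))

    def forward(self, x):
        # Example periodic complementary AF: x + alpha * sin(x).
        # It reshapes activations before the sign, so extra input
        # information can influence which side of zero each value lands on.
        y = x + self.alpha * torch.sin(x)
        return SignSTE.apply(y)


if __name__ == "__main__":
    layer = ComplementaryAFBinarize()
    x = torch.randn(4, 8, requires_grad=True)
    out = layer(x)
    out.sum().backward()   # gradients flow to x and alpha via the STE
    print(out.unique())    # tensor([-1., 1.])
```

In GAAF, the GA would search over candidate AF forms like the sinusoid above (rather than fixing one by hand), evaluating each candidate by the accuracy of the resulting BNN.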

Tsinghua Science and Technology, Pages 207-220
Cite this article:
Li Y, Geng T, Stein S, et al. GAAF: Searching Activation Functions for Binary Neural Networks Through Genetic Algorithm. Tsinghua Science and Technology, 2023, 28(1): 207-220. https://doi.org/10.26599/TST.2021.9010084


Received: 15 October 2021
Accepted: 01 November 2021
Published: 21 July 2022
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
