Regular Paper

AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks

College of Software, Nankai University, Tianjin 300350, China
College of Computer Science, Nankai University, Tianjin 300350, China
State Key Laboratory of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Abstract

Exploring the expected quantizing scheme with a suitable mixed-precision policy is the key to compressing deep neural networks (DNNs) with high efficiency and accuracy. This exploration imposes a heavy workload on domain experts, so an automatic compression method is needed. However, the huge search space of automatic methods introduces a heavy computing budget that makes them challenging to apply in real scenarios. In this paper, we propose an end-to-end framework named AutoQNN for automatically quantizing different layers with different schemes and bitwidths, without any human labor. AutoQNN can efficiently seek desirable quantizing schemes and mixed-precision policies for mainstream DNN models by combining three techniques: quantizing scheme search (QSS), quantizing precision learning (QPL), and quantized architecture generation (QAG). QSS introduces five quantizing schemes and defines three new schemes as a candidate set for scheme search, and then uses the Differentiable Neural Architecture Search (DNAS) algorithm to seek the layer- or model-desired scheme from the set. QPL is, to the best of our knowledge, the first method to learn mixed-precision policies by reparameterizing the bitwidths of quantizing schemes. QPL efficiently optimizes both the classification loss and the precision loss of DNNs, and obtains a relatively optimal mixed-precision model within a limited model size and memory footprint. QAG is designed to convert arbitrary architectures into corresponding quantized ones without manual intervention, to facilitate end-to-end neural network quantization. We have implemented AutoQNN and integrated it into Keras. Extensive experiments demonstrate that AutoQNN consistently outperforms state-of-the-art quantization methods. For 2-bit weights and activations of AlexNet and ResNet18, AutoQNN achieves accuracies of 59.75% and 68.86%, respectively, with improvements of up to 1.65% and 1.74% over state-of-the-art methods. Notably, compared with the full-precision AlexNet and ResNet18, the 2-bit models incur accuracy degradation of only 0.26% and 0.76%, respectively, which can fulfill practical application demands.
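As a concrete illustration of the two core ideas described above, the following minimal Python sketch shows (i) a QSS-style differentiable scheme search, where a layer's quantized weights are a softmax-weighted mixture of candidate quantizers, and (ii) a QPL-style precision loss, where soft bitwidths contribute a model-size penalty to the training objective. This is a sketch under stated assumptions, not the paper's implementation: the candidate quantizers (uniform and power-of-two) and all names (quantize_linear, quantize_pow2, qss_mix, qpl_precision_loss, alphas, soft_bitwidths) are illustrative, and they do not correspond to the paper's eight candidate schemes or its Keras API.

```python
# Illustrative sketch only: generic quantizers and hypothetical names,
# not AutoQNN's actual schemes or interfaces.
import numpy as np

def quantize_linear(w, bits):
    """Uniform quantization of w onto 2^bits evenly spaced levels over its range."""
    levels = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    return np.round((w - w_min) / scale) * scale + w_min

def quantize_pow2(w, bits):
    """Power-of-two quantization: snap each magnitude to the nearest 2^k, k <= 0."""
    sign = np.sign(w)
    mag = np.clip(np.abs(w), 1e-8, None)
    exp = np.clip(np.round(np.log2(mag)), -(2 ** (bits - 1)), 0)
    return sign * (2.0 ** exp)

def qss_mix(w, alphas, bits, candidates):
    """DNAS-style relaxation: softmax-weighted mixture of candidate quantizers."""
    probs = np.exp(alphas - alphas.max())
    probs /= probs.sum()
    return sum(p * q(w, bits) for p, q in zip(probs, candidates))

def qpl_precision_loss(soft_bitwidths, param_counts, budget_bits):
    """Penalty on the soft model size (in bits) that exceeds a given budget."""
    model_bits = sum(b * n for b, n in zip(soft_bitwidths, param_counts))
    return max(0.0, model_bits - budget_bits) / budget_bits

# Toy usage: one layer, two candidate schemes, a soft bitwidth of 2.0 bits.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)
alphas = np.array([0.3, -0.1])  # learnable scheme-selection parameters (QSS)
w_q = qss_mix(w, alphas, bits=2, candidates=[quantize_linear, quantize_pow2])
penalty = qpl_precision_loss([2.0], [w.size], budget_bits=2 * w.size)
print(w_q.shape, round(penalty, 4))
```

In a training loop, the mixture output would replace the layer's full-precision weights in the forward pass, and the precision penalty would be added to the classification loss, so that both the scheme-selection parameters and the soft bitwidths can be updated by gradient descent alongside the network weights.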

Electronic Supplementary Material

JCST-2105-11632-Highlights.pdf (146.8 KB)

Journal of Computer Science and Technology
Pages 401–420
Cite this article:
Gong C, Lu Y, Dai S-R, et al. AutoQNN: An End-to-End Framework for Automatically Quantizing Neural Networks. Journal of Computer Science and Technology, 2024, 39(2): 401–420. https://doi.org/10.1007/s11390-022-1632-9

Received: 30 May 2021
Accepted: 26 December 2022
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024