Survey

A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications

College of Science, Beijing Forestry University, Beijing 100091, China
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Abstract

Realizing high-performance, energy-efficient circuit systems is a critical task for circuit designers. Research has traditionally concentrated on the tradeoff between energy and performance in circuit and system design under exact computing. However, as video/image processing and machine learning algorithms become widespread, approximate computing has become a hot topic in these applications. The errors introduced by approximate computing can be tolerated by such applications through specific processing or algorithms, and large improvements in performance or power savings can be achieved at an acceptable loss in final output quality. This paper presents a survey of approximate computing, from the design of arithmetic units to high-level applications, aiming to give researchers a comprehensive and insightful understanding of the field. We believe that approximate computing will play an important role in future circuit and system design, especially given the rapid development of artificial intelligence algorithms and their related applications.
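To make the accuracy-for-energy tradeoff concrete, below is a minimal Python sketch (not taken from the paper) of a lower-part OR adder (LOA), one classic approximate-adder design: the low k bits are approximated with a carry-free bitwise OR, and only the upper bits use exact addition, shortening the carry chain. The function name and parameters here are illustrative.

```python
def loa_add(a: int, b: int, k: int, width: int = 16) -> int:
    """Lower-part OR Adder (LOA)-style approximate addition.

    The low k bits are computed with a bitwise OR (no carry
    propagation), so any carry out of the low part is dropped;
    only the upper (width - k) bits use an exact adder.
    """
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)       # approximate low part: OR instead of add
    high = ((a >> k) + (b >> k)) << k   # exact high part; low-part carry-in ignored
    return (high | low) & ((1 << width) - 1)

# When no carry crosses the low/high boundary, the result is exact:
print(loa_add(1000, 2000, k=4))  # prints 3000, same as exact addition
# When a low-part carry is dropped, a small error appears:
print(loa_add(7, 9, k=4))        # prints 15 instead of the exact 16
```

The error is bounded by the width of the approximated low part (here at most 2^k - 1), which is exactly the kind of quantifiable, application-tolerable error that the surveyed designs trade against energy and delay.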

Electronic Supplementary Material

JCST-2205-12537-Highlights.pdf (148.1 KB)

Journal of Computer Science and Technology
Pages 251-272
Cite this article:
Que H-H, Jin Y, Wang T, et al. A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications. Journal of Computer Science and Technology, 2023, 38(2): 251-272. https://doi.org/10.1007/s11390-023-2537-y


Received: 28 May 2022
Accepted: 16 March 2023
Published: 30 March 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023