Survey

A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications

College of Science, Beijing Forestry University, Beijing 100091, China
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Abstract

Realizing a high-performance and energy-efficient circuit system is one of the critical tasks for circuit designers. Research has traditionally concentrated on the tradeoffs between energy and performance in circuit and system design based on accurate computing. However, as video/image processing and machine learning algorithms have become widespread, applying approximate computing to these applications has become an active research topic. The errors introduced by approximate computing can be tolerated by these applications through specific processing or algorithms, so large performance improvements or power savings can be achieved at the cost of an acceptable loss in final output quality. This paper presents a survey of approximate computing from arithmetic unit design to high-level applications, aiming to give researchers a comprehensive and insightful understanding of the field. We believe that approximate computing will play an important role in future circuit and system design, especially with the rapid development of artificial intelligence algorithms and their related applications.

Electronic Supplementary Material

JCST-2205-12537-Highlights.pdf (148.1 KB)

Journal of Computer Science and Technology
Pages 251-272
Cite this article:
Que H-H, Jin Y, Wang T, et al. A Survey of Approximate Computing: From Arithmetic Units Design to High-Level Applications. Journal of Computer Science and Technology, 2023, 38(2): 251-272. https://doi.org/10.1007/s11390-023-2537-y

Received: 28 May 2022
Accepted: 16 March 2023
Published: 30 March 2023
© Institute of Computing Technology, Chinese Academy of Sciences 2023