More Bang for Your Buck: Boosting Performance with Capped Power Consumption

Juan Chen; Xinxin Qi; Feihao Wu; Jianbin Fang; Yong Dong; Yuan Yuan; Zheng Wang; Keqin Li

doi:10.26599/TST.2020.9010012

Open Access

College of Computer, National University of Defense Technology, Changsha 410073, China.

College of Computer, University of Leeds, Leeds LS2 9JT, UK.

School of Science and Engineering, State University of New York, New Paltz, NY 12561, USA.

Abstract

Achieving faster performance without increasing the power and energy consumption of computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints or increasing total energy consumption. Our scheme estimates how the number of processor cores and the CPU frequency setting affect application performance. It then uses the estimate to provide additional compute nodes to memory-bound applications if it is profitable to do so. We implement our algorithm and apply it to 12 representative benchmarks from the NAS parallel benchmark and HPC Challenge (HPCC) benchmark suites, and evaluate it on a representative HPC cluster. Experimental results show that our approach can effectively mitigate memory contention to improve application performance, and that it achieves this without significantly increasing the peak power and overall energy consumption. On average, our approach obtains a 12.69% performance improvement over the default resource allocation strategy while using 7.06% less total power, which translates into 17.77% energy savings.
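As a rough sanity check on how the reported averages relate, the sketch below combines a power reduction and a runtime change into an energy saving using energy = average power x runtime. It is only an illustration, not the authors' evaluation method: the helper name `energy_saving` and the two readings of "performance improvement" are assumptions, and because the abstract reports averages over 12 benchmarks, the combined figure need not reproduce 17.77% exactly.

```python
def energy_saving(power_reduction: float, time_ratio: float) -> float:
    """Fractional energy saving when average power drops by `power_reduction`
    and runtime is scaled by `time_ratio` (new runtime / old runtime).

    Uses energy = average power * runtime.
    """
    return 1.0 - (1.0 - power_reduction) * time_ratio


# Averages quoted in the abstract.
POWER_REDUCTION = 0.0706  # 7.06% less total power
PERF_GAIN = 0.1269        # 12.69% performance improvement
REPORTED_SAVING = 0.1777  # 17.77% energy savings

# Reading A (assumption): runtime shrinks by 12.69%.
saving_a = energy_saving(POWER_REDUCTION, 1.0 - PERF_GAIN)

# Reading B (assumption): throughput rises by 12.69%, i.e. a 1.1269x speedup.
saving_b = energy_saving(POWER_REDUCTION, 1.0 / (1.0 + PERF_GAIN))

print(f"reading A: {saving_a:.2%}, reading B: {saving_b:.2%}, "
      f"reported: {REPORTED_SAVING:.2%}")
```

Under these assumptions the two readings give roughly 18.9% and 17.5%, which bracket the reported 17.77%. The small gap is expected when per-benchmark results are averaged before being combined.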

Keywords

energy efficiency; high-performance computing; performance boost; power control; processor frequency scaling

Tsinghua Science and Technology

Volume 26, Issue 3, June 2021

Pages 370-383

DOI: 10.26599/TST.2020.9010012

Cite this article:

Chen J, Qi X, Wu F, et al. More Bang for Your Buck: Boosting Performance with Capped Power Consumption. Tsinghua Science and Technology, 2021, 26(3): 370-383. https://doi.org/10.26599/TST.2020.9010012