AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (5.3 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

More Bang for Your Buck: Boosting Performance with Capped Power Consumption

Juan Chen( )Xinxin QiFeihao WuJianbin FangYong DongYuan YuanZheng WangKeqin Li
College of Computer, National University of Defense Technology, Changsha 410073, China.
College of Computer, University of Leeds, London LS2 9JT, UK.
School of Science and Engineering, State University of New York, New York, NY 12561, USA.
Show Author Information

Abstract

Achieving faster performance without increasing power and energy consumption for computing systems is an outstanding challenge. This paper develops a novel resource allocation scheme for memory-bound applications running on High-Performance Computing (HPC) clusters, aiming to improve application performance without breaching peak power constraints and total energy consumption. Our scheme estimates how the number of processor cores and CPU frequency setting affects the application performance. It then uses the estimate to provide additional compute nodes to memory-bound applications if it is profitable to do so. We implement and apply our algorithm to 12 representative benchmarks from the NAS parallel benchmark and HPC Challenge (HPCC) benchmark suites and evaluate it on a representative HPC cluster. Experimental results show that our approach can effectively mitigate memory contention to improve application performance, and it achieves this without significantly increasing the peak power and overall energy consumption. Our approach obtains on average 12.69% performance improvement over the default resource allocation strategy, but uses 7.06% less total power, which translates into 17.77% energy savings.

References

[1]
R. H. Dennard, F. H. Gaensslen, H. N. Yu, V. L. Rideout, E. Bassous, and A. R. LeBlanc, Design of ion-implanted MOSFET’s with very small physical dimensions, IEEE Journal of Solid-State Circuits, vol. 9, no. 5, pp. 256-268, 1974.
[2]
M. Bohr, A 30 year retrospective on Dennard’s MOSFET scaling paper, IEEE Solid-State Circuits Society Newsletter, vol. 12, no. 1, pp. 11-13, 2007.
[3]
R. Kumar, K. I. Farkas, N. P. Jouppi, P. Ranganathan, and D. M. Tullsen, Single-ISA heterogeneous multi-core architectures: The potential for processor power reduction, in Proc. 36th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO 36), San Diego, CA, USA, 2003, pp. 81-92.
[4]
R. Kumar, V. Zyuban, and D. M. Tullsen, Interconnections in multi-core architectures: Understanding mechanisms, overheads and scaling, ACM SIGARCH Computer Architecture News, vol. 33, no. 2, pp. 408-419, 2005.
[5]
R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, Heterogeneous chip multiprocessors, Computer, vol. 38, no. 11, pp. 32-38, 2005.
[6]
T. Heath, B. Diniz, E. V. Carrera, W. Meira, and R. Bianchini, Energy conservation in heterogeneous server clusters, in Proc. 10th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, Chicago, IL, USA, 2005, pp. 186-195.
[7]
Y. M. Li, K. Skadron, D. Brooks, and Z. G. Hu, Performance, energy, and thermal considerations for SMT and CMP architectures, in Proc. 11th Int. Symp. High-Performance Computer Architecture, San Francisco, CA, USA, 2005, pp. 71-82.
[8]
A. Lukefahr, S. Padmanabha, R. Das, F. M. Sleiman, R. Dreslinski, T. F. Wenisch, and S. Mahlke, Composite cores: Pushing heterogeneity into a core, in 2012 45th Annu. IEEE/ACM Int. Symp. Microarchitecture, Vancouver, Canada, 2012, pp. 317-328
[9]
T. S. Muthukaruppan, M. Pricopi, V. Venkataramani, T. Mitra, and S. Vishin, Hierarchical power management for asymmetric multi-core in dark silicon era, in 2013 50th ACM/EDAC/IEEE Design Automation Conf. (DAC), Austin, TX, USA, 2013, pp. 1-9.
[10]
J. Meng, K. Kawakami, and A. K. Coskun, Optimizing energy efficiency of 3-D multicore systems with stacked DRAM under power and thermal constraints, in Proc. 49th Annu. Design Automation Conf., San Francisco, CA, USA, 2012, pp. 648-655.
[11]
T. Cao, S. M. Blackburn, T. J. Gao, and K. S. McKinley, The Yin and Yang of power and performance for asymmetric hardware and managed software, in 2012 39th Annu. Int. Symp. Computer Architecture (ISCA), Portland, OR, USA, 2012, pp. 225-236.
[12]
N. Gholkar, F. Mueller, and B. Rountree, Power tuning HPC jobs on power-constrained systems, in Proc. 2016 Int. Conf. Parallel Architectures and Compilation, Haifa, Israel, 2016, pp. 179-191.
[13]
T. Patki, D. K. Lowenthal, A. Sasidharan, M. Maiterth, B. L. Rountree, M. Schulz, and B. R. de Supinski, Practical resource management in power-constrained, high performance computing, in Proc. 24th Int. Symp. High-Performance Parallel and Distributed Computing, Portland, OR, USA, 2015, pp. 121-132.
[14]
C. Isci, A. Buyuktosunoglu, C. Y. Cher, P. Bose, and M. Martonosi, An analysis of efficient multi-core global power management policies: Maximizing performance for a given power budget, in 2006 39th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO’06), Orlando, FL, USA, 2006, pp. 347-358.
[15]
S. Pagani, J. J. Chen, and M. M. Li, Energy efficiency on multi-core architectures with multiple voltage islands, IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 6, pp. 1608-1621, 2015.
[16]
S. W. Williams, A. Waterman, and D. A. Patterson, Roofline: An insightful visual performance model for multicore architectures, Communications of the ACM, vol. 52, no. 4, pp. 65-76, 2009.
[17]
K. Asanovic, R. Bodik, B. C. Catanzaro, J. J. Gebis, P. Husbands, K. Keutzer, D. A. Patterson, W. L. Plishker, J. Shalf, S. W. Williams, and K. A. Yelick, The Landscape of Parallel Computing Research: A View from Berkeley, Electrical Engineering and Computer Sciences, Tech. Rep. UCB/EECS-2006-183, University of California at Berkeley, Berkeley, CA, USA, 2006.
[18]
P. R. Luszczek, D. H. Bailey, J. J. Dongarra, J. Kepner, R. F. Lucas, R. Rabenseifner, and D. Takahashi, The HPC Challenge (HPCC) benchmark suite, in Proc. 2006 ACM/IEEE Conf. Supercomputing (SC’06), Tampa, FL, USA, 2006, p. 213.
[20]
M. Dimitrov, Intel® Power Governor, https://software.intel.com/en-us/articles/intel-power-governor, 2012.
[21]
V. Viswanathan, Intel® Memory Latency Checker v3.8, https://software.intel.com/en-us/articles/intelr-memory-latency-checker, 2013.
[22]
B. Rountree, D. K. Lowenthal, B. R. de Supinski, M. Schulz, V. W. Freeh, and T. Bletsch, Adagio: Making DVS practical for complex HPC applications, in Proc. 23rd Int. Conf. Supercomputing, New York, NY, USA, 2009, pp. 460-469.
[23]
W. Wang, A. Porterfield, J. Cavazos, and S. Bhalachandra, Using per-loop CPU clock modulation for energy efficiency in OpenMP applications, presented at the 2015 44th Int. Conf. Parallel Processing, Beijing, China, 2015, pp. 629-638.
[24]
S. Bhalachandra, A. Porterfield, S. L. Olivier, and J. F. Prins, An adaptive core-specific runtime for energy efficiency, peesented at 2017 IEEE Int. Parallel and Distributed Processing Symp. (IPDPS), Orlando, FL, USA, 2017, pp. 947-956.
[25]
I. Stamelakos, S. Xydis, G. Palermo, and C. Silvano, Variation-aware voltage island formation for power efficient near-threshold manycore architectures, presented at the 2014 19th Asia and South Pacific Design Automation Conf. (ASP-DAC), Singapore, 2014, pp. 304-310.
[26]
U. R. Karpuzcu, A. Sinkar, N. S. Kim, and J. Torrellas, EnergySmart: Toward energy-efficient manycores for near-threshold computing, presented at 2013 IEEE 19th Int. Symp. High Performance Computer Architecture (HPCA), Shenzhen, China, 2013, pp. 542-553.
[27]
R. Begum, D. Werner, M. Hempstead, G. Prasad, and G. Challen, Energy-performance trade-offs on energy-constrained devices with multi-component DVFS, presented at 2015 IEEE Int. Symp. Workload Characterization, Atlanta, GA, USA, 2015, pp. 34-43.
[28]
Q. X. Liu, M. Moreto, J. Abella, F. J. Cazorla, and M. Valero, DReAM: An approach to estimate per-task DRAM energy in multicore systems, ACM Transactions on Design Automation of Electronic Systems, vol. 22, no. 1, p. 16, 2016.
[29]
A. Tiwari, M. Schulz, and L. Carrington, Predicting optimal power allocation for CPU and DRAM domains, in 2015 IEEE Int. Parallel and Distributed Processing Symp. Workshop, Hyderabad, India, 2015, pp. 951-959.
[30]
H. Z. Zhang and H. Hoffmann, Maximizing performance under a power cap: A comparison of hardware, software, and hybrid techniques, in Proc. 21st Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS’16), Atlanta, GA, USA, 2016, pp. 545-559.
[31]
P. F. Zou, T. Allen, C. H. Davis, X. Z. Feng, and R. Ge, CLIP: Cluster-level intelligent power coordination for power-bounded systems, presented at the 2017 IEEE Int. Conf. Cluster Computing (CLUSTER), Honolulu, HI, USA, 2017, pp. 541-551.
[32]
T. Patki, D. K. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, Exploring hardware overprovisioning in power-constrained, high performance computing, in Proc. 27th Int. ACM Conf. Int. Conf. Supercomputing (ICS’13 ), Eugene, OR, USA, 2013, pp. 173-182.
[33]
D. Lo and C. Kozyrakis, Dynamic management of TurboMode in modern multi-core chips, presented at 2014 IEEE 20th Int. Symp. High Performance Computer Architecture (HPCA), Orlando, FL, USA, 2014, pp. 603-613.
[34]
H. B. Jang, J. Lee, J. Kong, T. Suh, and S. W. Chung, Leveraging process variation for performance and energy: In the perspective of overclocking, IEEE Transactions on Computers, vol. 63, no. 5, pp. 1316-1322, 2014.
Tsinghua Science and Technology
Pages 370-383
Cite this article:
Chen J, Qi X, Wu F, et al. More Bang for Your Buck: Boosting Performance with Capped Power Consumption. Tsinghua Science and Technology, 2021, 26(3): 370-383. https://doi.org/10.26599/TST.2020.9010012

834

Views

18

Downloads

11

Crossref

N/A

Web of Science

10

Scopus

1

CSCD

Altmetrics

Received: 25 March 2020
Accepted: 02 April 2020
Published: 12 October 2020
© The author(s) 2021.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return