A Holistic Energy-Efficient Approach for a Processor-Memory System

Feihao Wu; Juan Chen; Yong Dong; Wenxu Zheng; Xiaodong Pan; Yuan Yuan; Zhixin Ou; Yuyang Sun

doi:10.26599/TST.2018.9020104

Open Access

College of Computer, National University of Defense Technology, Changsha 410073, China.

Abstract

Component overclocking, which includes processor overclocking and memory overclocking, is an effective way to speed up the components of a system and thereby achieve higher program performance. However, overclocking unavoidably increases power consumption. Our goal is to improve the performance of scientific computing applications as much as possible without increasing the total power consumption of a processor-memory system. We built a processor-memory energy efficiency model for multicore-based systems that coordinates the performance and power of the processor and memory. Depending on the scientific application, the model exploits performance boost opportunities for a processor-memory system by adopting processor overclocking, processor Dynamic Voltage and Frequency Scaling (DVFS), memory active ratio adjustment, and memory overclocking. The model also provides a total power control method based on the same four factors. We propose a processor and memory Coordination-based holistic Energy-Efficient (CEE) algorithm, which improves performance without increasing the total power consumption. Experimental results show an average performance improvement of 9.3% across all 14 benchmarks, with no increase in total power consumption. The maximum performance improvement, 13.1%, was obtained on the dedup benchmark. Our experiments validate the effectiveness of our holistic energy-efficient model and technology.

Keywords

processor overclocking; memory overclocking; performance boost; total power control; energy efficiency

Tsinghua Science and Technology

Volume 24, Issue 4, August 2019

Pages 468-483

DOI: 10.26599/TST.2018.9020104

Cite this article:

Wu F, Chen J, Dong Y, et al. A Holistic Energy-Efficient Approach for a Processor-Memory System. Tsinghua Science and Technology, 2019, 24(4): 468-483. https://doi.org/10.26599/TST.2018.9020104

Received: 17 May 2018

Accepted: 15 June 2018

Published: 07 March 2019