[3]
W. Wang, A. Porterfield, J. Cavazos, and S. Bhalachandra, Using per-loop CPU clock modulation for energy efficiency in OpenMP applications, in Proc. 44th Int. Conference on Parallel Processing, Beijing, China, 2015, pp. 629-638.
[4]
L. Tan, S. L. Song, P. Wu, Z. Chen, R. Ge, and D. J. Kerbyson, Investigating the interplay between energy efficiency and resilience in high performance computing, in Proc. 29th Int. Parallel and Distributed Processing Symposium, Hyderabad, India, 2015, pp. 786-796.
[5]
S. Rivoire, M. A. Shah, P. Ranganathan, and C. Kozyrakis, JouleSort: A balanced energy-efficiency benchmark, in Proc. 26th Int. Special Interest Group on Management of Data, Beijing, China, 2007, pp. 365-376.
[6]
A. Rasmussen, G. Porter, M. Conley, H. V. Madhyastha, R. N. Mysore, A. Pucher, and A. Vahdat, TritonSort: A balanced large-scale sorting system, in Proc. 8th Int. USENIX Conference on Networked Systems Design & Implementation, Boston, MA, USA, 2011, pp. 1-28.
[7]
D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan, FAWN: A fast array of wimpy nodes, in Proc. 22nd Int. ACM Symposium on Operating Systems Principles, Montana, MT, USA, 2009, pp. 1-14.
[8]
A. Tiwari, M. Schulz, and L. Carrington, Predicting optimal power allocation for CPU and DRAM domains, in Proc. 29th Int. Parallel and Distributed Processing Symposium Workshop (IPDPSW), Hyderabad, India, 2015, pp. 951-959.
[10]
R. Ge, X. Feng, Y. He, and P. Zou, The case for cross-component power coordination on power bounded systems, in Proc. 45th Int. Conference on Parallel Processing (ICPP), Philadelphia, PA, USA, 2016, pp. 516-525.
[11]
M. Chen, X. Wang, and X. Li, Coordinating processor and main memory for efficient server power control, in Proc. 25th Int. Conference on Supercomputing (ICS), Arizona, AZ, USA, pp. 130-140.
[12]
Q. Deng, D. Meisner, A. Bhattacharjee, T. F. Wenisch, and R. Bianchini, CoScale: Coordinating CPU and memory system DVFS in server systems, in Proc. 45th Int. Symposium on Microarchitecture (MICRO), Canada, 2012, pp. 143-154.
[15]
C. Bienia, S. Kumar, J. P. Singh, and K. Li, The PARSEC benchmark suite: Characterization and architectural implications, in Proc. 17th Int. Conference on Parallel Architectures and Compilation Techniques, Raleigh, NC, USA, 2008, pp. 72-81.
[16]
P. R. Luszczek, D. H. Bailey, J. J. Dongarra, J. Kepner, R. F. Lucas, R. Rabenseifner, and D. Takahashi, The HPC challenge (HPCC) benchmark suite, in Proc. 19th Int. ACM/IEEE Conference on Supercomputing, Tampa, FL, USA, 2006, p. 213.
[20]
D. Lo and C. Kozyrakis, Dynamic management of turbomode in modern multi-core chips, in Proc. 20th Int. Symposium on High Performance Computer Architecture (HPCA), Florida, FL, USA, 2014, pp. 603-613.
[25]
R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, and X. Zhu, No power struggles: Coordinated multi-level power management for the data center, in Proc. 13th Int. Conference on Architectural Support for Programming Languages and Operating Systems, Seattle, WA, USA, 2008, pp. 48-59.
[28]
B. Rountree, D. K. Lowenthal, B. R. de Supinski, M. Schulz, V. W. Freeh, and T. Bletsch, Adagio: Making DVS practical for complex HPC applications, in Proc. 23rd Int. Conference on Supercomputing, Yorktown Heights, NY, USA, 2009, pp. 460-469.
[29]
S. Bhalachandra, A. Porterfield, S. L. Olivier, and J. F. Prins, An adaptive core-specific runtime for energy efficiency, in Proc. 31st IEEE Int. Parallel and Distributed Processing Symposium, Florida, FL, USA, 2017, pp. 947-956.
[30]
A. Marathe, P. E. Bailey, D. K. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, A run-time system for power-constrained HPC applications, in Proc. 31st Int. High Performance Computing, Bengaluru, India, 2015, pp. 394-408.
[31]
I. Stamelakos, S. Xydis, G. Palermo, and C. Silvano, Variation-aware voltage island formation for power efficient near-threshold manycore architectures, in Proc. 19th Int. Asia and South Pacific Design Automation Conference, Singapore, 2014, pp. 304-310.
[32]
U. R. Karpuzcu, A. Sinkar, N. S. Kim, and J. Torrellas, EnergySmart: Toward energy-efficient manycores for near-threshold computing, in Proc. 19th Int. Symposium on High Performance Computer Architecture, Shenzhen, China, 2013, pp. 542-553.
[33]
R. Begum, D. Werner, M. Hempstead, G. Prasad, and G. Challen, Energy-performance trade-offs on energy-constrained devices with multi-component DVFS, in Proc. 10th Int. Symposium on Workload Characterization, Georgia, GA, USA, 2015, pp. 34-43.
[35]
Q. Liu, M. Moreto, J. Abella, F. J. Cazorla, and M. Valero, Dream: Per-task DRAM energy metering in multicore systems, in Proc. 20th Int. European Conference on Parallel Processing, Porto, Portugal, 2014, pp. 111-123.
[37]
P. Zou, T. Allen, C. H. Davis IV, X. Feng, and R. Ge, CLIP: Cluster-level intelligent power coordination for power-bounded systems, in Proc. 20th Int. Conference on Cluster Computing, Hawaii, HI, USA, 2017, pp. 541-551.
[38]
R. Ge, P. Zou, and X. Feng, Application-aware power coordination on power bounded NUMA multicore systems, in Proc. 46th Int. Conference on Parallel Processing, Bristol, UK, 2017, pp. 591-600.
[39]
B. Acun and L. V. Kale, Mitigating processor variation through dynamic load balancing, in Proc. 30th Int. Parallel and Distributed Processing Symposium Workshops, Chicago, IL, USA, 2016, pp. 1073-1076.
[40]
T. Patki, D. K. Lowenthal, B. Rountree, M. Schulz, and B. R. de Supinski, Exploring hardware overprovisioning in power-constrained, high performance computing, in Proc. 27th Int. Conference on Supercomputing, Eugene, OR, USA, 2013, pp. 173-182.