Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-Grain Execution Models

Chen Chen; Souad Koliai; Guang Gao

doi:10.1109/TST.2013.6678909

AI Chat Paper

Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.

Chat more with AI

| Sign up

Browse by Subject

Search for peer-reviewed journals with full access.

Journals A - Z

About Us

Discover the SciOpen Platform and Achieve Your Research Goals with Ease.

About Us

Publish with Us

Support

Journals A - Z

About Us

Publish with Us

Support

PDF (1.3 MB)

Cite

EndNote(RIS) BibTeX

Collect

Submit Manuscript

AI Chat Paper

Show Outline

Outline

Show full outline

Hide outline

Outline

Show full outline

Hide outline

Open Access

Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-Grain Execution Models

Chen Chen(

), Souad Koliai, Guang Gao

Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19711, USA

Show Author Information

Abstract

In the upcoming exa-scale era, the exploitation of data locality in parallel programs is very important because it benefits both program performance and energy efficiency. However, this is a hard topic for graph algorithms such as the Breadth First Search (BFS) due to the irregular data access patterns. This study analyzes the exploitation of data locality in the BFS and its impact on the energy efficiency with the Codelet fine-grain dataflow-inspired execution model. The Codelet Model more efficiently exploits data locality than the OpenMP-like execution models which traditionally focus on coarse-grain parallelism inside loops. A BFS algorithm is then given to exploit the locality between two loop iterations that belong to two different loops (inter-loop locality). This kind of locality can be exploited by the Codelet Model but not by traditional coarse-grain execution models like OpenMP. Tests were performed on fsim which is a simulation platform developed by Intel for the Ubiquitous High Performance Computing (UHPC) project to design future exa-scale architectures. The results show that this BFS algorithm saves up to 7% of the dynamic energy for memory accesses compared to a BFS implementation based on OpenMP loop scheduling.

Keywords

breadth first search locality fine grain execution model

References

[1]

P. Kogge, K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, K. Hill, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, R. S. Williams, and K. Yelick, ExaScale computing study: Technology challenges in achieving exaScale systems, DARPA IPTO, Air Force Research Labs, Tech. Rep., Sep. 2008.

Crossref

[2]

J. Torrellas, Architectures for extreme-scale computing, Computer, vol. 42, no. 11, pp. 28-35, 2009.

Crossref Google Scholar

[3]

IBM Cell Broadband Engine, http://www-01.ibm.com/chips/techlib/techlib.nsf/products/Cell_Broadband_Engine, 2013.

[4]

J. Cuvillo, W. Zhu, Z. Hu, and G. R. Gao, Tiny threads: A thread virtual machine for the cyclops64 cellular architecture, presented at the Fifth Workshop on Massively Parallel Processing, in Conjuction with 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), Denver, CO, USA, 2005.

Crossref

[5]

R. Knauerhase, R. Cledat, and J. Teller, For extreme parallelism, your os is sooooo last-millennium, in Proc. of the 4th USENIX Conference on Hot Topics in Parallelism, Berkeley, CA, USA: USENIX Association, 2012, pp. 3.

[6]

N. Satish, C. Kim, J. Chhugani, and P. Dubey, Large-scale energy-efficient graph traversal: A path to efficient data-intensive supercomputing, in Proc. of the International Conference on High Performance Computing, Networking, Storage and Analysis, Los Alamitos, CA, USA, 2012, pp. 14:1-14:11.

Crossref

[7]

The graph 500 list, http://www.graph500.org/, 2012.

Crossref

[8]

D. P. Scarpazza, O. Villa, and F. Petrini, Efficient breadth-first search on the cell/be processor, IEEE Trans. Parallel Distrib. Syst., vol. 19, no. 10, pp. 1381-1395, 2008.

Crossref Google Scholar

[9]

S. Zuckerman, J. Suetterlein, R. Knauerhase, and G. R. Gao, Using a “codelet” program execution model for exascale machines: Position paper, in Proc. of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era, New York, NY, USA, 2011, pp. 64-69.

Crossref

[10]

DARPA-BAA-10-37, UHPC: Ubiquitous high performance computing, http://www.darpa.mil/Our_Work/MTO/Programs/Ubiquitous_High_Performance_Computing_(UHPC).aspx, 2012.

[11]

A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, Measurement and analysis of online social networks, in Proc. of the 7th ACM SIGCOMM Conference on Internet Measurement, New York, NY, USA, 2007, pp. 29-42.

Crossref

[12]

A. Sud, E. Andersen, S. Curtis, M. Lin, and D. Manocha, Real-time path planning for virtual agents in dynamic environments, in ACM SIGGRAPH 2008 Classes, New York, NY, USA, 2008, pp. 55:1-55:9.

Crossref

[13]

B. Awerbuch and R. Gallager, A new distributed algorithm to find breadth first search trees, Information Theory, IEEE Transactions on, vol. 33, no. 3, pp. 315-322, May 1987.

Crossref Google Scholar

[14]

G. Y. Ananth, V. Kumar, and P. Pardalos, Parallel processing of discrete optimization problems, in In Encyclopedia of Microcomputers, Marcel Dekker Inc, 1993, pp. 129-147.

[15]

A. Yoo, E. Chow, K. Henderson, W. McLendon, B. Hendrickson, and U. Catalyurek, A scalable distributed parallel breadth-first search algorithm on bluegene, in Proc. of the 2005 ACM/IEEE Conference on Supercomputing, Washington, DC, USA, 2005, pp. 25.

Crossref

[16]

D. A. Bader and K. Madduri, Designing multithreaded algorithms for breadth-first search and st-connectivity on the cray mta-2, in Proc. of the 2006 International Conference on Parallel Processing, Washington, DC, USA, 2006, pp. 523-530.

[17]

T. St. John, J. B. Dennis, and G. R. Gao, Massively parallel breadth first search using a tree-structured memory model, in Proc. of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, New York, NY, USA, 2012, pp. 115-123.

Crossref

[18]

V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader, Scalable graph exploration on multicore processors, in Proc. of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, Washington, DC, USA, 2010, pp. 1-11.

Crossref

[19]

J. Chhugani, N. Satish, C. Kim, J. Sewall, and P. Dubey, Fast and efficient graph traversal algorithm for cpus: Maximizing single-node efficiency, in Parallel Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International, 2012, pp. 378-389. .

Crossref

[20]

A. Buluç and K. Madduri, Parallel breadth-first search on distributed memory systems, in Proc. of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, New York, NY, USA, 2011, pp. 65:1-65:12.

[21]

K. Ueno and T. Suzumura, Highly scalable graph search for the graph500 benchmark, in Proc. of the 21st International Symposium on High-Performance Parallel and Distributed Computing, New York, NY, USA, 2012, pp. 149-160.

Crossref

Tsinghua Science and Technology

Volume 18 Issue 6,
December 2013

Pages 636-646

DOI: 10.1109/TST.2013.6678909

Cite this article:

Chen C, Koliai S, Gao G. Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-Grain Execution Models. Tsinghua Science and Technology, 2013, 18(6): 636-646. https://doi.org/10.1109/TST.2013.6678909

485

Views

Downloads

Crossref

N/A

Web of Science

Scopus

CSCD

Google Scholar
Citation

Altmetrics

Received: 18 January 2013

Revised: 02 April 2013

Accepted: 20 April 2013

Published: 06 December 2013