AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
Article Link
Collect
Submit Manuscript
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Survey

A Survey on Graph Processing Accelerators: Challenges and Opportunities

National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
School of Computing, National University of Singapore, Singapore 117418, Singapore
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Show Author Information

Abstract

Graph is a well known data structure to represent the associated relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems for improving the performance and/or energy efficiency on traditional architectures, dedicated hardware solutions, also referred to as graph processing accelerators, are essential and emerging to provide the benefits significantly beyond what those pure software solutions can offer. In this paper, we conduct a systematical survey regarding the design and implementation of graph processing accelerators. Specifically, we review the relevant techniques in three core components toward a graph processing accelerator: preprocessing, parallel graph computation, and runtime scheduling. We also examine the benchmarks and results in existing studies for evaluating a graph processing accelerator. Interestingly, we find that there is not an absolute winner for all three aspects in graph acceleration due to the diverse characteristics of graph processing and the complexity of hardware configurations. We finally present and discuss several challenges in details, and further explore the opportunities for the future research.

Electronic Supplementary Material

Download File(s)
jcst-34-2-339-Highlights.pdf (667.8 KB)

References

[1]
Malewicz G, Austern M H, Bik A J, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. ACM SIGMOD Int. Conf. Management of Data, June 2010, pp.135-146.
[2]

Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein J M. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment, 2012, 5(8): 716-727.

[3]
Shun J, Blelloch G E. Ligra: A lightweight graph processing framework for shared memory. In Proc. the 18th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2013, pp.135-146.
[4]
Kyrola A, Blelloch G E, Guestrin C. GraphChi: Large-scale graph computation on just a PC. In Proc. the 10th USENIX Conf. Operating Systems Design and Implementation, October 2012, pp.31-46.
[5]
Roy A, Mihailovic I, Zwaenepoel W. X-Stream: Edge-centric graph processing using streaming partitions. In Proc. the 24th ACM SIGOPS Symp. Operating Systems Principles, November 2013, pp.472-488.
[6]

Zhong J, He B. Medusa: A parallel graph processing system on graphics processors. ACM SIGMOD Record, 2014, 43(2): 35-40.

[7]
Khorasani F, Vora K, Gupta R, Bhuyan L N. CuSha: Vertex-centric graph processing on GPUs. In Proc. the 23rd Int. Symp. High-Performance Parallel and Distributed Computing, June 2014, pp.239-252.
[8]
Wang Y, Davidson A, Pan Y, Wu Y, Riffel A, Owens J D. Gunrock: A high-performance graph processing library on the GPU. In Proc. the 21st ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, March 2016, Article No. 11.
[9]

Shi X, Luo X, Liang J, Zhao P, Di S, He B, Jin H. Frog: Asynchronous graph processing on GPU with hybrid coloring model. IEEE Trans. Knowledge and Data Engineering, 2018, 30(1): 29-42.

[10]
Fu Z, Personick M, Thompson B. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proc. the 2nd International Workshop on Graph Data Management Experiences and Systems, June 2014, Article No. 2.
[11]
Liu H, Huang H H. Enterprise: Breadth-first graph traversal on GPUs. In Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2015, Article No. 68.
[12]
Beamer S, Asanovic K, Patterson D. Locality exists in graph processing: Workload characterization on an ivy bridge server. In Proc. IEEE Int. Symp. Workload Characterization, November 2015, pp.56-65.
[13]
Malicevic J, Lepers B, Zwaenepoel W. Everything you always wanted to know about multicore graph processing but were afraid to ask. In Proc. the 2017 USENIX Annual Technical Conf., July 2017, pp.631-643.
[14]
Nai L, Hadidi R, Sim J, Kim H, Kumar P, Kim H. Graph-PIM: Enabling instruction-level PIM offloading in graph computing frameworks. In Proc. the 2007 IEEE Int. Symp. High Performance Computer Architecture, February 2017, pp.457-468.
[15]
Yao P, Zheng L, Liao X, Jin H, He B. An efficient graph accelerator with parallel data conflict management. In Proc. Int. Conf. Parallel Architectures and Compilation Techniques, November 2018, Article No. 8.
[16]
Ham T J, Wu L, Sundaram N, Satish N, Martonosi M. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, October 2016, Article No. 56.
[17]
Nai L, Xia Y, Tanase I G, Kim H, Lin C Y. GraphBIG: Understanding graph computing in the context of industrial solutions. In Proc. Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2015, Article No. 69.
[18]
Satish N, Sundaram N, Patwary M M, Seo J, Park J, Hassaan M A, Sengupta S, Yin Z, Dubey P. Navigating the maze of graph analytics frameworks using massive graph datasets. In Proc. ACM SIGMOD Int. Conf. Management of Data, June 2014, pp.979-990.
[19]
Ben-Nun T, Sutton M, Pai S, Pingali K. Groute: An asynchronous multi-GPU programming model for irregular computations. In Proc. the 22nd ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2017, pp.235-248.
[20]
Hennessy J, Patterson D. Domain specific architectures. In Computer Architecture: A Quantitative Approach (6th edition), Merken S, McFadden N (eds.), Elsevier, 2017, pp.540-606.
[21]
Ceze L, Hill M D, Sankaralingam K, Wenisch T F. Democratizing design for future computing platforms. arXiv: 1706.08597, 2017. http://arxiv.org/abs/1706.08597, Jun. 2017.
[22]

Lee Y, Waterman A, Cook H et al. An agile approach to building RISC-V microprocessors. IEEE Micro, 2016, 36(2): 8-20.

[23]
Caulfield A M, Chung E S, Putnam A et al. A cloud-scale acceleration architecture. In Proc. the 49th Annual IEEE/ACM Int. Symp. Microarchitecture, October 2016, Article No. 7.
[24]
de Lorimier M, Kapre N, Mehta N et al. GraphStep: A system architecture for sparse-graph algorithms. In Proc. the 14th Annual IEEE Symp. Field-Programmable Custom Computing Machines, April 2006, pp.143-151.
[25]
Attia O G, Johnson T, Townsend K, Jones P, Zambreno J. CyGraph: A reconfigurable architecture for parallel breadth-first search. In Proc. the 2004 Int. Parallel and Distributed Processing Symp. Workshops, May 2014, pp.228-235.
[26]
Dai G, Chi Y, Wang Y, Yang H. FPGP: Graph processing framework on FPGA a case study of breadth-first search. In Proc. the 2006 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2016, pp.105-110.
[27]
Zhou S, Chelmis C, Prasanna V K. High-throughput and energy-efficient graph processing on FPGA. In Proc. the 24th IEEE Annual Int. Symp. Field-Programmable Custom Computing Machines, May 2016, pp.103-110.
[28]
Dai G, Huang T, Chi Y, Xu N, Wang Y, Yang H. Fore-Graph: Exploring large-scale graph processing on multi-FPGA architecture. In Proc. the 2017 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2017, pp.217-226.
[29]
Ozdal M M, Yesil S, Kim T, Ayupov A, Greth J, Burns S, Özturk Ö. Energy efficient architecture for graph analytics accelerators. In Proc. the 43rd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2016, pp.166-177.
[30]
Zhou J, Liu S, Guo Q, Zhou X, Zhi T, Liu D, Wang C, Zhou X, Chen Y, Chen T. TuNao: A high-performance and energy-efficient reconfigurable accelerator for graph processing. In Proc. the 17th IEEE/ACM Int. Symp. Cluster, Cloud and Grid Computing, May 2017, pp.731-734.
[31]

Ayupov A, Yesil S, Ozdal M M, Kim T, Burns S, Özturk Ö. A template-based design methodology for graph-parallel hardware accelerators. IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, 2018, 37(2): 420-430.

[32]
Ahn J, Hong S, Yoo S, Mutlu O, Choi K. A scalable processing-in-memory accelerator for parallel graph processing. In Proc. the 42nd ACM/IEEE Annual Int. Symp. Computer Architecture, June 2015, pp.105-117.
[33]
Pawlowski J T. Hybrid memory cube (HMC). In Proc. the 23rd IEEE Hot Chips Symp., August 2011, Article No. 15.
[34]
Kim J, Kim Y. HBM: Memory solution for bandwidth-hungry processors. In Proc. the 26th IEEE Hot Chips Symp., August 2014, Article No. 19.
[35]

Wong H S, Lee H Y, Yu S, Chen Y S, Wu Y, Chen P S, Lee B, Chen F T, Tsai M J. Metal-oxide RRAM. Proceedings of the IEEE, 2012, 100(6): 1951-1970.

[36]
Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: Bringing order to the web. Technical Report, Stanford InfoLab, 1999. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf, Jan. 2019.
[37]

McCune R R, Weninger T, Madey G. Thinking like a vertex: A survey of vertex-centric frameworks for large-scale distributed graph processing. ACM Trans. Computing Surveys, 2015, 48(2): Article No. 25.

[38]

Shi X, Zheng Z, Zhou Y, Jin H, He L, Liu B, Hua Q. Graph processing on GPUs: A survey. ACM Trans. Computing Surveys, 2018, 50(6): Article No. 81.

[39]

Heidari S, Simmhan Y, Calheiros R N, Buyya R. Scalable graph processing frameworks: A taxonomy and open challenges. ACM Trans. Computing Surveys, 2018, 51(3): Article No. 60.

[40]
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. Power-Graph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symp. Operating Systems Design and Implementation, October 2012, pp.17-30.
[41]
Avery C. Giraph: Large-scale graph processing infrastructure on Hadoop. In Proc. the 2011 Hadoop Summit, June 2011, pp.5-9.
[42]
Gonzalez J E, Xin R S, Dave A, Crankshaw D, Franklin M J, Stoica I. GraphX: Graph processing in a distributed dataflow framework. In Proc. the 11th USENIX Symp. Operating Systems Design and Implementation, October 2014, pp.599-613.
[43]
Teixeira C H, Fonseca A J, Serafini M, Siganos G, Zaki M J, Aboulnaga A. Arabesque: A system for distributed graph mining. In Proc. the 25th Symp. Operating Systems Principles, October 2015, pp.425-440.
[44]
Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conf. Computer Systems, April 2015, Article No. 1.
[45]
Zhu X, Chen W, Zheng W, Ma X. Gemini: A computation-centric distributed graph processing system. In Proc. the 12th USENIX Symp. Operating Systems Design and Implementation, November 2016, pp.301-316.
[46]
Khayyat Z, Awara K, Alonazi A, Jamjoom H, Williams D, Kalnis P. Mizan: A system for dynamic load balancing in large-scale graph processing. In Proc. the 8th ACM European Conf. Computer Systems, April 2013, pp.169-182.
[47]
Randles M, Lamb D, Taleb-Bendiab A. A comparative study into distributed load balancing algorithms for cloud computing. In Proc. the 24th IEEE Int. Conf. Advanced Information Networking and Applications Workshops, April 2010, pp.551-556.
[48]
Zhao Y, Yoshigoe K, Xie M, Zhou S, Seker R, Bian J. LightGraph: Lighten communication in distributed graph-parallel processing. In Proc. the 2004 IEEE Int. Congress on Big Data, June 2014, pp.717-724.
[49]
Wang P, Zhang K, Chen R, Chen H, Guan H. Replication-based fault-tolerance for large-scale graph processing. In Proc. the 44th Annual IEEE/IFIP Int. Conf. Dependable Systems and Networks, June 2014, pp.562-573.
[50]
Nguyen D, Lenharth A, Pingali K. A lightweight infrastructure for graph analytics. In Proc. the 24th ACM SIGOPS Symp. Operating Systems Principles, November 2013, pp.456-471.
[51]

Sundaram N, Satish N, Patwary M M, Dulloor S R, Anderson M J, Vadlamudi S G, Das D, Dubey P. GraphMat: High performance graph analytics made productive. Proceedings of the VLDB Endowment, 2015, 8(11): 1214-1225.

[52]
Zhang K, Chen R, Chen H. NUMA-aware graph-structured analytics. In Proc. the 20th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2015, pp.183-193.
[53]
Han W S, Lee S, Park K, Lee J H, Kim M S, Kim J, Yu H. TurboGraph: A fast parallel graph engine handling billion-scale graphs in a single PC. In Proc. the 19th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, August 2013, pp.77-85.
[54]
Yuan P, Zhang W, Xie C, Jin H, Liu L, Lee K. Fast iterative graph computation: A path centric approach. In Proc. the 2004 Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2014, pp.401-412.
[55]
Zhu X, Han W, Chen W. GridGraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In Proc. the 2005 USENIX Annual Technical Conf., July 2015, pp.375-386.
[56]
Chi Y, Dai G, Wang Y, Sun G, Li G, Yang H. NXgraph: An efficient graph processing system on a single machine. In Proc. the 32nd IEEE Int. Conf. Data Engineering, May 2016, pp.409-420.
[57]
Maass S, Min C, Kashyap S, Kang W, Kumar M, Kim T. Mosaic: Processing a trillion-edge graph on a single machine. In Proc. the 12th ACM European Conf. Computer Systems, April 2017, pp.527-543.
[58]
Seo H, Kim J, Kim M S. GStream: A graph streaming processing method for large-scale graphs on GPUs. In Proc. the 20th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2015, pp.253-254.
[59]
Soman J, Kishore K, Narayanan P J. A fast GPU algorithm for graph connectivity. In Proc. the 24th IEEE Int. Symp. Parallel & Distributed Processing, Workshops and PhD Forum, April 2010, Article No. 87.
[60]
McLaughlin A, Bader D A. Scalable and high performance betweenness centrality on the GPU. In Proc. the 2014 Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2014, pp.572-583.
[61]
Sariyüce A E, Kaya K, Saule E, Çatalyürek Ü V. Betweenness centrality on GPUs and heterogeneous architectures. In Proc. the 6th Workshop on General Purpose Processor Using Graphics Processing Units, March 2013, pp.76-85.
[62]
Davidson A A, Baxter S, Garland M, Owens J D. Work-efficient parallel GPU methods for single-source shortest paths. In Proc. the 28th IEEE Int. Parallel and Distributed Processing Symp., May 2014, pp.349-359.
[63]
Hong S, Chafi H, Sedlar E, Olukotun K. Green-Marl: A DSL for easy and efficient graph analysis. In Proc. the 17th Int. Conf. Architectural Support for Programming Languages and Operating Systems, March 2012, pp.349-362.
[64]
Gharaibeh A, Reza T, Santos-Neto E, Costa L B, Sallinen S, Ripeanu M. Efficient large-scale graph processing on hybrid CPU and GPU systems. arXiv: 1312.3018, 2013. http://arxiv.org/abs/1312.3018, Dec. 2018.
[65]

Zhang T, Zhang J, Shu W, Wu M Y, Liang X. Efficient graph computation on hybrid CPU and GPU systems. The Journal of Supercomputing, 2015, 71(4): 1563-1586.

[66]
Liu H, Huang H H, Hu Y. iBFS: Concurrent breadth-first search on GPUs. In Proc. the 2016 Int. Conf. Management of Data, June 2016, pp.403-416.
[67]
Sengupta D, Song S L, Agarwal K, Schwan K. GraphReduce: Processing large-scale graphs on accelerator-based systems. In Proc. the 2015 Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2015, Article No. 28.
[68]
Kim M S, An K, Park H, Seo H, Kim J. GTS: A fast and scalable graph processing method based on streaming topology to GPUs. In Proc. the 2016 Int. Conf. Management of Data, June 2016, pp.447-461.
[69]
Han L, Shen Z, Shao Z, Huang H H, Li T. A novel ReRAM-based processing-in-memory architecture for graph computing. In Proc. the 6th IEEE Non-Volatile Memory Systems and Applications Symp., August 2017, Article No. 13.
[70]
Song L, Zhuo Y, Qian X, Li H, Chen Y. GraphR: Accelerating graph processing using ReRAM. In Proc. the 2018 IEEE Int. Symp. High Performance Computer Architecture, February 2018, pp.531-543.
[71]
Zhang J, Khoram S, Li J. Boosting the performance of FPGA-based graph processor using hybrid memory cube: A case for breadth first search. In Proc. the 2017 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2017, pp.207-216.
[72]
Oguntebi T, Olukotun K. GraphOps: A dataflow library for graph analytics acceleration. In Proc. the 2016 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2016, pp.111-117.
[73]

Dai G, Huang T, Chi Y, Zhao J, Sun G, Liu Y, Wang Y, Xie Y, Yang H. GraphH: A processing-in-memory architecture for large-scale graph processing. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems. doi: 10.1109/TCAD.2018.2821565.

[74]
Zhang J, Li J. Degree-aware hybrid graph traversal on FPGA-HMC platform. In Proc. the 2018 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2018, pp.229-238.
[75]
Zhou S, Kannan R, Min Y, Prasanna V K. FASTCF: FPGA-based accelerator for stochastic-gradient-descent-based collaborative filtering. In Proc. the 2018 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2018, pp.259-268.
[76]
Khoram S, Zhang J, Strange M, Li J. Accelerating graph analytics by co-optimizing storage and access on an FPGAHMC platform. In Proc. the 2018 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2018, pp.239-248.
[77]

Han L, Shen Z, Liu D, Shao Z, Huang H H, Li T. A novel ReRAM-based processing-in-memory architecture for graph traversal. ACM Trans. Storage, 2018, 14(1): Article No. 9.

[78]
Wang Q, Jiang W, Xia Y, Prasanna V. A message-passing multi-softcore architecture on FPGA for breadth-first search. In Proc. the 2010 Int. Conf. Field-Programmable Technology, December 2010, pp.70-77.
[79]
Umuroglu Y, Morrison D, Jahre M. Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform. In Proc. the 25th Int. Conf. Field Programmable Logic and Applications, September 2015, Article No. 12.
[80]
Zhou S, Prasanna V K. Accelerating graph analytics on CPU-FPGA heterogeneous platform. In Proc. the 29th Int. Symp. Computer Architecture and High Performance Computing, October 2017, pp.137-144.
[81]
Zhang M, Zhuo Y, Wang C, Gao M, Wu Y, Chen K, Kozyrakis C, Qian X. GraphP: Reducing communication for PIM-based graph processing with efficient data partition. In Proc. the 2018 IEEE Int. Symp. High Performance Computer Architecture, February 2018, pp.544-557.
[82]
Huang T, Dai G, Wang Y, Yang H. HyVE: Hybrid vertexedge memory hierarchy for energy-efficient graph processing. In Proc. the 2018 Design, Automation and Test in Europe Conference and Exhibition, March 2018, pp.973-978.
[83]

Ozdal M M, Yesil S, Kim T, Ayupov A, Greth J, Burns S, Ozturk O. Graph analytics accelerators for cognitive systems. IEEE Micro, 2017, 37(1): 42-51.

[84]
Kapre N. Custom FPGA-based soft-processors for sparse graph acceleration. In Proc. the 26th IEEE Int. Conf. Application-Specific Systems, Architectures and Processors, July 2015, pp.9-16.
[85]
Betkaoui B, Thomas D B, Luk W, Przulj N. A framework for FPGA acceleration of large graph problems: Graphlet counting case study. In Proc. the 2011 Int. Conf. Field-Programmable Technology, December 2011, Article No. 2.
[86]
Betkaoui B, Wang Y, Thomas D B, Luk W. A reconfigurable computing approach for efficient and scalable parallel graph exploration. In Proc. the 23rd IEEE Int. Conf. Application-Specific Systems, Architectures and Processors, July 2012, pp.8-15.
[87]
Betkaoui B, Wang Y, Thomas D B, Luk W. Parallel FPGA-based all pairs shortest paths for sparse networks: A human brain connectome case study. In Proc. the 22nd Int. Conf. Field Programmable Logic and Applications, August 2012, pp.99-104.
[88]
Nurvitadhi E, Weisz G, Wang Y, Hurkat S, Nguyen M, Hoe J C, Martínez J F, Guestrin C. GraphGen: An FPGA framework for vertex-centric graph computation. In Proc. the 22nd IEEE Annual Int. Symp. Field-Programmable Custom Computing Machines, May 2014, pp.25-28.
[89]
Attia O G, Grieve A, Townsend K R, Jones P, Zambreno J. Accelerating all-pairs shortest path using a message-passing reconfigurable architecture. In Proc. the 2015 Int. Conf. Reconfigurable Computing and FPGAs, December 2015, Article No. 5.
[90]
Engelhardt N, So H K. GraVF: A vertex-centric distributed graph processing framework on FPGAs. In Proc. the 26th Int. Conf. Field Programmable Logic and Applications, August 2016, Article No. 62.
[91]
Jin H, Yao P, Liao X, Zheng L, Li X. Towards dataflow-based graph accelerator. In Proc. the 37th IEEE Int. Conf. Distributed Computing Systems, June 2017, pp.1981-1992.
[92]
Zhou S, Chelmis C, Prasanna V K. Accelerating large-scale single-source shortest path on FPGA. In Proc. the 2015 Int. Parallel and Distributed Processing Symposium Workshop, May 2015, pp.129-136.
[93]
Zhou S, Chelmis C, Prasanna V K. Optimizing memory performance for FPGA implementation of PageRank. In Proc. the 2015 Int. Conf. Reconfigurable Computing and FPGAs, December 2015, Article No. 53.
[94]
Jun S W, Wright A, Zhang S, Xu S, Arvind. GraFBoost: Using accelerated flash storage for external graph analytics. In Proc. the 45th ACM/IEEE Int. Symp. Computer Architecture, June 2018, pp.411-424.
[95]

Thomas D, Moorby P. The Verilog® Hardware Description Language (5th edition). Springer, 2002.

[96]

Ashenden P J. The Designer’s Guide to VHDL (3rd edition). Morgan Kaufmann, 2008.

[97]

Lee J, Kim H, Yoo S, Choi K, Hofstee H P, Nam G J, Nutter M R, Jamsek D. ExtraV: Boosting graph processing near storage with a coherent accelerator. Proceedings of the VLDB Endowment, 2017, 10(12): 1706-1717.

[98]
Kim G, Kim J, Ahn J H, Kim J. Memory-centric system interconnect design with hybrid memory cubes. In Proc. the 22nd Int. Conf. Parallel Architectures and Compilation Techniques, September 2013, pp.145-155.
[99]
Xu C, Niu D, Muralimanohar N, Balasubramonian R, Zhang T, Yu S, Xie Y. Overcoming the challenges of crossbar resistive memory architectures. In Proc. the 21st IEEE Int. Symp. High Performance Computer Architecture, February 2015, pp.476-488.
[100]
Do J, Kee Y S, Patel J M, Park C, Park K, DeWitt D J. Query processing on smart SSDs: Opportunities and challenges. In Proc. the 2013 ACM SIGMOD Int. Conf. Management of Data, June 2013, pp.1221-1230.
[101]
Jun S W, Liu M, Lee S, Hicks J, Ankcorn J, King M, Xu S, Arvind. BlueDBM: An appliance for big data analytics. In Proc. the 42nd ACM Annual Int. Symp. Computer Architecture, June 2015, pp.1-13.
[102]
Zhang J, Jung M. FlashAbacus: A self-governing flash-based accelerator for low-power systems. In Proc. the 13th EuroSys Conf., April 2018, Article No. 15.
[103]

Ozdal M M. Emerging accelerator platforms for data centers. IEEE Design & Test, 2018, 35(1): 47-54.

[104]
Weisz G, Melber J, Wang Y, Fleming K, Nurvitadhi E, Hoe J C. A study of pointer-chasing performance on shared-memory processor-FPGA systems. In Proc. the 2016 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2016, pp.264-273.
[105]
Gu B, Yoon A S, Bae D H, Jo I, Lee J, Yoon J, Kang J U, Kwon M, Yoon C, Cho S, Jeong J, Chang D. Biscuit: A framework for near-data processing of big data workloads. In Proc. the 43rd Int. Symp. Computer Architecture, June 2016, pp.153-165.
[106]
Son Y, Choi J, Jeon J, Min C, Kim S, Yeom H Y, Han H. SSD-assisted backup and recovery for database systems. In Proc. the 33rd IEEE Int. Conf. Data Engineering, April 2017, pp.285-296.
[107]
Song W S, Gleyzer V, Lomakin A, Kepner J. Novel graph processor architecture, prototype system, and results. In Proc. the 2016 IEEE High Performance Extreme Computing Conference, September 2016, Article No. 59.
[108]

Jin H, Yao P, Liao X. Towards dataflow based graph processing. Science China Information Sciences, 2017, 60(12): Article No. 126102.

[109]
Windh S, Budhkar P, Najjar W A. CAMs as synchronizing caches for multithreaded irregular applications on FPGAs. In Proc. the 2015 ACM/IEEE Int. Conf. Computer-Aided Design, November 2015, pp.331-336.
[110]

Wang L, Yang X, Dai H. Scratchpad memory allocation for arrays in permutation graphs. Science China Information Sciences, 2013, 56(5): 1-13.

[111]
Gao M, Ayers G, Kozyrakis C. Practical near-data processing for in-memory analytics frameworks. In Proc. the 2015 Int. Conf. Parallel Architecture and Compilation, October 2015, pp.113-124.
[112]

Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. ACM SIGCOMM Computer Communication Review, 1999, 29(4): 251-262.

[113]
Xie C, Chen R, Guan H, Zang B, Chen H. SYNC or ASYNC: Time to fuse for distributed graph-parallel computation. In Proc. the 20th ACM SIGPLAN Symp. Principles and Practice of Parallel Programming, February 2015, pp.194-204.
[114]
Ozdal M M, Yesil S, Kim T, Ayupov A, Burns S, Ozturk O. Architectural requirements for energy efficient execution of graph analytics applications. In Proc. the 2015 IEEE/ACM Int. Conf. Computer-Aided Design, November 2015, pp.676-681.
[115]
Beamer S, Asanović K, Patterson D. Direction-optimizing breadth-first search. In Proc. the 2012 Int. Conf. High Performance Computing, Networking, Storage and Analysis, November 2012, Article No. 12.
[116]
Beamer S, Asanović K, Patterson D. The GAP benchmark suite. arXiv: 1508.03619, 2015. http://arxiv.org/abs/1508.03619, May 2017.
[117]

Scarpazza D P, Villa O, Petrini F. Efficient breadth-first search on the Cell/B.E. processor. IEEE Trans. Parallel and Distributed Systems, 2008, 19(10): 1381-95.

[118]

Milenković T, Lai J, Pržulj N. GraphCrunch: A tool for large network analyses. BMC Bioinformatics, 2008, 9: Article No. 70.

[119]
Hong S, Oguntebi T, Olukotun K. Efficient parallel graph exploration on multi-core CPU and GPU. In Proc. the 2011 Int. Conf. Parallel Architectures and Compilation Techniques, October 2011, pp.78-88.
[120]
Matsumoto K, Nakasato N, Sedukhin S G. Blocked all-pairs shortest paths algorithm for hybrid CPU-GPU system. In Proc. the 13th IEEE Int. Conf. High Performance Computing and Communications, September 2011, pp.145-152.
[121]

Siek J G, Lee L Q, Lumsdaine A. The Boost Graph Library: User Guide and Reference Manual (PAP/CDR edition). Addison-Wesley Professional, 2001.

[122]
Ma X, Zhang D, Chiou D. FPGA-accelerated transactional execution of graph workloads. In Proc. the 2017 ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, February 2017, pp.227-236.
[123]
Zheng D, Mhembere D, Burns R, Vogelstein J, Priebe C E, SzalayA S. FlashGraph: Processing billion-node graphs on an array of commodity SSDs. In Proc. the 13th USENIX Conf. File and Storage Technologies, February 2015, pp.45-58.
[124]

Rodeh O. B-trees, shadowing, and clones. ACM Transactions on Storage, 2008, 3(4): Article No. 2.

[125]

Sha M, Li Y, He B, Tan K L. Accelerating dynamic graph analytics on GPUs. Proceedings of the VLDB Endowment, 2017, 11(1): 107-120.

[126]
Shi X, Cui B, Shao Y, Tong Y. Tornado: A system for real-time iterative analysis over evolving data. In Proc. the 2016 Int. Conf. Management of Data, June 2016, pp.417-430.
[127]

Chen H, Sun Z, Yi F, Su J. BufferBank storage: An economic, scalable and universally usable in-network storage model for streaming data applications. Science China Information Sciences, 2016, 59(1): 1-15.

[128]
Zhang M, Wu Y, Chen K, Qian X, Li X, Zheng W. Exploring the hidden dimension in graph processing. In Proc. the 12th USENIX Conf. Operating Systems Design and Implementation, November 2016, pp.285-300.
[129]
Battaglia P W, Hamrick J B, Bapst V et al. Relational inductive biases, deep learning, and graph networks. arXiv: 1806.01261, 2018. http://arxiv.org/abs/1806.01261, Jun. 2018.
[130]
Narayanan A, Chandramohan M, Venkatesan R, Chen L, Liu Y, Jaiswal S. graph2vec: Learning distributed representations of graphs. arXiv: 1707.05005, 2017. https://arxiv.org/abs/1707.05005, Jun. 2018.
[131]
Ribeiro L F, Saverese P H, Figueiredo D R. Struc2vec: Learning node representations from structural identity. In Proc. the 23rd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, August 2017, pp.385-394.
[132]

Zheng L, Liao X, Jin H. Efficient and scalable graph parallel processing with symbolic execution. ACM Trans. Architecture and Code Optimization, 2018, 15(1): Article No. 3.

[133]
Li Z, Liu L, Deng Y, Yin S, Wang Y, Wei S. Aggressive pipelining of irregular applications on reconfigurable hardware. In Proc. the 44th Annual Int. Symp. Computer Architecture, June 2017, pp.575-586.
[134]
Zheng L, Liao X, Jin H, Zhao J, Wang Q. Scalable concurrency debugging with distributed graph processing. In Proc. the 2018 Int. Symp. Code Generation and Optimization, February 2018, pp.188-199.
[135]
Jouppi N P, Young C, Patil N et al. In-datacenter performance analysis of a tensor processing unit. In Proc. the 44th Annual Int. Symp. Computer Architecture, June 2017, pp.1-12.
[136]
Chen T, Du Z, Sun N, Wang J, Wu C, Chen Y, Temam O. DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. In Proc. the 19th Int. Conf. Architectural Support for Programming Languages and Operating Systems, March 2014, pp.269-284.
Journal of Computer Science and Technology
Pages 339-371
Cite this article:
Gui C-Y, Zheng L, He B, et al. A Survey on Graph Processing Accelerators: Challenges and Opportunities. Journal of Computer Science and Technology, 2019, 34(2): 339-371. https://doi.org/10.1007/s11390-019-1914-z

388

Views

58

Crossref

N/A

Web of Science

67

Scopus

9

CSCD

Altmetrics

Received: 16 July 2018
Revised: 02 February 2019
Published: 22 March 2019
©2019 Springer Science + Business Media, LLC & Science Press, China
Return