Cover Article

Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective

Xiao-Fei Liao 1,2,3, Wen-Ju Zhao 1,2,3, Hai Jin 1,2,3 (corresponding author), Peng-Cheng Yao 1,2,3,4, Yu Huang 1,2,3,4, Qing-Gang Wang 1,2,3,4, Jin Zhao 1,2,3,4, Long Zheng 1,2,3,4, Yu Zhang 1,2,3,4, Zhi-Yuan Shao 1,2,3,4
1 National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
2 Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
3 Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
4 Zhejiang Lab, Hangzhou 311121, China

Abstract

Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Unlike traditional workloads, graph processing exhibits irregular computational parallelism and random memory accesses. As a result, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often yields a significantly low compute-to-memory ratio with few performance benefits, and can in many cases be even slower than a specialized single-thread graph algorithm. While domain-specific hardware designs are essential for graph processing, it remains challenging to translate hardware capability into performance gains without coupled software co-designs. This article presents a graph processing ecosystem spanning hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Subsequently, the co-designed parallel graph systems and their distributed techniques are presented to support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
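The irregularity the abstract refers to can be seen in any vertex-centric algorithm: each vertex's update reads values of its neighbors, and which neighbors those are is dictated entirely by the edge list, so the memory locations touched are data-dependent and effectively random. The sketch below (a generic pull-style PageRank over a CSR in-edge layout, not code from the paper; the names `indptr`, `indices`, and `pagerank` are illustrative) makes the pattern concrete.

```python
def pagerank(indptr, indices, n, iters=20, d=0.85):
    """Pull-style PageRank on a CSR graph of in-edges.

    indices[indptr[v]:indptr[v+1]] lists the in-neighbors of vertex v,
    i.e., vertices u with an edge u -> v.
    """
    # Out-degrees: every in-edge of v is an out-edge of some u.
    out_deg = [0] * n
    for v in range(n):
        for u in indices[indptr[v]:indptr[v + 1]]:
            out_deg[u] += 1

    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = [rank[v] / out_deg[v] if out_deg[v] else 0.0
                   for v in range(n)]
        new_rank = []
        for v in range(n):
            # The irregular step: contrib[] is read at locations chosen
            # by the edge list, so accesses are unpredictable and defeat
            # caches and prefetchers on conventional architectures.
            s = sum(contrib[u] for u in indices[indptr[v]:indptr[v + 1]])
            new_rank.append((1 - d) / n + d * s)
        rank = new_rank
    return rank
```

The inner `sum` does almost no arithmetic per memory access, which is exactly the low compute-to-memory ratio the abstract highlights and the reason specialized accelerators target this loop.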

Electronic Supplementary Material

Video: 4150-Video.mp4
Highlights: JCST-2401-14150-Highlights.pdf (812.9 KB)

Journal of Computer Science and Technology
Pages 245-266
Cite this article:
Liao X-F, Zhao W-J, Jin H, et al. Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. Journal of Computer Science and Technology, 2024, 39(2): 245-266. https://doi.org/10.1007/s11390-024-4150-0


Received: 26 January 2024
Accepted: 03 March 2024
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024