Cover Article

Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective

Xiao-Fei Liao 1,2,3, Wen-Ju Zhao 1,2,3, Hai Jin 1,2,3 (corresponding author), Peng-Cheng Yao 1,2,3,4, Yu Huang 1,2,3,4, Qing-Gang Wang 1,2,3,4, Jin Zhao 1,2,3,4, Long Zheng 1,2,3,4, Yu Zhang 1,2,3,4, Zhi-Yuan Shao 1,2,3,4
1 National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
2 Services Computing Technology and System Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
3 Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
4 Zhejiang Lab, Hangzhou 311121, China

Abstract

Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Unlike traditional workloads, graph processing exhibits irregular computational parallelism and random memory accesses. As a result, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often yields a significantly low compute-to-memory ratio with few performance benefits, and can in many cases be even slower than a specialized single-thread graph algorithm. While domain-specific hardware designs are essential for graph processing, it remains challenging to translate hardware capability into performance gains without coupled software co-designs. This article presents a graph processing ecosystem spanning hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Subsequently, the co-designed parallel graph systems and their distributed techniques are presented to support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
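The irregularity the abstract refers to can be seen in any vertex-centric algorithm: each vertex's update reads values of its neighbors, and which neighbors those are is dictated entirely by the edge list, so the memory locations touched are data-dependent and effectively random. The sketch below (a generic pull-style PageRank over a CSR in-edge layout, not code from the paper; the names `indptr`, `indices`, and `pagerank` are illustrative) makes the pattern concrete.

```python
def pagerank(indptr, indices, n, iters=20, d=0.85):
    """Pull-style PageRank on a CSR graph of in-edges.

    indices[indptr[v]:indptr[v+1]] lists the in-neighbors of vertex v,
    i.e., vertices u with an edge u -> v.
    """
    # Out-degrees: every in-edge of v is an out-edge of some u.
    out_deg = [0] * n
    for v in range(n):
        for u in indices[indptr[v]:indptr[v + 1]]:
            out_deg[u] += 1

    rank = [1.0 / n] * n
    for _ in range(iters):
        contrib = [rank[v] / out_deg[v] if out_deg[v] else 0.0
                   for v in range(n)]
        new_rank = []
        for v in range(n):
            # The irregular step: contrib[] is read at locations chosen
            # by the edge list, so accesses are unpredictable and defeat
            # caches and prefetchers on conventional architectures.
            s = sum(contrib[u] for u in indices[indptr[v]:indptr[v + 1]])
            new_rank.append((1 - d) / n + d * s)
        rank = new_rank
    return rank
```

The inner `sum` does almost no arithmetic per memory access, which is exactly the low compute-to-memory ratio the abstract highlights and the reason specialized accelerators target this loop.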

Electronic Supplementary Material

Video: 4150-Video.mp4
Highlights: JCST-2401-14150-Highlights.pdf (812.9 KB)

Journal of Computer Science and Technology
Pages 245-266
Cite this article:
Liao X-F, Zhao W-J, Jin H, et al. Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective. Journal of Computer Science and Technology, 2024, 39(2): 245-266. https://doi.org/10.1007/s11390-024-4150-0


Received: 26 January 2024
Accepted: 03 March 2024
Published: 30 March 2024
© Institute of Computing Technology, Chinese Academy of Sciences 2024