AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
Article Link
Collect
Submit Manuscript
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Regular Paper

TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
Show Author Information

Abstract

It is challenging to model the performance of distributed graph computation. Explicit formulation cannot easily capture the diversified factors and complex interactions in the system. Statistical learning methods require a large number of training samples to generate an accurate prediction model. However, it is time-consuming to run the required graph computation tests to obtain the training samples. In this paper, we propose TransGPerf, a transfer learning based solution that can exploit prior knowledge from a source scenario and utilize a manageable amount of training data for modeling the performance of a target graph computation scenario. Experimental results show that our proposed method is capable of generating accurate models for a wide range of graph computation tasks on PowerGraph and GraphX. It outperforms transfer learning methods proposed for other applications in the literature.

Electronic Supplementary Material

Download File(s)
jcst-36-4-778-Highlights.pdf (175.7 KB)

References

[1]
Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, June 2010, pp.135-146. DOI: 10.1145/1807167.1807184.
[2]
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, October 2012, pp.17-30.
[3]
Xin R S, Gonzalez J E, Franklin M J, Stoica I. GraphX: A resilient distributed graph system on Spark. In Proc. the 1st International Workshop on Graph Data Management Experiences and Systems, June 2013, Article No. 2. DOI: 10.1145/2484425.2484427.
[4]
Niu S, Chen S. Optimizing CPU cache performance for Pregel-like graph computation. In Proc. the 31st IEEE International Conference on Data Engineering Workshops, April 2015, pp.149-154. DOI: 10.1109/ICDEW.2015.7129568.
[5]

Han M, Daudjee K, Ammar K, Özsu M T, Wang X, Jin T. An experimental comparison of Pregel-like graph processing systems. Proc. VLDB Endow., 2014, 7(12): 1047-1058. DOI: 10.14778/2732977.2732980.

[6]

Iosup A, Hegeman T, Ngai W L et al. LDBC graphalytics: A benchmark for large-scale graph analysis on parallel and distributed platforms. Proc. VLDB Endow., 2016, 9(13): 1317-1328. DOI: 10.14778/3007263.3007270.

[7]
Ngai W L, Hegeman T, Heldens S, Iosup A. Granula: Toward fine-grained performance analysis of largescale graph processing platforms. In Proc. the 5th International Workshop on Graph Data-management Experiences & Systems, May 2017, Article No. 8. DOI: 10.1145/3078447.3078455.
[8]
Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin F B, Babu S. Starfish: A self-tuning system for big data analytics. In Proc. the 5th Biennial Conference on Innovative Data Systems Research, January 2011, pp.261-272.
[9]
Venkataraman S, Yang Z, Franklin M J, Recht B, Stoica I. Ernest: Efficient performance prediction for large-scale advanced analytics. In Proc. the 13th USENIX Symposium on Networked Systems Design and Implementation, March 2016, pp.363-378.
[10]

Pan S J, Yang Q. A survey on transfer learning. IEEE Trans. Knowledge and Data Engineering, 2010, 22(10): 1345-1359. DOI: 10.1109/TKDE.2009.191.

[11]

Weiss K R, Khoshgoftaar T M, Wang D. A survey of transfer learning. J. Big Data, 2016, 3: Article No. 9. DOI: 10.1186/s40537-016-0043-6.

[12]
Tzeng E, Hoffman J, Zhang N, Saenko K, Darrell T. Deep domain confusion: Maximizing for domain invariance. arXiv: 1412.3474, 2014. https://arxiv.org/pdf/141-2.3474.pdf, May 2021.
[13]
Long M, Cao Y, Wang J, Jordan M I. Learning transferable features with deep adaptation networks. In Proc. the 32nd International Conference on Machine Learning, July 2015, pp.97-105.
[14]
Long M, Zhu H, Wang J, Jordan M I. Deep transfer learning with joint adaptation networks. In Proc. the 34th International Conference on Machine Learning, August 2017, pp.2208-2217.
[15]
Sun B, Saenko K. Deep CORAL: Correlation alignment for deep domain adaptation. In Proc. the 2016 European Conference on Computer Vision Workshops, October 2016, pp.443-450. DOI: 10.1007/978-3-319-49409-8_35.
[16]
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, June 2016, pp.770-778. DOI: 10.1109/CVPR.2016.90.
[17]
Barker K J, Pakin S, Kerbyson D J. A performance model of the Krak hydrodynamics application. In Proc. the 2006 International Conference on Parallel Processing, August 2006, pp.245-254. DOI: 10.1109/ICPP.2006.11.
[18]
Kerbyson D J, Alme H J, Hoisie A, Petrini F, Wasserman H J, Gittings M L. Predictive performance and scalability modeling of a large-scale application. In Proc. the 2001 ACM/IEEE Conference on Supercomputing, November 2001, Article No. 37. DOI: 10.1145/582034.582071.
[19]
Sundaram-Stukel D, Vernon M K. Predictive analysis of a wavefront application using logGP. In Proc. the 1999 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, May 1999, pp.141-150. DOI: 10.1145/301104.301117.
[20]
Bhattacharyya A, Hoefler T. PEMOGEN: Automatic adaptive performance modeling during program runtime. In Proc. the 23rd International Conference on Parallel Architectures and Compilation, August 2014, pp.393-404. DOI: 10.1145/2628071.2628100.
[21]
Bhattacharyya A, Kwasniewski G, Hoefler T. Using compiler techniques to improve automatic performance modeling. In Proc. the 2015 International Conference on Parallel Architectures and Compilation, October 2015, pp.468-479. DOI: 10.1109/PACT.2015.39.
[22]
Calotoiu A, Beckingsale D, Earl C W, Hoefler T, Karlin I, Schulz M, Wolf F. Fast multi-parameter performance modeling. In Proc. the 2016 IEEE International Conference on Cluster Computing, September 2016, pp.172-181. DOI: 10.1109/CLUSTER.2016.57.
[23]

Sun J, Sun G, Zhan S, Zhang J, Chen Y. Automated performance modeling of HPC applications using machine learning. IEEE Trans. Computers, 2020, 69(5): 749-763. DOI: 10.1109/TC.2020.2964767.

[24]

Pan S J, Tsang I W, Kwok J T, Yang Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Networks, 2011, 22(2): 199-210. DOI: 10.1109/TNN.2010.2091281.

[25]
Sun B, Feng J, Saenko K. Return of frustratingly easy domain adaptation. In Proc. the 30th AAAI Conference on Artificial Intelligence, February 2016, pp.2058-2065.
[26]
Tzeng E, Hoffman J, Darrell T, Saenko K. Simultaneous deep transfer across domains and tasks. In Proc. the 2015 IEEE International Conference on Computer Vision, December 2015, pp.4068-4076. DOI: 10.1109/ICCV.2015.463.
[27]
Long M, Zhu H, Wang J, Jordan M I. Unsupervised domain adaptation with residual transfer networks. In Proc. the 30th Annual Conference on Neural Information Processing Systems, December 2016, pp.136-144.
[28]

Leskovec J, Sosič R. SNAP: A general-purpose network analysis and graph-mining library. ACM Trans. Intell. Syst. Technol., 2016, 8(1): Article No. 1. DOI: 10.1145/2898361.

[29]
Chakrabarti D, Zhan Y, Faloutsos C. R-MAT: A recursive model for graph mining. In Proc. the 4th SIAM International Conference on Data Mining, April 2004, pp.442-446. DOI: 10.1137/1.9781611972740.43.
[30]
Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. In Proc. the 1999 ACM SIGCOMM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, August 30–September 3, 1999, pp.251-262. DOI: 10.1145/316188.316229.
[31]
Leskovec J, Chakrabarti D, Kleinberg J M, Faloutsos C. Realistic, mathematically tractable graph generation and evolution, using Kronecker multiplication. In Proc. the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, October 2005, pp.133-145. DOI: 10.1007/11564126_17.
[32]
Park H, Kim M. TrillionG: A trillion-scale synthetic graph generator using a recursive vector model. In Proc. the 2017 ACM International Conference on Management of Data, May 2017, pp.913-928. DOI: 10.1145/3035918.3064014.
[33]
Boldi P, Vigna S. The webgraph framework I: Compression techniques. In Proc. the 13th International Conference on World Wide Web, May 2004, pp.595-602. DOI: 10.1145/988672.988752.
Journal of Computer Science and Technology
Pages 778-791
Cite this article:
Niu S, Chen S. TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance. Journal of Computer Science and Technology, 2021, 36(4): 778-791. https://doi.org/10.1007/s11390-021-1356-2

370

Views

0

Crossref

0

Web of Science

0

Scopus

0

CSCD

Altmetrics

Received: 02 February 2021
Accepted: 10 July 2021
Published: 05 July 2021
©Institute of Computing Technology, Chinese Academy of Sciences 2021
Return