AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (1.2 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

BPGM: A Big Graph Mining Tool

Yang LiuBin Wu( )Hongxu WangPengjiang Ma
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Show Author Information

Abstract

The design and implementation of a scalable parallel mining system target for big graph analysis has proven to be challenging. In this study, we propose a parallel data mining system for analyzing big graph data generated on a Bulk Synchronous Parallel (BSP) computing model named BSP-based Parallel Graph Mining (BPGM). This system has four sets of parallel graph mining algorithms programmed in the BSP parallel model and a well-designed workflow engine optimized for cloud computing to invoke these algorithms. Experimental results show that the graph mining algorithm components in BPGM are efficient and have better performance than big cloud-based parallel data miner and BC-BSP.

References

[1]
J. Dean and G. Sanjay, MapReduce: Simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp. 107-113, 2008.
[2]
S. Marc, S. W. Otto, D. W. Walker, J. Dongarra, and S. Huss-Lederman, MPI: The Complete Reference. MIT press, 1995.
[3]
L. G. Valiant, A bridging model for parallel computation, Communications of the ACM, vol. 33, no. 8, pp. 103-111, 1990.
[4]
S. Owen, A. Robin, T. Dunning, and E. Friedman, Mahout in Action. Manning, 2011.
[5]
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein, Graphlab: A new framework for parallel machine learning, arXiv preprint., arXiv:1006.4990, 2010.
[6]
U. Kang, C. E. Tsourakakis, and C. Faloutsos, Pegasus: A peta-scale graph mining system implementation and observations, in Data Mining, 2009 ICDM’09 Ninth IEEE International Conference on, 2009.
[7]
M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, Dryad: Distributed data-parallel programs from sequential building blocks, ACM SIGOPS Operating Systems Review, vol. 41, no. 3, pp. 59-72, 2007.
[8]
L. Yu, J. Zheng, W. Shen, B. Wu, B. Wang, L. Qian, and B. Zhang, BC-PDM: Data mining, social network analysis and text mining system based on cloud computing, presented at the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012.
[9]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski, Pregel: A system for large-scale graph processing, in Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, 2010.
[10]
Y. Bao, Z. Wang, Y. Gu, G. Yu, F. Leng, H. Zhang, B. Chen, C. Deng, and L. Guo, BC-BSP: A BSP-based parallel iterative processing system for big data on cloud architecture, in Proc. Database Systems for Advanced Applications, Springer Berlin Heidelberg, 2013.
[11]
S. Seo, E. J. Yoon, J. Kim, S. Jin, J. Kim, and S. Maeng, Hama: An efficient matrix computation with the mapreduce framework, in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on, 2010.
[12]
L. Page, S. Brin, R. Motwani, and T. Winograd, The PageRank citation ranking: Bringing order to the web, Technical report, SIDL-WP-1999-0120, Stanford University, 1998.
[13]
J. Pan, H. J. Yang, C. Faloutsos, and P. Duygulu, Automatic multimedia cross-modal correlation discovery, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, 2004, pp. 653-658.
[14]
J. M. Kleinberg, Authoritative sources in a hyperlinked environment, Journal of the ACM, vol. 46, no. 5, pp. 604-632, 1999.
[15]
M. Girvan and M. E. J. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences, vol. 99, no. 12, pp. 7821-7826, 2002.
[16]
A. Clauset, M. E. J. Newman, and C. Moore, Finding community structure in very large networks, Physical Review E, vol. 70, no. 6, pp. 1-6, 2004.
[17]
M. E. J. Newman and M. Girvan, Finding and evaluating community structure in networks, Physical Review E, vol. 69, no. 2, pp. 1-15, 2004.
[18]
U. N. Raghavan, R. Albert, and S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E, vol. 76, no. 3, pp. 1-11, 2007.
[19]
Z. Zeng, B. Wu, and H. Wang, A parallel graph partitioning algorithm to speed up the largescale distributed graph mining, in The 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, 2012.
[20]
G. Karypis and V. Kumar, Metis-unstructured graph partitioning and sparse matrix ordering system, version 2.0, http://dm.kaist.ac.kr/kse625/resources/metis.pdf, 1995.
[21]
J. Yang and D. Zhang, Lightweight workflow engine based on Hadoop and OSGI, presented at the 5th IEEE International Conference on Broadband Network & Multimedia Technology, Beijing, China, 2013.
Tsinghua Science and Technology
Pages 33-38
Cite this article:
Liu Y, Wu B, Wang H, et al. BPGM: A Big Graph Mining Tool. Tsinghua Science and Technology, 2014, 19(1): 33-38. https://doi.org/10.1109/TST.2014.6733206

604

Views

13

Downloads

11

Crossref

N/A

Web of Science

12

Scopus

0

CSCD

Altmetrics

Received: 13 December 2013
Revised: 23 December 2013
Accepted: 24 December 2013
Published: 07 February 2014
© The author(s) 2014
Return