| Sign up

Article Link

Cite

EndNote(RIS) BibTeX

Collect

Collect

Submit Manuscript

Show Outline

Outline

Abstract

Keywords

Electronic Supplementary Material

References

Show full outline

Hide outline

Regular Paper

Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

Hui Yu, Xin-Yu Jiang, Jin Zhao, Hao Qi, Yu Zhang(), Xiao-Fei Liao, Hai-Kun Liu, Fu-Bing Mao, Hai Jin

National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China

Service Computing Technology and System Laboratory, Huazhong University of Science and Technology Wuhan 430074, China

Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

Show Author Information

Abstract

Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.

Keywords

iterative algorithm delta-based execution model efficiency

Electronic Supplementary Material

Download File(s)

2101_ESM.pdf (117 KB)

References

[1]

Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(8): 2091-2100. DOI: 10.1109/TPDS.2013.235.

Crossref Google Scholar

[2]

Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp. 17-30.

[3]

Mihaylov S R, Ives Z G, Guha S. REX: Recursive, delta-based data-centric computation. Proc. the VLDB Endowment, 2012, 5(11): 1280-1291. DOI: 10.14778/2350229.2350246.

Crossref Google Scholar

[4]

Yu W, Lin X, Zhang W. Fast incremental SimRank on link-evolving graphs. In Proc. the 30th IEEE International Conference on Data Engineering, Mar. 31-Apr. 4, 2014, pp. 304-315. DOI: 10.1109/ICDE.2014.6816660.

[5]

Zhang Y, Chen S, Wang Q, Yu G. i²MapReduce: Incremental MapReduce for mining evolving big data. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7): 1906-1919. DOI: 10.1109/TKDE.2015.2397438.

Crossref Google Scholar

[6]

Zhang Y, Liao X, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(5): 895-907. DOI: 10.1109/TKDE.2017.2781241.

Crossref Google Scholar

[7]

Zhang Y, Liao X, Shi X, Jin H, He B. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(4): 830-842. DOI: 10.1109/TPDS.2017.2776115.

Crossref Google Scholar

[8]

Hou G, Chen X, Wang S, Wei Z. Massively parallel algorithms for personalized PageRank. Proc. the VLDB Endowment, 2021, 14(9): 1668-1680. DOI: 10.14778/3461535.3461554.

Crossref Google Scholar

[9]

Chen H, Jin H, Cui X. Hybrid followee recommendation in microblogging systems. Science China Information Sciences, 2017, 60(1): Article No. 012102. DOI: 10.1007/s11432-016-5551-7.

[10]

Liao X, Chen Y, Zhang Y et al. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis, 2019, 49(8): 988-1004. DOI: 10.1360/N112018-00125.(in Chinese)

Crossref Google Scholar

[11]

Baluja S, Seth R, Sivakumar D et al. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In Proc. the 17th International Conference on World Wide Web, Apr. 2008, pp. 895-904. DOI: 10.1145/1367497.1367618.

[12]

Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In Proc. the 12th International Conference on Information and Knowledge Management, Nov. 2003, pp. 556-559. DOI: 10.1145/956863.956972.

[13]

Shroff G M. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Numerische Mathematik, 1990, 58(1): 779-805. DOI: 10.1007/BF01385654.

Crossref Google Scholar

[14]

Zhang Y, Liao X, Jin H, Min G. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing, 2015, 3(1): 54-65. DOI: 10.1109/TCC.2014.2328594.

Crossref Google Scholar

[15]

Zhang Y, Liao X, Jin H, Tan G, Min G. Inc-part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(7): 1900-1909. DOI: 10.1109/TPDS.2014.2333511.

Crossref Google Scholar

[16]

Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S, Qiu J, Fox G. Twister: A runtime for iterative MapReduce. In Proc. the 19th ACM International Symposium on High Performance Distributed Computing, Jun. 2010, pp. 810-818. DOI: 10.1145/1851476.1851593.

[17]

Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. Proc. the VLDB Endowment, 2010, 3(1): 285-296. DOI: 10.14778/1920841.1920881.

Crossref Google Scholar

[18]

Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Conference on Operating Systems Design and Implementation, Oct. 2010, pp. 293-306.

[19]

Zaharia M, Chowdhury M, FranklinM J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, Jun. 2010.

[20]

Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp. 135-146. DOI: 10.1145/1807167.1807184.

[21]

Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, Oct. 2015, pp. 410-424. DOI: 10.1145/2815400.2815408.

[22]

Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conference on Computer Systems, Apr. 2015, Article No. 1. DOI: 10.1145/2741948.2741970.

[23]

Chazan D, Miranker W. Chaotic relaxation. Linear Algebra and Its Applications, 1969, 2(2): 199-222. DOI: 10.1016/0024-3795(69)90028-7.

Crossref Google Scholar

[24]

Baudet G M. Asynchronous iterative methods for multiprocessors. Journal of the ACM, 1978, 25(2): 226-244. DOI: 10.1145/322063.322067.

Crossref Google Scholar

[25]

Bertsekas D P. Distributed asynchronous computation of fixed points. Mathematical Programming, 1983, 27(1): 107-120. DOI: 10.1007/BF02591967.

Crossref Google Scholar

[26]

Liu H K, Chen D, Jin H, Liao X F, He B S, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4-32. DOI: 10.1007/s11390-020-0780-z.

Crossref Google Scholar

[27]

Lv X Q, Xiao W, Zhang Y, Liao X F, Jin H, Hua S Q. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science, 2019, 13(3): 539-551. DOI: 10.1007/s11704-018-7443-z.

Crossref Google Scholar

[28]

Murray D G, Schwarzkopf M, Smowton C, Smith S, Madhavapeddy A, Hand S. CIEL: A universal execution engine for distributed data-flow computing. In Proc. the 8th USENIX Conference on Networked Systems Design and Implementation, Mar. 30-Apr. 1, 2011, pp. 113-126.

[29]

Dai D, Chen Y, Kimpe D, Ross R B. Trigger-based incremental data processing with unified sync and async model. IEEE Transactions on Cloud Computing, 2021, 9(1): 372-385. DOI: 10.1109/TCC.2018.2830348.

Crossref Google Scholar

[30]

Zhang Y, Gao Q, Gao L, Wang C. PrIter: A distributed framework for prioritized iterative computations. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 13. DOI: 10.1145/2038916.2038929.

[31]

Takács G, Pilászy I, Németh B, Tikk D. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 2009, 10: 623-656.

Journal of Computer Science and Technology

Volume 37 Issue 4,
July 2022

Pages 797-813

DOI: 10.1007/s11390-022-2101-1

Cite this article:

Yu H, Jiang X-Y, Zhao J, et al. Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. Journal of Computer Science and Technology, 2022, 37(4): 797-813. https://doi.org/10.1007/s11390-022-2101-1

About Us

Learn about Open Access

Tsinghua University Press

Publish with Us

Peer Review Policy

Copyright and Licensing

Article Processing Charge

Contact Us

Journal Collaboration: Yao Meng (Ms.)✉️ +86-10-83470574

Technical Support: Kuo Zhao (Mr.)✉️ +86-10-83470507

Media Contact: Hao Jin (Mr.)✉️ +86-10-83470559

Address: Floor 6, Tower B, Xueyan Building, Shuangqing Road, Haidian District, Beijing 100084, China.

SciOpen——中国科技期刊卓越行动计划支持项目

Copyright © 2025 Tsinghua University Press Ltd.

京ICP备 10035462号-42 京公网安备11010802044758号