AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
Article Link
Collect
Submit Manuscript
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Regular Paper

Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan 430074, China
Service Computing Technology and System Laboratory, Huazhong University of Science and Technology Wuhan 430074, China
Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China
School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Show Author Information

Abstract

Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.

Electronic Supplementary Material

Download File(s)
2101_ESM.pdf (117 KB)

References

[1]

Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(8): 2091-2100. DOI: 10.1109/TPDS.2013.235.

[2]
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp. 17-30.
[3]

Mihaylov S R, Ives Z G, Guha S. REX: Recursive, delta-based data-centric computation. Proc. the VLDB Endowment, 2012, 5(11): 1280-1291. DOI: 10.14778/2350229.2350246.

[4]
Yu W, Lin X, Zhang W. Fast incremental SimRank on link-evolving graphs. In Proc. the 30th IEEE International Conference on Data Engineering, Mar. 31-Apr. 4, 2014, pp. 304-315. DOI: 10.1109/ICDE.2014.6816660.
[5]

Zhang Y, Chen S, Wang Q, Yu G. i2MapReduce: Incremental MapReduce for mining evolving big data. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7): 1906-1919. DOI: 10.1109/TKDE.2015.2397438.

[6]

Zhang Y, Liao X, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(5): 895-907. DOI: 10.1109/TKDE.2017.2781241.

[7]

Zhang Y, Liao X, Shi X, Jin H, He B. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(4): 830-842. DOI: 10.1109/TPDS.2017.2776115.

[8]

Hou G, Chen X, Wang S, Wei Z. Massively parallel algorithms for personalized PageRank. Proc. the VLDB Endowment, 2021, 14(9): 1668-1680. DOI: 10.14778/3461535.3461554.

[9]
Chen H, Jin H, Cui X. Hybrid followee recommendation in microblogging systems. Science China Information Sciences, 2017, 60(1): Article No. 012102. DOI: 10.1007/s11432-016-5551-7.
[10]

Liao X, Chen Y, Zhang Y et al. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis, 2019, 49(8): 988-1004. DOI: 10.1360/N112018-00125.(in Chinese)

[11]
Baluja S, Seth R, Sivakumar D et al. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In Proc. the 17th International Conference on World Wide Web, Apr. 2008, pp. 895-904. DOI: 10.1145/1367497.1367618.
[12]
Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In Proc. the 12th International Conference on Information and Knowledge Management, Nov. 2003, pp. 556-559. DOI: 10.1145/956863.956972.
[13]

Shroff G M. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Numerische Mathematik, 1990, 58(1): 779-805. DOI: 10.1007/BF01385654.

[14]

Zhang Y, Liao X, Jin H, Min G. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing, 2015, 3(1): 54-65. DOI: 10.1109/TCC.2014.2328594.

[15]

Zhang Y, Liao X, Jin H, Tan G, Min G. Inc-part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(7): 1900-1909. DOI: 10.1109/TPDS.2014.2333511.

[16]
Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S, Qiu J, Fox G. Twister: A runtime for iterative MapReduce. In Proc. the 19th ACM International Symposium on High Performance Distributed Computing, Jun. 2010, pp. 810-818. DOI: 10.1145/1851476.1851593.
[17]

Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. Proc. the VLDB Endowment, 2010, 3(1): 285-296. DOI: 10.14778/1920841.1920881.

[18]
Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Conference on Operating Systems Design and Implementation, Oct. 2010, pp. 293-306.
[19]
Zaharia M, Chowdhury M, FranklinM J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, Jun. 2010.
[20]
Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp. 135-146. DOI: 10.1145/1807167.1807184.
[21]
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, Oct. 2015, pp. 410-424. DOI: 10.1145/2815400.2815408.
[22]
Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conference on Computer Systems, Apr. 2015, Article No. 1. DOI: 10.1145/2741948.2741970.
[23]

Chazan D, Miranker W. Chaotic relaxation. Linear Algebra and Its Applications, 1969, 2(2): 199-222. DOI: 10.1016/0024-3795(69)90028-7.

[24]

Baudet G M. Asynchronous iterative methods for multiprocessors. Journal of the ACM, 1978, 25(2): 226-244. DOI: 10.1145/322063.322067.

[25]

Bertsekas D P. Distributed asynchronous computation of fixed points. Mathematical Programming, 1983, 27(1): 107-120. DOI: 10.1007/BF02591967.

[26]

Liu H K, Chen D, Jin H, Liao X F, He B S, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4-32. DOI: 10.1007/s11390-020-0780-z.

[27]

Lv X Q, Xiao W, Zhang Y, Liao X F, Jin H, Hua S Q. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science, 2019, 13(3): 539-551. DOI: 10.1007/s11704-018-7443-z.

[28]
Murray D G, Schwarzkopf M, Smowton C, Smith S, Madhavapeddy A, Hand S. CIEL: A universal execution engine for distributed data-flow computing. In Proc. the 8th USENIX Conference on Networked Systems Design and Implementation, Mar. 30-Apr. 1, 2011, pp. 113-126.
[29]

Dai D, Chen Y, Kimpe D, Ross R B. Trigger-based incremental data processing with unified sync and async model. IEEE Transactions on Cloud Computing, 2021, 9(1): 372-385. DOI: 10.1109/TCC.2018.2830348.

[30]
Zhang Y, Gao Q, Gao L, Wang C. PrIter: A distributed framework for prioritized iterative computations. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 13. DOI: 10.1145/2038916.2038929.
[31]

Takács G, Pilászy I, Németh B, Tikk D. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 2009, 10: 623-656.

Journal of Computer Science and Technology
Pages 797-813
Cite this article:
Yu H, Jiang X-Y, Zhao J, et al. Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. Journal of Computer Science and Technology, 2022, 37(4): 797-813. https://doi.org/10.1007/s11390-022-2101-1

462

Views

0

Crossref

0

Web of Science

1

Scopus

0

CSCD

Altmetrics

Received: 21 December 2021
Revised: 17 June 2022
Accepted: 29 June 2022
Published: 25 July 2022
©Institute of Computing Technology, Chinese Academy of Sciences 2022
Return