AI Chat Paper
Note: Please note that the following content is generated by AMiner AI. SciOpen does not take any responsibility related to this content.
{{lang === 'zh_CN' ? '文章概述' : 'Summary'}}
{{lang === 'en_US' ? '中' : 'Eng'}}
Chat more with AI
PDF (10.1 MB)
Collect
Submit Manuscript AI Chat Paper
Show Outline
Outline
Show full outline
Hide outline
Outline
Show full outline
Hide outline
Open Access

Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource

College of Computer, National University of Defense Technology, Changsha 410073, China.
College of System Engineering, National University of Defense Technology, Changsha 410073, China.
Show Author Information

Abstract

Apache Spark provides a well-known MapReduce computing framework, aiming to fast-process big data analytics in data-parallel manners. With this platform, large input data are divided into data partitions. Each data partition is processed by multiple computation tasks concurrently. Outputs of these computation tasks are transferred among multiple computers via the network. However, such a distributed computing framework suffers from system overheads, inevitably caused by communication and disk I/O operations. System overheads take up a large proportion of the Job Completion Time (JCT). We observed that excessive computational resources incurs considerable system overheads, prolonging the JCT. The over-allocation of individual jobs not only prolongs their own JCTs, but also likely makes other jobs suffer from under-allocation. Thus, the average JCT is suboptimal, too. To address this problem, we propose a prediction model to estimate the changing JCT of a single Spark job. With the support of the prediction method, we designed a heuristic algorithm to balance the resource allocation of multiple Spark jobs, aiming to minimize the average JCT in multiple-job cases. We implemented the prediction model and resource allocation method in ReB, a Resource-Balancer based on Apache Spark. Experimental results showed that ReB significantly outperformed the traditional max-min fairness and shortest-job-optimal methods. The average JCT was decreased by around 10%-30% compared to the existing solutions.

References

[1]
C. Mosharaf, Z. Matei, M. Justin, J. Michael, and S. Ion, Managing data transfers in computer clusters with Orchestra, ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 98-109, 2011.
[2]
J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West, Scale and performance in a distributed file system, ACM Transaction on Computer System, vol. 6, no. 1, pp. 51-81, 1988.
[3]
D. P. Woodruff and Q. Zhang, When distributed computation is communication expensive, Distributed Computing, vol. 30, no. 5, pp. 309-323, 2017.
[4]
J. Dean and S. Ghemawat, MapReduce: Simplified data processing on large clusters, in Proceedings of USENIX Symposium on Operating System Design and Implementation (OSDI’04), San Francisco, CA, USA, 2004, pp. 137-150.
[5]
Z. Shen, S. Subbiah, X. Gu, and J. Wilkes, Cloudscale: Elastic resource scaling for multi-tenant cloud systems, in Proceedings of ACM Symposium on Cloud Computing (SOCC’11), Cascais, Portugal, 2011, pp. 5-17.
[6]
C. Delimitrou and C. Kozyrakis, Quasar: Resource-efficient and QoS-aware cluster management, in Proceedings of ACM Architectural Support for Programming Languages and Operating Systems (ASPLOS’14), Salt Lake City, UT, USA, 2014, pp. 127-144.
[7]
B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica, Mesos: A platform for fine-grained resource sharing in the data center, in Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI’11), Boston, MA, USA, 2011, pp. 429-483.
[8]
V. K. Vavilapalli, A. C. Murthy, C. Douglas, S. Agarwal, M. Konar, R. Evans, T. Graves, J. Lowe, H. Shah, and S. Seth, Apache Hadoop YARN: Yet another resource negotiator, in Proceedings of ACM Symposium on Cloud Computing (SOCC’13), Santa Clara, CA, USA, 2013, pp. 1-16.
[9]
H. Mao, M. Alizadeh, I. Menache, and S. Kandula, Resource management with deep reinforcement learning, in Proceedings of ACM HotNet Workshop on Hot Topics in Networks (HotNet’16), Atlanta, GA, USA, 2016, pp. 50-56.
[10]
T. Bonald, L. Massouli, A. Prouti, and J. T. Virtamo, A queueing analysis of max-min fairness, proportional fairness and balanced fairness, Queueing Systems, vol. 53, nos. 1&2, pp. 65-84, 2006.
[11]
A. Ghodsi, M. Zaharia, B. Hindman, A. Konwinski, S. Shenker, and I. Stoica, Dominant resource fairness: Fair allocation of multiple resource types, in Proceedings of USENIX Symposium on Networked Systems Design and Implementation (NSDI’13), Boston, MA, USA, 2013, pp. 323-336.
[12]
M. Zaharia, D. Borthakur, J. S. Sarma, K. Elmeleegy, S. Shenker, and I. Stoica, Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling, in Proceedings of European Conference on Computer Systems (EuroSys’10), Paris, France, 2010, pp. 265-278.
[13]
R. Grandl, G. Ananthanarayanan, S. Kandula, S. Rao, and A. Akella, Multi-resource packing for cluster schedulers, in Proceedings of ACM Special Interest Group on Data Communication (SIGCOMM’14), Chicago, IL, USA, 2014, pp. 455-466.
[14]
S. Venkataraman, Z. Yang, M. J. Franklin, B. Recht, and I. Stoica, Ernest: Efficient performance prediction for large-scale advanced analytics, in Proceedings of USENIX Symposium on Networked Systems Design and Implementatio (NSDI’16), Santa Clara, CA, USA, 2016, pp. 363-378.
[15]
Z. Bei, Z. Yu, H. Zhang, W. Xiong, C. Xu, L. Eeckhout, and S. Feng, RFHOC: A random-forest approach to auto-tuning Hadoop’s configuration, IEEE Transaction on Parallel and Distributed Systems, vol. 27, no. 5, pp. 1470-1483, 2016.
[16]
Z. Yu, Z. Bei, and X. Qian, Datasize-aware high dimensional configurations auto-tuning of in-memory cluster computing, in Proceedings of ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’18), Williamsburg, VA, USA, 2018, pp. 564-577.
[17]
C. Re, D. Agrawal, M. Balazinska, M. J. Cafarella, M. I. Jordan, T. Kraska, and R. Ramakrishnan, Machine learning and databases: The sound of things to come or a cacophony of hype? in Proceedings of ACM International Conference on Management of Data (SIGMOD’15), Melbourne, Australia, 2015, pp. 283-284.
[18]
L. Sun, S. Sun, T. Wang, J. Li, and J. Lin, Parallel ADR detection based on spark and BCPNN, Tsinghua Science and Technology, vol. 24, no. 2, pp. 195-206, 2019.
[19]
X. Ye, X. Chen, D. Liu, W. Wang, L. Yang, G. Liang, and G. Shao, Notice of retraction: Efficient feature extraction using Apache Spark for network behavior anomaly detection, Tsinghua Science and Technology, vol. 23, no. 5, pp. 561-573, 2018.
[20]
M. Wang, Y. Cui, S. Xiao, X. Wang, D. Yang, K. Chen, and J. Zhu, Neural network meets DCN: Traffic-driven topology adaptation with deep learning, in Proceedings of ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS’18), Irvine, CA, USA, 2018, pp. 97-99.
[21]
N. Yamashita and M. Fukushima, On the rate of convergence of the Levenberg-Marquardt method, Springer Computing, vol. 15, pp. 239-249, 2001.
[22]
R. Grandl, M. Chowdhury, A. Akella, and G. Ananthanarayanan, Altruistic scheduling in multi-resource clusters, in Proceedings of USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), Savannah, GA, USA, 2016, pp. 65-80.
[23]
D. Fooladivanda, A. A. Daoud, and C. Rosenberg, Joint channel allocation and user association for heterogeneous wireless cellular networks, IEEE Transaction on Wireless Communications, vol. 12, no. 1, pp. 248-257, 2011.
Tsinghua Science and Technology
Pages 487-497
Cite this article:
Hu Z, Li D, Guo D. Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource. Tsinghua Science and Technology, 2020, 25(4): 487-497. https://doi.org/10.26599/TST.2019.9010054

905

Views

65

Downloads

18

Crossref

N/A

Web of Science

21

Scopus

4

CSCD

Altmetrics

Received: 30 March 2019
Revised: 24 July 2019
Accepted: 09 September 2019
Published: 13 January 2020
© The author(s) 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

Return