Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource

Zhiyao Hu; Dongsheng Li; Deke Guo

doi:10.26599/TST.2019.9010054

Open Access

Zhiyao Hu, Dongsheng Li, Deke Guo

College of Computer, National University of Defense Technology, Changsha 410073, China.

College of System Engineering, National University of Defense Technology, Changsha 410073, China.

Abstract

Apache Spark provides a well-known MapReduce computing framework that aims to speed up big data analytics through data-parallel processing. On this platform, large input data are divided into data partitions. Each data partition is processed by multiple computation tasks concurrently. Outputs of these computation tasks are transferred among multiple computers via the network. However, such a distributed computing framework suffers from system overheads, inevitably caused by communication and disk I/O operations. System overheads take up a large proportion of the Job Completion Time (JCT). We observed that excessive computational resources incur considerable system overheads, prolonging the JCT. Over-allocating resources to individual jobs not only prolongs their own JCTs, but also likely makes other jobs suffer from under-allocation. Thus, the average JCT is also suboptimal. To address this problem, we propose a prediction model to estimate the changing JCT of a single Spark job. With the support of this prediction model, we designed a heuristic algorithm to balance the resource allocation of multiple Spark jobs, aiming to minimize the average JCT in multiple-job cases. We implemented the prediction model and resource allocation method in ReB, a Resource-Balancer based on Apache Spark. Experimental results showed that ReB significantly outperformed the traditional max-min fairness and shortest-job-optimal methods. ReB reduced the average JCT by around 10%-30% compared to the existing solutions.
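The idea in the abstract, that a job's JCT first falls and then rises as more resources are added, and that resources can be balanced across jobs using a predicted JCT curve, can be illustrated with a small sketch. This is not the ReB implementation or the paper's prediction model: the curve shape `work / n + overhead * n + fixed`, the function names, and the greedy rule below are all assumptions chosen only to make the over-allocation effect concrete.

```python
from typing import Callable, List

JctModel = Callable[[int], float]


def make_jct_model(work: float, overhead: float, fixed: float = 0.0) -> JctModel:
    """Hypothetical JCT curve (an assumption, not the paper's model).

    Compute time shrinks as work / n, while communication and disk I/O
    overhead is assumed to grow linearly as overhead * n.
    """
    def jct(n: int) -> float:
        return work / n + overhead * n + fixed
    return jct


def optimal_executors(jct: JctModel, max_n: int) -> int:
    """Executor count in 1..max_n with the lowest predicted JCT."""
    return min(range(1, max_n + 1), key=jct)


def balance(models: List[JctModel], total: int) -> List[int]:
    """Greedy sketch: give each job one executor, then hand out the rest
    one at a time to the job whose predicted JCT drops the most.
    Stops early when no job would benefit from another executor.
    """
    alloc = [1] * len(models)
    for _ in range(total - len(models)):
        gains = [m(a) - m(a + 1) for m, a in zip(models, alloc)]
        best = max(range(len(models)), key=gains.__getitem__)
        if gains[best] <= 0:  # any further executor would be over-allocation
            break
        alloc[best] += 1
    return alloc


if __name__ == "__main__":
    jobs = [make_jct_model(work=100, overhead=1), make_jct_model(work=16, overhead=1)]
    print(balance(jobs, total=20))  # leaves executors unused instead of over-allocating
    print(balance(jobs, total=8))   # scarce resources go where they help most
```

With these made-up parameters, the second job's predicted JCT is lowest at 4 executors and grows again beyond that, so the sketch stops allocating even when spare executors remain. Because the sum of the predicted JCTs is minimized, so is their average.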

Keywords

Spark jobs; resource over-allocation; performance prediction

Tsinghua Science and Technology

Volume 25, Issue 4, August 2020

Pages 487-497

DOI: 10.26599/TST.2019.9010054

Cite this article:

Hu Z, Li D, Guo D. Balance Resource Allocation for Spark Jobs Based on Prediction of the Optimal Resource. Tsinghua Science and Technology, 2020, 25(4): 487-497. https://doi.org/10.26599/TST.2019.9010054

Received: 30 March 2019

Revised: 24 July 2019

Accepted: 09 September 2019

Published: 13 January 2020

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).