School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
SING Group, Hong Kong University of Science and Technology, Hong Kong 999077, China.
Abstract
Cloud data centers, such as Amazon EC2, host myriad big data applications using Virtual Machines (VMs). As these applications are communication-intensive, optimizing network transfer between VMs is critical to the performance of these applications and to the network utilization of data centers. Previous studies have addressed this issue by scheduling network flows with coflow semantics or by optimizing VM placement with traffic considerations. However, coflow scheduling and VM placement have been conducted orthogonally. In fact, these two mechanisms are mutually dependent, and optimizing these two complementary degrees of freedom independently turns out to be suboptimal. In this paper, we present VirtCO, a practical framework that jointly schedules coflows and places VMs ahead of VM launch to optimize the overall performance of data center applications. We model the joint coflow scheduling and VM placement optimization problem, and propose effective heuristics for solving it. We further implement VirtCO with OpenStack and deploy it in a testbed environment. Extensive evaluation on real-world traces shows that, compared with state-of-the-art solutions, VirtCO reduces the average coflow completion time by up to 36.5%. This new framework is also compatible with and readily deployable within existing data center architectures.
Fig. 1 A motivating example. Two color-coded coflows reside in three VMs. The size of each VM is 1 slot, and each PM has 2 slots. One coflow has three flows and the other has two. Panels (b)-(d) show the VM placement and the scheduling on the egress ports: (b) the baseline, in which no VMs are co-resident and the coflow with the largest bottleneck is scheduled first; (c) the result when the VMs with the largest mutual communication are placed together, with coflow scheduling conducted independently; (d) the optimal case.
Fig. 2 Architecture and implementation of VirtCO.
4.1 Global coflow scheduling and VM placement
We begin by modeling the joint coflow scheduling and VM placement problem to optimize the average CCT over all coflows. For the virtualization environment, we define $H$ and $h_j$ as the set of physical hosts (PMs) and the $j$-th physical host within the data center, respectively. Applications request resources in the form of VMs. For each application, $V$ and $v_m$ denote the set of VMs and the $m$-th VM of the application, respectively. The VM placement is represented by a $|V|\times|H|$ matrix $X$, where $x_{mj}$ denotes whether $v_m$ is hosted by $h_j$: specifically, $x_{mj}=1$ if $v_m$ is hosted by $h_j$; otherwise, $x_{mj}=0$.
The coflow information associated with an application is characterized by a traffic matrix $D$, where $d_{mn}$ denotes the size of the flow to be transferred from $v_m$ to $v_n$. In the example depicted in Fig. 1, the traffic matrix of each coflow is built from its flow sizes in exactly this way. All incomplete coflows are included in the collection $\mathcal{C}$. To capture the network resource constraints, we define $B_j$ as the residual bandwidth on PM $h_j$; $B_j$ consists of the egress and ingress bandwidth, denoted by $B_j^{\mathrm{out}}$ and $B_j^{\mathrm{in}}$, respectively. $R_k$ denotes the rate allocated to coflow $C_k$, which is composed of the egress and ingress rates allocated to $C_k$ on each $h_j$, denoted as $r_{kj}^{\mathrm{out}}$ and $r_{kj}^{\mathrm{in}}$. The VM size, denoted as $s_m$, is expressed in slots, and the number of available slots in $h_j$ is $a_j$. Under any placement, $T_k$ denotes the CCT of coflow $C_k$.
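To make these definitions concrete, the following sketch instantiates the quantities for a small setting like the one in Fig. 1 (three VMs on two PMs); the flow sizes and bandwidths are illustrative values, not the paper's.

```python
import numpy as np

# Illustrative instantiation for a 3-VM application on 2 PMs (sizes made up).
H = 2                                   # number of physical hosts
V = 3                                   # number of VMs for the application
X = np.array([[1, 0],                   # x[m, j] = 1 iff VM m runs on host j
              [1, 0],
              [0, 1]])
D = np.array([[0, 2, 4],                # d[m, n] = bytes VM m sends to VM n
              [0, 0, 3],
              [0, 0, 0]])
B_out = np.array([10.0, 10.0])          # residual egress bandwidth per host
B_in = np.array([10.0, 10.0])           # residual ingress bandwidth per host
slots = np.array([2, 2])                # available slots per host (VM size = 1 slot)
```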
To optimize the average CCT, VirtCO leverages a minimum-remaining-time-first heuristic: the coflow in $\mathcal{C}$ with the minimum remaining CCT is scheduled with the highest priority. We describe the main framework of VirtCO in Algorithm 1. The algorithm is invoked whenever a new application enters the data center. Specifically, when a new application associated with a coflow arrives, the algorithm is triggered to determine its VM placement, scheduling priority, and bandwidth allocation (allowing preemption). Line 4 of Algorithm 1 invokes another algorithm to compute the minimum CCT for the arriving coflow; Section 4.2 details how a single CCT is minimized through VM placement and network scheduling. Among all incomplete coflows, the algorithm sets the priorities according to their remaining completion time in ascending order (lines 7-10). To avoid starvation, the algorithm checks the waiting time of the coflows in $\mathcal{C}$ and uses a threshold $\Delta$ to ensure that no coflow starves for an arbitrarily long period (lines 11-14). $\Delta$ is usually on the order of minutes and is determined empirically by cloud providers.
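As a rough sketch of this scheduling loop (not the paper's exact Algorithm 1), the following Python function orders incomplete coflows by their estimated remaining CCT and promotes any coflow whose waiting time exceeds the threshold $\Delta$; all field names are illustrative.

```python
import time

def reprioritize(coflows, delta):
    """Assign scheduling priorities to incomplete coflows.

    coflows : list of dicts with keys
              'remaining_cct' - estimated remaining completion time (s)
              'arrival'       - arrival timestamp (s)
    delta   : starvation threshold (s); coflows waiting longer than this
              are promoted to the head of the schedule.
    Returns the coflows in scheduling order (highest priority first).
    """
    now = time.time()
    starving = [c for c in coflows if now - c['arrival'] > delta]
    normal = [c for c in coflows if now - c['arrival'] <= delta]
    # Starving coflows first (oldest first), then minimum-remaining-time-first.
    starving.sort(key=lambda c: c['arrival'])
    normal.sort(key=lambda c: c['remaining_cct'])
    return starving + normal
```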
4.2 Minimizing the completion time of a single coflow
The problem of minimizing the CCT of a single coflow in the cloud environment is formally stated as follows: given the estimated traffic matrix $D$ of a coflow $C_k$, find a feasible placement $X$ of all its VMs onto PMs and allocate feasible bandwidth $r_{kj}^{\mathrm{out}}$ and $r_{kj}^{\mathrm{in}}$ to $C_k$ on each $h_j$ so as to minimize its completion time $T_k$. Although single-flow optimization heuristics would allocate the entire bandwidth of a link to the scheduled flow, a desirable property of a coflow is that completing any flow faster than the bottleneck flow does not improve the CCT. Therefore, the minimum completion time of a coflow is attained as long as all flows finish together with the bottleneck flow; that is, all flows in $C_k$ should finish at the CCT $T_k$.
Note that we enforce scheduling in the hypervisor layer at coflow granularity. Under placement $X$, the amount of data that coflow $C_k$ sends from $h_j$ is $S_j^{\mathrm{out}}$, and the amount it receives at $h_j$ is $S_j^{\mathrm{in}}$. $S_j^{\mathrm{out}}$ and $S_j^{\mathrm{in}}$ are computed by summing, over all VM pairs, the flow sizes $d_{mn}$ whose source VM (respectively destination VM) is placed on $h_j$.
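A minimal sketch of this aggregation under a fixed placement, together with the bottleneck-based minimum CCT it implies; treating traffic between co-resident VMs as staying inside the host is an assumption of this sketch.

```python
import numpy as np

def per_host_traffic(D, placement, n_hosts):
    """Aggregate one coflow's traffic per host under a given placement.

    D         : |V| x |V| numpy traffic matrix, D[m, n] = bytes VM m -> VM n
    placement : length-|V| array, placement[m] = host index of VM m
    Traffic between co-resident VMs stays inside the host and is not
    counted here -- an assumption of this sketch.
    """
    S_out, S_in = np.zeros(n_hosts), np.zeros(n_hosts)
    for m in range(D.shape[0]):
        for n in range(D.shape[1]):
            if D[m, n] > 0 and placement[m] != placement[n]:
                S_out[placement[m]] += D[m, n]
                S_in[placement[n]] += D[m, n]
    return S_out, S_in

def bottleneck_cct(S_out, S_in, B_out, B_in):
    """Minimum CCT under this placement: the most loaded port is the
    bottleneck, and no flow needs to finish before it (bandwidths > 0)."""
    return float(np.maximum(S_out / B_out, S_in / B_in).max())
```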
Corollary 1 Having allocated the bandwidth $r_{kj}^{\mathrm{out}}$ and $r_{kj}^{\mathrm{in}}$ to $C_k$ on one PM $h_j$, we can attain the minimum CCT through weighted fair sharing among the individual flows of $C_k$ on $h_j$, with each flow weighted by its fraction of the aggregate coflow traffic on that port (e.g., $d_{mn}/S_j^{\mathrm{out}}$ on the egress side).
Proof For notational simplicity, let $h_s$ denote the sender host and $h_r$ the receiver host. As all flows in a coflow complete at the same time, we assume that every flow finishes at time $T$. Then the egress and ingress bandwidth allocated to the two PMs equal $S_s^{\mathrm{out}}/T$ and $S_r^{\mathrm{in}}/T$, respectively. As the data sent must equal the data received, the aggregate egress traffic equals the aggregate ingress traffic. Let $F_{sr}$ denote the aggregate flow sent from $h_s$ to $h_r$, so that $S_s^{\mathrm{out}}=\sum_r F_{sr}$. Let $b_{sr}$ be the bandwidth shared by $F_{sr}$; setting $b_{sr}=F_{sr}/T$ and summing over all receivers shows that the total allocation on $h_s$ is exactly $S_s^{\mathrm{out}}/T$. Thus, with $b_{sr}$ as the bandwidth shared by all flows sent from $h_s$ to $h_r$, the data transfer finishes exactly at $T$. Therefore, to achieve the optimal time $T$, each $F_{sr}$ shares the egress bandwidth of $h_s$ with proportion $F_{sr}/S_s^{\mathrm{out}}$. Furthermore, as $F_{sr}$ aggregates all flows $d_{mn}$ from VMs on $h_s$ to VMs on $h_r$, each individual flow shares the bandwidth with the proportion $d_{mn}/S_s^{\mathrm{out}}$.
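The corollary translates into a simple rate split per hypervisor port. A minimal sketch on the egress side, assuming the coflow's egress rate on the host has already been allocated (names are illustrative):

```python
def flow_rates_on_egress(flows, r_out):
    """Split a coflow's egress rate on one hypervisor among its flows.

    flows : dict mapping (src_vm, dst_vm) -> remaining bytes on this host
    r_out : egress rate allocated to the coflow on this host
    Each flow gets r_out weighted by its share of the aggregate traffic,
    so all flows on this port finish together (Corollary 1).
    """
    total = sum(flows.values())
    if total == 0:
        return {f: 0.0 for f in flows}
    return {f: r_out * size / total for f, size in flows.items()}
```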
Corollary 1 indicates that enforcing scheduling at the granularity of coflows and hypervisors is theoretically sound: this control granularity attains the same performance as the finer granularity of individual flows and VMs. The design requires fewer rate limiters than other implementations and reduces the computational load of the core algorithms.
Given the above definitions, we formulate the problem of minimizing $T_k$ as problem (3): minimize the completion time $T_k$ subject to constraints (3a)-(3d). Constraint (3a) enforces that the completion time of all flows equals the CCT; with the minimum bandwidth allocated, the residual bandwidth is released to admit other coflows. Constraints (3c) and (3d) represent the placement constraints. However, problem (3) is difficult to solve directly because it is a nonlinear program with binary integer variables. To facilitate the solution, we define $\tau_k = 1/T_k$ and work with this reciprocal; problem (3) can then be rewritten as problem (4).
The boundary conditions of problem (4) carry over from the constraints of problem (3). We then introduce two auxiliary variables that substitute for the nonlinear product terms in problem (4); with this substitution, the problem is transformed into problem (5), which inherits the corresponding boundary conditions.
Problem (5) is an LP with a limited number of variables and constraints that can be solved in a timely manner. We analyze the computational complexity of the algorithm from two perspectives: bandwidth allocation and VM placement. For bandwidth allocation, as we only conduct rate limiting on each hypervisor, the number of variables is limited to the number of physical hosts $|H|$. For VM placement, each iteration of the computation generates the VM placement for one application, and the number of variables is bounded by $|V|\cdot|H|$. The computational overhead of running the LP is therefore bounded by $|H|$ and $|V|\cdot|H|$. In practice, the set of hosts available for hosting an application is usually limited to several adjacent racks, i.e., approximately dozens of hosts, and the number of VMs requested by an application is also very limited. Therefore, this small-scale optimization can be computed efficiently and with little overhead.
After the auxiliary variables are obtained from the LP solution, the placement variables $x_{mj}$ and the allocated rates $r_{kj}^{\mathrm{out}}$ and $r_{kj}^{\mathrm{in}}$ can be recovered by reversing the substitution.
In the previous computation, since we relax the binary integer variables to fractional ones, the solution may contain placement values $x_{mj}$ that are fractional. A fractional $x_{mj}$ means that VM $v_m$ would have to be split and placed on several PMs, which is not practicable. Thus, we resort to rounding: $v_m$ is placed on the PM $h_j$ with the largest fractional value, for which we set $x_{mj}=1$ (and the remaining entries of its row to 0). In summary, the rationale of our algorithm is to integrate coflow scheduling and VM placement: scheduling pursues the optimal CCT, and VM placement cooperates to approximate this goal. The heuristic for pursuing a minimal CCT is summarized in Algorithm 2.
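A minimal sketch of this rounding step, assuming the fractional placement is available as a numpy array (slot-capacity checks are omitted here):

```python
import numpy as np

def round_placement(x_frac):
    """Round a fractional placement to an integral one: each VM goes to the
    host with its largest fractional value (slot-capacity checks omitted)."""
    V, H = x_frac.shape
    x = np.zeros((V, H), dtype=int)
    x[np.arange(V), x_frac.argmax(axis=1)] = 1
    return x
```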
Finally, we analyze the performance of Algorithm 2 using the standard notion of the approximation ratio. The approximation ratio of Algorithm 2 is defined as the supremum of $T_k/T_k^{*}$, where $T_k^{*}$ is the minimum CCT of $C_k$ and $T_k$ is the CCT obtained by our algorithm.
Theorem 1 Algorithm 2 has an approximation ratio of 2, that is, $T_k \le 2\,T_k^{*}$.
Proof To prove this theorem, it suffices to show the equivalent proposition $\tau_k \ge \tau_k^{*}/2$, where $\tau_k$ and $\tau_k^{*}$ are the inverses of $T_k$ and $T_k^{*}$, respectively. Let $\tau^{\mathrm{LP}}$ denote the optimal objective of problem (5). Problem (5) is the relaxation of the binary constraints in problem (4), and the feasible region of the linear formulation sets an upper bound $\tau^{\mathrm{LP}} \ge \tau_k^{*}$.
Each multiplier $x_{mj}$ is mathematically equivalent to a graph cut in which VMs and hosts are vertices and the cut separates VM $v_m$ from the hosts other than $h_j$. The physical meaning of this cut is that we place $v_m$ on $h_j$ with proportion $x_{mj}$. In our heuristic, we pick, for each VM, only the $x_{mj}$ with the largest value and discard the rest, and we compare the weight of each selected cut with the weight of the corresponding cut in the optimal solution. Together, the selected cuts separate every VM from the hosts it is not assigned to, which is exactly the final placement.
Note that each cut is incident to two of these components, so each edge appears in two of the cuts and the total cut weight counts every edge at most twice. Therefore, the placement produced by the rounding is within a factor of two of the optimum, which gives $\tau_k \ge \tau_k^{*}/2$, i.e., $T_k \le 2\,T_k^{*}$.
The analytical result provides a worst-case guarantee for the algorithm. The derived bound is loose in theory, but in practice the algorithm is effective in improving the CCT of real applications in the cloud environment.
Fig. 3 Effectiveness of VirtCO in improving single CCT.
Fig. 4 Effectiveness of VirtCO in improving average CCT.
Fig. 5 Overhead introduced by VirtCO.
Fig. 6 Impact of coflow width.
Fig. 7 Impact of coflow length.
Fig. 8 Impact of coflow skew.