School of Computer Science and Engineering, Southeast University, Nanjing 211189, China.
Abstract
With the continuous enrichment of cloud services, an increasing number of applications are being deployed in data centers. These emerging applications are often communication-intensive and data-parallel, and their performance is closely tied to the underlying network. Owing to their distributed nature, the applications consist of tasks that each involve a collection of parallel flows. Traditional techniques that optimize flow-level metrics are agnostic to task-level requirements, leading to poor application-level performance. In this paper, we address the heterogeneous task-level requirements of applications and propose task-aware flow scheduling. First, we model tasks' sensitivity to their completion time by utilities. Second, on the basis of Nash bargaining theory, we establish a flow scheduling model with heterogeneous utility characteristics, and analyze it using the Lagrange multiplier method and the KKT conditions. Third, we propose two utility-aware bandwidth allocation algorithms with different practical constraints. Finally, we present Tasch, a system that enables tasks to maintain high utilities and guarantees the fairness of utilities. To demonstrate the feasibility of our system, we conduct comprehensive evaluations with real-world traffic traces. Compared with per-flow mechanisms, communication stages complete up to 1.4× faster on average, task utilities increase by up to 2.26×, and the fairness of tasks improves by up to 8.66× under Tasch.
References
[1] C. Y. Hong, M. Caesar, and P. B. Godfrey, Finishing flows quickly with preemptive scheduling, ACM SIGCOMM Computer Communication Review, vol. 42, no. 4, pp. 127-138, 2012.
C. Wilson, H. Ballani, T. Karagiannis, and A. Rowtron, Better never than late: Meeting deadlines in datacenter networks, ACM SIGCOMM Computer Communication Review, vol. 41, no. 4, pp. 50-61, 2011.
M. Chowdhury, Y. Zhong, and I. Stoica, Efficient coflow scheduling with Varys, ACM SIGCOMM Computer Communication Review, vol. 44, no. 4, pp. 443-454, 2014.
L. Chen, W. Cui, B. Li, and B. Li, Optimizing coflow completion times with utility max-min fairness, in IEEE INFOCOM 2016 - The IEEE International Conference on Computer Communications, 2016, pp. 1-9.
W. Bai, L. Chen, K. Chen, D. Han, C. Tian, and H. Wang, Information-agnostic flow scheduling for commodity data centers, in USENIX Conference on Networked Systems Design and Implementation, 2015, pp. 455-468.
[9] T. Benson, A. Anand, A. Akella, and M. Zhang, MicroTE: Fine grained traffic engineering for data centers, in Proceedings of the Seventh Conference on Emerging Networking Experiments and Technologies, 2011.
[10] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, Hedera: Dynamic flow scheduling for data center networks, in USENIX Symposium on Networked Systems Design and Implementation, 2010, pp. 281-296.
[11] A. Munir, G. Baig, S. M. Irteza, I. A. Qazi, A. X. Liu, and F. R. Dogar, Friends, not foes: Synthesizing existing transport strategies for data center networks.
H. Wu, Z. Feng, C. Guo, and Y. Zhang, ICTCP: Incast congestion control for TCP in data-center networks, IEEE/ACM Transactions on Networking, vol. 21, no. 2, pp. 345-358, 2013.
D. Zats, T. Das, P. Mohan, D. Borthakur, and R. Katz, DeTail: Reducing the flow completion time tail in datacenter networks, ACM SIGCOMM Computer Communication Review, vol. 42, no. 4, pp. 139-150, 2012.
M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan, Data center TCP (DCTCP), ACM SIGCOMM Computer Communication Review, vol. 40, no. 4, pp. 63-74, 2010.
M. Alizadeh, A. Kabbani, T. Edsall, B. Prabhakar, A. Vahdat, and M. Yasuda, Less is more: Trading a little bandwidth for ultra-low latency in the data center, in USENIX Conference on Networked Systems Design and Implementation, 2012, p. 19.
[16] S. Floyd, K. K. Ramakrishnan, and D. L. Black, The addition of Explicit Congestion Notification (ECN) to IP, RFC 3168, https://rfc-editor.org/rfc/rfc3168.txt, Sep. 2001.
F. Lu, J. Li, S. Jiang, Y. Song, and F. Wang, Geographic information and node selfish-based routing algorithm for delay tolerant networks, Tsinghua Science and Technology, vol. 22, no. 3, pp. 243-253, 2017.
D. Tao, Z. Lin, and B. Wang, Load feedback-based resource scheduling and dynamic migration-based data locality for virtual hadoop clusters in OpenStack-based clouds, Tsinghua Science and Technology, vol. 22, no. 2, pp. 149-159, 2017.
L. Liu, J. Li, and J. Wu, TAPS: Task-aware preemptive flow scheduling, in IEEE International Workshop on Local & Metropolitan Area Networks, 2015, pp. 1-2.
J. Jiang, S. Ma, B. Li, and B. Li, Tailor: Trimming coflow completion times in datacenter networks, in International Conference on Computer Communication and Networks, 2016.
[21] F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, Decentralized task-aware scheduling for data center networks, ACM SIGCOMM Computer Communication Review, vol. 44, no. 4, pp. 431-442, 2013.
S. Luo, H. Yu, Y. Zhao, S. Wang, S. Yu, and L. Li, Towards practical and near-optimal coflow scheduling for data center networks, IEEE Transactions on Parallel & Distributed Systems, vol. 27, no. 11, pp. 3366-3380, 2016.
H. Zhang, L. Chen, B. Yi, K. Chen, M. Chowdhury, and Y. Geng, CODA: Toward automatically identifying and scheduling coflows in the dark, in Proceedings of the 2016 ACM SIGCOMM Conference, 2016, pp. 160-173.
[24] Z. Li, Y. Zhang, D. Li, K. Chen, and Y. Peng, OPTAS: Decentralized flow monitoring and scheduling for tiny tasks, in IEEE International Conference on Computer Communications, 2016, pp. 1-9.
[25] H. Susanto, H. Jin, and K. Chen, Stream: Decentralized opportunistic inter-coflow scheduling for datacenter networks, in IEEE International Conference on Network Protocols, 2016, pp. 1-10.
C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, Improving datacenter performance and robustness with multipath TCP, in ACM SIGCOMM 2011, 2011, pp. 266-277.
A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta, VL2: A scalable and flexible data center network, Communications of the ACM, vol. 54, no. 3, pp. 95-104, 2011.
G. Dhandapani and A. Sundaresan, Netlink sockets, overview, Tech. Report, Information and Telecommunications Technology Center, Department of Electrical Engineering & Computer Science, The University of Kansas, Lawrence, KS, USA, 1999.
Dong F, Guo X, Zhou P, et al. Task-Aware Flow Scheduling with Heterogeneous Utility Characteristics for Data Center Networks. Tsinghua Science and Technology, 2019, 24(4): 400-411. https://doi.org/10.26599/TST.2018.9010122
Fig. 1. Data center virtualization network model.
Fig. 2. Sigmoid utility function curve.
3.4 Model analysis and solutions
We can solve Formula (14) by using the Lagrange multiplier method and the KKT conditions. The Lagrange function that corresponds to the original problem can be expressed as follows:
The associated calculation formulas are as follows:
According to the KKT conditions, solving the original problem only requires solving the following system of equations:
This system of equations constitutes a constrained convex optimization problem, and the cost of solving it grows significantly with the cluster size and the number of tasks. Therefore, this paper uses a heuristic algorithm to solve the original problem.
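The structure of this derivation can be illustrated on a generic bandwidth-constrained utility maximization; this is a standard illustrative form, not necessarily the exact shape of Formula (14):

```latex
% Illustrative problem: x_i is the bandwidth of task i, U_i its utility
% function, and C the access link capacity.
\max_{x \ge 0} \; \sum_i U_i(x_i) \quad \text{s.t.} \quad \sum_i x_i \le C
% Lagrangian with multiplier \lambda \ge 0 for the capacity constraint:
L(x, \lambda) = \sum_i U_i(x_i) - \lambda \Big( \sum_i x_i - C \Big)
% Stationarity and complementary slackness (KKT conditions):
\frac{\partial L}{\partial x_i} = U_i'(x_i) - \lambda = 0, \qquad
\lambda \Big( \sum_i x_i - C \Big) = 0, \qquad \lambda \ge 0
```

In this generic form, the multiplier λ acts as a common "price" of bandwidth: every task's marginal utility equals λ at the optimum, which motivates the proportional-attenuation heuristics described below.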
The objective of this paper is to optimize the system bandwidth resource utility as much as possible while guaranteeing fairness among tasks. For each task, suppose its maximum utility value (which can be calculated from its utility function) and the utility it actually achieves are given; the task's utility completion rate is then defined as the ratio of the achieved utility to the maximum utility.
This paper uses the well-known Gini coefficient and Lorenz curve [28] from the field of economics to measure the fairness of utilities among tasks. In economics, the Lorenz curve describes the unfairness of social wealth distribution. The abscissa is the cumulative proportion of the population and the ordinate is the cumulative proportion of wealth; the 45-degree line represents an absolutely fair distribution of wealth, the polyline formed by the horizontal axis and the right vertical boundary represents an absolutely unfair distribution, and the arc in between is the Lorenz curve. The closer the arc lies to the 45-degree line, the fairer the distribution of wealth. Let A be the area between the Lorenz curve and the 45-degree line, and B be the area between the Lorenz curve and the absolutely unfair line. The Gini coefficient is then defined as the ratio A/(A+B), which measures the unfairness of the wealth distribution. In this paper, we use the cumulative proportion of tasks as the horizontal axis and the cumulative proportion of utility completion rates as the vertical axis, thereby applying the Lorenz curve and the Gini coefficient to measure the fairness of utilities among tasks.
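As a minimal sketch (not the paper's implementation), the fairness metric can be computed directly from the tasks' utility completion rates; the sorted-weighting formula below is equivalent to the Lorenz-curve area ratio A/(A+B):

```python
def gini(values):
    """Gini coefficient of non-negative values, via the sorted-weighting
    formula; equals the Lorenz area ratio A/(A+B). 0 means perfectly
    fair, values approaching 1 mean highly unfair."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sum of rank * value over the values sorted in ascending order.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Example: utility completion rates (achieved utility / maximum utility).
rates = [0.9, 0.6, 0.3]
fairness = gini(rates)  # closer to 0 means utilities are spread more fairly
```

A perfectly even distribution of completion rates yields a coefficient of 0, so lower values indicate fairer scheduling across tasks.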
To optimize the system bandwidth resource utility while guaranteeing fairness among tasks, this paper proposes two utility-aware task bandwidth allocation algorithms to solve the original problem. Depending on the functionality supported by the system and on its network conditions, we provide two variants: a non-blocking utility-aware task bandwidth allocation algorithm and a blocking one.
Non-blocking mode. The detailed process is shown in Algorithm 1, which consists of two steps: (1) With a utility proportional attenuation factor set, we first attempt to let every task reach its maximum utility; if this violates the access link bandwidth capacity constraint, the attenuation factor is applied to proportionally reduce the utility of each task, the bandwidth allocated to each task is recalculated from the attenuated utility, and the capacity constraint is checked again. (2) Once the access link bandwidth capacity constraint is satisfied, the remaining bandwidth is allocated to the task with the largest utility per unit of bandwidth, so as to maximize the system bandwidth resource utility.
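The two steps above can be sketched as follows. This is a hypothetical simplification of Algorithm 1: each task is abstracted as a pair (bandwidth needed for its maximum utility, marginal utility per unit of bandwidth), whereas the paper's algorithm works with full utility functions.

```python
def allocate_nonblocking(demands, capacity, decay=0.9, eps=1e-6):
    """Sketch: scale all tasks' allocations down by a common attenuation
    factor until they fit the access link capacity, then hand leftover
    bandwidth to the task with the best utility per unit of bandwidth.

    demands: list of (bandwidth_for_max_utility, marginal_utility) pairs.
    Returns the per-task bandwidth allocation.
    """
    scale = 1.0
    while scale > eps:
        alloc = [b * scale for b, _ in demands]
        if sum(alloc) <= capacity:
            break                 # capacity constraint satisfied
        scale *= decay            # attenuate every task proportionally
    else:
        alloc = [0.0 for _ in demands]
    leftover = capacity - sum(alloc)
    if leftover > 0 and demands:
        # Task with the largest utility gain per unit of bandwidth.
        best = max(range(len(demands)), key=lambda i: demands[i][1])
        alloc[best] += leftover
    return alloc
```

For example, two tasks each needing 6 units on a 10-unit link are both attenuated until they fit, after which the residual bandwidth goes to the task with the higher marginal utility.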
Blocking mode. In the non-blocking algorithm, arriving tasks never wait: all tasks in the system are allocated bandwidth proportionally by the scheduling algorithm. When the number of concurrent tasks exceeds what the access link can tolerate, each task receives too little bandwidth, task completion times become excessive, and some tasks may not complete within an acceptable time at all, causing performance degradation or even system crashes. We therefore control the number of concurrent tasks: a concurrency limit is set, and tasks that exceed it are queued for service. The detailed process is shown in Algorithm 2, which consists of four steps: (1) A task concurrency limit maxTask is set; when a task arrives, maxTask is decremented, and once maxTask reaches 0, subsequent tasks are queued for service. (2) With a utility proportional attenuation factor set, we attempt to let every admitted task reach its maximum utility; if the access link bandwidth capacity constraint is violated, each task's utility is attenuated proportionally and the per-task bandwidth is recalculated from the attenuated utility until the constraint is met. (3) Once the constraint is satisfied, the remaining bandwidth is allocated to the task with the largest utility per unit of bandwidth to maximize the system bandwidth resource utility. (4) When a task completes, maxTask is incremented, so that queued tasks waiting for service can be scheduled.
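The admission-control bookkeeping of steps (1) and (4) can be sketched as below; this is a hypothetical illustration of the maxTask counter and waiting queue, with bandwidth allocation for the admitted set left to the non-blocking procedure.

```python
from collections import deque

class BlockingScheduler:
    """Sketch of the admission-control part of Algorithm 2: at most
    max_tasks tasks run concurrently; excess arrivals queue until a
    running task completes."""

    def __init__(self, max_tasks):
        self.slots = max_tasks        # remaining concurrency budget (maxTask)
        self.waiting = deque()        # tasks queued for service
        self.running = set()          # admitted tasks sharing the link

    def arrive(self, task):
        if self.slots > 0:
            self.slots -= 1           # maxTask is decremented on arrival
            self.running.add(task)
        else:
            self.waiting.append(task) # over the limit: queue for service

    def complete(self, task):
        self.running.discard(task)
        self.slots += 1               # maxTask is incremented on completion
        if self.waiting:              # admit the next queued task, if any
            self.arrive(self.waiting.popleft())
```

With max_tasks = 2, a third arriving task waits in the queue and is admitted automatically when one of the first two completes.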
Therefore, under extreme network congestion, Algorithm 2 controls the number of concurrent tasks in the system and improves the system's ability to handle congestion.
Fig. 3. Task-level flow scheduling system architecture.
Fig. 4. Lorenz curve of task utility fairness.
Fig. 5. Task utility for different schemes.
Fig. 6. CDF curve of the task completion time.
Fig. 7. Impact of different T-granularity on task utilities.
Fig. 8. Impact of different T-granularity on average task completion time.
Fig. 9. Impact of different numbers of concurrent tasks on task utility.
Fig. 10. Impact of the number of concurrent tasks on the average task completion time.