School of Software Engineering, Tongji University, Shanghai201804, China.
Show Author Information
Hide Author Information
Abstract
Virtualization is the most important technology in the unified resource layer of cloud computing systems. Static placement and dynamic management are two types of Virtual Machine (VM) management methods. VM dynamic management is based on the structure of the initial VM placement, and this initial structure will affect the efficiency of VM dynamic management. When a VM fails, cloud applications deployed on the faulty VM will crash if fault tolerance is not considered. In this study, a model of initial VM fault-tolerant placement for star topological data centers of cloud systems is built on the basis of multiple factors, including the service-level agreement violation rate, resource remaining rate, power consumption rate, failure rate, and fault tolerance cost. Then, a heuristic ant colony algorithm is proposed to solve the model. The service-providing VMs are placed by the ant colony algorithms, and the redundant VMs are placed by the conventional heuristic algorithms. The experimental results obtained from the simulation, real cluster, and fault injection experiments show that the proposed method can achieve better VM fault-tolerant placement solution than that of the traditional first fit or best fit descending method.
No abstract is available for this article. Click the button above to view the PDF directly.
References
[1]
I.Foster, Y.Zhao, I.Raicu, and S. Y.Lu, Cloud computing and grid computing 360-degree compared, in Proc. 2008 Grid Computing Environments Workshop, Austin, TX, USA, 2008, pp. 1-10.
M.Nelson, B. H.Lim, and G.Hutchins, Fast transparent migration for virtual machines, in Proc. 2005 USENIX Annu. Technical Conf., Anaheim, CA, USA, 2005, pp. 391-394.
[4]
M.Lee, A.Krishnakumar, P.Krishnan, N.Singh, and S.Yajnik, Hypervisor-assisted application checkpointing in virtualized environments, in Proc. IEEE/IFIP 41st Int. Conf. Dependable Systems & Networks, Hong Kong, China, 2011, pp. 371-382.
X.Zhang, Z. G.Huo, J.Ma, and D.Meng, Fast and live whole-system migration of virtual machines, (in Chinese), Journal of Computer Research and Development, vol. 49, no. 3, pp. 661-668, 2012.
F.Xu, F. M.Liu, L. H.Liu, H.Jin, B.Li, and B. C.Li, iAware: Making live migration of virtual machines interference-aware in the cloud, IEEE Transactions on Computers, vol. 63, no.12, pp. 3012-3025, 2014.
K. J.Ye, Z. H.Wu, C.Wang, B. B.Zhou, W. S.Si, X. H.Jiang, and A. Y.Zomaya, Profiling-based workload consolidation and migration in virtualized data centers, IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 3, pp. 878-890, 2015.
H. K.Liu and B. S.He, VMbuddies: Coordinating live migration of multi-tier applications in cloud environments, IEEE Transactions on Parallel and Distributed Systems, vol. 26, no. 4, pp. 1192-1205, 2015.
J.Zhu, W.Dong, Z. F.Jiang, X. G.Shi, Z.Xiao, and X. M.Li, Improving the performance of hypervisor-based fault tolerance, in Proc. IEEE Int. Symp. Parallel & Distributed Processing, Atlanta, GA, USA, 2010, pp. 1-10.
J.Zhu, Z. F.Jiang, Z.Xiao, and X. M.Li, Optimizing the performance of virtual machine synchronization for fault tolerance, IEEE Transactions on Computers, vol. 60, no. 12, pp. 1718-1729, 2011.
D.Shen, J. Z.Luo, F.Dong, and J. X.Zhang, VirtCo: Joint coflow scheduling and virtual machine placement in cloud data centers, Tsinghua Science and Technology, vol. 24, no. 5, pp. 630-644, 2019.
Z.Liu, S. J.Sun, J.Xing, Z.Fu, X. H.Hu, J. W.Pi, X. F.Yang, Y. S.Lu, and J.Li, MN-SLA: A modular networking SLA framework for cloud management system, Tsinghua Science and Technology, vol. 23, no. 6, pp. 635-644, 2018.
Q.Li, Q. F.Hao, L. M.Xiao, and Z. J.Li, Adaptive management and multi-objective optimization for virtual machine placement in cloud computing, (in Chinese), Chinese Journal of Computers, vol. 34, no. 12, pp. 2253-2264, 2011.
K.Tsakalozos, M.Roussopoulos, and A.Delis, Hint-based execution of workloads in clouds with Nefeli, IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 7, pp. 1331-1340, 2013.
F.Farahnakian, A.Ashraf, T.Pahikkala, P.Liljeberg, J.Plosila, I.Porres, and H.Tenhunen, Using ant colony system to consolidate VMs for green cloud computing, IEEE Transactions on Services Computing, vol. 8, no. 2, pp. 187-198, 2015.
E. G.Coffman, M. R.Garey, and D. S.Johnson, Approximation algorithms for bin packing: A survey, in Approximation Algorithms for NP-Hard Problems. Boston, MA, USA: PWS Publishing, 1997, pp. 46-93.
[17]
N.Bobroff, A.Kochut, and K.Beaty, Dynamic placement of virtual machines for managing SLA violations, in Proc. 10th IFIP/IEEE Int. Symp. Integrated Management, Munich, Germany, 2007, pp. 119-128.
S.Chaisiri, B. S.Lee, and D.Niyato, Optimization of resource provisioning cost in cloud computing, IEEE Transactions on Services Computing, vol. 5, no. 2, pp. 164-177, 2012.
C. H.Lien, Y. W.Bai, and M. B.Lin, Estimation by software for the power consumption of streaming-media servers, IEEE Transactions on Instrumentation and Measurement, vol. 56, no. 5, pp. 1859-1870, 2007.
A.Verma, G.Dasgupta, T. K.Nayak, P.De, and R.Kothari, Server workload analysis for power minimization using consolidation, in Proc. 2009 Conf. USENIX Annu. Technical Conf., San Diego, CA, USA, 2009, p. 28.
[21]
J. K.Dong, H. B.Wang, and S. D.Cheng, Energy-performance tradeoffs in IaaS cloud with virtual machine scheduling, China Communications, vol. 12, no. 2, pp. 155-166, 2015.
X.Jing and J. A. B.Fortes, Multi-objective virtual machine placement in virtualized data center environments, in Proc. 2010 IEEE/ACM Int’l Conf. Green Computing and Communications & Int’l Conf. Cyber, Physical and Social Computing, Hangzhou, China, 2010, pp. 179-188.
[23]
F.Ma, F.Liu, and Z.Liu, Multi-objective optimization for initial virtual machine placement in cloud data center, Journal of Information and Computational Science, vol. 9, no. 16, pp. 5029-5038, 2012.
S. N.Wang, H. X.Gu, and G.Wu, A new approach to multi-objective virtual machine placement in virtualized data center, in Proc. IEEE 8th Int. Conf. Networking, Architecture and Storage, Xi’an, China, 2013, pp. 331-335.
F.Machida, M.Kawato, and Y.Maeno, Redundant virtual machine placement for fault-tolerant consolidated server clusters, in Proc. 2010 IEEE Network Operations and Management Symposium, Osaka, Japan, 2010, pp. 32-39.
G. Y.Jung, K. R.Joshi, M. A.Hiltunen, R. D.Schlichting, and C.Pu, Performance and availability aware regeneration for cloud based multitier applications, in Proc. IEEE/IFIP Int. Conf. Dependable Systems and Networks, Chicago, IL, USA, 2010, pp. 497-506.
E.Bin, O.Biran, O.Boni, E.Hadad, E. K.Kolodner, Y.Moatti, and D. H.Lorenz, Guaranteeing high availability goals for virtual machine placement, in Proc. 31st Int. Conf. Distributed Computing Systems, Minneapolis, MN, USA, 2011, pp. 700-709.
D.Epping and F.Denneman, VMware vSphere 4.1 HA and DRS Technical Deepdive. North Charleston, SC, USA: CreateSpace, 2010, pp. 15-22.
[29]
Z. B.Zhen, T. C.Zhou, M. R.Lyu, and I.King, FTCloud: A component ranking framework for fault-tolerant cloud applications, in Proc. IEEE 21st Int. Symp. Software Reliability Engineering, San Jose, CA, USA, 2010, pp. 398-407.
Z. B.Zheng, T. C.Zhou, M. R.Lyu, and I.King, Component ranking for fault-tolerant cloud applications, IEEE Transactions on Services Computing, vol. 5, no. 4, pp. 540-550, 2012.
F.Hermenier, J.Lawall, and G.Muller, BtrPlace: A flexible consolidation manager for highly available applications, IEEE Transactions on Dependable and Secure Computing, vol. 10, no. 5, pp. 273-286, 2013.
X. X.Cui, Multi-Objective Evolutionary Algorithms and Their Applications, (in Chinese). Beijing, China: National Defense Industry Press, 2006, pp. 10-20.
[33]
T.Stützle and H. H.Hoos, MAX-MIN ant system, Future Generation Computer Systems, vol. 16, no. 8, pp. 889-914, 2000.
R. N.Calheiros, R.Ranjan, A.Beloglazov, C. A. F.De Rose, and R.Buyya, CloudSim: A toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms, Software: Practice and Experience, vol. 41, no.1, pp. 23-50, 2011.
E.Cecchet, A.Chanda, S.Elnikety, J.Marguerite, and W.Zwaenepoel, Performance comparison of middleware architectures for generating dynamic web content, in Proc. 2003 ACM/IFIP/USENIX Int. Conf. Middleware, Rio de Janeiro, Brazil, 2003, pp. 242-261.
A.Jin and J. H.Jiang, Fault injection scheme for embedded systems at machine code level and verification, in Proc. 15th IEEE Pacific Rim Int. Symp. Dependable Computing, Shanghai, China, 2009, pp. 55-62.
J. W.Hu and J. H.Jiang, Design and implementation of a fault injection mechanism for software reliability evaluation, (in Chinese), Journal of Computer-Aided Design & Computer Graphics, vol. 24, no. 6, pp. 741-751, 2012.
D. Q.Zhang, J. H.Jiang, and L. B.Chen, A method for validating the effectiveness of fault clustering and failure clustering of programs, Scientia Sinica Informationis, vol. 44, no. 10, pp. 1323-1344, 2014.
Zhang W, Chen X, Jiang J. A Multi-Objective Optimization Method of Initial Virtual Machine Fault-Tolerant Placement for Star Topological Data Centers of Cloud Systems. Tsinghua Science and Technology, 2021, 26(1): 95-111. https://doi.org/10.26599/TST.2019.9010044
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
10.26599/TST.2019.9010044.F1
Structure of the unified resource layer and fault-tolerant placement management system.
10.26599/TST.2019.9010044.F2
Diagram of resource usage when three VMs are placed on a physical node.
3.2 Heuristic ant colony algorithm
The following part describes the traditional heuristic algorithm, which can solve the VM fault-tolerant placement problem, and proposes a heuristic ACO algorithm.
(1) FFD-based algorithm of VM fault-tolerant placement
The FFD-based algorithm of VM fault-tolerant placement is described in Algorithm 1. Its input contains , , , , , , , , and , and its output contains A and . A is the placement matrix that describes how the VMs are placed on physical nodes. This algorithm can be divided into three steps, as follows:
Step 1. Sort the resource request of VMs in descending order, expressed as , , …, .
Step 2. Place on the physical node . If does not meet the requirements, then select the next physical node , until the resource request of is met.
Step 3. To meet the resource request of the VM, if physical nodes , , …, are already in use at this time, then select one or some of them to place in numerical order of subscript, ensuring that the resource request is met and its corresponding redundant and service-providing VMs are not on the same physical node. If one or some physical nodes that meet the requirements do exist, then select the one that has the minimum subscript to place . By contrast, if such a physical node does not exist, then select a new physical node to place .
(2) BFD-based algorithm of VM fault-tolerant placement
The BFD-based algorithm of VM fault-tolerant placement is described in Algorithm 2. Its input and output are the same as those of the FFD-based algorithm. This algorithm can be divided into three steps, as follows:
Step 1. Sort the resource requests of VMs in descending order, expressed as , , …, .
Step 2. Place on the physical node . If does not meet the requirements, then select the next physical node , until the resource request of is met.
Step 3. To meet the resource request of the VM, if physical nodes , , …, are already in use at this time, then select one or some of them to place in numerical order of the subscript, ensuring that the resource request is met and its corresponding redundant and service-providing VMs are not on the same physical node. If one or some physical nodes that meet the requirements do exist, then select the one which can minimize the resource remaining rate to place . By contrast, if such a physical node does not exist, then select a new physical node to place .
The traditional ant colony algorithm optimization depends on the positive feedback mechanism, but it often leads to local optimization or premature convergence. To avoid this situation, this study uses the Max-Min Ant System (MMAS) proposed by Stützle and Hoos[
33
], which can search for high-quality solutions and avoid premature convergence or falling into the local optimum.
We assume that the number of ants is in iteration , the number of ants on the -th VM is , the pheromone of the -th VM on the physical node is , the probability that ant places the -th VM on physical node is , the heuristic function is , the information heuristic factor is , the expected heuristic factor is , the tabu list of ant is (the set of VMs that has been placed), the volatile coefficient , the pheromone increment of the -th VM on physical node is , and the set of VMs that has not yet been placed and can be placed on physical node by ant is . If , then the transition probability function can be expressed as Eq. (
17
); otherwise, its value is zero.
where is a parameter used to control the influence of the information accumulated by ants during its movement process and is a parameter used to control the influence of the heuristic information in path selection. In iteration , the pheromone function of the -th VM on physical node is expressed as Eq. (
18
):
In the first iteration, , is a constant value and is the pheromone evaporation coefficient. In MMAS, we need to set the upper and lower bounds for pheromone. We assume that , where the initial value of is , , and is the lower bound factor (i.e., ). The pheromone increment of the -th VM on the physical node is expressed as Eq. (
19
):
where represents the optimal solution set of this model. According to Eq. (
19
), the value of can be calculated. In MMAS, the pheromone increment is equal to the value of the fitness function in the optimal solution. However, in this algorithm, the pheromone increment is equal to the value of the fitness function in the optimal solution divided by the pheromone, which can avoid falling into the local optimum prematurely and obtain the global optimal approximate solution set eventually. In Eq. (
17
), is the heuristic function when the -th VM is placed on physical node and represents the desirability of the -th VM placed on physical node . Its function can be expressed as Eq. (
20
):
where , , and represent the ratios of the request of the -th VM for CPU, memory, and network bandwidth resources to the remaining resources of physical node , respectively. As shown in Eq. (
20
), the ant preferentially places VMs with large requests for CPU, memory, and network bandwidth resources.
3.2 Heuristic ant colony algorithm
The following part describes the traditional heuristic algorithm, which can solve the VM fault-tolerant placement problem, and proposes a heuristic ACO algorithm.
(1) FFD-based algorithm of VM fault-tolerant placement
The FFD-based algorithm of VM fault-tolerant placement is described in Algorithm 1. Its input contains , , , , , , , , and , and its output contains A and . A is the placement matrix that describes how the VMs are placed on physical nodes. This algorithm can be divided into three steps, as follows:
Step 1. Sort the resource request of VMs in descending order, expressed as , , …, .
Step 2. Place on the physical node . If does not meet the requirements, then select the next physical node , until the resource request of is met.
Step 3. To meet the resource request of the VM, if physical nodes , , …, are already in use at this time, then select one or some of them to place in numerical order of subscript, ensuring that the resource request is met and its corresponding redundant and service-providing VMs are not on the same physical node. If one or some physical nodes that meet the requirements do exist, then select the one that has the minimum subscript to place . By contrast, if such a physical node does not exist, then select a new physical node to place .
(2) BFD-based algorithm of VM fault-tolerant placement
The BFD-based algorithm of VM fault-tolerant placement is described in Algorithm 2. Its input and output are the same as those of the FFD-based algorithm. This algorithm can be divided into three steps, as follows:
Step 1. Sort the resource requests of VMs in descending order, expressed as , , …, .
Step 2. Place on the physical node . If does not meet the requirements, then select the next physical node , until the resource request of is met.
Step 3. To meet the resource request of the VM, if physical nodes , , …, are already in use at this time, then select one or some of them to place in numerical order of the subscript, ensuring that the resource request is met and its corresponding redundant and service-providing VMs are not on the same physical node. If one or some physical nodes that meet the requirements do exist, then select the one which can minimize the resource remaining rate to place . By contrast, if such a physical node does not exist, then select a new physical node to place .
The traditional ant colony algorithm optimization depends on the positive feedback mechanism, but it often leads to local optimization or premature convergence. To avoid this situation, this study uses the Max-Min Ant System (MMAS) proposed by Stützle and Hoos[
33
], which can search for high-quality solutions and avoid premature convergence or falling into the local optimum.
We assume that the number of ants is in iteration , the number of ants on the -th VM is , the pheromone of the -th VM on the physical node is , the probability that ant places the -th VM on physical node is , the heuristic function is , the information heuristic factor is , the expected heuristic factor is , the tabu list of ant is (the set of VMs that has been placed), the volatile coefficient , the pheromone increment of the -th VM on physical node is , and the set of VMs that has not yet been placed and can be placed on physical node by ant is . If , then the transition probability function can be expressed as Eq. (
17
); otherwise, its value is zero.
where is a parameter used to control the influence of the information accumulated by ants during its movement process and is a parameter used to control the influence of the heuristic information in path selection. In iteration , the pheromone function of the -th VM on physical node is expressed as Eq. (
18
):
In the first iteration, , is a constant value and is the pheromone evaporation coefficient. In MMAS, we need to set the upper and lower bounds for pheromone. We assume that , where the initial value of is , , and is the lower bound factor (i.e., ). The pheromone increment of the -th VM on the physical node is expressed as Eq. (
19
):
where represents the optimal solution set of this model. According to Eq. (
19
), the value of can be calculated. In MMAS, the pheromone increment is equal to the value of the fitness function in the optimal solution. However, in this algorithm, the pheromone increment is equal to the value of the fitness function in the optimal solution divided by the pheromone, which can avoid falling into the local optimum prematurely and obtain the global optimal approximate solution set eventually. In Eq. (
17
), is the heuristic function when the -th VM is placed on physical node and represents the desirability of the -th VM placed on physical node . Its function can be expressed as Eq. (
20
):
where , , and represent the ratios of the request of the -th VM for CPU, memory, and network bandwidth resources to the remaining resources of physical node , respectively. As shown in Eq. (
20
), the ant preferentially places VMs with large requests for CPU, memory, and network bandwidth resources.
3.3 Multi-objective optimization algorithm of initial VM fault-tolerant placement
The heuristic ant colony algorithm includes the FFDACO and BFDACO algorithms. The difference is that, when an ant has placed all of the service-providing VMs, the redundant VMs can be placed by the FFD or BFD algorithm.
The FFDACO-based algorithm of VM fault-tolerant placement is described in Algorithm 3. Its input and output are similar to those of Algorithms 1 and 2. The difference is that the FFDACO-based algorithm has several additional parameters to MMAS that we have discussed in Section 3.2. This algorithm can be divided into the following steps:
Step 1. Initialize the parameters, i.e., the number of ants is and the number of iterations is .
Step 2. Select the physical node randomly and place ant on it. Then, the ant starts to search VMs.
Step 3. If set C (set of service-providing VMs) is not empty, then proceed to Step 4; otherwise, proceed to Step 6.
Step 4. The ant searches the VMs from , which contains the VMs that have not been placed and can be placed on physical node . If set is empty, then select the next physical node randomly and proceed to Step 3.
Step 5. The ant updates (it represents the desirability of the -th VM placed on physical node ) according to Eq. (
20
), selects the -th VM from set , places it on the physical node using the roulette wheel selection algorithm (which selects randomly by probability) according to Eq. (
17
), updates the placement matrix A, deletes the -th VM from set C, and updates (useable resources on the physical node ). Then, proceed to Step 3.
Step 6. Place the redundant VMs. Sort the VMs in set on the basis of the resource requests in descending order. Then, new VM sequence can be expressed as VM, VM, …, VM.
Step 7. If set is not empty, then proceed to Step 8; otherwise, proceed to Step 9.
Step 8. To meet the resource request of VM in set , if physical nodes , , …, are already in use at this time, then select one or some of them to place the VM in numerical order of subscript, ensuring that the resource request is met and its corresponding redundant and service-providing VMs are not on the same physical node. Then, if one or some physical nodes that meet the requirements do exist, then select the one that has the minimum subscript to place the VM. If such physical node does not exist, then select a new physical node to place the VM. Finally, delete the VM from , update the placement matrix A, and proceed to Step 7.
Step 9. If , then = A; otherwise, compare with , if (, then = A. Reset , , C, and . Then set , if , proceed to Step 2.
Step 10. Update according to Eqs. (
18
) and (
19
), if , set ; if , set . Then set , if , reset , and proceed to Step 2.
Step 11. Calculate U according to and by Eq. (
16
). Then, output and .
The difference between BFDACO and FFDACO algorithms is determined in Steps 6 - 8. In the BFDACO algorithm, these steps adopt the BFD method described in Algorithm 2.
According to Ref. [
33
], the time complexity of the proposed heuristic ant colony algorithm is , which increases with the increment of the number of VMs , the number of ants , and the number of iterations , respectively. Its space complexity is =; therefore, it increases with the increment of the number of VMs and the number of ants.
The four algorithms (i.e., FFD, BFD, FFDACO, and BFDACO) are all implemented using the Java programming language. Section 4 describes the three experiments, i.e., simulation, real cluster, and fault injection experiments, conducted in this study and explains all of the related algorithmic languages and environments.
10.26599/TST.2019.9010044.F3
Variations of the fitness function values with different weights of placement factors.
10.26599/TST.2019.9010044.F4
Comparison of the values of the placement factors among different algorithms.
10.26599/TST.2019.9010044.F5
Variation of the fitness function value based on the FFDACO algorithm.
10.26599/TST.2019.9010044.F6
Variation of the fitness function value based on the BFDACO algorithm.
10.26599/TST.2019.9010044.F7
Three-layer structure based on RUBiS.
10.26599/TST.2019.9010044.F8
Implementation of the prototype system.
10.26599/TST.2019.9010044.F9
Fitness function values in the simulation experiment.
10.26599/TST.2019.9010044.F10
Fitness function values in the real cluster experiment.