With the breakthroughs in generative artificial intelligence (GAI) models, vast computational demands are placing an unprecedented burden on the data center (DC) energy supply. The cooling system, which maintains the safe and efficient operation of computing equipment, is the second-largest energy consumer in the DC. However, time-varying temperature gradients and power distributions pose a considerable challenge for efficient cooling management in DCs. To address this problem, this work proposes a multi-objective cooling control optimization (MCCO) method that minimizes cooling energy consumption while maximizing the rack cooling index (RCI), ensuring both the energy efficiency and the thermal safety of hybrid-cooled DCs. The proposed method relies on high-fidelity models that characterize the dynamic thermal evolution and cooling power. To this end, a novel network model (TCN-BiGRU-Attention) combining a temporal convolutional network (TCN), a bidirectional gated recurrent unit (BiGRU), and an attention mechanism is designed to capture multivariate time-series features and predict temperature changes in the thermal environment and cooling loops. Moreover, considering the complex heat transfer and operational characteristics of hybrid cooling systems, a machine learning (ML)-based power model is constructed to evaluate the holistic cooling power. Subsequently, the NSGA-II algorithm formulates the optimal cooling decision based on the predicted thermal distribution and cooling power, realizing a trade-off between energy consumption and cooling effectiveness. Numerical experiments using Marconi 100 data traces show that the proposed MCCO significantly reduces cooling energy consumption in both summer and winter while keeping the RCI above 95%.
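The abstract describes the temperature predictor only at the architecture level. The sketch below is one plausible PyTorch realization of the TCN-BiGRU-Attention stack: dilated causal convolutions extract local temporal features, a BiGRU models bidirectional dependencies, and an attention layer pools the sequence into a context vector for the prediction head. Layer widths, kernel sizes, the dilation schedule, and the additive attention form are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a TCN-BiGRU-Attention predictor (assumed hyperparameters).
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated causal convolution block (the TCN feature extractor)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation   # left-pad only => causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):                         # x: (batch, channels, time)
        x = nn.functional.pad(x, (self.pad, 0))
        return self.relu(self.conv(x))

class TCNBiGRUAttention(nn.Module):
    def __init__(self, n_features, hidden=64, horizon=1):
        super().__init__()
        self.tcn = nn.Sequential(
            TCNBlock(n_features, hidden, dilation=1),
            TCNBlock(hidden, hidden, dilation=2),
        )
        self.bigru = nn.GRU(hidden, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # attention scores per step
        self.head = nn.Linear(2 * hidden, horizon)

    def forward(self, x):                         # x: (batch, time, features)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.bigru(h)                      # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)    # weights over time steps
        ctx = (w * h).sum(dim=1)                  # attention-pooled context
        return self.head(ctx)                     # predicted temperature(s)

model = TCNBiGRUAttention(n_features=8)
y = model(torch.randn(16, 48, 8))                 # 48-step window, 8 sensors
```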
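Downstream, NSGA-II searches the cooling setpoints against the two learned objectives. A minimal sketch using the pymoo library follows; the decision variables (supply temperature and pump speed), their bounds, and the toy surrogates standing in for the learned thermal and power models are all hypothetical placeholders for the paper's trained models.

```python
# Hedged sketch: NSGA-II over two assumed cooling setpoints via pymoo.
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

# Toy surrogates standing in for the learned power and thermal models.
def predicted_cooling_power(t_supply, pump_speed):
    return 50.0 - 1.2 * t_supply + 30.0 * pump_speed ** 3

def predicted_rci(t_supply, pump_speed):
    return min(100.0, 110.0 - 0.8 * t_supply + 5.0 * pump_speed)

class CoolingProblem(ElementwiseProblem):
    def __init__(self):
        # x = [chilled-water supply temperature (degC), pump speed fraction]
        super().__init__(n_var=2, n_obj=2,
                         xl=np.array([16.0, 0.3]), xu=np.array([27.0, 1.0]))

    def _evaluate(self, x, out, *args, **kwargs):
        power = predicted_cooling_power(x[0], x[1])
        rci = predicted_rci(x[0], x[1])
        out["F"] = [power, -rci]   # pymoo minimizes; negate RCI to maximize

res = minimize(CoolingProblem(), NSGA2(pop_size=40), ("n_gen", 60), seed=1)
print(res.F[:3])                   # a few points on the Pareto front
```

A decision rule consistent with the abstract (e.g., among front points with RCI >= 95%, pick the lowest-power one) would then select the operating setpoint.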
As power demand in data centers increases, the power capacity of the power supply system has become an essential resource to optimize. Although many data centers use power oversubscription to make full use of this capacity, it carries unavoidable power supply risks. Improving power capacity utilization while ensuring power supply security has therefore become an important issue. To address this problem, we first formalize it and propose a risk evaluation metric called Weighted Power Supply Risk (WPSRisk). We then propose a method, named Hybrid Genetic Algorithm with Ant Colony System (HGAACS), that improves power capacity utilization and reduces power supply risk by optimizing server placement in the power supply system. HGAACS uses the historical power data of each server and searches for a better placement solution through population iteration. It not only retains the strong local search ability of the Ant Colony System (ACS) but also gains global search capability by incorporating genetic operators from the Genetic Algorithm (GA). To verify its performance, we experimentally compare HGAACS with five other placement algorithms. The results show that HGAACS outperforms the other algorithms in both improving power utilization and reducing power supply risk.
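The abstract names the hybrid metaheuristic but not its mechanics. Below is a simplified, illustrative sketch of the HGAACS idea: ACS ants construct server-to-PDU placements with the pseudo-random proportional rule, GA-style crossover and mutation perturb the ant solutions each generation, and the best solution deposits pheromone. The risk function (a stand-in for WPSRisk), the encoding, and all parameters are assumptions for demonstration, not the paper's exact design.

```python
# Toy hybrid GA + ACS for server placement (all parameters hypothetical).
import random

N_SERVERS, N_PDUS, PDU_CAP = 12, 3, 5.0
peak = [random.uniform(0.5, 1.5) for _ in range(N_SERVERS)]  # peak kW/server

def risk(assign):
    """Hypothetical stand-in for WPSRisk: total PDU overload."""
    load = [0.0] * N_PDUS
    for s, p in enumerate(assign):
        load[p] += peak[s]
    return sum(max(0.0, l - PDU_CAP) for l in load)

def construct(tau, q0=0.9):
    """ACS construction with the pseudo-random proportional rule."""
    assign = []
    for s in range(N_SERVERS):
        weights = tau[s]
        if random.random() < q0:                        # exploit best edge
            choice = max(range(N_PDUS), key=lambda p: weights[p])
        else:                                           # biased exploration
            choice = random.choices(range(N_PDUS), weights=weights)[0]
        assign.append(choice)
    return assign

def crossover(a, b):
    """One-point crossover over placement vectors (GA operator)."""
    cut = random.randrange(1, N_SERVERS)
    return a[:cut] + b[cut:]

def mutate(assign, rate=0.1):
    return [random.randrange(N_PDUS) if random.random() < rate else p
            for p in assign]

tau = [[1.0] * N_PDUS for _ in range(N_SERVERS)]        # pheromone matrix
best, best_risk = None, float("inf")
for gen in range(100):
    ants = [construct(tau) for _ in range(10)]
    ants += [mutate(crossover(*random.sample(ants, 2))) for _ in range(5)]
    for a in ants:
        r = risk(a)
        if r < best_risk:
            best, best_risk = a, r
    for s, p in enumerate(best):                        # global update on best
        tau[s][p] = 0.9 * tau[s][p] + 0.1 / (1.0 + best_risk)

print("best placement:", best, "risk:", best_risk)
```

The sketch optimizes risk only; the paper's method additionally weighs capacity utilization, which would enter here as a second term in the fitness function.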