The complex structures of distributed energy systems (DES) and the uncertainties arising from renewable energy sources and user load variations pose significant operational challenges. Model predictive control (MPC) and reinforcement learning (RL) are widely used to optimize DES operation by predicting future outcomes from the current state. However, MPC’s real-time application is constrained by its computational demands, making it less suitable for complex systems with extended predictive horizons. Meanwhile, RL’s model-free approach leads to suboptimal data utilization, limiting its overall performance. To address these issues, this study proposes an improved reinforcement learning-model predictive control (RL-MPC) algorithm that combines the high-precision local optimization of MPC with the global optimization capability of RL. In this study, we enhance the existing RL-MPC algorithm by increasing the number of optimization steps performed by the MPC component. We evaluated RL, MPC, and the enhanced RL-MPC on a DES comprising a photovoltaic (PV) system and a battery energy storage system (BESS). The results indicate the following: (1) The twin delayed deep deterministic policy gradient (TD3) algorithm outperforms other RL algorithms in energy cost optimization, but is outperformed in all cases by RL-MPC. (2) For both MPC and RL-MPC, when the mean absolute percentage error (MAPE) of the first-step prediction is 5%, the total cost increases by ~1.2% compared to that when the MAPE is 0%. However, if the accuracy of the initial prediction data remains constant while only the error gradient of the data sequence increases, the total cost remains nearly unchanged, with an increase of only ~0.1%. (3) Within a 12 h predictive horizon, RL-MPC outperforms MPC, suggesting that it is a suitable alternative to MPC when high-accuracy prediction data are limited.
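The central idea described above — MPC providing short-horizon local optimization while RL supplies a global, long-horizon signal — can be sketched as a receding-horizon controller whose terminal cost is an RL-style learned value function. The battery model, price profile, and value function below are illustrative placeholders, not the paper's actual formulation:

```python
# Hypothetical sketch of the RL-MPC concept: enumerate short-horizon
# battery dispatch plans and score each by operating cost plus an
# RL-learned terminal value, then apply only the first action
# (receding horizon). All numbers are assumed for illustration.
import itertools

def terminal_value(soc):
    # Stand-in for an RL critic: stored energy is valued at an
    # assumed average price, rewarding plans that end with a fuller battery.
    avg_price = 0.15  # $/kWh, assumed
    return -avg_price * soc

def rl_mpc_step(soc, prices, horizon=3, actions=(-1.0, 0.0, 1.0),
                capacity=10.0):
    """Return the first action of the cheapest feasible plan over `horizon` steps."""
    best_cost, best_action = float("inf"), 0.0
    for plan in itertools.product(actions, repeat=horizon):
        s, cost, feasible = soc, 0.0, True
        for a, p in zip(plan, prices[:horizon]):
            s_next = s + a  # 1 kWh per step, lossless storage (assumption)
            if not 0.0 <= s_next <= capacity:
                feasible = False
                break
            cost += p * a  # pay the price when charging (a > 0)
            s = s_next
        if feasible:
            cost += terminal_value(s)  # RL-learned terminal cost
            if cost < best_cost:
                best_cost, best_action = cost, plan[0]
    return best_action

# Receding-horizon simulation over a toy price profile.
prices = [0.10, 0.05, 0.30, 0.35, 0.08, 0.25]
soc = 5.0
for t in range(len(prices) - 3):
    soc += rl_mpc_step(soc, prices[t:])
print(soc)
```

Lengthening the enumeration horizon corresponds to the study's enhancement of increasing the number of MPC optimization steps, at an exponential cost in the number of candidate plans for this brute-force sketch; a practical implementation would instead solve the horizon problem with a numerical optimizer.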