Reinforcement Learning (RL) has emerged as a promising data-driven solution for wargaming decision-making. However, two domain challenges remain: (1) handling discrete-continuous hybrid wargaming control and (2) accelerating RL deployment by exploiting rich offline data. Existing RL methods fail to address both issues simultaneously, so we propose a novel offline RL method targeting hybrid action spaces. A new constrained action representation technique builds a bidirectional mapping between the original hybrid action space and a latent space in a semantically consistent way. This allows a continuous latent policy to be learned with offline RL, with better exploration feasibility and scalability, and then reconstructed into the required hybrid policy. Critically, a novel offline RL optimization objective with adaptively adjusted constraints is designed to balance the alleviation and generalization of out-of-distribution actions. Our method demonstrates superior performance and generality across different tasks, particularly in typical realistic wargaming scenarios.
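The core idea of the bidirectional mapping can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration only (the action counts, dimensions, and the nearest-embedding decoding rule are not taken from the paper, where the mapping is learned): each discrete action type receives a continuous embedding, a hybrid action (type, parameters) is encoded as one continuous latent vector, and decoding recovers the type by nearest embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DISCRETE = 4   # hypothetical number of discrete action types
PARAM_DIM = 2    # hypothetical continuous parameter dimension
EMBED_DIM = 3    # latent embedding size per discrete action type

# Learned jointly with the policy in the actual method; random here.
embeddings = rng.normal(size=(N_DISCRETE, EMBED_DIM))

def encode(discrete_id, params):
    """Map a hybrid action (id, params) to one continuous latent vector."""
    return np.concatenate([embeddings[discrete_id], params])

def decode(latent):
    """Reconstruct the hybrid action: the nearest embedding gives the
    discrete id; the remaining coordinates are the continuous parameters."""
    z_d, z_c = latent[:EMBED_DIM], latent[EMBED_DIM:]
    dists = np.linalg.norm(embeddings - z_d, axis=1)
    return int(np.argmin(dists)), z_c

# Round trip: a hybrid action survives encode -> decode.
action = (2, np.array([0.5, -0.3]))
d, p = decode(encode(*action))
assert d == 2 and np.allclose(p, action[1])
```

A continuous latent policy can then act entirely in the concatenated latent space, while `decode` projects its outputs back onto valid hybrid actions, which is what makes standard continuous-control offline RL applicable.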
The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).