Full Length Article | Open Access

MADRL-based UAV swarm non-cooperative game under incomplete information

Ershen WANG a, Fan LIU a, Chen HONG b (corresponding author), Jing GUO a, Lin ZHAO c, Jian XUE c, Ning HE d
a School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136, China
b College of Robotics, Beijing Union University, Beijing 100101, China
c School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
d College of Smart City, Beijing Union University, Beijing 100101, China

Abstract

Unmanned Aerial Vehicles (UAVs) play an increasingly important role on the modern battlefield. In this paper, considering the incomplete observation information available to an individual UAV in a complex combat environment, we propose a UAV swarm non-cooperative game model based on Multi-Agent Deep Reinforcement Learning (MADRL), in which the state space and action space are constructed to reflect the real features of UAV swarm air-to-air combat. The multi-agent particle environment is employed to generate a UAV combat scene with a continuous observation space. Several recently popular MADRL methods are compared extensively within the UAV swarm non-cooperative game model; the results indicate that Multi-Agent Soft Actor-Critic (MASAC) outperforms the other MADRL methods by a large margin. A UAV swarm employing MASAC learns more effective policies and achieves much higher hit and win rates. Simulations under different swarm sizes and UAV physical parameters indicate that MASAC also generalizes well. Furthermore, the practicability and convergence of MASAC are assessed by investigating the loss of the Q-value network of each individual UAV; the results demonstrate that MASAC is practical and that the Nash equilibrium of the UAV swarm non-cooperative game under incomplete information can be reached.
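The convergence analysis above hinges on the loss of each UAV's Q-value network, which in soft actor-critic methods such as MASAC is a regression toward an entropy-regularized (soft) Bellman target. As an illustrative sketch only (not the paper's code; the function name and the clipped double-Q detail are assumptions based on standard SAC practice), the per-transition target can be computed as:

```python
def soft_bellman_target(reward, q1_next, q2_next, log_pi_next,
                        gamma=0.99, alpha=0.2, done=False):
    """Entropy-regularized (soft) Bellman target used by SAC-style critics.

    The target augments the discounted return with an entropy bonus
    -alpha * log_pi, which rewards stochastic, exploratory policies.
    """
    # Clipped double-Q: take the minimum of two target critics
    # to reduce overestimation bias.
    q_next = min(q1_next, q2_next)
    soft_value = q_next - alpha * log_pi_next
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value

# Example: one transition for a single UAV agent.
y = soft_bellman_target(reward=1.0, q1_next=5.0, q2_next=4.5,
                        log_pi_next=-1.0, gamma=0.99, alpha=0.2)
```

Each critic's loss is then the mean squared error between its prediction Q(s, a) and this target y; a decreasing loss across all UAVs is what the paper uses as evidence of convergence toward equilibrium.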

Chinese Journal of Aeronautics
Pages 293-306
Cite this article:
WANG E, LIU F, HONG C, et al. MADRL-based UAV swarm non-cooperative game under incomplete information. Chinese Journal of Aeronautics, 2024, 37(6): 293-306. https://doi.org/10.1016/j.cja.2024.03.030


Received: 19 June 2023
Revised: 03 August 2023
Accepted: 08 September 2023
Published: 25 March 2024
© 2024 Chinese Society of Aeronautics and Astronautics.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
