Full Length Article | Open Access

MADRL-based UAV swarm non-cooperative game under incomplete information

Ershen WANG a, Fan LIU a, Chen HONG b (corresponding author), Jing GUO a, Lin ZHAO c, Jian XUE c, Ning HE d
a School of Electronic and Information Engineering, Shenyang Aerospace University, Shenyang 110136, China
b College of Robotics, Beijing Union University, Beijing 100101, China
c School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
d College of Smart City, Beijing Union University, Beijing 100101, China

Abstract

Unmanned Aerial Vehicles (UAVs) play an increasingly important role on the modern battlefield. In this paper, considering the incomplete observation information available to an individual UAV in a complex combat environment, we propose a UAV swarm non-cooperative game model based on Multi-Agent Deep Reinforcement Learning (MADRL), in which the state space and action space are constructed to reflect the real features of UAV swarm air-to-air combat. The multi-agent particle environment is employed to generate a UAV combat scene with a continuous observation space. Several recently popular MADRL methods are compared extensively within the UAV swarm non-cooperative game model; the results indicate that Multi-Agent Soft Actor-Critic (MASAC) outperforms the other MADRL methods by a large margin. A UAV swarm employing MASAC learns more effective policies and achieves much higher hit and win rates. Simulations under different swarm sizes and UAV physical parameters indicate that MASAC also generalizes well. Furthermore, the practicability and convergence of MASAC are assessed by investigating the loss of the Q-value network of each individual UAV; the results demonstrate that MASAC is practical and that the Nash equilibrium of the UAV swarm non-cooperative game under incomplete information can be reached.
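The convergence analysis above hinges on the loss of each UAV's Q-value network, which in soft actor-critic methods such as MASAC is a regression toward an entropy-regularized (soft) Bellman target. As an illustrative sketch only (not the paper's code; the function name and the clipped double-Q detail are assumptions based on standard SAC practice), the per-transition target can be computed as:

```python
def soft_bellman_target(reward, q1_next, q2_next, log_pi_next,
                        gamma=0.99, alpha=0.2, done=False):
    """Entropy-regularized (soft) Bellman target used by SAC-style critics.

    The target augments the discounted return with an entropy bonus
    -alpha * log_pi, which rewards stochastic, exploratory policies.
    """
    # Clipped double-Q: take the minimum of two target critics
    # to reduce overestimation bias.
    q_next = min(q1_next, q2_next)
    soft_value = q_next - alpha * log_pi_next
    # No bootstrapping past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value

# Example: one transition for a single UAV agent.
y = soft_bellman_target(reward=1.0, q1_next=5.0, q2_next=4.5,
                        log_pi_next=-1.0, gamma=0.99, alpha=0.2)
```

Each critic's loss is then the mean squared error between its prediction Q(s, a) and this target y; a decreasing loss across all UAVs is what the paper uses as evidence of convergence toward equilibrium.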

Chinese Journal of Aeronautics
Pages 293-306
Cite this article:
WANG E, LIU F, HONG C, et al. MADRL-based UAV swarm non-cooperative game under incomplete information. Chinese Journal of Aeronautics, 2024, 37(6): 293-306. https://doi.org/10.1016/j.cja.2024.03.030


Received: 19 June 2023
Revised: 03 August 2023
Accepted: 08 September 2023
Published: 25 March 2024
© 2024 Chinese Society of Aeronautics and Astronautics.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
