Reinforcement learning based UAV formation control in GPS-denied environment

Bodi MA; Zhenbao LIU; Feihong JIANG; Wen ZHAO; Qingqing DANG; Xiao WANG; Junhong ZHANG; Lina WANG

doi:10.1016/j.cja.2023.07.006


Full Length Article | Open Access


Bodi MA^{a}, Zhenbao LIU^{a,b}, Feihong JIANG^{a,c}, Wen ZHAO^{a}, Qingqing DANG^{a}, Xiao WANG^{a}, Junhong ZHANG^{a,c}, Lina WANG^{a}

School of Civil Aviation, Northwestern Polytechnical University, Xi’an 710000, China

Research & Development Institute in Shenzhen, Northwestern Polytechnical University, Shenzhen 518000, China

Flight Control Division, AVIC The First Aircraft Institute, Xi’an 710089, China


Abstract

Highly accurate positioning is a crucial prerequisite for close-formation flight of multiple Unmanned Aerial Vehicles (UAVs), where it supports target tracking, formation keeping, and collision avoidance. Although a UAV can obtain its position through the Global Positioning System (GPS), highly accurate positioning data are difficult to obtain in a GPS-denied environment (e.g., a GPS jamming area, suburb, urban canyon, or mountain area), which may cause the UAV to lose a tracked target or collide with another UAV. Close-formation control in GPS-denied environments is particularly difficult owing to low-accuracy position estimates, the short distance between vehicles, and the nonholonomic dynamics of a UAV. This paper makes two contributions: a UAV formation localization method that addresses formation localization in GPS-denied environments, and a reinforcement learning based control algorithm designed for efficient and robust controller performance. First, a novel Lidar-based localization algorithm is developed to measure the position of each aircraft in the formation. In our solution, each UAV carries a Lidar as its position measurement sensor instead of a GPS module. The k-means algorithm is used to calculate the center point position of each UAV. A novel formation position vector matching method is then proposed to match these center points with the UAVs in the formation and estimate their positions. Second, a reinforcement learning based formation control algorithm is developed that selects the optimal policy for controlling the UAV swarm to form and maintain a close formation of a specific geometry. Third, a collision risk evaluation module is proposed to keep the formation group collision-free. Finally, a novel experience replay method is provided to improve learning efficiency. Experimental results validate the accuracy, effectiveness, and robustness of the proposed scheme.
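The localization step described in the abstract (cluster Lidar returns with k-means, then match the cluster centers to formation members) can be illustrated with a minimal sketch. The abstract does not give the paper's implementation, so everything below is an assumption made for illustration: the 2-D point format, the farthest-point seeding, the function names `kmeans_centers` and `match_to_formation`, and the synthetic scan. In particular, the minimum-total-distance assignment is a simple stand-in and is not the authors' formation position vector matching method.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans_centers(points: Sequence[Point], k: int, iters: int = 50) -> List[Point]:
    """Lloyd's k-means on 2-D points; returns k cluster centers."""
    # Assumption: deterministic farthest-point seeding (not taken from the paper).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        updated = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def match_to_formation(centers: Sequence[Point], slots: Dict[str, Point]) -> Dict[str, Point]:
    """Assign one center to each expected formation slot, minimising total distance.

    Hypothetical stand-in for the paper's matching method; brute force over
    permutations, so only suitable for small formations with len(centers) >= len(slots).
    """
    names = list(slots)
    best = min(
        itertools.permutations(range(len(centers)), len(names)),
        key=lambda perm: sum(math.dist(centers[i], slots[n]) for i, n in zip(perm, names)),
    )
    return {n: centers[i] for i, n in zip(best, names)}


# Synthetic example: three neighbours at assumed nominal offsets (metres),
# each producing 30 noisy Lidar returns.
rng = random.Random(1)
expected_slots = {"uav_1": (5.0, 0.0), "uav_2": (0.0, 5.0), "uav_3": (-5.0, -5.0)}
scan = [
    (x + rng.uniform(-0.2, 0.2), y + rng.uniform(-0.2, 0.2))
    for x, y in expected_slots.values()
    for _ in range(30)
]
estimates = match_to_formation(kmeans_centers(scan, k=3), expected_slots)
for name, position in estimates.items():
    print(name, position)
```

In this sketch the number of clusters equals the number of neighbouring UAVs, and the nominal formation offsets are known in advance. A real Lidar pipeline would also need to separate UAV returns from background points before clustering.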

Keywords

Reinforcement learning; Unmanned aerial vehicles (UAVs); Close formation control; GPS-denied environment; Intelligent flight control


Chinese Journal of Aeronautics

Volume 36, Issue 11, November 2023

Pages 281-296

DOI: 10.1016/j.cja.2023.07.006

Cite this article:

MA B, LIU Z, JIANG F, et al. Reinforcement learning based UAV formation control in GPS-denied environment. Chinese Journal of Aeronautics, 2023, 36(11): 281-296. https://doi.org/10.1016/j.cja.2023.07.006


Received: 16 October 2022

Revised: 06 November 2022

Accepted: 03 January 2023

Published: 13 July 2023

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).