Department of Automation, Tsinghua University, Beijing 100084, China
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
Abstract
Navigation is a fundamental problem for mobile robots, and Deep Reinforcement Learning (DRL) has received significant attention in this field because of its strong representation and experience-learning abilities, leading to a growing trend of applying DRL to mobile robot navigation. In this paper, we first review DRL methods and DRL-based navigation frameworks. We then systematically compare and analyze the relationships and differences among four typical application scenarios: local obstacle avoidance, indoor navigation, multi-robot navigation, and social navigation. Next, we describe the development of DRL-based navigation. Finally, we discuss the remaining challenges of DRL-based navigation and some possible solutions.
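To make the setting concrete before the detailed review, the sketch below illustrates the sensor-to-action interaction loop that the DRL-based navigation frameworks surveyed here share: at each step the robot receives an observation (here, the relative goal position plus simplified range readings), outputs an action (linear and angular velocity), and collects a reward shaped by goal distance and arrival. This is a minimal illustrative toy, not the method of any cited paper; the `ToyNavEnv` class, its fixed range readings, and the reward constants are all assumptions made for illustration.

```python
import numpy as np

class ToyNavEnv:
    """Hypothetical 2-D point robot: observe goal offset + 8 range readings."""
    def __init__(self, goal=(4.0, 3.0)):
        self.goal = np.array(goal)
        self.pos = np.zeros(2)
        self.heading = 0.0

    def observe(self):
        ranges = np.full(8, 5.0)  # stand-in for a real laser scan
        return np.concatenate([self.goal - self.pos, ranges])

    def step(self, action):
        v, w = action  # linear and angular velocity command
        self.heading += 0.1 * w
        self.pos += 0.1 * v * np.array([np.cos(self.heading), np.sin(self.heading)])
        dist = np.linalg.norm(self.goal - self.pos)
        # Dense reward: penalize distance to the goal, bonus on arrival.
        reward = -0.01 * dist + (10.0 if dist < 0.2 else 0.0)
        return self.observe(), reward, bool(dist < 0.2)

env, ret = ToyNavEnv(), 0.0
obs = env.observe()
for _ in range(200):  # placeholder random policy instead of a trained DRL one
    obs, reward, done = env.step(np.random.uniform([-1.0, -1.0], [1.0, 1.0]))
    ret += reward
    if done:
        break
```

A trained policy (e.g., DQN over a discrete action set, or DDPG/PPO over this continuous one) would replace the random sampling inside the loop.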
References
[2] J. Engel, T. Schöps, and D. Cremers, LSD-SLAM: Large-scale direct monocular SLAM, in Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, eds. Zurich, Switzerland: Springer International Publishing, 2014, pp. 834–849.
[3] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB-SLAM: A versatile and accurate monocular SLAM system, IEEE Transactions on Robotics, vol. 31, no. 5, pp. 1147–1163, 2015.
[4] G. Grisetti, C. Stachniss, and W. Burgard, Improved techniques for grid mapping with Rao-Blackwellized particle filters, IEEE Transactions on Robotics, vol. 23, no. 1, pp. 34–46, 2007.
[5] S. Kohlbrecher, O. von Stryk, J. Meyer, and U. Klingauf, A flexible and scalable SLAM system with full 3D motion estimation, presented at 2011 IEEE Int. Symp. Safety, Security, and Rescue Robotics, Kyoto, Japan, 2011, pp. 155–160.
[6] M. Elbanhawi and M. Simic, Sampling-based robot motion planning: A review, IEEE Access, vol. 2, pp. 56–77, 2014.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with deep reinforcement learning, arXiv preprint arXiv:1312.5602, 2013.
[8] A. Pandey, S. Pandey, and D. R. Parhi, Mobile robot navigation and obstacle avoidance techniques: A review, International Robotics & Automation Journal, vol. 2, no. 3, pp. 96–105, 2017.
[9] F. Kamil, S. H. Tang, W. Khaksar, N. Zulkifli, and S. A. Ahmad, A review on motion planning and obstacle avoidance approaches in dynamic environments, Advances in Robotics & Automation, vol. 4, no. 2, p. 1000134, 2015.
[10] T. T. Nguyen, N. D. Nguyen, and S. Nahavandi, Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications, IEEE Transactions on Cybernetics, vol. 50, no. 9, pp. 3826–3839, 2020.
[11] F. Y. Zeng, C. Wang, and S. S. Ge, A survey on visual navigation for artificial agents with deep reinforcement learning, IEEE Access, vol. 8, pp. 135426–135442, 2020.
[12] C. J. C. H. Watkins, Learning from delayed rewards, PhD dissertation, University of Cambridge, Cambridge, England, 1989.
[15] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[16] H. Van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, in Proc. 30th AAAI Conf. Artificial Intelligence, Phoenix, AZ, USA, 2016, pp. 2094–2100.
[17] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971, 2015.
[18] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, arXiv preprint arXiv:1602.01783, 2016.
[19] Q. Shi, S. Zhao, X. W. Cui, M. Q. Lu, and M. D. Jia, Anchor self-localization algorithm based on UWB ranging and inertial measurements, Tsinghua Science and Technology, vol. 24, no. 6, pp. 728–737, 2019.
[20] A. Faust, K. Oslund, O. Ramirez, A. Francis, L. Tapia, M. Fiser, and J. Davidson, PRM-RL: Long-range robotic navigation tasks by combining reinforcement learning and sampling-based planning, in Proc. 2018 IEEE Int. Conf. Robotics and Automation, Brisbane, Australia, 2018, pp. 5113–5120.
[22] M. Duguleana and G. Mogan, Neural networks based reinforcement learning for mobile robots obstacle avoidance, Expert Systems with Applications, vol. 62, pp. 104–115, 2016.
[23] S. M. Feng, H. L. Ren, X. R. Wang, and P. Ben-Tzvi, Mobile robot obstacle avoidance based on deep reinforcement learning, in Proc. ASME 2019 Int. Design Engineering Technical Conferences and Computers and Information in Engineering Conf., Anaheim, CA, USA, 2019.
[24] Y. Kato, K. Kamiyama, and K. Morioka, Autonomous robot navigation system with learning based on deep Q-network and topological maps, in Proc. 2017 IEEE/SICE Int. Symp. System Integration, Taipei, China, 2017, pp. 1040–1046.
[25] Y. Kato and K. Morioka, Autonomous robot navigation system without grid maps based on double deep Q-network and RTK-GNSS localization in outdoor environments, in Proc. 2019 IEEE/SICE Int. Symp. System Integration, Paris, France, 2019, pp. 346–351.
[26] C. Wang, J. Wang, X. D. Zhang, and X. Zhang, Autonomous navigation of UAV in large-scale unknown complex environment with deep reinforcement learning, in Proc. 2017 IEEE Global Conf. Signal and Information Processing, Montreal, Canada, 2017, pp. 858–862.
[27] C. Wang, J. Wang, Y. Shen, and X. D. Zhang, Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach, IEEE Transactions on Vehicular Technology, vol. 68, no. 3, pp. 2124–2136, 2019.
[28] Z. W. Ma, C. Wang, Y. F. Niu, X. K. Wang, and L. C. Shen, A saliency-based reinforcement learning approach for a UAV to avoid flying obstacles, Robotics and Autonomous Systems, vol. 100, pp. 108–118, 2018.
[29] X. Wu, H. L. Chen, C. G. Chen, M. Y. Zhong, S. R. Xie, Y. K. Guo, and H. Fujita, The autonomous navigation and obstacle avoidance for USVs with ANOA deep reinforcement learning method, Knowledge-Based Systems, vol. 196, p. 105201, 2020.
[30] X. Y. Zhang, C. B. Wang, Y. C. Liu, and X. Chen, Decision-making for the autonomous navigation of maritime autonomous surface ships based on scene division and deep reinforcement learning, Sensors, vol. 19, no. 18, p. 4055, 2019.
[33] H. B. Shi, L. Shi, M. Xu, and K. S. Hwang, End-to-end navigation strategy with deep reinforcement learning for mobile robots, IEEE Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2393–2402, 2020.
[34] L. Tai, G. Paolo, and M. Liu, Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation, in Proc. 2017 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 31–36.
[35] K. Yokoyama and K. Morioka, Autonomous mobile robot with simple navigation system based on deep reinforcement learning and a monocular camera, in Proc. 2020 IEEE/SICE Int. Symp. System Integration, Honolulu, HI, USA, 2020, pp. 525–530.
[36] J. W. Zhang, J. T. Springenberg, J. Boedecker, and W. Burgard, Deep reinforcement learning with successor features for navigation across similar environments, in Proc. 2017 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 2371–2378.
[37] X. Y. Lei, Z. Zhang, and P. F. Dong, Dynamic path planning of unknown environment based on deep reinforcement learning, Journal of Robotics, vol. 2018, p. 5781591, 2018.
[39] C. Sampedro, H. Bavle, A. Rodriguez-Ramos, P. de la Puente, and P. Campoy, Laser-based reactive navigation for multirotor aerial robots using deep reinforcement learning, in Proc. 2018 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Madrid, Spain, 2018, pp. 1024–1031.
[40] C. Wang, J. Wang, J. J. Wang, and X. D. Zhang, Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards, IEEE Internet of Things Journal, vol. 7, no. 7, pp. 6180–6190, 2020.
[41] F. Aznar, M. Pujol, and R. Rizo, Obtaining fault tolerance avoidance behavior using deep reinforcement learning, Neurocomputing, vol. 345, pp. 77–91, 2019.
[42] J. Choi, K. Park, M. Kim, and S. Seok, Deep reinforcement learning of navigation in a complex and crowded environment with a limited field of view, in Proc. 2019 Int. Conf. Robotics and Automation, Montreal, Canada, 2019, pp. 5993–6000.
[43] F. Leiva and J. Ruiz-del-Solar, Robust RL-based map-less local planning: Using 2D point clouds as observations, IEEE Robotics and Automation Letters, vol. 5, no. 4, pp. 5787–5794, 2020.
[45] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, Target-driven visual navigation in indoor scenes using deep reinforcement learning, in Proc. 2017 IEEE Int. Conf. Robotics and Automation, Singapore, 2017, pp. 3357–3364.
[46] J. Oh, V. Chockalingam, S. Singh, and H. Lee, Control of memory, active perception, and action in Minecraft, arXiv preprint arXiv:1605.09128, 2016.
[47] G. Brunner, O. Richter, Y. Y. Wang, and R. Wattenhofer, Teaching a machine to read maps with deep reinforcement learning, arXiv preprint arXiv:1711.07479, 2017.
[48] Y. Wu, Y. X. Wu, G. Gkioxari, and Y. D. Tian, Building generalizable agents with a realistic and rich 3D environment, arXiv preprint arXiv:1801.02209, 2018.
[49] S. R. Song, F. Yu, A. Zeng, A. X. Chang, M. Savva, and T. Funkhouser, Semantic scene completion from a single depth image, arXiv preprint arXiv:1611.08974, 2016.
[50] F. Y. Zeng and C. Wang, Visual navigation with asynchronous proximal policy optimization in artificial agents, Journal of Robotics, vol. 2020, p. 8702962, 2020.
[51] A. Devo, G. Mezzetti, G. Costante, M. L. Fravolini, and P. Valigi, Towards generalization in target-driven visual navigation by using deep reinforcement learning, IEEE Transactions on Robotics, vol. 36, no. 5, pp. 1546–1561, 2020.
[52] A. Devo, G. Costante, and P. Valigi, Deep reinforcement learning for instruction following visual navigation in 3D maze-like environments, IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1175–1182, 2020.
[54] S. H. Hsu, S. H. Chan, P. T. Wu, K. Xiao, and L. C. Fu, Distributed deep reinforcement learning based indoor visual navigation, in Proc. 2018 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Madrid, Spain, 2018, pp. 2532–2537.
[55] A. Staroverov, D. A. Yudin, I. Belkin, V. Adeshkin, Y. K. Solomentsev, and A. I. Panov, Real-time object navigation with deep neural networks and hierarchical reinforcement learning, IEEE Access, vol. 8, pp. 195608–195621, 2020.
[56] Y. Lu, Y. R. Chen, D. B. Zhao, and D. Li, MGRL: Graph neural network based inference in a Markov network with reinforcement learning for visual navigation, Neurocomputing, vol. 421, pp. 140–150, 2021.
[57] Z. Fan, G. S. Pereira, and V. Kumar, Cooperative localization and tracking in distributed robot-sensor networks, Tsinghua Science and Technology, vol. 10, no. 1, pp. 91–101, 2005.
[58] Y. F. Chen, M. Liu, M. Everett, and J. P. How, Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning, arXiv preprint arXiv:1609.07845, 2016.
[59] W. H. Ding, S. J. Li, H. H. Qian, and Y. Q. Chen, Hierarchical reinforcement learning framework towards multi-agent navigation, in Proc. 2018 IEEE Int. Conf. Robotics and Biomimetics, Kuala Lumpur, Malaysia, 2018, pp. 237–242.
[60] P. X. Long, T. X. Fan, X. Y. Liao, W. X. Liu, H. Zhang, and J. Pan, Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning, arXiv preprint arXiv:1709.10082, 2018.
[61] T. X. Fan, P. X. Long, W. X. Liu, and J. Pan, Fully distributed multi-robot collision avoidance via deep reinforcement learning for safe and efficient navigation in complex scenarios, arXiv preprint arXiv:1808.03841, 2018.
[62] T. X. Fan, P. X. Long, W. X. Liu, and J. Pan, Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios, International Journal of Robotics Research, vol. 39, no. 7, pp. 856–892, 2020.
[63] W. Z. Chen, S. Z. Zhou, Z. S. Pan, H. X. Zheng, and Y. Liu, Mapless collaborative navigation for a multi-robot system based on the deep reinforcement learning, Applied Sciences, vol. 9, no. 20, p. 4198, 2019.
[64] J. T. Lin, X. Y. Yang, P. W. Zheng, and H. Cheng, End-to-end decentralized multi-robot navigation in unknown complex environments via deep reinforcement learning, in Proc. 2019 IEEE Int. Conf. Mechatronics and Automation, Tianjin, China, 2019, pp. 2493–2500.
[65] G. Sartoretti, J. Kerr, Y. F. Shi, G. Wagner, T. K. S. Kumar, S. Koenig, and H. Choset, PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning, IEEE Robotics and Automation Letters, vol. 4, no. 3, pp. 2378–2385, 2019.
[66] J. C. Ma, H. M. Lu, J. H. Xiao, Z. W. Zeng, and Z. Q. Zheng, Multi-robot target encirclement control with collision avoidance via deep reinforcement learning, Journal of Intelligent & Robotic Systems, vol. 99, no. 2, pp. 371–386, 2020.
[67] Y. F. Chen, M. Everett, M. Liu, and J. P. How, Socially aware motion planning with deep reinforcement learning, in Proc. 2017 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 1343–1350.
[68] L. Chen, N. Ma, P. Wang, J. H. Li, P. F. Wang, G. L. Pang, and X. J. Shi, Survey of pedestrian action recognition techniques for autonomous driving, Tsinghua Science and Technology, vol. 25, no. 4, pp. 458–470, 2020.
[69] M. Everett, Y. F. Chen, and J. P. How, Motion planning among dynamic, decision-making agents with deep reinforcement learning, in Proc. 2018 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Madrid, Spain, 2018, pp. 3052–3059.
[70] P. H. Ciou, Y. T. Hsiao, Z. Z. Wu, S. H. Tseng, and L. C. Fu, Composite reinforcement learning for social robot navigation, in Proc. 2018 IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Madrid, Spain, 2018, pp. 2553–2558.
[71] L. B. Sun, J. F. Zhai, and W. H. Qin, Crowd navigation in an unknown and dynamic environment based on deep reinforcement learning, IEEE Access, vol. 7, pp. 109544–109554, 2019.
[72] Y. Sasaki, S. Matsuo, A. Kanezaki, and H. Takemura, A3C based motion learning for an autonomous mobile robot in crowds, in Proc. 2019 IEEE Int. Conf. Systems, Man and Cybernetics, Bari, Italy, 2019, pp. 1036–1042.
[73] A. J. Sathyamoorthy, U. Patel, T. Guan, and D. Manocha, Frozone: Freezing-free, pedestrian-friendly navigation in human crowds, IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4352–4359, 2020.
[74] Y. Y. Chen, C. C. Liu, B. E. Shi, and M. Liu, Robot navigation in crowds by graph convolutional networks with attention learned from human gaze, IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2754–2761, 2020.
[75] H. T. L. Chiang, A. Faust, M. Fiser, and A. Francis, Learning navigation behaviors end-to-end with AutoRL, IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 2007–2014, 2019.
[76] G. D. Chen, S. Y. Yao, J. Ma, L. F. Pan, Y. A. Chen, P. Xu, J. M. Ji, and X. P. Chen, Distributed non-communicating multi-robot collision avoidance via map-based deep reinforcement learning, Sensors, vol. 20, no. 17, p. 4836, 2020.
[77] V. J. Hodge, R. Hawkins, and R. Alexander, Deep reinforcement learning for drone navigation using sensor data, Neural Computing and Applications, vol. 33, no. 6, pp. 2015–2033, 2021.
[78] Y. D. Wang, H. B. He, and C. Y. Sun, Learning to navigate through complex dynamic environment with modular deep reinforcement learning, IEEE Transactions on Games, vol. 10, no. 4, pp. 400–412, 2018.
[79] J. J. Zeng, R. S. Ju, L. Qin, Y. Hu, Q. J. Yin, and C. Hu, Navigation in unknown dynamic environments based on deep reinforcement learning, Sensors, vol. 19, no. 18, p. 3837, 2019.
[80] K. Lobos-Tsunekawa, F. Leiva, and J. Ruiz-del-Solar, Visual navigation for biped humanoid robots using deep reinforcement learning, IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 3247–3254, 2018.
[81] Q. C. Zhang, M. Q. Zhu, L. Zou, M. Li, and Y. Zhang, Learning reward function with matching network for mapless navigation, Sensors, vol. 20, no. 13, p. 3664, 2020.
[82] A. Hussein, E. Elyan, M. M. Gaber, and C. Jayne, Deep imitation learning for 3D navigation tasks, Neural Computing and Applications, vol. 29, no. 7, pp. 389–404, 2018.
[83] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu, Reinforcement learning with unsupervised auxiliary tasks, arXiv preprint arXiv:1611.05397, 2016.
[84] P. Mirowski, M. K. Grimes, M. Malinowski, K. M. Hermann, K. Anderson, D. Teplyashin, K. Simonyan, K. Kavukcuoglu, A. Zisserman, and R. Hadsell, Learning to navigate in cities without a map, arXiv preprint arXiv:1804.00168, 2019.
[85] D. W. Wang, T. X. Fan, T. Han, and J. Pan, A two-stage reinforcement learning approach for multi-UAV collision avoidance under imperfect sensing, IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 3098–3105, 2020.
Table 1  Simple comparison of different DRL-based navigation scenarios.

| Navigation scenario | Static obstacle | Dynamic obstacle | Structured continuous obstacle | Obstacle scale | Obstacle velocity | Cooperation | Randomness |
|---|---|---|---|---|---|---|---|
| Local obstacle avoidance | Y | Y | N | Low | Low | – | – |
| Indoor navigation | Y | N | Y | – | – | – | Low |
| Multi-robot navigation | Y | Y | N | High | High | Y | Low |
| Social navigation | Y | Y | N | High | High | N | High |
Table 2  Simple comparison of relevant references.

| Application scenario | Reference | Algorithm | Perception type | Action space | Reward setting | Simulation | Real system | Year |
|---|---|---|---|---|---|---|---|---|
| Local obstacle avoidance | [32] | Uncertainty-aware RL | C | Con, 2 | – | Y | Y | 2017 |
| | [24] | DDQN | A | Dis, 3 | Dense | Y | Y | 2017 |
| | [34] | ADDPG (Asynchronous DDPG) | A | Con, 2 | Dense | Y | Y | 2017 |
| | [26] | Fast-RDPG (Fast Recurrent DPG) | B | Con, 1 | Dense | Y | N | 2017 |
| | [36] | SF-RL (Successor Feature RL) | D | Dis, 4 | Dense | Y | Y | 2017 |
| | [37] | DDQN | A | Dis, 8 | Dense | Y | Y | 2018 |
| | [28] | Salient region detection + AC | C | Con, 2 | – | Y | N | 2018 |
| | [38] | IL (Imitation Learning) + CPO | A | Con, 2 | Dense | Y | Y | 2018 |
| | [39] | DDPG | A | Con, 2 | Dense | Y | Y | 2018 |
| | [42] | SAC (Soft AC) | D | Con, 2 | Dense | Y | Y | 2019 |
| | [25] | DDQN | A | Dis, 3 | Dense | Y | Y | 2019 |
| | [27] | Fast-RDPG (Fast Recurrent DPG) | B | Con, 1 | Dense | Y | N | 2019 |
| | [43] | DDPG | A | Con, 2 | Dense | Y | N | 2020 |
| | [33] | ICM A3C (Intrinsic Curiosity) | A/C | Dis, – | Dense | Y | Y | 2020 |
| | [40] | Improved A3C | A | Con, 2 | Sparse | Y | N | 2020 |
| | [35] | DDQN | C | Dis, 3 | Dense | Y | Y | 2020 |
| Indoor navigation | [44] | Nav A3C (A3C + LSTM) | C | Dis, 8 | Sparse | Y | N | 2017 |
| | [45] | AI2-THOR (Deep siamese AC network) | C | Dis, 4 | Dense | Y | Y | 2017 |
| | [54] | LSTM + DRL | C | Dis, 3 | Dense | Y | Y | 2018 |
| | [51] | A2CAT-VN | C | Dis, 8 | Dense | Y | N | 2019 |
| | [52] | IMPALA | C | Dis, 3 | Sparse | Y | Y | 2020 |
| | [55] | HISNav framework | D | – | – | Y | Y | 2020 |
| | [50] | AppoNav (LSTM + PPO) | C | Dis, 8 | Sparse | Y | N | 2020 |
| Multi-robot navigation | [58] | CADRL (Coll. Avoidance with Deep RL) | E | Con, 2 | Dense | Y | N | 2016 |
| | [60] | Parallel PPO | A | Con, 2 | Dense | Y | N | 2018 |
| | [61] | Hybrid-RL (Parallel PPO) | A | Con, 2 | Dense | Y | Y | 2018 |
| | [63] | PDDPG (Parallel DDPG) | A | Con, 2 | Dense | Y | N | 2019 |
| | [64] | PPO | A | Con, 2 | Dense | Y | Y | 2019 |
| | [65] | PRIMAL (IL + A3C + LSTM) | E | Dis, 5 | Dense | Y | Y | 2019 |
| Social navigation | [67] | SA-CADRL | E | Con, 2 | Dense | Y | Y | 2017 |
| | [69] | GA3C-CADRL | E | Dis, 11 | Dense | Y | Y | 2018 |
| | [72] | A3C | – | Dis, – | Dense | Y | Y | 2019 |
| | [71] | PPO + LSTM + collision prediction | E | Con, 2 | Dense | Y | N | 2019 |
| | [73] | Frozone + DRL | E | Con, 2 | Dense | Y | Y | 2020 |
Note: Perception type: "A" denotes a laser range finder, "B" denotes ultrasonic sonar or other range finders, "C" denotes a monocular camera, "D" denotes a depth camera, and "E" denotes agent-level data provided by the system. Action space: "Con" denotes a continuous action space, "Dis" denotes a discrete action space, and the number denotes the action dimension. "–" indicates that the item is not specified in the reference.
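To make the action-space notation in the note concrete, the sketch below encodes the two most common entries with the Gym interface (assuming the `gym` package is installed); the velocity bounds and the particular three-way discrete command set are illustrative assumptions, not values taken from the cited papers.

```python
import numpy as np
from gym import spaces  # pip install gym

# "Con, 2": a 2-D continuous action, e.g., (linear velocity, angular velocity),
# as used by the DDPG/PPO-style entries in Table 2. Bounds are illustrative.
con_2 = spaces.Box(low=np.array([0.0, -1.0]), high=np.array([1.0, 1.0]),
                   dtype=np.float32)

# "Dis, 3": one of three discrete commands, e.g., {forward, turn left, turn
# right}, as used by the DQN/DDQN-style entries in Table 2.
dis_3 = spaces.Discrete(3)

print(con_2.sample())  # e.g., [0.42 -0.17]
print(dis_3.sample())  # e.g., 2
```

A DQN-style network would output one Q-value per discrete command, whereas a DDPG/PPO-style policy would output the two continuous components directly, which is why the action-space column tracks the algorithm column so closely in Table 2.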