Research Article | Open Access

Development of deep-learning-based autonomous agents for low-speed maneuvering in Unity

Riccardo Berta, Luca Lazzaroni (corresponding author), Alessio Capello, Marianna Cossu, Luca Forneris, Alessandro Pighetti, Francesco Bellotti
Electrical, Electronics and Telecommunication Engineering and Naval Architecture Department (DITEN), University of Genoa, Genoa 16145, Italy

Abstract

This study provides a systematic analysis of the resource-intensive training of deep reinforcement learning (DRL) agents for simulated low-speed automated driving (AD). We established two case studies in Unity: garage parking and navigating an obstacle-dense area. Our analysis involves training a path-planning agent with real-time sensor information only. The study addresses research questions insufficiently covered in the literature, exploring curriculum learning (CL), agent generalization (knowledge transfer), computation distribution (CPU vs. GPU), and mapless navigation. CL proved necessary for the garage scenario and beneficial for obstacle avoidance. It involved adjustments at different stages, including terminal conditions, environment complexity, and reward-function hyperparameters, guided by their evolution across multiple training attempts. Fine-tuning the simulation tick and decision-period parameters was crucial for effective training. Abstracting high-level concepts (e.g., obstacle avoidance) requires training the agent in environments that are sufficiently complex in terms of the number of obstacles. While blogs and forums discuss training machine learning models in Unity, scientific articles on DRL agents for AD remain scarce. Because agent development demands considerable training time and complex procedures, there is a growing need to support such research through scientific means. In addition to reporting our findings, we contribute to the R&D community by releasing our environments as open source.
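As a concrete illustration of the staged-curriculum workflow described above, the sketch below drives a manual curriculum from Python through ML-Agents' environment-parameters side channel, increasing environment complexity and tightening terminal conditions stage by stage. The parameter names (num_obstacles, goal_tolerance), stage values, and build name are illustrative assumptions, not the exact values used in this study.

```python
# Minimal sketch (not the paper's exact code): staging a curriculum from
# Python via ML-Agents' environment-parameters side channel. Parameter
# names, stage values, and the build name are illustrative assumptions;
# on the Unity side each parameter would be read with
# Academy.Instance.EnvironmentParameters.GetWithDefault(...).
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)

# Each stage raises environment complexity (obstacle count) and
# tightens the terminal condition (distance tolerance to the goal).
STAGES = [
    {"num_obstacles": 2.0, "goal_tolerance": 2.0},
    {"num_obstacles": 5.0, "goal_tolerance": 1.0},
    {"num_obstacles": 10.0, "goal_tolerance": 0.5},
]

params = EnvironmentParametersChannel()
env = UnityEnvironment(file_name="LowSpeedManeuvering",  # assumed build name
                       side_channels=[params])
env.reset()

for stage in STAGES:
    for name, value in stage.items():
        params.set_float_parameter(name, value)
    # Train at this stage until a success-rate threshold is met, then
    # advance. The training loop itself (e.g., PPO via mlagents-learn or
    # env.get_steps()/env.set_actions()) is omitted from this sketch.
    env.reset()  # new parameters take effect on the next episode

env.close()
```

In practice, the same staging can also be expressed declaratively in the ML-Agents trainer's YAML configuration (environment_parameters with completion criteria), while the simulation tick (Time.fixedDeltaTime) and decision period (the DecisionRequester component) mentioned above are tuned on the Unity side rather than from Python.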

Journal of Intelligent and Connected Vehicles
Pages 229-244
Cite this article:
Berta R, Lazzaroni L, Capello A, et al. Development of deep-learning-based autonomous agents for low-speed maneuvering in Unity. Journal of Intelligent and Connected Vehicles, 2024, 7(3): 229-244. https://doi.org/10.26599/JICV.2023.9210039


Received: 04 November 2023
Revised: 20 December 2023
Accepted: 04 March 2024
Published: 26 September 2024
© The author(s) 2023.

This is an open access article under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
