Research Article | Open Access

Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving

Zilin Huang, Zihao Sheng, Chengyuan Ma, Sikai Chen (corresponding author)
Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA

Abstract

Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While the agent is allowed to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. At the same time, the agent is guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. Specifically, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sample efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios.
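To make the training scheme above concrete, the sketch below shows, in Python/PyTorch, how one episode of such intervention-based learning could be organized: the agent explores freely, the human mentor takes over only when a step looks dangerous, and each takeover yields demonstration pairs labeled with proxy state-action values rather than a hand-designed reward. This is a minimal illustrative sketch only; the ToyEnv and HumanMentor stubs, the danger heuristic, and the ±1 proxy values are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of an intervention-plus-proxy-value training loop.
# ToyEnv, HumanMentor, the danger heuristic, and the +/-1 proxy values
# are hypothetical stand-ins, not the authors' published implementation.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 4, 2
PROXY_VALUE = 1.0   # proxy state-action value assigned to human demonstrations
BATCH = 32

class ToyEnv:
    """Stand-in for a driving simulator."""
    def reset(self):
        return torch.randn(STATE_DIM)
    def step(self, action):
        return torch.randn(STATE_DIM), False   # (next_state, done)

class HumanMentor:
    """Stand-in for the human expert monitoring the agent."""
    def judges_dangerous(self, state, action):
        return action.norm() > 1.5             # crude danger heuristic
    def demonstrate(self, state):
        return torch.zeros(ACT_DIM)            # corrective "safe" action

q_net = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64),
                      nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
demo_buf = deque(maxlen=10_000)      # partial human demonstrations
explore_buf = deque(maxlen=10_000)   # free-exploration transitions

def run_episode(env, human, steps=200):
    state = env.reset()
    for _ in range(steps):
        agent_action = torch.randn(ACT_DIM)    # placeholder policy
        if human.judges_dangerous(state, agent_action):
            # Minimal intervention: the mentor takes over only here. The
            # risky agent action gets a low proxy value and the demonstrated
            # action a high one, so no reward function has to be designed.
            human_action = human.demonstrate(state)
            demo_buf.append((state, agent_action, torch.tensor(-PROXY_VALUE)))
            demo_buf.append((state, human_action, torch.tensor(PROXY_VALUE)))
            executed = human_action
        else:
            executed = agent_action            # free exploration
        next_state, done = env.step(executed)
        explore_buf.append((state, executed, next_state))
        state = next_state
        if done:
            break

def update_on_demos():
    """Regress Q toward the proxy values derived from demonstrations.
    (A standard TD update on explore_buf would run alongside; omitted.)"""
    if len(demo_buf) < BATCH:
        return
    s, a, v = map(torch.stack, zip(*random.sample(list(demo_buf), BATCH)))
    loss = ((q_net(torch.cat([s, a], dim=-1)).squeeze(-1) - v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

run_episode(ToyEnv(), HumanMentor())
update_on_demos()
```

Under this scheme the mentor's workload stays low because control transfers only at the danger threshold, while every takeover still produces labeled data for the Q-function.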


Communications in Transportation Research
Article number: 100127
Cite this article:
Huang Z, Sheng Z, Ma C, et al. Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving. Communications in Transportation Research, 2024, 4(2): 100127. https://doi.org/10.1016/j.commtr.2024.100127


Received: 20 November 2023
Revised: 06 January 2024
Accepted: 07 January 2024
Published: 08 May 2024
© 2024 The Authors.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
