Research Article | Open Access

Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving

Zilin Huang, Zihao Sheng, Chengyuan Ma, Sikai Chen (corresponding author)
Department of Civil and Environmental Engineering, University of Wisconsin-Madison, Madison, WI 53706, USA

Abstract

Despite significant progress in autonomous vehicles (AVs), the development of driving policies that ensure both the safety of AVs and traffic flow efficiency has not yet been fully explored. In this paper, we propose an enhanced human-in-the-loop reinforcement learning method, termed the Human as AI mentor-based deep reinforcement learning (HAIM-DRL) framework, which facilitates safe and efficient autonomous driving in mixed traffic platoons. Drawing inspiration from the human learning process, we first introduce an innovative learning paradigm that effectively injects human intelligence into AI, termed Human as AI mentor (HAIM). In this paradigm, the human expert serves as a mentor to the AI agent. While the agent is allowed to sufficiently explore uncertain environments, the human expert can take control in dangerous situations and demonstrate correct actions to avoid potential accidents. At the same time, the agent is guided to minimize traffic flow disturbance, thereby optimizing traffic flow efficiency. Specifically, HAIM-DRL leverages data collected from free exploration and partial human demonstrations as its two training sources. Remarkably, we circumvent the intricate process of manually designing reward functions; instead, we directly derive proxy state-action values from partial human demonstrations to guide the agent's policy learning. Additionally, we employ a minimal intervention technique to reduce the human mentor's cognitive load. Comparative results show that HAIM-DRL outperforms traditional methods in driving safety, sample efficiency, mitigation of traffic flow disturbance, and generalizability to unseen traffic scenarios.
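To make the training scheme above concrete, the sketch below shows, in Python/PyTorch, how one episode of such intervention-based learning could be organized: the agent explores freely, the human mentor takes over only when a step looks dangerous, and each takeover yields demonstration pairs labeled with proxy state-action values rather than a hand-designed reward. This is a minimal illustrative sketch only; the ToyEnv and HumanMentor stubs, the danger heuristic, and the ±1 proxy values are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of an intervention-plus-proxy-value training loop.
# ToyEnv, HumanMentor, the danger heuristic, and the +/-1 proxy values
# are hypothetical stand-ins, not the authors' published implementation.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, ACT_DIM = 4, 2
PROXY_VALUE = 1.0   # proxy state-action value assigned to human demonstrations
BATCH = 32

class ToyEnv:
    """Stand-in for a driving simulator."""
    def reset(self):
        return torch.randn(STATE_DIM)
    def step(self, action):
        return torch.randn(STATE_DIM), False   # (next_state, done)

class HumanMentor:
    """Stand-in for the human expert monitoring the agent."""
    def judges_dangerous(self, state, action):
        return action.norm() > 1.5             # crude danger heuristic
    def demonstrate(self, state):
        return torch.zeros(ACT_DIM)            # corrective "safe" action

q_net = nn.Sequential(nn.Linear(STATE_DIM + ACT_DIM, 64),
                      nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(q_net.parameters(), lr=3e-4)
demo_buf = deque(maxlen=10_000)      # partial human demonstrations
explore_buf = deque(maxlen=10_000)   # free-exploration transitions

def run_episode(env, human, steps=200):
    state = env.reset()
    for _ in range(steps):
        agent_action = torch.randn(ACT_DIM)    # placeholder policy
        if human.judges_dangerous(state, agent_action):
            # Minimal intervention: the mentor takes over only here. The
            # risky agent action gets a low proxy value and the demonstrated
            # action a high one, so no reward function has to be designed.
            human_action = human.demonstrate(state)
            demo_buf.append((state, agent_action, torch.tensor(-PROXY_VALUE)))
            demo_buf.append((state, human_action, torch.tensor(PROXY_VALUE)))
            executed = human_action
        else:
            executed = agent_action            # free exploration
        next_state, done = env.step(executed)
        explore_buf.append((state, executed, next_state))
        state = next_state
        if done:
            break

def update_on_demos():
    """Regress Q toward the proxy values derived from demonstrations.
    (A standard TD update on explore_buf would run alongside; omitted.)"""
    if len(demo_buf) < BATCH:
        return
    s, a, v = map(torch.stack, zip(*random.sample(list(demo_buf), BATCH)))
    loss = ((q_net(torch.cat([s, a], dim=-1)).squeeze(-1) - v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

run_episode(ToyEnv(), HumanMentor())
update_on_demos()
```

Under this scheme the mentor's workload stays low because control transfers only at the danger threshold, while every takeover still produces labeled data for the Q-function.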


Communications in Transportation Research
Article number: 100127
Cite this article:
Huang Z, Sheng Z, Ma C, et al. Human as AI mentor: Enhanced human-in-the-loop reinforcement learning for safe and efficient autonomous driving. Communications in Transportation Research, 2024, 4(2): 100127. https://doi.org/10.1016/j.commtr.2024.100127


Received: 20 November 2023
Revised: 06 January 2024
Accepted: 07 January 2024
Published: 08 May 2024
© 2024 The Authors.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
