Research Article | Open Access

GOPS: A general optimal control problem solver for autonomous driving and industrial control applications

Wenxuan Wang, Yuhang Zhang, Jiaxin Gao, Yuxuan Jiang, Yujie Yang, Zhilong Zheng, Wenjun Zou, Jie Li, Congsheng Zhang, Wenhan Cao, Genjin Xie, Jingliang Duan, Shengbo Eben Li (corresponding author)
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China

Abstract

Solving optimal control problems is a fundamental requirement of industrial control tasks. Existing methods such as model predictive control often suffer from heavy online computational burdens. Reinforcement learning (RL) has shown promise in computer and board games but has yet to be widely adopted in industrial applications because accessible, high-accuracy solvers are lacking. Current RL solvers are often developed for academic research and demand considerable theoretical knowledge and programming skill; moreover, many of them support only Python-based environments and are limited to model-free algorithms. To address this gap, this paper develops the General Optimal control Problem Solver (GOPS), an easy-to-use RL solver package that aims to build real-time, high-performance controllers for industrial applications. GOPS is built with a highly modular structure that retains a flexible framework for secondary development. Considering the diversity of industrial control tasks, GOPS also includes a conversion tool that allows Matlab/Simulink to be used for environment construction, controller design, and performance validation. To handle large-scale problems, GOPS can automatically create various serial and parallel trainers by flexibly combining embedded buffers and samplers. It offers a variety of common approximate functions for policies and value functions, including polynomials, multilayer perceptrons, and convolutional neural networks. In addition, constrained and robust algorithms for special industrial control systems with state constraints and model uncertainties are integrated into GOPS. Several examples, including linear quadratic control, the inverted double pendulum, vehicle tracking, a humanoid robot, obstacle avoidance, and active suspension control, are tested to verify the performance of GOPS.
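As a concrete illustration of the kind of optimal control problem such a solver targets, the sketch below works through the discrete-time linear quadratic control benchmark mentioned above by iterating the Riccati equation and rolling out the resulting state-feedback law. It is a minimal, self-contained NumPy example under assumed dynamics (a double-integrator plant with hypothetical cost weights); it is not GOPS code and does not reflect the GOPS API.

import numpy as np

# Hypothetical double-integrator plant: state x = [position, velocity], input u = acceleration.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.diag([1.0, 0.1])   # assumed state-cost weights
R = np.array([[0.01]])    # assumed control-cost weight

# Solve the discrete-time algebraic Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(10_000):
    P_new = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A
    if np.max(np.abs(P_new - P)) < 1e-10:
        P = P_new
        break
    P = P_new

# Optimal state-feedback gain for u = -K x.
K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A
print("LQR gain K =", K)

# Closed-loop rollout from an initial offset; the state is driven toward the origin.
x = np.array([[1.0], [0.0]])
for _ in range(50):
    x = A @ x + B @ (-K @ x)
print("state after 50 steps:", x.ravel())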

Communications in Transportation Research
Article number: 100096
Cite this article:
Wang W, Zhang Y, Gao J, et al. GOPS: A general optimal control problem solver for autonomous driving and industrial control applications. Communications in Transportation Research, 2023, 3: 100096. https://doi.org/10.1016/j.commtr.2023.100096


Received: 29 January 2023
Revised: 01 March 2023
Accepted: 01 March 2023
Published: 17 April 2023
© 2023 The Authors.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
