Research Article | Open Access

GOPS: A general optimal control problem solver for autonomous driving and industrial control applications

Wenxuan Wang, Yuhang Zhang, Jiaxin Gao, Yuxuan Jiang, Yujie Yang, Zhilong Zheng, Wenjun Zou, Jie Li, Congsheng Zhang, Wenhan Cao, Genjin Xie, Jingliang Duan, Shengbo Eben Li (corresponding author)
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China

Abstract

Solving optimal control problems is a fundamental requirement of industrial control tasks. Existing methods such as model predictive control often suffer from heavy online computational burdens. Reinforcement learning (RL) has shown promise in computer and board games but has yet to be widely adopted in industrial applications because accessible, high-accuracy solvers are lacking. Current RL solvers are often developed for academic research and demand considerable theoretical knowledge and programming skill; moreover, many of them support only Python-based environments and are limited to model-free algorithms. To address this gap, this paper develops the General Optimal control Problem Solver (GOPS), an easy-to-use RL solver package that aims to build real-time, high-performance controllers for industrial applications. GOPS is built with a highly modular structure that retains a flexible framework for secondary development. Considering the diversity of industrial control tasks, GOPS also includes a conversion tool that allows Matlab/Simulink to be used for environment construction, controller design, and performance validation. To handle large-scale problems, GOPS can automatically create various serial and parallel trainers by flexibly combining embedded buffers and samplers. It offers a variety of common approximate functions for policies and value functions, including polynomials, multilayer perceptrons, and convolutional neural networks. In addition, constrained and robust algorithms for special industrial control systems with state constraints and model uncertainties are integrated into GOPS. Several examples, including linear quadratic control, the inverted double pendulum, vehicle tracking, a humanoid robot, obstacle avoidance, and active suspension control, are tested to verify the performance of GOPS.
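As a concrete illustration of the kind of optimal control problem such a solver targets, the sketch below works through the discrete-time linear quadratic control benchmark mentioned above by iterating the Riccati equation and rolling out the resulting state-feedback law. It is a minimal, self-contained NumPy example under assumed dynamics (a double-integrator plant with hypothetical cost weights); it is not GOPS code and does not reflect the GOPS API.

import numpy as np

# Hypothetical double-integrator plant: state x = [position, velocity], input u = acceleration.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.0],
              [dt]])
Q = np.diag([1.0, 0.1])   # assumed state-cost weights
R = np.array([[0.01]])    # assumed control-cost weight

# Solve the discrete-time algebraic Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(10_000):
    P_new = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A
    if np.max(np.abs(P_new - P)) < 1e-10:
        P = P_new
        break
    P = P_new

# Optimal state-feedback gain for u = -K x.
K = np.linalg.inv(R + B.T @ P @ B) @ B.T @ P @ A
print("LQR gain K =", K)

# Closed-loop rollout from an initial offset; the state is driven toward the origin.
x = np.array([[1.0], [0.0]])
for _ in range(50):
    x = A @ x + B @ (-K @ x)
print("state after 50 steps:", x.ravel())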

Communications in Transportation Research
Article number: 100096
Cite this article:
Wang W, Zhang Y, Gao J, et al. GOPS: A general optimal control problem solver for autonomous driving and industrial control applications. Communications in Transportation Research, 2023, 3: 100096. https://doi.org/10.1016/j.commtr.2023.100096


Received: 29 January 2023
Revised: 01 March 2023
Accepted: 01 March 2023
Published: 17 April 2023
© 2023 The Authors.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
