Article | Open Access

Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks

Xiaohan Shan1, Pengjiu Wang1,2, Mingda Wan1, Dong Yan1, Jialian Li1, and Jun Zhu1,3 (corresponding author)
1 Qiyuan Lab, Beijing 100095, China
2 China Ship Development and Design Center, Wuhan 430064, China
3 School of Life Sciences, Tsinghua University, Beijing 100084, China

Abstract

Combinatorial Optimization (CO) problems have been intensively studied for decades and have a wide range of applications. For some classic CO problems, e.g., the Traveling Salesman Problem (TSP), both traditional planning algorithms and emerging reinforcement learning methods have made solid progress in recent years. However, for CO problems with nested sub-tasks, neither end-to-end reinforcement learning algorithms nor traditional evolutionary methods can obtain satisfactory strategies with limited time and computational resources. In this paper, we propose an algorithmic framework for solving CO problems with nested sub-tasks, in which learning and planning algorithms can be combined in a modular way. We validate our framework on the Job-Shop Scheduling Problem (JSSP), and the experimental results show that our algorithm performs well in both solution quality and model generalization.
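Since the abstract only describes the framework at a high level, the following is a minimal, hypothetical Python sketch of how a modular "learning + planning" loop for the JSSP could be composed: an outer policy module chooses which job to advance next (the kind of decision an RL agent would learn), and an inner planning step places the chosen operation at its earliest feasible time. All names here (SubTaskPolicy, plan_operation, solve_jssp) and the placeholder heuristic are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch only: one way to wire a "learning" module and a "planning"
# module together for a toy JSSP instance. Not the authors' algorithm.
import random

# A JSSP instance: JOBS[j] is the ordered list of (machine, duration) operations of job j.
JOBS = [
    [(0, 3), (1, 2), (2, 2)],
    [(0, 2), (2, 1), (1, 4)],
    [(1, 4), (2, 3), (0, 1)],
]

class SubTaskPolicy:
    """Stands in for a learned policy on the outer sub-task: which job to advance next.
    A trained RL agent would score jobs from state features; here a randomly
    tie-broken 'most work remaining' heuristic acts as a placeholder."""
    def select_job(self, jobs, next_op_idx):
        candidates = [j for j, ops in enumerate(jobs) if next_op_idx[j] < len(ops)]
        remaining = lambda j: sum(d for _, d in jobs[j][next_op_idx[j]:])
        return max(candidates, key=lambda j: (remaining(j), random.random()))

def plan_operation(job, op, machine_free, job_free):
    """Inner planning step for the nested sub-task: place the chosen operation at the
    earliest time feasible for both its machine and its job."""
    machine, duration = op
    start = max(machine_free[machine], job_free[job])
    return machine, start, start + duration

def solve_jssp(jobs, policy):
    n_machines = 1 + max(m for ops in jobs for m, _ in ops)
    machine_free = [0] * n_machines   # earliest free time per machine
    job_free = [0] * len(jobs)        # earliest free time per job
    next_op_idx = [0] * len(jobs)     # next unscheduled operation of each job
    schedule = []
    while any(i < len(ops) for i, ops in zip(next_op_idx, jobs)):
        j = policy.select_job(jobs, next_op_idx)                        # learning module
        op = jobs[j][next_op_idx[j]]
        m, start, end = plan_operation(j, op, machine_free, job_free)   # planning module
        machine_free[m], job_free[j] = end, end
        next_op_idx[j] += 1
        schedule.append((j, m, start, end))
    return schedule, max(end for *_, end in schedule)

if __name__ == "__main__":
    schedule, makespan = solve_jssp(JOBS, SubTaskPolicy())
    print(f"makespan = {makespan}")
    for j, m, s, e in schedule:
        print(f"job {j} on machine {m}: [{s}, {e})")
```

Because the job-selection and operation-placement decisions sit behind separate interfaces, the learned component and the planning component can be swapped or retrained independently, which illustrates the kind of modular combination the abstract describes.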

CAAI Artificial Intelligence Research
Article number: 9150025
Cite this article:
Shan X, Wang P, Wan M, et al. Bridging Reinforcement Learning and Planning to Solve Combinatorial Optimization Problems with Nested Sub-Tasks. CAAI Artificial Intelligence Research, 2023, 2: 9150025. https://doi.org/10.26599/AIR.2023.9150025
Received: 12 July 2023
Revised: 01 September 2023
Accepted: 03 November 2023
Published: 31 December 2023
© The author(s) 2023.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
