Open Access

Reinforcement Learning-Based Sequential Control Policy for Multiple Peg-in-Hole Assembly

Xinyu Liu1, Chao Zeng2, Chenguang Yang2, Jianwei Zhang3
1 Department of Electrical and Photonics Engineering, Technical University of Denmark, Kongens Lyngby 2800, Denmark
2 Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK
3 TAMS Group, Department of Informatics, University of Hamburg, Hamburg 22527, Germany

Abstract

Robotic assembly is widely used in large-scale manufacturing because of its high production efficiency, and peg-in-hole assembly is a typical operation. While reinforcement learning (RL) methods have achieved strong performance on single peg-in-hole tasks, multiple peg-in-hole assembly remains challenging due to complex geometry and physical constraints. To address this, we introduce a control policy workflow for multiple peg-in-hole assembly that divides the task into three primitive sub-tasks (picking, alignment, and insertion) to modularize the long-horizon task and improve sample efficiency. A sequential control policy (SeqPolicy), comprising three control policies, executes the sub-tasks step by step. This approach introduces human knowledge to manage intermediate states, such as lifting height and aligning direction, thereby enabling flexible deployment across various scenarios. SeqPolicy demonstrated higher training efficiency, with faster convergence and a higher success rate, than a single control policy. Its adaptability is confirmed through generalization experiments involving objects with varying geometries. Because object pose is important for the control policies, we also propose a low-cost and adaptable method that uses a visual representation, extracted from RGB images and containing the objects’ pose information, to estimate the objects’ pose in the robot base frame directly in working scenarios. The representation is extracted by a Siamese-CNN network trained with self-supervised contrastive learning. Using this representation, the alignment sub-task is successfully executed. These experiments validate the solution’s reusability and adaptability in multiple peg-in-hole scenarios.
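The step-by-step execution of picking, alignment, and insertion described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the proportional controllers stand in for trained RL policies, and the state fields, thresholds (LIFT_HEIGHT, ALIGN_TOL, INSERT_DEPTH), and toy dynamics are hypothetical values chosen only to show how hand-specified intermediate states can decide when one sub-policy hands over to the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]
Action = Dict[str, float]

# Hypothetical intermediate-state targets, standing in for the "human knowledge".
LIFT_HEIGHT = 0.10    # m, height the peg should reach after picking
ALIGN_TOL = 0.001     # m, allowed lateral offset between peg and hole
INSERT_DEPTH = 0.02   # m, required insertion depth


@dataclass
class SubTask:
    name: str
    policy: Callable[[State], Action]  # placeholder for a trained RL policy
    done: Callable[[State], bool]      # hand-specified termination condition


def toy_step(state: State, action: Action) -> State:
    """Toy dynamics: each action is a displacement added to the state."""
    return {
        "z": state["z"] + action.get("dz", 0.0),
        "xy_err": state["xy_err"] + action.get("dxy", 0.0),
        "depth": state["depth"] + action.get("ddepth", 0.0),
    }


def run_sequence(
    subtasks: List[SubTask],
    state: State,
    step: Callable[[State, Action], State],
    max_steps: int = 200,
) -> Tuple[State, List[str], bool]:
    """Run each sub-policy until its condition holds, then move to the next."""
    completed: List[str] = []
    for task in subtasks:
        steps = 0
        while not task.done(state):
            if steps >= max_steps:
                return state, completed, False  # sub-task timed out
            state = step(state, task.policy(state))
            steps += 1
        completed.append(task.name)
    return state, completed, True


SUBTASKS = [
    SubTask(
        "picking",
        lambda s: {"dz": min(0.01, LIFT_HEIGHT - s["z"])},
        lambda s: s["z"] >= LIFT_HEIGHT - 1e-9,
    ),
    SubTask(
        "alignment",
        lambda s: {"dxy": -0.5 * s["xy_err"]},
        lambda s: abs(s["xy_err"]) <= ALIGN_TOL,
    ),
    SubTask(
        "insertion",
        lambda s: {"ddepth": min(0.005, INSERT_DEPTH - s["depth"])},
        lambda s: s["depth"] >= INSERT_DEPTH - 1e-9,
    ),
]

INITIAL_STATE: State = {"z": 0.0, "xy_err": 0.02, "depth": 0.0}

final_state, completed, ok = run_sequence(SUBTASKS, INITIAL_STATE, toy_step)
print(completed, ok)
```

Keeping each termination condition outside the policy means a threshold such as the lifting height can be changed for a new scenario without touching the sub-policy that acts on it.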

CAAI Artificial Intelligence Research
Article number: 9150043
Cite this article:
Liu X, Zeng C, Yang C, et al. Reinforcement Learning-Based Sequential Control Policy for Multiple Peg-in-Hole Assembly. CAAI Artificial Intelligence Research, 2024, 3: 9150043. https://doi.org/10.26599/AIR.2024.9150043

Received: 14 August 2024
Revised: 21 September 2024
Accepted: 20 October 2024
Published: 22 November 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
