Open Access

Reinforcement Learning-Based Sequential Control Policy for Multiple Peg-in-Hole Assembly

Xinyu Liu 1, Chao Zeng 2 (corresponding author), Chenguang Yang 2, Jianwei Zhang 3
1 Department of Electrical and Photonics Engineering, Technical University of Denmark, Kongens Lyngby 2800, Denmark
2 Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK
3 TAMS Group, Department of Informatics, University of Hamburg, Hamburg 22527, Germany

Abstract

Robotic assembly is widely used in large-scale manufacturing because of its high production efficiency, and peg-in-hole assembly is a typical operation. While reinforcement learning (RL) methods have achieved strong performance on single peg-in-hole tasks, multiple peg-in-hole assembly remains challenging due to complex geometric and physical constraints. To address this, we introduce a control policy workflow for multiple peg-in-hole assembly that divides the task into three primitive sub-tasks: picking, alignment, and insertion, thereby modularizing the long-horizon task and improving sample efficiency. A sequential control policy (SeqPolicy), containing three control policies, executes the sub-tasks step by step. This approach introduces human knowledge to manage intermediate states, such as lifting height and aligning direction, enabling flexible deployment across various scenarios. SeqPolicy demonstrated higher training efficiency, with faster convergence and a higher success rate, than a single control policy. Its adaptability is confirmed through generalization experiments with objects of varying geometries. Recognizing the importance of object pose for the control policies, we propose a low-cost and adaptable method that estimates object poses in the robot base frame directly from RGB images in working scenarios, using a visual representation that encodes the objects' pose information. The representation is extracted by a Siamese-CNN network trained with self-supervised contrastive learning. Using this representation, the alignment sub-task is successfully executed. These experiments validate the solution's reusability and adaptability in multiple peg-in-hole scenarios.
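The sequential structure of SeqPolicy can be pictured with a short sketch. The snippet below is an illustrative assumption, not the authors' code: the policy objects, the environment API, and the intermediate-target values (e.g., lifting height for picking, aligning direction for alignment) are hypothetical placeholders standing in for the three learned sub-policies chained step by step.

```python
# Illustrative sketch only (hypothetical names; not the paper's implementation).
# SeqPolicy idea: three sub-policies -- pick, align, insert -- executed in order,
# with human-specified intermediate targets bridging the hand-overs.

def run_seq_policy(env, policies, targets, max_steps_per_subtask=200):
    """Run the pick -> align -> insert sub-policies back to back.

    `env`, `policies`, and `targets` are placeholders:
      policies = {"pick": ..., "align": ..., "insert": ...}
      targets  = {"pick": lifting_height, "align": aligning_direction, "insert": hole_pose}
    """
    obs = env.reset()
    for name in ("pick", "align", "insert"):
        policy, target = policies[name], targets[name]
        for _ in range(max_steps_per_subtask):
            action = policy.act(obs, target)       # each sub-policy conditions on its own goal
            obs = env.step(action)
            if policy.sub_task_done(obs, target):  # intermediate state reached
                break                              # hand control to the next sub-policy
    return obs
```

The pose representation is described as the output of a Siamese-CNN trained with self-supervised contrastive learning on RGB images. As a rough sketch of that idea (the backbone, augmentations, and loss used in the paper may differ), a weight-shared encoder can be trained with an InfoNCE-style objective so that two augmented views of the same frame map to nearby embeddings:

```python
# Minimal contrastive-learning sketch (assumed details, not the paper's exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        # Small CNN backbone shared by both branches (weight sharing = "Siamese").
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, x):
        # L2-normalized embedding so dot products act as cosine similarities.
        return F.normalize(self.backbone(x), dim=-1)

def info_nce_loss(z1, z2, temperature=0.1):
    """Contrastive loss: matching views are positives, all other pairs negatives."""
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Usage sketch: z1, z2 = encoder(view1), encoder(view2); loss = info_nce_loss(z1, z2)
```

A downstream controller, e.g., for the alignment sub-task, would then consume this embedding (or a small head on top of it) in place of an explicit pose estimator.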

CAAI Artificial Intelligence Research
Article number: 9150043
Cite this article:
Liu X, Zeng C, Yang C, et al. Reinforcement Learning-Based Sequential Control Policy for Multiple Peg-in-Hole Assembly. CAAI Artificial Intelligence Research, 2024, 3: 9150043. https://doi.org/10.26599/AIR.2024.9150043

Received: 14 August 2024
Revised: 21 September 2024
Accepted: 20 October 2024
Published: 22 November 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
