Article | Open Access

Growing from Exploration: A Self-Exploring Framework for Robots Based on Foundation Models

Shoujie Li1, Ran Yu1, Tong Wu1, Junwen Zhong2, Xiao-Ping Zhang1, and Wenbo Ding1,3 (corresponding author)
1 Shenzhen Ubiquitous Data Enabling Key Lab, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China
2 Department of Physics and Chemistry, Faculty of Science and Technology, University of Macau, Macau 999078, China
3 RISC-V International Open Source Laboratory, Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518055, China

Shoujie Li, Ran Yu, and Tong Wu contributed equally to this work.


Abstract

Building intelligent robots is the ultimate goal of the robotics field. Existing works leverage learning-based or optimization-based methods to accomplish human-defined tasks; however, the challenge of enabling robots to explore various environments autonomously remains unresolved. In this work, we propose a framework named GExp, which endows robots with the capability of exploring and learning autonomously without human intervention. To achieve this goal, we devise three modules based on foundation models: self-exploration, knowledge-base building, and closed-loop feedback. Inspired by the way infants interact with the world, GExp encourages robots to understand and explore the environment through a series of self-generated tasks. During exploration, the robot acquires skills from its experiences that will be useful in the future, giving it the ability to solve complex tasks through self-exploration. Unlike previous studies that provide in-context examples for few-shot learning, GExp is independent of prior interaction knowledge and human intervention, allowing it to adapt directly to different scenarios. In addition, we propose a workflow for deploying the real-world robot system with self-learned skills as an embodied assistant. Project website: GExp.com.
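
To make the pipeline described in the abstract concrete, below is a minimal Python sketch of a GExp-style exploration loop: a foundation model proposes tasks from the observed scene, generates a plan for each task, the robot executes it, closed-loop feedback critiques failures, and successful skills are stored in a growing knowledge base. This is an illustrative sketch only; propose_tasks, plan_skill, critique, and the Robot class are hypothetical placeholders and not the authors' API.

"""Minimal sketch of a GExp-style self-exploration loop (illustrative only).

Assumptions (hypothetical placeholders, not the authors' implementation):
- propose_tasks, plan_skill, and critique stand in for prompts to a
  foundation model conditioned on the observed scene.
- Robot.execute stands in for the real robot or simulator interface.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    code: str          # executable plan produced during exploration
    successes: int = 0


@dataclass
class KnowledgeBase:
    skills: dict[str, Skill] = field(default_factory=dict)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def retrieve(self, task: str) -> Skill | None:
        # Naive keyword match; a real system would use embedding retrieval.
        return next((s for s in self.skills.values() if s.name in task), None)


def propose_tasks(scene: str) -> list[str]:
    """Self-exploration module: the foundation model would generate
    candidate tasks from the scene description."""
    return [f"pick up the {obj}" for obj in scene.split(", ")]


def plan_skill(task: str) -> Skill:
    """Skill generation: the foundation model would emit an executable plan."""
    return Skill(name=task, code=f"# pseudo-plan for: {task}")


def critique(task: str, success: bool) -> str:
    """Closed-loop feedback: inspect the outcome and suggest a fix on failure."""
    return "ok" if success else f"retry '{task}' with an adjusted grasp pose"


class Robot:
    def execute(self, skill: Skill) -> bool:
        # Stand-in for real execution; always "succeeds" in this sketch.
        return True


def explore(scene: str, kb: KnowledgeBase, robot: Robot) -> None:
    """One round of self-exploration: generate tasks, act, keep what works."""
    for task in propose_tasks(scene):
        skill = kb.retrieve(task) or plan_skill(task)
        success = robot.execute(skill)
        if success:
            skill.successes += 1
            kb.add(skill)  # skills learned from experience are reusable later
        else:
            print("feedback:", critique(task, success))


if __name__ == "__main__":
    kb = KnowledgeBase()
    explore("red cube, blue bowl", kb, Robot())
    print("learned skills:", list(kb.skills))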

CAAI Artificial Intelligence Research
Article number: 9150037
Cite this article:
Li S, Yu R, Wu T, et al. Growing from Exploration: A Self-Exploring Framework for Robots Based on Foundation Models. CAAI Artificial Intelligence Research, 2024, 3: 9150037. https://doi.org/10.26599/AIR.2024.9150037

Received: 24 January 2024
Revised: 23 February 2024
Accepted: 22 March 2024
Published: 04 July 2024
© The author(s) 2024.

The articles published in this open access journal are distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
