[2]
S. Ivaldi, S. M. Nguyen, N. Lyubova, A. Droniou, V. Padois, D. Filliat, P.-Y. Oudeyer, and O. Sigaud, Object learning through active exploration, IEEE Trans. Auton. Mental Dev., vol. 6, no. 1, pp. 56–72, 2014.
[3]
M. Balsells, M. Torne, Z. Wang, S. Desai, P. Agrawal, and A. Gupta, Autonomous robotic reinforcement learning with asynchronous human feedback, in Proc. 7th Conf. Robot Learning (CoRL 2023), Atlanta, GA, USA, 2023, pp. 774–799.
[5]
A. Doumanoglou, J. Stria, G. Peleka, I. Mariolis, V. Petrik, A. Kargakos, L. Wagner, V. Hlavac, T.-K. Kim, and S. Malassiotis, Folding clothes autonomously: A complete pipeline, IEEE Trans. Robot., vol. 32, no. 6, pp. 1461–1478, 2016.
[6]
C. Bersch, B. Pitzer, and S. Kammel, Bimanual robotic cloth manipulation for laundry folding, in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, San Francisco, CA, USA, 2011, pp. 1413–1419.
[7]
K. Mo, Y. Deng, C. Xia, and X. Wang, Learning language-conditioned deformable object manipulation with graph dynamics, arXiv preprint arXiv: 2303.01310, 2023.
[8]
H. Shi, H. Xu, S. Clarke, Y. Li, and J. Wu, RoboCook: Long-horizon elasto-plastic object manipulation with diverse tools, arXiv preprint arXiv: 2306.14447, 2023.
[9]
Z. Fu, T. Z. Zhao, and C. Finn, Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation, arXiv preprint arXiv: 2401.02117, 2024.
[10]
Z. Yan, N. Crombez, J. Buisson, Y. Ruichek, T. Krajnik, and L. Sun, A quantifiable stratification strategy for tidy-up in service robotics, in Proc. IEEE Int. Conf. Advanced Robotics and Its Social Impacts (ARSO), Tokoname, Japan, 2021, pp. 182–187.
[11]
I. Kapelyukh and E. Johns, My house, my rules: Learning tidying preferences with graph neural networks, in Proc. 5th Conf. Robot Learning (CoRL 2021), London, UK, 2021, pp. 740–749.
[12]
S. Li, H. Yu, W. Ding, H. Liu, L. Ye, C. Xia, X. Wang, and X.-P. Zhang, Visual–tactile fusion for transparent object grasping in complex backgrounds, IEEE Trans. Robot., vol. 39, no. 5, pp. 3838–3856, 2023.
[13]
Y. Deng, X. Guo, Y. Wei, K. Lu, B. Fang, D. Guo, H. Liu, and F. Sun, Deep reinforcement learning for robotic pushing and picking in cluttered environment, in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (IROS), Macau, China, 2019, pp. 619–626.
[15]
OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, et al., GPT-4 technical report, arXiv preprint arXiv: 2303.08774, 2023.
[16]
H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al., Llama 2: Open foundation and fine-tuned chat models, arXiv preprint arXiv: 2307.09288, 2023.
[17]
W. Huang, C. Wang, R. Zhang, Y. Li, J. Wu, and F. F. Li, VoxPoser: Composable 3D value maps for robotic manipulation with language models, arXiv preprint arXiv: 2307.05973, 2023.
[18]
J. Liang, W. Huang, F. Xia, P. Xu, K. Hausman, B. Ichter, P. Florence, and A. Zeng, Code as policies: Language model programs for embodied control, in Proc. IEEE Int. Conf. Robotics and Automation (ICRA), London, UK, 2023, pp. 9493–9500.
[19]
A. Brohan, Y. Chebotar, C. Finn, K. Hausman, A. Herzog, D. Ho, J. Ibarz, A. Irpan, E. Jang, R. Julian, et al., Do as I can, not as I say: Grounding language in robotic affordances, in Proc. 6th Conf. Robot Learning (CoRL 2022), Auckland, New Zealand, 2022, pp. 287–318.
[21]
X. Zhu, Y. Chen, H. Tian, C. Tao, W. Su, C. Yang, G. Huang, B. Li, L. Lu, X. Wang, et al., Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory, arXiv preprint arXiv: 2305.17144, 2023.
[22]
G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, Voyager: An open-ended embodied agent with large language models, arXiv preprint arXiv: 2305.16291, 2023.
[23]
Y. Hu, F. Lin, T. Zhang, L. Yi, and Y. Gao, Look before you leap: Unveiling the power of GPT-4V in robotic vision-language planning, arXiv preprint arXiv: 2311.17842, 2023.
[24]
Z. Yang, L. Li, K. Lin, J. Wang, C.-C. Lin, Z. Liu, and L. Wang, The dawn of LMMs: Preliminary explorations with GPT-4V(ision), arXiv preprint arXiv: 2309.17421, 2023.
[26]
W. Huang, F. Xia, T. Xiao, H. Chan, J. Liang, P. Florence, A. Zeng, J. Tompson, I. Mordatch, Y. Chebotar, et al., Inner monologue: Embodied reasoning through planning with language models, arXiv preprint arXiv: 2207.05608, 2022.
[27]
T. Kwon, N. Di Palo, and E. Johns, Language models as zero-shot trajectory generators, arXiv preprint arXiv: 2310.11604, 2023.
[28]
M. Xu, P. Huang, W. Yu, S. Liu, X. Zhang, Y. Niu, T. Zhang, F. Xia, J. Tan, and D. Zhao, Creative robot tool use with large language models, arXiv preprint arXiv: 2310.13065, 2023.
[29]
Y. J. Ma, W. Liang, G. Wang, D. A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, L. Fan, and A. Anandkumar, Eureka: Human-level reward design via coding large language models, arXiv preprint arXiv: 2310.12931, 2023.
[30]
A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., Segment anything, arXiv preprint arXiv: 2304.02643, 2023.
[31]
M. Minderer, A. Gritsenko, A. Stone, M. Neumann, D. Weissenborn, A. Dosovitskiy, A. Mahendran, A. Arnab, M. Dehghani, Z. Shen, et al., Simple open-vocabulary object detection with vision transformers, arXiv preprint arXiv: 2205.06230, 2022.
[33]
D. Nau, Y. Cao, A. Lotem, and H. Munoz-Avila, SHOP: Simple hierarchical ordered planner, in Proc. 16th Int. Joint Conf. Artificial Intelligence (IJCAI’99), Stockholm, Sweden, 1999, pp. 968–973.
[34]
Y. Xie, C. Yu, T. Zhu, J. Bai, Z. Gong, and H. Soh, Translating natural language to planning goals with large language models, arXiv preprint arXiv: 2302.05128, 2023.
[35]
J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou, Chain-of-thought prompting elicits reasoning in large language models, in Proc. 36th Conf. Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 2022, pp. 24824–24837.
[36]
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, ReAct: Synergizing reasoning and acting in language models, arXiv preprint arXiv: 2210.03629, 2022.
[37]
A. Zeng, P. Florence, J. Tompson, S. Welker, J. Chien, M. Attarian, T. Armstrong, I. Krasin, D. Duong, V. Sindhwani, et al., Transporter networks: Rearranging the visual world for robotic manipulation, in Proc. 4th Conf. Robot Learning (CoRL 2020), virtual, 2020.
[39]
X. Gu, T. Y. Lin, W. Kuo, and Y. Cui, Open vocabulary object detection via vision and language knowledge distillation, arXiv preprint arXiv: 2104.13921, 2021.