Research Article | Open Access

Exploring into the Unseen: Enhancing Language-Conditioned Policy Generalization with Behavioral Information

Longhui Cao1,2, Chao Wang1,2 (corresponding author), Juntong Qi1,2, Yan Peng1,2
1 School of Future Technology, Shanghai University, Shanghai, China
2 Institute of Artificial Intelligence, Shanghai University, Shanghai, China

Abstract

Generalizing policies learned by agents in known environments to unseen domains is an essential challenge in advancing reinforcement learning. Recently, language-conditioned policies have underscored the pivotal role of linguistic information in cross-environment settings. Integrating both environmental and textual information into the observation space enables agents to accomplish similar tasks across different scenarios. However, for entities that share a name but differ in their form of motion (e.g., an immovable mage and a fleeing mage), existing methods cannot adequately learn the motion information these entities possess and therefore suffer from motion-induced semantic ambiguity. To tackle this challenge, we propose the entity mapper with multi-modal attention based on behavior prediction (EMMA-BBP) framework, comprising a motion-behavior prediction module and a text-matching module. The behavior prediction module determines the motion information of the entities present in the environment, eliminating the semantic ambiguity of their motion. The text-matching module matches the text provided by the environment against the observed entity behavior, filtering out false textual information. EMMA-BBP has been tested in the demanding MESSENGER environment, doubling the generalization ability of EMMA.
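The text-matching idea the abstract describes can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual EMMA-BBP implementation: here a behavior descriptor inferred for each entity is matched to the manual sentence with the most similar embedding, using a toy bag-of-words embedding in place of a learned one. All names and the vocabulary below are illustrative.

```python
import numpy as np

def match_text_to_behavior(entity_behaviors, manual_sentences, embed):
    """For each entity's observed behavior descriptor, pick the manual
    sentence whose embedding has the highest cosine similarity."""
    matches = {}
    for entity, behavior in entity_behaviors.items():
        b_vec = embed(behavior)
        scores = [
            np.dot(b_vec, embed(s))
            / (np.linalg.norm(b_vec) * np.linalg.norm(embed(s)) + 1e-8)
            for s in manual_sentences
        ]
        matches[entity] = manual_sentences[int(np.argmax(scores))]
    return matches

# Toy embedding: bag-of-words over a tiny vocabulary (illustration only;
# a real system would use learned sentence embeddings).
def toy_embed(text):
    vocab = ["mage", "immovable", "fleeing", "chasing", "wizard"]
    words = text.lower().split()
    return np.array([float(w in words) for w in vocab])

behaviors = {"mage_1": "fleeing mage", "mage_2": "immovable mage"}
manual = [
    "the fleeing mage carries the message",
    "the immovable mage is an enemy",
]
print(match_text_to_behavior(behaviors, manual, toy_embed))
# mage_1 is matched to the "fleeing mage" sentence, mage_2 to the
# "immovable mage" sentence, disambiguating two same-named entities.
```

The point of the sketch is the disambiguation step: once motion behavior is predicted per entity, two entities that share the name "mage" can be bound to different manual sentences.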

Cyborg and Bionic Systems
Article number: 0084
Cite this article:
Cao L, Wang C, Qi J, et al. Exploring into the Unseen: Enhancing Language-Conditioned Policy Generalization with Behavioral Information. Cyborg and Bionic Systems, 2024, 5: 0084. https://doi.org/10.34133/cbsystems.0084