Compositional Instruction Following with Language Models and Reinforcement Learning

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low sample efficiency and weak compositional generalization in language-instructed reinforcement learning agents for multi-task settings, this paper proposes CERLLA, a framework that integrates compositional policy representations with a semantic parser trained via reinforcement learning and in-context learning. Methodologically, it combines reinforcement learning, semantic parsing, in-context learning, and function approximation to map natural language instructions end-to-end to executable policies. Evaluated on 162 benchmark tasks designed to test compositional generalization, CERLLA reaches a 92% success rate—equal to an oracle policy's upper-bound performance—whereas the best non-compositional baseline reaches only 80% with the same number of environment steps, and it does so with substantially lower sample complexity. Its core contribution is a unified architecture that jointly enables structured semantic understanding and policy composition for language–action alignment in embodied agents.

📝 Abstract
Combining reinforcement learning with language grounding is challenging as the agent needs to explore the environment while simultaneously learning multiple language-conditioned tasks. To address this, we introduce a novel method: the compositionally-enabled reinforcement learning language agent (CERLLA). Our method reduces the sample complexity of tasks specified with language by leveraging compositional policy representations and a semantic parser trained using reinforcement learning and in-context learning. We evaluate our approach in an environment requiring function approximation and demonstrate compositional generalization to novel tasks. Our method significantly outperforms the previous best non-compositional baseline in terms of sample complexity on 162 tasks designed to test compositional generalization. Our model attains a higher success rate and learns in fewer steps than the non-compositional baseline. It reaches a success rate equal to an oracle policy's upper-bound performance of 92%. With the same number of environment steps, the baseline only reaches a success rate of 80%.
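The abstract describes the method's two pieces: a semantic parser that maps a language instruction to a structured task representation, and compositional policies that are assembled from reusable sub-policies. The sketch below illustrates that pipeline shape only; the lookup-table "parser," the skill names, and the sequential composition rule are all hypothetical stand-ins (in CERLLA the parser is learned with reinforcement learning and in-context learning, not hand-written).

```python
from typing import Callable, Dict, List

# A policy maps an environment state to an action (heavily simplified).
Policy = Callable[[str], str]

# Hypothetical library of learned sub-policies, keyed by subtask symbol.
SKILLS: Dict[str, Policy] = {
    "pickup(blue_key)": lambda state: "grasp_blue_key",
    "goto(red_door)": lambda state: "move_toward_red_door",
    "open(red_door)": lambda state: "toggle_door",
}

def parse(instruction: str) -> List[str]:
    """Stand-in semantic parser: instruction -> subtask symbols.
    A fixed table here; in the paper this mapping is learned."""
    table = {
        "pick up the blue key, then open the red door": [
            "pickup(blue_key)", "goto(red_door)", "open(red_door)"
        ],
    }
    return table[instruction.lower()]

def compose(subtasks: List[str]) -> Policy:
    """Sequentially compose sub-policies into one instruction-level policy."""
    def policy(state: str) -> str:
        # Execute the first unfinished subtask (completion checks elided).
        return SKILLS[subtasks[0]](state)
    return policy

plan = parse("Pick up the blue key, then open the red door")
pi = compose(plan)
print(plan, "->", pi("s0"))
```

The point of the compositional representation is that novel instructions recombine already-learned skills, so generalization to unseen subtask sequences requires no new policy learning, only a correct parse.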
Problem

Research questions and friction points this paper is trying to address.

Robot Learning
Complex Language Instructions
Adaptive Execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

CERLLA
Language Understanding
Skill Learning