Ground-Compose-Reinforce: Tasking Reinforcement Learning Agents through Formal Language

📅 2025-07-14

🤖 AI Summary

This work addresses the challenge of grounding language in complex perception (e.g., pixels) and action without relying on manually designed grounding or on massive datasets relating language to elements of the environment. We propose Ground-Compose-Reinforce, a neurosymbolic framework that learns the grounding of a formal language from data and elicits behaviours by tasking RL agents directly through that language. Because the grounding is learned from data, no domain-specific reward functions or symbol detectors need to be designed by hand. Because the formal language has compositional semantics, grounding is data-efficient and generalizes to arbitrary compositions of the language. We evaluate the framework on an image-based gridworld and a MuJoCo robotics domain: with limited data, it reliably maps formal language instructions to behaviours, whereas end-to-end, data-driven approaches fail.

📝 Abstract

Grounding language in complex perception (e.g. pixels) and action is a key challenge when building situated agents that can interact with humans via language. In past works, this is often solved via manual design of the language grounding or by curating massive datasets relating language to elements of the environment. We propose Ground-Compose-Reinforce, a neurosymbolic framework for grounding formal language from data, and eliciting behaviours by directly tasking RL agents through this language. By virtue of data-driven learning, our framework avoids the manual design of domain-specific elements like reward functions or symbol detectors. By virtue of compositional formal language semantics, our framework achieves data-efficient grounding and generalization to arbitrary language compositions. Experiments on an image-based gridworld and a MuJoCo robotics domain show that our approach reliably maps formal language instructions to behaviours with limited data while end-to-end, data-driven approaches fail.

Problem

Research questions and friction points this paper is trying to address.

Grounding language in complex perception and action for interactive agents

Avoiding manual design of domain-specific elements like reward functions

Achieving data-efficient grounding and generalization to language compositions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neurosymbolic framework for formal language grounding

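Data-driven learning avoids manual design of reward functions and symbol detectors

Compositional formal language semantics enable data-efficient grounding and generalization to arbitrary language compositions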
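
The compositional idea in the points above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not specify the formal language or how rewards are derived, so the example assumes plain propositional logic as a stand-in, uses hand-written functions in place of learned groundings, and invents all names (`Atom`, `holds`, `reward`, `at_red`, `holding_key`) for illustration. It only shows why grounding a few atomic symbols is enough to evaluate any formula composed from them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# Hypothetical observation type: a dict of features stands in for raw pixels.
Observation = Dict[str, float]

# Hypothetical grounding: maps an observation to the truth value of one symbol.
# In a learned system this would be a trained model; here it is a stand-in.
Grounding = Callable[[Observation], bool]


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Or]


def holds(formula: Formula, obs: Observation, groundings: Dict[str, Grounding]) -> bool:
    """Evaluate a formula recursively from the groundings of its atoms."""
    if isinstance(formula, Atom):
        return groundings[formula.name](obs)
    if isinstance(formula, Not):
        return not holds(formula.arg, obs, groundings)
    if isinstance(formula, And):
        return holds(formula.left, obs, groundings) and holds(formula.right, obs, groundings)
    if isinstance(formula, Or):
        return holds(formula.left, obs, groundings) or holds(formula.right, obs, groundings)
    raise TypeError(f"unknown formula node: {formula!r}")


def reward(formula: Formula, obs: Observation, groundings: Dict[str, Grounding]) -> float:
    """Assumed reward signal: 1.0 when the task formula holds, else 0.0."""
    return 1.0 if holds(formula, obs, groundings) else 0.0


# Stand-in groundings for two hypothetical symbols.
groundings: Dict[str, Grounding] = {
    "at_red": lambda o: o["dist_red"] < 0.5,
    "holding_key": lambda o: o["key"] > 0.5,
}

# A composed task that needs no grounding of its own.
task = And(Atom("at_red"), Not(Atom("holding_key")))
obs = {"dist_red": 0.2, "key": 0.0}
task_reward = reward(task, obs, groundings)
```

In this sketch only the two atomic symbols need a grounding. Any new formula built with `Not`, `And`, and `Or` can be evaluated and turned into a reward signal without additional data, which is the sense in which compositional semantics support generalization to unseen instructions.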
