🤖 AI Summary
To address two core bottlenecks in reinforcement learning, low data efficiency and poor generalization, this paper proposes Agentic Episodic Control (AEC), an agent architecture that integrates large language models (LLMs), episodic memory, and a world-graph representation. It has three key components: (1) language-grounded embeddings, produced by an LLM from observations and stored in an episodic memory for rapid retrieval of high-value experiences; (2) a World-Graph working memory that captures structured environmental dynamics to support relational reasoning; and (3) a lightweight critical state detector that dynamically arbitrates between episodic memory recall and world-model-guided exploration. Together, these combine trial-and-error learning with LLM-derived semantic priors. Evaluated on the BabyAI-Text benchmark, the agent outperforms the best baseline by up to 76% on complex and generalization tasks such as FindObj, indicating gains in both data efficiency and generalization.
📝 Abstract
Reinforcement learning (RL) has driven breakthroughs in AI, from gameplay to scientific discovery and AI alignment. However, its broader applicability remains limited by challenges such as low data efficiency and poor generalizability. Recent advances suggest that large language models (LLMs), with their rich world knowledge and reasoning capabilities, could complement RL by enabling semantic state modeling and task-agnostic planning. In this work, we propose Agentic Episodic Control (AEC), a novel architecture that integrates RL with LLMs to enhance decision-making. AEC uses an LLM to map observations into language-grounded embeddings, which are then stored in an episodic memory for rapid retrieval of high-value experiences. Simultaneously, a World-Graph working memory module captures structured environmental dynamics to enhance relational reasoning. Furthermore, a lightweight critical state detector dynamically arbitrates between episodic memory recall and world-model-guided exploration. By combining trial-and-error learning with LLM-derived semantic priors, AEC can improve both data efficiency and generalizability in reinforcement learning. In experiments on BabyAI-Text benchmark tasks, AEC demonstrates substantial improvements over existing baselines, especially on complex and generalization tasks such as FindObj, where it outperforms the best baseline by up to 76%. The AEC framework bridges the strengths of numeric reinforcement learning and symbolic reasoning, providing a pathway toward more adaptable and sample-efficient agents.
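The control flow described in the abstract (embed the observation, query episodic memory, and let a detector choose between recall and exploration) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: `embed` is a hashed bag-of-words stand-in for the LLM encoder, `EpisodicMemory` and `WorldGraph` are hypothetical class names, and the "critical state detector" is reduced to a fixed similarity threshold, whereas the paper's detector may well be learned.

```python
import math
import random
import zlib

DIM = 64  # size of the toy embedding; an arbitrary choice for this sketch


def embed(text):
    """Hypothetical stand-in for the LLM encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    # Inputs are unit-normalised, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class EpisodicMemory:
    """Stores (embedding, action, return) triples for nearest-neighbour recall."""

    def __init__(self):
        self.entries = []

    def write(self, observation, action, episode_return):
        self.entries.append((embed(observation), action, episode_return))

    def best_match(self, observation):
        """Return (similarity, action, return) of the closest entry, or None if empty."""
        if not self.entries:
            return None
        query = embed(observation)
        scored = ((cosine(query, e), a, r) for e, a, r in self.entries)
        return max(scored, key=lambda item: item[0])


class WorldGraph:
    """Toy relational memory: observed transitions as state -> {action: next_state}."""

    def __init__(self):
        self.edges = {}

    def add(self, state, action, next_state):
        self.edges.setdefault(state, {})[action] = next_state

    def untried(self, state, actions):
        tried = self.edges.get(state, {})
        return [a for a in actions if a not in tried]


def act(observation, memory, graph, actions, threshold=0.9, rng=random):
    """Arbitrate between recall and exploration.

    Assumption: a state counts as "critical" (worth recalling) when a stored,
    positively rewarded experience is at least `threshold` similar to it.
    """
    match = memory.best_match(observation)
    if match is not None and match[0] >= threshold and match[2] > 0:
        return match[1], "recall"
    # Otherwise explore, preferring actions the graph has not yet seen here.
    candidates = graph.untried(observation, actions) or list(actions)
    return rng.choice(candidates), "explore"
```

In this sketch, an observation that closely matches a rewarded memory entry replays the stored action, while an unfamiliar observation falls back to graph-guided exploration of untried actions. A real system would replace the threshold rule and the toy encoder with the learned components the abstract describes.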