🤖 AI Summary
This work proposes a Neuro-Symbolic Experience Replay (NSER) framework. It targets a limitation of traditional experience replay, which samples transitions solely by numerical prediction error and ignores their semantic value. This stands in the way of human-like learning efficiency. NSER combines the zero-shot semantic reasoning of large language models (LLMs) with reinforcement learning replay: an LLM extracts behavioral rules from trajectories, and these rules are translated into differentiable first-order logic representations. The resulting symbolic abstractions dynamically reweight the replay distribution, so high-level knowledge directly guides policy optimization. This proactive, knowledge-constructing replay consistently improves sample efficiency and convergence speed across reactive, rule-based, and procedural tasks, helping to bridge numerical optimization and symbolic reasoning.
📝 Abstract
While experience replay is essential for data efficiency in reinforcement learning (RL), standard methods treat the replay buffer as a passive memory, prioritizing samples by numerical prediction error rather than by semantic significance. Human learning works differently: people accelerate mastery by actively abstracting fragmented experiences into behavioral rules. To bridge this gap, we propose Neuro-Symbolic Experience Replay (NSER), a framework that turns experience replay from a passive sample-reuse mechanism into an active engine for knowledge construction. Specifically, NSER resolves the incompatibility between linguistic reasoning and numerical optimization through a novel neuro-symbolic grounding pipeline. It uses Large Language Models (LLMs) in a zero-shot manner to induce candidate behavioral rules from accumulated trajectories, grounds these rules in differentiable first-order logic representations, and uses the resulting symbolic structures to dynamically reweight the replay distribution. By letting abstract knowledge directly shape policy optimization, NSER achieves consistently superior sample efficiency and convergence speed across reactive, rule-based, and procedural benchmarks.
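The abstract does not specify how grounded rules are scored or how they are combined with prediction-error priorities. As a rough illustration only, the sketch below assumes a proportional prioritized-replay baseline, with priority (|δ| + ε)^α on the TD error δ. It also assumes each LLM-induced rule "antecedent → consequent" is grounded over per-transition predicate truth values in [0, 1]. A transition's semantic score is the product t-norm of antecedent and consequent, which is differentiable and measures how strongly the transition instantiates the rule. That score then scales the base priority by (1 + β·score). The functions, the predicates `near_key` and `picked_up`, and the parameters `alpha`, `beta`, and `eps` are hypothetical and are not taken from NSER.

```python
import numpy as np


def per_priorities(td_errors, alpha=0.6, eps=1e-6):
    # Proportional prioritized replay: priority grows with |TD error|.
    return (np.abs(td_errors) + eps) ** alpha


def rule_support(antecedent, consequent):
    # Product t-norm of antecedent and consequent truth values in [0, 1]:
    # how strongly each transition instantiates "antecedent -> consequent".
    # Differentiable in both inputs.
    return antecedent * consequent


def semantic_replay_probs(td_errors, rules, alpha=0.6, beta=1.0):
    # Hypothetical NSER-style reweighting: base PER priority scaled by rule support.
    base = per_priorities(td_errors, alpha=alpha)
    support = np.zeros_like(base)
    for antecedent, consequent in rules:
        # A transition's score is its strongest support over all grounded rules.
        support = np.maximum(support, rule_support(antecedent, consequent))
    priorities = base * (1.0 + beta * support)
    return priorities / priorities.sum()


# Hypothetical rule induced by an LLM: "if near a key, pick it up".
td = np.array([0.5, 0.5, 0.5, 0.5])
near_key = np.array([1.0, 1.0, 0.0, 0.0])   # predicate truth per transition
picked_up = np.array([1.0, 0.0, 1.0, 0.0])  # predicate truth per transition
probs = semantic_replay_probs(td, [(near_key, picked_up)], beta=1.0)
print(probs)  # approximately [0.4 0.2 0.2 0.2]

rng = np.random.default_rng(0)
batch = rng.choice(len(probs), size=2, p=probs)  # indices of transitions to replay
```

With equal TD errors, only the first transition fully instantiates the rule, so its replay probability doubles relative to the others. Multiplying the prediction-error priority, rather than replacing it, keeps every transition sampleable while rule-consistent experiences are replayed more often. NSER's actual weighting scheme may differ.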