From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay

📅 2026-05-10

🤖 AI Summary
This work proposes Neuro-Symbolic Experience Replay (NSER), a framework aimed at a limitation of traditional experience replay. Traditional replay samples transitions solely by numerical prediction error and ignores their semantic value, which stands in the way of human-like learning efficiency. NSER combines the zero-shot semantic reasoning of large language models (LLMs) with the replay mechanism of reinforcement learning. An LLM extracts behavioral rules from trajectories, and these rules are translated into differentiable first-order logic representations. The resulting symbolic abstractions dynamically reweight the replay distribution, so that high-level knowledge directly guides policy optimization. The authors report that this active, knowledge-constructing replay system improves sample efficiency and convergence speed across reactive, rule-based, and procedural tasks, bridging numerical optimization and symbolic reasoning.
📝 Abstract
While experience replay is essential for data efficiency in reinforcement learning (RL), standard methods treat the replay buffer as a passive memory system, prioritizing samples based on numerical prediction errors rather than their semantic significance. This approach stands in contrast to human learning, which accelerates mastery by actively abstracting fragmented experiences into behavioral rules. To bridge this gap, we propose Neuro-Symbolic Experience Replay (NSER), a framework that transforms experience replay from a passive sample reuse mechanism into an active engine for knowledge construction. Specifically, NSER addresses the incompatibility between linguistic reasoning and numerical optimization through a novel neuro-symbolic grounding pipeline. It leverages Large Language Models (LLMs) in a zero-shot manner to induce candidate behavioral rules from accumulated trajectories, grounds these insights into differentiable first-order logic representations, and utilizes the resulting symbolic structures to dynamically reweight the replay distribution. By allowing abstract knowledge to directly shape policy optimization, NSER achieves consistently superior sample efficiency and convergence speed across reactive, rule-based, and procedural benchmarks.
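The abstract states that grounded rules reweight the replay distribution but gives no formula. What follows is a minimal sketch built on our own assumptions, not the paper's. Each grounded rule is treated as a conjunction of fuzzy predicates scored with the product t-norm, a common differentiable relaxation of logical AND. Rules are combined with a probabilistic-sum OR. The resulting semantic score multiplies a TD-error priority in the style of standard prioritized experience replay. The names `SoftRule`, `semantic_score`, `replay_probabilities` and `sample_batch` are hypothetical, as are the parameters `alpha`, `beta` and `eps` and the toy features `near_key`/`action_pickup`.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical transition encoding: named, fuzzy-valued features in [0, 1].
Transition = Dict[str, float]


@dataclass
class SoftRule:
    """Hypothetical grounded rule: a conjunction of fuzzy predicates.

    The rule's truth value is the product of its predicates' truth values
    (product t-norm), a common differentiable relaxation of logical AND.
    """
    name: str
    predicates: Sequence[Callable[[Transition], float]]

    def truth(self, t: Transition) -> float:
        value = 1.0
        for predicate in self.predicates:
            # Clamp each predicate so the product stays a valid truth value.
            value *= min(1.0, max(0.0, predicate(t)))
        return value


def semantic_score(rules: Sequence[SoftRule], t: Transition) -> float:
    """Soft OR over all rules (probabilistic sum): 1 - prod(1 - truth_k)."""
    none_fire = 1.0
    for rule in rules:
        none_fire *= 1.0 - rule.truth(t)
    return 1.0 - none_fire


def replay_probabilities(
    td_errors: Sequence[float],
    transitions: Sequence[Transition],
    rules: Sequence[SoftRule],
    alpha: float = 0.6,
    beta: float = 1.0,
    eps: float = 1e-6,
) -> List[float]:
    """Sampling distribution mixing TD-error priority with a rule-based bonus."""
    priorities = []
    for delta, t in zip(td_errors, transitions):
        numeric = (abs(delta) + eps) ** alpha          # PER-style numerical priority
        bonus = 1.0 + beta * semantic_score(rules, t)  # hypothetical semantic reweighting
        priorities.append(numeric * bonus)
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(probs: Sequence[float], batch_size: int, rng: random.Random) -> List[int]:
    """Draw buffer indices with replacement according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Usage with a toy rule "if near a key, pick it up" (illustrative only).
pickup_rule = SoftRule(
    name="pickup_when_near_key",
    predicates=[
        lambda t: t.get("near_key", 0.0),
        lambda t: t.get("action_pickup", 0.0),
    ],
)
buffer = [
    {"near_key": 0.9, "action_pickup": 1.0},
    {"near_key": 0.1, "action_pickup": 0.0},
    {"near_key": 0.5, "action_pickup": 1.0},
]
probs = replay_probabilities([0.5, 0.5, 0.5], buffer, [pickup_rule])
batch = sample_batch(probs, batch_size=4, rng=random.Random(0))
```

All three transitions have equal TD errors. Their sampling probabilities are therefore proportional to 1.9, 1.0 and 1.5, and the transition that best satisfies the rule is replayed most often. The multiplicative bonus leaves plain TD-error prioritization unchanged when no rule fires, because the bonus is then 1. NSER's actual weighting scheme may differ.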
Problem

Research questions and friction points this paper is trying to address.

experience replay
semantic significance
behavioral rules
data efficiency
reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-Symbolic
Experience Replay
Large Language Models
Zero-shot Reasoning
Differentiable Logic