From Passive Reuse to Active Reasoning: Grounding Large Language Models for Neuro-Symbolic Experience Replay

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Neuro-Symbolic Experience Replay (NSER), a framework that addresses a limitation of traditional experience replay: samples are prioritized solely by numerical prediction error, which overlooks the semantic value of experiences and falls short of human-like learning efficiency. NSER combines the zero-shot semantic reasoning of large language models (LLMs) with the replay mechanism of reinforcement learning. LLMs extract behavioral rules from trajectories, and these rules are translated into differentiable first-order logic representations. The resulting symbolic abstractions dynamically reweight the replay distribution, so that high-level knowledge directly guides policy optimization. The authors report that this active, knowledge-constructing replay system improves sample efficiency and convergence speed across reactive, rule-based, and procedural tasks, bridging numerical optimization and symbolic reasoning.
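To make "differentiable first-order logic" more concrete, here is a minimal sketch of one common way to relax a logical rule into a smooth truth value. It uses product fuzzy logic, which is an assumption for illustration and not necessarily the grounding NSER uses. The rule ("if near an obstacle, do not move forward"), the predicate `near_obstacle`, and all parameter names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Product fuzzy-logic connectives: smooth in their arguments,
# so gradients can flow through a rule's truth value.
def soft_not(a: float) -> float:
    return 1.0 - a


def soft_or(a: float, b: float) -> float:
    return a + b - a * b


def soft_implies(a: float, b: float) -> float:
    # a -> b is relaxed as (not a) or b.
    return soft_or(soft_not(a), b)


def near_obstacle(distance: float, threshold: float = 1.0,
                  sharpness: float = 5.0) -> float:
    # Hypothetical soft predicate: close to 1 when distance < threshold.
    return sigmoid(sharpness * (threshold - distance))


def rule_truth(distance: float, p_forward: float) -> float:
    # Hypothetical rule: near_obstacle(s) -> not move_forward(a),
    # where p_forward is the policy's probability of moving forward.
    return soft_implies(near_obstacle(distance), soft_not(p_forward))


if __name__ == "__main__":
    print(rule_truth(distance=0.0, p_forward=1.0))  # rule violated, near 0
    print(rule_truth(distance=5.0, p_forward=1.0))  # premise false, near 1
```

A truth value near 0 marks a transition that violates the rule and a value near 1 marks one that satisfies it. A replay mechanism could consume either signal.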

📝 Abstract

While experience replay is essential for data efficiency in reinforcement learning (RL), standard methods treat the replay buffer as a passive memory system, prioritizing samples based on numerical prediction errors rather than their semantic significance. This approach stands in contrast to human learning, which accelerates mastery by actively abstracting fragmented experiences into behavioral rules. To bridge this gap, we propose Neuro-Symbolic Experience Replay (NSER), a framework that transforms experience replay from a passive sample reuse mechanism into an active engine for knowledge construction. Specifically, NSER addresses the incompatibility between linguistic reasoning and numerical optimization through a novel neuro-symbolic grounding pipeline. It leverages Large Language Models (LLMs) in a zero-shot manner to induce candidate behavioral rules from accumulated trajectories, grounds these insights into differentiable first-order logic representations, and utilizes the resulting symbolic structures to dynamically reweight the replay distribution. By allowing abstract knowledge to directly shape policy optimization, NSER achieves consistently superior sample efficiency and convergence speed across reactive, rule-based, and procedural benchmarks.
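The abstract does not give the reweighting formula, so the following is only a sketch of how a rule-derived score could reshape a replay distribution. It assumes each stored transition carries a TD error and a hypothetical rule-relevance score in [0, 1], and it combines them as a prioritized-replay-style term multiplied by an exponential rule bonus. The function names and the parameters `alpha`, `beta`, and `eps` are illustrative assumptions, not the paper's notation.

```python
import math
import random
from typing import List, Sequence


def replay_probabilities(td_errors: Sequence[float],
                         rule_scores: Sequence[float],
                         alpha: float = 0.6,
                         beta: float = 2.0,
                         eps: float = 1e-3) -> List[float]:
    """Sampling probabilities from TD errors and rule scores.

    Assumed form: weight_i = (|td_i| + eps)^alpha * exp(beta * rule_i).
    With beta = 0 this reduces to error-only prioritization.
    """
    if len(td_errors) != len(rule_scores):
        raise ValueError("td_errors and rule_scores must have equal length")
    if not td_errors:
        raise ValueError("the buffer must contain at least one transition")
    weights = [
        (abs(d) + eps) ** alpha * math.exp(beta * r)
        for d, r in zip(td_errors, rule_scores)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def sample_indices(probs: Sequence[float], batch_size: int,
                   rng: random.Random) -> List[int]:
    # Draw buffer indices with replacement according to probs.
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


if __name__ == "__main__":
    td = [0.5, 0.5, 0.5]     # identical numerical errors
    rules = [0.0, 1.0, 0.0]  # only transition 1 is rule-relevant
    probs = replay_probabilities(td, rules)
    print(probs)
    print(sample_indices(probs, batch_size=4, rng=random.Random(0)))
```

In this sketch, transitions with identical TD errors are no longer sampled uniformly: the one flagged as rule-relevant is drawn more often. That is the qualitative behavior the abstract describes when it contrasts semantic significance with purely numerical prioritization.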

Problem

Research questions and friction points this paper is trying to address.

experience replay

semantic significance

behavioral rules

data efficiency

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuro-Symbolic

Experience Replay

Large Language Models