🤖 AI Summary
This work exposes a fundamental theoretical limitation of the sampling-generalization paradigm in high-dimensional reinforcement learning, in which the Bellman equation is enforced only on sampled states. To demonstrate this, we construct a structurally simple yet informationally opaque counterexample, proving that when the Bellman equation is enforced only on a subset of states, existing generalization mechanisms systematically discard critical dynamics information, yielding unavoidable lower bounds on sample complexity and convergence. This gives the first formal characterization of the intrinsic bottleneck of Bellman-based approximation methods. We further extend this negative result to techniques such as Hindsight Experience Replay, showing that its efficacy remains fundamentally constrained by deficiencies in modeling state-to-state reachability. By moving beyond purely empirical analysis, our work establishes a provable theoretical boundary for Bellman-based RL algorithms, offering foundational guidance for principled algorithm design.
📝 Abstract
Reinforcement Learning algorithms designed for high-dimensional spaces often enforce the Bellman equation on a sampled subset of states, relying on generalization to propagate knowledge across the state space. In this paper, we identify and formalize a fundamental limitation of this common approach. Specifically, we construct counterexample problems with a simple structure that this approach fails to exploit. Our findings reveal that such algorithms can neglect critical information about these problems, leading to inefficiencies. Furthermore, we extend this negative result to another approach from the literature: Hindsight Experience Replay applied to learning state-to-state reachability.
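To make the critiqued paradigm concrete, here is a minimal sketch of enforcing the Bellman equation only on a sampled subset of states, with a linear function approximator supplying values everywhere else. The toy MDP, its random features, and all parameter choices are hypothetical illustrations, not the paper's construction:

```python
import numpy as np

# Hypothetical toy MDP: random deterministic dynamics and rewards,
# used only to illustrate the sampling-generalization paradigm.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 20, 2, 0.9
next_state = rng.integers(n_states, size=(n_states, n_actions))
reward = rng.random((n_states, n_actions))

# Linear function approximation over random state features (an assumption
# for illustration; any approximator plays the same role).
features = rng.random((n_states, 4))
weights = np.zeros((4, n_actions))

# The Bellman equation is enforced ONLY on this sampled subset of states.
sampled = rng.choice(n_states, size=8, replace=False)

for _ in range(200):
    q = features @ weights                                # Q(s, a) for all states
    targets = reward + gamma * q[next_state].max(axis=2)  # Bellman targets
    # Gradient step on the Bellman residual over the sampled states only;
    # all other states receive values purely through generalization.
    err = q[sampled] - targets[sampled]
    weights -= 0.1 * features[sampled].T @ err / len(sampled)

print(np.round(features @ weights, 2))
```

The values printed for unsampled states were never checked against the Bellman equation; they are whatever the shared weights happen to imply, which is exactly the information loss the paper's counterexamples exploit.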