🤖 AI Summary
Existing representation learning methods lack structural alignment with value-based reinforcement learning, and in particular with Bellman updates, leaving representation learning misaligned with policy optimization.
Method: We propose the Spectral Bellman Representation framework, which theoretically establishes, for the first time, an intrinsic spectral relationship between the value function distribution under the Bellman operator and the feature covariance matrix. Leveraging multi-step Bellman operators and spectral analysis, we derive an optimization objective strictly aligned with Bellman dynamics and introduce the Inherent Bellman Error constraint to jointly govern representation learning and exploration.
Contribution/Results: Our method requires only lightweight modifications to existing algorithms yet achieves significant performance gains on challenging exploration and long-horizon credit assignment tasks. Empirical results demonstrate its effectiveness, generalization capability, and structural soundness, validating the theoretical foundation and practical utility of spectral alignment in representation learning for RL.
📝 Abstract
The importance of representation in reinforcement learning has been demonstrated by both theoretical and empirical successes. However, existing representation learning is mainly induced from model-learning objectives, misaligning it with value-based RL tasks. This work introduces Spectral Bellman Representation, a novel framework derived from the Inherent Bellman Error (IBE) condition, which aligns with the fundamental structure of Bellman updates across a space of possible value functions and is therefore aimed directly at value-based RL. Our key insight is the discovery of a fundamental spectral relationship: under the zero-IBE condition, the transformation of a distribution of value functions by the Bellman operator is intrinsically linked to the feature covariance structure. This spectral connection yields a new, theoretically grounded objective for learning state-action features that inherently capture this Bellman-aligned covariance. Our method requires only a simple modification to existing algorithms. We demonstrate that our learned representations enable structured exploration, by aligning feature covariance with Bellman dynamics, and improve overall performance, particularly in challenging hard-exploration and long-horizon credit assignment tasks. Our framework naturally extends to powerful multi-step Bellman operators, further broadening its impact. Spectral Bellman Representation offers a principled and effective path toward learning more powerful and structurally sound representations for value-based reinforcement learning.
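To make the two ingredients of the abstract concrete, the sketch below illustrates, on a toy tabular MDP, (a) the standard Inherent Bellman Error for linear features (the residual of a Bellman backup outside the feature span) and (b) the covariance of backed-up values over a distribution of value functions, whose spectrum is what the paper relates to the feature covariance. This is a minimal illustration of the underlying quantities, not the authors' method; the MDP, feature matrix `Phi`, and Gaussian weight distribution are all arbitrary assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: random transitions and rewards (illustrative assumptions only).
n_s, n_a, d, gamma = 6, 2, 3, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] = next-state distribution
R = rng.uniform(size=(n_s, n_a))                  # reward for each (s, a)
Phi = rng.normal(size=(n_s * n_a, d))             # state-action features phi(s, a)

def bellman_backup(theta):
    """Apply the Bellman optimality operator to the linear Q-function Phi @ theta."""
    Q = (Phi @ theta).reshape(n_s, n_a)
    V = Q.max(axis=1)                             # greedy state values
    return (R + gamma * P @ V).reshape(-1)        # (T Q)(s, a) as a flat vector

# (a) Empirical IBE proxy: distance from T(Phi theta) to span(Phi),
# maximized over a sample of weight vectors theta.
proj = Phi @ np.linalg.pinv(Phi)                  # orthogonal projector onto span(Phi)
thetas = rng.normal(size=(100, d))                # a distribution over value functions
residuals = [np.linalg.norm(bellman_backup(t) - proj @ bellman_backup(t))
             for t in thetas]
print(f"empirical IBE proxy (max residual): {max(residuals):.3f}")

# (b) Covariance of backed-up values over the value-function distribution;
# its spectrum is the object the paper connects to the feature covariance.
TQ = np.stack([bellman_backup(t) for t in thetas])
cov = np.cov(TQ, rowvar=False)
print("top eigenvalues of backup covariance:",
      np.round(np.linalg.eigvalsh(cov)[-3:], 3))
```

When the residual in (a) is zero for every `theta`, the feature class is closed under the Bellman operator (the zero-IBE condition), and the backup covariance in (b) stays within the span of the features.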