🤖 AI Summary
To address the theoretical ambiguity, optimization instability, and high computational cost that arise from function approximation in reinforcement learning (RL) over large state-action spaces, this paper proposes a unified framework grounded in the spectral decomposition of transition operators, systematically integrating spectral theory into RL modeling. The framework jointly handles dynamics abstraction and policy optimization, enabling latent-variable discovery and energy-based basis-function learning, and it extends rigorously to partially observable MDPs. By unifying model-based and model-free optimization, it achieves both theoretical interpretability and empirical efficiency. Evaluated on over 20 challenging continuous-control tasks from the DeepMind Control Suite, the method matches or surpasses state-of-the-art model-free and model-based baselines.
📄 Abstract
In real-world applications with large state and action spaces, reinforcement learning (RL) typically employs function approximation to represent core components such as policies, value functions, and dynamics models. Although powerful approximators such as neural networks offer great expressiveness, they often come with theoretical ambiguities, suffer from optimization instability and exploration difficulty, and incur substantial computational costs in practice. In this paper, we introduce the perspective of spectral representations as a way to address these difficulties in RL. Stemming from the spectral decomposition of the transition operator, this framework yields an effective abstraction of the system dynamics for subsequent policy optimization while also providing a clear theoretical characterization. We show how to construct spectral representations for transition operators that possess latent-variable structures or energy-based structures, each structure implying a different method for extracting spectral representations from data. Notably, each of these learning methods realizes an effective RL algorithm under this framework. We also provably extend this spectral view to partially observable MDPs. Finally, we validate these algorithms on over 20 challenging tasks from the DeepMind Control Suite, where they achieve performance comparable or superior to current state-of-the-art model-free and model-based baselines.
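To make the core idea concrete, here is a minimal tabular sketch (not the paper's actual algorithm) of what "spectral decomposition of the transition operator" means: a row-stochastic transition matrix is factored via truncated SVD, and the left factors act as low-dimensional spectral features of each state-action pair that abstract the dynamics. All sizes and variable names below are hypothetical, and the neural / energy-based learning methods from the paper are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, rank = 6, 2, 3

# Build a random rank-3, row-stochastic transition matrix P of shape
# (|S||A|, |S|): each row is P(. | s, a), a distribution over next states.
A = rng.random((n_states * n_actions, rank))
B = rng.random((rank, n_states))
P = A @ B
P /= P.sum(axis=1, keepdims=True)  # normalize rows into probabilities

# Spectral decomposition via SVD: P ~= U_k S_k V_k^T.
U, S, Vt = np.linalg.svd(P, full_matrices=False)
phi = U[:, :rank] * S[:rank]   # spectral features phi(s, a): (|S||A|, k)
mu = Vt[:rank]                 # next-state factors mu(s'):   (k, |S|)

# Because P has rank <= 3, the rank-3 factorization recovers it exactly
# (up to floating-point error), so phi is a lossless dynamics abstraction.
P_hat = phi @ mu
print(f"max reconstruction error: {np.abs(P - P_hat).max():.2e}")
```

In this view, downstream quantities such as value functions become linear in the features `phi(s, a)`, which is what makes the abstraction useful for policy optimization; the paper's contribution is learning such features from data when the operator has latent-variable or energy-based structure, rather than from an explicit matrix.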