🤖 AI Summary
In continuous control tasks, forward-backward (FB) representations suffer from a spectral mismatch: the transition dynamics are high-rank, while the FB architecture imposes a low-rank factorization, which makes accurate representation learning difficult. This work analyzes temporal abstraction as a way to mitigate the mismatch. Temporal abstraction acts as a low-pass filter on the transition operator, suppressing its high-frequency spectral components; this reduces the effective rank of the induced successor representation while keeping the resulting value function error formally bounded. Treating temporal abstraction as a principled means of shaping the spectral structure of the underlying Markov decision process (MDP), the authors show empirically that this spectral alignment is a key factor for stable FB learning, particularly at high discount factors, and that it enables effective long-horizon representations in continuous control.
📝 Abstract
Forward-backward (FB) representations provide a powerful framework for learning the successor representation (SR) in continuous spaces by enforcing a low-rank factorization. However, a fundamental spectral mismatch often exists between the high-rank transition dynamics of continuous environments and the low-rank bottleneck of the FB architecture, making accurate low-rank representation learning difficult. In this work, we analyze temporal abstraction as a mechanism to mitigate this mismatch. By characterizing the spectral properties of the transition operator, we show that temporal abstraction acts as a low-pass filter that suppresses high-frequency spectral components. This suppression reduces the effective rank of the induced SR while preserving a formal bound on the resulting value function error. Empirically, we show that this alignment is a key factor for stable FB learning, particularly at high discount factors where bootstrapping becomes error-prone. Our results identify temporal abstraction as a principled mechanism for shaping the spectral structure of the underlying MDP and enabling effective long-horizon representations in continuous control.
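The low-pass filtering effect can be illustrated on a small tabular example. The sketch below is not the paper's method or experimental setup; it rests on several assumptions made purely for illustration: temporal abstraction is modeled as replacing the one-step transition matrix P with its k-step power P^k, the discount γ is applied once per abstract step, the environment is a hypothetical lazy random walk on a ring, and "effective rank" is taken to be the number of singular values of the SR that are at least 10% of the largest. Under these assumptions, each eigenvalue λ of P becomes λ^k, so every mode with |λ| < 1 is attenuated and fewer singular values of the SR (I − γP^k)^{-1} remain large.

```python
import numpy as np

def lazy_ring_walk(n_states):
    """Transition matrix of a lazy random walk on a ring (toy, hypothetical MDP).

    Stay with probability 0.5, move to either neighbor with probability 0.25.
    The matrix is symmetric with eigenvalues in [0, 1].
    """
    eye = np.eye(n_states)
    left = np.roll(eye, -1, axis=1)   # transitions i -> i - 1 (mod n)
    right = np.roll(eye, 1, axis=1)   # transitions i -> i + 1 (mod n)
    return 0.5 * eye + 0.25 * (left + right)

def successor_representation(P, gamma, k):
    """SR of the k-step chain: (I - gamma * P^k)^{-1}.

    Assumption: temporal abstraction is modeled as the k-step operator P^k,
    with the discount gamma applied once per abstract step.
    """
    P_k = np.linalg.matrix_power(P, k)
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P_k)

def effective_rank(M, rel_tol=0.1):
    """Count singular values that are at least rel_tol times the largest.

    This thresholded count is one illustrative notion of effective rank,
    not necessarily the definition used in the paper.
    """
    s = np.linalg.svd(M, compute_uv=False)  # sorted in descending order
    return int(np.sum(s >= rel_tol * s[0]))

P = lazy_ring_walk(64)
gamma = 0.99  # high discount factor, the regime the abstract highlights

ranks = {
    k: effective_rank(successor_representation(P, gamma, k))
    for k in (1, 2, 4, 8)
}
for k, r in ranks.items():
    print(f"k={k}: effective rank = {r}")
```

In this toy setting the thresholded rank shrinks as k grows, because only the slowest modes (eigenvalues closest to 1) survive the k-th power. A low-rank factorization therefore has fewer significant directions to capture, which is the alignment the abstract describes.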