AI Summary
This work addresses the challenge in offline goal-conditioned reinforcement learning where sparse rewards often cause misalignment between state and goal representations, leading encoders to collapse into goal-irrelevant low-dimensional subspaces and degrading policy stability. To mitigate this, the authors propose Ms.PR, a multi-scale representation learning framework that, for the first time, enforces cross-scale predictive consistency as a core constraint to achieve hierarchical alignment in latent space, from local dynamics to long-horizon goal structures. Integrating multi-scale predictive modeling, latent-space alignment constraints, and an offline RL architecture, Ms.PR supports both visual and state-based inputs and consistently enhances representation quality and policy robustness across diverse tasks, trajectory stitching scenarios, and high-noise conditions, outperforming existing methods.
Abstract
This paper investigates robust representation learning in offline goal-conditioned reinforcement learning (GCRL). Particularly in sparse-reward scenarios, learning representations that align state and goal latents is a challenge that frequently culminates in representation divergence, where the encoder drifts toward a low-dimensional, goal-agnostic subspace that destabilizes policy learning. We address this issue by showing that an agent must acquire a fundamental understanding of its environment across multiple scales, from local physical dynamics to long-horizon goal-directed structure. Building on this insight, we propose Ms.PR, a framework that leverages multi-scale predictive supervision to enforce goal-directed alignment within the latent space. We demonstrate that Ms.PR leads to improved representation quality and strong performance on both vision and state-based tasks. Furthermore, we show that our approach is exceptionally resilient under realistic, challenging data regimes, maintaining state-of-the-art performance across a wide variety of tasks, trajectory stitching scenarios, and extreme noise conditions.