Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a failure mode in offline goal-conditioned reinforcement learning: under sparse rewards, state and goal representations often become misaligned, and the encoder collapses into a low-dimensional, goal-irrelevant subspace that destabilizes policy learning. To mitigate this, the authors propose Ms.PR, a multi-scale representation learning framework that enforces cross-scale predictive consistency as a core constraint, aligning the latent space hierarchically from local dynamics to long-horizon goal structure. Ms.PR combines multi-scale predictive modeling and latent-space alignment constraints with an offline RL architecture, and it supports both visual and state-based inputs. The authors report improved representation quality and policy robustness across diverse tasks, trajectory stitching scenarios, and high-noise conditions, outperforming existing methods.
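The collapse into a low-dimensional subspace described above can be made concrete with a simple diagnostic. The sketch below is not from the paper. It computes the effective rank of a batch of embeddings (the exponential of the entropy of the normalized singular values), which is one common way to check whether an encoder uses all of its latent dimensions. The function name `effective_rank` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def effective_rank(embeddings, eps=1e-12):
    """Effective rank of a (n_samples, dim) embedding matrix.

    Defined here as exp(entropy) of the singular values normalized to sum to 1.
    Values near `dim` mean the embeddings spread over all dimensions;
    values far below `dim` suggest collapse into a subspace.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))


# Synthetic stand-ins for encoder outputs (assumption: not real Ms.PR data).
rng = np.random.default_rng(0)
healthy = rng.normal(size=(512, 16))  # uses all 16 dimensions
collapsed = rng.normal(size=(512, 2)) @ rng.normal(size=(2, 16))  # lies in a 2-D subspace

rank_healthy = effective_rank(healthy)
rank_collapsed = effective_rank(collapsed)
```

In this toy setup, `rank_healthy` is close to the latent dimension, while `rank_collapsed` stays at or below 2 even though both matrices have 16 columns.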

📝 Abstract

This paper investigates robust representation learning in offline goal-conditioned reinforcement learning (GCRL). Particularly in sparse-reward scenarios, learning representations that align state and goal latents is a challenge that frequently culminates in representation divergence, where the encoder drifts toward a low-dimensional, goal-agnostic subspace that destabilizes policy learning. We address this issue by showing that an agent must acquire a fundamental understanding of its environment across multiple scales, from local physical dynamics to long-horizon goal-directed structure. Building on this insight, we propose Ms.PR, a framework that leverages multi-scale predictive supervision to enforce goal-directed alignment within the latent space. We demonstrate that Ms.PR leads to improved representation quality and strong performance on both vision-based and state-based tasks. Furthermore, we show that our approach is exceptionally resilient under realistic, challenging data regimes, maintaining state-of-the-art performance across a wide variety of tasks, trajectory stitching scenarios, and extreme noise conditions.
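The abstract does not spell out the form of the multi-scale predictive supervision, so the following is only a minimal sketch of the general idea, under stated assumptions. It predicts future latents at several horizons and averages the squared errors. The linear per-horizon predictors, the horizon set `(1, 4, 16)`, and the function name `multi_scale_predictive_loss` are hypothetical choices for illustration, not the authors' objective.

```python
import numpy as np


def multi_scale_predictive_loss(latents, predictors, horizons):
    """Mean squared latent-prediction error, averaged over several horizons.

    latents:    (T, d) array of encoded states along one trajectory.
    predictors: dict mapping horizon k to a (d, d) matrix; a hypothetical
                linear stand-in for a learned predictor network.
    horizons:   positive integers, each smaller than T.
    """
    per_horizon = []
    for k in horizons:
        z_now = latents[:-k]      # z_t
        z_future = latents[k:]    # z_{t+k}
        z_pred = z_now @ predictors[k]
        per_horizon.append(np.mean(np.sum((z_pred - z_future) ** 2, axis=1)))
    return float(np.mean(per_horizon))


# Toy trajectory from known linear latent dynamics z_{t+1} = z_t @ A (assumption).
rng = np.random.default_rng(0)
d, T = 4, 50
A = np.linalg.qr(rng.normal(size=(d, d)))[0]  # orthogonal, so norms stay bounded
latents = np.empty((T, d))
latents[0] = rng.normal(size=d)
for t in range(1, T):
    latents[t] = latents[t - 1] @ A

horizons = (1, 4, 16)  # short, medium, and long scales
exact = {k: np.linalg.matrix_power(A, k) for k in horizons}
identity = {k: np.eye(d) for k in horizons}

loss_exact = multi_scale_predictive_loss(latents, exact, horizons)
loss_identity = multi_scale_predictive_loss(latents, identity, horizons)
```

With predictors that match the true dynamics at every horizon, the loss is zero up to floating-point error. Predictors that ignore the dynamics (here, the identity) incur a positive loss. A learned version would replace the fixed matrices with trainable networks and backpropagate this loss into the encoder.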

Problem

Research questions and friction points this paper is trying to address.

goal-conditioned reinforcement learning

representation learning

sparse reward

representation divergence

offline reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-scale predictive representations

goal-conditioned reinforcement learning

representation alignment