Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inaccurate temporal distance estimation and low policy learning efficiency in long-horizon goal-conditioned reinforcement learning, this paper proposes a multistep quasimetric learning method. Given an unlabeled offline dataset of visual observations, it fits a goal-conditioned temporal distance as a quasimetric by regressing multistep Monte Carlo returns. The resulting quasimetric combines the global consistency of Monte Carlo estimates with the local optimality guarantees of temporal-difference updates, enabling end-to-end multistep path stitching. The paper reports that this is the first end-to-end GCRL method to demonstrate multistep stitching on a real-world robot manipulation domain (Bridge setup), alongside significant improvements over prior methods on simulated tasks with horizons up to 4,000 steps. The core innovation lies in modeling temporal distance as a learnable quasimetric, eliminating reliance on explicit dynamics models or dense reward supervision and thereby enhancing generalization and sample efficiency for long-horizon goal achievement.

📝 Abstract
Learning how to reach goals in an environment is a longstanding challenge in AI, yet reasoning over long horizons remains a challenge for modern methods. The key question is how to estimate the temporal distance between pairs of observations. While temporal difference methods leverage local updates to provide optimality guarantees, they often perform worse than Monte Carlo methods that perform global updates (e.g., with multi-step returns), which lack such guarantees. We show how these approaches can be integrated into a practical GCRL method that fits a quasimetric distance using a multistep Monte-Carlo return. We show our method outperforms existing GCRL methods on long-horizon simulated tasks with up to 4000 steps, even with visual observations. We also demonstrate that our method can enable stitching in the real-world robotic manipulation domain (Bridge setup). Our approach is the first end-to-end GCRL method that enables multistep stitching in this real-world manipulation domain from an unlabeled offline dataset of visual observations.
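The abstract's core move, fitting a quasimetric distance by regressing a multistep Monte Carlo return, can be sketched minimally. Everything below is an illustrative assumption, not the paper's architecture: the component-wise asymmetric distance, the stand-in encoder features `traj`, the toy linear embedding `W`, and the training loop.

```python
import numpy as np

def quasimetric(fx, fy):
    """d(x, y) = sum_i max(0, fy_i - fx_i): zero self-distance and the
    triangle inequality hold (component-wise, max(0, u + v) <= max(0, u)
    + max(0, v)), but d is asymmetric -- a quasimetric, not a metric."""
    return np.maximum(fy - fx, 0.0).sum()

rng = np.random.default_rng(0)

# sanity-check the quasimetric axioms on random embeddings
a, b, c = rng.normal(size=(3, 8))
assert quasimetric(a, a) == 0.0
assert quasimetric(a, c) <= quasimetric(a, b) + quasimetric(b, c) + 1e-9

# multistep Monte Carlo regression: for states s_t and s_{t+k} on the same
# trajectory, the observed gap k is a sample (an upper bound) of their
# temporal distance; regress the learned distance toward it
traj = rng.normal(size=(50, 8))    # stand-in for encoder features of one trajectory
W = 0.1 * rng.normal(size=(8, 8))  # toy linear embedding to fit
lr = 1e-3
for _ in range(500):
    t = int(rng.integers(0, 40))
    k = int(rng.integers(1, 10))
    fx, fy = traj[t] @ W, traj[t + k] @ W
    pred = quasimetric(fx, fy)
    # subgradient of (pred - k)^2 w.r.t. W through the active max(0, .) terms
    active = (fy - fx > 0).astype(float)
    W -= lr * 2.0 * (pred - k) * np.outer(traj[t + k] - traj[t], active)
```

The asymmetric max-based form is one simple way to get the triangle inequality by construction; the paper's actual quasimetric parameterization and visual encoder are not specified here.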
Problem

Research questions and friction points this paper is trying to address.

Estimating temporal distance between observation pairs
Integrating local and global updates for GCRL
Enabling multistep stitching in real-world robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multistep Monte-Carlo return for quasimetric learning
End-to-end goal-conditioned reinforcement learning method
Enables multistep stitching with visual offline datasets
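The stitching listed above follows from the triangle inequality: a start-goal pair never seen on any single trajectory can still be bounded by chaining learned distances through intermediate waypoints. The toy relaxation below makes this concrete with a hypothetical table of pairwise distance estimates and an explicit Floyd-Warshall pass; the paper's method does this implicitly and end-to-end, not via a table.

```python
import numpy as np

inf = float("inf")
# hypothetical learned quasimetric d[i, j] = estimated steps from state i to
# state j; inf where no single trajectory in the dataset covers the pair
d = np.array([
    [0.0, 3.0, inf, inf],   # trajectory A covers 0 -> 1
    [inf, 0.0, 2.0, inf],   # trajectory B covers 1 -> 2
    [inf, inf, 0.0, 4.0],   # trajectory C covers 2 -> 3
    [inf, inf, inf, 0.0],
])

# triangle-inequality stitching: relax every pair through each waypoint k,
# d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
for k in range(4):
    d = np.minimum(d, d[:, [k]] + d[[k], :])

print(d[0, 3])  # 0 -> 3 is reachable only by stitching: 3 + 2 + 4 = 9.0
```

A quasimetric fitted to multistep returns already respects these bounds by construction, which is why stitched routes need no explicit graph search at deployment.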