Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning

📅 2025-11-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address inaccurate temporal distance estimation and low policy learning efficiency in long-horizon goal-conditioned reinforcement learning, this paper proposes a multistep quasimetric learning method. Given an unlabeled offline dataset of visual observations, it fits a goal-conditioned temporal distance as a quasimetric by regressing multistep Monte Carlo returns. The resulting quasimetric combines the global consistency of Monte Carlo estimates with the local optimality guarantees of temporal-difference updates, enabling end-to-end multistep path stitching. The paper reports that this is the first end-to-end GCRL method to demonstrate multistep stitching on a real-world robot manipulation domain (Bridge setup), alongside significant improvements over prior methods on simulated tasks with horizons up to 4,000 steps. The core innovation lies in modeling temporal distance as a learnable quasimetric, eliminating reliance on explicit dynamics models or dense reward supervision and thereby enhancing generalization and sample efficiency for long-horizon goal achievement.

📝 Abstract
Learning how to reach goals in an environment is a longstanding challenge in AI, yet reasoning over long horizons remains a challenge for modern methods. The key question is how to estimate the temporal distance between pairs of observations. While temporal difference methods leverage local updates to provide optimality guarantees, they often perform worse than Monte Carlo methods that perform global updates (e.g., with multi-step returns), which lack such guarantees. We show how these approaches can be integrated into a practical GCRL method that fits a quasimetric distance using a multistep Monte-Carlo return. We show our method outperforms existing GCRL methods on long-horizon simulated tasks with up to 4000 steps, even with visual observations. We also demonstrate that our method can enable stitching in the real-world robotic manipulation domain (Bridge setup). Our approach is the first end-to-end GCRL method that enables multistep stitching in this real-world manipulation domain from an unlabeled offline dataset of visual observations.
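The abstract's core move, fitting a quasimetric distance by regressing a multistep Monte Carlo return, can be sketched minimally. Everything below is an illustrative assumption, not the paper's architecture: the component-wise asymmetric distance, the stand-in encoder features `traj`, the toy linear embedding `W`, and the training loop.

```python
import numpy as np

def quasimetric(fx, fy):
    """d(x, y) = sum_i max(0, fy_i - fx_i): zero self-distance and the
    triangle inequality hold (component-wise, max(0, u + v) <= max(0, u)
    + max(0, v)), but d is asymmetric -- a quasimetric, not a metric."""
    return np.maximum(fy - fx, 0.0).sum()

rng = np.random.default_rng(0)

# sanity-check the quasimetric axioms on random embeddings
a, b, c = rng.normal(size=(3, 8))
assert quasimetric(a, a) == 0.0
assert quasimetric(a, c) <= quasimetric(a, b) + quasimetric(b, c) + 1e-9

# multistep Monte Carlo regression: for states s_t and s_{t+k} on the same
# trajectory, the observed gap k is a sample (an upper bound) of their
# temporal distance; regress the learned distance toward it
traj = rng.normal(size=(50, 8))    # stand-in for encoder features of one trajectory
W = 0.1 * rng.normal(size=(8, 8))  # toy linear embedding to fit
lr = 1e-3
for _ in range(500):
    t = int(rng.integers(0, 40))
    k = int(rng.integers(1, 10))
    fx, fy = traj[t] @ W, traj[t + k] @ W
    pred = quasimetric(fx, fy)
    # subgradient of (pred - k)^2 w.r.t. W through the active max(0, .) terms
    active = (fy - fx > 0).astype(float)
    W -= lr * 2.0 * (pred - k) * np.outer(traj[t + k] - traj[t], active)
```

The asymmetric max-based form is one simple way to get the triangle inequality by construction; the paper's actual quasimetric parameterization and visual encoder are not specified here.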
Problem

Research questions and friction points this paper is trying to address.

Estimating temporal distance between observation pairs
Integrating local and global updates for GCRL
Enabling multistep stitching in real-world robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multistep Monte-Carlo return for quasimetric learning
End-to-end goal-conditioned reinforcement learning method
Enables multistep stitching with visual offline datasets
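The stitching listed above follows from the triangle inequality: a start-goal pair never seen on any single trajectory can still be bounded by chaining learned distances through intermediate waypoints. The toy relaxation below makes this concrete with a hypothetical table of pairwise distance estimates and an explicit Floyd-Warshall pass; the paper's method does this implicitly and end-to-end, not via a table.

```python
import numpy as np

inf = float("inf")
# hypothetical learned quasimetric d[i, j] = estimated steps from state i to
# state j; inf where no single trajectory in the dataset covers the pair
d = np.array([
    [0.0, 3.0, inf, inf],   # trajectory A covers 0 -> 1
    [inf, 0.0, 2.0, inf],   # trajectory B covers 1 -> 2
    [inf, inf, 0.0, 4.0],   # trajectory C covers 2 -> 3
    [inf, inf, inf, 0.0],
])

# triangle-inequality stitching: relax every pair through each waypoint k,
# d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
for k in range(4):
    d = np.minimum(d, d[:, [k]] + d[[k], :])

print(d[0, 3])  # 0 -> 3 is reachable only by stitching: 3 + 2 + 4 = 9.0
```

A quasimetric fitted to multistep returns already respects these bounds by construction, which is why stitched routes need no explicit graph search at deployment.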