Offline Goal-conditioned Reinforcement Learning with Quasimetric Representations

📅 2025-09-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses low policy learning efficiency and biased distance estimation in offline goal-conditioned reinforcement learning (GCRL) under suboptimal and stochastic environments. To this end, we propose a quasimetric successor representation method that integrates contrastive representation learning with a temporal distance framework. Our core innovations include: (i) enforcing triangle inequality constraints in a quasimetric space to regularize state-goal distance representations; (ii) jointly optimizing contrastive objectives with successor features; and (iii) introducing path concatenation consistency constraints. To our knowledge, this is the first method to achieve unbiased estimation of optimal goal-reaching distances from suboptimal, stochastic offline data, combining the stability of Monte Carlo contrastive learning with the long-horizon generalization capability of quasimetric networks. Experiments demonstrate significant improvements over state-of-the-art contrastive and quasimetric baselines on standard offline GCRL benchmarks and high-dimensional noisy environments, particularly excelling in multi-segment path concatenation tasks.

๐Ÿ“ Abstract
Approaches for goal-conditioned reinforcement learning (GCRL) often use learned state representations to extract goal-reaching policies. Two frameworks for representation structure have yielded particularly effective GCRL algorithms: (1) *contrastive representations*, in which methods learn "successor features" with a contrastive objective that performs inference over future outcomes, and (2) *temporal distances*, which link the (quasimetric) distance in representation space to the transit time from states to goals. We propose an approach that unifies these two frameworks, using the structure of a quasimetric representation space (triangle inequality) with the right additional constraints to learn successor representations that enable optimal goal-reaching. Unlike past work, our approach is able to exploit a **quasimetric** distance parameterization to learn **optimal** goal-reaching distances, even with **suboptimal** data and in **stochastic** environments. This gives us the best of both worlds: we retain the stability and long-horizon capabilities of Monte Carlo contrastive RL methods, while getting the free stitching capabilities of quasimetric network parameterizations. On existing offline GCRL benchmarks, our representation learning objective improves performance on stitching tasks where methods based on contrastive learning struggle, and on noisy, high-dimensional environments where methods based on quasimetric networks struggle.
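The abstract's key structural ingredient is a distance parameterization that satisfies the quasimetric axioms by construction: d(x, x) = 0 and the triangle inequality, without requiring symmetry. The paper's actual architecture is not reproduced here; as a minimal illustrative sketch (function name and use of plain NumPy are my own assumptions), one simple family with these guarantees is a component-wise asymmetric distance:

```python
import numpy as np

def quasimetric(phi_x, phi_y):
    """Illustrative quasimetric: d(x, y) = sum_i max(0, phi_x[i] - phi_y[i]).

    Satisfies d(x, x) = 0 and the triangle inequality
    d(x, z) <= d(x, y) + d(y, z), but is not symmetric in general --
    exactly the quasimetric structure the abstract refers to.
    """
    return np.maximum(0.0, np.asarray(phi_x) - np.asarray(phi_y)).sum(-1)

# Sanity checks on random embeddings: identity and triangle inequality.
rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 8))
assert quasimetric(x, x) == 0.0
assert quasimetric(x, z) <= quasimetric(x, y) + quasimetric(y, z) + 1e-12
```

In practice the inputs would come from learned state/goal encoders; richer guaranteed-quasimetric heads (e.g. interval quasimetric embeddings) obey the same axioms.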
Problem

Research questions and friction points this paper is trying to address.

Unifying contrastive and quasimetric representations for goal-conditioned reinforcement learning
Learning optimal goal-reaching distances from suboptimal data in stochastic environments
Improving performance on stitching tasks and noisy high-dimensional environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unifies contrastive and quasimetric representation frameworks
Learns optimal goal-reaching distances with suboptimal data
Combines Monte Carlo stability with quasimetric stitching capabilities
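The innovations above hinge on combining an InfoNCE-style contrastive objective with a quasimetric critic. The paper's exact loss is not reproduced here; a generic sketch of that combination, with in-batch negatives and logits given by a negated component-wise quasimetric (names, shapes, and the specific quasimetric are illustrative assumptions), might look like:

```python
import numpy as np

def infonce_quasimetric_loss(phi_s, psi_g):
    """InfoNCE over a batch of (state, future-goal) embedding pairs.

    phi_s, psi_g: arrays of shape (batch, dim); row i of psi_g is the
    positive for row i of phi_s, and the remaining rows act as in-batch
    negatives. The critic is a negated quasimetric, so minimizing the
    loss pulls distances to positives down and pushes negatives up.
    """
    phi_s, psi_g = np.asarray(phi_s), np.asarray(psi_g)
    # Pairwise quasimetric: d[i, j] = sum_k max(0, phi_s[i, k] - psi_g[j, k]).
    d = np.maximum(0.0, phi_s[:, None, :] - psi_g[None, :, :]).sum(-1)
    logits = -d
    # Row-wise log-sum-exp, computed stably.
    m = logits.max(axis=1, keepdims=True)
    logsumexp = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # Cross-entropy with the positive logit on the diagonal.
    return float(np.mean(logsumexp - np.diag(logits)))
```

The loss is nonnegative and shrinks as each state lands closer, in quasimetric distance, to its own future goal than to the other goals in the batch.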