Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning

📅 2025-06-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Offline goal-conditioned reinforcement learning struggles with reliable long-horizon goal reaching due to cumulative value-estimation errors. To address this, we propose Projective Quasimetric Planning (ProQ), a novel framework that constructs an asymmetric quasimetric in latent space. This metric serves two synergistic roles: (i) as a repulsive energy term that promotes a uniform distribution of keypoints, and (ii) in conjunction with a Lagrangian out-of-distribution detection mechanism, to ensure subgoal reachability. ProQ further integrates metric learning, keypoint coverage, and goal-conditioned control to induce structured directional costs that guide the policy toward nearby, feasible subgoals. Evaluated on multiple navigation benchmarks, ProQ significantly improves success rates on long-horizon tasks, generating semantically coherent and executable subgoal sequences. It achieves robust, scalable goal reaching without online environment interaction.
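The core object in the summary above is an asymmetric distance (quasimetric) over the latent space. As a minimal sketch of what "asymmetric but still metric-like" means, the form d(x, y) = Σᵢ max(0, yᵢ − xᵢ) is a valid quasimetric: non-negative, zero on the diagonal, and triangle-inequality-respecting, yet direction-dependent. This is an illustrative stand-in, not ProQ's learned distance:

```python
def quasimetric(x, y):
    """Toy asymmetric latent-space distance (a sketch, not ProQ's learned metric).

    d(x, y) = sum_i max(0, y_i - x_i) is non-negative, satisfies d(x, x) = 0
    and the triangle inequality, but in general d(x, y) != d(y, x): moving
    "uphill" along a coordinate costs, moving back is free.
    """
    return sum(max(0.0, yi - xi) for xi, yi in zip(x, y))

a, b = [0.0, 0.0], [1.0, 2.0]
print(quasimetric(a, b))  # 3.0: forward direction has positive cost
print(quasimetric(b, a))  # 0.0: reverse direction is free, hence asymmetric
```

Such directional costs are what lets a planner distinguish easily reachable subgoals from ones that are close only in a symmetric sense.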

📝 Abstract
Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling this promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. A principled geometric approach offers a potential solution to these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, first as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the agent towards proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Addresses long-horizon tasks in offline goal-conditioned reinforcement learning
Reduces compounding value-estimation errors via geometric solutions
Ensures reachable sub-goals with Lagrangian out-of-distribution detection
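The Lagrangian out-of-distribution mechanism listed above can be illustrated with a toy 1-D constrained problem solved by primal descent / dual ascent. Everything here (the quadratic pull toward `target`, the constraint boundary `eps`) is a hypothetical stand-in for ProQ's actual losses:

```python
def lagrangian_keypoint(target=3.0, eps=1.0, lr=0.05, steps=1000):
    """Toy sketch of a Lagrangian reachability constraint (illustrative only).

    The keypoint k is pulled toward `target` (standing in for forces that
    would drag it out of the data support), while dual ascent on the
    multiplier lam enforces the in-distribution constraint k <= eps, so k
    settles on the boundary of the 'reachable' set instead of leaving it.
    """
    k, lam = 0.0, 0.0
    for _ in range(steps):
        k -= lr * (2.0 * (k - target) + lam)   # primal descent on (k-target)^2 + lam*(k-eps)
        lam = max(0.0, lam + lr * (k - eps))   # dual ascent: lam grows while k > eps
    return k, lam
```

At the fixed point the multiplier exactly balances the pull past the boundary, which is the sense in which the detector "keeps keypoints within reachable areas".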
Innovation

Methods, ideas, or system contributions that make the work stand out.

ProQ learns asymmetric distance for planning
Keypoints spread uniformly in latent space
Lagrangian detector ensures reachable keypoints
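The "keypoints spread uniformly" idea above amounts to gradient descent on a pairwise repulsive energy. The sketch below uses plain 1-D distances and a clipped interval as the "latent space"; ProQ instead uses its learned quasimetric and the OOD detector for the boundary, so treat every choice here (exponential kernel, `tau`, the `[lo, hi]` clip) as an assumption for illustration:

```python
import math

def repulsive_step(keypoints, lr=0.1, tau=1.0, lo=0.0, hi=1.0):
    """One descent step on E = sum_{i != j} exp(-|k_i - k_j| / tau).

    Descending E pushes keypoints apart (nearby pairs repel most strongly),
    and clipping to [lo, hi] stands in for keeping them in-distribution.
    """
    grads = [0.0] * len(keypoints)
    for i, ki in enumerate(keypoints):
        for j, kj in enumerate(keypoints):
            if i == j:
                continue
            d = ki - kj
            # dE/dk_i of exp(-|d|/tau) is -sign(d)/tau * exp(-|d|/tau)
            grads[i] += -math.copysign(1.0, d) / tau * math.exp(-abs(d) / tau)
    return [min(hi, max(lo, k - lr * g)) for k, g in zip(keypoints, grads)]

ks = [0.4, 0.5, 0.6]          # start clustered in the middle
for _ in range(200):
    ks = repulsive_step(ks, lr=0.05)
print(ks)                      # outer points driven toward the boundaries
```

Repeated steps drive a clustered set toward an even covering of the interval, which is the coverage role the quasimetric plays as a repulsive energy.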