🤖 AI Summary
Offline goal-conditioned reinforcement learning struggles with reliable long-horizon goal reaching because value-estimation errors compound over time. To address this, we propose Projective Quasimetric Planning (ProQ), a framework that learns an asymmetric quasimetric in latent space. This metric serves two synergistic roles: (i) as a repulsive energy term that drives a sparse set of keypoints to spread uniformly over the learned latent space, and (ii) as a structured directional cost that guides the policy toward nearby, feasible subgoals. ProQ couples this geometry with a Lagrangian out-of-distribution detection mechanism that keeps the learned keypoints within reachable regions. By unifying metric learning, keypoint coverage, and goal-conditioned control, ProQ generates semantically coherent and executable subgoal sequences and significantly improves success rates on long-horizon navigation benchmarks, achieving robust, scalable goal reaching without online environmental interaction.
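To make the directional-cost idea concrete, here is a minimal sketch of sub-goal selection under an asymmetric quasimetric. The specific metric (a weighted one-sided L1 distance, where forward and backward moves cost differently) and the function names are illustrative assumptions, not the paper's actual learned metric: the point is that an asymmetric distance yields a directional cost, and a sub-goal is chosen to minimize cost-to-keypoint plus keypoint-to-goal cost.

```python
import numpy as np

# Hypothetical asymmetric quasimetric: a weighted one-sided L1 distance.
# Moves in the positive direction cost more than moves in the negative
# direction, so d(x, y) != d(y, x) in general, while the triangle
# inequality still holds coordinate-wise.
def quasimetric(x, y, w_fwd=2.0, w_bwd=0.5):
    diff = y - x
    return w_fwd * np.maximum(diff, 0).sum() + w_bwd * np.maximum(-diff, 0).sum()

# Directional sub-goal selection: pick the keypoint minimizing the
# cost from the current state to the keypoint plus the cost from the
# keypoint to the goal.
def select_subgoal(state, goal, keypoints):
    costs = [quasimetric(state, k) + quasimetric(k, goal) for k in keypoints]
    return int(np.argmin(costs))

state = np.array([0.0, 0.0])
goal = np.array([4.0, 4.0])
keypoints = np.array([[1.0, 1.0], [3.0, -2.0], [5.0, 5.0]])
best = select_subgoal(state, goal, keypoints)
```

Because the cost is asymmetric, a keypoint that is cheap to reach from the state may still be expensive as a stepping stone toward the goal, which is exactly the directionality the planner exploits.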
📝 Abstract
Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling that promise to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometric reasoning offers a potential solution to these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, first as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the policy toward proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
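The repulsive-energy idea above can be sketched as follows. This is a toy illustration under stated assumptions: the quasimetric is the same illustrative weighted one-sided L1 distance (not the paper's learned metric), the exponential repulsion kernel and the finite-difference gradient descent are choices made here for brevity, and the clipping to the unit box stands in for the reachability constraint that ProQ enforces with its Lagrangian detector.

```python
import numpy as np

# Illustrative asymmetric quasimetric (weighted one-sided L1 distance).
def quasimetric(x, y, w_fwd=2.0, w_bwd=0.5):
    diff = y - x
    return w_fwd * np.maximum(diff, 0).sum() + w_bwd * np.maximum(-diff, 0).sum()

def energy(K):
    # Total pairwise repulsion: large when keypoints are close under the
    # quasimetric, so minimizing it pushes keypoints apart.
    n = len(K)
    return sum(np.exp(-quasimetric(K[i], K[j]))
               for i in range(n) for j in range(n) if i != j)

def spread_keypoints(K, steps=100, lr=0.05, eps=1e-4):
    K = K.copy()
    for _ in range(steps):
        grad = np.zeros_like(K)
        # Central finite-difference gradient of the repulsive energy.
        for idx in np.ndindex(K.shape):
            K[idx] += eps
            e_plus = energy(K)
            K[idx] -= 2 * eps
            e_minus = energy(K)
            K[idx] += eps
            grad[idx] = (e_plus - e_minus) / (2 * eps)
        # Gradient step, clipped to the unit box as a stand-in for the
        # reachability constraint.
        K = np.clip(K - lr * grad, 0.0, 1.0)
    return K

rng = np.random.default_rng(0)
K0 = rng.uniform(0.45, 0.55, size=(6, 2))  # start tightly clustered
K1 = spread_keypoints(K0)                  # keypoints after repulsion
```

Starting from a tight cluster, the descent lowers the repulsive energy, spreading the keypoints toward a more uniform covering of the box while keeping them inside it.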