Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

📅 2026-05-25
🤖 AI Summary
This work addresses erroneous generalization in goal-conditioned value functions in offline goal-conditioned reinforcement learning, a problem that severely degrades policy performance in long-horizon tasks. The authors identify this misgeneralization as a central bottleneck and propose Latent-Aligned Value Learning (LAVL), which uses a latent representation alignment mechanism to combine value generalization and hierarchical planning in a unified framework, giving the value function an appropriate inductive bias. On the OGBench benchmark, LAVL achieves state-of-the-art performance on 20 of 22 datasets, with substantial improvements over existing methods, especially in long-horizon and trajectory-stitching settings.
📝 Abstract
Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck, and demonstrate that appropriate inductive bias in the value function is crucial for addressing the bottleneck. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL exhibits strong performance in long-horizon tasks and trajectory stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.
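To make the idea of an inductive bias in a goal-conditioned value function concrete, here is a minimal sketch. It is not the LAVL algorithm, whose details the abstract does not give. It assumes a hypothetical linear encoder and defines the value as the negative distance between latent codes, which is one common way to build structure into V(s, g). It then scores candidate subgoals by V(s, w) + V(w, g) as a toy stand-in for hierarchical planning. All function names, the encoder, and the numbers are illustrative assumptions.

```python
import math

def encode(state, weights):
    """Hypothetical linear encoder: maps a state vector to a latent vector."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]

def value(state, goal, weights):
    """Toy goal-conditioned value: negative Euclidean distance between latents."""
    return -math.dist(encode(state, weights), encode(goal, weights))

def pick_subgoal(state, goal, candidates, weights):
    """Toy high-level step: pick the candidate w maximizing V(s, w) + V(w, g)."""
    return max(
        candidates,
        key=lambda w: value(state, w, weights) + value(w, goal, weights),
    )

# Identity encoder on 2-D states, chosen only for illustration (an assumption).
W = [[1.0, 0.0], [0.0, 1.0]]
s, g = [0.0, 0.0], [4.0, 0.0]
subgoal = pick_subgoal(s, g, [[2.0, 0.0], [2.0, 3.0]], W)
```

With this parameterization the value is zero when the state equals the goal and decreases with latent distance, so the subgoal lying on the direct path is preferred over the detour. A learned encoder would replace the fixed matrix in practice.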
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
goal-conditioned reinforcement learning
value function generalization
long-horizon tasks
erroneous generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent representation alignment
offline goal-conditioned reinforcement learning
value function generalization
hierarchical planning
inductive bias