🤖 AI Summary
This work addresses erroneous generalization of goal-conditioned value functions in offline goal-conditioned reinforcement learning (GCRL). The problem is most damaging in long-horizon tasks, where such errors severely degrade policy performance. The authors identify this misgeneralization as a central bottleneck and propose Latent-Aligned Value Learning (LAVL). LAVL uses a latent representation alignment mechanism to inject an appropriate inductive bias into the value function, and it combines value generalization with hierarchical planning in a unified framework. On the OGBench benchmark, LAVL achieves state-of-the-art performance on 20 of 22 datasets. Its improvements over existing methods are largest in long-horizon tasks and trajectory-stitching scenarios.
📝 Abstract
Offline goal-conditioned reinforcement learning (GCRL) provides a practical framework for obtaining goal-reaching policies from fixed datasets. However, learning a reliable goal-conditioned value function in long-horizon tasks remains challenging. In this paper, we identify erroneous generalization in goal-conditioned value functions as a fundamental bottleneck and demonstrate that an appropriate inductive bias in the value function is crucial for addressing it. Building on these findings, we propose Latent-Aligned Value Learning (LAVL), an offline GCRL algorithm that integrates latent-representation-based value generalization with hierarchical planning in a unified framework. Extensive experiments on OGBench demonstrate that LAVL consistently outperforms existing offline GCRL methods, achieving the highest performance on 20 out of 22 datasets. Notably, LAVL performs strongly on long-horizon tasks and trajectory-stitching datasets, where prior methods suffer significant performance degradation. Our code is available at https://github.com/oh-lab/LAVL.git.