Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

📅 2026-04-10

🤖 AI Summary
This work addresses two limitations of hierarchical offline goal-conditioned reinforcement learning: Gaussian policies have restricted expressiveness, and high-level policies struggle to generate effective subgoals. To overcome them, it proposes a Goal-Conditioned Mean Flow Policy, which introduces, for the first time, an average velocity field into a hierarchical offline goal-conditioned reinforcement learning framework. Both the high-level (subgoal) and low-level (action) policies model complex target distributions implicitly through this learned field, which enables efficient single-step sampling. Additionally, a LeJEPA loss repels goal embeddings from one another during training, making them more discriminative and improving generalization. On state-based and pixel-based tasks from the OGBench benchmark, the method is reported to outperform existing methods, alleviating the performance bottlenecks of conventional Gaussian policies.
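The key mechanism in the summary is that an average velocity field lets a flow policy sample in one step instead of integrating an ODE over many steps. The sketch below illustrates that idea under stated assumptions: it uses the MeanFlow-style relation z_r = z_t - (t - r) * u(z_t, r, t), with noise at t = 1 and data at r = 0, so one evaluation gives x = eps - u(eps, 0, 1). Instead of the paper's learned, goal-conditioned network, it uses the closed-form average velocity of a toy 1-D Gaussian target. All function names and the constants MU and SIGMA are hypothetical.

```python
import math
import random

# Toy setting (assumption, not from the paper): the target is a 1-D Gaussian
# N(MU, SIGMA^2) standing in for a policy's action or subgoal distribution, and
# the noise is N(0, 1). Along the linear path z_t = (1 - t) * x + t * eps the
# marginal flow is affine, so the average velocity u(z, r, t) has a closed form.
# A real policy would learn u with a network conditioned on (state, goal).
MU, SIGMA = 2.0, 0.5


def mean_std(t: float) -> tuple[float, float]:
    """Mean and std of the marginal z_t under the linear noise-to-data path."""
    m = (1.0 - t) * MU
    s = math.sqrt((1.0 - t) ** 2 * SIGMA ** 2 + t ** 2)
    return m, s


def average_velocity(z: float, r: float, t: float) -> float:
    """u(z_t, r, t) = (z_t - z_r) / (t - r): average displacement per unit time."""
    m_t, s_t = mean_std(t)
    xi = (z - m_t) / s_t          # standardized position, constant along a trajectory
    m_r, s_r = mean_std(r)
    z_r = m_r + s_r * xi          # where the same trajectory sits at time r
    return (z - z_r) / (t - r)


def instantaneous_velocity(z: float, t: float) -> float:
    """v(z_t, t) = dz_t/dt, the field a standard multi-step flow policy integrates."""
    m_t, s_t = mean_std(t)
    xi = (z - m_t) / s_t
    ds_dt = (t - (1.0 - t) * SIGMA ** 2) / s_t
    return -MU + ds_dt * xi


def sample_one_step(eps: float) -> float:
    """Mean-flow sampling: one evaluation of u maps noise (t=1) to data (r=0)."""
    return eps - (1.0 - 0.0) * average_velocity(eps, 0.0, 1.0)


def sample_euler(eps: float, steps: int) -> float:
    """Baseline: integrate v backward from t=1 to t=0 with `steps` Euler steps."""
    z, dt = eps, 1.0 / steps
    for k in range(steps):
        t = 1.0 - k * dt
        z -= dt * instantaneous_velocity(z, t)
    return z


if __name__ == "__main__":
    random.seed(0)
    eps = random.gauss(0.0, 1.0)
    print("one-step:", sample_one_step(eps))
    print("Euler, 1000 steps:", sample_euler(eps, 1000))  # agrees up to discretization error
```

The one-step sample matches the endpoint that fine-grained Euler integration converges to, while a single Euler step on the instantaneous velocity lands elsewhere. This gap is the cost that learning u, rather than v, is meant to remove.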

📝 Abstract
Offline goal-conditioned reinforcement learning (GCRL) is a practical reinforcement learning paradigm that aims to learn goal-conditioned policies from reward-free offline data. Despite recent advances in hierarchical architectures such as HIQL, long-horizon control in offline GCRL remains challenging due to the limited expressiveness of Gaussian policies and the inability of high-level policies to generate effective subgoals. To address these limitations, we propose the goal-conditioned mean flow policy, which introduces an average velocity field into hierarchical policy modeling for offline GCRL. Specifically, the mean flow policy captures complex target distributions for both high-level and low-level policies through a learned average velocity field, enabling efficient action generation via one-step sampling. Furthermore, to address insufficiently expressive goal representations, we introduce a LeJEPA loss that repels goal representation embeddings during training, thereby encouraging more discriminative representations and improving generalization. Experimental results show that our method achieves strong performance across both state-based and pixel-based tasks in the OGBench benchmark.
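The abstract says only that the LeJEPA loss "repels" goal representation embeddings; it does not give the loss's form. As an assumption, the sketch below follows the regularizer introduced with LeJEPA (SIGReg). SIGReg pushes a batch of embeddings toward an isotropic Gaussian by projecting them onto random directions and comparing each 1-D projection against N(0, 1) with an Epps-Pulley-style characteristic-function distance. This version is simplified (fixed quadrature grid and weight, no sample-size scaling). The names sigreg_loss and epps_pulley are hypothetical, and the paper's exact loss and the way it attaches to the goal encoder may differ.

```python
import math
import random

# Simplified, hypothetical SIGReg-style regularizer for a batch of goal embeddings.
# A batch that matches an isotropic Gaussian scores near zero; a collapsed batch
# (all goals mapped to one point) scores high, so minimizing it spreads goals apart.


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Draw a direction uniformly on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def epps_pulley(samples: list[float], t_max: float = 5.0, n_grid: int = 101) -> float:
    """Weighted squared distance between the empirical characteristic function of
    `samples` and that of N(0, 1), integrated by the trapezoid rule on [-t_max, t_max]."""
    n = len(samples)
    dt = 2.0 * t_max / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        t = -t_max + k * dt
        re = sum(math.cos(t * x) for x in samples) / n
        im = sum(math.sin(t * x) for x in samples) / n
        target = math.exp(-0.5 * t * t)  # the CF of N(0, 1) is real-valued
        integrand = ((re - target) ** 2 + im ** 2) * math.exp(-0.5 * t * t)
        weight = 0.5 if k in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * integrand * dt
    return total


def sigreg_loss(embeddings: list[list[float]], num_dirs: int = 16, seed: int = 0) -> float:
    """Average the 1-D normality statistic over random projections of the batch."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    loss = 0.0
    for _ in range(num_dirs):
        d = random_unit_vector(dim, rng)
        proj = [sum(a * b for a, b in zip(e, d)) for e in embeddings]
        loss += epps_pulley(proj)
    return loss / num_dirs


if __name__ == "__main__":
    rng = random.Random(1)
    spread = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(256)]  # 256 goals, dim 8
    collapsed = [[0.0] * 8 for _ in range(256)]  # every goal has the same embedding
    print("spread:", sigreg_loss(spread))
    print("collapsed:", sigreg_loss(collapsed))  # much larger than the spread batch
```

In training, the same computation would run inside an autodiff framework on the goal encoder's outputs and be added to the policy and value losses with some weight. The pure-Python version here only shows which batches the term penalizes.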
Problem

Research questions and friction points this paper is trying to address.

offline goal-conditioned reinforcement learning
hierarchical reinforcement learning
long-horizon control
policy expressiveness
subgoal generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

mean flow policy
hierarchical reinforcement learning
offline goal-conditioned RL
LeJEPA loss
average velocity field