🤖 AI Summary
To address limited data coverage and poor generalization on long-horizon tasks in offline goal-conditioned reinforcement learning (GCRL), this paper proposes a physics-informed regularization method. It combines continuous-time optimal control theory with the Eikonal partial differential equation (PDE) to give the value function a geometrically consistent inductive bias. A gradient-constraint regularization term is added to temporal-difference value learning and embedded into Hierarchical Implicit Q-Learning (HIQL), encouraging structurally consistent value predictions without requiring environment interaction. Experiments show that the approach significantly outperforms existing offline GCRL methods on large-scale navigation and complex state-transition tasks. It is notably more robust in trajectory-stitching and sparse-data regimes, and it generalizes better across tasks.
📝 Abstract
Offline Goal-Conditioned Reinforcement Learning (GCRL) holds great promise for domains such as autonomous navigation and locomotion, where collecting interactive data is costly and unsafe. However, it remains challenging in practice because it must learn from datasets with limited coverage of the state-action space and generalize across long-horizon tasks. To address these challenges, we propose a Physics-informed (Pi) regularized loss for value learning that is derived from the Eikonal Partial Differential Equation (PDE) and induces a geometric inductive bias in the learned value function. Unlike generic gradient penalties, which are primarily used to stabilize training, our formulation is grounded in continuous-time optimal control and encourages value functions to align with cost-to-go structures. The proposed regularizer is broadly compatible with temporal-difference-based value learning and can be integrated into existing Offline GCRL algorithms. When combined with Hierarchical Implicit Q-Learning (HIQL), the resulting method, Physics-informed HIQL (Pi-HIQL), yields significant improvements in both performance and generalization, with pronounced gains in stitching regimes and large-scale navigation tasks.
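To make the idea of an Eikonal-derived regularizer concrete, the following is a minimal sketch and not the paper's implementation. It assumes a unit-speed Eikonal equation, so the target is a value gradient of norm 1 with respect to the state, and it uses central finite differences where a real implementation would presumably use automatic differentiation. The names `grad_norm`, `eikonal_penalty`, `regularized_loss`, and the `weight` parameter are hypothetical, and `td_loss` stands in for whatever temporal-difference objective the base algorithm uses.

```python
import math


def grad_norm(value_fn, state, goal, eps=1e-5):
    """Central finite-difference estimate of ||dV/ds|| at `state`."""
    sq = 0.0
    for i in range(len(state)):
        hi, lo = list(state), list(state)
        hi[i] += eps
        lo[i] -= eps
        d = (value_fn(hi, goal) - value_fn(lo, goal)) / (2 * eps)
        sq += d * d
    return math.sqrt(sq)


def eikonal_penalty(value_fn, states, goals):
    """Mean squared residual (||grad_s V|| - 1)^2 (unit speed is an assumption)."""
    residuals = [(grad_norm(value_fn, s, g) - 1.0) ** 2
                 for s, g in zip(states, goals)]
    return sum(residuals) / len(residuals)


def regularized_loss(td_loss, value_fn, states, goals, weight=0.1):
    """Add the weighted penalty to an already-computed TD loss (hypothetical API)."""
    return td_loss + weight * eikonal_penalty(value_fn, states, goals)


def neg_distance(s, g):
    # Negative Euclidean distance: gradient norm is 1 away from the goal.
    return -math.dist(s, g)


def steep(s, g):
    # Same shape scaled by 2: gradient norm is 2, so the penalty is nonzero.
    return -2.0 * math.dist(s, g)


states = [[0.0, 0.0], [1.0, 2.0]]
goals = [[3.0, 4.0], [-1.0, 0.5]]
print(eikonal_penalty(neg_distance, states, goals))
print(eikonal_penalty(steep, states, goals))
```

In this toy setting, a value function equal to the negative Euclidean distance to the goal already satisfies the unit-speed constraint and incurs almost no penalty, while a rescaled version is penalized. This is the sense in which such a term pushes the learned value toward a cost-to-go shape rather than only bounding gradient magnitude.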