🤖 AI Summary
Offline goal-conditioned reinforcement learning often suffers from value estimation bias due to insufficient state-action coverage. This work is, to the authors' knowledge, the first to combine the viscosity solution of the Hamilton–Jacobi–Bellman (HJB) equation with the Feynman–Kac theorem, yielding a physics-informed regularizer grounded in optimal control theory that constrains the value iteration process. The Feynman–Kac theorem recasts the associated partial differential equation in expectation form, enabling numerically stable Monte Carlo estimation. The method comes with theoretical consistency guarantees and substantially improves the geometric consistency and generalization of the value function on both navigation and high-dimensional manipulation tasks.
📝 Abstract
Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static pre-collected datasets. However, accurate value estimation remains a challenge due to the limited coverage of the state-action space. Recent physics-informed approaches have sought to address this by imposing physical and geometric constraints on the value function through regularization defined over first-order partial differential equations (PDEs), such as the Eikonal equation. However, these formulations are often ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling tractable Monte Carlo estimation of the objective that avoids the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation and high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
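To make the Feynman–Kac step concrete: the theorem represents the solution of a linear parabolic PDE as an expectation over diffusion paths, which can be estimated by plain Monte Carlo instead of differentiating the PDE residual. The sketch below is an illustrative toy example only, not the paper's implementation; the function names, the choice of a one-dimensional diffusion, and all parameters are our own assumptions.

```python
# Toy Feynman-Kac Monte Carlo sketch (NOT the paper's method).
# For the terminal-value problem
#     u_t + 0.5 * sigma^2 * u_xx - V(x) * u = 0,   u(T, x) = psi(x),
# Feynman-Kac gives  u(0, x) = E[ exp(-int_0^T V(X_s) ds) * psi(X_T) ],
# where dX_s = sigma dW_s with X_0 = x.
import math
import random


def feynman_kac_mc(x0, T, sigma, V, psi, n_paths=20000, n_steps=50, seed=0):
    """Estimate u(0, x0) by Euler-Maruyama path simulation plus averaging
    of the exponentially discounted terminal payoff (no variance reduction)."""
    rng = random.Random(seed)
    dt = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x, int_V = x0, 0.0
        for _ in range(n_steps):
            int_V += V(x) * dt                         # accumulate int V(X_s) ds
            x += sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # diffusion step
        total += math.exp(-int_V) * psi(x)             # discounted terminal payoff
    return total / n_paths


# Sanity check with V = 0 and psi(x) = x^2: the heat equation gives
# u(0, x) = x^2 + sigma^2 * T, so the estimate should be close to 2.0 here.
est = feynman_kac_mc(x0=1.0, T=1.0, sigma=1.0,
                     V=lambda x: 0.0, psi=lambda x: x * x)
```

The point of this representation, as used in the abstract, is that the expectation is estimated from sampled trajectories, so the training objective never needs the second-order spatial derivatives that make direct PDE-residual losses numerically unstable.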