🤖 AI Summary
In hierarchical reinforcement learning (HRL), the dynamic evolution of low-level policies impedes high-level subgoal generation, primarily due to the complex subgoal distribution and the lack of explicit modeling of estimation uncertainty. To address this, we propose the Uncertainty-Guided Diffusion Model (UGDM), the first HRL framework to incorporate Gaussian process (GP) prior regularization into subgoal generation: it leverages GP-predicted means as guidance signals within a diffusion sampling process, enabling uncertainty-aware, diverse, and robust subgoal synthesis. UGDM avoids explicit parametric assumptions about the subgoal distribution and naturally supports continuous-control tasks. Evaluated on multiple challenging benchmarks, UGDM achieves substantial improvements in sample efficiency and final task performance, consistently outperforming state-of-the-art HRL methods.
📝 Abstract
Hierarchical reinforcement learning (HRL) learns to make decisions on multiple levels of temporal abstraction. A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. To address this issue, the high-level policy must capture a complex subgoal distribution while also accounting for uncertainty in its estimates. We propose an approach that trains a conditional diffusion model regularized by a Gaussian process (GP) prior, generating a diverse range of subgoals while leveraging principled GP uncertainty quantification. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and the GP's predictive mean. Our approach outperforms prior HRL methods in both sample efficiency and performance on challenging continuous control benchmarks.
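The core mechanism described above, a diffusion sampler whose reverse process is guided by a GP posterior mean and weighted by the GP's predictive uncertainty, can be sketched in miniature. This is an illustrative toy, not the paper's implementation: the RBF kernel, the DDPM-style noise schedule, the toy `denoise_fn`, and the `guidance_scale` weighting are all assumptions made for the sketch.

```python
import numpy as np

def gp_posterior(X_train, y_train, x_query, length_scale=1.0, noise=1e-2):
    """GP regression posterior mean/variance with an RBF kernel (assumed choice).

    X_train: (n, d) inputs, y_train: (n,) targets, x_query: (m, d) query points.
    Returns (mean, var), each of shape (m,).
    """
    def k(a, b):
        d = a[:, None, :] - b[None, :, :]
        return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / length_scale ** 2)

    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    k_star = k(X_train, x_query)                       # (n, m)
    alpha = np.linalg.solve(K, y_train)
    mean = k_star.T @ alpha
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return mean, np.maximum(var, 1e-8)

def guided_reverse_diffusion(denoise_fn, gp_mean, gp_var, dim,
                             steps=50, guidance_scale=1.0, rng=None):
    """DDPM-style reverse loop (toy): each step nudges the sample toward the
    GP predictive mean, weighted by inverse predictive variance, so guidance
    is strongest where the GP is most confident."""
    rng = np.random.default_rng(rng)
    x = rng.standard_normal(dim)                       # start from pure noise
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(steps)):
        eps_hat = denoise_fn(x, t)                     # predicted noise
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
            / np.sqrt(alphas[t])
        # uncertainty-weighted pull toward the GP posterior mean
        w = guidance_scale / (1.0 + gp_var)
        x = x + betas[t] * w * (gp_mean - x)
        if t > 0:                                      # no noise at the last step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(dim)
    return x
```

In this sketch the guidance weight shrinks as the GP predictive variance grows, which is one simple way to realize "uncertainty-aware" subgoal synthesis; the actual paper's guidance rule and subgoal-selection strategy may differ.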