Hierarchical Reinforcement Learning with Uncertainty-Guided Diffusional Subgoals

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In hierarchical reinforcement learning (HRL), the low-level policy evolves during training, which impedes high-level subgoal generation: the subgoal distribution is complex, and the uncertainty in its estimation is rarely modeled explicitly. To address this, we propose the Uncertainty-Guided Diffusion Model (UGDM), the first HRL framework to incorporate Gaussian process (GP) prior regularization into subgoal generation: it leverages GP-predicted means as guidance signals within the diffusion sampling process, enabling uncertainty-aware, diverse, and robust subgoal synthesis. UGDM avoids explicit parametric assumptions about the subgoal distribution and naturally supports continuous-control tasks. Evaluated on multiple challenging benchmarks, UGDM achieves substantial improvements in sample efficiency and final task performance, consistently outperforming state-of-the-art HRL methods.
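A minimal sketch of what GP-mean guidance inside the reverse diffusion process could look like, assuming a DDPM-style sampler; the linear blend weight `lam` and all function names here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def guided_reverse_step(g_t, t, eps_model, state, betas, gp_mean, lam=0.1):
    """One DDPM-style reverse step for a noisy subgoal g_t, with the GP
    predictive mean blended into the posterior mean as a guidance signal
    (assumed linear-interpolation form)."""
    alphas = 1.0 - betas
    alpha_bar_t = np.prod(alphas[: t + 1])
    # Predicted noise, conditioned on the current environment state.
    eps = eps_model(g_t, state, t)
    # Standard DDPM posterior mean of g_{t-1} given g_t.
    mu = (g_t - betas[t] * eps / np.sqrt(1.0 - alpha_bar_t)) / np.sqrt(alphas[t])
    # Guidance: pull the mean toward the GP's prediction.
    mu = (1.0 - lam) * mu + lam * gp_mean
    noise = np.random.randn(*g_t.shape) if t > 0 else np.zeros_like(g_t)
    return mu + np.sqrt(betas[t]) * noise
```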

📝 Abstract
Hierarchical reinforcement learning (HRL) learns to make decisions on multiple levels of temporal abstraction. A key challenge in HRL is that the low-level policy changes over time, making it difficult for the high-level policy to generate effective subgoals. To address this issue, the high-level policy must capture a complex subgoal distribution while also accounting for uncertainty in its estimates. We propose an approach that trains a conditional diffusion model regularized by a Gaussian Process (GP) prior to generate a complex variety of subgoals while leveraging principled GP uncertainty quantification. Building on this framework, we develop a strategy that selects subgoals from both the diffusion policy and GP's predictive mean. Our approach outperforms prior HRL methods in both sample efficiency and performance on challenging continuous control benchmarks.
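To make the GP-prior regularization concrete, here is a hedged sketch of a training objective, assuming a standard denoising loss plus an uncertainty-weighted penalty pulling reconstructed subgoals toward the GP posterior; `reg_weight` and the exact penalty form are assumptions, not the paper's stated loss:

```python
import numpy as np

def ugdm_training_loss(eps_model, g0, state, betas, gp_mean, gp_var, reg_weight=0.1):
    """Denoising-diffusion loss plus a GP-prior regularizer that penalizes
    reconstructed subgoals far from the GP mean, scaled by GP variance."""
    T = len(betas)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    t = np.random.randint(T)
    eps = np.random.randn(*g0.shape)
    # Forward-noise the ground-truth subgoal to step t.
    g_t = np.sqrt(alpha_bars[t]) * g0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    eps_hat = eps_model(g_t, state, t)
    denoise_loss = np.mean((eps - eps_hat) ** 2)
    # Recover the model's estimate of the clean subgoal from its noise estimate.
    g0_hat = (g_t - np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alpha_bars[t])
    # GP-prior regularizer: uncertainty-weighted distance to the GP mean.
    reg = np.mean((g0_hat - gp_mean) ** 2 / (gp_var + 1e-6))
    return denoise_loss + reg_weight * reg
```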
Problem

Research questions and friction points this paper is trying to address.

The low-level policy changes during training, undermining the effectiveness of high-level subgoals
The high-level policy must capture a complex subgoal distribution while accounting for uncertainty in its estimates
Existing subgoal generators do neither well, motivating the combination of a diffusion model with a GP prior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains a conditional diffusion model to generate diverse subgoals
Regularizes the diffusion model with a Gaussian Process prior for principled uncertainty quantification
Selects subgoals from both the diffusion policy and the GP's predictive mean (see the sketch below)
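One plausible reading of the selection strategy is sketched below; the variance-threshold rule and the `var_threshold` value are assumptions rather than the paper's exact criterion:

```python
import numpy as np

def select_subgoal(state, sample_diffusion, gp_predict, var_threshold=0.5):
    """Choose the next subgoal from either the diffusion policy or the GP
    predictive mean, gated by GP uncertainty (illustrative rule only)."""
    gp_mean, gp_var = gp_predict(state)    # GP posterior at this state
    if np.mean(gp_var) < var_threshold:    # GP confident: exploit its mean
        return gp_mean
    return sample_diffusion(state)         # otherwise sample the diffusion policy
```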
🔎 Similar Papers
No similar papers found.
V. Wang
Department of Electrical Engineering and Automation, Aalto University, Finland
Tinghuai Wang
Head of Multimodal AI, Huawei Research Finland; ex-Pr.Sci.@Nokia-RC/Labs-FIN, Sony-Labs, HP-Labs-UK
Reinforcement Learning · Machine Learning · Foundation Models · Computer Vision
J. Pajarinen
Department of Electrical Engineering and Automation, Aalto University, Finland