Off-policy Reinforcement Learning with Model-based Exploration Augmentation

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited sample diversity and low exploration efficiency of passive exploration in reinforcement learning, particularly in high-dimensional environments, this paper proposes Modelic Generative Exploration (MoGE). MoGE introduces three key components: (1) a diffusion-based critical-state generator that actively samples high-value yet under-explored states; (2) a one-step imagination world model that synthesizes dynamics-consistent, physically plausible transitions from those states; and (3) a differentiable utility function that guides state generation and enables plug-and-play integration with mainstream off-policy optimization algorithms. Extensive experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE significantly improves both sample efficiency and final policy performance, outperforming existing exploration methods on challenging continuous-control tasks.
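The plug-and-play claim is easiest to see as code. Below is a minimal sketch of how a MoGE-style augmentation step could sit beside an unmodified off-policy learner, writing imagined transitions into the same replay buffer the base algorithm samples from. Every name here (CriticalStateGenerator, OneStepWorldModel, utility, augment_replay) is a hypothetical stand-in with toy internals, not the authors' implementation.

```python
import random

def utility(s):
    # Hypothetical utility; here, distance from the origin stands in
    # for "high-value yet under-explored".
    return sum(x * x for x in s)

class CriticalStateGenerator:
    """Stand-in for the diffusion-based generator: perturbs replayed
    states and keeps those the utility function scores highest."""
    def sample(self, replay, utility_fn, n):
        candidates = [tuple(x + random.gauss(0, 0.1) for x in s)
                      for (s, a, r, s2) in random.sample(replay, n)]
        return sorted(candidates, key=utility_fn, reverse=True)[: n // 2]

class OneStepWorldModel:
    """Stand-in one-step imagination model: predicts (reward, next state)
    for a state-action pair; a learned dynamics model in the paper."""
    def imagine(self, s, a):
        s2 = tuple(x + 0.05 * a for x in s)   # toy linear dynamics
        r = -sum(x * x for x in s2)           # toy quadratic-cost reward
        return r, s2

def augment_replay(replay, generator, world_model, policy, n=32):
    """Append dynamics-consistent imagined transitions, rooted at
    generated critical states, to the ordinary replay buffer."""
    for s in generator.sample(replay, utility, n):
        a = policy(s)
        r, s2 = world_model.imagine(s, a)
        replay.append((s, a, r, s2))
    return replay

# Toy usage: the base off-policy algorithm keeps sampling from `replay`
# as usual and never needs to know some transitions are imagined.
policy = lambda s: -0.5 * s[0]
replay = [((random.random(), random.random()), 0.0, 0.0,
           (random.random(), random.random())) for _ in range(128)]
replay = augment_replay(replay, CriticalStateGenerator(),
                        OneStepWorldModel(), policy)
```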

📝 Abstract
Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal performance. Existing exploration methods generally fall into two categories: active exploration and passive exploration. The former introduces stochasticity into the policy but struggles in high-dimensional environments, while the latter adaptively prioritizes transitions in the replay buffer to enhance exploration, yet remains constrained by limited sample diversity. To address this limitation of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences with transition models. MoGE is composed of two components: (1) a diffusion-based generator that synthesizes critical states under the guidance of a utility function evaluating each state's potential influence on policy exploration, and (2) a one-step imagination world model that constructs critical transitions from these states for agent learning. Our method adopts a modular formulation aligned with the principles of off-policy learning, allowing seamless integration with existing algorithms to improve exploration without altering their core structures. Empirical results on OpenAI Gym and the DeepMind Control Suite show that MoGE effectively bridges exploration and policy learning, yielding remarkable gains in both sample efficiency and performance across complex control tasks.
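The abstract does not spell out the utility function's form; one plausible instantiation (an assumption for illustration, not the paper's definition) scores a state as "high-value yet under-explored" by combining a critic ensemble's mean value with its disagreement, which stays differentiable and can therefore guide the diffusion generator:

```python
import torch
import torch.nn as nn

class UtilityFunction(nn.Module):
    """Hypothetical differentiable utility U(s) = value + novelty.
    Value is the ensemble-mean Q estimate at the policy's action;
    novelty is ensemble disagreement, a common proxy for rarely
    visited states. The paper's exact utility may differ."""
    def __init__(self, critics, policy, novelty_weight=1.0):
        super().__init__()
        self.critics = critics            # list of Q-networks Q_i(s, a)
        self.policy = policy              # deterministic actor a = pi(s)
        self.novelty_weight = novelty_weight

    def forward(self, s):
        a = self.policy(s)
        q = torch.stack([q_i(torch.cat([s, a], dim=-1))
                         for q_i in self.critics])
        value = q.mean(dim=0)             # how valuable the state looks
        novelty = q.std(dim=0)            # disagreement ~ under-explored
        return value + self.novelty_weight * novelty

# Toy usage: gradients of U flow back to the states, so U can steer
# a generator toward critical regions.
critics = [nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
           for _ in range(5)]
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
U = UtilityFunction(critics, policy)
s = torch.randn(8, 4, requires_grad=True)
guidance = torch.autograd.grad(U(s).sum(), s)[0]   # shape (8, 4)
```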
Problem

Research questions and friction points this paper is trying to address.

Enhancing passive exploration in reinforcement learning
Generating under-explored critical states for improved learning
Integrating model-based augmentation with off-policy algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augments exploration with generated critical states
Uses a diffusion model guided by a differentiable utility function (see the sketch below)
Integrates a one-step imagination world model to synthesize transitions
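As a concrete reading of the second bullet, the sketch below shows utility-guided sampling in the style of classifier guidance: at each reverse-diffusion step, the gradient of a differentiable utility nudges the sample toward states the utility rates as critical. The denoiser, noise schedule, and utility are toy placeholders standing in for the paper's trained components.

```python
import torch

def utility(s):
    # Toy utility peaking at s = 2; stands in for "influence on exploration".
    return -(s - 2.0).pow(2).sum(dim=-1)

def denoiser(s, t):
    # Placeholder epsilon-predictor; a trained score network in practice.
    return 0.1 * s

def guided_sample(n_steps=50, batch=16, dim=4, guidance_scale=0.5):
    s = torch.randn(batch, dim)                        # start from pure noise
    for t in reversed(range(n_steps)):
        s_in = s.detach().requires_grad_(True)
        grad = torch.autograd.grad(utility(s_in).sum(), s_in)[0]
        eps = denoiser(s, t)                           # predicted noise
        s = s.detach() - eps + guidance_scale * grad   # denoise, then steer
        if t > 0:
            s = s + 0.01 * torch.randn_like(s)         # keep some stochasticity
    return s                                           # candidate critical states

states = guided_sample()   # samples drift toward the high-utility region
```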
👥 Authors
Likun Wang
School of Vehicle and Mobility & College of AI, Tsinghua University
Xiangteng Zhang
School of Vehicle and Mobility & College of AI, Tsinghua University
Yinuo Wang
Tsinghua University
Guojian Zhan
School of Vehicle and Mobility & College of AI, Tsinghua University
Wenxuan Wang
School of Vehicle and Mobility & College of AI, Tsinghua University
Haoyu Gao
School of Vehicle and Mobility & College of AI, Tsinghua University
Jingliang Duan
University of Science and Technology Beijing
Shengbo Eben Li
School of Vehicle and Mobility & College of AI, Tsinghua University