Off-policy Reinforcement Learning with Model-based Exploration Augmentation

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited sample diversity and low exploration efficiency of passive exploration in reinforcement learning, particularly in high-dimensional environments, this paper proposes Modelic Generative Exploration (MoGE). MoGE introduces three key components: (1) a diffusion-based critical-state generator that actively samples high-value yet under-explored states; (2) a one-step imagination world model that synthesizes dynamics-consistent, physically plausible transitions from those states; and (3) a differentiable utility function that guides state generation and enables plug-and-play integration with mainstream off-policy optimization algorithms. Extensive experiments on OpenAI Gym and the DeepMind Control Suite demonstrate that MoGE significantly improves both sample efficiency and final policy performance, outperforming existing exploration methods on challenging continuous-control tasks.
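The plug-and-play claim is easiest to see as code. Below is a minimal sketch of how a MoGE-style augmentation step could sit beside an unmodified off-policy learner, writing imagined transitions into the same replay buffer the base algorithm samples from. Every name here (CriticalStateGenerator, OneStepWorldModel, utility, augment_replay) is a hypothetical stand-in with toy internals, not the authors' implementation.

```python
import random

def utility(s):
    # Hypothetical utility; here, distance from the origin stands in
    # for "high-value yet under-explored".
    return sum(x * x for x in s)

class CriticalStateGenerator:
    """Stand-in for the diffusion-based generator: perturbs replayed
    states and keeps those the utility function scores highest."""
    def sample(self, replay, utility_fn, n):
        candidates = [tuple(x + random.gauss(0, 0.1) for x in s)
                      for (s, a, r, s2) in random.sample(replay, n)]
        return sorted(candidates, key=utility_fn, reverse=True)[: n // 2]

class OneStepWorldModel:
    """Stand-in one-step imagination model: predicts (reward, next state)
    for a state-action pair; a learned dynamics model in the paper."""
    def imagine(self, s, a):
        s2 = tuple(x + 0.05 * a for x in s)   # toy linear dynamics
        r = -sum(x * x for x in s2)           # toy quadratic-cost reward
        return r, s2

def augment_replay(replay, generator, world_model, policy, n=32):
    """Append dynamics-consistent imagined transitions, rooted at
    generated critical states, to the ordinary replay buffer."""
    for s in generator.sample(replay, utility, n):
        a = policy(s)
        r, s2 = world_model.imagine(s, a)
        replay.append((s, a, r, s2))
    return replay

# Toy usage: the base off-policy algorithm keeps sampling from `replay`
# as usual and never needs to know some transitions are imagined.
policy = lambda s: -0.5 * s[0]
replay = [((random.random(), random.random()), 0.0, 0.0,
           (random.random(), random.random())) for _ in range(128)]
replay = augment_replay(replay, CriticalStateGenerator(),
                        OneStepWorldModel(), policy)
```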

📝 Abstract
Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal performance. Existing exploration methods generally fall into two categories: active exploration and passive exploration. The former introduces stochasticity into the policy but struggles in high-dimensional environments, while the latter adaptively prioritizes transitions in the replay buffer to enhance exploration, yet remains constrained by limited sample diversity. To address this limitation of passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration by generating under-explored critical states and synthesizing dynamics-consistent experiences with transition models. MoGE is composed of two components: (1) a diffusion-based generator that synthesizes critical states under the guidance of a utility function evaluating each state's potential influence on policy exploration, and (2) a one-step imagination world model that constructs critical transitions from these states for agent learning. Our method adopts a modular formulation aligned with the principles of off-policy learning, allowing seamless integration with existing algorithms to improve exploration without altering their core structures. Empirical results on OpenAI Gym and the DeepMind Control Suite show that MoGE effectively bridges exploration and policy learning, yielding remarkable gains in both sample efficiency and performance across complex control tasks.
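The abstract does not spell out the utility function's form; one plausible instantiation (an assumption for illustration, not the paper's definition) scores a state as "high-value yet under-explored" by combining a critic ensemble's mean value with its disagreement, which stays differentiable and can therefore guide the diffusion generator:

```python
import torch
import torch.nn as nn

class UtilityFunction(nn.Module):
    """Hypothetical differentiable utility U(s) = value + novelty.
    Value is the ensemble-mean Q estimate at the policy's action;
    novelty is ensemble disagreement, a common proxy for rarely
    visited states. The paper's exact utility may differ."""
    def __init__(self, critics, policy, novelty_weight=1.0):
        super().__init__()
        self.critics = critics            # list of Q-networks Q_i(s, a)
        self.policy = policy              # deterministic actor a = pi(s)
        self.novelty_weight = novelty_weight

    def forward(self, s):
        a = self.policy(s)
        q = torch.stack([q_i(torch.cat([s, a], dim=-1))
                         for q_i in self.critics])
        value = q.mean(dim=0)             # how valuable the state looks
        novelty = q.std(dim=0)            # disagreement ~ under-explored
        return value + self.novelty_weight * novelty

# Toy usage: gradients of U flow back to the states, so U can steer
# a generator toward critical regions.
critics = [nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 1))
           for _ in range(5)]
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))
U = UtilityFunction(critics, policy)
s = torch.randn(8, 4, requires_grad=True)
guidance = torch.autograd.grad(U(s).sum(), s)[0]   # shape (8, 4)
```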
Problem

Research questions and friction points this paper is trying to address.

Enhancing passive exploration in reinforcement learning
Generating under-explored critical states for improved learning
Integrating model-based augmentation with off-policy algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augments exploration with generated critical states
Uses a diffusion model guided by a differentiable utility function (see the sketch below)
Integrates a one-step imagination world model to synthesize transitions
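As a concrete reading of the second bullet, the sketch below shows utility-guided sampling in the style of classifier guidance: at each reverse-diffusion step, the gradient of a differentiable utility nudges the sample toward states the utility rates as critical. The denoiser, noise schedule, and utility are toy placeholders standing in for the paper's trained components.

```python
import torch

def utility(s):
    # Toy utility peaking at s = 2; stands in for "influence on exploration".
    return -(s - 2.0).pow(2).sum(dim=-1)

def denoiser(s, t):
    # Placeholder epsilon-predictor; a trained score network in practice.
    return 0.1 * s

def guided_sample(n_steps=50, batch=16, dim=4, guidance_scale=0.5):
    s = torch.randn(batch, dim)                        # start from pure noise
    for t in reversed(range(n_steps)):
        s_in = s.detach().requires_grad_(True)
        grad = torch.autograd.grad(utility(s_in).sum(), s_in)[0]
        eps = denoiser(s, t)                           # predicted noise
        s = s.detach() - eps + guidance_scale * grad   # denoise, then steer
        if t > 0:
            s = s + 0.01 * torch.randn_like(s)         # keep some stochasticity
    return s                                           # candidate critical states

states = guided_sample()   # samples drift toward the high-utility region
```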
👥 Authors
Likun Wang
School of Vehicle and Mobility & College of AI, Tsinghua University
Xiangteng Zhang
School of Vehicle and Mobility & College of AI, Tsinghua University
Yinuo Wang
Tsinghua University
Guojian Zhan
School of Vehicle and Mobility & College of AI, Tsinghua University
Wenxuan Wang
School of Vehicle and Mobility & College of AI, Tsinghua University
Haoyu Gao
School of Vehicle and Mobility & College of AI, Tsinghua University
Jingliang Duan
University of Science and Technology Beijing
Shengbo Eben Li
School of Vehicle and Mobility & College of AI, Tsinghua University