World Models Increase Autonomy in Reinforcement Learning

📅 2024-08-19
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses two key bottlenecks in reset-free reinforcement learning (RFRL), namely frequent human intervention and reliance on environmental rewards or expert demonstrations, by proposing MoReFree, a model-based agent. It is the first to systematically validate the role of world models in reset-free settings, and it introduces a decoupled framework combining task-aware, state-prioritized sampling, unsupervised intrinsic motivation for exploration, and dynamic policy updates, enabling joint optimization of exploration and policy learning. MoReFree achieves significantly better data efficiency across diverse reset-free tasks, outperforming state-of-the-art baselines that depend on rewards or demonstrations, while substantially reducing human reset interventions and supervision overhead, establishing a paradigm for autonomous, continual learning.

📝 Abstract
Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and the environment. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB) RL methods in this setting, showing that a straightforward adaptation of MBRL can outperform all the prior state-of-the-art methods while requiring less supervision. We then identify limitations inherent to this direct extension and propose a solution called the model-based reset-free (MoReFree) agent, which further enhances the performance. MoReFree adapts two key mechanisms, exploration and policy learning, to handle reset-free tasks by prioritizing task-relevant states. It exhibits superior data efficiency across various reset-free tasks without access to environmental rewards or demonstrations, while significantly outperforming privileged baselines that require supervision. Our findings suggest model-based methods hold significant promise for reducing human effort in RL. Website: https://sites.google.com/view/morefree
Problem

Research questions and friction points this paper is trying to address.

Reset-free Reinforcement Learning
Model-based RL methods
Reducing human effort in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based RL adaptation
MoReFree agent introduction
Task-relevant state prioritization
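The paper's implementation is not shown on this page; as a rough illustration only, here is a minimal sketch of what "task-relevant state prioritization" during goal sampling could look like. The names `task_relevance` and `sample_practice_goal`, and the distance-based weighting, are assumptions for illustration, not the authors' method.

```python
import math
import random

def task_relevance(state, task_goal):
    # Hypothetical relevance score: states closer to the task goal get
    # exponentially higher weight (assumed metric, not from the paper).
    return math.exp(-math.dist(state, task_goal))

def sample_practice_goal(buffer, task_goal, eps=0.2):
    # With probability eps, sample a uniformly random buffered state
    # (exploration); otherwise sample proportionally to task relevance,
    # biasing self-practice toward task-relevant states.
    if random.random() < eps:
        return random.choice(buffer)
    weights = [task_relevance(s, task_goal) for s in buffer]
    return random.choices(buffer, weights=weights, k=1)[0]

# Toy 2-D example: buffered states and a task goal at (1, 1).
buffer = [(0.0, 0.0), (0.5, 0.5), (0.9, 1.1), (3.0, 3.0)]
goal = sample_practice_goal(buffer, task_goal=(1.0, 1.0), eps=0.0)
```

The exploration/exploitation split sketched by `eps` mirrors, very loosely, the summary's pairing of intrinsic-motivation exploration with task-aware sampling.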
👥 Authors
Zhao Yang, Leiden Institute of Advanced Computer Science, Leiden University
T. Moerland, Leiden Institute of Advanced Computer Science, Leiden University
Mike Preuss, Universiteit Leiden
A. Plaat, Leiden Institute of Advanced Computer Science, Leiden University
Edward S. Hu, PhD Student, University of Pennsylvania