On Rollouts in Model-Based Reinforcement Learning

πŸ“… 2025-01-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
In model-based reinforcement learning (MBRL), prediction errors accumulate with rollout depth, distorting the synthetic data distribution, degrading policy learning, and undermining long-horizon planning. To address this, the paper proposes Infoprop, a rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. It further introduces an error-accumulation-aware termination criterion that adaptively truncates rollouts once accumulated model error grows too large. Integrated into the Dyna architecture, this yields the Infoprop-Dyna algorithm. On common MuJoCo benchmarks, Infoprop-Dyna reports state-of-the-art performance among Dyna-style methods: average rollout length increases by over 3Γ—, synthetic data fidelity improves significantly, and training stability substantially increases.

πŸ“ Abstract
Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.
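The abstract describes two ingredients: estimating epistemic model uncertainty during a rollout, and terminating the rollout once accumulated error threatens data quality. A minimal sketch of that idea, using ensemble disagreement as a stand-in uncertainty estimate (the `ensemble_predict`, `rollout`, and `budget` names are illustrative assumptions, not the paper's actual Infoprop mechanism):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned ensemble dynamics model: each member predicts
# the next state, and disagreement across members serves as a crude
# epistemic-uncertainty estimate.
def ensemble_predict(state, action, n_members=5):
    return np.stack([
        state + action + 0.01 * rng.standard_normal(state.shape)
        for _ in range(n_members)
    ])

def rollout(state, policy, horizon=100, budget=0.5):
    """Roll out the model, terminating once the accumulated
    epistemic uncertainty exceeds `budget` (assumed criterion)."""
    trajectory, accumulated = [state], 0.0
    for _ in range(horizon):
        action = policy(state)
        preds = ensemble_predict(state, action)
        epistemic = preds.std(axis=0).mean()  # member disagreement
        accumulated += epistemic
        if accumulated > budget:              # truncate before data corrupts
            break
        state = preds.mean(axis=0)            # aggregate next-state prediction
        trajectory.append(state)
    return trajectory

traj = rollout(np.zeros(3), policy=lambda s: -0.1 * s)
```

The key design point mirrored from the abstract is that termination depends on *accumulated* error along the trajectory rather than a fixed rollout length, so uncertain trajectories are cut short while confident ones run longer.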
Problem

Research questions and friction points this paper is trying to address.

Model-based Reinforcement Learning
Error Accumulation
Long-term Planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Infoprop
Error Management
Model-based Reinforcement Learning
Bernd Frauenknecht
Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Aachen, 52068, Germany
Devdutt Subhasish
Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Aachen, 52068, Germany
Friedrich Solowjow
RWTH Aachen University
Machine Learning, Control, Networked Systems
Sebastian Trimpe
Professor, RWTH Aachen University
Control, Machine Learning, Networked Systems, Robotics