On Rollouts in Model-Based Reinforcement Learning

📅 2025-01-28
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
In model-based reinforcement learning (MBRL), prediction errors accumulate with rollout depth, distorting the simulated data, degrading the policy, and causing long-horizon planning to fail. To address this, we propose the Infoprop rollout, a mechanism that, for the first time in MBRL, explicitly separates aleatoric from epistemic uncertainty and models both via Bayesian neural networks. We further design an adaptive, error-accumulation-aware rollout termination criterion that truncates highly uncertain trajectories. Integrated into the Dyna architecture, this yields the Infoprop-Dyna algorithm. On MuJoCo benchmarks, Infoprop-Dyna achieves state-of-the-art performance among Dyna-style methods: average rollout length increases by more than 3×, the fidelity of the synthetic data distribution improves significantly, policy samples are of higher quality, and training is substantially more stable.
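
The separation of aleatoric from epistemic uncertainty can be illustrated with a minimal sketch. It assumes an ensemble whose members each predict a Gaussian mean and variance for one scalar output, and it uses the standard law-of-total-variance split; this is a common textbook decomposition, not necessarily Infoprop's exact formulation. The function name `decompose_uncertainty` is hypothetical.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(means, variances):
    """Split an ensemble's predictive variance for one scalar output.

    means[i] and variances[i] are the Gaussian mean and variance predicted
    by ensemble member i for the same (state, action) input.
    Returns (aleatoric, epistemic, total) via the law of total variance.
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    # Aleatoric part: the noise level the members themselves predict, averaged.
    aleatoric = fmean(variances)
    # Epistemic part: how much the members disagree about the mean.
    epistemic = pvariance(means)
    return aleatoric, epistemic, aleatoric + epistemic
```

With member means 1.0, 1.2 and 0.8 and a predicted variance of 0.1 each, this gives an aleatoric term of about 0.1 and an epistemic term of about 0.027. The epistemic term is the part whose influence on the synthetic data Infoprop aims to reduce.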

πŸ“ Abstract
Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.
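
The idea of tracking accumulated model error along a rollout and terminating it can be sketched with a toy loop. The rule below is a hypothetical one chosen for illustration, not Infoprop's actual criterion: each step uses the ensemble-mean prediction plus aleatoric noise only (so member disagreement does not directly perturb the sampled state), sums the per-step epistemic variance, and stops once that sum would exceed a budget. Scalar states, the names `ensemble_predict`, `rollout` and `error_budget`, and the `(mean, variance)` member interface are all assumptions.

```python
import random
from statistics import fmean, pvariance


def ensemble_predict(ensemble, state, action):
    """Query every member; each member maps (state, action) -> (mean, variance)."""
    preds = [member(state, action) for member in ensemble]
    return [m for m, _ in preds], [v for _, v in preds]


def rollout(ensemble, policy, start_state, max_horizon, error_budget, rng=None):
    """Hypothetical rollout rule: step with the ensemble mean plus aleatoric
    noise, accumulate epistemic variance, and stop before the accumulated
    value would exceed error_budget."""
    if rng is None:
        rng = random.Random(0)
    state, accumulated = start_state, 0.0
    transitions = []
    for _ in range(max_horizon):
        action = policy(state)
        means, variances = ensemble_predict(ensemble, state, action)
        epistemic = pvariance(means)  # member disagreement at this step
        if accumulated + epistemic > error_budget:
            break  # adding this step would exceed the epistemic budget
        accumulated += epistemic
        # Sample only the aleatoric noise around the ensemble mean.
        aleatoric_std = fmean(variances) ** 0.5
        next_state = fmean(means) + rng.gauss(0.0, aleatoric_std)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions, accumulated
```

For example, two noise-free members whose predictions always differ by 0.5 contribute an epistemic variance of 0.0625 per step, so with `error_budget=0.2` the rollout keeps three transitions and stops at the fourth step. Members that agree never consume the budget and the rollout runs to `max_horizon`.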

Problem

Research questions and friction points this paper is trying to address.

Model-based Reinforcement Learning
Error Accumulation
Long-term Planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Infoprop
error management
model-based reinforcement learning