On Rollouts in Model-Based Reinforcement Learning

📅 2025-01-28

📈 Citations: 0

✨ Influential: 0

career value

171K/year

🤖 AI Summary

In model-based reinforcement learning (MBRL), prediction errors accumulate with rollout depth, causing simulation distortion, policy degradation, and failure in long-horizon planning. To address this, we propose Infoprop rollout—a novel mechanism that, for the first time in MBRL, explicitly decouples aleatoric and epistemic uncertainty and models both via Bayesian neural networks. We further design an error-accumulation-aware adaptive rollout termination criterion that dynamically truncates highly uncertain trajectories. Integrated into the Dyna architecture, this yields the Infoprop-Dyna algorithm. Evaluated on MuJoCo benchmarks, Infoprop-Dyna achieves state-of-the-art performance among Dyna-style methods: average rollout length increases by over 3×, synthetic data distribution fidelity improves significantly, policy sample quality enhances, and training stability substantially increases.

Technology Category

Application Category

📝 Abstract

Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.

Problem

Research questions and friction points this paper is trying to address.

Model-based Reinforcement Learning

Error Accumulation

Long-term Planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Infoprop

error management

model-based reinforcement learning

🔎 Similar Papers

No similar papers found.