π€ AI Summary
In model-based reinforcement learning (MBRL), prediction errors accumulate with rollout depth, causing simulation distortion, policy degradation, and failure in long-horizon planning. To address this, we propose Infoprop rolloutβa novel mechanism that, for the first time in MBRL, explicitly decouples aleatoric and epistemic uncertainty and models both via Bayesian neural networks. We further design an error-accumulation-aware adaptive rollout termination criterion that dynamically truncates highly uncertain trajectories. Integrated into the Dyna architecture, this yields the Infoprop-Dyna algorithm. Evaluated on MuJoCo benchmarks, Infoprop-Dyna achieves state-of-the-art performance among Dyna-style methods: average rollout length increases by over 3Γ, synthetic data distribution fidelity improves significantly, policy sample quality enhances, and training stability substantially increases.
π Abstract
Model-based reinforcement learning (MBRL) seeks to enhance data efficiency by learning a model of the environment and generating synthetic rollouts from it. However, accumulated model errors during these rollouts can distort the data distribution, negatively impacting policy learning and hindering long-term planning. Thus, the accumulation of model errors is a key bottleneck in current MBRL methods. We propose Infoprop, a model-based rollout mechanism that separates aleatoric from epistemic model uncertainty and reduces the influence of the latter on the data distribution. Further, Infoprop keeps track of accumulated model errors along a model rollout and provides termination criteria to limit data corruption. We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality.