🤖 AI Summary
This work challenges the conventional view that model inaccuracy is the primary cause of search failures in model-based reinforcement learning. It demonstrates that even with highly accurate models, performance can still degrade because search induces distributional shift. The study argues that mitigating this mismatch between the policy and the data distribution is more critical than merely improving the accuracy of the dynamics model or value function. Building on this insight, the authors propose an approach for effectively integrating planning into policy learning, achieving state-of-the-art performance across multiple established benchmarks and demonstrating both the efficacy and the generality of the proposed mechanism.
📝 Abstract
This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-horizon prediction and compounding model errors are the primary obstacles to model-based RL. We challenge this view, showing that search is not a plug-and-play replacement for a learned policy: surprisingly, search can harm performance even when the model is highly accurate. Instead, we show that mitigating distribution shift matters more than improving model or value-function accuracy. Building on this insight, we identify key techniques for enabling effective search, achieving state-of-the-art performance across multiple popular benchmark domains.