🤖 AI Summary
This work challenges the conventional view that model inaccuracy is the primary cause of search failures in model-based reinforcement learning. It demonstrates that even with highly accurate models, performance can still degrade because search induces distributional shift. The study argues that mitigating this mismatch between the policy and the data distribution is more critical than merely improving the accuracy of the dynamics model or value function. Building on this insight, the authors propose an approach for effectively integrating planning into policy learning, achieving state-of-the-art performance across multiple established benchmarks and demonstrating both the efficacy and the generality of the proposed mechanism.
📝 Abstract
This paper investigates search in model-based reinforcement learning (RL). Conventional wisdom holds that long-horizon prediction and compounding model errors are the primary obstacles to model-based RL. We challenge this view, showing that search is not a plug-and-play replacement for a learned policy: surprisingly, search can harm performance even when the model is highly accurate. Instead, we show that mitigating distribution shift matters more than improving model or value-function accuracy. Building on this insight, we identify key techniques for enabling effective search, achieving state-of-the-art performance across multiple popular benchmark domains.