🤖 AI Summary
This work addresses key limitations of Behavior Cloning, namely poor task transferability and dependence on large sets of carefully collected expert demonstrations, by proposing a vision-based world-model framework for autonomous planning. Methodologically, it introduces an action-conditioned visual world model trained on a few hours of unstructured play data to capture environment dynamics; a diffusion-based action sampler that mitigates world-model hallucinations during multi-step prediction; and a combination of a Monte Carlo Tree Search (MCTS) planner with a zeroth-order model predictive controller (MPC) for long-horizon action optimization and execution. An optional reward model can be incorporated to strengthen planning. Evaluated on three real-robot manipulation tasks of varying planning and modeling complexity, the framework significantly outperforms Behavior Cloning baselines in success rate, supporting data-efficient, generalizable robotic planning.
📝 Abstract
Robots must understand their environment from raw sensory inputs and reason about the consequences of their actions in it to solve complex tasks. Behavior Cloning (BC) leverages task-specific human demonstrations to learn this knowledge as end-to-end policies. However, these policies are difficult to transfer to new tasks, and generating training data is challenging because it requires careful demonstrations and frequent environment resets. In contrast to this policy-based view, in this paper we take a model-based approach: we collect a few hours of unstructured, easy-to-collect play data to learn an action-conditioned visual world model, a diffusion-based action sampler, and optionally a reward model. The world model -- in combination with the action sampler and a reward model -- is then used to optimize long sequences of actions with a Monte Carlo Tree Search (MCTS) planner. The resulting plans are executed on the robot via a zeroth-order Model Predictive Controller (MPC). We show that the action sampler mitigates hallucinations of the world model during planning, and we validate our approach on three real-world robotic tasks with varying levels of planning and modeling complexity. Our experiments support the hypothesis that planning leads to a significant improvement over BC baselines on a standard manipulation test environment.
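The execution side of the abstract (optimize an action sequence against the world model, execute the first action, replan) can be sketched as a zeroth-order, sampling-based MPC loop. This is a toy illustration under assumed 1-D dynamics; `world_model`, `action_sampler`, and `reward_model` are hypothetical stand-ins for the paper's learned models:

```python
import random

GOAL = 5.0

def world_model(state, action):
    # Assumed toy dynamics: the action directly shifts the state.
    return state + action

def action_sampler(horizon):
    # Stand-in for the diffusion-based sampler: draws a candidate
    # action sequence uniformly from [-1, 1].
    return [random.uniform(-1.0, 1.0) for _ in range(horizon)]

def reward_model(state):
    # Higher reward the closer the predicted state is to the goal.
    return -abs(state - GOAL)

def zeroth_order_mpc(state, horizon=5, num_samples=256):
    # Zeroth-order (gradient-free) planning: sample candidate action
    # sequences, roll each out through the world model, keep the best,
    # and return only its first action (receding horizon).
    best_first, best_return = 0.0, float("-inf")
    for _ in range(num_samples):
        actions = action_sampler(horizon)
        s, total = state, 0.0
        for a in actions:
            s = world_model(s, a)
            total += reward_model(s)
        if total > best_return:
            best_return, best_first = total, actions[0]
    return best_first

# Receding-horizon execution: replan after every executed action.
random.seed(0)
state = 0.0
for _ in range(10):
    state = world_model(state, zeroth_order_mpc(state))
print(f"final state: {state:.2f}")  # should settle near GOAL = 5.0
```

Because only imagined rollouts are scored, a world model that hallucinates favorable futures would mislead this loop; constraining the sampled sequences to plausible actions, as the diffusion sampler does in the paper, keeps the rollouts on-distribution.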