MCTS-EP: Empowering Embodied Planning with Online Preference Optimization

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the weak planning capability and low sample efficiency of embodied agents in complex environments. We propose an online preference optimization framework that integrates large language models (LLMs) with Monte Carlo tree search (MCTS). Methodologically, MCTS guides exploration and multi-step reasoning, while the LLM provides multimodal state understanding and action generation; we further introduce a search-augmented online preference learning mechanism that is provably superior to conventional policy gradient methods under a strongly convex loss and can be interpreted as an efficient extension of Generative Adversarial Imitation Learning (GAIL). Experiments demonstrate substantial improvements: task success rates of 92% and 87% on ALFWorld (textual and visual tasks, respectively), an average reward of 0.81 on WebShop, and a significant reduction in interaction steps (from 18.7/19.5 to 10.2/9.9), confirming simultaneous gains in planning quality and sample efficiency.

📝 Abstract
This paper introduces MCTS-EP, an online learning framework that combines large language models (LLMs) with Monte Carlo Tree Search (MCTS) for training embodied agents. MCTS-EP integrates three key components: MCTS-guided exploration for preference data collection, an efficient multi-modal reasoning mechanism, and an iterative training pipeline based on preference optimization. We theoretically prove that MCTS-EP achieves better performance bounds than conventional on-policy algorithms when the loss function is strongly convex, and demonstrate that it can be formulated as a search-enhanced variant of GAIL. MCTS-EP achieves state-of-the-art performance across several benchmarks. In ALFWorld, it achieves 92% and 87% success rates for textual and visual tasks, respectively. In WebShop, it reaches an average reward of 0.81. MCTS-EP also reduces average interaction steps from 18.7/19.5 to 10.2/9.9 in visual ALFWorld. Code available at: https://github.com/xuhang-2/Embodied-Agent-Planning
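The "preference optimization" component of the pipeline can be illustrated with a minimal DPO-style pairwise loss over (chosen, rejected) trajectory pairs. This is a sketch under assumptions: the paper does not publish its exact loss here, and the function name, argument names, and `beta` value below are illustrative, not taken from the MCTS-EP code.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style loss for one (chosen, rejected) trajectory pair (illustrative).

    Inputs are summed log-probabilities of each trajectory under the current
    policy and a frozen reference policy; beta scales the implicit KL penalty.
    """
    # Margin: how much more the current policy prefers the chosen trajectory
    # over the rejected one, relative to the reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when chosen >> rejected.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

With equal log-probabilities the margin is zero and the loss is `log 2`; the loss shrinks as the policy separates the chosen trajectory from the rejected one further than the reference does.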
Problem

Research questions and friction points this paper is trying to address.

Weak planning capability of embodied agents in complex, long-horizon environments
Low sample efficiency of conventional on-policy training algorithms
How to combine LLM reasoning with tree search so embodied agents reach state-of-the-art results on planning benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with Monte Carlo Tree Search for embodied planning
MCTS-guided exploration for preference data collection
Iterative training pipeline using preference optimization
Search-augmented preference learning with provable advantages over on-policy policy gradients (under strongly convex loss), interpretable as a search-enhanced GAIL variant
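The "MCTS-guided exploration for preference data collection" idea can be sketched as a bandit-style UCT loop over candidate actions at one decision point, emitting the best- and worst-performing actions as a (chosen, rejected) preference pair. All details here are assumptions for illustration: `rollout_fn`, the action set, and the exploration constant stand in for the paper's environment simulation and LLM policy, and are not from the MCTS-EP implementation.

```python
import math

def uct_score(total_value, visits, parent_visits, c=1.4):
    """Upper-confidence score used to pick which action to try next."""
    if visits == 0:
        return float("inf")  # always try unvisited actions first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)

def collect_preference_pair(candidate_actions, rollout_fn, n_sims=50):
    """Run UCT over one decision point and return (chosen, rejected).

    `rollout_fn(action) -> reward in [0, 1]` is a hypothetical stand-in for
    simulating the environment from the state reached by `action`.
    """
    stats = {a: [0.0, 0] for a in candidate_actions}  # action -> [value, visits]
    for t in range(1, n_sims + 1):
        # Select the action with the highest upper-confidence score.
        act = max(candidate_actions,
                  key=lambda a: uct_score(stats[a][0], stats[a][1], t))
        stats[act][0] += rollout_fn(act)
        stats[act][1] += 1
    mean = lambda a: stats[a][0] / max(stats[a][1], 1)
    ranked = sorted(candidate_actions, key=mean, reverse=True)
    return ranked[0], ranked[-1]  # (chosen, rejected) preference pair
```

Pairs produced this way can then feed a pairwise preference loss, closing the search-then-optimize loop the paper describes.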