🤖 AI Summary
Existing planning-learning paradigms for text games (e.g., MCTS combined with RL) suffer from reliance on many iterative rollouts, limited semantic understanding, and inefficient exploration. To address these limitations, this paper proposes a dynamic memory-augmented LLM-MCTS planning framework. Its core innovation is a dynamic memory mechanism that jointly models intra-episode immediate feedback and inter-episode experience, enabling a large language model to refine action evaluation and subtree selection during planning. Evaluated on multiple games from the Jericho benchmark, the framework achieves superior task success rates with only a single initial planning phase, outperforming state-of-the-art iterative methods while substantially improving planning efficiency. This work points toward a new paradigm for efficient, language-grounded decision-making.
📝 Abstract
Text-based games provide valuable environments for language-based autonomous agents. However, planning-then-learning paradigms, such as those combining Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), are notably time-consuming due to extensive iterations. Additionally, these algorithms perform uncertainty-driven exploration but lack language understanding and reasoning abilities. In this paper, we introduce the Monte Carlo planning with Dynamic Memory-guided Large language model (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of Large Language Models (LLMs) alongside the exploratory advantages of tree search algorithms. Specifically, we enhance LLMs with in-trial and cross-trial memory mechanisms, enabling them to learn from past experiences and dynamically adjust action evaluations during planning. We conduct experiments on a series of text-based games from the Jericho benchmark. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games during the initial planning phase, outperforming strong contemporary methods that require multiple iterations. This demonstrates the effectiveness of our algorithm, paving the way for more efficient language-grounded planning in complex environments.
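To make the idea concrete, the following is a minimal, hypothetical sketch of memory-guided tree-search action selection in the spirit described above. It is not the paper's actual implementation: the LLM is stubbed by a `llm_prior` function, the memory is a simple record of previously failed state-action pairs, and the node/memory structures are invented for illustration. The sketch shows a PUCT-style rule where an LLM-derived prior, down-weighted by cross-trial memory of past failures, biases exploration.

```python
import math

# Hypothetical sketch (not the paper's code). An LLM would normally score
# candidate actions from the game text; here llm_prior is a stand-in that
# starts from a uniform prior and down-weights actions the cross-trial
# memory records as having failed in this state before.

def llm_prior(state, actions, cross_trial_memory):
    """Return a normalized prior over actions, penalizing remembered failures."""
    base = {a: 1.0 / len(actions) for a in actions}
    for a in actions:
        if (state, a) in cross_trial_memory.get("failures", set()):
            base[a] *= 0.1  # down-weight remembered dead ends
    total = sum(base.values())
    return {a: p / total for a, p in base.items()}

def select_action(node, cross_trial_memory, c_puct=1.0):
    """PUCT-style selection: mean value plus an LLM-prior exploration bonus."""
    priors = llm_prior(node["state"], list(node["children"]), cross_trial_memory)
    total_visits = sum(ch["visits"] for ch in node["children"].values())
    best_action, best_score = None, -math.inf
    for action, child in node["children"].items():
        q = child["value"] / child["visits"] if child["visits"] else 0.0
        u = c_puct * priors[action] * math.sqrt(total_visits + 1) / (1 + child["visits"])
        if q + u > best_score:
            best_action, best_score = action, q + u
    return best_action

# Usage: with no visit statistics yet, selection follows the memory-adjusted prior.
node = {
    "state": "kitchen",
    "children": {
        "open door": {"visits": 0, "value": 0.0},
        "eat lamp": {"visits": 0, "value": 0.0},
    },
}
memory = {"failures": {("kitchen", "eat lamp")}}
print(select_action(node, memory))  # → open door
```

The design point this illustrates is the division of labor: the tree statistics (`visits`, `value`) capture in-trial feedback, while `cross_trial_memory` carries experience across episodes, so the prior can steer the search away from known dead ends before any rollouts are spent on them.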