SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

📅 2025-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle with self-correction under linear reasoning, leading to failures in complex planning tasks. Method: We propose a tri-agent collaborative cognitive architecture—Planner, Simulator, and Critic—tightly integrated with Monte Carlo Tree Search (MCTS). This framework transforms sparse-reward search into dense, reflection-guided self-correcting reasoning: the Planner generates candidate actions; the Simulator performs LLM-based dynamic environment simulation; and the Critic delivers multi-granular, verifiable reflective feedback. Symbolic action spaces and dynamic reward modeling further enhance planning robustness and efficiency. Results: On the DailyLifeAPIs benchmark, our method achieves 83.6% accuracy—exceeding the best prior approach by over 16 percentage points—while reducing token consumption. It is the first approach to realize a fully autonomous planning loop that is semantically rich, feedback-dense, and result-verifiable.
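The Planner–Simulator–Critic loop embedded in MCTS can be sketched on a toy planning task. This is an illustrative sketch only: the three agent functions below are hypothetical stand-ins for the paper's LLM calls (here they solve a trivial "reach a target number" problem so the selection, expansion, Critic-scored evaluation, and backpropagation steps are runnable end to end).

```python
import math
import random

TARGET = 10  # toy goal; stands in for a real planning objective

def planner(state):
    """Propose candidate next actions (in SPIRAL: LLM-generated steps)."""
    return [1, 2, 3]

def simulator(state, action):
    """Predict the outcome of an action (in SPIRAL: LLM-based simulation)."""
    return state + action

def critic(state):
    """Dense reward in [0, 1] (in SPIRAL: verifiable reflective feedback)."""
    return max(0.0, 1.0 - abs(TARGET - state) / TARGET)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def uct(self, c=1.4):
        # Unvisited children are explored first.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: Planner proposes actions; Simulator grounds them.
        for action in planner(node.state):
            node.children.append(Node(simulator(node.state, action), node))
        # 3. Evaluation: Critic scores a child directly (dense reward,
        #    replacing a sparse random rollout).
        child = random.choice(node.children)
        reward = critic(child.state)
        # 4. Backpropagation: update statistics along the path to the root.
        n = child
        while n is not None:
            n.visits += 1
            n.value += reward
            n = n.parent
    # Return the most-visited first action's resulting state.
    return max(root.children, key=lambda n: n.visits).state

random.seed(0)
print(mcts(0))
```

The design point mirrored here is that the Critic supplies a reward at every expanded node, so the search is guided by dense feedback rather than waiting for a terminal success signal.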

📝 Abstract
Large Language Models (LLMs) often falter at complex planning tasks that require exploration and self-correction, as their linear reasoning process struggles to recover from early mistakes. While search algorithms like Monte Carlo Tree Search (MCTS) can explore alternatives, they are often ineffective when guided by sparse rewards and fail to leverage the rich semantic capabilities of LLMs. We introduce SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a novel framework that embeds a cognitive architecture of three specialized LLM agents into an MCTS loop. SPIRAL's key contribution is its integrated planning pipeline, in which a Planner proposes creative next steps, a Simulator grounds the search by predicting realistic outcomes, and a Critic provides dense reward signals through reflection. This synergy transforms MCTS from a brute-force search into a guided, self-correcting reasoning process. On the DailyLifeAPIs and HuggingFace datasets, SPIRAL consistently outperforms the default Chain-of-Thought planning method and substantially surpasses other state-of-the-art agents; for example, it achieves 83.6% overall accuracy on DailyLifeAPIs, an improvement of over 16 percentage points over the next-best search framework, while also demonstrating superior token efficiency. Our work demonstrates that structuring LLM reasoning as a guided, reflective, and grounded search process yields more robust and efficient autonomous planners. The source code, full appendices, and all experimental data are available for reproducibility at the official project repository.
Problem

Research questions and friction points this paper is trying to address.

Enhances LLM planning with guided self-correction
Integrates specialized agents into search for dense rewards
Improves accuracy and efficiency in complex task planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates three specialized LLM agents into MCTS loop
Uses Planner, Simulator, Critic for guided reflective search
Transforms MCTS into self-correcting reasoning process
Yifan Zhang
Vanderbilt University, Nashville, TN, USA
Giridhar Ganapavarapu
IBM Research
Artificial Intelligence, Blockchain
Srideepika Jayaraman
Senior Research Engineer, IBM Research
Time Series, Machine Learning, Natural Language Processing
Bhavna Agrawal
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Dhaval Patel
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Achille Fokoue
IBM Research
Artificial Intelligence, Knowledge Representation and Reasoning, Semantic Web, Ontologies, Description Logics