SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search

📅 2025-12-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle with self-correction under linear reasoning, leading to failures in complex planning tasks. Method: We propose a tri-agent collaborative cognitive architecture—Planner, Simulator, and Critic—tightly integrated with Monte Carlo Tree Search (MCTS). This framework transforms sparse-reward search into dense, reflection-guided self-correcting reasoning: the Planner generates candidate actions; the Simulator performs LLM-based dynamic environment simulation; and the Critic delivers multi-granular, verifiable reflective feedback. Symbolic action spaces and dynamic reward modeling further enhance planning robustness and efficiency. Results: On the DailyLifeAPIs benchmark, our method achieves 83.6% accuracy—exceeding the best prior approach by over 16 percentage points—while reducing token consumption. It is the first approach to realize a fully autonomous planning loop that is semantically rich, feedback-dense, and result-verifiable.
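The Planner–Simulator–Critic loop embedded in MCTS can be sketched on a toy planning task. This is an illustrative sketch only: the three agent functions below are hypothetical stand-ins for the paper's LLM calls (here they solve a trivial "reach a target number" problem so the selection, expansion, Critic-scored evaluation, and backpropagation steps are runnable end to end).

```python
import math
import random

TARGET = 10  # toy goal; stands in for a real planning objective

def planner(state):
    """Propose candidate next actions (in SPIRAL: LLM-generated steps)."""
    return [1, 2, 3]

def simulator(state, action):
    """Predict the outcome of an action (in SPIRAL: LLM-based simulation)."""
    return state + action

def critic(state):
    """Dense reward in [0, 1] (in SPIRAL: verifiable reflective feedback)."""
    return max(0.0, 1.0 - abs(TARGET - state) / TARGET)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def uct(self, c=1.4):
        # Unvisited children are explored first.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def mcts(root_state, iterations=200):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: Planner proposes actions; Simulator grounds them.
        for action in planner(node.state):
            node.children.append(Node(simulator(node.state, action), node))
        # 3. Evaluation: Critic scores a child directly (dense reward,
        #    replacing a sparse random rollout).
        child = random.choice(node.children)
        reward = critic(child.state)
        # 4. Backpropagation: update statistics along the path to the root.
        n = child
        while n is not None:
            n.visits += 1
            n.value += reward
            n = n.parent
    # Return the most-visited first action's resulting state.
    return max(root.children, key=lambda n: n.visits).state

random.seed(0)
print(mcts(0))
```

The design point mirrored here is that the Critic supplies a reward at every expanded node, so the search is guided by dense feedback rather than waiting for a terminal success signal.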

📝 Abstract
Large Language Models (LLMs) often falter at complex planning tasks that require exploration and self-correction, as their linear reasoning process struggles to recover from early mistakes. While search algorithms like Monte Carlo Tree Search (MCTS) can explore alternatives, they are often ineffective when guided by sparse rewards and fail to leverage the rich semantic capabilities of LLMs. We introduce SPIRAL (Symbolic LLM Planning via Grounded and Reflective Search), a novel framework that embeds a cognitive architecture of three specialized LLM agents into an MCTS loop. SPIRAL's key contribution is its integrated planning pipeline, in which a Planner proposes creative next steps, a Simulator grounds the search by predicting realistic outcomes, and a Critic provides dense reward signals through reflection. This synergy transforms MCTS from a brute-force search into a guided, self-correcting reasoning process. On the DailyLifeAPIs and HuggingFace datasets, SPIRAL consistently outperforms the default Chain-of-Thought planning method and substantially surpasses other state-of-the-art agents; for example, it achieves 83.6% overall accuracy on DailyLifeAPIs, an improvement of over 16 percentage points over the next-best search framework, while also demonstrating superior token efficiency. Our work demonstrates that structuring LLM reasoning as a guided, reflective, and grounded search process yields more robust and efficient autonomous planners. The source code, full appendices, and all experimental data are available for reproducibility at the official project repository.
Problem

Research questions and friction points this paper is trying to address.

Enhances LLM planning with guided self-correction
Integrates specialized agents into search for dense rewards
Improves accuracy and efficiency in complex task planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates three specialized LLM agents into MCTS loop
Uses Planner, Simulator, Critic for guided reflective search
Transforms MCTS into self-correcting reasoning process
Yifan Zhang
Vanderbilt University, Nashville, TN, USA
Giridhar Ganapavarapu
IBM Research
Artificial Intelligence, Blockchain
Srideepika Jayaraman
Senior Research Engineer, IBM Research
Time Series, Machine Learning, Natural Language Processing
Bhavna Agrawal
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Dhaval Patel
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Achille Fokoue
IBM Research
Artificial Intelligence, Knowledge Representation and Reasoning, Semantic Web, Ontologies, Description Logics