SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
In complex, multi-step decision-making tasks, LLM planning faces a trade-off: inference-time search incurs high latency, while supervised fine-tuning generalizes poorly. This work proposes a training-free, non-parametric planning approach that decouples planning from execution: offline, it uses Monte Carlo Tree Search to generate de-lexicalized state-goal-action (SGA) atoms; online, it dynamically re-grounds these atoms via a hybrid symbolic-semantic retrieval mechanism to construct soft reasoning prompts. Operating with frozen model weights, the method balances the depth of System 2 reasoning with the speed of System 1 inference. Evaluated across multiple challenging benchmarks, the approach enables frozen, open-source models to match the performance of advanced systems such as GPT-5 without task-specific fine-tuning.
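To make the idea of a de-lexicalized SGA atom concrete, here is a minimal Python sketch; it is not the authors' implementation. The `SGAAtom` class, the `delexicalize` helper, the `<SLOT>` notation, and the flight-booking example are all hypothetical. The paper's actual atom schema and entity detection are not given in this summary, so the entity-to-slot mapping is supplied by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SGAAtom:
    """Hypothetical container for one de-lexicalized state-goal-action atom."""
    state: str
    goal: str
    action: str


def delexicalize(text: str, entities: dict[str, str]) -> str:
    """Replace each concrete entity string with its symbolic slot name.

    Longer entities are replaced first so that a short entity that is a
    substring of a longer one does not clobber it.
    """
    for surface in sorted(entities, key=len, reverse=True):
        text = text.replace(surface, "<" + entities[surface] + ">")
    return text


def make_atom(state: str, goal: str, action: str,
              entities: dict[str, str]) -> SGAAtom:
    """Build an atom by abstracting the same entities in all three fields."""
    return SGAAtom(
        state=delexicalize(state, entities),
        goal=delexicalize(goal, entities),
        action=delexicalize(action, entities),
    )


# Assumed example step taken from some successful trajectory.
atom = make_atom(
    state="user is logged in; flight UA100 is selected",
    goal="book flight UA100 for Alice",
    action="call book_flight(flight_id='UA100', passenger='Alice')",
    entities={"UA100": "FLIGHT_ID", "Alice": "PASSENGER"},
)
print(atom.goal)
print(atom.action)
```

The resulting atom keeps the reusable structure of the step (which tool is called, with which kinds of arguments) while the concrete flight number and passenger name are reduced to slots.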

📝 Abstract
LLM-powered systems require complex multi-step decision-making abilities to solve real-world tasks, yet current planning approaches face a trade-off between the high latency of inference-time search and the limited generalization of supervised fine-tuning. To address this limitation, we introduce SGA-MCTS, a framework that casts LLM planning as non-parametric retrieval. Offline, we leverage Monte Carlo Tree Search (MCTS) to explore the solution space and distill high-fidelity trajectories into State-Goal-Action (SGA) atoms. These atoms are de-lexicalized primitives that abstract concrete entities into symbolic slots, preserving reusable causal logic while discarding domain-specific noise. Online, a retrieval-augmented agent employs a hybrid symbolic-semantic mechanism to fetch relevant SGAs and re-ground them into the current context as soft reasoning hints. Empirical results on complex benchmarks demonstrate that this paradigm enables frozen, open-weights models to match the performance of SOTA systems (e.g., GPT-5) without task-specific fine-tuning. By effectively amortizing the heavy computational cost of search, SGA-MCTS achieves System 2 reasoning depth at System 1 inference speeds, rendering autonomous planning both scalable and real-time feasible.
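The online stage described above (hybrid retrieval followed by re-grounding) can be illustrated with a small Python sketch under stated assumptions. The abstract does not specify the scoring function, so everything here is hypothetical: the symbolic part is approximated by Jaccard overlap of slot names, the semantic part by bag-of-words cosine similarity as a stand-in for a real embedding model, and `alpha`, `hybrid_score`, `reground`, and the two stored atoms are invented for illustration.

```python
import math
import re
from collections import Counter

SLOT = re.compile(r"<([A-Z_]+)>")  # assumed slot notation, e.g. <FLIGHT_ID>


def slots(text: str) -> set[str]:
    """Collect the symbolic slot names that appear in a template."""
    return set(SLOT.findall(text))


def bow(text: str) -> Counter:
    """Bag of lowercase words with slots stripped (embedding stand-in)."""
    return Counter(re.findall(r"[a-z]+", SLOT.sub(" ", text).lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(goal: str, query_slots: set[str], atom: dict,
                 alpha: float = 0.5) -> float:
    """Weighted mix of slot overlap (symbolic) and text similarity (semantic)."""
    symbolic = jaccard(query_slots, slots(atom["goal"]) | slots(atom["action"]))
    semantic = cosine(bow(goal), bow(atom["goal"]))
    return alpha * symbolic + (1 - alpha) * semantic


def reground(template: str, bindings: dict[str, str]) -> str:
    """Fill slots with current entities; unknown slots are left as-is."""
    return SLOT.sub(lambda m: bindings.get(m.group(1), m.group(0)), template)


# Hypothetical atom memory produced offline.
atoms = [
    {"goal": "book flight <FLIGHT_ID> for <PASSENGER>",
     "action": "book_flight(flight_id='<FLIGHT_ID>', passenger='<PASSENGER>')"},
    {"goal": "cancel order <ORDER_ID>",
     "action": "cancel_order(order_id='<ORDER_ID>')"},
]

# Current task: its goal text plus the entities available in context.
goal = "book a flight for the passenger"
bindings = {"FLIGHT_ID": "DL42", "PASSENGER": "Bob"}

best = max(atoms, key=lambda a: hybrid_score(goal, set(bindings), a))
hint = "Hint (optional): a similar past step used: " + reground(best["action"], bindings)
print(hint)
```

The hint is phrased as optional on purpose: in this sketch the retrieved atom is injected into the prompt as a soft suggestion for the frozen model, not executed directly.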
Problem

Research questions and friction points this paper is trying to address.

LLM planning
multi-step decision-making
inference latency
generalization
real-time planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

SGA-MCTS
non-parametric retrieval
atomic experience
symbolic-semantic reasoning
amortized planning