SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models face two problems in complex, multi-step decision-making tasks: inference-time search incurs high latency, and fine-tuning generalizes poorly. This work proposes a training-free, non-parametric planning approach that decouples planning from execution. Offline, it uses Monte Carlo Tree Search to generate de-lexicalized State-Goal-Action (SGA) atoms; online, it dynamically re-grounds these atoms via a hybrid symbolic-semantic retrieval mechanism to construct soft reasoning prompts. Because the model weights stay frozen, the method balances the depth of System 2 reasoning with the speed of System 1 inference. Evaluated on multiple challenging benchmarks, the approach enables frozen, open-source models to match the performance of advanced systems such as GPT-5 without task-specific fine-tuning.
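
To make the idea of de-lexicalized atoms and re-grounding concrete, here is a minimal Python sketch. It is illustrative only and is not the paper's implementation: the `SGAAtom` class, the `<CITY>`/`<DATE>` slot syntax, the `search_hotels` action, and the plain string substitution are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SGAAtom:
    """Hypothetical container for one State-Goal-Action atom."""
    state: str
    goal: str
    action: str


def delexicalize(text: str, entities: dict[str, str]) -> str:
    """Replace each concrete entity with its symbolic slot (longest entity first)."""
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, entities[entity])
    return text


def reground(template: str, bindings: dict[str, str]) -> str:
    """Fill symbolic slots with entities taken from the current context."""
    for slot, value in bindings.items():
        template = template.replace(slot, value)
    return template


# Offline: abstract a concrete step (assumed example data) into a reusable atom.
entities = {"Paris": "<CITY>", "2026-05-01": "<DATE>"}
atom = SGAAtom(
    state=delexicalize("user is planning a trip to Paris", entities),
    goal=delexicalize("find a hotel in Paris for 2026-05-01", entities),
    action=delexicalize("search_hotels(city=Paris, date=2026-05-01)", entities),
)

# Online: re-ground the stored action into a new context as a soft hint.
hint = reground(atom.action, {"<CITY>": "Tokyo", "<DATE>": "2026-06-10"})
print(atom.action)
print(hint)
```

The stored atom carries no mention of Paris, so the same action template can be reused for any city and date that appear in a later task.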

📝 Abstract

LLM-powered systems require complex multi-step decision-making abilities to solve real-world tasks, yet current planning approaches face a trade-off between the high latency of inference-time search and the limited generalization of supervised fine-tuning. To address this limitation, we introduce SGA-MCTS, a framework that casts LLM planning as non-parametric retrieval. Offline, we leverage Monte Carlo Tree Search (MCTS) to explore the solution space and distill high-fidelity trajectories into State-Goal-Action (SGA) atoms. These atoms are de-lexicalized primitives that abstract concrete entities into symbolic slots, preserving reusable causal logic while discarding domain-specific noise. Online, a retrieval-augmented agent employs a hybrid symbolic-semantic mechanism to fetch relevant SGAs and re-ground them into the current context as soft reasoning hints. Empirical results on complex benchmarks demonstrate that this paradigm enables frozen, open-weights models to match the performance of SOTA systems (e.g., GPT-5) without task-specific fine-tuning. By effectively amortizing the heavy computational cost of search, SGA-MCTS achieves System 2 reasoning depth at System 1 inference speeds, rendering autonomous planning both scalable and real-time feasible.
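
The abstract does not spell out how the hybrid symbolic-semantic retrieval is scored, so the following Python sketch shows only one plausible shape for it. Everything here is an assumption made for illustration: the symbolic part is a Jaccard overlap of slot types, the semantic part is a bag-of-words cosine standing in for a real embedding model, and the `alpha` weight, the `hybrid_score` and `retrieve` helpers, and the example atoms are hypothetical.

```python
import math
import re
from collections import Counter

SLOT = r"<[A-Z_]+>"  # assumed slot syntax, e.g. <CITY>


def slots(text: str) -> set[str]:
    """Symbolic view: the set of slot types mentioned in the text."""
    return set(re.findall(SLOT, text))


def bow(text: str) -> Counter:
    """Semantic stand-in: bag of lowercase words with slots stripped out."""
    return Counter(re.findall(r"[a-z]+", re.sub(SLOT, " ", text).lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(query: str, key: str, alpha: float = 0.5) -> float:
    """Weighted mix of slot overlap (symbolic) and word similarity (semantic)."""
    symbolic = jaccard(slots(query), slots(key))
    semantic = cosine(bow(query), bow(key))
    return alpha * symbolic + (1 - alpha) * semantic


def retrieve(query: str, atoms: list[dict], k: int = 1, alpha: float = 0.5) -> list[dict]:
    """Return the k atoms whose goal best matches the de-lexicalized query."""
    ranked = sorted(
        atoms, key=lambda a: hybrid_score(query, a["goal"], alpha), reverse=True
    )
    return ranked[:k]


# Hypothetical atom store, as might be produced by the offline search phase.
ATOMS = [
    {"goal": "find a hotel in <CITY> for <DATE>",
     "action": "search_hotels(city=<CITY>, date=<DATE>)"},
    {"goal": "convert <AMOUNT> to <CURRENCY>",
     "action": "convert_currency(amount=<AMOUNT>, to=<CURRENCY>)"},
]

top = retrieve("book a hotel in <CITY> on <DATE>", ATOMS, k=1)
print(top[0]["action"])
```

In this toy setup, retrieval costs a single pass over the stored atoms rather than a tree search, which is the sense in which the expensive search is amortized offline. The retrieved action would then be re-grounded with the current task's entities and supplied to the frozen model as a hint.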

Problem

Research questions and friction points this paper is trying to address.

LLM planning

multi-step decision-making

inference latency

generalization

real-time planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

SGA-MCTS

non-parametric retrieval

atomic experience