🤖 AI Summary
Current tree-search paradigms for large language model (LLM) inference suffer from domain fragmentation and weak theoretical foundations, particularly regarding the ambiguous role of reward signals: are they transient search heuristics or persistent learning objectives?
Method: We propose the first formal unification framework that explicitly decouples search mechanisms, reward modeling, and state transitions. It rigorously distinguishes inference-time search guidance from learning-time reward modeling and establishes a modular taxonomy. By integrating tree-search algorithms, reinforcement learning principles, and LLM fine-tuning techniques, the framework jointly enables inference-time scaling and model self-improvement.
Contribution/Results: Our work provides the first principled, component-aware foundation for autonomous agents, enabling interpretable, scalable, and self-evolving systems. It clarifies conceptual boundaries, resolves foundational ambiguities in reward usage, and offers a systematic theoretical pathway toward autonomous agent development.
📝 Abstract
Deliberative tree search is a cornerstone of modern Large Language Model (LLM) research, driving the pivot from brute-force scaling toward algorithmic efficiency. This single paradigm unifies two critical frontiers: **Test-Time Scaling (TTS)**, which deploys on-demand computation to solve hard problems, and **Self-Improvement**, which uses search-generated data to durably enhance model parameters. However, this burgeoning field is fragmented and lacks a common formalism, particularly concerning the ambiguous role of the reward signal: is it a transient heuristic or a durable learning target? This paper resolves this ambiguity by introducing a unified framework that deconstructs search algorithms into three core components: the *Search Mechanism*, *Reward Formulation*, and *Transition Function*. We establish a formal distinction between transient **Search Guidance** for TTS and durable **Parametric Reward Modeling** for Self-Improvement. Building on this formalism, we introduce a component-centric taxonomy, synthesize the state-of-the-art, and chart a research roadmap toward more systematic progress in creating autonomous, self-improving agents.
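The three-component decomposition described in the abstract can be sketched in code. The sketch below is purely illustrative and not from the paper: all names (`State`, `expand`, `guidance_score`, `RewardModel`, `tree_search`) and the toy scoring logic are assumptions. It shows the shape of the separation: a Transition Function that expands states, a Reward Formulation that appears in two roles (a transient test-time heuristic vs. a parametric model trained on search-generated data), and a Search Mechanism that consumes whichever scorer is plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical, minimal stand-ins for the paper's abstract components.

@dataclass
class State:
    text: str  # a partial reasoning trace / solution prefix


# Transition Function: expands a state into candidate successors.
# In practice this would sample k continuations from an LLM.
def expand(state: State, k: int = 2) -> List[State]:
    return [State(text=state.text + f" step{i}") for i in range(k)]


# Reward Formulation, role 1: transient Search Guidance (TTS).
# A heuristic consulted during search and discarded afterwards.
def guidance_score(state: State) -> float:
    return float(len(state.text))  # toy heuristic: prefer longer traces


# Reward Formulation, role 2: durable Parametric Reward Modeling
# (Self-Improvement). Its parameters persist and are fit on
# search-generated data.
@dataclass
class RewardModel:
    weight: float = 0.0

    def score(self, state: State) -> float:
        return self.weight * len(state.text)

    def fit(self, traces: List[State], labels: List[float]) -> None:
        # toy one-feature least squares on (trace length -> label)
        xs = [len(t.text) for t in traces]
        num = sum(x * y for x, y in zip(xs, labels))
        den = sum(x * x for x in xs) or 1.0
        self.weight = num / den


# Search Mechanism: greedy best-first descent (a stand-in for
# MCTS or beam search), parameterized by any scoring function.
def tree_search(root: State, depth: int,
                score: Callable[[State], float]) -> State:
    state = root
    for _ in range(depth):
        state = max(expand(state), key=score)
    return state
```

The point of the decoupling is that `tree_search(root, d, guidance_score)` realizes test-time scaling, while the same search run with `RewardModel.score`, after `fit` on previously searched traces, closes the self-improvement loop; neither the search mechanism nor the transition function changes.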