Caterpillar of Thoughts: The Optimal Test-Time Algorithm for Large Language Models

📅 2026-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of optimally allocating test-time computation for large language models under a fixed inference budget to maximize output quality. By modeling test-time computation as a Markov decision process with full backtracking capability to any prior reasoning state, the authors prove that the theoretically optimal strategy requires generating only “caterpillar tree” structures—trees that reduce to a simple path upon leaf removal—thereby drastically shrinking the search space. Building on this insight, they propose CaT, an efficient algorithm that jointly optimizes tree-structured reasoning and computational scheduling. Experiments demonstrate that CaT achieves higher task success rates than Tree-of-Thoughts while substantially reducing token consumption, offering a compelling balance between efficiency and performance.
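The summary describes CaT as maintaining a caterpillar-shaped search: a single spine of reasoning states, with sampled continuations hanging off each spine node as leaves. The paper's actual algorithm is not reproduced on this page; the following is only a hypothetical sketch of that shape of search, where `step`, `score`, and the best-of-leaves selection rule are illustrative assumptions, not the authors' method.

```python
import random

def cat_search(start, step, score, depth=4, leaves_per_step=3, seed=0):
    """Hypothetical sketch of a caterpillar-shaped search.

    Keeps a single spine of states. At each depth, a few one-step
    continuations (the leaves) are sampled from the current spine
    node; the best-scoring leaf becomes the next spine node and the
    rest remain leaves. The resulting state tree is a caterpillar:
    removing its leaves leaves only the spine, a simple path.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    spine = [start]
    for _ in range(depth):
        leaves = [step(spine[-1], rng) for _ in range(leaves_per_step)]
        spine.append(max(leaves, key=score))  # promote best leaf to the spine
    return spine

# Toy instance (purely illustrative): states are numbers, a "step"
# adds random noise, and the score prefers larger values.
path = cat_search(0.0,
                  step=lambda s, rng: s + rng.uniform(-1, 2),
                  score=lambda s: s)
```

Note how the budget splits between breadth (`leaves_per_step`) and depth (`depth`); in the paper this scheduling is itself optimized jointly with the tree structure.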

📝 Abstract
Large language models (LLMs) can often produce substantially better outputs when allowed to use additional test-time computation, such as sampling, chain of thought, backtracking, or revising partial solutions. Despite the growing empirical success of such techniques, there is limited theoretical understanding of how inference-time computation should be structured, or what constitutes an optimal use of a fixed computation budget. We model test-time computation as an algorithm interacting with a Markov chain: at any point, the algorithm may resume generation from any previously observed state. That is, unlike standard Markov chains where the states are drawn passively, we allow the algorithm to backtrack to any previously observed state of the Markov chain at any time. Many existing test-time algorithms, such as Chain-of-Thought (CoT) (Wei et al., 2023), Tree-of-Thoughts (ToT) (Yao et al., 2023), or Best-of-$k$ (Brown et al., 2024), can be viewed as specific algorithms in this model. We prove that while backtracking can reduce the number of generations exponentially, a very limited form of backtracking is theoretically sufficient. Namely, we show that the optimal algorithm always generates a caterpillar tree: if we remove the leaves of the state tree generated by the optimal algorithm, we obtain a path. Motivated by our characterization of the optimal algorithm, we present Caterpillar of Thoughts (CaT), a new test-time computation algorithm that reduces the number of token/state generations. Our empirical evaluation shows that CaT, compared to ToT, achieves a better success rate while also reducing the number of token generations.
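The structural characterization above is concrete enough to check mechanically: a tree is a caterpillar exactly when deleting all its leaves yields a simple path. As a small illustration (the function name and edge-list representation are my own, not from the paper), this property can be tested as follows:

```python
from collections import defaultdict

def is_caterpillar(edges):
    """Check whether an undirected tree, given as a list of edges,
    is a caterpillar: removing all leaves must leave a simple path
    (the "spine"), possibly empty."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = {u for u in adj if len(adj[u]) == 1}
    spine = set(adj) - leaves
    if len(spine) <= 1:
        return True  # a single edge or a star is trivially a caterpillar
    # The spine of a tree is connected, so it is a path iff every
    # spine node has at most 2 spine neighbours and exactly two
    # endpoints have exactly 1.
    deg = {u: len(adj[u] & spine) for u in spine}
    return (all(d <= 2 for d in deg.values())
            and sum(1 for d in deg.values() if d == 1) == 2)
```

For example, a path with extra leaves attached along it passes, while a "spider" whose center has three legs of length two fails, because removing leaves still leaves a branching node.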
Problem

Research questions and friction points this paper is trying to address.

test-time computation
large language models
optimal inference
backtracking
computation budget
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time computation
backtracking
caterpillar tree
large language models
optimal inference
Amir Azarmehr
Northeastern University
Theoretical Computer Science
Soheil Behnezhad
Northeastern University
Alma Ghafari
Northeastern University