🤖 AI Summary
This work investigates the theoretical lower bound on the number of reasoning tokens required by large language models (LLMs) for chain-of-thought (CoT) reasoning as a function of input size. By extending the bounded attention prefix oracle (BAPO) model and combining complexity-theoretic lower-bound proofs with explicit algorithmic constructions, the paper establishes the first rigorous Ω(n) token complexity lower bound for CoT reasoning on canonical tasks such as binary majority, triplet matching, and graph reachability. Empirical evaluations confirm that leading LLMs exhibit nearly linear growth in reasoning token consumption on these tasks and suffer significant performance degradation under constrained token budgets, revealing an intrinsic link between required reasoning length and input size.
📝 Abstract
Inference-time scaling via chain-of-thought (CoT) reasoning is a major driver of state-of-the-art LLM performance, but it comes with substantial latency and compute costs. We address a fundamental theoretical question: how many reasoning tokens are required to solve a problem as input size grows? By extending the bounded attention prefix oracle (BAPO) model (an abstraction of LLMs that quantifies the information flow required to solve a task), we prove lower bounds on the CoT tokens required for three canonical BAPO-hard tasks: binary majority, triplet matching, and graph reachability. We show that each requires $\Omega(n)$ reasoning tokens when the input size is $n$. We complement these results with matching or near-matching upper bounds via explicit constructions. Finally, our experiments with frontier reasoning models show approximately linear reasoning token scaling on these tasks, and failures when models are constrained to smaller reasoning budgets, consistent with our theoretical lower bounds. Together, our results identify fundamental bottlenecks in inference-time compute through CoT and offer a principled tool for analyzing optimal reasoning length.
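For intuition on the upper-bound side, the abstract's explicit construction for binary majority can be sketched as follows. This is a hypothetical illustration (not the paper's code): a CoT that emits one running-count token per input bit solves majority with exactly $n$ reasoning tokens, matching the $\Omega(n)$ lower bound.

```python
def majority_cot(bits):
    """Solve binary majority with a linear-length chain of thought.

    Returns (answer, tokens), where each emitted partial count is
    modeled as one reasoning token. Assumes an odd-length input so
    ties cannot occur (a simplifying assumption for this sketch).
    """
    tokens = []
    count = 0
    for b in bits:
        count += 1 if b == 1 else -1
        tokens.append(count)  # one reasoning token per input bit
    answer = 1 if count > 0 else 0
    return answer, tokens

answer, tokens = majority_cot([1, 0, 1, 1, 0, 1, 1])
# len(tokens) equals the input length: token usage grows linearly with n
```

The point of the sketch is only that the running tally must be carried forward bit by bit, so the reasoning trace cannot be shortened below linear length without losing information the answer depends on.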