AI Summary
This work elucidates the theoretical mechanisms underlying the success and failure of Chain-of-Thought (CoT) reasoning, focusing on the trade-off between its benefits and error accumulation. By constructing a learning-theoretic framework, CoT is modeled as an interaction between an answer mapping and an autoregressive rule for generating intermediate reasoning steps. The study introduces the first tight decomposition of reasoning risk into two components: Oracle Trajectory Risk (OTR), which captures the benefit of idealized reasoning, and Trajectory Mismatch Risk (TMR), which quantifies the cost of deviation from such an oracle. Theoretical analysis reveals that under unstable conditions, TMR can grow unboundedly, whereas under stable conditions, error propagation is governed by a precise amplification factor, yielding bounded, linear, or exponential growth regimes. This characterization fully delineates the validity boundary of CoT and identifies key factors governing the benefit-cost trade-off, providing a theoretical foundation for designing reliable reasoning systems.
Abstract
We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
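To make the three error-growth regimes concrete, here is a minimal sketch under a simplifying assumption that is not taken from the paper: each reasoning step adds one unit of error and multiplies the error carried over from the previous step by a constant stability factor. Under that assumed recursion, the accumulated error after T steps is the geometric sum 1 + c + ... + c^(T-1), which stays bounded for c < 1, grows linearly for c = 1, and grows exponentially for c > 1. The names `amplification`, `regime`, and `stability_constant` are hypothetical, and the paper's exact amplification factor may take a different form.

```python
def amplification(stability_constant: float, steps: int) -> float:
    """Return sum_{k=0}^{steps-1} stability_constant**k.

    Illustrative stand-in for an error amplification factor: it assumes a
    recursion e_{t+1} <= stability_constant * e_t + 1 with e_0 = 0.
    This is an assumption for illustration, not the paper's definition.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    total = 0.0
    power = 1.0  # holds stability_constant**k at iteration k
    for _ in range(steps):
        total += power
        power *= stability_constant
    return total


def regime(stability_constant: float, tol: float = 1e-12) -> str:
    """Classify the growth regime of the assumed geometric recursion."""
    if stability_constant < 0:
        raise ValueError("stability_constant must be non-negative")
    if abs(stability_constant - 1.0) <= tol:
        return "linear"
    return "bounded" if stability_constant < 1.0 else "exponential"


if __name__ == "__main__":
    # Compare how the factor scales with chain length in each regime.
    for c in (0.5, 1.0, 2.0):
        values = [amplification(c, t) for t in (1, 5, 10, 20)]
        print(f"c={c}: {regime(c)} -> {values}")
```

In this toy model, the same per-step mismatch is harmless for long chains when the constant is below one and dominant when it is above one, which is the qualitative transition the abstract describes.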