🤖 AI Summary
This work establishes a theoretical lower bound on the number of chain-of-thought (CoT) reasoning steps required by hard-attention Transformers to solve fundamental algorithmic problems. Specifically, it targets problems in TC⁰—including Parity and integer multiplication—whose low circuit complexity suggests they should be easy. The authors develop a unified analytical framework integrating information bottleneck analysis, problem reduction, and a precise model of hard attention. Their main contribution is a lower bound of Ω(n/log n) on CoT step count for such tasks, tight up to logarithmic factors. This result challenges optimistic expressivity estimates derived from circuit complexity and establishes a quantitative relationship between CoT length and the intrinsic computational depth of a problem. It reveals a fundamental limitation: short-chain CoT reasoning is provably insufficient for these problems. The analysis provides theoretical grounding for understanding the reasoning capabilities—and inherent limitations—of large language models.
📝 Abstract
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers' expressivity from $TC^0$ to $PTIME$, the required scratchpad length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the study of systematic lower bounds on the number of CoT steps across different algorithmic problems, in the hard-attention regime. We study a variety of algorithmic problems and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to the emerging understanding of the power and limitations of chain-of-thought reasoning.
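To make the Parity claim concrete, here is a minimal sketch (not from the paper) of the standard scratchpad strategy: emit one running partial parity per input bit, yielding a CoT of exactly n steps. The lower bound discussed above says hard-attention transformers cannot shorten such a chain below roughly n/log n steps.

```python
def parity_with_scratchpad(bits):
    """Compute the parity of a bit sequence, recording each CoT step.

    The scratchpad holds one token (the running parity) per input bit,
    so the chain length equals the input length n.
    """
    steps = []   # the scratchpad: one reasoning step per bit
    acc = 0
    for b in bits:
        acc ^= b             # fold the next bit into the running parity
        steps.append(acc)    # emit one CoT token
    return acc, steps

answer, chain = parity_with_scratchpad([1, 0, 1, 1])
# answer == 1; chain == [1, 1, 0, 1], i.e. a linear-length CoT of n = 4 steps
```

This linear-length chain is what the tight Ω(n/log n) bound compares against: at best, a logarithmic-factor compression is possible in the hard-attention regime.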