🤖 AI Summary
Understanding the fundamental computational distinctions between chain-of-thought (CoT) reasoning and looped Transformer architectures remains an open challenge. Method: We conduct a systematic theoretical comparison using formal language theory, computational complexity analysis, and directed acyclic graph (DAG) computation models. Contribution/Results: We establish the first rigorous computational separation between these paradigms: Looped Transformers efficiently support deterministic parallel computation—e.g., DAG evaluation—due to their implicit recurrence over fixed-length sequences; in contrast, CoT excels at approximate compositional reasoning under stochastic decoding, particularly on self-reducible problems, owing to its explicit sequential decomposition. Our analysis reveals an intrinsic complementarity between deep recursion (embodied by looped Transformers) and iterative chain expansion (characteristic of CoT). This work provides the first verifiable theoretical criterion—and corresponding practical guidance—for selecting inference architectures in large language models, grounded in provable computational properties rather than empirical heuristics.
📝 Abstract
Chain-of-Thought (CoT) and Looped Transformers have been shown empirically to improve performance on reasoning tasks and theoretically to enhance expressivity by recursively increasing the number of computational steps. However, their comparative capabilities are still not well understood. In this paper, we provide a formal analysis of their respective strengths and limitations. We show that Looped Transformers can efficiently simulate parallel computation for deterministic tasks, which we formalize as evaluation over directed acyclic graphs. In contrast, CoT with stochastic decoding excels at approximate inference for compositional structures, namely self-reducible problems. These separations indicate which tasks favor depth-driven recursion and which favor sequential chain expansion, thereby offering practical cues for choosing between the two reasoning paradigms.
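To make the "DAG evaluation via looping" intuition concrete, here is a toy sketch (our own illustration, not code from the paper; the function and argument names are hypothetical). It evaluates a boolean-gate DAG by repeating one synchronous parallel update step, analogous to how a looped Transformer reapplies the same block each iteration: a DAG of depth `d` is fully evaluated after `d` loops, regardless of how many gates sit at each level.

```python
# Toy illustration: evaluating a DAG of boolean gates by iterating one
# parallel update, mirroring a looped architecture that reuses a single
# block across iterations. Names here (loop_evaluate, num_loops) are
# hypothetical, chosen for this sketch only.

def loop_evaluate(nodes, num_loops):
    """nodes: dict name -> ('input', value) or (op, [parent names]).
    Each loop applies one synchronous parallel update over all gates;
    a DAG of depth d needs only d loops."""
    # Start with the input values; gate values appear as loops proceed.
    values = {n: v for n, (kind, v) in nodes.items() if kind == 'input'}
    ops = {'and': all, 'or': any}
    for _ in range(num_loops):
        new_values = dict(values)
        for name, (kind, arg) in nodes.items():
            # A gate fires once all of its parents have values.
            if kind != 'input' and all(p in values for p in arg):
                new_values[name] = ops[kind](values[p] for p in arg)
        values = new_values
    return values

# Depth-2 DAG: g = x AND y, h = g OR z  ->  two loops suffice.
dag = {
    'x': ('input', True), 'y': ('input', False), 'z': ('input', True),
    'g': ('and', ['x', 'y']),
    'h': ('or', ['g', 'z']),
}
result = loop_evaluate(dag, num_loops=2)
# result['g'] is False, result['h'] is True
```

The key point the sketch captures is that the cost in iterations scales with the DAG's depth, not its size: each loop resolves an entire layer of gates in parallel, which is the regime where the paper argues looped architectures are efficient.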