🤖 AI Summary
This work addresses the significant yet poorly understood performance variations of Chain-of-Thought (CoT) reasoning across different tasks by providing the first theoretical framework for its step-by-step inference process. The authors model CoT as a Markov chain and propose that its effectiveness hinges on the consistency of the transition kernels between reasoning steps, while also quantifying how noise in intermediate steps degrades performance. Through rigorous theoretical analysis, they prove that consistent transition kernels substantially reduce sample complexity. To validate these predictions, they construct synthetic benchmarks whose results align with the theory and offer a principled explanation for the observed disparities in CoT's empirical success across real-world tasks. This study thus establishes a novel theoretical lens and analytical framework for understanding and improving CoT reasoning.
📝 Abstract
Chain-of-Thought (CoT) prompting is a widely used inference-time technique for improving reasoning, yet its gains are uneven across tasks. We analyze when and why CoT helps by modeling the step-wise reasoning trajectory as a Markov chain: each intermediate step is a state, and the dependence between consecutive steps is captured by a transition kernel. Our theory identifies transition alignment (whether instances share a common step-wise transition kernel) as the key determinant of CoT's effectiveness. When transitions are identical across steps, CoT reduces inference-time sample complexity: fewer in-context sample trajectories suffice to recover the final decision. In contrast, when transitions differ across steps, these gains can vanish. We further quantify how noise in intermediate steps modulates CoT's benefit. Beyond theory, we design synthetic benchmarks that isolate these factors to complement prior results on real-world tasks and to empirically validate our predictions.
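To make the Markov-chain view concrete, the following is a minimal sketch (not the paper's actual construction; the states, the specific kernel, and the trajectory counts are all illustrative assumptions). It shows why transition alignment helps: when every reasoning step follows the same kernel, transitions from all steps of all trajectories can be pooled into one estimation problem, so a small number of trajectories yields many effective samples.

```python
import random

random.seed(0)

STATES = [0, 1, 2]

# Hypothetical shared transition kernel: KERNEL[s][t] is the probability of
# moving from state s to state t. "Transition alignment" means every
# reasoning step of every instance uses this same kernel.
SHARED_KERNEL = {
    0: [0.7, 0.2, 0.1],
    1: [0.1, 0.7, 0.2],
    2: [0.2, 0.1, 0.7],
}

def sample_trajectory(kernel, start, n_steps):
    """One CoT-style trajectory: a start state followed by n_steps transitions."""
    traj = [start]
    for _ in range(n_steps):
        traj.append(random.choices(STATES, weights=kernel[traj[-1]])[0])
    return traj

def estimate_kernel(trajectories):
    """Estimate a single kernel by pooling (s, s') pairs across all steps.

    Pooling is only valid under transition alignment: with a shared kernel,
    every step of every trajectory is drawn from the same distribution, so
    n trajectories of length T contribute n * T transition samples. If each
    step had its own kernel, these samples could not be combined this way.
    """
    counts = {s: [0] * len(STATES) for s in STATES}
    for traj in trajectories:
        for s, s_next in zip(traj, traj[1:]):
            counts[s][s_next] += 1
    estimate = {}
    for s, row in counts.items():
        total = sum(row) or 1  # avoid division by zero for unvisited states
        estimate[s] = [c / total for c in row]
    return estimate

# 50 trajectories of 20 steps pool into 1000 transition samples.
trajs = [sample_trajectory(SHARED_KERNEL, start=0, n_steps=20) for _ in range(50)]
est = estimate_kernel(trajs)
for s in STATES:
    print(s, [round(p, 2) for p in est[s]])
```

Under alignment the estimate converges to the shared kernel as trajectories accumulate; without alignment, each step's kernel would have to be estimated separately from far fewer samples, which is the sample-complexity gap the abstract describes.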