🤖 AI Summary
This study investigates how chain-of-thought (CoT) supervised learning affects the generalization capability of transformers on symbolic reasoning tasks, with particular focus on how algorithmic complexity sets an upper bound on generalization.
Method: We propose a three-parameter logistic curve model to characterize learning dynamics and conduct controlled experiments on symbolic tasks with tunable complexity, comparing answer-only supervision against CoT supervision while analyzing grokking phenomena and internal computation pathways.
Contribution/Results: We identify, for the first time, a transient “trace unfaithfulness” phase early in training and demonstrate that CoT fundamentally reshapes model-internal computation. CoT significantly accelerates generalization on low-complexity tasks (e.g., addition, sorting) but does not overcome inherent bottlenecks in high-complexity tasks (e.g., list intersection). Furthermore, we quantify how reasoning fidelity evolves during training, establishing a novel, interpretable, and measurable framework for understanding CoT’s mechanistic role in symbolic reasoning.
📝 Abstract
Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking, pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models are trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model accuracy as a function of log training steps with a three-parameter logistic curve, revealing how learning speed and curve shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace-unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with their answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show that CoT mechanistically alters internal transformer computation.
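The three-parameter logistic fit over log training steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact parameterization (asymptote `A`, rate `k`, and midpoint `t0` in log10-step units) and the synthetic accuracy curve are assumptions chosen to show the fitting procedure on a grokking-style transition.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic3(log_step, A, k, t0):
    """Three-parameter logistic in log10(training step):
    asymptote A, rate k, midpoint t0 (assumed parameterization)."""
    return A / (1.0 + np.exp(-k * (log_step - t0)))

# Synthetic accuracy-vs-step data mimicking a delayed (grokking-style)
# jump in test accuracy; the true parameters here are illustrative.
steps = np.logspace(1, 5, 50)              # training steps 10 .. 100000
log_steps = np.log10(steps)
rng = np.random.default_rng(0)
true_acc = logistic3(log_steps, A=0.97, k=4.0, t0=3.2)
acc = np.clip(true_acc + rng.normal(0.0, 0.01, true_acc.shape), 0.0, 1.0)

# Fit the curve; p0 supplies rough initial guesses for (A, k, t0).
params, _ = curve_fit(logistic3, log_steps, acc, p0=[1.0, 1.0, 3.0])
A_hat, k_hat, t0_hat = params
print(f"A={A_hat:.3f}, k={k_hat:.2f}, t0={t0_hat:.2f}")
```

Fitting in log-step space makes the midpoint `t0` directly interpretable as the (log) step at which generalization takes off, and `k` as how abrupt that transition is, which is how such a curve can separate fast CoT-driven generalization from slower answer-only learning.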