The Kinetics of Reasoning: How Chain-of-Thought Shapes Learning in Transformers?

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how chain-of-thought (CoT) supervised learning affects the generalization capability of Transformers on symbolic reasoning tasks, with particular focus on how algorithmic complexity constrains the upper bound of generalization. Method: The authors propose a three-parameter logistic curve model to characterize learning dynamics and run controlled experiments on symbolic tasks of tunable complexity, comparing answer-only supervision against CoT supervision while analyzing grokking phenomena and internal computation pathways. Contribution/Results: They identify, for the first time, a transient "trace unfaithfulness" phase early in training and demonstrate that CoT fundamentally reshapes model-internal computation. CoT significantly accelerates generalization on low-complexity tasks (e.g., addition, sorting) but fails to overcome inherent bottlenecks on high-complexity tasks (e.g., list intersection). Furthermore, they quantify the dynamic evolution of reasoning faithfulness during training, establishing an interpretable, measurable framework for understanding CoT's mechanistic role in symbolic reasoning.

📝 Abstract
Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking by pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models were trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model accuracy as a three-parameter logistic function of the logarithm of training steps, revealing how learning speed and curve shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with their answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show that CoT mechanistically alters internal transformer computation.
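The kinetic model in the abstract (accuracy as a three-parameter logistic in log training steps) can be sketched as follows. This is a minimal illustration, not the paper's code: the parameter names (`a_max` asymptotic accuracy, `k` rate, `t_mid` midpoint in log-step units), the synthetic values, and the grid-search fit are all assumptions made for the example.

```python
import math

def accuracy_curve(log_step, a_max, k, t_mid):
    """Three-parameter logistic in log training steps:
    asymptote a_max, rate k, midpoint t_mid (names are hypothetical)."""
    return a_max / (1.0 + math.exp(-k * (log_step - t_mid)))

# Synthetic grokking-style curve: accuracy stays flat, then rises sharply.
true_params = (0.98, 2.5, 8.0)  # assumed values for illustration only
steps = [2 ** i for i in range(2, 16)]
observed = [accuracy_curve(math.log(s), *true_params) for s in steps]

# Crude grid search standing in for a proper least-squares fit.
best, best_err = None, float("inf")
for a in (0.90, 0.95, 0.98, 1.00):
    for k in (1.0, 2.0, 2.5, 3.0):
        for t in (6.0, 7.0, 8.0, 9.0):
            err = sum(
                (accuracy_curve(math.log(s), a, k, t) - o) ** 2
                for s, o in zip(steps, observed)
            )
            if err < best_err:
                best, best_err = (a, k, t), err

print(best)
```

A steeper fitted `k` would correspond to a more abrupt, grokking-like transition, while a larger `t_mid` indicates delayed generalization; the paper's framework compares such fitted parameters across task complexities and supervision regimes.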
Problem

Research questions and friction points this paper is trying to address.

Investigating how chain-of-thought supervision affects transformer learning dynamics
Quantifying learning speed variations with task complexity and data distribution
Characterizing the emergence of reasoning trace faithfulness during training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thought supervision accelerates transformer generalization
Kinetic modeling framework quantifies transformer learning dynamics
Trace faithfulness emerges dynamically over the course of training