🤖 AI Summary
This study investigates how chain-of-thought (CoT) supervised learning affects the generalization capability of transformers on symbolic reasoning tasks, with particular focus on how algorithmic complexity sets an upper bound on generalization.
Method: We propose a three-parameter logistic curve model to characterize learning dynamics and conduct controlled experiments on symbolic tasks with tunable complexity, comparing answer-only supervision against CoT supervision while analyzing grokking phenomena and internal computation pathways.
Contribution/Results: We identify, for the first time, a transient “trace unfaithfulness” phase early in training and demonstrate that CoT fundamentally reshapes model-internal computation. CoT significantly accelerates generalization on low-complexity tasks (e.g., addition, sorting) but does not overcome inherent bottlenecks in high-complexity tasks (e.g., list intersection). Furthermore, we quantify how reasoning fidelity evolves during training, establishing a novel, interpretable, and measurable framework for understanding CoT’s mechanistic role in symbolic reasoning.
📝 Abstract
Chain-of-thought (CoT) supervision can substantially improve transformer performance, yet the mechanisms by which models learn to follow and benefit from CoT remain poorly understood. We investigate these learning dynamics through the lens of grokking, pretraining transformers on symbolic reasoning tasks with tunable algorithmic complexity and controllable data composition to study their generalization. Models are trained under two settings: (i) producing only final answers, and (ii) emitting explicit CoT traces before answering. Our results show that while CoT generally improves task performance, its benefits depend on task complexity. To quantify these effects, we model accuracy as a function of log training steps with a three-parameter logistic curve, revealing how learning speed and curve shape vary with task complexity, data distribution, and the presence of CoT supervision. We also uncover a transient trace-unfaithfulness phase: early in training, models often produce correct answers while skipping or contradicting CoT steps, before later aligning their reasoning traces with their answers. Empirically, we (1) demonstrate that CoT accelerates generalization but does not overcome tasks with higher algorithmic complexity, such as finding list intersections; (2) introduce a kinetic modeling framework for understanding transformer learning; (3) characterize trace faithfulness as a dynamic property that emerges over training; and (4) show that CoT mechanistically alters internal transformer computation.
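The three-parameter logistic fit over log training steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exact parameterization (asymptote `A`, rate `k`, and midpoint `t0` in log10-step units) and the synthetic accuracy curve are assumptions chosen to show the fitting procedure on a grokking-style transition.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic3(log_step, A, k, t0):
    """Three-parameter logistic in log10(training step):
    asymptote A, rate k, midpoint t0 (assumed parameterization)."""
    return A / (1.0 + np.exp(-k * (log_step - t0)))

# Synthetic accuracy-vs-step data mimicking a delayed (grokking-style)
# jump in test accuracy; the true parameters here are illustrative.
steps = np.logspace(1, 5, 50)              # training steps 10 .. 100000
log_steps = np.log10(steps)
rng = np.random.default_rng(0)
true_acc = logistic3(log_steps, A=0.97, k=4.0, t0=3.2)
acc = np.clip(true_acc + rng.normal(0.0, 0.01, true_acc.shape), 0.0, 1.0)

# Fit the curve; p0 supplies rough initial guesses for (A, k, t0).
params, _ = curve_fit(logistic3, log_steps, acc, p0=[1.0, 1.0, 3.0])
A_hat, k_hat, t0_hat = params
print(f"A={A_hat:.3f}, k={k_hat:.2f}, t0={t0_hat:.2f}")
```

Fitting in log-step space makes the midpoint `t0` directly interpretable as the (log) step at which generalization takes off, and `k` as how abrupt that transition is, which is how such a curve can separate fast CoT-driven generalization from slower answer-only learning.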