Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the dynamic evolution of reasoning capabilities during language model fine-tuning. Addressing two core questions—whether reasoning tokens function analogously to human working memory, and how models transition from explicit step-by-step reasoning to automated problem-solving—we propose a cognitive science–inspired learning trajectory analysis framework. Specifically, we model reasoning token length as a diagnostic signal across training stages and introduce quantitative metrics to track its nonlinear evolution (initial increase followed by decrease). Empirical results reveal that model reasoning behavior progresses through four distinct phases mirroring human cognitive development, culminating in reasoning internalization and omission without performance degradation. Crucially, this work establishes reasoning token length as a novel convergence criterion for fine-tuning—enabling principled early stopping and enhancing interpretability. It thus introduces both a conceptual framework and empirical validation for reasoning-aware optimization in foundation model adaptation.

📝 Abstract
We analyze reasoning in language models during task-specific fine-tuning and draw a parallel between reasoning tokens--the intermediate steps generated while solving a problem--and human working memory. Drawing from cognitive science, we align training dynamics with the Four Stages of Competence: models initially produce incorrect outputs without reasoning, then begin reasoning (but still fail), eventually reason effectively, and finally solve tasks without explicit reasoning. We find that reasoning token length expands as performance improves, peaks at the stage of conscious competence, then declines as the model internalizes the task. Notably, after training, models retain performance even when reasoning is removed--suggesting it scaffolded learning but is no longer needed. This progression offers actionable insights: reasoning token dynamics can serve as a signal for diagnosing training stage, identifying convergence, and guiding early stopping. We propose metrics to track this trajectory and argue that reasoning behavior is valuable for understanding and optimizing reasoning model training.
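The convergence criterion described above can be sketched in code. The sketch below is a minimal illustration, not the authors' actual metrics: `mean_reasoning_length`, `should_stop`, and the `patience` heuristic are all hypothetical names and choices assumed here to show how a peak-then-decline trajectory in reasoning token length could trigger early stopping.

```python
# Hypothetical sketch of the paper's idea: treat mean reasoning-token
# length per checkpoint as a training signal. Function names and the
# patience heuristic are illustrative assumptions, not the authors'
# published metrics.

def mean_reasoning_length(token_counts):
    """Mean number of reasoning tokens over a batch of generations."""
    return sum(token_counts) / len(token_counts)

def should_stop(length_history, patience=3, tol=0.0):
    """Signal early stopping once reasoning length has peaked and then
    declined for `patience` consecutive checkpoints -- the
    'internalization' phase described in the abstract."""
    if len(length_history) < patience + 1:
        return False
    recent = length_history[-(patience + 1):]
    # Strictly declining over the last `patience` steps (within tolerance).
    declining = all(b < a - tol for a, b in zip(recent, recent[1:]))
    # The global peak must lie before the declining tail.
    peaked = max(length_history) not in recent[1:]
    return declining and peaked

# Example trajectory: reasoning length rises, peaks, then shrinks
# as the task is internalized.
history = [12.0, 35.5, 60.2, 58.1, 44.0, 30.3]
print(should_stop(history))  # True: three consecutive declines follow the peak
```

In practice one would log `mean_reasoning_length` at each evaluation checkpoint and feed the running history to `should_stop`, alongside the usual accuracy-based stopping criteria.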
Problem

Research questions and friction points this paper is trying to address.

Analyzes reasoning token dynamics during fine-tuning to diagnose training stages.
Investigates the relationship between reasoning length and model competence progression.
Proposes metrics to optimize reasoning model training via token behavior insights.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tracks reasoning token length dynamics across fine-tuning
Reasoning peaks then declines during competence stages
Metrics guide training diagnosis and early stopping