Reasoning-Trace Collapse: Evaluating the Loss of Explicit Reasoning During Fine-Tuning

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses "reasoning-trace collapse", a failure mode in which standard supervised fine-tuning of an explicit reasoning model suppresses its structurally valid reasoning traces while the model continues to produce plausible final answers. The cause is a mismatch: the fine-tuning data consists of ordinary instruction-response pairs that contain no intermediate reasoning traces. The authors introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, tracking valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Across four open-weight reasoning models, they find that conventional fine-tuning can rapidly suppress valid reasoning traces and that answer-only metrics can hide this failure. They also show that simple loss-masking strategies substantially mitigate collapse without requiring teacher-generated reasoning traces.

📝 Abstract

Explicit reasoning models are trained to produce intermediate reasoning traces before final answers, but downstream fine-tuning is often performed on ordinary instruction-response data that contains no such traces. We show that this mismatch can induce reasoning-trace collapse: a fine-tuned model continues to produce plausible final answers while losing the structurally valid explicit reasoning traces that made it a reasoning model in the first place. We introduce a structural evaluation framework that separates answer correctness from reasoning-trace validity, measuring valid, empty, missing, and truncated reasoning alongside reasoning-conditioned task performance. Using this framework, we study four open-weight reasoning models and find that standard supervised fine-tuning can rapidly suppress valid reasoning traces, and that answer-only metrics can substantially obscure this failure: in several settings, performance conditional on valid reasoning remains high while the rate of valid reasoning falls sharply. We further show that simple loss-masking strategies can substantially mitigate collapse without requiring teacher-generated reasoning traces. These results suggest that evaluations of fine-tuned reasoning models should report structural reasoning reliability metrics in addition to final-answer performance, especially when adaptation data does not contain explicit reasoning traces.
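
The four structural categories named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact classification rules, so the `<think>`/`</think>` delimiters, the rules in `classify_trace`, and the `summarize` helper are all assumptions chosen for illustration, not the authors' implementation.

```python
from collections import Counter
from enum import Enum

# Hypothetical delimiters; real reasoning models define their own markers.
OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


class TraceStatus(Enum):
    VALID = "valid"
    EMPTY = "empty"
    MISSING = "missing"
    TRUNCATED = "truncated"


def classify_trace(output: str) -> TraceStatus:
    """Assign one structural category to a model output (assumed rules)."""
    start = output.find(OPEN_TAG)
    if start == -1:
        return TraceStatus.MISSING  # no reasoning block at all
    body_start = start + len(OPEN_TAG)
    end = output.find(CLOSE_TAG, body_start)
    if end == -1:
        return TraceStatus.TRUNCATED  # block opened but never closed
    if not output[body_start:end].strip():
        return TraceStatus.EMPTY  # delimiters present, nothing inside
    return TraceStatus.VALID


def summarize(outputs, correct):
    """Return category rates, overall accuracy, and accuracy given valid reasoning."""
    if not outputs or len(outputs) != len(correct):
        raise ValueError("outputs and correct must be non-empty and equal in length")
    statuses = [classify_trace(o) for o in outputs]
    counts = Counter(statuses)
    n = len(outputs)
    # Correctness flags restricted to outputs whose reasoning trace is valid.
    valid_hits = [c for s, c in zip(statuses, correct) if s is TraceStatus.VALID]
    return {
        "rates": {s.value: counts[s] / n for s in TraceStatus},
        "accuracy": sum(correct) / n,
        "accuracy_given_valid": (
            sum(valid_hits) / len(valid_hits) if valid_hits else None
        ),
    }
```

Reporting `rates["valid"]` next to `accuracy_given_valid` makes the pattern described in the abstract visible: conditional accuracy can stay high while the share of outputs with valid reasoning drops.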

Problem

Research questions and friction points this paper is trying to address.

reasoning-trace collapse

explicit reasoning

fine-tuning

reasoning reliability

structural evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-trace collapse

structural evaluation framework

loss-masking
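
As a rough illustration of the loss-masking idea, the sketch below builds training labels in which some token positions are excluded from the loss. The summary does not say which tokens the paper masks. One plausible reading, assumed here, is that the prompt and the empty reasoning block of a trace-free training example are masked so that only answer tokens contribute to the loss. The `build_labels` helper and its parameters are hypothetical. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(input_ids, prompt_len, reasoning_span):
    """Copy input_ids to labels, masking the prompt and the reasoning span.

    reasoning_span is a (start, end) pair of token positions, end exclusive,
    covering the (empty) reasoning block in the training sequence.
    Assumption: masking these positions is one possible loss-masking
    strategy, not necessarily the one used in the paper.
    """
    start, end = reasoning_span
    if not (prompt_len <= start <= end <= len(input_ids)):
        raise ValueError("reasoning span must follow the prompt and fit the sequence")
    labels = list(input_ids)
    for i in range(len(labels)):
        # Masked positions contribute nothing to the cross-entropy loss.
        if i < prompt_len or start <= i < end:
            labels[i] = IGNORE_INDEX
    return labels
```

For a sequence with three prompt tokens, a two-token empty reasoning block, and two answer tokens, `build_labels(ids, 3, (3, 5))` leaves only the last two positions as training targets.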