Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces

📅 2026-05-27

🤖 AI Summary

This work studies an overlooked issue in long chain-of-thought (Long-CoT) training data. Even when the final answer is correct, a trace may keep reasoning after the answer is already sufficiently supported, and this post-conclusion continuation stays in the supervised target. To test its effect, the authors use a delete-only editor to remove the post-conclusion suffix while preserving the answer, then compare CoT-based supervised fine-tuning (SFT) of large language models on the original and the edited traces. SFT on the edited traces performs better in their setting, so they call the removed segment a “harmful continuation.” They further characterize the removed segments as showing persistent local uncertainty together with weakened hidden-state progress toward the terminal state (an uncertainty-geometry mismatch). Finally, they propose Harmful Continuation Cut (HCC), a lightweight proxy that approximates the editor-identified boundary of the harmful continuation.
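The intervention keeps the answer but deletes the post-conclusion suffix from the supervised target. A minimal sketch of that data operation follows, assuming a trace is stored as a list of reasoning steps plus a final answer. The `Trace` class, the `Final answer:` line format, and the hand-picked cut index are hypothetical; in the paper the boundary is chosen by the delete-only editor, whose interface is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    """Hypothetical container: reasoning steps followed by a final answer."""
    steps: List[str]
    answer: str

    def target(self) -> str:
        # The SFT target string: every kept step, then the answer line.
        return "\n".join(self.steps + [f"Final answer: {self.answer}"])


def remove_suffix(trace: Trace, cut: int) -> Trace:
    """Delete steps[cut:] (the post-conclusion continuation), keep the answer.

    Delete-only: no step is rewritten or inserted, so the result is a
    prefix of the original steps ending in the same final answer.
    """
    if not 0 < cut <= len(trace.steps):
        raise ValueError("cut must keep between 1 and len(steps) steps")
    return Trace(steps=trace.steps[:cut], answer=trace.answer)


def is_answer_preserving_prefix(original: Trace, edited: Trace) -> bool:
    """Check both properties the edit must satisfy."""
    n = len(edited.steps)
    return edited.answer == original.answer and edited.steps == original.steps[:n]


if __name__ == "__main__":
    trace = Trace(
        steps=["x + 2 = 5", "so x = 3", "check: 3 + 2 = 5",
               "wait, could x be -3?", "no, x = 3"],
        answer="3",
    )
    edited = remove_suffix(trace, cut=3)  # cut picked by hand, not by an editor
    assert is_answer_preserving_prefix(trace, edited)
    print(edited.target())
```

One reason to keep the edit delete-only is that the processed trace then contains no text the original lacked, so a change in SFT outcome can be attributed to the removed suffix rather than to rewritten content.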

📝 Abstract

Long chain-of-thought (CoT) traces are widely used as supervision for reasoning-oriented LLM SFT, yet answer-correct traces can still lead to markedly different fine-tuning outcomes. We study post-conclusion continuation in answer-correct long-CoT data: additional reasoning that follows once the answer already appears sufficiently supported, yet remains in the supervised target. To test its training effect, we use a delete-only editor to perform answer-preserving suffix removal and compare CoT-based SFT on the original and processed traces. We observe improved SFT outcomes after removing the editor-identified post-conclusion continuation, suggesting that this continuation is harmful to training in our setting. We therefore refer to this empirically supported phenomenon as harmful continuation. Beyond this intervention, we further characterize the removed post-conclusion continuation through uncertainty and hidden-state progress. We observe persistent local uncertainty together with weakened terminal-directional progress, forming an uncertainty-geometry mismatch. Finally, we instantiate Harmful Continuation Cut (HCC), a lightweight boundary proxy that approximates the editor-identified post-conclusion continuation boundary.
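The abstract does not give HCC's formula. Purely as an illustration, the toy sketch below shows one way the two signals could be measured and combined into a cut point. Every choice in it is an assumption, not the authors' HCC. Step-level uncertainty is taken as the mean token entropy of a step. Progress is the cosine between a step's hidden-state move and the direction to a terminal reference state (hypothetically, the hidden state at the final answer). The thresholds `tau_u` and `tau_g` are placeholders. The rule cuts where the mismatch (high uncertainty, low progress) persists through the end of the trace.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of one next-token distribution.
    A step's uncertainty could be the mean of this over its tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def _cos(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def terminal_progress(hidden: List[List[float]], terminal: Sequence[float]) -> List[float]:
    """Per-step progress: cosine between the move h[t-1] -> h[t] and the
    direction h[t-1] -> terminal. Step 0 has no move and is set to 1.0."""
    progress = [1.0]
    for t in range(1, len(hidden)):
        move = [b - a for a, b in zip(hidden[t - 1], hidden[t])]
        to_end = [c - a for a, c in zip(hidden[t - 1], terminal)]
        progress.append(_cos(move, to_end))
    return progress


def mismatch_cut(step_uncertainty: Sequence[float], hidden: List[List[float]],
                 terminal: Sequence[float], tau_u: float = 1.0,
                 tau_g: float = 0.2) -> int:
    """Return how many steps to keep: the earliest k >= 1 such that every step
    j >= k has uncertainty >= tau_u AND progress <= tau_g (the mismatch).
    Returns len(hidden) (no cut) if the last step does not show the mismatch."""
    if len(step_uncertainty) != len(hidden):
        raise ValueError("need one uncertainty value per step")
    progress = terminal_progress(hidden, terminal)
    cut = len(hidden)
    # Walk backward from the last step while the mismatch persists.
    for k in range(len(hidden) - 1, 0, -1):
        if step_uncertainty[k] >= tau_u and progress[k] <= tau_g:
            cut = k
        else:
            break
    return cut


if __name__ == "__main__":
    # Toy 2-D "hidden states": steps 0-2 head toward the terminal, 3-4 drift.
    hidden = [[0, 0], [2, 0], [3, 0], [3, 1], [2, 1]]
    terminal = [4, 0]
    uncertainty = [0.3, 0.4, 0.5, 1.4, 1.6]
    print(mismatch_cut(uncertainty, hidden, terminal))
```

In the toy trace, the last two steps combine high uncertainty with non-positive progress toward the terminal state, so the sketch keeps the first three steps. Real hidden states are high-dimensional and would need the thresholds calibrated, for example against editor-identified boundaries.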

Problem

Research questions and friction points this paper is trying to address.

harmful continuation

chain-of-thought

supervised fine-tuning

post-conclusion continuation

LLM reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

harmful continuation

long chain-of-thought

supervised fine-tuning