Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Supervised fine-tuning (SFT) of large language models (LLMs) often inherits the teacher model's "overthinking" behavior, resulting in unnecessarily lengthy and inefficient chain-of-thought (CoT) reasoning traces. Method: We propose Long-Short Mixture Supervised Fine-Tuning (LS-Mixture SFT), a CoT data-mixing paradigm that trains on both detailed long reasoning traces and concise, structure-preserved rewrites of them. The approach combines CoT distillation, structure-preserving rewriting, and reasoning-capability alignment so that teacher-model deficiencies are not passed on to the student. Contribution/Results: Across multiple benchmarks, LS-Mixture SFT improves average accuracy by 2.3% while reducing response length by 47.61%, substantially improving the trade-off between reasoning efficiency and accuracy without compromising logical completeness.

📝 Abstract
Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant reasoning chains during inference. To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long CoT reasoning datasets with their short counterparts obtained through structure-preserved rewriting. Our experiments demonstrate that models trained with the LS-Mixture SFT method, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3% across various benchmarks while reducing model response length by approximately 47.61%. This work offers an approach to endow non-reasoning models with reasoning capabilities through supervised fine-tuning while avoiding the overthinking problem inherited from teacher models, thereby enabling efficient reasoning in the fine-tuned models.
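The core data-construction step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `build_ls_mixture`, the `mix_ratio` parameter, and the `rewrite_short` callable (which stands in for the paper's structure-preserved rewriting model) are all assumptions for illustration.

```python
import random

def build_ls_mixture(long_cot_samples, rewrite_short, mix_ratio=0.5, seed=0):
    """Sketch of LS-Mixture dataset construction (hypothetical names/ratio).

    long_cot_samples: list of dicts {"question", "cot", "answer"}, where
                      "cot" is a long chain-of-thought distilled from a
                      teacher reasoning model.
    rewrite_short:    callable returning a concise, structure-preserved
                      rewrite of a long CoT trace.
    mix_ratio:        fraction of samples kept in long form; the rest are
                      replaced by their short rewrites.
    """
    rng = random.Random(seed)
    mixture = []
    for sample in long_cot_samples:
        if rng.random() < mix_ratio:
            cot = sample["cot"]                 # keep the detailed long trace
        else:
            cot = rewrite_short(sample["cot"])  # use the concise rewrite
        mixture.append({
            "question": sample["question"],
            "cot": cot,
            "answer": sample["answer"],
        })
    rng.shuffle(mixture)  # interleave long and short examples for SFT
    return mixture
```

The resulting mixture would then be fed to a standard SFT pipeline; the paper's reported gains come from the student seeing both verbose and concise reasoning styles for the same task distribution.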
Problem

Research questions and friction points this paper is trying to address.

Addresses overthinking in fine-tuned reasoning models
Combines long and short reasoning chains for efficiency
Improves accuracy while reducing response length
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines long and short Chain-of-Thought datasets
Uses structure-preserved rewriting for short CoT
Reduces response length while improving accuracy
Bin Yu
Harbin Institute of Technology, Zhongguancun Academy
Hang Yuan
East China Normal University, Zhongguancun Academy
Yuliang Wei
Harbin Institute of Technology
Bailing Wang
Harbin Institute of Technology
Weizhen Qi
Zhongguancun Academy & Zhongguancun Institute of Artificial Intelligence
Kai Chen
Zhongguancun Academy, Zhongguancun Institute of Artificial Intelligence