🤖 AI Summary
This work addresses the limited reasoning capability of lightweight non-reasoning language models. We propose an architecture-agnostic answer distillation method that adds no inference-time cost: high-quality answers generated by strong reasoning models (e.g., DeepSeek-R1, o1) serve as supervision signals for supervised fine-tuning (SFT), directly enhancing the target model's question-answering performance. To our knowledge, this is the first systematic study leveraging reasoning-model outputs as a knowledge source to distill into and augment non-reasoning models. Extensive evaluation across multiple benchmarks, including MMLU and GSM8K, demonstrates consistent and substantial improvements after distillation; on certain tasks, distilled models approach the performance of their strong reasoning teachers. These results show that low-cost models can efficiently acquire reasoning-like capabilities through high-fidelity answer distillation, establishing a practical paradigm for capability transfer between models.
📝 Abstract
Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models use deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging the high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments, we demonstrate consistent improvements across established benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
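The core recipe described above — distilling a reasoning model's final answers (rather than its full "thinking" traces) into SFT training data for a non-reasoning student — can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline; the record field names (`question`, `thinking`, `final_answer`) and the helper `build_sft_examples` are hypothetical.

```python
def build_sft_examples(records, include_thinking=False):
    """Convert reasoning-model (teacher) outputs into (prompt, completion)
    pairs for supervised fine-tuning of a non-reasoning student model.

    By default only the teacher's final answer is used as the target,
    so the student learns to answer directly, without emitting a trace.
    """
    examples = []
    for r in records:
        target = r["final_answer"]
        if include_thinking:
            # Optional variant: also distill the thinking trace.
            target = r["thinking"] + "\n" + target
        examples.append({"prompt": r["question"], "completion": target})
    return examples


# Toy teacher output (illustrative values only).
teacher_outputs = [
    {
        "question": "What is 17 * 24?",
        "thinking": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68.",
        "final_answer": "408",
    }
]

sft_data = build_sft_examples(teacher_outputs)
```

Each resulting pair would then be fed to a standard SFT trainer; the key design choice is whether to supervise on the final answer alone or include the trace, and the paper's comparisons concern exactly such variants.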