Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability

📅 2025-04-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited reasoning capability of lightweight non-reasoning language models. It proposes an architecture-agnostic answer-distillation method that adds no inference-time cost: high-quality answers generated by strong reasoning models (e.g., DeepSeek-R1, OpenAI-o1) serve as supervision signals for supervised fine-tuning (SFT), directly improving the target model's question-answering performance. To the authors' knowledge, this is the first systematic study to use reasoning-model outputs as a knowledge source for distilling into and augmenting non-reasoning models. Extensive evaluation across multiple benchmarks, including MMLU and GSM8K, shows consistent and substantial improvements after distillation; on certain tasks, the distilled models approach the performance of their strong reasoning teachers. These results confirm that low-cost models can efficiently acquire reasoning-like capabilities through high-fidelity answer distillation, establishing a new paradigm for capability transfer between models.
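The distillation recipe summarized above amounts to a simple data-preparation step: collect the teacher's final answers and use them as SFT targets for the student. A minimal sketch, assuming a hypothetical `teacher_answer` stub in place of a real reasoning-model call (the paper uses models such as DeepSeek-R1):

```python
# Sketch of answer distillation: teacher answers become SFT targets.
# `teacher_answer` is a stand-in for querying a reasoning model and
# keeping only the final answer, not the intermediate "thinking" trace.

def teacher_answer(question: str) -> str:
    # Placeholder for a reasoning-model call (illustration only).
    canned = {
        "What is 2 + 2?": "2 + 2 = 4.",
        "Name a prime greater than 10.": "11 is a prime greater than 10.",
    }
    return canned[question]

def build_sft_dataset(questions):
    """Turn (question, teacher answer) pairs into SFT records."""
    return [
        {"prompt": q, "completion": teacher_answer(q)}
        for q in questions
    ]

dataset = build_sft_dataset(
    ["What is 2 + 2?", "Name a prime greater than 10."]
)
for record in dataset:
    print(record["prompt"], "->", record["completion"])
```

A real pipeline would replace the stub with batched calls to the teacher and feed the records to a standard SFT trainer.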

📝 Abstract
Recent advancements in large language models (LLMs), such as DeepSeek-R1 and OpenAI-o1, have demonstrated the significant effectiveness of test-time scaling, achieving substantial performance gains across various benchmarks. These advanced models utilize deliberate "thinking" steps to systematically enhance answer quality. In this paper, we propose leveraging these high-quality outputs generated by reasoning-intensive models to improve less computationally demanding, non-reasoning models. We explore and compare methodologies for utilizing the answers produced by reasoning models to train and improve non-reasoning models. Through straightforward Supervised Fine-Tuning (SFT) experiments on established benchmarks, we demonstrate consistent improvements across various benchmarks, underscoring the potential of this approach for advancing the ability of models to answer questions directly.
Problem

Research questions and friction points this paper is trying to address.

Enhance non-reasoning models using reasoning model outputs
Improve answer quality via reasoning-intensive model training
Apply supervised fine-tuning for benchmark performance gains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Use reasoning model answers for training
Improve non-reasoning models via SFT
Enhance answer quality systematically
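The SFT step in the bullets above typically trains only on the answer tokens, masking the prompt out of the loss. A minimal sketch of that label-masking convention, assuming the common `-100` ignore index (frameworks differ, so treat this as illustrative):

```python
# Sketch of prompt masking for SFT: cross-entropy loss is computed
# only on the teacher-answer tokens, not on the prompt tokens.
# The -100 ignore index is a common convention, not mandated here.

IGNORE_INDEX = -100

def build_labels(prompt_ids, answer_ids):
    """Labels for causal-LM SFT: prompt positions are masked out."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(answer_ids)

prompt_ids = [101, 7, 8, 9]      # illustrative prompt token ids
answer_ids = [42, 43, 44]        # illustrative teacher-answer token ids
input_ids = prompt_ids + answer_ids
labels = build_labels(prompt_ids, answer_ids)
print(labels)  # prompt positions masked, answer positions kept
```

Masking the prompt keeps the student from being penalized for reproducing the question, so gradient signal comes entirely from the distilled answer.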