Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning

📅 2025-08-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current large language models (LLMs) rely heavily on massive datasets and multi-stage training to enhance reasoning capabilities, which entails prohibitive computational costs, and the scaling laws for small-scale knowledge distillation remain poorly understood. To address this, we propose Data-Efficient Distillation (DED), a framework that advances the Pareto frontier of reasoning performance through three key innovations: (1) dynamic evaluation and selection of optimal teacher models; (2) construction of a high-information-density, small-scale dataset (only 0.8K samples) via on-policy learning and diverse trajectory rollouts; and (3) explicit modeling of multi-path reasoning trajectories to improve student generalization. Evaluated on AIME 2024/2025, MATH-500, and LiveCodeBench, DED significantly outperforms conventional scaling approaches, achieving superior domain-specific accuracy and cross-task generalization. This work establishes a new paradigm for lightweight, data-efficient transfer of reasoning capability.
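Below is a minimal, runnable sketch of how these three steps could fit together. The teacher calls are stubbed with toy randomness, and all function names, thresholds, and the interpretation of the 0.8K budget are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of the DED pipeline: (1) teacher selection,
# (2) small high-information-density corpus construction via rollouts,
# (3) keeping multiple correct reasoning paths per problem.
import random
from dataclasses import dataclass

@dataclass
class Trajectory:
    problem: str
    reasoning: str  # chain-of-thought text
    answer: str
    correct: bool

def rollout(teacher: str, problem: str, n: int = 8) -> list[Trajectory]:
    """Stub: sample n reasoning trajectories from a teacher model."""
    return [
        Trajectory(problem, f"[{teacher} path {i}] ...", "42",
                   correct=random.random() < 0.6)
        for i in range(n)
    ]

def teacher_score(teacher: str, probe_set: list[str]) -> float:
    """Step 1 (assumption): score teachers by rollout quality on a
    probe set rather than by raw benchmark accuracy alone."""
    trajs = [t for p in probe_set for t in rollout(teacher, p)]
    return sum(t.correct for t in trajs) / len(trajs)

def build_corpus(teacher: str, pool: list[str],
                 budget: int = 800) -> list[Trajectory]:
    """Step 2: keep only problems whose rollouts are informative
    (neither always solved nor never solved), up to ~0.8K samples."""
    corpus: list[Trajectory] = []
    for problem in pool:
        trajs = rollout(teacher, problem)
        pass_rate = sum(t.correct for t in trajs) / len(trajs)
        if 0.0 < pass_rate < 1.0:  # high information density
            # Step 3: retain several distinct correct paths per problem
            # so the student sees multi-path reasoning.
            corpus.extend(t for t in trajs if t.correct)
        if len(corpus) >= budget:
            break
    return corpus[:budget]

if __name__ == "__main__":
    probes = [f"probe-{i}" for i in range(10)]
    teachers = ["teacher-A", "teacher-B"]
    best = max(teachers, key=lambda t: teacher_score(t, probes))
    data = build_corpus(best, [f"problem-{i}" for i in range(1000)])
    print(best, len(data))  # distill the student on `data` via SFT
```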

📝 Abstract
Large language models (LLMs) demonstrate remarkable reasoning capabilities in tasks such as algorithmic coding and mathematical problem-solving. Recent methods have improved reasoning through expanded corpora and multi-stage training that combines reinforcement learning and supervised fine-tuning. Although some methods suggest that a small but targeted dataset can incentivize reasoning through distillation alone, reasoning scaling laws are still taking shape, which keeps computational costs high. To address this, we propose a Data-Efficient Distillation framework (DED) that optimizes the Pareto frontier of reasoning distillation. Inspired by the on-policy learning and diverse roll-out strategies of reinforcement learning, the key idea of our approach is threefold: (1) We identify that benchmark scores alone do not determine an effective teacher model. Through comprehensive comparisons of leading reasoning LLMs, we develop a method to select an optimal teacher model. (2) While scaling distillation can enhance reasoning, it often degrades out-of-domain performance. A carefully curated, smaller corpus achieves a balanced trade-off between in-domain and out-of-domain capabilities. (3) Diverse reasoning trajectories encourage the student model to develop robust reasoning skills. We validate our method through evaluations on mathematical reasoning (AIME 2024/2025, MATH-500) and code generation (LiveCodeBench), achieving state-of-the-art results with only 0.8K carefully curated examples and bypassing the need for extensive scaling. Our systematic analysis demonstrates that DED outperforms existing methods by considering factors beyond superficial hardness, token length, or teacher model capability. This work offers a practical and efficient pathway to advanced reasoning while preserving general capabilities.
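The in-domain vs. out-of-domain trade-off in point (2) can be framed as choosing a corpus size on a Pareto frontier. The sketch below uses made-up placeholder scores (not results from the paper) purely to show the selection rule.

```python
# Toy illustration: keep corpus sizes whose (in-domain, OOD) scores
# are not dominated by another candidate. All numbers are invented.
candidates = [
    # (corpus_size, in_domain_acc, out_of_domain_acc)
    (800,   0.72, 0.70),
    (8000,  0.78, 0.64),
    (40000, 0.79, 0.55),  # dominated by the 80K run below
    (80000, 0.80, 0.55),
]

def pareto_frontier(points):
    """Keep points not dominated on both accuracy axes."""
    return [
        p for p in points
        if not any(q != p and q[1] >= p[1] and q[2] >= p[2] for q in points)
    ]

for size, in_dom, ood in pareto_frontier(candidates):
    print(f"{size:>6} samples: in-domain {in_dom:.2f}, OOD {ood:.2f}")
```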
Problem

Research questions and friction points this paper is trying to address.

Optimizing reasoning distillation with minimal data
Balancing in-domain and out-of-domain performance
Enhancing reasoning skills via diverse trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal teacher model selection method
Small curated corpus balances capabilities
Diverse reasoning trajectories enhance robustness (see the sketch after this list)
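One plausible reading of "diverse reasoning trajectories" is deduplicating near-identical reasoning paths before distillation. The sketch below uses token-level Jaccard similarity as an assumed diversity proxy; the 0.5 threshold and helper names are illustrative, not taken from the paper.

```python
# Keep a trajectory only if it is sufficiently dissimilar from those
# already kept, so the student sees genuinely different solution paths.
def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta | tb), 1)

def diverse_subset(trajectories: list[str], max_sim: float = 0.5) -> list[str]:
    kept: list[str] = []
    for t in trajectories:
        if all(jaccard(t, k) <= max_sim for k in kept):
            kept.append(t)
    return kept

paths = [
    "factor the quadratic then apply vieta",
    "factor the quadratic and apply vieta directly",
    "use the discriminant and solve numerically",
]
print(diverse_subset(paths))  # drops the near-duplicate second path
```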
👥 Authors
Xiaojun Wu
Zhongxing Telecom Equipment (ZTE), China
Xiaoguang Jiang
Zhongxing Telecom Equipment (ZTE), China
Huiyang Li
Zhongxing Telecom Equipment (ZTE), China
Jucai Zhai
Zhongxing Telecom Equipment (ZTE), China
Dengfeng Liu
Zhongxing Telecom Equipment (ZTE), China
Qiaobo Hao
Zhongxing Telecom Equipment (ZTE), China
Huang Liu
Zhongxing Telecom Equipment (ZTE), China
Zhiguo Yang
Zhongxing Telecom Equipment (ZTE), China
Ji Xie
Research Intern, UC Berkeley
Ninglun Gu
China Mobile Communications Group Co Ltd
Jin Yang
China Mobile Communications Group Co Ltd
Kailai Zhang
China Mobile Communications Group Co Ltd
Yelun Bao
China Mobile Communications Group Co Ltd
Jun Wang
China Mobile Communications Group Co Ltd