🤖 AI Summary
This work addresses a limitation in knowledge distillation for large language models: the outputs of the strongest teacher are often adopted by default, without regard to whether they suit the student's current learning capacity. The authors propose Student-Centric Answer Sampling (SCAS), a framework that selects among verified teacher-generated answers according to their estimated learning cost for the student, rather than by teacher performance alone. SCAS uses an efficient forward-only proxy for this cost, motivated by a token-wise gradient decomposition, to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 8 tasks show that SCAS consistently improves student performance.
📝 Abstract
LLM training increasingly relies on teacher-generated supervision, from synthetic responses to reasoning traces and tool-use demonstrations. Current practice often chooses the highest-performing teacher to generate student training data, implicitly treating teacher test performance as a proxy for teaching quality. We show that this assumption can fail: even when multiple teachers provide correct answers to the same question, the answer from the strongest teacher is not necessarily the best supervision for a given student. To address this gap, we propose Student-Centric Answer Sampling (SCAS), a framework that selects from verified teacher-generated answers according to their estimated student-centric learning cost. Motivated by a token-wise gradient decomposition, we derive an efficient forward-only proxy for this cost and use it to guide answer selection during training. Experiments across 30 teacher models, 6 student base models, and 8 tasks show that SCAS consistently improves student performance, suggesting that effective distillation should prioritize supervision matched to the current student rather than teacher strength alone.
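The selection step described above can be illustrated with a minimal sketch. The abstract does not specify the actual proxy, so this example substitutes a stand-in: the student's mean per-token negative log-likelihood on each candidate answer, which is also computable from a forward pass alone. The names `Candidate`, `mean_nll`, `select_answer`, and `student_logprobs` are hypothetical, and the choice to pick the lowest-cost answer is an assumption rather than the authors' stated rule.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One teacher-generated answer to a question (hypothetical container)."""
    teacher: str
    answer: str
    verified: bool  # True if the answer passed a correctness check


def mean_nll(token_logprobs: Sequence[float]) -> float:
    """Stand-in learning-cost proxy: mean negative log-likelihood per token.

    This is an assumption for illustration, not the proxy derived in the paper.
    """
    if not token_logprobs:
        raise ValueError("answer has no tokens")
    return -sum(token_logprobs) / len(token_logprobs)


def select_answer(
    candidates: Sequence[Candidate],
    student_logprobs: Callable[[str], Sequence[float]],
) -> Candidate:
    """Pick the verified answer with the lowest stand-in cost for the student.

    `student_logprobs` maps an answer string to the student's per-token
    log-probabilities, obtained from a forward pass (no gradients needed).
    """
    verified = [c for c in candidates if c.verified]
    if not verified:
        raise ValueError("no verified answers to select from")
    return min(verified, key=lambda c: mean_nll(student_logprobs(c.answer)))
```

In practice `student_logprobs` would wrap a forward pass of the current student model, and selection would be repeated as the student changes during training, so the preferred teacher for a question can differ from one stage to the next.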