AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
In speculative decoding (SD), conventional knowledge distillation minimizes the full-token KL divergence between the draft and target models—an objective misaligned with the core SD goal of maximizing token acceptance rate; moreover, capacity-constrained draft models struggle to fully absorb target-model knowledge. This paper proposes AdaSPEC: a lightweight reference model identifies hard-to-approximate tokens, enabling acceptance-rate-oriented selective knowledge distillation—applying KL constraints only to easily approximated tokens and dynamically focusing distribution alignment where it matters most. AdaSPEC thus overcomes the inherent mismatch between standard distillation objectives and SD's acceptance-driven optimization. Experiments across diverse tasks and model scales demonstrate that AdaSPEC consistently outperforms DistillSpec, achieving up to a 15% absolute improvement in acceptance rate, substantial inference speedup, and strict preservation of generation quality.

📝 Abstract
Speculative Decoding (SD) accelerates large language model inference by employing a small draft model to generate predictions, which are then verified by a larger target model. The effectiveness of SD hinges on the alignment between these models, which is typically enhanced by Knowledge Distillation (KD). However, conventional KD methods aim to minimize the KL divergence between the draft and target models across all tokens, a goal that is misaligned with the true objective of SD, which is to maximize token acceptance rate. Moreover, draft models often struggle to fully assimilate the target model's knowledge due to capacity constraints, leading to suboptimal performance. To address this challenge, we propose AdaSPEC, a novel method that incorporates selective token filtering into the KD process. AdaSPEC utilizes a reference model to identify and filter out difficult-to-fit tokens, enabling the distillation of a draft model that better aligns with the target model on simpler tokens. This approach improves the overall token acceptance rate without compromising generation quality. We evaluate AdaSPEC across diverse tasks, including arithmetic reasoning, instruction-following, coding, and summarization, using model configurations of 31M/1.4B and 350M/2.7B parameters. Our results demonstrate that AdaSPEC consistently outperforms the state-of-the-art DistillSpec method, achieving higher acceptance rates across all tasks (up to 15%). The code is publicly available at https://github.com/yuezhouhu/adaspec.
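The selective-distillation idea described above can be sketched in a few lines: score each token by how hard it is for a reference model to fit, keep only the easiest fraction, and compute the distillation KL over the kept tokens. This is a minimal NumPy illustration based on the summary, not the paper's implementation; the `keep_ratio` parameter and the rank-by-reference-loss heuristic are assumptions for the sketch.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def selective_kd_mask(ref_logits, target_ids, keep_ratio=0.8):
    """Keep the easiest tokens for distillation.

    ref_logits: (T, V) logits from a reference model.
    target_ids: (T,) ground-truth token ids.
    Tokens with high reference cross-entropy (hard to fit) are filtered out.
    """
    probs = softmax(ref_logits)
    # Per-token cross-entropy of the reference model on the target tokens.
    ce = -np.log(probs[np.arange(len(target_ids)), target_ids] + 1e-12)
    k = int(len(target_ids) * keep_ratio)
    keep = np.argsort(ce)[:k]  # indices of the k lowest-loss (easiest) tokens
    mask = np.zeros(len(target_ids), dtype=bool)
    mask[keep] = True
    return mask

def masked_kl(draft_logits, target_logits, mask):
    """Token-level KL(target || draft), averaged over the kept tokens only."""
    p = softmax(target_logits)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(draft_logits) + 1e-12)
    kl = (p * (log_p - log_q)).sum(axis=-1)
    return kl[mask].mean()
```

In training, the masked KL would replace the full-token distillation loss, so gradient signal concentrates on tokens the draft model can realistically match, which is what drives the acceptance-rate gains the paper reports.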
Problem

Research questions and friction points this paper is trying to address.

Improves speculative decoding by selective knowledge distillation
Addresses draft model misalignment with target model objectives
Enhances token acceptance rates without quality degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective token filtering in knowledge distillation
Improves token acceptance rate without quality loss
Outperforms DistillSpec with higher acceptance rates
Yuezhou Hu
Tsinghua University
Jiaxin Guo
Tsinghua University
Xinyu Feng
Georgia Institute of Technology
Tuo Zhao
Georgia Institute of Technology