Multi-Token Prediction via Self-Distillation

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an online self-distillation method to accelerate autoregressive language model inference without requiring auxiliary models or specialized inference pipelines. By leveraging multi-token sequences generated by the model itself as supervision signals, the approach enables parallel prediction of multiple tokens in a single forward pass, without altering the model architecture or introducing additional components. Evaluated on reasoning benchmarks such as GSM8K, the method achieves an average speedup exceeding 3× while incurring less than a 5% drop in accuracy. This significantly enhances inference efficiency while preserving the original model structure and deployment simplicity.

📝 Abstract
Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We consider a new approach for converting a pretrained autoregressive language model from a slow single next-token prediction model into a fast standalone multi-token prediction model using a simple online distillation objective. The final model retains exactly the same implementation as the pretrained initial checkpoint and is deployable without any auxiliary verifier or other specialized inference code. On GSM8K, our method produces models that can decode more than $3\times$ faster on average at a $<5\%$ drop in accuracy relative to single-token decoding performance.
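The objective described above can be illustrated with a toy sketch. This is not the paper's implementation: the "model" here (`next_token_probs`), the greedy `teacher_decode`, the per-position `head` parameter, and the loss averaging are all hypothetical stand-ins. The sketch only shows the shape of online self-distillation: the model decodes k tokens one at a time to produce its own supervision targets, and a single parallel k-token prediction is scored against those targets with cross-entropy.

```python
import math

# Toy sketch of online self-distillation for multi-token prediction
# (hypothetical names and model; stdlib only, no real LM involved).

VOCAB = 4  # tiny toy vocabulary
K = 3      # tokens predicted per forward pass

def next_token_probs(context, head=0):
    # Hypothetical "model": a deterministic softmax over scores derived
    # from the last context token and a prediction-head index.
    last = context[-1]
    scores = [math.sin(last * 1.7 + v + head) for v in range(VOCAB)]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

def teacher_decode(context, k=K):
    # Teacher pass: standard one-token-at-a-time greedy decoding.
    # The model's own outputs become the supervision targets.
    ctx = list(context)
    for _ in range(k):
        probs = next_token_probs(ctx)
        ctx.append(max(range(VOCAB), key=lambda v: probs[v]))
    return ctx[len(context):]

def student_parallel_probs(context, k=K):
    # Student pass: k distributions from ONE forward pass. Each "head"
    # conditions only on the original context, mimicking parallel
    # multi-token prediction without intermediate decoding steps.
    return [next_token_probs(context, head=i) for i in range(k)]

def distillation_loss(context):
    # Cross-entropy of the parallel predictions against the model's own
    # autoregressively decoded tokens: the online distillation objective.
    targets = teacher_decode(context)
    preds = student_parallel_probs(context)
    return -sum(math.log(p[t]) for p, t in zip(preds, targets)) / K

loss = distillation_loss([1, 2])
print(round(loss, 4))
```

At deployment, only the student's parallel pass is needed, which is why no verifier or separate speculator model has to ship with the checkpoint; the teacher and student are the same weights.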
Problem

Research questions and friction points this paper is trying to address.

multi-token prediction
language model inference
speculative decoding
self-distillation
autoregressive models
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-token prediction
self-distillation
language model acceleration
online distillation
speculative decoding