Antidistillation Sampling

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face a security risk: their inference traces can be exploited for knowledge distillation, enabling unauthorized model extraction. Method: the paper proposes a distillation-resistant sampling strategy that actively "poisons" inference trajectories by probabilistically reweighting the next-token distribution under controllable entropy constraints, without degrading the original model's performance. Crucially, it decouples security enforcement from functional integrity and dynamically modulates perturbation intensity via path-sensitivity analysis. Contribution/Results: experiments across multiple benchmarks show that the method preserves ≥99% of the teacher model's accuracy while reducing the distilled student model's performance by 35–62%, substantially improving robustness against distillation attacks. This work establishes a new paradigm for copyright protection of generative AI models.

📝 Abstract
Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
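The core idea in the abstract, reweighting the next-token distribution so that sampled traces are less useful to a distilling student, can be sketched in a few lines. This is a minimal illustration, not the paper's exact algorithm: the `poison_scores` argument is a hypothetical per-token estimate of how much each candidate token would help a student (in the paper, such a signal comes from a proxy student model, which is omitted here).

```python
import math
import random

def antidistillation_sample(teacher_logits, poison_scores, lam=1.0, rng=None):
    """Sample a next token from a poisoned next-token distribution.

    teacher_logits : the teacher model's unnormalized log-probabilities.
    poison_scores  : hypothetical per-token scores estimating how useful each
                     token would be to a distilling student (assumed given).
    lam            : trade-off between utility (lam=0: plain sampling from
                     the teacher) and distillation resistance.
    Returns the sampled token index and the adjusted distribution.
    """
    rng = rng or random.Random()
    # Penalize tokens in proportion to their estimated value for distillation.
    adjusted = [l - lam * s for l, s in zip(teacher_logits, poison_scores)]
    # Softmax over the adjusted logits (subtract max for numerical stability).
    m = max(adjusted)
    exps = [math.exp(a - m) for a in adjusted]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw a token index from the adjusted distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0], probs
```

With `lam = 0` this reduces to ordinary sampling from the teacher's distribution; increasing `lam` trades a small amount of utility for traces that carry less distillable signal.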
Problem

Research questions and friction points this paper is trying to address.

How to prevent model distillation via a model's generated reasoning traces
How to modify next-token probabilities so that sampled traces are poisoned as distillation data
How to maintain the model's utility while blocking the effectiveness of distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Strategically modifies the next-token probability distribution
Poisons reasoning traces so they are ineffective for distillation
Preserves the model's practical utility