🤖 AI Summary
Large language models (LLMs) that emit long reasoning traces expose a security risk: those traces can be harvested for knowledge distillation, enabling unauthorized model extraction. Method: the paper proposes a distillation-resistant sampling strategy that actively "poisons" inference trajectories by probabilistically reweighting the next-token distribution under controllable entropy constraints, without degrading the original model's performance. Crucially, it decouples security enforcement from functional integrity and dynamically modulates perturbation intensity via path-sensitivity analysis. Contribution/Results: experiments across multiple benchmarks show that the method preserves ≥99% of the teacher model's accuracy while cutting the distilled student model's performance by 35–62%, substantially improving robustness against distillation attacks and establishing a new paradigm for copyright protection of generative AI models.
📝 Abstract
Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. *Antidistillation sampling* provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
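To make the idea of "strategically modifying a model's next-token probability distribution" concrete, here is a minimal sketch of one way such a reweighting could look. This is not the paper's actual algorithm: the function name, the `poison_scores` term (a stand-in for whatever per-token penalty the method computes to estimate a token's usefulness to a distilling student), and the trade-off weight `lam` are all illustrative assumptions.

```python
import numpy as np

def antidistillation_sample(teacher_logits, poison_scores, lam=0.5,
                            temperature=1.0, rng=None):
    """Sample a next token from a perturbed next-token distribution.

    teacher_logits : unmodified next-token logits from the model.
    poison_scores  : hypothetical per-token penalties estimating how
                     useful each token would be to a distilling student
                     (an assumption for illustration, not the paper's term).
    lam            : trade-off between utility and distillation resistance;
                     lam=0 recovers ordinary temperature sampling.
    """
    rng = rng or np.random.default_rng()
    # Penalize tokens that would most benefit a student model.
    adjusted = np.asarray(teacher_logits) / temperature - lam * np.asarray(poison_scores)
    # Numerically stable softmax over the adjusted logits.
    probs = np.exp(adjusted - adjusted.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs), probs
```

With `lam > 0`, tokens carrying high poison scores are down-weighted, so sampled traces drift away from the sequences most informative for distillation while the rest of the distribution stays close to the teacher's.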