The Distillation Game: Adaptive Attacks & Efficient Defenses

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

career value

169K/year

🤖 AI Summary

Model distillation enhances practicality but renders models vulnerable to imitation attacks, raising a critical trade-off between security and utility. This work proposes a teacher-student minimax game framework that introduces, for the first time, an adaptive student evaluation paradigm, demonstrating its marked superiority over passive evaluation. Building on this insight, the authors design PoE (Product-of-Experts), a lightweight and efficient forward-propagation defense mechanism that integrates instance reweighting, output suppression, and example-value surrogates. Evaluations on the GSM8K and MATH benchmarks show that PoE significantly narrows the robustness gap with high-cost defense methods while maintaining high-quality reasoning trajectories and incurring substantially lower computational overhead.

📝 Abstract

Distillation attacks create a deployment trade-off for model providers: the same outputs that make a model more useful can also make it easier to imitate. We study this trade-off through a minimax game between a utility-constrained teacher and an adaptive student. Our framework yields tractable one-sided response rules: an adaptive evaluation rule in which the student reweights high-value examples, and a teacher-side defense template that suppresses outputs most useful for distillation. From a cheap proxy for example value, we derive Product-of-Experts (PoE), a simple forward-pass-only defense that combines the teacher with a proxy student during generation. Empirically, adaptive evaluation reveals a large passive--adaptive gap: on state-of-the-art defenses, adaptive students recover substantially more capability than passive evaluation suggests on GSM8K and MATH. Under this stronger evaluation, the apparent robustness gap between expensive defenses and PoE narrows considerably, while PoE remains substantially cheaper and preserves higher-quality reasoning traces. Overall, our results suggest that strong distillation remains difficult to stop, and that progress on antidistillation should be judged against adaptive students rather than passive ones. Our code is available at: https://github.com/ysfalh/distillation-game.

Problem

Research questions and friction points this paper is trying to address.

distillation attacks

model imitation

adaptive students

utility trade-off

antidistillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

distillation attacks

adaptive evaluation

Product-of-Experts