🤖 AI Summary
This work addresses the vulnerability of state-of-the-art generative models to multi-account distillation attacks, where existing defenses either degrade the experience of benign users or are easily circumvented. The authors propose an anti-distillation sampling mechanism that binds a private random seed to both the semantic content of a query and the number of times the user has queried the model. Benign users therefore receive independent, unbiased samples from the original model, while the data an attacker collects across accounts becomes implicitly correlated, which impairs the generalization of the distilled model. A theoretical analysis within a uniform convergence framework shows that the method slows the convergence rate of the distiller's generalization gap relative to i.i.d. sampling. Empirical evaluations on image generation, mathematical reasoning, and code generation confirm that it suppresses student model performance. The result is presented as the first lossless defense against multi-account distillation attacks.
📝 Abstract
Frontier commercial generative models face a growing threat from distillation, whereby a distiller harvests generated responses and trains a competing model of its own at drastically lower cost. Existing defenses either rely on modifying the model's outputs, thereby sacrificing response quality for benign users, or on behavioral detection methods, which can be readily circumvented by distributing queries across multiple accounts. In this work, we propose Lossless Anti-Distillation Sampling (LADS), a novel sampling scheme specifically designed to counter multi-account distillation while maintaining a lossless experience for benign users. Concretely, LADS derives the randomness underlying each generation from a private seed determined by the semantic content of the query and the number of times the user has queried the model. By construction, every benign user receives a response independently sampled from the original model at each visit, and thus experiences no distortion. In contrast, for a distiller, different accounts share latent randomness whenever their queries fall in the same semantic bucket. As a result, the harvested data becomes correlated, potentially reducing sample diversity and degrading generalization. Using uniform convergence theory, we show that LADS provably degrades the convergence rate of the distiller's generalization gap relative to standard i.i.d. sampling in both unconditional and conditional generation settings. Experiments on image generation, mathematical reasoning, and code generation confirm that LADS substantially degrades the performance of distilled students while preserving exact statistical fidelity for individual users.
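The seed derivation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical: `SECRET_KEY`, `semantic_bucket`, and `SeededSampler` are introduced only for illustration. It assumes the private seed is an HMAC-SHA256 over the semantic bucket and a visit counter, that the counter is tracked per account and per bucket, and that lowercasing and whitespace normalization can stand in for real semantic bucketing (which would more plausibly use embeddings).

```python
import hashlib
import hmac
import random
from collections import defaultdict

# Hypothetical provider-side secret; the abstract only says the seed is private.
SECRET_KEY = b"provider-private-key"


def semantic_bucket(query: str) -> str:
    """Toy stand-in for semantic bucketing: lowercase and collapse whitespace."""
    return " ".join(query.lower().split())


class SeededSampler:
    """Derives per-request seeds from (semantic bucket, visit count)."""

    def __init__(self, key: bytes = SECRET_KEY):
        self._key = key
        # Assumption: visits are counted per (account, bucket) pair.
        self._visits = defaultdict(int)

    def seed_for(self, user_id: str, query: str) -> int:
        bucket = semantic_bucket(query)
        count = self._visits[(user_id, bucket)]
        self._visits[(user_id, bucket)] += 1
        # The account id is deliberately NOT part of the message, so two
        # accounts on the same bucket and visit number share a seed.
        message = f"{bucket}|{count}".encode("utf-8")
        digest = hmac.new(self._key, message, hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big")

    def sample(self, user_id: str, query: str, candidates: list):
        """Placeholder for model sampling: draw one candidate with the seed."""
        rng = random.Random(self.seed_for(user_id, query))
        return rng.choice(candidates)


sampler = SeededSampler()
seed_alice_first = sampler.seed_for("alice", "What is 2+2?")
seed_bob_first = sampler.seed_for("bob", "what is  2+2?")
seed_alice_second = sampler.seed_for("alice", "What is 2+2?")
```

In this sketch a single account sees a fresh seed on every visit, because its counter advances, while separate accounts that send semantically equivalent queries at the same visit number draw from the same seed. That shared randomness is the source of the correlation in harvested data that the abstract describes.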