🤖 AI Summary
Existing membership inference attacks (MIAs) fail against distillation-based generative models because they rely on instance-level memorization, whereas a student model only observes teacher-generated samples and never has direct access to individual examples from the teacher's training data. This work pioneers a paradigm shift from instance-level to distribution-level MIAs. We propose a set-based attack framework that quantifies distributional alignment between the teacher's training data and generated samples using statistical divergence measures, including Maximum Mean Discrepancy (MMD) and Kullback–Leibler (KL) divergence, and integrates set-level statistical hypothesis testing for membership inference. Our approach overcomes the fundamental limitation that individual teacher training instances are not identifiable through a distilled generative model. Empirical evaluation across diverse distillation-based generative architectures demonstrates substantial improvements in member/non-member classification accuracy. The results confirm that distributional information about the teacher's training data leaks stably and measurably, even without access to individual training examples.
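The summary names MMD as one of the divergence measures. As a minimal illustrative sketch (not the paper's implementation; function names, the RBF kernel choice, and the `gamma` bandwidth are our own assumptions), the squared MMD between a student-generated set and a candidate dataset can be estimated as follows, with the candidate set showing the smaller discrepancy inferred to be the teacher's member data:

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.1):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.1):
    # Biased estimate of squared Maximum Mean Discrepancy:
    # E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2 * rbf_kernel(X, Y, gamma).mean())

# Hypothetical usage: compare a student-generated set against two candidate
# sets; the one closer in distribution is flagged as member data.
def closer_candidate(generated, cand_a, cand_b, gamma=0.1):
    return "A" if mmd2(generated, cand_a, gamma) < mmd2(generated, cand_b, gamma) else "B"
```

A smaller `mmd2` value indicates stronger distributional alignment; the attack exploits the claim that teacher member sets align more closely with student-generated samples than non-member sets do.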
📝 Abstract
Membership inference attacks (MIAs) determine whether particular data instances were used to train a model by exploiting differences in how the model responds to seen versus unseen instances. This capability makes MIAs important for assessing privacy leakage in modern generative AI systems. However, this paper reveals an oversight in existing MIAs against *distilled generative models*: attackers can no longer detect a teacher model's training instances individually when targeting the distilled student model, because the student learns from teacher-generated data rather than the teacher's original member data, preventing direct instance-level memorization. Nevertheless, we find that student-generated samples exhibit a significantly stronger distributional alignment with the teacher's member data than with non-member data. This leads us to posit that MIAs *on distilled generative models should shift from instance-level to distribution-level statistics*. We thereby introduce a *set-based* MIA framework that measures *relative* distributional discrepancies between student-generated data*sets* and candidate member/non-member data*sets*. Empirically, distributional statistics reliably distinguish a teacher's member data from non-member data through the distilled model. Finally, we discuss scenarios in which our setup faces limitations.
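The summary above also mentions set-level statistical hypothesis testing. One standard way to test whether a student-generated set and a candidate set share a distribution is a permutation test on an MMD statistic; the sketch below is our own self-contained illustration under that assumption (the paper's actual test may differ):

```python
import numpy as np

def _mmd2(X, Y, gamma=0.1):
    # Biased squared-MMD estimate with an RBF kernel (illustrative choice).
    def k_mean(A, B):
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2).mean()
    return k_mean(X, X) + k_mean(Y, Y) - 2 * k_mean(X, Y)

def mmd_permutation_test(generated, candidate, n_perm=200, gamma=0.1, seed=0):
    """Permutation p-value for H0: the two sets come from one distribution.

    Small p-values mean the candidate set is distributionally far from the
    student-generated samples (i.e., likely non-member data).
    """
    rng = np.random.default_rng(seed)
    observed = _mmd2(generated, candidate, gamma)
    pooled = np.vstack([generated, candidate])
    n = len(generated)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        stat = _mmd2(pooled[perm[:n]], pooled[perm[n:]], gamma)
        exceed += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

In this framing, a set whose test fails to reject the null is distributionally aligned with the student's output and would be classified as member data; a strongly rejected set would be classified as non-member.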