Membership Inference Attack Should Move On to Distributional Statistics for Distilled Generative Models

📅 2025-02-05
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing membership inference attacks (MIAs) fail against distillation-based generative models because they rely on instance-level memorization, whereas student models only observe teacher-generated samples—lacking direct access to individual characteristics of the teacher’s training data. This work pioneers a paradigm shift from instance-level to distribution-level MIAs. We propose a set-based attack framework that quantifies distributional alignment between the teacher’s training data and generated samples using statistical divergence measures—including Maximum Mean Discrepancy (MMD) and Kullback–Leibler (KL) divergence—and integrates set-level statistical hypothesis testing for membership inference. Our approach overcomes the fundamental limitation that teacher training instances are not directly identifiable in distilled generative models. Empirical evaluation across diverse distillation-based generative architectures demonstrates substantial improvements in membership/non-membership classification accuracy. Results confirm that distributional information about the teacher’s training data is stably and measurably leaked—even without access to individual training examples.
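To make the divergence measures named above concrete, the sketch below gives a minimal (biased) empirical estimator of squared Maximum Mean Discrepancy between two sample sets, using a Gaussian kernel. The function names and the bandwidth parameter `sigma` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) between rows of X and Y."""
    sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased empirical estimate of squared MMD between sample sets X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean())

# Two sets drawn from the same distribution give a near-zero estimate;
# a mean-shifted distribution gives a clearly larger one.
rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2)))
shifted = mmd2(rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2)))
```

Because the statistic is computed over whole sets rather than single instances, it remains usable even when no individual teacher training example can be matched against the student's outputs.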

šŸ“ Abstract
Membership inference attacks (MIAs) determine whether certain data instances were used to train a model by exploiting the differences in how the model responds to seen versus unseen instances. This capability makes MIAs important in assessing privacy leakage within modern generative AI systems. However, this paper reveals an oversight in existing MIAs against distilled generative models: attackers can no longer detect a teacher model's training instances individually when targeting the distilled student model, as the student learns from the teacher-generated data rather than the teacher's original member data, preventing direct instance-level memorization. Nevertheless, we find that student-generated samples exhibit a significantly stronger distributional alignment with the teacher's member data than with non-member data. This leads us to posit that MIAs on distilled generative models should shift from instance-level to distribution-level statistics. We thereby introduce a set-based MIA framework that measures relative distributional discrepancies between student-generated datasets and potential member/non-member datasets. Empirically, distributional statistics reliably distinguish a teacher's member data from non-member data through the distilled model. Finally, we discuss scenarios in which our setup faces limitations.
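The "relative distributional discrepancy" in the abstract can be sketched as a set-level score that compares how closely a candidate dataset and a known non-member reference dataset each align with student-generated samples. Everything below (function names, the Gaussian-kernel MMD statistic, the sign convention) is an illustrative assumption, not the paper's exact procedure.

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased squared-MMD estimate with a Gaussian kernel (illustrative)."""
    def k(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-sq / (2.0 * sigma**2)).mean()
    return k(X, X) + k(Y, Y) - 2.0 * k(X, Y)

def set_membership_score(student_samples, candidate_set, reference_set, sigma=1.0):
    """Relative distributional discrepancy: a negative value means the candidate
    set aligns more closely with the student-generated data than the known
    non-member reference set does, suggesting teacher-set membership."""
    return (mmd2(student_samples, candidate_set, sigma)
            - mmd2(student_samples, reference_set, sigma))
```

A threshold at zero then yields a member/non-member decision for a whole candidate set; the paper pairs such statistics with set-level hypothesis testing rather than a fixed cutoff.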
Problem

Research questions and friction points this paper is trying to address.

Shift MIAs to distributional statistics for distilled models
Detect training data via distributional alignment in generative models
Introduce set-based MIA framework for distributional discrepancy measurement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shifted MIA to distributional statistics
Introduced set-based MIA framework
Measured relative distributional discrepancies