On Membership Inference Attacks in Knowledge Distillation

📅 2025-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines how knowledge distillation affects the robustness of language models against membership inference attacks (MIAs). It reports an asymmetry between teacher and student models that has been little explored: the two reach similar overall MIA accuracy, but teacher models better protect member data, the primary target of MIAs, while student models better protect non-member data. To address the resulting vulnerability of student models, the paper proposes five plug-and-play privacy-preserving distillation methods, including gradient masking, output smoothing, label perturbation, and ensemble-based robustness optimization. Reported experiments show that the approach reduces student-model MIA accuracy by 12.3% on average while incurring less than 0.8% degradation in downstream task performance. The implementation is publicly available.

📝 Abstract

Nowadays, Large Language Models (LLMs) are trained on huge datasets, some including sensitive information. This poses a serious privacy concern because privacy attacks such as Membership Inference Attacks (MIAs) may detect this sensitive information. While knowledge distillation compresses LLMs into efficient, smaller student models, its impact on privacy remains underexplored. In this paper, we investigate how knowledge distillation affects model robustness against MIA. We focus on two questions. First, how is private data protected in teacher and student models? Second, how can we strengthen privacy preservation against MIAs in knowledge distillation? Through comprehensive experiments, we show that while teacher and student models achieve similar overall MIA accuracy, teacher models better protect member data, the primary target of MIA, whereas student models better protect non-member data. To address this vulnerability in student models, we propose 5 privacy-preserving distillation methods and demonstrate that they successfully reduce student models' vulnerability to MIA, with ensembling further stabilizing the robustness, offering a reliable approach for distilling more secure and efficient student models. Our implementation source code is available at https://github.com/richardcui18/MIA_in_KD.
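The distinction between overall MIA accuracy and accuracy on members versus non-members can be made concrete with a small sketch. It assumes a simple loss-threshold attack (predict "member" when a sample's loss falls below a threshold), which is a common baseline and not necessarily the attack used in the paper. The function name `mia_accuracy`, the threshold, and all loss values are hypothetical toy numbers chosen only to illustrate the asymmetry.

```python
from typing import Dict, Sequence


def mia_accuracy(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Dict[str, float]:
    """Score a loss-threshold attack: a sample is guessed to be a
    training member when its loss is below `threshold`."""
    # Members are correctly identified when their loss is below the threshold.
    member_hits = sum(loss < threshold for loss in member_losses)
    # Non-members are correctly identified when their loss is at or above it.
    nonmember_hits = sum(loss >= threshold for loss in nonmember_losses)
    total = len(member_losses) + len(nonmember_losses)
    return {
        "overall": (member_hits + nonmember_hits) / total,
        "member": member_hits / len(member_losses),
        "nonmember": nonmember_hits / len(nonmember_losses),
    }


# Toy, made-up per-sample losses (assumption, not data from the paper).
teacher = mia_accuracy(
    member_losses=[0.4, 1.2, 1.5, 1.8],
    nonmember_losses=[1.1, 1.3, 1.6, 2.0],
    threshold=1.0,
)
student = mia_accuracy(
    member_losses=[0.3, 0.5, 0.7, 0.9],
    nonmember_losses=[0.2, 0.6, 0.8, 1.4],
    threshold=1.0,
)

print("teacher:", teacher)
print("student:", student)
```

In this toy setup both models give the attacker the same overall accuracy, yet the attacker identifies far fewer members of the teacher than of the student. A lower attack accuracy on members means member data is better protected, which is why reporting only the overall figure can hide the difference the abstract describes.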

Problem

Research questions and friction points this paper is trying to address.

Investigates privacy risks of Membership Inference Attacks in Knowledge Distillation

Compares how well teacher and student models protect member and non-member data

Proposes methods to enhance privacy in distilled student models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Investigates MIA robustness in knowledge distillation

Proposes 5 privacy-preserving distillation methods

Ensembling stabilizes model robustness against MIA
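One generic way to ensemble distilled students is to average their predicted probability distributions, so that no single student's overconfident output dominates. The sketch below illustrates that idea only; the paper's actual ensembling procedure is not described on this page, and the function name `ensemble_probs` and the example distributions are hypothetical.

```python
from typing import List, Sequence


def ensemble_probs(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class (or token) probabilities of several models.

    Each inner sequence is one model's probability distribution over the
    same set of classes, so all inner sequences must have equal length.
    """
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all distributions must have the same length")
    return [
        sum(probs[i] for probs in per_model_probs) / n_models
        for i in range(n_classes)
    ]


# Three hypothetical students scoring the same input over three classes.
students = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
averaged = ensemble_probs(students)
print([round(p, 3) for p in averaged])
```

Averaging keeps the result a valid probability distribution and smooths out per-model variation in confidence, which is the kind of signal a confidence-based attack relies on.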

🔎 Similar Papers

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models