🤖 AI Summary
This study is the first to systematically uncover the dual privacy risks of membership inference and memorization leakage in large language model (LLM) knowledge distillation (KD). Addressing the privacy hazards introduced when teacher models are trained on sensitive data, we evaluate six mainstream KD methods across three model families (GPT-2, LLaMA-2, OPT) and seven NLP tasks, quantifying how distillation objectives, student data composition, and task type affect privacy leakage. We also propose a novel "modular privacy analysis" framework, revealing substantial heterogeneity in privacy propagation across network blocks. Experimental results demonstrate that all existing LLM KD methods inherit and transmit teacher-model privacy risks, though to varying degrees. Critically, membership inference and memorization leakage exhibit significant inconsistency, challenging the conventional assumption that they are equivalent.
📝 Abstract
Recent advances in Knowledge Distillation (KD) aim to mitigate the high computational demands of Large Language Models (LLMs) by transferring knowledge from a large "teacher" to a smaller "student" model. However, students may inherit the teacher's privacy risks when the teacher is trained on private data. In this work, we systematically characterize and investigate the membership and memorization privacy risks inherent in six LLM KD techniques. Using instruction-tuning settings that span seven NLP tasks, together with three teacher model families (GPT-2, LLaMA-2, and OPT) and student models of various sizes, we demonstrate that all existing LLM KD approaches carry membership and memorization privacy risks from the teacher to its students, although the extent of these risks varies across KD techniques. We systematically analyze how key LLM KD components (KD objective functions, student training data, and NLP tasks) impact such privacy risks. We also demonstrate a significant disagreement between the memorization and membership privacy risks of LLM KD techniques. Finally, we characterize per-block privacy risk and show that it varies across blocks by a large margin.
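To make the membership-inference side of the evaluation concrete, the standard loss-threshold attack predicts "member" when a model's per-example loss is low, and its strength is commonly summarized by the attack's ROC AUC (0.5 means no leakage). Below is a minimal, self-contained sketch of computing that AUC from two hypothetical arrays of per-example losses; the function name and inputs are illustrative, not the paper's actual evaluation code.

```python
import numpy as np

def loss_mia_auc(member_losses, nonmember_losses):
    """AUC of a loss-threshold membership inference attack.

    The attack scores each example by its negative loss (lower loss
    -> more likely a training member). The ROC AUC of this score equals
    the normalized Mann-Whitney U statistic, so no explicit threshold
    sweep is needed. AUC ~ 0.5 means no membership leakage; AUC near
    1.0 means members are clearly separable from non-members.
    """
    scores = np.concatenate([-np.asarray(member_losses, dtype=float),
                             -np.asarray(nonmember_losses, dtype=float)])
    labels = np.concatenate([np.ones(len(member_losses)),
                             np.zeros(len(nonmember_losses))])

    # Rank all scores (ascending); average ranks would be needed for
    # ties, omitted here for simplicity.
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)

    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Mann-Whitney U for the positive (member) class, normalized to AUC.
    u_pos = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)

# Hypothetical per-example losses from a distilled student model:
members = [0.12, 0.25, 0.31, 0.18]      # examples from teacher's training set
nonmembers = [0.85, 0.92, 1.10, 0.77]   # held-out examples
print(loss_mia_auc(members, nonmembers))  # 1.0: fully separable -> strong leakage
```

In the distillation setting studied here, running this attack against the student (rather than the teacher) measures how much membership signal about the teacher's private training data survives the distillation step.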