Turning Black Box into White Box: Dataset Distillation Leaks

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work systematically uncovers a critical privacy risk in existing dataset distillation methods: when compressing real data into synthetic data, these methods may implicitly encode the training trajectories of the models involved, thereby leaking sensitive information about the original dataset. To expose this vulnerability, the authors propose an Information Revelation Attack (IRA) that combines model inversion and membership inference techniques to infer the distillation algorithm, the model architecture, and the membership status of samples, and even to reconstruct sensitive samples, all from the synthetic data alone. Experimental results show that IRA accurately identifies both the distillation method and the model architecture and recovers sensitive samples from the original dataset with high fidelity. These findings challenge the prevailing assumption that dataset distillation inherently preserves privacy, revealing instead a severe and previously underappreciated risk of privacy leakage.
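Membership inference is one ingredient the summary attributes to IRA. The exact IRA procedure is not described here, so the sketch below shows a different, generic technique instead: a loss-threshold membership inference baseline. It assumes the attacker can query a model trained on (or distilled from) the data and read its class probabilities. The names `loss_threshold_mia` and `toy_predict`, and the threshold value, are hypothetical.

```python
import math
from typing import Callable, Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy loss from predicted class probabilities."""
    # Clamp the probability so an overconfident 0.0 does not raise a math domain error.
    return -math.log(max(probs[label], 1e-12))


def loss_threshold_mia(
    predict: Callable[[Sequence[float]], Sequence[float]],
    candidates: Sequence[tuple[Sequence[float], int]],
    threshold: float,
) -> list[bool]:
    """Guess 'member' for every candidate whose loss falls below `threshold`.

    Generic loss-threshold baseline (not the IRA procedure): samples a model
    fits unusually well are more likely to have been in its training data.
    """
    return [cross_entropy(predict(x), y) < threshold for x, y in candidates]


# Hypothetical 2-class model that is very confident on inputs with x[0] > 0.5,
# standing in for a model trained on (or distilled from) those inputs.
def toy_predict(x: Sequence[float]) -> list[float]:
    p = 0.95 if x[0] > 0.5 else 0.6
    return [p, 1.0 - p]


candidates = [([0.9], 0), ([0.1], 0)]
print(loss_threshold_mia(toy_predict, candidates, threshold=0.2))  # [True, False]
```

The first candidate has loss -ln(0.95) ≈ 0.05 and is flagged as a member; the second has loss -ln(0.6) ≈ 0.51 and is not. In practice the threshold would be calibrated, for example on samples known to be outside the training set.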

📝 Abstract
Dataset distillation compresses a large real dataset into a small synthetic one, so that models trained on the synthetic data achieve performance comparable to those trained on the real data. Although synthetic datasets are assumed to be privacy-preserving, we show that existing distillation methods can cause severe privacy leakage: because synthetic datasets implicitly encode the weight trajectories of the distilled model, they become over-informative and exploitable by adversaries. To expose this risk, we introduce the Information Revelation Attack (IRA) against state-of-the-art distillation techniques. Experiments show that IRA accurately predicts both the distillation algorithm and the model architecture, and can successfully infer membership and recover sensitive samples from the real dataset.
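The abstract attributes the leak to synthetic data encoding the model's weight trajectories. To illustrate how that can happen, the sketch below implements a trajectory-matching objective in the style of MTT (Matching Training Trajectories), one family of distillation methods. The abstract does not name the methods attacked, so this choice is an assumption. Synthetic data is scored by whether a short training run on it, starting from an expert checkpoint θ*_t, lands near a later checkpoint θ*_{t+M} of the expert's run on real data: loss = ||θ̂ − θ*_{t+M}||² / ||θ*_t − θ*_{t+M}||². Optimizing this objective pushes the synthetic samples to carry the real data's training signal. The linear model and the helpers `gd_steps` and `trajectory_matching_loss` are hypothetical toys.

```python
from typing import Sequence

Sample = tuple[Sequence[float], float]


def gd_steps(theta: Sequence[float], data: Sequence[Sample], lr: float, steps: int) -> list[float]:
    """Full-batch gradient descent on mean squared error for a linear model y = w . x."""
    w = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two parameter vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def trajectory_matching_loss(
    synthetic: Sequence[Sample],
    expert_start: Sequence[float],
    expert_target: Sequence[float],
    lr: float,
    student_steps: int,
) -> float:
    """MTT-style normalized parameter-matching loss.

    A student starts at the expert checkpoint theta*_t, trains on the synthetic
    data for `student_steps` steps, and is scored by how close it lands to the
    later expert checkpoint theta*_{t+M}, relative to how far the expert moved.
    """
    student_end = gd_steps(expert_start, synthetic, lr, student_steps)
    return sq_dist(student_end, expert_target) / sq_dist(expert_start, expert_target)


# Toy "real" dataset with slope 3; the expert trains on it for M = 10 steps.
real = [([1.0], 3.0), ([2.0], 6.0)]
start = [0.0]
target = gd_steps(start, real, lr=0.05, steps=10)

good = trajectory_matching_loss([([1.0], 3.0)], start, target, lr=0.05, student_steps=10)
bad = trajectory_matching_loss([([1.0], 0.0)], start, target, lr=0.05, student_steps=10)
print(good < bad)  # True: the synthetic point lying on the real line matches better
```

A distillation method would minimize this loss over the synthetic features and labels, for example by gradient descent. The toy only shows that the objective favors synthetic samples reproducing the real data's training signal, which is the kind of information an adversary could later exploit.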
Problem

Research questions and friction points this paper is trying to address.

dataset distillation
privacy leakage
synthetic data
membership inference
information revelation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset distillation
privacy leakage
Information Revelation Attack
membership inference
synthetic data