A Label is Worth a Thousand Images in Dataset Distillation

📅 2024-06-15
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This paper investigates the mechanistic role and dominance of soft labels in dataset distillation. Addressing a limitation of existing methods, which emphasize synthetic image generation while neglecting label information, the authors establish soft labels as the primary determinant of distillation performance, validated via systematic ablation studies. The contributions are threefold: (1) they demonstrate the necessity of structured soft labels, which encode inter-class semantic relationships beyond hard labels; (2) they identify an empirical image–label scaling law characterizing how soft-label effectiveness varies with images-per-class; and (3) they construct an empirical Pareto frontier for data-efficient learning, quantifying the accuracy–data-size trade-off. Experiments on CIFAR-10/100 and ImageNet support these findings, and the code is publicly released for reproducibility.

📝 Abstract
Data quality is a crucial factor in the performance of machine learning models, a principle that dataset distillation methods exploit by compressing training datasets into much smaller counterparts that maintain similar downstream performance. Understanding how and why data distillation methods work is vital not only for improving these methods but also for revealing fundamental characteristics of "good" training data. However, a major challenge in achieving this goal is the observation that distillation approaches, which rely on sophisticated but mostly disparate methods to generate synthetic data, have little in common with each other. In this work, we highlight a largely overlooked aspect common to most of these methods: the use of soft (probabilistic) labels. Through a series of ablation experiments, we study the role of soft labels in depth. Our results reveal that the main factor explaining the performance of state-of-the-art distillation methods is not the specific techniques used to generate synthetic data but rather the use of soft labels. Furthermore, we demonstrate that not all soft labels are created equal; they must contain structured information to be beneficial. We also provide empirical scaling laws that characterize the effectiveness of soft labels as a function of images-per-class in the distilled dataset and establish an empirical Pareto frontier for data-efficient learning. Combined, our findings challenge conventional wisdom in dataset distillation, underscore the importance of soft labels in learning, and suggest new directions for improving distillation methods. Code for all experiments is available at https://github.com/sunnytqin/no-distillation.
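To make the abstract's central distinction concrete, below is a minimal sketch (not the paper's code; the function name and numbers are illustrative) of training with soft versus hard labels: the loss is the cross-entropy between the model's predicted distribution and the target distribution, where a hard label is a one-hot vector and a structured soft label spreads probability mass across semantically related classes (e.g. a teacher model's output).

```python
import numpy as np

def soft_label_cross_entropy(logits, targets):
    """Cross-entropy between predicted class distribution and (possibly soft) targets.

    logits:  (batch, classes) raw model outputs
    targets: (batch, classes) probability vectors; one-hot recovers the hard-label loss
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=-1).mean())

# One example with three classes. The soft target encodes that classes 0 and 1
# are semantically close, information a one-hot label cannot carry.
logits = np.array([[2.0, 1.5, -1.0]])
hard_target = np.array([[1.0, 0.0, 0.0]])
soft_target = np.array([[0.6, 0.35, 0.05]])

loss_hard = soft_label_cross_entropy(logits, hard_target)
loss_soft = soft_label_cross_entropy(logits, soft_target)
```

With one-hot targets this reduces to standard classification cross-entropy; with soft targets the gradient also penalizes the relative probabilities assigned to non-ground-truth classes, which is the channel through which the "structured information" the abstract describes reaches the student.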
Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation
Soft Labels
Compression Effectiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft Labels
Dataset Distillation
Structured Information