🤖 AI Summary
Dataset distillation (DD) suffers significant performance degradation in complex scenarios. To address this, we propose EDF (Emphasize Discriminative Features), the first DD method to integrate Grad-CAM activation maps into the distillation framework, precisely localizing and enhancing critical discriminative regions in synthesized images. EDF further introduces a loss-aware weighting mechanism that downplays low-loss supervision signals, which mostly carry common patterns, to improve generalization. Additionally, we construct Comp-DD, the first benchmark explicitly designed to evaluate DD under varying complexity levels, comprising eight hard and eight easy subsets derived from ImageNet-1K. On complex subsets of ImageNet-1K, EDF substantially outperforms state-of-the-art methods, empirically validating the pivotal role of discriminative-region modeling in dataset distillation. Both the source code and the Comp-DD benchmark are publicly released to foster community advancement.
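To make the core idea concrete, here is a minimal, hypothetical sketch (not the paper's actual implementation) of how a normalized Grad-CAM activation map could rescale per-pixel gradients so that high-activation, discriminative regions of a synthetic image receive larger updates; the function names and the `alpha` scaling parameter are illustrative assumptions:

```python
import numpy as np

def normalize_map(cam):
    # Min-max normalize a Grad-CAM activation map to [0, 1].
    cam = cam - cam.min()
    denom = cam.max()
    return cam / denom if denom > 0 else cam

def weighted_pixel_update(image, grad, cam, lr=0.1, alpha=1.0):
    """Illustrative EDF-style step: scale each pixel's gradient by
    (1 + alpha * activation) so discriminative regions move more,
    instead of treating all pixels equally.
    `alpha` and this exact weighting form are assumptions."""
    weight = 1.0 + alpha * normalize_map(cam)  # in [1, 1 + alpha]
    return image - lr * weight * grad
```

In this sketch, a pixel under the peak of the activation map is updated with twice the step size (for `alpha=1.0`) of a pixel in a zero-activation background region.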
📝 Abstract
Dataset distillation has demonstrated strong performance on simple datasets like CIFAR, MNIST, and TinyImageNet but struggles to achieve similar results in more complex scenarios. In this paper, we propose EDF (Emphasize Discriminative Features), a dataset distillation method that enhances key discriminative regions in synthetic images using Grad-CAM activation maps. Our approach is inspired by a key observation: in simple datasets, high-activation areas typically occupy most of the image, whereas in complex scenarios these areas are much smaller. Unlike previous methods that treat all pixels equally when synthesizing images, EDF uses Grad-CAM activation maps to enhance high-activation areas. From a supervision perspective, we downplay supervision signals with lower losses, as they contain common patterns. Additionally, to help the DD community better explore complex scenarios, we build the Complex Dataset Distillation (Comp-DD) benchmark by meticulously selecting sixteen subsets, eight easy and eight hard, from ImageNet-1K. Notably, EDF consistently outperforms state-of-the-art (SOTA) results in complex scenarios, such as ImageNet-1K subsets. We hope this work inspires and encourages more researchers to improve the practicality and efficacy of DD. Our code and benchmark will be made public at https://github.com/NUS-HPC-AI-Lab/EDF.
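The supervision-side idea (downplaying low-loss signals because they carry common patterns) can be sketched as a softmax reweighting over per-signal losses. This is an illustrative assumption about one way such a mechanism could look, not the paper's actual formulation; the temperature `tau` is a hypothetical parameter:

```python
import numpy as np

def loss_aware_weights(losses, tau=1.0):
    """Illustrative weighting: a softmax over per-signal losses assigns
    higher weight to high-loss (discriminative) supervision signals and
    downplays low-loss (common-pattern) ones. `tau` is assumed."""
    z = np.asarray(losses, dtype=float) / tau
    z = z - z.max()          # subtract max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def weighted_loss(losses, tau=1.0):
    # Aggregate supervision with loss-aware weights instead of a plain mean.
    w = loss_aware_weights(losses, tau)
    return float(np.dot(w, np.asarray(losses, dtype=float)))
```

Under this sketch, the aggregate is pulled toward the harder signals, so the weighted loss always sits at or above the unweighted mean.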