🤖 AI Summary
Existing dataset distillation methods often overlook the characteristics of downstream tasks, leading to a misalignment between distillation objectives and actual task requirements, which degrades model performance. This work introduces sample difficulty into dataset distillation for the first time, proposing a Difficulty-Guided Sampling (DGS) module and a Difficulty-Aware Guidance (DAG) strategy. DGS operates as a plug-in post-processing component that samples distilled images according to a target difficulty distribution, while DAG dynamically aligns sample difficulty with task demands during training. Notably, the approach enhances task-oriented data selection without modifying the underlying distillation framework. Extensive experiments across multiple distillation algorithms and settings demonstrate consistent and significant improvements in downstream image classification performance, underscoring the universal value of difficulty-awareness in bridging the gap between distillation goals and task-specific needs.
📝 Abstract
In this paper, we propose difficulty-guided sampling (DGS) to bridge the target gap between the distillation objective and the downstream task, thereby improving the performance of dataset distillation. Deep neural networks achieve remarkable performance, but their training is costly in both time and storage. Dataset distillation has been proposed to generate compact, high-quality distilled datasets that enable effective model training while maintaining downstream performance. Existing approaches typically focus on features extracted from the original dataset and overlook task-specific information, which leads to a target gap between the distillation objective and the downstream task. We propose bridging this gap by incorporating characteristics that benefit downstream training into dataset distillation. Focusing on the downstream task of image classification, we introduce the concept of sample difficulty and propose DGS as a plug-in post-stage sampling module: following a specified target difficulty distribution, the final distilled dataset is sampled from image pools generated by existing methods. We also propose difficulty-aware guidance (DAG) to explore the effect of difficulty during the generation process. Extensive experiments across multiple settings demonstrate the effectiveness of the proposed methods and highlight the broader potential of difficulty for diverse downstream tasks.
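To make the post-stage sampling idea concrete, the sketch below illustrates one plausible reading of DGS: given an image pool, a per-sample difficulty score (how it is computed, e.g. per-sample loss, is not specified here and left as an input), and a target difficulty histogram, select a subset whose difficulty distribution approximates the target. All function and parameter names are hypothetical; this is a minimal illustration, not the paper's actual implementation.

```python
import random


def difficulty_guided_sample(pool, difficulty, target_hist, k, seed=0):
    """Hypothetical sketch of difficulty-guided sampling (DGS).

    pool        : list of candidate distilled samples (the image pool)
    difficulty  : one difficulty score per pool item (source unspecified)
    target_hist : relative weight of each difficulty bin in the output
    k           : number of samples to keep in the final distilled set
    """
    bins = len(target_hist)
    lo, hi = min(difficulty), max(difficulty)
    width = (hi - lo) / bins or 1.0  # guard against all-equal scores

    # Group pool indices by difficulty bin.
    buckets = [[] for _ in range(bins)]
    for i, d in enumerate(difficulty):
        b = min(int((d - lo) / width), bins - 1)
        buckets[b].append(i)

    rng = random.Random(seed)
    total = sum(target_hist)
    chosen = []
    # Give each bin a quota proportional to its target weight.
    for b, w in enumerate(target_hist):
        quota = min(round(k * w / total), len(buckets[b]))
        chosen.extend(rng.sample(buckets[b], quota))

    # Top up with leftover samples if rounding left us short of k.
    taken = set(chosen)
    remaining = [i for i in range(len(pool)) if i not in taken]
    while len(chosen) < k and remaining:
        chosen.append(remaining.pop(rng.randrange(len(remaining))))
    return [pool[i] for i in chosen]
```

Because the sampler only post-processes an existing pool, it can be attached to any distillation method without changing that method's objective, which matches the plug-in framing above.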