🤖 AI Summary
To address the underutilization of soft labels from synthetic datasets and the high sensitivity of model training to the design of the soft-label loss function in dataset distillation, this paper proposes GIFT, a zero-overhead, parameter-free, plug-and-play framework. GIFT combines soft label refinement with a cosine similarity-based loss function to fully exploit the fine-grained inter-class relational information embedded in soft labels, thereby improving the robustness and generalization of models trained on distilled data. The experiments first reveal that training on synthetic data is highly sensitive to the choice of soft-label loss function. Extensive evaluations on benchmarks across scales, including ImageNet-1K, demonstrate that GIFT consistently outperforms state-of-the-art distillation methods; notably, it achieves up to a 30.8% improvement in cross-optimizer generalization, all without incurring any additional computational cost.
📝 Abstract
Recent advancements in dataset distillation have demonstrated the significant benefits of employing soft labels generated by pre-trained teacher models. In this paper, we introduce a novel perspective by emphasizing the full utilization of labels. We first conduct a comprehensive comparison of various loss functions for soft label utilization in dataset distillation, revealing that models trained on synthetic datasets are highly sensitive to the choice of loss function for soft label utilization. This finding highlights the necessity of a universal loss function for training models on synthetic datasets. Building on these insights, we introduce an extremely simple yet surprisingly effective plug-and-play approach, GIFT, which combines soft label refinement with a cosine similarity-based loss function to efficiently leverage full label information. Extensive experiments indicate that GIFT consistently enhances state-of-the-art dataset distillation methods across various dataset scales, without incurring additional computational costs. Importantly, GIFT significantly improves cross-optimizer generalization, an area previously overlooked. For instance, on ImageNet-1K with IPC = 10, GIFT enhances the state-of-the-art method RDED by 30.8% in cross-optimizer generalization. Our code is available at https://github.com/LINs-lab/GIFT.
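The two ingredients named above, soft label refinement and a cosine similarity-based loss, could be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the blending rule in `refine_soft_labels` (mixing teacher probabilities with the one-hot ground truth via a hypothetical weight `alpha`) is an assumed placeholder for whatever refinement GIFT actually performs, and the loss simply penalizes the angle between student and refined teacher probability vectors.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def refine_soft_labels(teacher_logits, true_labels, alpha=0.1):
    # Illustrative refinement (assumption, not GIFT's exact rule):
    # blend teacher probabilities with the one-hot ground truth so the
    # true class is never under-weighted in the soft label.
    probs = softmax(teacher_logits)
    one_hot = np.eye(probs.shape[-1])[true_labels]
    return (1 - alpha) * probs + alpha * one_hot

def cosine_similarity_loss(student_logits, refined_labels):
    # Loss = 1 - cosine similarity between the student's predicted
    # distribution and the refined soft label, averaged over the batch.
    p = softmax(student_logits)
    num = (p * refined_labels).sum(axis=-1)
    denom = np.linalg.norm(p, axis=-1) * np.linalg.norm(refined_labels, axis=-1)
    return float(np.mean(1.0 - num / denom))
```

Because cosine similarity is scale-invariant, such a loss depends only on the relative shape of the predicted distribution, which is one plausible reason a similarity-based objective is less sensitive to logit magnitudes than, say, plain cross-entropy.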