🤖 AI Summary
In dataset distillation, existing methods produce synthetic data with high redundancy and insufficient diversity. To address this, we propose DiRe, a plug-and-play diversity regularization framework that explicitly enforces sample-level diversity without modifying the backbone architecture. DiRe is the first method to jointly leverage cosine similarity and Euclidean distance to construct an explicit diversity regularizer that integrates seamlessly into any gradient-matching-based distillation pipeline. By constraining inter-sample similarity with these two complementary metrics, DiRe significantly enhances the intra-class dispersion and inter-class discriminability of the synthetic data. Extensive experiments on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K demonstrate consistent improvements in both classification accuracy and diversity metrics across mainstream distillation methods, including DC, DM, and DSA, surpassing existing state-of-the-art approaches.
📝 Abstract
In Dataset Condensation, the goal is to synthesize a small dataset that replicates the training utility of a large original dataset. Existing condensation methods synthesize datasets with significant redundancy, so there is a dire need to reduce redundancy and improve the diversity of the synthesized datasets. To tackle this, we propose an intuitive Diversity Regularizer (DiRe) composed of cosine similarity and Euclidean distance, which can be applied off-the-shelf to various state-of-the-art condensation methods. Through extensive experiments, we demonstrate that the addition of our regularizer improves state-of-the-art condensation methods on various benchmark datasets from CIFAR-10 to ImageNet-1K with respect to generalization and diversity metrics.
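To make the idea concrete, here is a minimal sketch of a diversity regularizer combining the two metrics the abstract names: pairwise cosine similarity and pairwise Euclidean distance among same-class synthetic samples. The function name, the per-class loop, and the equal weighting of the two terms are illustrative assumptions, not the paper's actual formulation.

```python
import torch

def diversity_regularizer(synthetic: torch.Tensor,
                          labels: torch.Tensor,
                          eps: float = 1e-8) -> torch.Tensor:
    """Hypothetical DiRe-style penalty: high when same-class synthetic
    samples point in similar directions (cosine) and sit close together
    (Euclidean). Added to the condensation loss; weighting is a guess."""
    loss = synthetic.new_zeros(())
    for c in labels.unique():
        x = synthetic[labels == c].flatten(1)          # (n_c, d)
        n = x.size(0)
        if n < 2:
            continue
        xn = x / (x.norm(dim=1, keepdim=True) + eps)   # unit vectors
        cos = xn @ xn.t()                              # pairwise cosine similarity
        dist = torch.cdist(x, x)                       # pairwise Euclidean distance
        off = ~torch.eye(n, dtype=torch.bool, device=x.device)
        # Penalize similarity, reward dispersion (off-diagonal pairs only).
        loss = loss + cos[off].mean() - dist[off].mean()
    return loss
```

In a gradient-matching pipeline such as DC or DSA, this term would be scaled by a small coefficient and added to the matching objective, so that the synthetic images are pushed apart within each class while the main loss preserves training utility.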