🤖 AI Summary
Dataset distillation suffers significant performance degradation under high images-per-class (IPC) or high-resolution settings, while diffusion-based generation often lacks diversity, leading to redundant distilled samples. To address these issues, we propose an adversarially guided curriculum sampling framework: the distillation process is decomposed into multiple curriculum stages, where progressively refined discriminator feedback dynamically steers the diffusion model to generate increasingly representative and diverse samples, from simple to complex, thereby mitigating mode collapse and coverage bias. This work is the first to integrate adversarial training with curriculum learning in a diffusion-based distillation pipeline, enabling structured, low-redundancy data compression. Our method achieves absolute test-accuracy improvements of 4.1% on Imagewoof and 2.1% on ImageNet-1K, substantially outperforming existing state-of-the-art approaches.
📝 Abstract
Dataset distillation aims to encapsulate the rich information contained in a dataset into a compact distilled dataset, but it faces performance degradation as the images-per-class (IPC) setting or image resolution grows larger. Recent advancements demonstrate that integrating diffusion generative models can effectively facilitate the compression of large-scale datasets while maintaining efficiency, owing to their superiority in matching data distributions and summarizing representative patterns. However, images sampled from diffusion models are often criticized for lacking diversity, which can lead to information redundancy when multiple independently sampled images are aggregated into a distilled dataset. To address this issue, we propose Adversary-guided Curriculum Sampling (ACS), which partitions the distilled dataset into multiple curricula. To generate each curriculum, ACS guides the diffusion sampling process with an adversarial loss that challenges a discriminator trained on previously sampled images, thus mitigating information overlap between curricula and fostering a more diverse distilled dataset. Additionally, as the discriminator evolves with the progression of curricula, ACS generates images from simpler to more complex, ensuring efficient and systematic coverage of the target data's informational spectrum. Extensive experiments demonstrate the effectiveness of ACS, which achieves substantial improvements of 4.1% on Imagewoof and 2.1% on ImageNet-1K over the state of the art.
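The curriculum loop described above can be illustrated with a minimal NumPy sketch on toy 2-D data. Everything here is an illustrative stand-in, not the paper's actual method: the "diffusion sampler" is a toy iterative denoiser that pulls noise toward the data mean, the discriminator is a small logistic-regression model labeling points as "already distilled" vs. real, and the adversarial guidance simply descends the discriminator's score so each new curriculum avoids regions covered by earlier ones.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train_discriminator(distilled, real, steps=200, lr=0.1):
    """Toy discriminator: logistic regression with label 1 for already-distilled
    samples and 0 for real-data samples (a stand-in for the paper's discriminator)."""
    X = np.concatenate([distilled, real])
    y = np.concatenate([np.ones(len(distilled)), np.zeros(len(real))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        g = p - y                      # gradient of the logistic loss
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def sample_curriculum(n, dim, data_mean, w, b, steps=50, guidance=0.5, rng=None):
    """Toy 'reverse diffusion': start from noise, iteratively pull samples toward
    the data mean; if a discriminator exists, add an adversarial guidance step
    that lowers D(x), steering samples away from already-covered regions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = rng.normal(size=(n, dim))
    for _ in range(steps):
        x += 0.1 * (data_mean - x)     # stand-in denoiser update
        if w is not None:
            p = sigmoid(x @ w + b)[:, None]
            x -= guidance * (p * (1.0 - p)) * w  # grad of D(x) for logistic D
    return x

def acs_distill(real, n_stages=3, per_stage=10, rng=None):
    """Outer curriculum loop: sample a stage, retrain the discriminator on all
    distilled samples so far, and let it guide the next stage."""
    rng = rng if rng is not None else np.random.default_rng(0)
    data_mean = real.mean(axis=0)
    distilled, w, b = [], None, 0.0
    for _ in range(n_stages):
        stage = sample_curriculum(per_stage, real.shape[1], data_mean, w, b, rng=rng)
        distilled.append(stage)
        w, b = train_discriminator(np.concatenate(distilled), real)
    return np.concatenate(distilled)

rng = np.random.default_rng(42)
real_data = rng.normal(loc=[2.0, 2.0], size=(100, 2))  # toy "real dataset"
distilled = acs_distill(real_data, n_stages=3, per_stage=10, rng=rng)
print(distilled.shape)  # (30, 2): 3 curricula of 10 samples each
```

Because the discriminator is retrained after each stage on everything sampled so far, later curricula are pushed toward progressively harder, less-covered regions, which is the simple-to-complex progression the abstract describes.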