Enhancing Diffusion-based Dataset Distillation via Adversary-Guided Curriculum Sampling

📅 2025-08-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Dataset distillation suffers from significant performance degradation under high images-per-class (IPC) or high-resolution settings, while diffusion-based generation often lacks diversity, leading to redundant distilled samples. To address these issues, we propose an adversarially guided curriculum sampling framework: the distillation process is decomposed into multiple curriculum stages, where progressively refined discriminator feedback dynamically steers the diffusion model to generate increasingly representative and diverse samples—from simple to complex—thereby mitigating mode collapse and coverage bias. This work is the first to integrate adversarial training with curriculum learning within a diffusion-based distillation pipeline, enabling structured, low-redundancy data compression. Our method achieves absolute test accuracy improvements of 4.1% on Imagewoof and 2.1% on ImageNet-1K, substantially outperforming existing state-of-the-art approaches.

📝 Abstract
Dataset distillation aims to encapsulate the rich information contained in a dataset into a compact distilled dataset, but it suffers performance degradation as the images-per-class (IPC) setting or image resolution grows. Recent advances demonstrate that integrating diffusion generative models can effectively facilitate the compression of large-scale datasets while maintaining efficiency, owing to their superiority in matching data distributions and summarizing representative patterns. However, images sampled from diffusion models are often criticized for lacking diversity, which can lead to information redundancy when multiple independently sampled images are aggregated into a distilled dataset. To address this issue, we propose Adversary-guided Curriculum Sampling (ACS), which partitions the distilled dataset into multiple curricula. When generating each curriculum, ACS guides the diffusion sampling process with an adversarial loss that challenges a discriminator trained on previously sampled images, thus mitigating information overlap between curricula and fostering a more diverse distilled dataset. Additionally, as the discriminator evolves with the progression of curricula, ACS generates images from simpler to more complex, ensuring efficient and systematic coverage of the target data's informational spectrum. Extensive experiments demonstrate the effectiveness of ACS, which achieves substantial improvements of 4.1% on Imagewoof and 2.1% on ImageNet-1K over the state of the art.
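The adversarial guidance described in the abstract can be sketched at toy scale: at each reverse-diffusion step, the sample is nudged down the gradient of a discriminator's score so it lands where the discriminator judges the data "unseen," reducing overlap with earlier curricula. The sketch below is a minimal, hypothetical stand-in, not the paper's method: the fixed linear-logistic discriminator, the shrink-toward-mean "denoiser," and all constants are assumptions chosen so the gradient is available in closed form.

```python
import numpy as np

DIM = 8
# Hypothetical fixed discriminator weights; in ACS the discriminator would be
# trained on images sampled in earlier curricula, not fixed like this.
_w = np.random.default_rng(42).normal(size=DIM)

def disc_score(x):
    """Toy discriminator D(x) = sigmoid(w . x): probability that x resembles
    data already sampled in earlier curricula."""
    return 1.0 / (1.0 + np.exp(-_w @ x))

def disc_grad(x):
    # Closed-form gradient of the logistic score: s * (1 - s) * w.
    s = disc_score(x)
    return s * (1.0 - s) * _w

def sample(guidance_scale, steps=10, seed=0):
    """One reverse-diffusion trajectory with adversarial guidance mixed in."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=DIM)  # start from pure noise
    for t in range(steps, 0, -1):
        # Stand-in denoising update; a real diffusion model would predict
        # the noise with a trained network here.
        x = 0.95 * x + 0.05 * rng.normal(scale=np.sqrt(t / steps), size=DIM)
        # Adversarial guidance: descend the discriminator's score so the
        # sample drifts toward regions D marks as "unseen."
        x = x - guidance_scale * disc_grad(x)
    return x

guided = sample(guidance_scale=2.0)
unguided = sample(guidance_scale=0.0)
# With identical noise, the guided sample scores strictly lower under D,
# i.e. it looks less like previously sampled data.
print(disc_score(guided) < disc_score(unguided))
```

Because both trajectories share a seed and noise sequence, the only difference is the guidance term, which strictly lowers the discriminator's score at every step; the final comparison therefore prints `True`.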
Problem

Research questions and friction points this paper is trying to address.

Enhancing dataset distillation performance with diffusion models
Reducing information redundancy in diffusion-sampled distilled datasets
Improving diversity and coverage in curriculum-based distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversary-guided Curriculum Sampling for diversity
Partitions the distilled dataset into curricula of increasing complexity
Uses adversarial loss to reduce information redundancy
Lexiao Zou
Harbin Institute of Technology, Shenzhen
Gongwei Chen
Harbin Institute of Technology, Shenzhen
Yanda Chen
Anthropic
Miao Zhang
Harbin Institute of Technology, Shenzhen