Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation

📅 2025-03-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dataset distillation (DD) suffers significant performance degradation under high images-per-class (IPC) settings. Method: the paper proposes a curriculum-based framework that jointly selects distilled and real data, introducing a "coarse-to-fine" progressive selection mechanism that adapts to the evolving synthetic dataset. It integrates curriculum-guided selection, feature-space similarity measurement, stage-wise confidence weighting, and joint optimization of distilled and real data, overcoming the incompatibility between distilled and real images induced by conventional one-shot, independent sample selection. Contribution/Results: under high-IPC configurations, the method improves classification accuracy over the state of the art by 6.6% on CIFAR-10, 5.8% on CIFAR-100, and 3.4% on Tiny-ImageNet. On Tiny-ImageNet it reaches 60.2% test accuracy at a 20% compression ratio, only 0.3% below full-dataset training, substantially improving the practicality and generalization of DD in high-IPC scenarios.

📝 Abstract
Dataset distillation (DD) excels in synthesizing a small number of images per class (IPC) but struggles to maintain its effectiveness in high-IPC settings. Recent works on dataset distillation demonstrate that combining distilled and real data can mitigate the effectiveness decay. However, our analysis of the combination paradigm reveals that the current one-shot and independent selection mechanism induces an incompatibility issue between distilled and real images. To address this issue, we introduce a novel curriculum coarse-to-fine selection (CCFS) method for efficient high-IPC dataset distillation. CCFS employs a curriculum selection framework for real data selection, where we leverage a coarse-to-fine strategy to select appropriate real data based on the current synthetic dataset in each curriculum. Extensive experiments validate CCFS, surpassing the state-of-the-art by +6.6% on CIFAR-10, +5.8% on CIFAR-100, and +3.4% on Tiny-ImageNet under high-IPC settings. Notably, CCFS achieves 60.2% test accuracy on ResNet-18 with a 20% compression ratio of Tiny-ImageNet, closely matching full-dataset training with only 0.3% degradation. Code: https://github.com/CYDaaa30/CCFS.
Problem

Research questions and friction points this paper is trying to address.

Dataset distillation degrades sharply once the images-per-class (IPC) budget grows large
One-shot, independent selection of real data induces incompatibility between distilled and real images when the two are combined
How to select real data that stays compatible with the synthetic dataset as it evolves during distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum framework that selects real data in stages, conditioned on the current synthetic dataset
Coarse-to-fine strategy: coarse filtering of candidate real samples, then fine-grained selection for compatibility
Joint use of distilled and real data that sustains accuracy under high-IPC compression ratios
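The coarse-to-fine idea in the bullets above can be illustrated with a minimal sketch of one curriculum phase. This is not the authors' implementation: `predict_fn`, the misclassification-based coarse filter, and the centroid-distance fine ranking are assumptions about how a phase might pick real samples that complement the current synthetic dataset.

```python
import numpy as np

def coarse_to_fine_select(features, labels, predict_fn, budget):
    """One curriculum phase of a hypothetical coarse-to-fine selection.

    Coarse stage: keep only the real samples that a model trained on the
    current synthetic-plus-selected set still misclassifies.
    Fine stage: among those, pick the `budget` samples whose features lie
    closest to their class centroid (informative but not outliers).
    """
    preds = predict_fn(features)
    # Coarse: candidates are the currently misclassified samples.
    candidates = np.flatnonzero(preds != labels)
    if candidates.size == 0:
        return candidates
    # Fine: rank candidates by distance to their class centroid in feature space.
    dists = np.empty(candidates.size)
    for i, idx in enumerate(candidates):
        same_class = features[labels == labels[idx]]
        centroid = same_class.mean(axis=0)
        dists[i] = np.linalg.norm(features[idx] - centroid)
    order = np.argsort(dists)  # closest-to-centroid first
    return candidates[order[:budget]]
```

Run over several curriculum phases, re-training the model between phases, this would let the selection criterion track the evolving synthetic dataset instead of fixing it once up front.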
👥 Authors
Yanda Chen
Gongwei Chen (School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen)
Miao Zhang (School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen)
Weili Guan (School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen)
Liqiang Nie (School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen)