Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization

šŸ“… 2026-02-16
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
This work addresses the fundamental trade-off between accuracy and efficiency in large-scale dataset distillation. While optimization-based class-decoupled methods achieve high fidelity at substantial computational cost, non-optimization approaches suffer from limited accuracy. To overcome this limitation, we propose Explore-Exploit Distillation (E²D), the first method to introduce an exploration-exploitation mechanism into dataset distillation. E²D preserves semantic completeness through full-image initialization and employs a two-stage optimization strategy: it first performs uniform updates to identify high-loss regions, then concentrates subsequent updates on those regions to accelerate convergence. Our approach sets a new state of the art on ImageNet-1K with an 18Ɨ speedup over prior methods, and on the much larger ImageNet-21K it simultaneously achieves significantly higher accuracy and a 4.3Ɨ acceleration, effectively breaking the longstanding bottleneck between distillation efficiency and fidelity.

šŸ“ Abstract
Dataset distillation compresses the original data into compact synthetic datasets, reducing training time and storage while retaining model performance, enabling deployment under limited resources. Although recent decoupling-based methods enable dataset distillation at large scale, they continue to face an efficiency gap: optimization-based decoupling methods achieve higher accuracy but demand intensive computation, whereas optimization-free decoupling methods are efficient but sacrifice accuracy. To overcome this trade-off, we propose Exploration-Exploitation Distillation (E²D), a simple, practical method that minimizes redundant computation through an efficient pipeline that begins with full-image initialization to preserve semantic integrity and feature diversity. It then uses a two-phase optimization strategy: an exploration phase that performs uniform updates and identifies high-loss regions, and an exploitation phase that focuses updates on these regions to accelerate convergence. We evaluate E²D on large-scale benchmarks, surpassing the state of the art on ImageNet-1K while being 18Ɨ faster; on ImageNet-21K, our method substantially improves accuracy while remaining 4.3Ɨ faster. These results demonstrate that targeted, redundancy-reducing updates, rather than brute-force optimization, bridge the gap between accuracy and efficiency in large-scale dataset distillation. Code is available at https://github.com/ncsu-dk-lab.
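The two-phase schedule described above can be illustrated with a minimal sketch. This is a hedged toy illustration, not the paper's actual implementation: the `regions`, `loss_fn`, and `update_fn` names, the `top_frac` selection rule, and the step counts are all assumptions made for the example; in E²D the regions would be patches of synthetic images and the update a gradient step.

```python
def distill(regions, loss_fn, update_fn,
            explore_steps=5, exploit_steps=10, top_frac=0.3):
    """Toy explore-exploit schedule (illustrative sketch, not the paper's code).

    regions   : list of mutable synthetic "regions" (here: plain floats)
    loss_fn   : maps a region to its current loss
    update_fn : performs one optimization step, returning the updated region
    """
    # Exploration phase: uniform updates over all regions, recording losses.
    losses = [0.0] * len(regions)
    for _ in range(explore_steps):
        for i, r in enumerate(regions):
            regions[i] = update_fn(r)
            losses[i] = loss_fn(regions[i])

    # Exploitation phase: concentrate updates on the highest-loss regions only.
    k = max(1, int(top_frac * len(regions)))
    hot = sorted(range(len(regions)), key=lambda i: losses[i], reverse=True)[:k]
    for _ in range(exploit_steps):
        for i in hot:
            regions[i] = update_fn(regions[i])
            losses[i] = loss_fn(regions[i])

    return regions, losses
```

With a toy objective (pull each value toward a target), the exploration pass drives every region's loss down uniformly, while the exploitation pass spends the remaining budget only where loss is still high, which is the redundancy-reducing behavior the abstract describes.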
Problem

Research questions and friction points this paper is trying to address.

dataset distillation
large-scale
efficiency-accuracy trade-off
optimization
synthetic datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset distillation
exploration-exploitation optimization
large-scale learning
efficient training
synthetic data