🤖 AI Summary
Existing dataset condensation (DC) methods overlook the multi-domain heterogeneity of modern datasets, which limits cross-domain generalization. This paper proposes Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. The approach rests on two components: (1) a Domain-Aware Module (DAM), active only during condensation, that embeds domain-related features into each synthetic image via learnable spatial masks; and (2) a pseudo-domain labeling method based on low-frequency amplitude statistics, which removes the need for ground-truth domain annotations. In the reported experiments, DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline DC methods while keeping the same images-per-class (IPC) budget.
📝 Abstract
Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. Because explicit domain labels are unavailable in most real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is active only during the condensation process, so the condensed set keeps the same images per class (IPC) as prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
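To make the frequency-based pseudo-domain labeling idea more concrete, here is a minimal sketch, not the authors' implementation. It assumes that each grayscale image is scored by the mean FFT amplitude in a small window around the zero frequency, and that images are then ranked by that score and split into near-equal groups. The function names, the `radius` parameter, and the rank-and-split rule are all hypothetical choices made for illustration; the abstract only states that low-frequency amplitude statistics are used.

```python
import numpy as np


def low_freq_amplitude(image: np.ndarray, radius: int = 2) -> float:
    """Mean FFT amplitude in a small window around the zero frequency.

    `image` is a 2D (H, W) grayscale array. `radius` is a hypothetical
    hyperparameter: the window covers (2 * radius + 1)^2 frequencies.
    """
    # Shift the spectrum so the zero-frequency (DC) term sits at (H//2, W//2).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    amplitude = np.abs(spectrum)
    cy, cx = image.shape[0] // 2, image.shape[1] // 2
    window = amplitude[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return float(window.mean())


def pseudo_domain_labels(images: np.ndarray, num_domains: int, radius: int = 2) -> np.ndarray:
    """Assign each image a pseudo-domain id in [0, num_domains).

    Assumption (not taken from the paper): images are ranked by their
    low-frequency amplitude score and the ranking is cut into
    `num_domains` near-equal groups.
    """
    scores = np.array([low_freq_amplitude(img, radius) for img in images])
    order = np.argsort(scores, kind="stable")  # indices from lowest to highest score
    labels = np.empty(len(images), dtype=np.int64)
    for domain_id, chunk in enumerate(np.array_split(order, num_domains)):
        labels[chunk] = domain_id
    return labels
```

In a setup like this, `pseudo_domain_labels` would be called on the real images (for example, per class) before condensation, and the resulting ids would stand in for the missing ground-truth domain annotations. Low-frequency amplitude is a plausible proxy because it tends to capture global appearance such as brightness and color style rather than object shape, though how well that separates domains depends on the dataset.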