DAM: Domain-Aware Module for Multi-Domain Dataset Condensation

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing dataset condensation (DC) methods overlook the multi-domain heterogeneity of modern datasets, which limits how well the condensed data generalizes across domains. This paper proposes Multi-Domain Dataset Condensation (MDDC), an extension of DC that aims to condense data that generalizes in both single-domain and multi-domain settings. The approach has two key components: (1) a Domain-Aware Module (DAM), active only during condensation, that embeds domain-related features into each synthetic image via learnable spatial masks; and (2) a pseudo-domain labeling method based on low-frequency amplitude statistics, which removes the need for ground-truth domain annotations. The paper reports consistent improvements over baseline dataset condensation methods in in-domain, out-of-domain, and cross-architecture evaluations, while keeping the same images-per-class (IPC) budget as prior methods.

📝 Abstract

Dataset Condensation (DC) has emerged as a promising solution to mitigate the computational and storage burdens associated with training deep learning models. However, existing DC methods largely overlook the multi-domain nature of modern datasets, which are increasingly composed of heterogeneous images spanning multiple domains. In this paper, we extend DC and introduce Multi-Domain Dataset Condensation (MDDC), which aims to condense data that generalizes across both single-domain and multi-domain settings. To this end, we propose the Domain-Aware Module (DAM), a training-time module that embeds domain-related features into each synthetic image via learnable spatial masks. As explicit domain labels are mostly unavailable in real-world datasets, we employ frequency-based pseudo-domain labeling, which leverages low-frequency amplitude statistics. DAM is active only during the condensation process, thus preserving the same images per class (IPC) as prior methods. Experiments show that DAM consistently improves in-domain, out-of-domain, and cross-architecture performance over baseline dataset condensation methods.
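To make the idea of frequency-based pseudo-domain labeling more concrete, here is a minimal NumPy sketch. It is not the paper's implementation. The function names, the `radius` parameter, the use of a mean over a centered low-frequency window, and the rank-based split into equally sized groups are all assumptions chosen for illustration; the abstract states only that the labels are derived from low-frequency amplitude statistics.

```python
import numpy as np


def low_freq_amplitude(images, radius=4):
    """Mean Fourier amplitude inside a centered low-frequency window.

    images: array of shape (N, H, W), grayscale.
    radius: half-width of the window (hypothetical parameter).
    Returns an array of shape (N,).
    """
    # 2D FFT per image, shifted so the zero frequency sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(images, axes=(-2, -1)), axes=(-2, -1))
    amplitude = np.abs(spectrum)
    h, w = images.shape[-2:]
    cy, cx = h // 2, w // 2
    window = amplitude[:, cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    return window.mean(axis=(1, 2))


def pseudo_domain_labels(images, num_domains=2, radius=4):
    """Assign pseudo-domain labels by ranking the low-frequency statistic.

    Assumption for this sketch: images are sorted by the statistic and split
    into num_domains groups of (nearly) equal size. A clustering method could
    be substituted here.
    """
    stats = low_freq_amplitude(images, radius)
    order = np.argsort(stats, kind="stable")
    labels = np.empty(len(images), dtype=np.int64)
    for domain, idx in enumerate(np.array_split(order, num_domains)):
        labels[idx] = domain
    return labels
```

In this sketch, images with similar global appearance (for example, overall brightness or coarse texture, which dominate the low frequencies) tend to receive the same pseudo-domain label, without any ground-truth domain annotation being consulted.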

Problem

Research questions and friction points this paper is trying to address.

Addresses multi-domain dataset condensation challenges

Proposes Domain-Aware Module for cross-domain generalization

Enhances performance without explicit domain labels

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Domain-Aware Module for multi-domain condensation

Uses frequency-based pseudo-domain labeling without explicit labels

Embeds domain features via learnable spatial masks
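One way to picture learnable spatial masks is sketched below in NumPy. This is an illustrative assumption, not the paper's DAM: the summary does not say how the masks are parameterized or combined with the image. Here each synthetic image is assumed to carry one mask logit map per pseudo-domain, the logits are normalized with a softmax across domains at every pixel, and each mask weights the image to give a domain-specific view. The names `domain_views` and `mask_logits` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def domain_views(image, mask_logits):
    """Split one synthetic image into per-domain views using spatial masks.

    image:       array of shape (C, H, W).
    mask_logits: array of shape (D, H, W); in a real pipeline these would be
                 learnable parameters updated during condensation (assumed).
    Returns (views, masks) with shapes (D, C, H, W) and (D, H, W).
    """
    # Masks compete per pixel: at each location they sum to 1 across domains.
    masks = softmax(mask_logits, axis=0)
    # Broadcast each mask over the channel dimension.
    views = masks[:, None, :, :] * image[None, :, :, :]
    return views, masks
```

Because the masks in this sketch sum to one at every pixel, the per-domain views add back up to the original image, so the masks can be discarded after condensation and only the image itself is kept. That is consistent with the statement that DAM is active only during condensation and leaves the IPC budget unchanged.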

🔎 Similar Papers

Elucidating the Design Space of Dataset Condensation