🤖 AI Summary
To address insufficient cross-domain generalization caused by source-target distribution shifts in domain adaptation, this paper proposes the Frequency-Pixel Connect (FPC) framework—the first targeted data augmentation paradigm that synergistically integrates frequency and pixel domains. FPC synthesizes semantically consistent yet domain-discriminative samples by fusing source-domain pixel content with target-domain spectral magnitude. It requires no expert priors, is dataset-agnostic, and supports unified adaptation across multimodal domains—including vision, medical imaging, audio, and astronomy. Leveraging FFT/IFFT-based spectral manipulation, cross-domain mixing, and contrastive learning to enforce intra-class compactness and inter-class separation, FPC achieves average cross-domain accuracy gains of 4.2–9.7% on four real-world benchmarks. These improvements significantly surpass both generic and dataset-specific augmentation methods, empirically validating the broad robustness benefits of joint frequency-spatial enhancement for out-of-distribution generalization.
📝 Abstract
Out-of-domain (OOD) robustness under domain adaptation settings, where labeled source data and unlabeled target data come from different distributions, is a key challenge in real-world applications. A common approach to improving OOD robustness is through data augmentations. However, in real-world scenarios, models trained with generic augmentations can only improve marginally when generalized under distribution shifts toward unlabeled target domains. While dataset-specific targeted augmentations can address this issue, they typically require expert knowledge and extensive prior data analysis to identify the nature of the datasets and domain shift. To address these challenges, we propose Frequency-Pixel Connect, a domain-adaptation framework that enhances OOD robustness by introducing a targeted augmentation in both the frequency space and pixel space. Specifically, we mix the amplitude spectrum and pixel content of a source image and a target image to generate augmented samples that introduce domain diversity while preserving the semantic structure of the source image. Unlike previous targeted augmentation methods that are both dataset-specific and limited to the pixel space, Frequency-Pixel Connect is dataset-agnostic, enabling broader and more flexible applicability beyond natural image datasets. We further analyze the effectiveness of Frequency-Pixel Connect by evaluating the performance of our method connecting same-class cross-domain samples while separating different-class examples. We demonstrate that Frequency-Pixel Connect significantly improves cross-domain connectivity and outperforms previous generic methods on four diverse real-world benchmarks across vision, medical, audio, and astronomical domains, and it also outperforms other dataset-specific targeted augmentation methods.