🤖 AI Summary
In real-world scenarios, variations in image backgrounds, styles, and acquisition devices severely degrade model out-of-distribution (OOD) generalization. Generic data augmentation yields inconsistent gains under such shifts, while dataset-specific augmentation relies heavily on expert priors. Existing methods struggle to jointly optimize frequency-domain adaptability and pixel-level detail preservation. To address this, we propose D-GAP, a gradient-guided adaptive frequency-spatial joint augmentation framework. D-GAP is the first method to generate frequency-sensitivity maps directly from task gradients, enabling prior-free frequency-amplitude interpolation and pixel-wise mixing in a synergistically optimized manner. This effectively mitigates model overfitting to domain-specific frequency components. Extensive experiments show consistent improvements over both generic and dataset-customized augmentation baselines: +5.3% average accuracy on four real-world datasets and +1.8% on three domain-adaptation benchmarks.
📝 Abstract
Out-of-domain (OOD) robustness is challenging to achieve in real-world computer vision applications, where shifts in image background, style, and acquisition instruments routinely degrade model performance. Generic augmentations show inconsistent gains under such shifts, whereas dataset-specific augmentations require expert knowledge and prior analysis. Moreover, prior studies show that neural networks adapt poorly to domain shifts because they exhibit a learning bias toward domain-specific frequency components. Perturbing frequency values can mitigate this bias but overlooks pixel-level details, leading to suboptimal performance. To address these problems, we propose D-GAP (Dataset-agnostic and Gradient-guided augmentation in Amplitude and Pixel spaces), which improves OOD robustness by introducing targeted augmentation in both the amplitude (frequency) space and the pixel space. Unlike conventional handcrafted augmentations, D-GAP computes sensitivity maps in the frequency space from task gradients, which reflect how strongly the model responds to different frequency components, and uses these maps to adaptively interpolate amplitudes between source and target samples. In this way, D-GAP reduces the learning bias in frequency space, while a complementary pixel-space blending procedure restores fine spatial details. Extensive experiments on four real-world datasets and three domain-adaptation benchmarks show that D-GAP consistently outperforms both generic and dataset-specific augmentations, improving average OOD performance by +5.3% on real-world datasets and +1.8% on benchmark datasets.
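The amplitude-space interpolation and pixel-space blending described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `sensitivity` map is taken as a given input (D-GAP derives it from task gradients), and `lam` and `beta` are illustrative mixing coefficients.

```python
import numpy as np

def amplitude_mix(source, target, sensitivity, lam=0.5, beta=0.7):
    """Hypothetical sketch of frequency-space amplitude interpolation.

    source, target : HxW grayscale images (float arrays)
    sensitivity    : HxW map in [0, 1]; in D-GAP this comes from task
                     gradients, here it is simply provided by the caller
    lam, beta      : illustrative interpolation / blending weights
    """
    fs, ft = np.fft.fft2(source), np.fft.fft2(target)
    amp_s, phase_s = np.abs(fs), np.angle(fs)
    amp_t = np.abs(ft)
    # Interpolate amplitudes more strongly where the model is sensitive,
    # while keeping the source image's phase (spatial layout) intact.
    amp_mix = (1 - lam * sensitivity) * amp_s + lam * sensitivity * amp_t
    mixed = np.real(np.fft.ifft2(amp_mix * np.exp(1j * phase_s)))
    # Pixel-space blending restores fine spatial details lost to the
    # amplitude perturbation.
    return beta * mixed + (1 - beta) * source
```

With an all-zero sensitivity map the amplitudes are untouched and the output reduces to the source image, which makes the interpolation's behavior easy to check at the boundary.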