🤖 AI Summary
Existing Single-Domain Generalized Object Detection (S-DGOD) methods rely on generic data augmentation that lacks physically grounded priors, and thus struggle to improve cross-domain robustness. This work pioneers the integration of atmospheric optical principles into S-DGOD, proposing a physically interpretable perturbation model that operates in the image frequency domain. The model enables controllable synthesis of non-ideal imaging conditions, including scattering, blur, and chromatic bias, by explicitly modeling their spectral signatures. Crucially, the approach requires no architectural or loss-function modifications; instead, it improves domain-invariant representation learning solely through physics-guided frequency-domain augmentation. On the DWD and Cityscapes-C benchmarks, the method achieves absolute mAP gains of +7.3% and +7.2% over strong baselines, respectively, outperforming all existing S-DGOD approaches. This establishes a novel, interpretable, and physically grounded paradigm for generalizable object detection.
📝 Abstract
Single-Domain Generalized Object Detection (S-DGOD) aims to train an object detector on a single source domain so that it performs robustly across a variety of unseen target domains. Existing S-DGOD approaches often rely on data augmentation strategies, typically a composition of visual transformations, to enhance the detector's generalization ability. However, without real-world prior knowledge, such augmentation contributes little to the diversity of the training data distribution. To address this issue, we propose PhysAug, a novel data augmentation method based on a physical model of non-ideal imaging conditions, to improve adaptability in S-DGOD tasks. Drawing upon the principles of atmospheric optics, we develop a universal perturbation model that serves as the foundation for PhysAug. Given that visual perturbations typically arise from the interaction of light with atmospheric particles, we use the image frequency spectrum to simulate real-world variations during training. This approach encourages the detector to learn domain-invariant representations, thereby enhancing its ability to generalize across various settings. Without altering the network architecture or loss function, our approach significantly outperforms the state of the art across various S-DGOD datasets. In particular, it achieves substantial improvements of 7.3% and 7.2% over the baseline on DWD and Cityscapes-C, respectively, highlighting its enhanced generalizability in real-world settings.
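To make the idea of frequency-domain augmentation concrete, the following is a minimal NumPy sketch of a generic spectral perturbation. It is not the PhysAug perturbation model, which the abstract does not specify; the function name `frequency_perturb` and the parameters `strength`, `blur_sigma`, and `color_shift` are hypothetical and chosen only for illustration. The sketch assumes an H x W x C float image in [0, 1] and combines three simple effects: a Gaussian low-pass (blur-like attenuation of high frequencies), random amplitude jitter, and a per-channel change of the DC term (a crude stand-in for chromatic bias).

```python
import numpy as np


def frequency_perturb(image, strength=0.3, blur_sigma=0.15, color_shift=0.1, rng=None):
    """Illustrative spectral perturbation; NOT the PhysAug model.

    All parameters are hypothetical knobs introduced for this sketch.
    `image` is an H x W x C float array with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    image = np.asarray(image, dtype=np.float64)
    h, w, c = image.shape

    # Frequency grid (cycles per pixel) matching the rfft2 half-spectrum layout.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Gaussian low-pass transfer function: attenuates high frequencies.
    lowpass = np.exp(-(radius**2) / (2.0 * blur_sigma**2))

    # Random multiplicative jitter on the spectrum, shared across channels.
    jitter = 1.0 + strength * rng.uniform(-1.0, 1.0, size=radius.shape)
    jitter[0, 0] = 1.0  # leave the mean intensity to the color-shift step

    out = np.empty_like(image)
    for ch in range(c):
        spectrum = np.fft.rfft2(image[..., ch])
        spectrum = spectrum * lowpass * jitter
        # Scale the DC term per channel to shift the channel mean (color cast).
        spectrum[0, 0] *= 1.0 + color_shift * rng.uniform(-1.0, 1.0)
        out[..., ch] = np.fft.irfft2(spectrum, s=(h, w))

    return np.clip(out, 0.0, 1.0)
```

In a training pipeline, a function like this would be applied to each image before it is passed to the detector. Because no geometric transform is involved, the bounding-box annotations can be reused unchanged, and the network architecture and loss function are left untouched, which is consistent with the constraint stated in the abstract.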