AI Summary
To address the weak generalization of single-domain generalization (SDG) object detectors on unseen target domains, this paper proposes a framework that combines semantic guidance, directional modeling, and frequency-domain augmentation. The method has three components: (1) CLIP-based semantic priors enforce cross-domain semantic alignment; (2) the von Mises-Fisher distribution explicitly models feature directionality, improving discriminability and structural robustness; (3) a joint Fourier amplitude-phase perturbation strategy simulates domain shifts in the frequency domain, improving feature diversity and structural consistency. Experiments on an adverse-weather driving benchmark show that the approach outperforms existing state-of-the-art SDG methods, supporting its generalization ability and robustness in cross-domain scenarios.
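For reference, the standard density of the von Mises-Fisher distribution named in component (2) is given below. The symbols are generic textbook notation rather than the paper's own, and how the paper turns this density into a training objective is not stated in this summary.

```latex
f_p(\mathbf{x};\, \boldsymbol{\mu}, \kappa)
  = C_p(\kappa)\, \exp\!\left(\kappa\, \boldsymbol{\mu}^{\top} \mathbf{x}\right),
\qquad
C_p(\kappa) = \frac{\kappa^{\,p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)}
```

Here $\mathbf{x}$ and the mean direction $\boldsymbol{\mu}$ are unit vectors in $\mathbb{R}^p$, $\kappa \ge 0$ is the concentration parameter, and $I_{\nu}$ is the modified Bessel function of the first kind. A larger $\kappa$ concentrates the probability mass more tightly around $\boldsymbol{\mu}$, which is why the distribution suits L2-normalized embeddings, where only the direction of a feature carries information.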
Abstract
Single Domain Generalization (SDG) for object detection aims to train a model on a single source domain that generalizes effectively to unseen target domains. While recent methods such as CLIP-based semantic augmentation have shown promise, they often overlook the underlying structure of feature distributions and the frequency-domain characteristics that are critical for robustness. In this paper, we propose a novel framework that enhances SDG object detection by integrating the von Mises-Fisher (vMF) distribution and the Fourier transform into a CLIP-guided pipeline. Specifically, we model the directional features of object representations using vMF to better capture domain-invariant semantic structures in the embedding space. Additionally, we introduce a Fourier-based augmentation strategy that perturbs amplitude and phase components to simulate domain shifts in the frequency domain, further improving feature robustness. Our method not only preserves the semantic alignment benefits of CLIP but also enriches feature diversity and structural consistency across domains. Extensive experiments on the diverse-weather driving benchmark demonstrate that our approach outperforms the existing state-of-the-art method.
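The Fourier-based augmentation described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fourier_perturb`, the parameters `amp_scale` and `phase_scale`, the Gaussian noise model, and the choice to perturb an image array (rather than intermediate features) are all assumptions made for illustration.

```python
import numpy as np

def fourier_perturb(image, amp_scale=0.1, phase_scale=0.05, rng=None):
    """Jitter the Fourier amplitude and phase of an H x W x C float array.

    Hypothetical helper: the noise model and default scales are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng

    # 2-D FFT over the spatial axes, applied to each channel independently.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Multiplicative log-normal noise keeps the amplitude non-negative.
    amplitude = amplitude * np.exp(amp_scale * rng.standard_normal(amplitude.shape))
    # Additive Gaussian noise on the phase (in radians).
    phase = phase + phase_scale * rng.standard_normal(phase.shape)

    perturbed = amplitude * np.exp(1j * phase)
    # Independent noise breaks the spectrum's conjugate symmetry,
    # so the inverse transform is complex; keep only the real part.
    return np.real(np.fft.ifft2(perturbed, axes=(0, 1)))
```

Setting both scales to zero returns the input unchanged up to floating-point error, which is a convenient sanity check. In this sketch the amplitude noise mostly alters style-like statistics such as contrast and texture, while the phase noise is kept small because phase carries most of the spatial structure that a detector relies on.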