🤖 AI Summary
This paper addresses the degradation of content representation and the misalignment of orientation predictions that arise in oriented object detection when image styles shift across unknown target domains, and formally introduces the task of domain-generalized oriented object detection. To enhance cross-domain robustness, the authors propose a dual-cooperative framework: (i) rotation-aware content consistency learning (RAC) to preserve orientation invariance, and (ii) style-enhanced consistency learning (SEC) to improve content generalizability, with style diversification driven by CLIP-guided style hallucination. Extensive experiments on multiple cross-domain benchmarks show substantial improvements over prior methods, establishing new state-of-the-art performance. Notably, the approach boosts both detection accuracy and orientation stability on unseen domains.
📝 Abstract
Oriented object detection has developed rapidly in the past few years, but most existing methods assume that training and testing images follow the same statistical distribution, an assumption that rarely holds in practice. In this paper, we propose the task of domain generalized oriented object detection, which aims to study the generalization of oriented object detectors to arbitrary unseen target domains. Learning domain generalized oriented object detectors is particularly challenging, as cross-domain style variation not only degrades the content representation but also leads to unreliable orientation predictions. To address these challenges, we propose a generalized oriented object detector (GOOD). After style hallucination via contrastive language-image pre-training (CLIP), it consists of two key components: rotation-aware content consistency learning (RAC) and style-enhanced consistency learning (SEC). RAC allows the oriented object detector to learn stable orientation representations from style-diversified samples, while SEC further improves the generalization of content representations across image styles. Extensive experiments on multiple cross-domain settings demonstrate that GOOD achieves state-of-the-art performance. Source code will be publicly available.
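The two consistency objectives in the abstract can be made concrete with a minimal plain-Python sketch. Here we assume (as one plausible reading, not the paper's actual formulation) that RAC enforces that orientation predictions on a rotated, style-hallucinated view shift by exactly the applied rotation, and that SEC penalizes discrepancies between content features of the original and style-hallucinated views; all function names and loss forms below are illustrative.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference between two angles (radians)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

def rac_loss(pred_orig, pred_styled_rot, rotation):
    """Rotation-aware content consistency (illustrative):
    after rotating the style-hallucinated image by `rotation`,
    each predicted orientation should shift by the same amount."""
    return sum(angle_diff(pr, po + rotation) ** 2
               for po, pr in zip(pred_orig, pred_styled_rot)) / len(pred_orig)

def sec_loss(feat_orig, feat_styled):
    """Style-enhanced consistency (illustrative): content features of
    the original and style-hallucinated views should match (MSE)."""
    return sum((a - b) ** 2
               for a, b in zip(feat_orig, feat_styled)) / len(feat_orig)

# Perfectly consistent predictions/features give zero loss.
print(rac_loss([0.1, 0.5], [0.4, 0.8], 0.3))  # → 0.0 (up to float error)
print(sec_loss([1.0, 2.0], [1.0, 2.0]))       # → 0.0
```

In a full detector these terms would be added to the standard detection loss; the sketch only fixes the shape of the consistency signals, not their weighting.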