Visually Similar Pair Alignment for Robust Cross-Domain Object Detection

📅 2025-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-domain object detection suffers from performance degradation caused by visual discrepancies between source and target domains, such as color shifts, orientation variations, and fog; existing feature alignment methods neglect sample-level visual similarity and thus fail to jointly address domain-specific biases and general visual variations. This paper proposes a visual-similarity-aware source-target feature alignment framework: it introduces an updatable memory bank that separately stores and dynamically retrieves foreground and background features from the source domain, and designs a fine-grained, disentangled visual similarity metric for discriminative feature matching. To our knowledge, this is the first work to empirically validate the critical role of visual-similarity-based alignment in cross-domain detection. Our method achieves 53.1 mAP on Foggy Cityscapes and 62.3 mAP on Sim10k, surpassing state-of-the-art approaches by 1.2 and 4.1 mAP, respectively.

📝 Abstract
Domain gaps between training data (source) and real-world environments (target) often degrade the performance of object detection models. Most existing methods aim to bridge this gap by aligning features across source and target domains, but they often fail to account for visual differences, such as color or orientation, within alignment pairs. This limitation makes domain adaptation less effective, as the model must manage both domain-specific shifts (e.g., fog) and visual variations simultaneously. In this work, we demonstrate for the first time, using a custom-built dataset, that aligning visually similar pairs significantly improves domain adaptation. Based on this insight, we propose a novel memory-based system to enhance domain alignment. This system stores precomputed features of foreground objects and background areas from the source domain, which are periodically updated during training. By retrieving visually similar source features for alignment with target foreground and background features, the model effectively addresses domain-specific differences while reducing the impact of visual variations. Extensive experiments across diverse domain shift scenarios validate our method's effectiveness, achieving 53.1 mAP on Foggy Cityscapes and 62.3 mAP on Sim10k, surpassing prior state-of-the-art methods by 1.2 and 4.1 mAP, respectively.
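The memory-based system described in the abstract (separate foreground/background banks of precomputed source features, periodically refreshed, queried by visual similarity) could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name, capacity, EMA-style update, and cosine-similarity retrieval are all assumptions.

```python
import numpy as np

class SourceFeatureMemory:
    """Illustrative memory bank of source-domain feature vectors.

    Foreground and background features are kept in separate banks so a
    target feature is only matched against features of the same kind.
    Shapes, capacity, and the update rule are assumptions for illustration.
    """

    def __init__(self, dim, capacity=256, seed=0):
        rng = np.random.default_rng(seed)
        # Separate banks for foreground and background source features.
        self.banks = {
            "foreground": rng.standard_normal((capacity, dim)),
            "background": rng.standard_normal((capacity, dim)),
        }

    def update(self, kind, features, momentum=0.9):
        """Periodically refresh a bank (here: an exponential moving average)."""
        bank = self.banks[kind]
        n = min(len(features), len(bank))
        bank[:n] = momentum * bank[:n] + (1.0 - momentum) * features[:n]

    def retrieve(self, kind, query):
        """Return the stored source feature most similar to `query` (cosine)."""
        bank = self.banks[kind]
        q = query / np.linalg.norm(query)
        b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
        idx = int(np.argmax(b @ q))
        return bank[idx]
```

During training, target foreground and background features would query their respective banks, and alignment would be computed against the retrieved (visually most similar) source features rather than arbitrary ones.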
Problem

Research questions and friction points this paper is trying to address.

Aligning visually similar pairs for better domain adaptation
Reducing domain gaps in object detection models
Improving cross-domain feature alignment with memory-based system
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligning visually similar pairs for domain adaptation
Memory-based system storing precomputed source features
Retrieving similar source features for target alignment
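The retrieve-then-align idea in the bullets above can be sketched as a simple objective. The squared-distance form and all names below are illustrative assumptions; the paper's exact alignment loss is not given in this summary.

```python
import numpy as np

def similarity_alignment_loss(target_feats, source_bank):
    """Align each target feature with its most visually similar source feature.

    Hypothetical objective: pick, for each target feature, the source
    feature with highest cosine similarity, then penalize the squared
    distance between the pair. The actual paper may use a different
    (e.g., adversarial or contrastive) alignment objective.
    """
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    s = source_bank / np.linalg.norm(source_bank, axis=1, keepdims=True)
    sims = t @ s.T                 # cosine similarity matrix (targets x sources)
    nearest = sims.argmax(axis=1)  # visually most similar source per target
    pairs = source_bank[nearest]
    return float(np.mean(np.sum((target_feats - pairs) ** 2, axis=1)))
```

Matching on similarity first means the alignment term mostly reflects the domain shift (e.g., fog) rather than incidental visual variation between the paired samples.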