Remedying Target-Domain Astigmatism for Cross-Domain Few-Shot Object Detection

📅 2026-03-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the “target-domain astigmatism” problem in cross-domain few-shot object detection (CD-FSOD): in the target domain, model attention is scattered, which leads to inaccurate localization and redundant predictions. Drawing inspiration from human foveal vision, the authors propose a center-periphery attention refinement framework that integrates three complementary modules: Positive Pattern Refinement, Negative Context Modulation, and Textual Semantic Alignment. This design focuses attention on semantically relevant objects and sharpens boundary discrimination. The authors describe the phenomenon as previously overlooked and give it its name. Using an analysis of attention distances across transformer layers, class-prototype-guided attention reshaping, and vision-language semantic alignment, the method reports state-of-the-art results on six CD-FSOD benchmarks.

📝 Abstract

Cross-domain few-shot object detection (CD-FSOD) aims to adapt detectors pretrained on a source domain to target domains with limited annotations, a setting that suffers from both severe domain shift and data scarcity. In this work, we identify a previously overlooked phenomenon: models exhibit dispersed, unfocused attention in target domains, leading to imprecise localization and redundant predictions, much as a person with impaired vision cannot bring objects into focus. We therefore call it the target-domain astigmatism problem. An analysis of attention distances across transformer layers reveals that regular fine-tuning inherently tends to remedy this problem, but the results remain far from satisfactory; this paper aims to strengthen that tendency. Inspired by the fovea-style human visual system, we reinforce it through a center-periphery attention refinement framework, which contains (1) a Positive Pattern Refinement module that reshapes attention toward semantic objects using class-specific prototypes, simulating the central visual region; (2) a Negative Context Modulation module that enhances boundary discrimination by modeling background context, simulating the peripheral visual region; and (3) a Textual Semantic Alignment module that strengthens the center-periphery distinction through cross-modal cues. Our bio-inspired approach transforms astigmatic attention into focused patterns, substantially improving adaptation to target domains. Experiments on six challenging CD-FSOD benchmarks consistently show improved detection accuracy and establish new state-of-the-art results.
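
The abstract does not spell out how attention distance is measured. As a rough illustration only, the sketch below uses the commonly used "mean attention distance" for vision transformers: for each query patch, the attention-weighted average spatial distance to all key patches, averaged over queries. The function name `mean_attention_distance`, the patch-grid layout, and the use of patch units are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def mean_attention_distance(attn, grid_h, grid_w):
    """Per-head mean attention distance, in patch units (illustrative sketch).

    attn: array of shape (heads, N, N) with N = grid_h * grid_w, where each
          row is a probability distribution over key patches.
    """
    # Row/column coordinates of every patch in row-major order, shape (N, 2).
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

    # Pairwise Euclidean distance between patch centres, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Attention-weighted distance per query, then average over queries.
    per_query = (attn * dist[None, :, :]).sum(axis=-1)  # (heads, N)
    return per_query.mean(axis=-1)                      # (heads,)


# Hypothetical usage: one head attending uniformly over a 2x2 patch grid.
uniform = np.full((1, 4, 4), 0.25)
print(mean_attention_distance(uniform, 2, 2))
```

Under this definition, a head that attends only to its own patch scores 0, while larger values indicate attention spread over distant patches, which is one way to read the dispersed attention the abstract describes.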

Problem

Research questions and friction points this paper is trying to address.

cross-domain few-shot object detection

target-domain astigmatism

attention dispersion

domain shift

data scarcity

Innovation

Methods, ideas, or system contributions that make the work stand out.

target-domain astigmatism

center-periphery attention

few-shot object detection