DSAA: Dual-Stage Attribute Activation for Fine-grained Open Vocabulary Detection

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a weakness of open-vocabulary object detection: in fine-grained recognition, dominant category signals suppress attribute information such as color and texture, so attributes are bound to the wrong object instances. To mitigate this, the authors propose a two-stage attribute activation framework. First, an Attribute Prefix Adapter injects explicit attribute priors during text embedding. Second, a Key/Value modulation module strengthens the Key and Value vectors of attribute tokens during BERT encoding. An attribute-aware contrastive loss is additionally introduced to improve discrimination among instances of the same category with different attributes. The approach combines prefix-guided prompting with attention modulation to strengthen attribute semantics and attribute-object binding. Experiments on the FG-OVD benchmark show improved fine-grained detection across several mainstream open-vocabulary models.
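The Key/Value modulation idea can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes single-head scaled dot-product attention and a fixed scalar gain applied to the Key and Value rows of attribute tokens. The function name `attention_with_kv_gain` and the parameters `key_gain` and `value_gain` are hypothetical, and the paper may compute the modulation differently (for example with learned, token-dependent factors).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_kv_gain(queries, keys, values, is_attribute,
                           key_gain=1.5, value_gain=1.5):
    """Single-head attention where attribute tokens get scaled K and V rows.

    queries, keys, values: lists of equal-length float vectors, one per token.
    is_attribute: list of bools marking attribute tokens (assumed to be given).
    key_gain, value_gain: hypothetical scalar gains, not taken from the paper.
    """
    # Scale only the rows that belong to attribute tokens.
    keys_mod = [[x * (key_gain if flag else 1.0) for x in row]
                for row, flag in zip(keys, is_attribute)]
    values_mod = [[x * (value_gain if flag else 1.0) for x in row]
                  for row, flag in zip(values, is_attribute)]

    dim = len(queries[0])
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_mod]
        weights.append(softmax(scores))

    out_dim = len(values_mod[0])
    outputs = [[sum(w * v[j] for w, v in zip(w_row, values_mod))
                for j in range(out_dim)]
               for w_row in weights]
    return outputs, weights
```

Note that scaling a Key only raises the attention a token receives when its query-key dot product is positive, so a plain scalar gain is a simplification; it is used here only to make the mechanism visible.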
📝 Abstract
Open-Vocabulary Object Detection (OVD) models break the limitations of closed-set detection, enabling the identification of unseen categories through natural language prompts. However, they exhibit notable limitations in fine-grained detection tasks involving attributes like color, material, and texture. We attribute this performance bottleneck in OVD models to a core issue: when category signals dominate, OVD models tend to marginalize attribute information during inference. This leads to incorrect binding between attributes and target objects. To address this, we propose the Dual-Stage Attribute Activation (DSAA) framework, which enhances fine-grained detection capabilities by strengthening attribute semantics at two critical stages. In the text embedding stage, we employ an Attribute Prefix Adapter (APA) module to generate attribute prefixes that inject explicit attribute priors. To further amplify the influence of these attributes, our Key/Value (K/V) Modulator module then intervenes during the BERT encoding phase, selectively enhancing the Key and Value vectors of the corresponding attribute tokens. In addition, we introduce an attribute-aware contrastive loss to improve discrimination among same-category instances with different attributes during training. Experimental results on the FG-OVD benchmark demonstrate the effectiveness of our method across various mainstream open-vocabulary models.
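To make the attribute-aware contrastive loss more concrete, here is a minimal sketch under stated assumptions. The abstract does not give the loss formula, so this uses a generic InfoNCE-style objective as a stand-in: one region embedding is pulled toward the text embedding with the correct attributes and pushed away from text embeddings of the same category with different attributes. The function name `attribute_contrastive_loss`, the cosine similarity, and the `temperature` value are all illustrative assumptions, not details from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def attribute_contrastive_loss(region, positive_text, negative_texts,
                               temperature=0.07):
    """InfoNCE-style loss for one region embedding (generic stand-in).

    region: embedding of a detected region.
    positive_text: embedding of the caption with the correct attributes.
    negative_texts: embeddings of same-category captions whose attributes
        differ (for example "red mug" vs. "blue mug").
    temperature: hypothetical softmax temperature.
    """
    logits = [cosine(region, positive_text) / temperature]
    logits += [cosine(region, neg) / temperature for neg in negative_texts]

    # Cross-entropy with the positive at index 0, via a stable log-sum-exp.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[0]
```

The loss is close to zero when the region is much more similar to the correct-attribute caption than to the same-category negatives, and grows as a wrong-attribute caption becomes the closest match.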
Problem

Research questions and friction points this paper is trying to address.

Open-Vocabulary Object Detection
Fine-grained Detection
Attribute Binding
Semantic Marginalization
Unseen Categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Stage Attribute Activation
Open-Vocabulary Object Detection
Attribute Semantics Enhancement
Key/Value Modulator
Attribute-Aware Contrastive Loss
Authors

Donghong Jiang
Beijing University of Posts and Telecommunications

Endian Lin
Beijing University of Posts and Telecommunications

Hanqing Liu
Beijing University of Posts and Telecommunications

Mingjie Liu
Assistant Professor, Department of Chemistry, University of Florida
computational materials science, energy conversion and storage, machine learning, data science, AI-driven materials design

Luoping Cui
Beijing University of Posts and Telecommunications

Zhao Yang
Beijing E-Hualu Information Technology Co., Ltd.

Chuang Zhu
Beijing University of Posts and Telecommunications; State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing, China