Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection

📅 2026-04-06
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the performance degradation of open-vocabulary object detection models under significant domain shift. The degradation stems from category labels that are scarce and semantically coarse in domain-specific tasks, which prevents models from capturing fine-grained semantics. To overcome this limitation, the authors propose the HSA-DINO framework, which uses a multi-scale prompt bank and an image feature pyramid to extract hierarchical semantics. It selects domain-specific local prompts to progressively refine textual representations from coarse to fine granularity. In addition, a semantic-aware router dynamically chooses the appropriate augmentation strategy during inference, so that parameter updates do not degrade the pre-trained model's generalization capability. On OV-COCO, several vertical-domain datasets, and modified benchmark settings, this parameter-efficient approach compares favorably with previous state-of-the-art methods and achieves a better trade-off between domain adaptation and open-vocabulary generalization.
πŸ“ Abstract
Open-vocabulary object detection (OVOD) enables models to detect any object category, including unseen ones. Benefiting from large-scale pre-training, existing OVOD methods achieve strong detection performance in general scenarios (e.g., OV-COCO) but suffer severe performance drops when transferred to downstream tasks with substantial domain shifts. This degradation stems from the scarcity and weak semantics of category labels in domain-specific tasks, as well as the inability of existing models to capture auxiliary semantics beyond coarse-grained category labels. To address these issues, we propose HSA-DINO, a parameter-efficient semantic augmentation framework for enhancing open-vocabulary object detection. Specifically, we propose a multi-scale prompt bank that leverages image feature pyramids to capture hierarchical semantics and select domain-specific local semantic prompts, progressively enriching textual representations from coarse to fine-grained levels. Furthermore, we introduce a semantic-aware router that dynamically selects the appropriate semantic augmentation strategy during inference, thereby preventing parameter updates from degrading the generalization ability of the pre-trained OVOD model. We evaluate HSA-DINO on OV-COCO, several vertical domain datasets, and modified benchmark settings. The results show that HSA-DINO performs favorably against previous state-of-the-art methods, achieving a superior trade-off between domain adaptability and open-vocabulary generalization.
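The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: selecting prompts from a per-scale bank and routing between augmented and original text embeddings. Every name here (`select_prompts`, `augment_text`, `route`, `domain_proto`, `threshold`, `weight`) is hypothetical, and the cosine-similarity top-k lookup and the threshold-based router are assumptions standing in for whatever HSA-DINO actually uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_prompts(query, bank, k):
    """Return the k prompt vectors whose keys are most similar to the query.

    `bank` is a list of (key, prompt) pairs; this key-matching scheme is an assumption.
    """
    ranked = sorted(bank, key=lambda kp: cosine(query, kp[0]), reverse=True)
    return [prompt for _, prompt in ranked[:k]]


def augment_text(text_emb, pyramid_queries, banks, k=1, weight=0.5):
    """Refine a text embedding one pyramid level at a time, coarse to fine.

    `pyramid_queries[i]` is a pooled image feature for level i (assumed),
    and `banks[i]` is the prompt bank for that level.
    """
    out = list(text_emb)
    for query, bank in zip(pyramid_queries, banks):
        chosen = select_prompts(query, bank, k)
        if not chosen:  # empty bank: leave the embedding unchanged at this level
            continue
        mean = [sum(col) / len(chosen) for col in zip(*chosen)]
        out = [o + weight * m for o, m in zip(out, mean)]
    return out


def route(text_emb, pyramid_queries, banks, domain_proto, threshold=0.5):
    """Apply augmentation only when the coarsest image feature looks in-domain.

    Otherwise return the original embedding, leaving the pre-trained behavior untouched.
    """
    score = cosine(pyramid_queries[0], domain_proto)
    if score >= threshold:
        return augment_text(text_emb, pyramid_queries, banks)
    return list(text_emb)
```

In this sketch, an image whose coarsest feature does not resemble the hypothetical domain prototype falls through to the unmodified text embedding, which is one simple way to keep general open-vocabulary behavior intact while still adapting on in-domain inputs.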
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary object detection
domain shift
semantic augmentation
category label scarcity
downstream task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

parameter-efficient
semantic augmentation
open-vocabulary object detection
multi-scale prompt bank
semantic-aware router