DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection

📅 2024-06-21

🏛️ Neural Information Processing Systems

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Class-agnostic object detection (OD) suffers from low recall, which in turn limits downstream tasks such as out-of-distribution object detection (OOD-OD). The paper traces part of this problem to hand-crafted text queries for vision-language models (VLMs): when query words overlap semantically, detection confidence drops and objects are missed. It proposes DiPEx, a self-supervised prompt learning framework that progressively expands a set of distinct, non-overlapping hyperspherical prompts without category annotations. Starting from generic parent prompts, DiPEx selects the parent with the highest semantic uncertainty and splits it into child prompts; dispersion losses keep the children well separated while preserving semantic consistency with their parent. The maximum angular coverage (MAC) of the semantic space serves as an early-termination criterion that keeps the prompt set from growing excessively. In class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, DiPEx surpasses other prompting methods by up to 20.1% in average recall (AR) and achieves a 21.3% AP improvement over SAM.

📝 Abstract

Class-agnostic object detection (OD) can be a cornerstone or a bottleneck for many downstream vision tasks. Despite considerable advancements in bottom-up and multi-object discovery methods that leverage basic visual cues to identify salient objects, consistently achieving a high recall rate remains difficult due to the diversity of object types and their contextual complexity. In this work, we investigate using vision-language models (VLMs) to enhance object detection via a self-supervised prompt learning strategy. Our initial findings indicate that manually crafted text queries often result in undetected objects, primarily because detection confidence diminishes when the query words exhibit semantic overlap. To address this, we propose a Dispersing Prompt Expansion (DiPEx) approach. DiPEx progressively learns to expand a set of distinct, non-overlapping hyperspherical prompts to enhance recall rates, thereby improving performance in downstream tasks such as out-of-distribution OD. Specifically, DiPEx initiates the process by self-training generic parent prompts and selecting the one with the highest semantic uncertainty for further expansion. The resulting child prompts are expected to inherit semantics from their parent prompts while capturing more fine-grained semantics. We apply dispersion losses to ensure high inter-class discrepancy among child prompts while preserving semantic consistency between parent-child prompt pairs. To prevent excessive growth of the prompt sets, we utilize the maximum angular coverage (MAC) of the semantic space as a criterion for early termination. We demonstrate the effectiveness of DiPEx through extensive class-agnostic OD and OOD-OD experiments on MS-COCO and LVIS, surpassing other prompting methods by up to 20.1% in AR and achieving a 21.3% AP improvement over SAM. The code is available at https://github.com/jason-lim26/DiPEx.
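The two objectives named in the abstract, dispersion among child prompts and consistency with the parent prompt, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the function names (`dispersion_loss`, `consistency_loss`) and the cosine-similarity forms below are assumptions chosen only to show the idea on unit-normalized (hyperspherical) embeddings.

```python
import numpy as np

def normalize(prompts):
    """Project each prompt embedding (one per row) onto the unit hypersphere."""
    norms = np.linalg.norm(prompts, axis=1, keepdims=True)
    return prompts / np.clip(norms, 1e-12, None)

def dispersion_loss(children):
    """Assumed form: mean cosine similarity over distinct child pairs.

    Lower values mean the child prompts are more spread out.
    """
    z = normalize(children)
    n = len(z)
    if n < 2:
        return 0.0  # a single prompt has no pair to disperse
    sim = z @ z.T
    off_diag = sim[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(off_diag.mean())

def consistency_loss(parent, children):
    """Assumed form: 1 minus the mean cosine similarity between parent and children."""
    z = normalize(children)
    p = normalize(parent[None, :])[0]
    return float(1.0 - (z @ p).mean())

# Toy example: two orthogonal children are fully dispersed under this measure.
children = np.array([[1.0, 0.0], [0.0, 1.0]])
parent = np.array([1.0, 1.0])
print(dispersion_loss(children), consistency_loss(parent, children))
```

In a training setup, terms like these would be weighted and minimized together; the weighting and the actual loss definitions should be taken from the paper or the linked repository.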

Problem

Research questions and friction points this paper is trying to address.

Enhancing object detection recall rates

Addressing semantic overlap in text queries

Improving class-agnostic object detection performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised prompt learning strategy

Dispersing Prompt Expansion approach

Maximum angular coverage criterion
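As a rough illustration of the maximum angular coverage idea, the sketch below measures the largest pairwise angle among a set of prompt embeddings and stops expansion once that angle no longer grows meaningfully. The exact definition and stopping rule are not stated on this page, so `max_angular_coverage`, `should_stop`, and the `min_gain_deg` threshold are hypothetical names and choices, not the paper's.

```python
import numpy as np

def max_angular_coverage(prompts):
    """Largest pairwise angle, in degrees, between prompt embeddings (one per row)."""
    z = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    cos = np.clip(z @ z.T, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return float(np.degrees(np.arccos(cos)).max())

def should_stop(coverage_history, min_gain_deg=1.0):
    """Assumed rule: stop when the latest expansion round widened coverage
    by less than min_gain_deg degrees."""
    if len(coverage_history) < 2:
        return False
    return coverage_history[-1] - coverage_history[-2] < min_gain_deg

# Toy example: two orthogonal prompts span roughly 90 degrees.
prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
print(max_angular_coverage(prompts))
print(should_stop([80.0, 90.0, 90.2]))
```

The design intent is simply that prompt expansion has diminishing returns: once new child prompts stop widening the covered region of the embedding space, adding more only enlarges the prompt set.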
