NoOVD: Novel Category Discovery and Embedding for Open-Vocabulary Object Detection

📅 2026-03-22
🤖 AI Summary
This work addresses a challenge in open-vocabulary object detection: objects from unseen (novel) categories are frequently misclassified as background, which degrades recall. To mitigate this issue, the authors propose NoOVD, a training framework that uses the knowledge of a frozen, pretrained vision-language model (VLM) to guide the discovery of novel categories. Its Knowledge-guided Feature Pyramid Network (K-FPN) module enables self-distillation without additional training data, preventing novel objects from being forcibly aligned with the background. In addition, a Recall-oriented Region Proposal Network (R-RPN) adjusts proposal confidence scores during inference to improve the recall of novel-category objects. In cross-dataset evaluations on OV-LVIS, OV-COCO, and Objects365, NoOVD consistently achieves superior performance across multiple metrics.
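The summary does not spell out how a frozen VLM can keep unlabeled novel objects from being trained as background. The Python sketch below illustrates only that underlying idea, under stated assumptions: proposal embeddings and category-name embeddings from a frozen VLM are taken as precomputed inputs, and the function `assign_proposal_roles`, the threshold `tau`, and the max-cosine rule are hypothetical choices for illustration. It is not NoOVD's K-FPN, which according to the abstract performs self-distillation inside the detector.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def assign_proposal_roles(
    region_embs: List[Sequence[float]],  # frozen-VLM embedding per proposal (assumed precomputed)
    text_embs: List[Sequence[float]],    # frozen-VLM embeddings of category names (hypothetical, non-empty)
    matched_to_gt: List[bool],           # True if the proposal matches an annotated base box
    tau: float = 0.3,                    # hypothetical similarity threshold
) -> List[str]:
    """Decide how each proposal is supervised (illustrative rule, not the paper's)."""
    roles = []
    for emb, matched in zip(region_embs, matched_to_gt):
        if matched:
            roles.append("base")          # supervised with its ground-truth base label
        elif max(cosine(emb, t) for t in text_embs) >= tau:
            roles.append("novel?")        # likely unlabeled object: kept out of the background loss
        else:
            roles.append("background")    # trained as background as usual
    return roles


# Toy 3-D embeddings: two category names and three proposals.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
region_embs = [[0.9, 0.1, 0.0],   # annotated base object
               [0.1, 0.95, 0.1],  # unannotated, close to the second name
               [0.0, 0.0, 1.0]]   # unannotated, unlike any name
roles = assign_proposal_roles(region_embs, text_embs, [True, False, False])
print(roles)  # ['base', 'novel?', 'background']
```

In this toy case the second proposal has no annotation but closely matches a category name, so it is excluded from the background loss rather than pushed toward background, which is the failure mode the paper targets.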

📝 Abstract
Despite the remarkable progress in open-vocabulary object detection (OVD), a significant gap remains between the training and testing phases. During training, the RPN and RoI heads often misclassify unlabeled novel-category objects as background, so some proposals are prematurely filtered out by the RPN while others are further misclassified by the RoI head. During testing, these proposals again receive low scores and are removed in post-processing, which leads to a significant drop in recall and ultimately weakens novel-category detection performance.

To address these issues, we propose NoOVD, a novel training framework that integrates a self-distillation mechanism grounded in the knowledge of frozen vision-language models (VLMs). Specifically, we design K-FPN, which leverages the pretrained knowledge of VLMs to guide the model in discovering novel-category objects and facilitates knowledge distillation without requiring additional data, thus preventing novel objects from being forcibly aligned with the background.

Additionally, we introduce R-RPN, which adjusts the confidence scores of proposals during inference to improve the recall of novel-category objects. Cross-dataset evaluations on OV-LVIS, OV-COCO, and Objects365 demonstrate that our approach consistently achieves superior performance across multiple metrics.
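The abstract states that R-RPN adjusts proposal confidence scores at inference time, but not how. As a hedged illustration of why such an adjustment can recover recall, the Python sketch below fuses RPN objectness with a per-proposal score from a frozen VLM using a weighted geometric mean (a common score-fusion choice, assumed here rather than taken from the paper) before top-k filtering. The functions `adjust_scores` and `top_k`, the weight `alpha`, and the VLM score itself are all hypothetical.

```python
from typing import List


def adjust_scores(objectness: List[float], vlm_scores: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion: s = objectness**(1 - alpha) * vlm_score**alpha (all in [0, 1])."""
    return [o ** (1.0 - alpha) * v ** alpha for o, v in zip(objectness, vlm_scores)]


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest scores, i.e. the proposals that survive filtering."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy example: proposal 2 is a novel object that the RPN scored low,
# but the frozen VLM (hypothetically) rates it highly.
objectness = [0.90, 0.80, 0.20, 0.70]
vlm_scores = [0.30, 0.20, 0.90, 0.10]
print(top_k(objectness, 2))                             # [0, 1]: novel proposal dropped
print(top_k(adjust_scores(objectness, vlm_scores), 2))  # [0, 2]: novel proposal kept
```

Fusing scores trades some ranking of confident base-class proposals for novel-class recall; in this sketch `alpha` controls that balance, and `alpha = 0` reproduces plain objectness ranking.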
Problem

Research questions and friction points this paper is trying to address.

open-vocabulary object detection
novel category discovery
region proposal network
recall drop
background misclassification
Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary object detection
novel category discovery
vision-language models
self-distillation
region proposal network
Yupeng Zhang
College of Intelligence and Computing, Tianjin University; Key Research Center for Surface Monitoring and Analysis of Relics, State Administration of Cultural Heritage
Ruize Han
SUAT
Computer Vision, Multimedia Analysis, Video Understanding, Active Vision
Zhiwei Chen
School of Artificial Intelligence, Nanchang University
Wei Feng
School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University
Computer Vision, Image Processing, Machine Learning
Liang Wan
College of Intelligence and Computing, Medical College, Tianjin University
computer vision, medical image processing