AI Summary
This work addresses catastrophic forgetting and costly retraining in open-vocabulary object detection when new categories must be learned continually. It defines the task of Continual Open-Vocabulary Detection (COVD) and introduces the Novel-114 benchmark. The study observes that pretrained visual encoders often already represent many novel concepts, and that the main bottleneck is the lack of stable vision–language semantic alignment. To this end, the authors propose NoIn-Det, a framework that introduces no additional parameters: it keeps the visual encoder frozen and fine-tunes only a small subset of text-branch parameters that are beneficial for learning new concepts. Experiments show that NoIn-Det consistently outperforms existing continual learning approaches for VLMs, acquiring new classes while preserving knowledge of previously seen ones.
Abstract
Open-vocabulary object detection (OVD) has made significant progress, enabling detectors to generalize from seen to unseen categories. However, real-world category spaces continually evolve, and existing OVD models still struggle with newly emerging concepts, while repeated full retraining is prohibitively expensive. To this end, we introduce a new task setting, termed Continual OVD with Novel Concept Injection (COVD), where models sequentially learn incoming novel concept groups while preserving prior concepts and original open-vocabulary knowledge, along with a new benchmark, Novel-114. Our key observation is that pretrained visual encoders often already perceive and represent many novel concepts, and the main bottleneck lies in the lack of stable semantic alignment between visual representations and textual concepts. Based on this, we propose NoIn-Det, an efficient continual injection framework without additional parameters. NoIn-Det freezes the visual encoder, preserves the text representation space using only texts of common concepts and previously injected concepts, and injects novel concepts by updating only a small subset of text-branch parameters beneficial to novel concept learning. Extensive experiments show that NoIn-Det effectively learns novel concepts, preserves old knowledge, and consistently outperforms existing continual learning methods for VLMs without introducing additional parameters. Novel-114 and the code will be released.
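The update pattern described in the abstract (frozen visual encoder, updates restricted to a small subset of text-branch parameters) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state how the subset is chosen, so the gradient-magnitude ranking below is purely an assumed placeholder, and the names `select_text_params`, `masked_sgd_step`, and the `visual.`/`text.` prefixes are hypothetical. Parameters are plain scalars to keep the example dependency-free.

```python
from typing import Dict, Set


def select_text_params(grads: Dict[str, float], fraction: float) -> Set[str]:
    """Pick the top `fraction` of text-branch parameters by |gradient|.

    ASSUMPTION: ranking by gradient magnitude is a stand-in criterion;
    the abstract does not specify the actual selection rule.
    """
    text_names = [name for name in grads if name.startswith("text.")]
    k = max(1, int(len(text_names) * fraction))
    ranked = sorted(text_names, key=lambda name: abs(grads[name]), reverse=True)
    return set(ranked[:k])


def masked_sgd_step(
    params: Dict[str, float],
    grads: Dict[str, float],
    selected: Set[str],
    lr: float,
) -> Dict[str, float]:
    """Apply one SGD step to the selected parameters only.

    Everything else, including all visual-encoder parameters, is returned
    unchanged, which mimics a frozen encoder.
    """
    return {
        name: (value - lr * grads[name] if name in selected else value)
        for name, value in params.items()
    }


# Toy model: two "visual" and four "text" scalar parameters (hypothetical).
params = {
    "visual.w0": 1.0, "visual.w1": -0.5,
    "text.w0": 0.2, "text.w1": 0.4, "text.w2": -0.3, "text.w3": 0.1,
}
grads = {
    "visual.w0": 0.9, "visual.w1": 0.8,
    "text.w0": 0.05, "text.w1": -0.6, "text.w2": 0.02, "text.w3": 0.3,
}

selected = select_text_params(grads, fraction=0.5)
updated = masked_sgd_step(params, grads, selected, lr=0.1)
```

In this toy run only `text.w1` and `text.w3` are updated; the visual parameters keep their values even though their gradients are the largest, because they are never eligible for selection. No new parameters are created, which matches the "no additional parameters" constraint in spirit only.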