🤖 AI Summary
Existing object counting methods for remote sensing imagery are confined to predefined categories and cannot generalize to novel classes without costly re-annotation and retraining. To overcome this limitation, we propose RS-OVC, the first open-vocabulary counting model tailored for remote sensing and aerial imagery. RS-OVC uses textual and/or visual conditioning to count categories unseen during training, with no additional annotations required. Experimental results show that RS-OVC counts unseen classes accurately, making remote sensing monitoring more practical and adaptable in dynamic real-world scenarios.
📝 Abstract
Object counting in remote-sensing (RS) imagery is attracting increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object counting have been proposed, they focus on a closed, pre-defined set of object classes. Adapting these approaches to count novel objects that were not seen during training therefore requires costly re-annotation and model re-training, which severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, we propose RS-OVC, the first Open Vocabulary Counting (OVC) model for remote-sensing and aerial imagery. We show that our model can accurately count novel object classes that were unseen during training, based solely on textual and/or visual conditioning.