🤖 AI Summary
This work addresses the high annotation cost of fully supervised oriented object detection in remote sensing imagery, where scenes are densely packed and span many object categories. To this end, we propose the first sparse partial weakly supervised oriented object detection framework. Our approach introduces the SOS-Student model, which separates unlabeled objects from the background and learns orientation and scale information from sparsely provided weak annotations; incorporates a multi-level pseudo-label filtering mechanism to improve the quality of the pseudo-labels drawn from unlabeled data; and adopts a sparse partitioning strategy that treats every category equally. Evaluated on the DOTA and DIOR benchmarks, the proposed method is reported to achieve a significant performance gain over existing fully supervised, semi-supervised, and weakly supervised approaches while requiring far less annotation effort, offering a cost-effective paradigm for remote sensing object detection.
📝 Abstract
A consistent trend in oriented object detection research has been the pursuit of comparable performance with fewer and weaker annotations. This is particularly crucial in the remote sensing domain, where dense object distributions and a wide variety of categories make annotation prohibitively expensive. Based on the supervision level, existing oriented object detection algorithms can be broadly grouped into fully supervised, semi-supervised, and weakly supervised methods. In this work, we extend this grouping to include sparsely supervised and partially weakly-supervised methods. To address the challenges of large-scale labeling, we introduce the first Sparse Partial Weakly-Supervised Oriented Object Detection framework, designed to make efficient use of a small amount of sparsely and weakly labeled data together with abundant unlabeled data. Our framework incorporates three key innovations: (1) We design a Sparse-annotation-Orientation-and-Scale-aware Student (SOS-Student) model that separates unlabeled objects from the background in a sparsely-labeled setting and learns orientation and scale information from orientation-agnostic or scale-agnostic weak annotations. (2) We construct a novel Multi-level Pseudo-label Filtering strategy that filters pseudo-labels according to the distribution of the model's predictions across its multiple prediction layers. (3) We propose a sparse partitioning approach that ensures equal treatment for each category. Extensive experiments on the DOTA and DIOR datasets show that our framework achieves a significant performance gain over the existing categories of oriented object detection methods described above, offering a highly cost-effective solution. Our code is publicly available at https://github.com/VisionXLab/SPWOOD.
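To make the idea behind innovation (3) concrete, the following is a minimal sketch of one way a sparse partition could treat every category equally: it keeps the same fraction of labeled objects within each category rather than sampling uniformly over all objects, so rare categories are not wiped out by chance. It is an illustration under stated assumptions, not the procedure used in the paper. The function name `class_balanced_sparse_split`, the `(object_id, category)` input format, and the per-category `ratio` rule are all hypothetical.

```python
import math
import random
from collections import defaultdict


def class_balanced_sparse_split(annotations, ratio, seed=0):
    """Split annotations into kept (labeled) and dropped (treated as unlabeled).

    Assumption for illustration: `annotations` is a list of
    (object_id, category) pairs, and the same fraction `ratio` of objects
    is kept in every category, with at least one object per category.
    """
    rng = random.Random(seed)

    # Group object ids by category.
    by_class = defaultdict(list)
    for obj_id, category in annotations:
        by_class[category].append(obj_id)

    kept = set()
    for category in sorted(by_class):  # sorted for reproducibility
        ids = by_class[category]
        # Same fraction per category, never fewer than one object.
        k = max(1, math.ceil(ratio * len(ids)))
        kept.update(rng.sample(ids, k))

    dropped = {obj_id for obj_id, _ in annotations} - kept
    return kept, dropped


# Usage: a toy set with a frequent and a rare category.
toy = [(i, "plane") for i in range(10)] + [(10 + i, "ship") for i in range(2)]
kept_ids, dropped_ids = class_balanced_sparse_split(toy, ratio=0.5)
kept_per_class = defaultdict(int)
for obj_id, category in toy:
    if obj_id in kept_ids:
        kept_per_class[category] += 1
print(dict(kept_per_class))
```

With a uniform random sample over all twelve objects, the two "ship" instances could both be dropped; the per-category rule in this sketch always retains at least one of them.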