🤖 AI Summary
Problem: Fully supervised oriented object detection in remote sensing imagery incurs prohibitively high annotation costs, and existing weakly and semi-supervised methods do not address the challenge of sparse annotations in dense scenes.
Method: This paper introduces Sparsely Annotated Oriented Object Detection (SAOOD), a new learning setting in which only some instances are labeled with rotated boxes. It targets two key challenges under such extremely sparse labeling: overfitting to limited foreground representations and interference from unlabeled objects (false negatives). To tackle these, the authors propose S²Teacher, a teacher model that combines curriculum-driven, progressive (easy-to-hard) pseudo-label mining, consistency regularization, and instance-level reweighting of the losses of unlabeled objects.
Contribution/Results: On the DOTA benchmark, the method approaches fully supervised performance with only 10% of instances annotated, substantially outperforming state-of-the-art weakly and semi-supervised approaches, and offers an effective trade-off between annotation efficiency and detection accuracy.
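The easy-to-hard pseudo-label mining summarized above can be illustrated with a small Python sketch. It is not the authors' implementation: the linear threshold schedule, the `tau_start`/`tau_end` values, the `RotatedBox` type, and the center-distance duplicate check (a crude stand-in for rotated-box IoU) are all hypothetical choices made for illustration. Early in training only high-confidence teacher predictions become pseudo-labels; the threshold then relaxes so that harder unlabeled objects are admitted.

```python
import math
from dataclasses import dataclass


@dataclass
class RotatedBox:
    """Hypothetical oriented box: center, size, angle (radians), confidence."""
    cx: float
    cy: float
    w: float
    h: float
    theta: float
    score: float = 1.0


def curriculum_threshold(step: int, total_steps: int,
                         tau_start: float = 0.9, tau_end: float = 0.5) -> float:
    """Linearly relax the confidence threshold from tau_start to tau_end (assumed schedule)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return tau_start + (tau_end - tau_start) * progress


def duplicates_annotation(pred: RotatedBox, annotations: list[RotatedBox],
                          ratio: float = 0.5) -> bool:
    """Crude stand-in for rotated IoU: a prediction whose center lies within
    ratio * (shorter side) of an annotated box is treated as that same object."""
    return any(
        math.hypot(pred.cx - gt.cx, pred.cy - gt.cy) < ratio * min(gt.w, gt.h)
        for gt in annotations
    )


def mine_pseudo_labels(teacher_preds, sparse_annotations, step, total_steps):
    """Keep confident teacher predictions that do not duplicate a labeled object."""
    tau = curriculum_threshold(step, total_steps)
    return [p for p in teacher_preds
            if p.score >= tau and not duplicates_annotation(p, sparse_annotations)]


# Usage: one labeled object, three teacher predictions.
labels = [RotatedBox(10, 10, 4, 4, 0.0)]
preds = [
    RotatedBox(10, 10, 4, 4, 0.1, score=0.95),  # the labeled object itself
    RotatedBox(50, 50, 6, 3, 0.3, score=0.95),  # easy unlabeled object
    RotatedBox(80, 20, 5, 5, 1.0, score=0.60),  # harder unlabeled object
]
print(len(mine_pseudo_labels(preds, labels, step=0, total_steps=100)))    # 1
print(len(mine_pseudo_labels(preds, labels, step=100, total_steps=100)))  # 2
```

At the start only the easy object is mined; by the end of the schedule the lower-confidence object is admitted too, while the already-labeled object is never duplicated.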
📝 Abstract
Although fully supervised oriented object detection has made significant progress in multimodal remote sensing image understanding, it comes at the cost of labor-intensive annotation. Recent studies have explored weakly and semi-supervised learning to alleviate this burden. However, these methods overlook the difficulty of densely annotating complex remote sensing scenes. In this paper, we introduce a novel setting called sparsely annotated oriented object detection (SAOOD), in which only a subset of instances is labeled, and propose a solution to its challenges. Specifically, we focus on two key issues in this setting: (1) sparse labeling leads to overfitting on limited foreground representations, and (2) unlabeled objects (false negatives) confuse feature learning. To this end, we propose S²Teacher, a novel method that progressively mines pseudo-labels for unlabeled objects, from easy to hard, to enhance foreground representations. It additionally reweights the loss of unlabeled objects to mitigate their impact during training. Extensive experiments demonstrate that S²Teacher not only significantly improves detector performance across different sparse annotation levels but also achieves near-fully-supervised performance on the DOTA dataset with only 10% of instances annotated, effectively balancing detection accuracy with annotation efficiency. The code will be made publicly available.
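One way to picture the loss reweighting is to down-weight the background loss on regions the teacher believes are objects, since under sparse labeling such regions may simply be unlabeled rather than true background. The Python sketch below assumes a per-sample foreground score from the teacher and a hypothetical weight of (1 - teacher score)^gamma; the abstract does not specify the actual weighting scheme, so treat this as an illustration of the idea only.

```python
import math


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single predicted probability."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def reweighted_background_loss(student_scores, teacher_scores, gamma=1.0):
    """Weighted mean background loss over samples that carry no sparse label.

    Each sample is pushed toward background (target 0), but its hypothetical
    weight (1 - teacher_score) ** gamma shrinks when the teacher believes it
    is an object, so likely-unlabeled instances contribute little to the loss.
    """
    total, weight_sum = 0.0, 0.0
    for s, t in zip(student_scores, teacher_scores):
        w = (1.0 - t) ** gamma
        total += w * bce(s, 0.0)
        weight_sum += w
    return total / max(weight_sum, 1e-7)


student = [0.9, 0.1]  # the student thinks the first sample is an object
teacher = [1.0, 0.0]  # the teacher is confident it is an (unlabeled) object
plain = reweighted_background_loss(student, [0.0, 0.0])  # no reweighting
reweighted = reweighted_background_loss(student, teacher)
print(round(plain, 3), round(reweighted, 3))  # 1.204 0.105
```

Without reweighting, the unlabeled object dominates the loss and pushes the student to call it background; with reweighting, its contribution vanishes and only the genuine background sample is penalized.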