AI Summary
This work addresses the lack of instance-level spatial layout modeling in oriented object detection (OOD) under point supervision. We propose what is, to our knowledge, the first weakly supervised framework that exploits the spatial layout among instances. The method combines Gaussian distribution modeling with watershed analysis on a Voronoi tessellation and is trained with three core losses: (i) a Gaussian overlap loss, which treats objects as 2D Gaussians and minimizes their overlap to bound each instance from above; (ii) a Voronoi watershed loss, which bounds each instance from below; and (iii) a consistency loss, which ties the size and rotation predicted for an image to those predicted for its augmented view. An edge loss and copy-paste augmentation further enhance the detector. The approach reports 62.61% on DOTA, 86.15% on HRSC, and 34.71% on FAIR1M, and is lightweight while remaining competitive, especially in densely packed scenes.
Abstract
With the rapidly increasing demand for oriented object detection (OOD), recent research on weakly-supervised detectors that learn OOD from point annotations has gained great attention. In this paper, we rethink this challenging task setting in terms of the layout among instances and present Point2RBox-v2. At its core are three principles: 1) Gaussian overlap loss. It learns an upper bound for each instance by treating objects as 2D Gaussian distributions and minimizing their overlap. 2) Voronoi watershed loss. It learns a lower bound for each instance through watershed on Voronoi tessellation. 3) Consistency loss. It learns the size/rotation variation between two output sets with respect to an input image and its augmented view. Supplemented by a few devised techniques, e.g. edge loss and copy-paste, the detector is further enhanced. To the best of our knowledge, Point2RBox-v2 is the first approach to explore the spatial layout among instances for learning point-supervised OOD. Our solution is elegant and lightweight, yet it is expected to give competitive performance, especially in densely packed scenes: 62.61%/86.15%/34.71% on DOTA/HRSC/FAIR1M. Code is available at https://github.com/VisionXLab/point2rbox-v2.
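
To make the first principle concrete, the following is a minimal NumPy sketch of an overlap penalty between objects modeled as 2D Gaussians. It is an illustration under stated assumptions, not the released implementation: the abstract does not specify the overlap measure, so the Bhattacharyya coefficient is used here as one plausible choice, and the box-to-Gaussian mapping (covariance built from `w/2` and `h/2`) as well as the function names `rbox_to_gaussian`, `bhattacharyya_coefficient`, and `gaussian_overlap_loss` are hypothetical.

```python
import numpy as np

def rbox_to_gaussian(cx, cy, w, h, theta):
    """Map a rotated box to a 2D Gaussian (mean, covariance).

    Assumption: the standard deviations along the box axes are w/2 and h/2.
    """
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    scale = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    return np.array([cx, cy], dtype=float), rot @ scale @ rot.T

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """Overlap in (0, 1] between two Gaussians; equals 1 when they coincide."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu2 - mu1
    # Mahalanobis-like term under the averaged covariance.
    maha = diff @ np.linalg.solve(cov, diff)
    log_det = np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    distance = 0.125 * maha + 0.5 * log_det
    return float(np.exp(-distance))

def gaussian_overlap_loss(boxes):
    """Mean pairwise overlap over all instances; 0.0 for fewer than two boxes."""
    gaussians = [rbox_to_gaussian(*b) for b in boxes]
    n = len(gaussians)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += bhattacharyya_coefficient(*gaussians[i], *gaussians[j])
    return total / (n * (n - 1) / 2)

# Two neighbors 10 px apart: oversized predictions overlap more than tight ones.
large = gaussian_overlap_loss([(0, 0, 10, 10, 0.0), (10, 0, 10, 10, 0.0)])
small = gaussian_overlap_loss([(0, 0, 4, 4, 0.0), (10, 0, 4, 4, 0.0)])
print(large > small)
```

In this sketch, minimizing the loss pushes predicted sizes down whenever neighboring instances start to overlap, which is the sense in which such a term can act as an upper bound on instance size. A complementary term is needed to stop boxes from collapsing, a role the abstract assigns to the Voronoi watershed loss.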