🤖 AI Summary
In weakly supervised oriented object detection (WS-OOD), horizontal bounding box (HBox) supervision alone leads to severe scale estimation bias in rotated bounding box (RBox) regression. To address this, we propose Adaptive Bounding Box Scaling (ABBS) and a Symmetric Prior Angle (SPA) loss: ABBS enables scale-adaptive alignment between HBox ground truth and RBox predictions, while SPA, which introduces geometric symmetry as a self-supervised signal for the first time, mitigates the learning collapse caused by multi-view inconsistency. Our method integrates HBox supervision, end-to-end differentiable rotation regression, multi-view augmentation (original, rotated, and flipped views), and symmetry-driven angular priors. Evaluated on DOTA and HRSC, it achieves state-of-the-art performance, improving mAP by 3.2–5.7% over existing HBox-supervised approaches, with significant gains in joint scale and orientation estimation accuracy.
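The scale bias can be made concrete with a little trigonometry. Below is an illustrative sketch (plain Python; the function names and the inversion rule are ours, not the paper's actual ABBS formula): the minimum circumscribed HBox of a w × h RBox rotated by θ has size w·|cos θ| + h·|sin θ| by w·|sin θ| + h·|cos θ|, so comparing that box directly against the GT HBox penalizes correctly sized rotated predictions. Inverting the relation at the predicted angle is one way to obtain a scale-adaptive target instead.

```python
import math

def circumscribed_hbox(w, h, theta):
    """Width/height of the minimum axis-aligned box enclosing a
    w x h rotated box at angle theta (radians)."""
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return w * c + h * s, w * s + h * c

def rbox_from_hbox(W, H, theta):
    """Invert the relation above: the (w, h) of the RBox at angle theta
    whose circumscribed HBox is exactly W x H.  Undefined at theta = 45 deg,
    where |cos| = |sin| and the 2x2 system becomes singular.
    Hypothetical helper, not the paper's ABBS formulation."""
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    denom = c * c - s * s
    return (W * c - H * s) / denom, (H * c - W * s) / denom

# A 10 x 10 RBox at 45 deg has a circumscribed HBox of side 10*sqrt(2):
# supervising its circumscribed box against a 10 x 10 GT HBox would push
# the network to shrink the RBox -- the scale bias ABBS is designed to remove.
W, H = circumscribed_hbox(10.0, 10.0, math.pi / 4)
```

Round-tripping `rbox_from_hbox` through `circumscribed_hbox` at the same angle recovers the original HBox, which is what makes the target angle-aware rather than a fixed rectangle comparison.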
📝 Abstract
Weakly supervised oriented object detection (WS-OOD) has gained attention as a cost-effective alternative to fully supervised methods, offering both efficiency and high accuracy. Among weakly supervised approaches, horizontal bounding box (HBox)-supervised OOD stands out for its ability to directly leverage existing HBox annotations while achieving the highest accuracy under weak supervision. This paper introduces ABBSPO, a WS-OOD framework built on adaptive bounding box scaling and symmetry-prior-based orientation prediction. ABBSPO addresses a limitation of previous HBox-supervised OOD methods, which compare ground truth (GT) HBoxes directly with the minimum circumscribed rectangles of predicted rotated bounding boxes (RBoxes), often leading to inaccurate scale estimation. To overcome this, we propose: (i) Adaptive Bounding Box Scaling (ABBS), which scales each GT HBox appropriately for the size of the corresponding predicted RBox, ensuring more accurate scale prediction; and (ii) a Symmetric Prior Angle (SPA) loss that exploits the inherent symmetry of aerial objects for self-supervised learning, resolving the failure mode of previous methods in which learning collapses when predictions for all three augmented views (original, rotated, and flipped) are consistently incorrect. Extensive experiments demonstrate that ABBSPO achieves state-of-the-art performance, outperforming existing methods.
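The collapse mode described above can be reproduced in a few lines. The sketch below is a generic three-view angle-consistency loss, assumed for illustration (not the paper's exact formulation): angles are compared modulo π, rotating the input by δ should shift the predicted angle by δ, and horizontal flipping should negate it. A consistent 90° error on all three views, e.g. confusing the long and short axes, leaves every residual at zero, so consistency alone cannot detect it; this is the degeneracy that an absolute cue like the symmetry-based SPA prior can break.

```python
import math

def angle_residual(a, b, period=math.pi):
    """Smallest signed difference between two angles modulo the RBox
    angle period (pi, since a box is unchanged by a 180-deg rotation)."""
    d = (a - b) % period
    return d if d <= period / 2 else d - period

def view_consistency_loss(theta_orig, theta_rot, theta_flip, delta):
    """Self-supervised consistency across the three augmented views:
    the rotated view should predict theta_orig + delta, and the
    horizontally flipped view should predict -theta_orig."""
    r_rot = angle_residual(theta_rot, theta_orig + delta)
    r_flip = angle_residual(theta_flip, -theta_orig)
    return r_rot ** 2 + r_flip ** 2

# Correct predictions: loss is zero, as expected.
ok = view_consistency_loss(0.3, 0.8, -0.3, delta=0.5)

# Collapse mode: every view is wrong by the same +90 deg offset, yet the
# rotation residual cancels and the flip residual equals pi (= 0 mod pi),
# so the consistency loss is still zero -- all three views are
# "consistently incorrect" and an extra prior is needed to escape.
e = math.pi / 2
collapsed = view_consistency_loss(0.3 + e, 0.8 + e, -0.3 + e, delta=0.5)
```

Only a genuinely inconsistent prediction (e.g. the rotated view off by an extra 0.1 rad) produces a nonzero loss, which is why pure multi-view consistency can settle into a uniformly wrong solution.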