🤖 AI Summary
Rotated object detection suffers from high annotation costs due to the need for precise angle-labeled rotated bounding boxes.
Method: This paper proposes Wholly-WOOD, a multi-granularity weakly supervised learning framework that unifies the training of orientation-aware detectors under arbitrary combinations of single-point, horizontal bounding box (HBox), and rotated bounding box (RBox) annotations. It introduces the first end-to-end architecture integrating full-annotation-form fusion, pseudo-label generation, cross-annotation knowledge distillation, and rotation-aware feature alignment, without requiring additional angular supervision.
Contribution/Results: Extensive experiments on remote sensing and other domains show that the method, trained with only HBox annotations, achieves performance close to its fully supervised RBox-trained counterpart (92.3% mAP@50), drastically reducing annotation overhead. The framework is open-sourced, with both PyTorch and Jittor implementations.
📝 Abstract
Accurately estimating the orientation of visual objects with compact rotated bounding boxes (RBoxes) has become a prominent demand, which challenges existing object detection paradigms that use only horizontal bounding boxes (HBoxes). To equip detectors with orientation awareness, supervised regression/classification modules have been introduced, at the high cost of rotation annotation. Meanwhile, some existing datasets with oriented objects are already annotated with horizontal boxes or even single points. It is therefore attractive, yet remains an open problem, to effectively utilize weaker single-point and horizontal annotations for training an oriented object detector (OOD). We develop Wholly-WOOD, a weakly supervised OOD framework capable of wholly leveraging various labeling forms (Points, HBoxes, RBoxes, and their combinations) in a unified fashion. Using only HBoxes for training, Wholly-WOOD achieves performance very close to that of its RBox-trained counterpart on remote sensing and other domains, significantly reducing the labor-intensive annotation effort for oriented objects. The source codes are available at https://github.com/VisionXLab/whollywood (PyTorch-based) and https://github.com/VisionXLab/whollywood-jittor (Jittor-based).
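To make the "unified fashion" concrete, a minimal sketch of how the three labeling forms could be mapped into one parameterization is shown below. This is a hypothetical illustration, not the paper's actual API: each annotation becomes a (cx, cy, w, h, theta) tuple plus a mask marking which components that labeling form actually supervises (a Point fixes only the center, an HBox fixes center and size but not angle, an RBox fixes everything).

```python
# Hypothetical sketch (not the Wholly-WOOD codebase): unify Point, HBox,
# and RBox labels into one (cx, cy, w, h, theta) target with a supervision
# mask, so a single detector head can be trained on mixed annotations.

def point_to_unified(x, y):
    # A single point supervises only the object center.
    return (x, y, 0.0, 0.0, 0.0), (True, True, False, False, False)

def hbox_to_unified(x1, y1, x2, y2):
    # An HBox supervises center and size; orientation is left unsupervised
    # and must come from pseudo-labels or weak cues.
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    return (cx, cy, w, h, 0.0), (True, True, True, True, False)

def rbox_to_unified(cx, cy, w, h, theta):
    # An RBox fully supervises all five parameters.
    return (cx, cy, w, h, theta), (True, True, True, True, True)
```

Under such a scheme, a masked regression loss would simply ignore the unsupervised components, which is one way mixed Point/HBox/RBox batches can share a single training objective.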