🤖 AI Summary
Existing detection-based cross-view geolocation methods struggle to accurately capture the geometric structure of oriented objects because they rely on horizontal bounding boxes, and their localization accuracy is further limited by feature map scaling, falling short of the performance achieved by costly segmentation-based approaches. To address these limitations, this work proposes OSGeo, a novel framework that introduces rotated bounding boxes (RBoxes) to cross-view geolocation for the first time. The authors also construct CVOGL-R, the first dataset annotated with RBoxes for this task, and design a direction-aware multi-scale detection architecture coupled with an orientation-sensitive regression head. Extensive experiments demonstrate that OSGeo matches or surpasses the accuracy of state-of-the-art segmentation methods across multiple benchmarks while reducing annotation costs by over an order of magnitude.
📝 Abstract
Cross-view object geo-localization (CVOGL) aims to precisely determine the geographic coordinates of a query object from a ground or drone perspective by referencing a satellite map. Segmentation-based approaches offer high precision but require prohibitively expensive pixel-level annotations, whereas more economical detection-based methods suffer from lower accuracy. This performance gap in detection stems primarily from two factors: the poor geometric fit of Horizontal Bounding Boxes (HBoxes) for oriented objects and the loss of precision caused by feature map scaling. Motivated by these observations, we propose leveraging Rotated Bounding Boxes (RBoxes) as a natural extension of the detection-based paradigm, since RBoxes provide a much tighter geometric fit to oriented objects. Building on this, we introduce OSGeo, a novel geo-localization framework with a multi-scale perception module and an orientation-sensitive head designed to accurately regress RBoxes. To support this scheme, we also construct and release CVOGL-R, the first dataset with precise RBox annotations for CVOGL. Extensive experiments demonstrate that OSGeo achieves state-of-the-art performance, consistently matching or even surpassing the accuracy of leading segmentation-based methods at an annotation cost that is over an order of magnitude lower.
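The claim that RBoxes fit oriented objects more tightly than HBoxes can be made concrete with a small geometric sketch (not taken from the paper; the dimensions and angle below are illustrative assumptions). For an elongated object such as a building at 45°, the axis-aligned box that encloses its rotated box can cover several times the object's actual footprint:

```python
import math

def rbox_corners(cx, cy, w, h, theta):
    """Corner points of a rotated box given center, size, and angle (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c)
            for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2),
                           (w / 2, h / 2), (-w / 2, h / 2)]]

def enclosing_hbox_area(corners):
    """Area of the smallest axis-aligned box containing the given points."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

# Hypothetical 100 x 20 object rotated by 45 degrees.
corners = rbox_corners(0, 0, 100, 20, math.radians(45))
rbox_area = 100 * 20                      # tight rotated box: 2000
hbox_area = enclosing_hbox_area(corners)  # enclosing HBox: 7200
print(f"HBox/RBox area ratio: {hbox_area / rbox_area:.2f}")  # → 3.60
```

Here the HBox is 3.6× larger than the object's rotated box, i.e. 72% of the HBox is background, which illustrates why HBox regression dilutes localization signal for oriented targets.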