🤖 AI Summary
This work addresses the limitations of existing query-based detectors in handling arbitrarily oriented, texture-sparse tiny objects. These limitations stem primarily from insufficient use of geometric information during feature decoding and from cross-stage inconsistencies caused by stage-wise bipartite matching. To overcome these challenges, the authors propose IGOFormer, a query-based oriented object detector that explicitly integrates an object's intrinsic geometric structure into the detection pipeline. Specifically, they design an intrinsic geometry-aware decoder that improves orientation awareness by injecting extrapolated geometric embeddings, and they introduce a momentum-based bipartite matching mechanism that aggregates historical matching costs through an exponential moving average with query-specific smoothing factors, improving cross-stage matching stability. Evaluated on DOTA-V1.0 with a Swin-T backbone under the single-scale setting, the method achieves 78.00% AP$_{50}$, which the authors present as evidence of its superiority for aerial oriented object detection.
📝 Abstract
Recent query-based detectors have achieved remarkable progress, yet their performance remains constrained when handling objects with arbitrary orientations, especially tiny objects that carry limited texture information. This limitation primarily stems from the underutilization of intrinsic geometry during pixel-based feature decoding and from inter-stage matching inconsistency caused by stage-wise bipartite matching. To tackle these challenges, we present IGOFormer, a novel query-based oriented object detector that explicitly integrates intrinsic geometry into feature decoding and enhances inter-stage matching stability. Specifically, we design an Intrinsic Geometry-aware Decoder, which enhances the object-related features conditioned on an object query by injecting complementary geometric embeddings extrapolated from their correlations. These embeddings capture the geometric layout of the object and thereby offer a critical geometric insight into its orientation. Meanwhile, a Momentum-based Bipartite Matching scheme is developed to adaptively aggregate historical matching costs by formulating an exponential moving average with query-specific smoothing factors, effectively preventing conflicting supervisory signals arising from inter-stage matching inconsistency. Extensive experiments and ablation studies demonstrate the superiority of our IGOFormer for aerial oriented object detection, achieving an AP$_{50}$ score of 78.00% on DOTA-V1.0 using a Swin-T backbone under the single-scale setting. The code will be made publicly available.
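To make the idea of momentum-based matching concrete, here is a minimal sketch of how an exponential moving average over matching costs can keep assignments stable across decoder stages. It is an illustration under stated assumptions, not the paper's implementation: the function names `ema_cost` and `match`, the cost values, and the fixed per-query smoothing factors `alpha` are all hypothetical (the abstract does not say how the query-specific factors are computed), and a brute-force search stands in for the Hungarian algorithm so the example needs only the standard library.

```python
from itertools import permutations


def ema_cost(prev_cost, curr_cost, alpha):
    """Blend historical and current matching costs, one smoothing factor per query.

    prev_cost, curr_cost: cost matrices indexed as [query][target].
    alpha: per-query weights in [0, 1] placed on the historical cost
           (hypothetical constants here; the paper adapts them per query).
    """
    return [
        [a * p + (1.0 - a) * c for p, c in zip(prev_row, curr_row)]
        for a, prev_row, curr_row in zip(alpha, prev_cost, curr_cost)
    ]


def match(cost):
    """Brute-force minimum-cost one-to-one assignment.

    Returns a tuple whose entry t is the query index assigned to target t.
    Only suitable for tiny examples; a real detector would use the
    Hungarian algorithm instead.
    """
    num_queries = len(cost)
    num_targets = len(cost[0])
    return min(
        permutations(range(num_queries), num_targets),
        key=lambda perm: sum(cost[q][t] for t, q in enumerate(perm)),
    )


# Made-up costs for two queries and two targets at two decoder stages.
stage1 = [[0.1, 0.9],
          [0.8, 0.2]]
stage2 = [[0.5, 0.4],
          [0.45, 0.5]]

independent = (match(stage1), match(stage2))   # stage-wise matching
smoothed = ema_cost(stage1, stage2, alpha=[0.5, 0.5])
momentum = match(smoothed)                     # matching on aggregated costs
```

In this toy case, matching each stage independently swaps the two assignments between stages, which is the kind of conflicting supervision the abstract describes. Matching on the smoothed costs keeps the stage-1 assignment because the historical cost still outweighs the small stage-2 difference.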