AI Summary
Objects in aerial imagery appear at arbitrary orientations, so rotation equivariance matters for aerial object detection. Most existing methods nonetheless rely on data augmentation or achieve only approximate equivariance, and the benefit of strictly equivariant architectures has not been validated empirically. Method: We propose a strictly rotation-equivariant single-stage aerial object detection framework: (i) an equivariant backbone and neck built upon group convolutions, avoiding the downsampling operations that break equivariance; and (ii) a lightweight multi-branch detection head that exploits the grouped structure of equivariant features to reduce parameters while improving localization and classification accuracy. Results: Our method achieves state-of-the-art performance on DOTA-v1.0, DOTA-v1.5, and DIOR-R, particularly excelling in detecting small and arbitrarily oriented objects. By comparing against approximately equivariant networks, it also quantifies how much strict rotational equivariance contributes to aerial detection performance.
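As a rough illustration of what "equivariant features built upon group convolutions" means, the sketch below implements a toy lifting convolution for the four 90-degree rotations (the group C4) in NumPy and checks the equivariance property numerically. It is not the MessDet implementation: the function names `correlate_valid` and `c4_lifting_conv`, the choice of C4, and the single-channel toy input are all assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(x, k):
    """Plain 2-D cross-correlation without padding ('valid' mode)."""
    windows = sliding_window_view(x, k.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


def c4_lifting_conv(x, k):
    """Correlate x with the four 90-degree rotations of k (hypothetical helper).

    Returns shape (4, H', W'); axis 0 is the group (orientation) axis.
    """
    return np.stack([correlate_valid(x, np.rot90(k, r)) for r in range(4)])


rng = np.random.default_rng(0)
x = rng.standard_normal((9, 9))  # toy single-channel "image"
k = rng.standard_normal((3, 3))  # toy filter

out = c4_lifting_conv(x, k)
out_rot = c4_lifting_conv(np.rot90(x), k)

# Equivariance: rotating the input rotates every response map and
# cyclically shifts the orientation axis by one step.
expected = np.roll(np.rot90(out, axes=(1, 2)), shift=1, axis=0)
is_equivariant = np.allclose(out_rot, expected)
print(is_equivariant)
```

The orientation axis is what gives the features their grouped structure: each input filter yields one response map per rotation, and a rotation of the input only permutes and rotates those maps instead of producing unrelated features.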
Abstract
Due to the arbitrary orientation of objects in aerial images, rotation equivariance is a critical property for aerial object detectors. However, recent studies on rotation-equivariant aerial object detection remain scarce. Most detectors rely on data augmentation to enable models to learn approximately rotation-equivariant features. A few detectors have constructed rotation-equivariant networks, but due to the breaking of strict rotation equivariance by typical downsampling processes, these networks only achieve approximately rotation-equivariant backbones. Whether strict rotation equivariance is necessary for aerial image object detection remains an open question. In this paper, we implement a strictly rotation-equivariant backbone and neck network with a more advanced network structure and compare it with approximately rotation-equivariant networks to quantitatively measure the impact of rotation equivariance on the performance of aerial image detectors. Additionally, leveraging the inherently grouped nature of rotation-equivariant features, we propose a multi-branch head network that reduces the parameter count while improving detection accuracy. Based on the aforementioned improvements, this study proposes the Multi-branch head rotation-equivariant single-stage Detector (MessDet), which achieves state-of-the-art performance on the challenging aerial image datasets DOTA-v1.0, DOTA-v1.5 and DIOR-R with an exceptionally low parameter count.
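The claim that typical downsampling breaks strict rotation equivariance can be seen in a small NumPy experiment. The sketch below is only a toy model, assuming stride-2 subsampling and 90-degree rotations; the helper names `subsample` and `commutes_with_rot90` are made up for the example, and it does not reproduce how MessDet itself handles downsampling.

```python
import numpy as np


def subsample(x, stride=2):
    """Keep every `stride`-th row and column, as a strided layer would."""
    return x[::stride, ::stride]


def commutes_with_rot90(size, seed=0):
    """Check whether rotating then subsampling equals subsampling then rotating."""
    x = np.random.default_rng(seed).standard_normal((size, size))
    return np.array_equal(subsample(np.rot90(x)), np.rot90(subsample(x)))


even_ok = commutes_with_rot90(8)  # sampled indices {0, 2, 4, 6} are off-centre
odd_ok = commutes_with_rot90(9)   # sampled indices {0, 2, 4, 6, 8} are centred
print(even_ok, odd_ok)  # False True
```

On the even-sized map the sampled indices are not symmetric about the centre of the map, so a rotation moves the sampling grid onto different pixels and the two orders of operation disagree. On the odd-sized map the grid is symmetric and the operations commute. This is one way to see why a network with ordinary strided layers is only approximately equivariant.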