🤖 AI Summary
Addressing the challenges of large intra- and inter-modal discrepancies and high computational overhead in oriented object detection for multispectral remote sensing imagery, this paper proposes MO R-CNN, a lightweight detection framework. The method introduces three key innovations: (1) a heterogeneous feature extraction network (HFEN) that leverages large-kernel convolutions for efficient cross-modal feature modeling; (2) a single-modality supervision (SMS) mechanism that mitigates modality imbalance and enhances feature discriminability; and (3) a rule-driven conditional multimodal label fusion (CMLF) strategy that improves cross-modal localization consistency and robustness. Extensive experiments on the DroneVehicle, VEDAI, and OGSOD benchmarks show that the approach outperforms state-of-the-art methods in both accuracy and efficiency, achieving comparable or better detection performance while substantially reducing computational complexity and memory footprint. The framework thus strikes an effective balance between practical deployability and generalization capability.
📝 Abstract
Oriented object detection in multi-spectral imagery faces significant challenges due to large discrepancies both within and between modalities. Although existing methods have improved detection accuracy through complex network architectures, their high computational complexity and memory consumption severely restrict their practical deployment. Motivated by the success of large-kernel convolutions in remote sensing, we propose MO R-CNN, a lightweight framework for multi-spectral oriented object detection featuring a heterogeneous feature extraction network (HFEN), single-modality supervision (SMS), and condition-based multimodal label fusion (CMLF). HFEN leverages inter-modal differences to adaptively align, merge, and enhance multi-modal features. SMS constrains multi-scale features and enables the model to learn from multiple modalities. CMLF fuses multi-modal labels according to specific rules, providing the model with a more robust and consistent supervisory signal. Experiments on the DroneVehicle, VEDAI, and OGSOD datasets demonstrate the superiority of our method. The source code is available at: https://github.com/Iwill-github/MORCNN.