🤖 AI Summary
Addressing the challenges of large intra- and inter-modal discrepancies and high computational overhead in oriented object detection for multispectral remote sensing imagery, this paper proposes MO R-CNN, a lightweight detection framework. The method introduces three key innovations: (1) a heterogeneous feature extraction network (HFEN) that leverages large-kernel convolutions for efficient cross-modal feature modeling; (2) a single-modality supervision (SMS) mechanism that mitigates modality imbalance and enhances feature discriminability; and (3) a rule-driven conditional multimodal label fusion (CMLF) strategy that improves cross-modal localization consistency and robustness. Extensive experiments on the DroneVehicle, VEDAI, and OGSOD benchmarks show that the approach outperforms state-of-the-art methods in both accuracy and efficiency, achieving comparable or better detection performance while substantially reducing computational complexity and memory footprint. The framework thus strikes an effective balance between practical deployability and generalization capability.
📝 Abstract
Oriented object detection in multi-spectral imagery faces significant challenges due to large discrepancies both within and between modalities. Although existing methods have improved detection accuracy through complex network architectures, their high computational complexity and memory consumption severely restrict their practical deployment. Motivated by the success of large-kernel convolutions in remote sensing, we propose MO R-CNN, a lightweight framework for multi-spectral oriented object detection featuring a heterogeneous feature extraction network (HFEN), single-modality supervision (SMS), and condition-based multimodal label fusion (CMLF). HFEN leverages inter-modal differences to adaptively align, merge, and enhance multi-modal features. SMS constrains multi-scale features and enables the model to learn from multiple modalities. CMLF fuses multi-modal labels according to specific rules, providing the model with a more robust and consistent supervisory signal. Experiments on the DroneVehicle, VEDAI, and OGSOD datasets demonstrate the superiority of our method. The source code is available at: https://github.com/Iwill-github/MORCNN.