MO R-CNN: Multispectral Oriented R-CNN for Object Detection in Remote Sensing Image

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenges of large intra- and inter-modal discrepancies and high computational overhead in oriented object detection for multispectral remote sensing imagery, this paper proposes a lightweight detection framework. The method introduces three key innovations: (1) a Heterogeneous Feature Extraction Network (HFEN) that leverages large-kernel convolutions for efficient cross-modal feature modeling; (2) a Single-Modality Supervision (SMS) mechanism to mitigate modality imbalance and enhance feature discriminability; and (3) a rule-driven Conditional Multi-Modal Label Fusion (CMLF) strategy to improve cross-modal localization consistency and robustness. Extensive experiments on the DroneVehicle, VEDAI, and OGSOD benchmarks demonstrate that the approach outperforms state-of-the-art methods in both accuracy and efficiency, achieving superior detection performance while substantially reducing computational complexity and memory footprint. The framework thus strikes an effective balance between practical deployability and generalization capability.

📝 Abstract
Oriented object detection for multi-spectral imagery faces significant challenges due to differences both within and between modalities. Although existing methods have improved detection accuracy through complex network architectures, their high computational complexity and memory consumption severely restrict their practical application. Motivated by the success of large-kernel convolutions in remote sensing, we propose MO R-CNN, a lightweight framework for multi-spectral oriented object detection featuring a heterogeneous feature extraction network (HFEN), single-modality supervision (SMS), and condition-based multimodal label fusion (CMLF). HFEN leverages inter-modal differences to adaptively align, merge, and enhance multi-modal features. SMS constrains multi-scale features and enables the model to learn from multiple modalities. CMLF fuses multimodal labels based on specific rules, providing the model with a more robust and consistent supervisory signal. Experiments on the DroneVehicle, VEDAI, and OGSOD datasets prove the superiority of our method. The source code is available at: https://github.com/Iwill-github/MORCNN.
Problem

Research questions and friction points this paper is trying to address.

Detecting oriented objects in multispectral remote sensing images
Addressing high computational complexity in existing detection methods
Handling feature differences within and between spectral modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight framework with heterogeneous feature extraction
Single modality supervision for multi-scale learning
Condition-based multimodal label fusion rules
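The page does not reproduce the paper's CMLF pseudocode, but the general idea of rule-based multimodal label fusion can be illustrated with a minimal, hypothetical sketch: ground-truth boxes from the two modalities (e.g. RGB and infrared) are matched by overlap, pairs that agree are merged into one supervisory label, and modality-unique boxes are kept as-is. The matching rule, the averaging step, the threshold, and all function names below are illustrative assumptions, not the paper's actual CMLF rules (which operate on oriented boxes and richer conditions).

```python
# Illustrative sketch of rule-based multimodal label fusion (NOT the paper's
# actual CMLF algorithm; the rules and thresholds here are invented assumptions).
# Axis-aligned boxes are used for simplicity; oriented detection would need
# rotated-box IoU instead.

def iou(a, b):
    """Axis-aligned IoU; boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def fuse_labels(rgb_boxes, ir_boxes, match_thr=0.5):
    """Merge per-modality annotations into one supervisory label set.

    Rule (assumed for illustration):
      * boxes visible in both modalities (IoU >= match_thr) -> average them;
      * boxes visible in only one modality -> keep them unchanged.
    """
    fused, used_ir = [], set()
    for rb in rgb_boxes:
        # Greedily find the best unmatched IR box for this RGB box.
        best_j, best_iou = -1, 0.0
        for j, ib in enumerate(ir_boxes):
            v = iou(rb, ib)
            if j not in used_ir and v > best_iou:
                best_j, best_iou = j, v
        if best_iou >= match_thr:
            ib = ir_boxes[best_j]
            used_ir.add(best_j)
            fused.append(tuple((r + i) / 2 for r, i in zip(rb, ib)))
        else:
            fused.append(tuple(rb))
    # IR-only boxes (e.g. objects visible only in thermal) are kept as-is.
    fused.extend(tuple(ib) for j, ib in enumerate(ir_boxes) if j not in used_ir)
    return fused
```

The point of such a rule set is that the fused labels are more complete than either modality alone (e.g. vehicles invisible in RGB at night still get supervision from the IR annotations) while consistent pairs collapse into a single, geometrically averaged target.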
Leiyu Wang
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Biao Jin
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Feng Huang
Neusoft Medical System
Liqiong Chen
School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350002, China
Zhengyong Wang
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Xiaohai He
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China
Honggang Chen
College of Electronics and Information Engineering, Sichuan University, Chengdu 610065, China, and also with the Yunnan Key Laboratory of Software Engineering, Yunnan University, Kunming 650600, China