🤖 AI Summary
This work addresses small-UAV detection in misaligned RGB-infrared remote-sensing image pairs, where spatial misregistration, tiny targets, and cluttered backgrounds degrade performance. The authors propose LER-YOLO, a framework whose Uncertainty-Aware Target Alignment module resamples visible features toward the infrared reference and estimates a spatial reliability map. This map guides a sparse Mixture-of-Experts (MoE) fusion module that adaptively selects k experts from RGB-dominant, infrared-dominant, and interactive fusion experts, so that cross-modal interaction is suppressed where correspondence is unreliable. On the public MBU benchmark under a YOLOv5s-family protocol, the method achieves 89.7±0.2% AP50 over three independent seeds, with a best result of 89.9%. Ablations and parameter-matched comparisons indicate that the gains come mainly from reliability-guided routing rather than from increased model capacity.
📝 Abstract
Detecting small unmanned aerial vehicles from RGB-infrared remote-sensing pairs remains challenging due to tiny target scale, cluttered backgrounds, and spatial misalignment between heterogeneous sensors. Existing bimodal detectors often align or fuse features without assessing the reliability of local cross-sensor correspondence, allowing mismatch artifacts to propagate into the detection head. To address this issue, we propose LER-YOLO, a reliability-aware sparse mixture-of-experts framework for misaligned RGB-infrared UAV detection. LER-YOLO first introduces an Uncertainty-Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map. This reliability prior is then used by a Reliability-Guided Sparse MoE Fusion module to adaptively select k experts from RGB-dominant, infrared-dominant, and interactive fusion experts, enabling trustworthy cross-modal interaction while suppressing unreliable fusion. Experiments on the public MBU benchmark under a YOLOv5s-family protocol show that LER-YOLO achieves 89.7±0.2% AP50 over three independent seeds, with a best result of 89.9%. Extensive ablations, parameter-matched comparisons, synthetic-shift evaluations, and complexity analysis demonstrate that the gains mainly come from reliability-guided expert routing rather than increased model capacity.
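To make the routing idea concrete, the following is a minimal sketch of reliability-guided top-k expert selection at a single spatial location. It is not the authors' implementation: the abstract does not specify how the reliability prior enters the gate, so the additive bias on the interactive expert, the `bias_strength` parameter, and the function names `route` and `fuse` are all assumptions made for illustration. Expert outputs are reduced to scalars to keep the example self-contained.

```python
import math

# The three expert types named in the abstract.
EXPERTS = ("rgb_dominant", "ir_dominant", "interactive")


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, reliability, k=2, bias_strength=2.0):
    """Select k experts for one location and return their mixing weights.

    Assumption (not from the paper): a reliability value in [0, 1] is
    centred at 0.5 and added as a bias to the interactive expert's logit,
    so unreliable alignment pushes routing toward single-modality experts.
    """
    if not 0 < k <= len(EXPERTS):
        raise ValueError("k must be between 1 and the number of experts")
    logits = list(gate_logits)
    logits[2] += bias_strength * (reliability - 0.5)
    # Keep the k highest-scoring experts (sparse selection).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalise the gate over the selected experts only.
    weights = softmax([logits[i] for i in top])
    return {EXPERTS[i]: w for i, w in zip(top, weights)}


def fuse(expert_outputs, routing):
    """Weighted sum of the selected experts' outputs (scalars here)."""
    return sum(weight * expert_outputs[name] for name, weight in routing.items())


if __name__ == "__main__":
    outputs = {"rgb_dominant": 1.0, "ir_dominant": 3.0, "interactive": 10.0}
    for r in (0.0, 1.0):
        selected = route([0.0, 0.0, 0.0], reliability=r, k=2)
        print(r, selected, fuse(outputs, selected))
```

With equal gate logits, a reliability of 0 drops the interactive expert from the selected pair, while a reliability of 1 ranks it first. A real detector would apply the same selection per pixel on feature maps and learn the gate end to end.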