LER-YOLO: Reliability-Aware Expert Routing for Misaligned RGB-Infrared UAV Detection

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses UAV detection in misaligned RGB-infrared remote-sensing image pairs, where spatial misregistration between sensors, tiny target scale, and cluttered backgrounds degrade performance. The authors propose LER-YOLO, a framework built around an Uncertainty-Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map. This map guides a Reliability-Guided Sparse Mixture-of-Experts (MoE) Fusion module that selects k experts from RGB-dominant, infrared-dominant, and interactive fusion experts, so that cross-modal interaction is suppressed where the local correspondence is unreliable. On the public MBU benchmark under a YOLOv5s-family protocol, the method achieves 89.7 ± 0.2% AP50 over three independent seeds, with a best result of 89.9%. Ablations and parameter-matched comparisons indicate that the gains come mainly from reliability-guided expert routing rather than from increased model capacity.

📝 Abstract

Detecting small unmanned aerial vehicles from RGB-infrared remote-sensing pairs remains challenging due to tiny target scale, cluttered backgrounds, and spatial misalignment between heterogeneous sensors. Existing bimodal detectors often align or fuse features without assessing the reliability of local cross-sensor correspondence, allowing mismatch artifacts to propagate into the detection head. To address this issue, we propose LER-YOLO, a reliability-aware sparse mixture-of-experts framework for misaligned RGB-infrared UAV detection. LER-YOLO first introduces an Uncertainty-Aware Target Alignment module that resamples visible features toward the infrared reference and estimates a spatial reliability map. This reliability prior is then used by a Reliability-Guided Sparse MoE Fusion module to adaptively select k experts from RGB-dominant, infrared-dominant, and interactive fusion experts, enabling trustworthy cross-modal interaction while suppressing unreliable fusion. Experiments on the public MBU benchmark under a YOLOv5s-family protocol show that LER-YOLO achieves 89.7 ± 0.2% AP50 over three independent seeds, with a best result of 89.9%. Extensive ablations, parameter-matched comparisons, synthetic-shift evaluations, and complexity analysis demonstrate that the gains mainly come from reliability-guided expert routing rather than increased model capacity.
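The routing idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the reliability map enters the gate, so the code below assumes one plausible scheme in which the reliability value at a location shifts the gate logit of the interactive expert before top-k selection. The names `route`, `fuse`, `bias`, and the averaging "interactive" expert are all hypothetical, and features are reduced to scalars for readability.

```python
import math

# Expert order is an assumption made for this sketch.
EXPERTS = ("rgb", "ir", "interactive")


def softmax(values):
    """Numerically stable softmax over a short list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(gate_logits, reliability, k=2, bias=2.0):
    """Pick k experts for one spatial location and return {name: weight}.

    Assumption: reliability in [0, 1] raises the interactive expert's logit
    when correspondence is trustworthy (r > 0.5) and lowers it otherwise.
    """
    r = min(max(reliability, 0.0), 1.0)
    adjusted = list(gate_logits)
    adjusted[2] += bias * (2.0 * r - 1.0)
    # Sparse selection: keep only the k highest-scoring experts.
    order = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)
    top = order[:k]
    # Renormalise the gate over the selected experts only.
    weights = softmax([adjusted[i] for i in top])
    return {EXPERTS[i]: w for i, w in zip(top, weights)}


def fuse(rgb_feat, ir_feat, weights):
    """Weighted sum of the selected experts' outputs (scalar toy features)."""
    outputs = {
        "rgb": rgb_feat,                             # RGB-dominant expert
        "ir": ir_feat,                               # infrared-dominant expert
        "interactive": 0.5 * (rgb_feat + ir_feat),   # toy cross-modal mix
    }
    return sum(w * outputs[name] for name, w in weights.items())


if __name__ == "__main__":
    unreliable = route([0.0, 0.0, 0.0], reliability=0.0)
    reliable = route([0.0, 0.0, 0.0], reliability=1.0)
    print(sorted(unreliable))  # experts chosen where alignment is poor
    print(sorted(reliable))    # experts chosen where alignment is good
    print(fuse(1.0, 3.0, unreliable))
```

With equal gate logits, a location whose reliability is 0 falls back to the two single-modality experts, while a location whose reliability is 1 includes the interactive expert. In a real detector the same selection would run per pixel on feature tensors, and the gate logits would be produced by a learned layer.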

Problem

Research questions and friction points this paper is trying to address.

UAV detection

RGB-infrared misalignment

cross-modal fusion

reliability assessment

small object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

reliability-aware routing

sparse mixture-of-experts

RGB-infrared misalignment