🤖 AI Summary
To address the limited robustness of autonomous robot target tracking under variable illumination, this paper proposes an adaptive multi-spectral pan-tilt system that fuses RGB and long-wave infrared (LWIR) video streams. The method introduces a frame-level multi-scale RGB-LWIR fusion strategy and an illumination-driven dynamic model-switching mechanism, enabling customized training across three illumination bands (<10, 10–1000, >1000 lux). The system comprises 33 YOLO variants with multi-scale feature alignment and 11-level adjustable pixel-wise modality fusion. Experiments report mean detection confidence scores of 92.8%, 92.0%, and 71.0% under full-light, dim-light, and no-light conditions, respectively, with the full-light and dim-light models significantly outperforming the YOLOv5n and YOLOv11n baselines. This work mitigates the performance limitations of single-modal vision under extreme illumination, advancing robust multi-spectral perception for autonomous robotics.
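The illumination-driven switching described above can be sketched as a simple lux-to-band lookup. This is a minimal illustration, not the authors' implementation: the band names and model filenames below are hypothetical, with the band-to-ratio mapping taken from the reported best results (80/20 RGB-LWIR for full light, 90/10 for dim light, 40/60 for no light).

```python
def select_band(lux: float) -> str:
    """Map a measured illuminance (lux) to one of the paper's three
    illumination bands. Band names are our own labels."""
    if lux < 10:
        return "no-light"
    if lux <= 1000:
        return "dim-light"
    return "full-light"

# Hypothetical band -> model mapping, reflecting the best fusion
# ratio reported for each illumination band.
BEST_MODEL = {
    "full-light": "yolo_fused_80_20.pt",  # 80/20 RGB-LWIR
    "dim-light":  "yolo_fused_90_10.pt",  # 90/10 RGB-LWIR
    "no-light":   "yolo_fused_40_60.pt",  # 40/60 RGB-LWIR
}

# Example: a 500 lux scene routes to the dim-light model.
model_path = BEST_MODEL[select_band(500)]
```

At runtime, the selected weights would be loaded into the detector whenever the measured lux crosses a band boundary.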
📝 Abstract
Autonomous robotic platforms are playing a growing role across the emergency services sector, supporting missions such as search-and-rescue operations in disaster zones and reconnaissance. However, traditional red-green-blue (RGB) detection pipelines struggle in low-light environments, and thermal-based systems lack color and texture information. To overcome these limitations, we present an adaptive framework that fuses RGB and long-wave infrared (LWIR) video streams at multiple fusion ratios and dynamically selects the optimal detection model for each illumination condition. We trained 33 You Only Look Once (YOLO) models on over 22,000 annotated images spanning three light levels: no-light (<10 lux), dim-light (10–1000 lux), and full-light (>1000 lux). To integrate both modalities, fusion was performed by blending aligned RGB and LWIR frames at eleven ratios, from full RGB (100/0) to full LWIR (0/100) in 10% increments. In evaluation, the best full-light model (80/20 RGB-LWIR) and dim-light model (90/10 fusion) achieved mean confidence scores of 92.8% and 92.0%, respectively, both significantly outperforming the YOLOv5 nano (YOLOv5n) and YOLOv11 nano (YOLOv11n) baselines. Under no-light conditions, the top 40/60 fusion reached 71.0%, exceeding both baselines, though the improvement was not statistically significant. Adaptive RGB-LWIR fusion improved detection confidence and reliability across all illumination conditions, enhancing autonomous robotic vision performance.
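The frame-level blending step can be illustrated with a pixel-wise weighted average of spatially aligned frames. This is a hedged sketch under our own assumptions (the function name and the use of a 3-channel-replicated LWIR frame are ours), not the paper's exact preprocessing code:

```python
import numpy as np

def fuse_frames(rgb: np.ndarray, lwir: np.ndarray, rgb_weight: float) -> np.ndarray:
    """Pixel-wise blend of an aligned RGB frame and an LWIR frame
    (assumed replicated to 3 channels so shapes match).
    rgb_weight in [0, 1]: 1.0 = pure RGB (100/0), 0.0 = pure LWIR (0/100)."""
    assert rgb.shape == lwir.shape, "frames must be spatially aligned"
    fused = rgb_weight * rgb.astype(np.float32) + (1.0 - rgb_weight) * lwir.astype(np.float32)
    return fused.clip(0, 255).astype(np.uint8)

# The eleven ratios from the abstract: 100/0 down to 0/100 in 10% steps.
ratios = [i / 10 for i in range(10, -1, -1)]  # [1.0, 0.9, ..., 0.0]
```

Each training set at a given ratio would then consist of the same annotated frames blended with the corresponding `rgb_weight`, so the 33 models cover 11 ratios per illumination band.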