🤖 AI Summary
Infrared object detection models lack robustness under cross-modal (RGB→infrared) transfer and distribution shifts. To address this, we propose WiSE-OD, a weight-space ensembling framework for robust object detection that fuses zero-shot and fine-tuned model weights directly in parameter space, without additional training or inference overhead, improving both accuracy and robustness. For systematic evaluation, we introduce two image-level corruption benchmarks, LLVIP-C and FLIR-C. Experiments spanning out-of-distribution cross-modal evaluation, linear probing, and multiple detector architectures (Faster R-CNN, YOLOv5, DETR) show that WiSE-OD substantially improves robustness to diverse corruptions (e.g., noise, blur, weather artifacts) and domain shifts while preserving accuracy on the original task.
📝 Abstract
Object detection (OD) in infrared (IR) imagery is critical for low-light and nighttime applications. However, the scarcity of large-scale IR datasets forces models to rely on weights pre-trained on RGB images. While fine-tuning on IR improves accuracy, it often compromises robustness under distribution shifts due to the inherent modality gap between RGB and IR. To address this, we introduce LLVIP-C and FLIR-C, two cross-modality out-of-distribution (OOD) benchmarks built by applying corruptions to standard IR datasets. Additionally, to fully leverage the complementary knowledge of RGB- and IR-trained models, we propose WiSE-OD, a weight-space ensembling method with two variants: WiSE-OD$_{ZS}$, which combines RGB zero-shot and IR fine-tuned weights, and WiSE-OD$_{LP}$, which blends zero-shot and linear-probing weights. Evaluated across three RGB-pretrained detectors and two robust baselines, WiSE-OD improves both cross-modality and corruption robustness without any additional training or inference cost.
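The core operation behind weight-space ensembling can be sketched in a few lines. Below is a minimal illustration assuming a WiSE-FT-style linear interpolation of two checkpoints with identical architectures; the function name, the toy parameter dictionaries, and the mixing coefficient `alpha` are illustrative, not the paper's actual implementation.

```python
def wise_interpolate(theta_zs, theta_ft, alpha=0.5):
    """Blend zero-shot and fine-tuned weights per parameter.

    alpha = 0.0 recovers the zero-shot model, alpha = 1.0 the
    fine-tuned one; intermediate values trade accuracy for robustness.
    Both state dicts must come from the same architecture.
    """
    assert theta_zs.keys() == theta_ft.keys(), "architectures must match"
    # Per-parameter linear interpolation: (1 - alpha) * zero-shot + alpha * fine-tuned.
    return {k: (1 - alpha) * theta_zs[k] + alpha * theta_ft[k] for k in theta_zs}


# Toy example with scalar "parameters" (real checkpoints hold tensors,
# but the interpolation is applied elementwise in the same way):
zs = {"backbone.w": 1.0, "head.w": 0.0}   # RGB zero-shot weights
ft = {"backbone.w": 3.0, "head.w": 2.0}   # IR fine-tuned weights
merged = wise_interpolate(zs, ft, alpha=0.5)
# merged == {"backbone.w": 2.0, "head.w": 1.0}
```

Because the blend is computed once, offline, the merged model runs at exactly the cost of a single detector, which is why the method adds no training or inference overhead.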