🤖 AI Summary
Current autonomous driving vision systems degrade significantly under adverse conditions such as fog, rain, snow, and strong glare. While near-infrared (NIR) and long-wave infrared (LWIR) modalities offer potential robustness, their adoption is hindered by inherent modality limitations and the absence of large-scale benchmark datasets. To address this gap, we introduce RASMD, the first open-source RGB and short-wave infrared (SWIR) multispectral driving dataset, comprising 100,000 synchronized, spatially aligned image pairs captured across diverse locations, illumination, and weather conditions. RASMD also includes a subset for RGB-to-SWIR image translation and object detection annotations for challenging traffic scenarios. Experiments with an RGB+SWIR ensemble detection framework show significantly improved detection accuracy over RGB-only baselines, particularly in conditions where visible-spectrum sensors struggle. This work establishes the first SWIR-based autonomous driving benchmark, enabling rigorous evaluation and advancing research on robust multispectral perception.
📝 Abstract
Current autonomous driving algorithms rely heavily on the visible spectrum, which is prone to performance degradation in adverse conditions such as fog, rain, snow, glare, and high contrast. Although other spectral bands such as near-infrared (NIR) and long-wave infrared (LWIR) can enhance visual perception in such situations, they have their own limitations and lack large-scale datasets and benchmarks. Short-wave infrared (SWIR) imaging offers several advantages over NIR and LWIR. However, no publicly available large-scale datasets currently incorporate SWIR data for autonomous driving. To address this gap, we introduce the RGB and SWIR Multispectral Driving (RASMD) dataset, which comprises 100,000 synchronized and spatially aligned RGB-SWIR image pairs collected across diverse locations, lighting, and weather conditions. In addition, we provide a subset for RGB-to-SWIR translation and object detection annotations for a selection of challenging traffic scenarios, demonstrating the utility of SWIR imaging through experiments on both object detection and RGB-to-SWIR image translation. Our experiments show that combining RGB and SWIR data in an ensemble framework significantly improves detection accuracy compared to RGB-only approaches, particularly in conditions where visible-spectrum sensors struggle. We anticipate that the RASMD dataset will advance research in multispectral imaging for autonomous driving and robust perception systems.
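The abstract's ensemble idea can be sketched in a few lines: run a detector on each modality independently, pool the detections, and apply class-aware non-maximum suppression so that a high-confidence SWIR detection survives even when the RGB detector misses it. This is a minimal illustrative sketch, not the paper's actual implementation; the detection tuple layout `(x1, y1, x2, y2, score, class_id)`, the IoU threshold, and the greedy-NMS fusion rule are all assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def ensemble_detections(rgb_dets, swir_dets, iou_thr=0.5):
    """Pool per-modality detections and run greedy class-aware NMS.

    Each detection is a hypothetical tuple (x1, y1, x2, y2, score, class_id);
    the boxes are assumed to share one coordinate frame, which the dataset's
    spatial alignment of RGB-SWIR pairs makes possible.
    """
    dets = sorted(rgb_dets + swir_dets, key=lambda d: d[4], reverse=True)
    kept = []
    for d in dets:
        # Keep a detection unless a higher-scoring kept box of the same
        # class already covers it.
        if all(d[5] != k[5] or iou(d[:4], k[:4]) < iou_thr for k in kept):
            kept.append(d)
    return kept

# Toy example: SWIR contributes an object the RGB detector missed,
# while its duplicate of the RGB detection is suppressed.
rgb = [(10, 10, 50, 50, 0.9, 0)]
swir = [(12, 11, 52, 49, 0.8, 0), (100, 100, 140, 140, 0.7, 0)]
fused = ensemble_detections(rgb, swir)
```

With this fusion rule, either modality can contribute detections on its own, which is why the ensemble helps most in conditions where the visible-spectrum detector fails outright.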