🤖 AI Summary
Visible-light perception degrades severely under adverse weather and low-illumination conditions, yet large-scale thermal imaging benchmark datasets remain scarce. Method: We introduce MS², the first large-scale multi-spectral stereo dataset, capturing synchronized RGB, NIR, thermal, LiDAR, and GNSS/IMU data with semi-dense depth ground truth. We establish the first standardized cross-modal (thermal/RGB/NIR) benchmark for thermal-image depth estimation, systematically analyzing performance degradation under domain shift and adverse conditions. Our methodology covers multi-sensor spatiotemporal synchronization, semi-dense ground-truth generation, unified mono-/stereo cross-modal evaluation protocols (sketched below), and thermal feature modeling with domain adaptation analysis. Contribution/Results: A comprehensive evaluation of 12 state-of-the-art depth models on the MS² test sets demonstrates the superior robustness of thermal imaging at night and in rain. All data, code, and benchmark results are publicly released to advance standardization and robustness in thermal perception research.
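To make the unified mono-/stereo evaluation protocol concrete, the minimal sketch below computes the depth metrics commonly reported on such benchmarks (AbsRel, SqRel, RMSE, and δ-threshold accuracies), restricted to pixels where the semi-dense ground truth is valid. This is an illustrative sketch, not the repository's actual evaluation code; the function name, depth range, and the optional median scaling (a common convention for scale-ambiguous monocular predictions) are assumptions.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0, median_scale=False):
    """Standard depth metrics over the valid pixels of a semi-dense GT map.

    pred, gt : HxW float arrays in meters; pixels with gt == 0 carry no LiDAR return.
    median_scale : align scale-ambiguous monocular predictions to GT via the
                   median ratio (a common convention, not necessarily the MS2 protocol).
    """
    valid = (gt > min_depth) & (gt < max_depth)  # semi-dense GT: evaluate only where depth exists
    pred, gt = pred[valid], gt[valid]
    if median_scale:
        pred = pred * np.median(gt) / np.median(pred)
    pred = np.clip(pred, min_depth, max_depth)

    abs_rel = np.mean(np.abs(pred - gt) / gt)                # mean absolute relative error
    sq_rel = np.mean(((pred - gt) ** 2) / gt)                # mean squared relative error
    rmse = np.sqrt(np.mean((pred - gt) ** 2))                # root mean squared error (m)
    rmse_log = np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)                 # per-pixel prediction/GT ratio
    d1, d2, d3 = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]
    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse,
                rmse_log=rmse_log, d1=d1, d2=d2, d3=d3)
```

Because the same metric function is applied to every modality and to both monocular and stereo predictions, scores remain directly comparable across the RGB, NIR, and thermal test splits.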
📝 Abstract
Achieving robust and accurate spatial perception under adverse weather and lighting conditions is crucial for the high-level autonomy of self-driving vehicles and robots. However, existing perception algorithms that rely on the visible spectrum are highly affected by weather and lighting conditions. A long-wave infrared camera (i.e., a thermal imaging camera) is a potential solution for achieving high-level robustness. However, the absence of large-scale datasets and standardized benchmarks remains a significant bottleneck to progress in research on robust visual perception from thermal images. To this end, this manuscript first provides a large-scale Multi-Spectral Stereo (MS²) dataset that consists of stereo RGB, stereo NIR, stereo thermal, and stereo LiDAR data, together with GNSS/IMU information and semi-dense depth ground truth. The MS² dataset includes 162K synchronized multi-modal data pairs captured across diverse locations (e.g., urban city, residential area, campus, and highway) at different times (e.g., morning, daytime, and nighttime) and under various weather conditions (e.g., clear-sky, cloudy, and rainy). Second, we conduct a thorough evaluation of monocular and stereo depth estimation networks across RGB, NIR, and thermal modalities to establish standardized benchmark results on the MS² depth test sets (e.g., day, night, and rainy). Lastly, we provide in-depth analyses and discuss the challenges revealed by the benchmark results, such as the performance variability of each modality under adverse conditions, the domain shift between different sensor modalities, and potential research directions for thermal perception. Our dataset and source code are publicly available at https://sites.google.com/view/multi-spectral-stereo-dataset and https://github.com/UkcheolShin/SupDepth4Thermal.
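As background on how semi-dense depth ground truth of this kind is typically produced, the sketch below projects LiDAR returns into a camera view: points are transformed into the camera frame, pushed through the intrinsics, and written only at pixels hit by a return, leaving the rest empty. This is a minimal, hypothetical sketch under assumed calibration (K, R, t), not the MS² generation pipeline, which additionally involves multi-sensor synchronization and filtering not shown here.

```python
import numpy as np

def project_lidar_to_depth(points, K, R, t, h, w):
    """Project LiDAR points into a camera to build a semi-dense depth map.

    points : Nx3 LiDAR points in the LiDAR frame (meters).
    K      : 3x3 camera intrinsics; R (3x3) and t (3,) map LiDAR -> camera.
    Returns an HxW depth map; pixels without a LiDAR return stay 0.
    """
    cam = points @ R.T + t                    # LiDAR frame -> camera frame
    cam = cam[cam[:, 2] > 0]                  # keep points in front of the camera
    uv = cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]               # perspective division -> pixel coordinates
    u, v, z = uv[:, 0].astype(int), uv[:, 1].astype(int), cam[:, 2]

    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]

    depth = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(-z)                    # write far points first so near points overwrite
    depth[v[order], u[order]] = z[order]      # nearest return wins (simple occlusion handling)
    return depth
```

The zero-filled pixels are exactly what the valid-pixel mask in the evaluation sketch above excludes, which is why the two pieces fit together into a consistent benchmark protocol.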