AI Summary
This work addresses the challenge of real-time disparity estimation in thermal binocular stereo vision under low-texture and low-contrast conditions. Methodologically, we propose an efficient and accurate framework featuring a lightweight backbone for 3D cost volume construction, a novel multi-scale channel-spatial joint attention mechanism, and, crucially, the first knowledge distillation strategy specifically designed for sparse thermal ground-truth disparity maps to enhance generalization. We further introduce a channel-spatial collaborative attention refinement module to significantly improve feature discriminability. Evaluated on multiple thermal stereo benchmarks, our method achieves >30 FPS real-time inference while surpassing state-of-the-art accuracy. It demonstrates strong robustness in all-weather applications, including nighttime UAV inspection and confined-space cleaning robots.
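To make the "3D cost volume" idea concrete, the following is a minimal sketch that assumes a plain correlation cost over rectified feature maps. The function names `correlation_cost_volume` and `winner_take_all` are hypothetical, and the summary does not say which matching cost ThermoStereoRT actually uses. A practical implementation would run on a tensor library, and pure Python is used here only to keep the example self-contained.

```python
def correlation_cost_volume(left, right, max_disp):
    """Build a correlation volume indexed [disparity][row][col].

    left, right: feature maps as nested lists indexed [channel][row][col],
    assumed to come from rectified images (matches lie on the same row).
    """
    channels, height, width = len(left), len(left[0]), len(left[0][0])
    volume = [[[0.0] * width for _ in range(height)] for _ in range(max_disp)]
    for d in range(max_disp):
        for y in range(height):
            # Columns x < d have no counterpart in the right image; they stay 0.
            for x in range(d, width):
                # A left pixel at column x is compared with the right pixel at x - d.
                volume[d][y][x] = sum(
                    left[c][y][x] * right[c][y][x - d] for c in range(channels)
                ) / channels
    return volume


def winner_take_all(volume):
    """Per pixel, return the disparity index with the highest correlation."""
    max_disp, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return [
        [max(range(max_disp), key=lambda d: volume[d][y][x]) for x in range(width)]
        for y in range(height)
    ]
```

The volume has one slice per candidate disparity, which is why it is three-dimensional (disparity, height, width). Winner-take-all is only the simplest way to read out a disparity map. The method described here instead passes the volume through attention and refinement stages.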
Abstract
We introduce ThermoStereoRT, a real-time thermal stereo matching method designed for all-weather conditions that recovers disparity from two rectified thermal stereo images, envisioning applications such as night-time drone surveillance or under-bed cleaning robots. Leveraging a lightweight yet powerful backbone, ThermoStereoRT constructs a 3D cost volume from thermal images and employs multi-scale attention mechanisms to produce an initial disparity map. To refine this map, we design a novel channel and spatial attention module. Addressing the challenge of sparse ground truth data in thermal imagery, we utilize knowledge distillation to boost performance without increasing computational demands. Comprehensive evaluations on multiple datasets demonstrate that ThermoStereoRT delivers both real-time performance and robust accuracy, making it a promising solution for real-world deployment in various challenging environments. Our code will be released at https://github.com/SJTU-ViSYS-team/ThermoStereoRT
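One way knowledge distillation can compensate for sparse ground truth is sketched below. It assumes a simple scheme in which pixels with a measured disparity are supervised directly and the remaining pixels are pulled toward a teacher network's prediction. The function `sparse_distillation_loss`, the L1 penalty, and the weight `alpha` are illustrative assumptions and not the loss defined in the paper.

```python
def sparse_distillation_loss(student, teacher, ground_truth, alpha=0.5):
    """Hypothetical loss: L1 to ground truth where it exists, L1 to the teacher elsewhere.

    student, teacher: flat lists of per-pixel disparity predictions.
    ground_truth: flat list of disparities, with None where no measurement exists.
    alpha: assumed weight of the distillation term (not a value from the paper).
    """
    supervised, distilled = [], []
    for s, t, g in zip(student, teacher, ground_truth):
        if g is not None:
            supervised.append(abs(s - g))   # pixel has a ground-truth disparity
        else:
            distilled.append(abs(s - t))    # fall back to the teacher's estimate

    def mean(values):
        # An empty term contributes nothing instead of dividing by zero.
        return sum(values) / len(values) if values else 0.0

    return mean(supervised) + alpha * mean(distilled)
```

Because the teacher is consulted only while computing the training loss, the student network that runs at inference time is unchanged, which is consistent with the claim that distillation adds no computational cost at deployment.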