🤖 AI Summary
To address the scarcity of annotated data in thermal image semantic segmentation and the failure of knowledge transfer from RGB pre-trained models under nighttime low-light conditions, this paper proposes a cross-spectral unsupervised domain adaptation (UDA) method. The approach introduces three key components: (1) a mask-based mutual learning strategy enabling uncertainty-aware bidirectional knowledge distillation between RGB and thermal models; (2) a prototype-based self-supervised loss that mitigates inter-domain distribution shift under low illumination via prototype contrastive learning; and (3) a cross-spectral collaborative modeling mechanism to enhance modality complementarity. Evaluated on multiple thermal image benchmarks, the method outperforms existing UDA approaches and achieves performance comparable to state-of-the-art supervised methods, with particular gains in segmentation accuracy in nighttime scenarios.
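To make the idea of a prototype-based loss concrete, the following is a minimal NumPy sketch of one common form of prototype contrastive learning. It is not the paper's loss, whose exact formulation is not given here. The function names `class_prototypes` and `prototype_contrastive_loss`, the use of cosine similarity, the `temperature` value, and the choice of per-class mean features as prototypes are all illustrative assumptions.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class (assumed prototype definition).

    features: (N, D) float array, labels: (N,) int array.
    Classes with no samples keep an all-zero prototype.
    """
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        members = features[labels == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return protos

def prototype_contrastive_loss(features, labels, prototypes, temperature=0.1):
    """Cross-entropy over cosine similarities between features and prototypes.

    Pulls each feature toward the prototype of its (pseudo-)label and pushes
    it away from the other prototypes. Hypothetical form, for illustration.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    logits = f @ p.T / temperature                       # (N, num_classes)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(len(labels)), labels].mean())

# Toy usage: four 2-D features from two classes.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels, num_classes=2)
loss = prototype_contrastive_loss(feats, labels, protos)
```

In a UDA setting the labels on the thermal side would typically be pseudo-labels rather than ground truth; how the paper obtains and uses them is not specified in this summary.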
📝 Abstract
In autonomous driving, thermal image semantic segmentation has emerged as a critical research area, owing to its ability to provide robust scene understanding under adverse visual conditions. In particular, unsupervised domain adaptation (UDA) for thermal image segmentation can be an efficient solution to the lack of labeled thermal datasets. Nevertheless, because existing methods do not effectively exploit the complementary information between RGB and thermal images, their performance degrades significantly during domain adaptation. In this paper, we present a comprehensive study on cross-spectral UDA for thermal image semantic segmentation. We first propose a novel masked mutual learning strategy that promotes complementary information exchange by selectively transferring results between the two spectral models while masking out uncertain regions. Additionally, we introduce a novel prototypical self-supervised loss designed to enhance the performance of the thermal segmentation model in nighttime scenarios. This loss addresses a limitation of RGB pre-trained networks, which cannot effectively transfer knowledge under low illumination due to the inherent constraints of RGB sensors. In experiments, our method achieves higher performance than previous UDA methods and performance comparable to state-of-the-art supervised methods.
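The masked mutual learning idea described above can be sketched roughly as follows. This is a hedged NumPy illustration, not the authors' implementation: the name `masked_mutual_loss`, the use of maximum softmax probability as the uncertainty measure, and the `conf_threshold` value are assumptions made for the example.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def masked_mutual_loss(student_logits, teacher_logits, conf_threshold=0.9):
    """Cross-entropy of the student against the teacher's pseudo-labels,
    averaged only over pixels where the teacher is confident.

    Both inputs have shape (H, W, C). Returns (loss, mask), where mask is
    the (H, W) boolean map of pixels that were kept.
    """
    teacher_prob = softmax(teacher_logits)
    pseudo_label = teacher_prob.argmax(axis=-1)           # (H, W)
    mask = teacher_prob.max(axis=-1) >= conf_threshold    # uncertain pixels masked out
    if not mask.any():
        return 0.0, mask
    log_p = np.log(softmax(student_logits) + 1e-12)
    picked = np.take_along_axis(log_p, pseudo_label[..., None], axis=-1)[..., 0]
    loss = -(picked * mask).sum() / mask.sum()
    return float(loss), mask

# Toy usage: a 1x2 "image" with two classes. The mutual part applies the
# same loss in both directions, each model acting as teacher for the other.
rgb_logits = np.array([[[10.0, 0.0], [0.0, 0.0]]])      # confident, then uncertain
thermal_logits = np.array([[[0.0, 0.0], [5.0, 0.0]]])
loss_thermal, mask_rgb = masked_mutual_loss(thermal_logits, rgb_logits)
loss_rgb, mask_thermal = masked_mutual_loss(rgb_logits, thermal_logits)
```

Thresholding on confidence is only one way to define the mask; entropy-based or agreement-based criteria would fit the same structure.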