🤖 AI Summary
To address the performance degradation of hyperspectral image (HSI) semantic segmentation in autonomous driving under adverse weather and lighting conditions, this paper proposes a Multi-Scale Spectral Attention Module (MSAM). MSAM employs parallel multi-kernel 1D convolutions and an adaptive feature aggregation mechanism to efficiently model cross-scale spectral responses, and is integrated into the skip connections of a UNet backbone (UNet-SC) to enhance joint spatial-spectral representation. The method achieves substantial accuracy gains with negligible computational overhead: only +0.02% in parameters and +0.82% in GFLOPs. On three benchmarks (HyKo-VIS v2, HSI-Drive v2, and Hyperspectral City v2), it improves mean Intersection-over-Union (mIoU) by 3.61% and mean F1-score (mF1) by 3.80% on average. These results validate the superiority of multi-scale spectral modeling over single-scale alternatives and establish a new lightweight paradigm for HSI-aware perception in autonomous systems.
📝 Abstract
Recent advances in autonomous driving (AD) have highlighted the potential of Hyperspectral Imaging (HSI) for enhanced environmental perception, particularly in challenging weather and lighting conditions. However, efficiently processing its high-dimensional spectral data remains a significant challenge. This paper introduces a Multi-scale Spectral Attention Module (MSAM) that enhances spectral feature extraction through three parallel 1D convolutions with kernel sizes ranging from 1 to 11, coupled with an adaptive feature aggregation mechanism. By integrating MSAM into UNet's skip connections (UNet-SC), our proposed UNet-MSAM achieves significant improvements in semantic segmentation performance across multiple HSI datasets: HyKo-VIS v2, HSI-Drive v2, and Hyperspectral City v2. Comprehensive experiments demonstrate that, with minimal computational overhead (on average 0.02% more parameters and 0.82% more GFLOPs), UNet-MSAM consistently outperforms UNet-SC, achieving average improvements of 3.61% in mean IoU and 3.80% in mF1 across the three datasets. Extensive ablation studies establish that multi-scale kernel combinations outperform single-scale configurations. These findings demonstrate the potential of HSI processing for AD and provide valuable insights into designing robust, multi-scale spectral feature extractors for real-world applications.
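To make the multi-scale spectral attention idea concrete, here is a minimal NumPy sketch of the mechanism the abstract describes: parallel 1D convolutions at several kernel sizes applied along the spectral axis, fused by adaptive (softmax) weights, and used to gate the input spectrum. This is an illustrative toy, not the paper's implementation: the kernel sizes `(3, 7, 11)`, the averaging kernels (learned filters in the actual module), and the sigmoid gating are all assumptions.

```python
import numpy as np

def conv1d_same(x, k):
    """1D convolution along the spectral axis with 'same' (edge) padding.
    x: (bands,) spectrum of one pixel; k: odd kernel size.
    Uses a placeholder averaging kernel; the real module learns its filters."""
    kernel = np.ones(k) / k
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

def msam_sketch(x, kernel_sizes=(3, 7, 11), logits=None):
    """Multi-scale spectral attention sketch: parallel 1D convs at several
    scales, aggregated with softmax weights, then a sigmoid band-wise gate."""
    if logits is None:
        logits = np.zeros(len(kernel_sizes))  # learned aggregation weights
    w = np.exp(logits) / np.exp(logits).sum()  # adaptive fusion weights
    fused = sum(wi * conv1d_same(x, k) for wi, k in zip(w, kernel_sizes))
    attn = 1.0 / (1.0 + np.exp(-fused))  # sigmoid attention per band
    return x * attn  # spectrally recalibrated features

spectrum = np.linspace(0.0, 1.0, 25)  # toy 25-band pixel spectrum
out = msam_sketch(spectrum)
print(out.shape)  # (25,)
```

In the paper the module sits in the UNet skip connections, so the same gating would be applied channel-wise to encoder feature maps rather than to a raw per-pixel spectrum as in this toy.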