🤖 AI Summary
Existing hyperspectral image (HSI) semantic segmentation methods, built upon RGB-optimized architectures, struggle to model joint spectral-spatial features, leading to suboptimal performance in complex scenes. To address this, we propose a lightweight adapter framework that leverages a frozen pre-trained vision transformer as the backbone. Our approach introduces a spectral transformer to capture long-range inter-channel spectral dependencies, a spectrum-aware spatial prior module to enhance local structural modeling, and a modality-aware interaction block for cross-modal feature alignment and injection. Crucially, the vision backbone remains frozen (no fine-tuning is required), yet HSI representation capability improves significantly. Evaluated on three autonomous-driving HSI benchmarks, our method achieves state-of-the-art performance, substantially outperforming both RGB-based baselines and existing HSI-specific approaches. Results demonstrate strong robustness and generalization in real-world driving scenarios.
📝 Abstract
Hyperspectral imaging (HSI) captures spatial information along with dense spectral measurements across numerous narrow wavelength bands. This rich spectral content has the potential to facilitate robust robotic perception, particularly in environments with complex material compositions, varying illumination, or other visually challenging conditions. However, current HSI semantic segmentation methods underperform due to their reliance on architectures and learning frameworks optimized for RGB inputs. In this work, we propose a novel hyperspectral adapter that leverages pretrained vision foundation models to effectively learn from hyperspectral data. Our architecture incorporates a spectral transformer and a spectrum-aware spatial prior module to extract rich spatial-spectral features. Additionally, we introduce a modality-aware interaction block that facilitates effective integration of hyperspectral representations and frozen vision transformer features through dedicated extraction and injection mechanisms. Extensive evaluations on three benchmark autonomous driving datasets demonstrate that our architecture achieves state-of-the-art semantic segmentation performance while directly using HSI inputs, outperforming both vision-based and hyperspectral segmentation methods. We make the code available at https://hyperspectraladapter.cs.uni-freiburg.de.
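To make the overall idea concrete, the sketch below illustrates the adapter pattern described above in PyTorch: a spectral transformer attends over the band dimension, and an interaction block injects the resulting spectral features into tokens from a frozen backbone via cross-attention. This is a minimal, illustrative sketch, not the authors' implementation: all class names, dimensions, and the stand-in backbone are placeholders, and the spectrum-aware spatial prior module is omitted for brevity.

```python
import torch
import torch.nn as nn

class SpectralTransformer(nn.Module):
    """Treats each spectral band as a token and applies self-attention,
    modelling long-range inter-band dependencies (simplified stand-in)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.embed = nn.Linear(1, dim)   # embed each band's pooled response
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, hsi):                                   # (B, bands, H, W)
        tokens = self.embed(hsi.mean(dim=(2, 3)).unsqueeze(-1))  # (B, bands, dim)
        out, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + out)                        # (B, bands, dim)

class InteractionBlock(nn.Module):
    """Cross-attention from backbone tokens to spectral tokens,
    followed by a residual injection (extraction/injection, simplified)."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vit_tokens, spectral_tokens):
        injected, _ = self.cross(vit_tokens, spectral_tokens, spectral_tokens)
        return vit_tokens + injected                          # residual update

class HSIAdapter(nn.Module):
    """Frozen backbone plus trainable spectral branch and interaction block."""
    def __init__(self, backbone, dim=64):
        super().__init__()
        self.backbone = backbone.requires_grad_(False)        # kept frozen
        self.spectral = SpectralTransformer(dim)
        self.interact = InteractionBlock(dim)

    def forward(self, hsi, vit_tokens):
        return self.interact(self.backbone(vit_tokens), self.spectral(hsi))

# Stand-in "frozen backbone": a single transformer encoder layer.
backbone = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
model = HSIAdapter(backbone)
hsi = torch.randn(2, 30, 16, 16)      # 30 spectral bands
vit_tokens = torch.randn(2, 196, 64)  # 14x14 patch tokens
out = model(hsi, vit_tokens)          # (2, 196, 64)
```

Only the spectral branch and the interaction block carry gradients, which is the key property the paper highlights: the pretrained vision backbone is reused as-is.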