LESSViT: Robust Hyperspectral Representation Learning under Spectral Configuration Shift

📅 2026-05-18

🤖 AI Summary
This work addresses the limited generalization of hyperspectral image models caused by spectral configuration discrepancies across sensors—such as variations in wavelength coverage, band sampling, and channel dimensions—and proposes LESSViT, a sensor-flexible Vision Transformer architecture. LESSViT enables efficient explicit spatial-spectral joint modeling via low-rank decomposition, supports arbitrary spectral inputs through channel-agnostic patch embedding and wavelength-aware positional encoding, and significantly reduces computational complexity with a novel LESS Attention mechanism. Furthermore, the authors introduce HyperMAE, a pretraining strategy that leverages decoupled spatial-spectral masking and hierarchical channel sampling. Evaluated on the SpectralEarth benchmark, LESSViT maintains strong in-domain performance while substantially improving robustness to spectral shifts, demonstrating its effectiveness for scalable and generalizable hyperspectral representation learning.
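To make the idea of wavelength-aware positional encoding concrete, here is a minimal sketch of one common way to build such an encoding: a sinusoidal code keyed on each band's center wavelength rather than its channel index. This is an illustrative assumption, not the paper's formulation; the function name `wavelength_encoding`, the `dim` and `max_period` parameters, and the two sensor band lists are all hypothetical.

```python
import math


def wavelength_encoding(wavelengths_nm, dim=8, max_period=10000.0):
    """Sinusoidal encoding keyed on center wavelength in nm (hypothetical helper)."""
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometrically spaced frequencies, as in standard sinusoidal encodings.
    freqs = [max_period ** (-i / half) for i in range(half)]
    encodings = []
    for lam in wavelengths_nm:
        sines = [math.sin(lam * f) for f in freqs]
        cosines = [math.cos(lam * f) for f in freqs]
        encodings.append(sines + cosines)
    return encodings


# Two made-up sensors with different channel counts and band layouts.
sensor_a = [450.0, 550.0, 650.0, 850.0]
sensor_b = [550.0, 850.0]

enc_a = wavelength_encoding(sensor_a)
enc_b = wavelength_encoding(sensor_b)

# The 550 nm band gets the same code in both sensors, although it is
# channel 1 in sensor_a and channel 0 in sensor_b.
print(enc_a[1] == enc_b[0])
```

Because the code depends only on the physical wavelength, a model that consumes it does not need a fixed channel count or channel order, which is the property the summary attributes to sensor-flexible inputs.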
📝 Abstract
Modeling hyperspectral imagery (HSI) across different sensors presents a fundamental challenge due to variations in wavelength coverage, band sampling, and channel dimensionality. As a result, models trained under a fixed spectral configuration often fail to generalize to other sensors. Existing Vision Transformer (ViT) approaches either rely on implicit spectral modeling with fixed channel assumptions or adopt explicit spatial-spectral attention with prohibitive computational cost, leading to a fundamental trade-off between efficiency and expressiveness. In this work, we introduce Low-rank Efficient Spatial-Spectral ViT (LESSViT), a sensor-flexible architecture for cross-spectral generalization. LESSViT is built on LESS Attention, a structured low-rank factorization that models joint spatial-spectral interactions through separable spatial and spectral components, reducing the complexity of full spatial-spectral attention from $O(N^2 C^2)$ to $O(rNC)$, where $N$ is the number of spatial tokens, $C$ is the number of spectral channels, and $r$ is the rank of the low-rank approximation. We further incorporate channel-agnostic patch embedding and wavelength-aware positional encoding to support flexible spectral inputs. To enable efficient and robust pretraining, we introduce a hyperspectral masked autoencoder (HyperMAE) with decoupled spatial-spectral masking and hierarchical channel sampling. We evaluate LESSViT under a cross-spectral generalization setting that simulates cross-sensor variability. Experiments on the SpectralEarth benchmark demonstrate that LESSViT improves robustness under spectral shifts while remaining competitive in-distribution, and that explicit and efficient spatial-spectral modeling is essential for scalable and generalizable hyperspectral representation learning.
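As a rough sense of scale for the complexity reduction quoted above, the following sketch compares the two leading-order terms, $N^2 C^2$ and $rNC$, for one set of sizes. The values of `n`, `c`, and `r` are hypothetical and chosen only for illustration; constant factors and the feature dimension are ignored, so the numbers are scaling counts, not measured costs.

```python
def full_attention_term(n_spatial, n_channels):
    """Leading-order term N^2 * C^2 for attention over all N*C tokens."""
    return (n_spatial * n_channels) ** 2


def low_rank_term(n_spatial, n_channels, rank):
    """Leading-order term r * N * C quoted in the abstract for LESS Attention."""
    return rank * n_spatial * n_channels


# Hypothetical sizes: 14 x 14 spatial tokens, 200 spectral channels, rank 8.
n, c, r = 196, 200, 8

full = full_attention_term(n, c)
low = low_rank_term(n, c, r)

print(full)         # 1536640000
print(low)          # 313600
print(full // low)  # 4900, which equals N * C / r
```

The ratio of the two terms is $NC/r$, so under these asymptotic expressions the gap widens as either the spatial token count or the number of channels grows.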
Problem

Research questions and friction points this paper is trying to address.

hyperspectral imagery
spectral configuration shift
cross-sensor generalization
spatial-spectral modeling
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LESSViT
low-rank attention
spatial-spectral modeling
hyperspectral representation learning
cross-spectral generalization
Haozhe Si
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, IL, USA
Yuxuan Wan
Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, IL, USA
Yuqing Wang
Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, IL, USA
Minh Do
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, IL, USA
Han Zhao
Department of Computer Science, University of Illinois Urbana-Champaign
Machine Learning, Algorithmic Fairness, Domain Adaptation, Probabilistic Circuits