Cosine-Normalized Attention for Hyperspectral Image Classification

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of conventional dot-product attention in Transformers: it conflates the magnitude and direction of hyperspectral features and thus fails to capture their essential angular structure. To overcome this, the authors propose a cosine-normalized attention mechanism that projects query and key embeddings onto a unit hypersphere and computes attention scores using squared cosine similarity, thereby explicitly modeling angular relationships while reducing sensitivity to magnitude variations. This mechanism is integrated into a lightweight spatial-spectral Transformer framework trained under an extremely limited-supervision regime. Experimental results show that the proposed model significantly outperforms recent Transformer- and Mamba-based methods on three benchmark datasets, achieving state-of-the-art classification accuracy despite its lightweight backbone.
📝 Abstract
Transformer-based methods have improved hyperspectral image classification (HSIC) by modeling long-range spatial-spectral dependencies; however, their attention mechanisms typically rely on dot-product similarity, which mixes feature magnitude and orientation and may be suboptimal for hyperspectral data. This work revisits attention scoring from a geometric perspective and introduces a cosine-normalized attention formulation that aligns similarity computation with the angular structure of hyperspectral signatures. By projecting query and key embeddings onto a unit hypersphere and applying a squared cosine similarity, the proposed method emphasizes angular relationships while reducing sensitivity to magnitude variations. The formulation is integrated into a spatial-spectral Transformer and evaluated under extremely limited supervision. Experiments on three benchmark datasets demonstrate that the proposed approach consistently achieves higher performance, outperforming several recent Transformer- and Mamba-based models despite using a lightweight backbone. In addition, a controlled analysis of multiple attention score functions shows that cosine-based scoring provides a reliable inductive bias for hyperspectral representation learning.
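The scoring rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the temperature parameter `tau`, and the epsilon guard are assumptions, and the paper may handle scaling, multi-head splitting, and the sign of the cosine differently (squaring discards the sign, treating opposite directions as aligned).

```python
import numpy as np

def cosine_normalized_attention(Q, K, V, tau=1.0, eps=1e-8):
    """Sketch of squared-cosine attention over (n, d) query/key/value arrays.

    tau is a temperature assumed here for illustration; the paper may use a
    fixed or learned scale instead.
    """
    # Project queries and keys onto the unit hypersphere.
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)

    # Cosine similarity lies in [-1, 1]; squaring emphasizes angular
    # alignment and removes sensitivity to feature magnitude.
    cos = Qn @ Kn.T
    scores = (cos ** 2) / tau

    # Standard row-wise softmax over keys, then weighted sum of values.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V
```

Because queries are normalized before scoring, rescaling the query embeddings leaves the attention weights (and thus the output) essentially unchanged, which is the magnitude-invariance property the abstract emphasizes.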
Problem

Research questions and friction points this paper is trying to address.

hyperspectral image classification
attention mechanism
dot-product similarity
feature magnitude
angular structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

cosine-normalized attention
hyperspectral image classification
angular similarity
spatial-spectral Transformer
inductive bias