🤖 AI Summary
This work addresses the challenges of hyperspectral image classification under extremely limited labeled samples and high-dimensional spectral data, compounded by the quadratic computational complexity of conventional Transformers. To this end, the authors propose VP-Hype, a hybrid architecture that integrates State Space Models (Mamba) with Transformers to capture long-range dependencies at reduced cost, built on a 3D-CNN spectral front-end and augmented by a dual-modal visual–textual prompting mechanism that provides context-aware guidance under label scarcity. Evaluated with only 2% of samples used for training, the method achieves overall accuracies of 99.69% on Salinas and 99.45% on Longkou, outperforming existing approaches and establishing a new state of the art for data-scarce hyperspectral classification.
📝 Abstract
Accurate classification of hyperspectral imagery (HSI) is often frustrated by the tension between high-dimensional spectral data and the extreme scarcity of labeled training samples. While hierarchical models like LoLA-SpecViT have demonstrated the power of local windowed attention and parameter-efficient fine-tuning, the quadratic complexity of standard Transformers remains a barrier to scaling. We introduce VP-Hype, a framework that rethinks HSI classification by unifying the linear-time efficiency of State-Space Models (SSMs) with the relational modeling of Transformers in a novel hybrid architecture. Building on a robust 3D-CNN spectral front-end, VP-Hype replaces conventional attention blocks with a hybrid Mamba–Transformer backbone to capture long-range dependencies with significantly reduced computational overhead. Furthermore, we address the label-scarcity problem by integrating dual-modal visual and textual prompts that provide context-aware guidance for feature extraction. Our experimental evaluation demonstrates that VP-Hype establishes a new state of the art in low-data regimes. Specifically, with only 2% of samples used for training, the model achieves an Overall Accuracy (OA) of 99.69% on the Salinas dataset and 99.45% on the Longkou dataset. These results suggest that the convergence of hybrid sequence modeling and multi-modal prompting provides a robust path forward for high-performance, sample-efficient remote sensing.
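The abstract's efficiency argument rests on the contrast between an SSM's recurrent scan and self-attention's pairwise scoring. The toy sketch below is not the paper's implementation; it uses a hypothetical scalar, diagonal, already-discretized SSM (parameters `a`, `b`, `c` are illustrative constants, not learned Mamba parameters) purely to show why one pass over a length-T sequence costs O(T) while attention's score matrix costs O(T²):

```python
# Illustrative sketch only: a scalar discretized state-space recurrence,
#     h_t = a * h_{t-1} + b * x_t   (state update)
#     y_t = c * h_t                 (readout)
# Each step does O(1) work, so the full sequence is O(T) time and O(1)
# state, versus the O(T^2) pairwise scores of standard self-attention.

def ssm_scan(x, a=0.9, b=0.5, c=1.0):
    """Run a scalar SSM over sequence x; return outputs y_1..y_T."""
    h = 0.0
    ys = []
    for x_t in x:            # single pass over the sequence: O(T)
        h = a * h + b * x_t  # recurrent state update
        ys.append(c * h)     # readout
    return ys

def attention_scores(x):
    """Toy self-attention score matrix: O(T^2) pairwise products."""
    return [[xi * xj for xj in x] for xi in x]

seq = [1.0, 0.0, 0.0, 0.0]
print(ssm_scan(seq))              # impulse response decays geometrically
print(len(attention_scores(seq)))  # T rows, each with T scores
```

Doubling the sequence length doubles the scan's work but quadruples the score matrix, which is the scaling gap the hybrid backbone is designed to exploit.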